Book Scanner Software

From Noisebridge
Revision as of 21:57, 16 March 2017 by 192.195.83.130 (talk)

Introduction

This page documents efforts, beginning in 2017, to create a software OCR pipeline.

This is a continuation of efforts by the Digital Archivists Working Group to make the book scanner more convenient to work with.


Platform

The current hardware platform consists of the following elements:

  • A wooden book scanner frame, built by members of the Digital Archivists Working Group
  • Two cameras (currently Canon EOS Rebel T3), mounted in the book scanner
  • Lighting, glass plates, book platen
  • USB cables connecting cameras to computer
  • Apple Mac Mini (mid 2010: Core2Duo 2.4 GHz, 8GB RAM, NVIDIA GeForce 320M 256MB)
  • macOS Sierra 10.12.3

Intent

The intent of this effort is to provide an OCR facility to help create digital documents from the pages photographed by the book scanner platform. Aspects of this may include:

  • Photo manipulation (cropping, rotation, image adjustments)
  • Partitioning of photos to provide hints to the software about text versus image regions
  • OCR conversion of characters in photos to text files
  • Hooking into the OCR API to obtain confidence level or probability data about each image conversion result, possibly for directing the human operator to regions that may need correcting
  • Connecting and automating any of these aspects
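
As an illustration of the confidence-data idea in the list above, here is a minimal sketch in Python. It assumes an OCR engine that can write per-word results as tab-separated text with the column names used by Tesseract's TSV output (left, top, width, height, conf, text). This page does not name an OCR engine, so that choice is an assumption, as are the function name low_confidence_words and the default threshold of 60. The function only parses text that has already been produced; it does not run OCR itself.

```python
import csv
import io


def low_confidence_words(tsv_text, threshold=60.0):
    """Return (text, conf, (left, top, width, height)) for each word whose
    confidence is below `threshold`.

    Assumes Tesseract-style TSV columns; the threshold is an arbitrary
    starting point, not a recommended value.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = []
    for row in reader:
        text = (row.get("text") or "").strip()
        try:
            conf = float(row["conf"])
        except (TypeError, ValueError):
            continue  # malformed or truncated row
        # Page, block, paragraph and line rows carry conf -1 and no text.
        if not text or conf < 0:
            continue
        if conf < threshold:
            box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
            flagged.append((text, conf, box))
    return flagged
```

If Tesseract were the engine, the input could probably be produced with a command such as `tesseract page.jpg page tsv` (TSV output is available from version 3.05), and the returned bounding boxes could be used to highlight regions of the photo for the operator to check.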

Efforts