Book Scanner Software
=== Introduction ===
This page documents efforts, beginning in 2017, to create a software OCR pipeline. It continues the work of the [[Digital Archivists]] Working Group to make the book scanner more convenient to work with.

----

=== Platform ===
The current hardware platform consists of the following elements:
* A wooden book scanner frame, built by members of the Digital Archivists working group
* Two cameras (currently Canon EOS Rebel T3), mounted in the book scanner
* Lighting, glass plates, book platen
* USB cables connecting the cameras to the computer
* Apple Mac Mini (mid 2010: Core2Duo 2.4 GHz, 8GB RAM, NVIDIA GeForce 320M 256MB)
* macOS Sierra 10.12.3

----

=== Intent ===
The intent of this effort is to provide an OCR facility that helps create digital documents from the pages photographed by the book scanner platform. Aspects of this may include:
* Photo manipulation (cropping, rotation, image adjustments)
* Partitioning of photos to give the software hints about text versus image regions
* OCR conversion of characters in photos to text files
* Hooking into the OCR API to obtain confidence-level or probability data for each conversion result, possibly to direct the human operator to regions that may need correcting
* Connecting together and automating any of these aspects

=== Software ===

==== Scanner & Camera Control ====
* [[CHDK]]: Camera control package that [[User:SteeleNivenson|Steele Nivenson]] has been hacking on, to the point that it functions quite well with used, low-cost 16MP mirrorless Canon pocket cameras (PowerShot A2200, A2500), in addition to the EOS DSLR lineup.

==== Acquisition & Image Post-Processing ====
* [https://github.com/DIYBookScanner/spreads spreads]: Project on GitHub implementing acquisition through OCR, apparently by [https://github.com/jbaiter Johannes Baiter].

=== Efforts ===
* [[4 March 2017: A session with Tesseract]]
* [[16 March 2017: Tested a Trial Copy of ABBYY FineReader]], an image-to-PDF OCR package starting around $120
* [[18 March 2017: Install and try out Spreads on a Mac Mini]]
* [[20 March 2017: Continue Spreads Install on a Mac Mini]]
* [[30 May 2017: Test a copy of PDFScanner]], a macOS scanner/image-to-PDF package costing about US$16
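
The Intent section mentions using OCR confidence data to point the operator at regions that may need correcting. The following is only a rough sketch of that idea, not something this project has implemented. It assumes the OCR engine is Tesseract and that its TSV output is available (the column names <code>left</code>, <code>top</code>, <code>width</code>, <code>height</code>, <code>conf</code> and <code>text</code> come from that format). The function name <code>low_confidence_words</code>, the threshold of 60, and the sample data are hypothetical, chosen purely for illustration.

```python
import csv
import io


def low_confidence_words(tsv_text, threshold=60.0):
    """Return (text, conf, (left, top, width, height)) for each word whose
    confidence is below `threshold`. The threshold value is an assumption."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Rows for pages, blocks and lines carry conf == -1 and no text.
        if conf < 0 or not text:
            continue
        if conf < threshold:
            box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
            flagged.append((text, conf, box))
    return flagged


# Hypothetical sample in the shape of Tesseract TSV output (not real scan data).
SAMPLE_TSV = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext",
    "1\t1\t0\t0\t0\t0\t0\t0\t2000\t3000\t-1\t",
    "5\t1\t1\t1\t1\t1\t100\t120\t80\t30\t96\tBook",
    "5\t1\t1\t1\t1\t2\t190\t120\t110\t30\t41\tScanmer",
    "5\t1\t1\t1\t1\t3\t310\t120\t90\t30\t88\tNotes",
])

if __name__ == "__main__":
    for word, conf, box in low_confidence_words(SAMPLE_TSV):
        print(f"review {word!r} (conf {conf:.0f}) at {box}")
```

The returned bounding boxes could be drawn over the page photo so the operator only has to inspect the flagged regions; a suitable threshold would need to be tuned against real scans.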