Digital Archivists: Difference between revisions
Revision as of 00:35, 28 July 2015
INTRODUCTION
The Noisebridge Digital Archivists group is for anyone interested in a range of topics including book scanning, digitizing video/audio, open access, creative commons, and more. Lately we have focused on bound book digitization.
We usually meet on Wednesdays at 7pm. Join the mailing list and send us an email to see if there's a meeting this week.
The Noisebridge book scanner was recently demonstrated at the 2015 Maker Faire in San Mateo.
MAILING LIST
Many weekly updates are made exclusively on our mailing list. Please join to keep in touch and ask questions:
Join the mailing list here: digitalarchivists (http://www.noisebridge.net/mailman/listinfo/digitalarchivists)
A summary of topics in our mailing list will be posted soon.
HARDWARE
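Our first project was to build a book scanner at Noisebridge based on the open-source DIY Book Scanner design (http://diybookscanner.myshopify.com/products/diy-book-scanner-kit). Check out the schematic to see all the components. You can build your own from simple cut plywood (http://www.diybookscanner.org/DIY%20Book%20Scanner%20920.zip) using inspiration and guidance from this open-source design (http://www.diybookscanner.org/archivist/), or order pre-fabricated kits (http://store.diybookscanner.org/), though they are in demand and often sold out.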
MECHANICAL OPERATION
Our style of book scanner is intended for bound books that can be sufficiently flattened by pushing the book up against the platen glass. Although the process requires turning each page by hand, one page at a time, the book is not damaged by the process. Affordable automated book scanners are being developed by Dany Qumsiyeh (http://linearbookscanner.org/). Note that leaving a rare book unattended on an automated scanner carries some risk, such as ripped pages.
Please see this animation to illustrate the manual process:
Each time the page is turned, the operator hits a trigger to capture images of the even and odd pages simultaneously from SLR cameras. Because it captures two pages at once, our model is faster than a flatbed scanner: a skilled operator can scan about 600 pages an hour. Also, flatbed scanners cannot properly image most bound books near the margin of the binding. This scanner recovers the region near the binding with great success.
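To make the throughput figure concrete, here is a minimal Python sketch of the arithmetic behind it, plus one way the two image streams might be merged into reading order. The function names are hypothetical, and the assumption that the first camera's image precedes the second's in each pair is ours, not something the group's scripts are known to do.

```python
def seconds_per_page_turn(pages_per_hour, pages_per_capture=2):
    """Time available for each page turn at a given scanning rate."""
    captures_per_hour = pages_per_hour / pages_per_capture
    return 3600.0 / captures_per_hour


def interleave_pages(first_camera_images, second_camera_images):
    """Merge the two cameras' image lists into a single reading-order list.

    Assumes (hypothetically) that for every capture the first camera's
    image comes before the second camera's image in reading order.
    """
    if len(first_camera_images) != len(second_camera_images):
        raise ValueError("each capture should produce one image per camera")
    ordered = []
    for first, second in zip(first_camera_images, second_camera_images):
        ordered.append(first)
        ordered.append(second)
    return ordered
```

At about 600 pages an hour with two pages per capture, the operator has roughly 12 seconds for each page turn and trigger press.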
SOFTWARE
The device captures from both cameras simultaneously using USB drivers provided by gphoto2, wrapped in a Python script (https://github.com/danyq/diybookscanner). See the gphoto2 site (http://gphoto.org/) and its list of supported cameras (http://gphoto.org/proj/libgphoto2/support.php). On a Debian-based Linux system you can install it with: sudo apt-get install gphoto2
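As a rough illustration of driving two cameras through the gphoto2 command-line tool, the sketch below finds the USB ports reported by gphoto2 --auto-detect and triggers a capture on each port from its own thread. It is not the linked script: the helper names, the output file naming, and the use of threads to approximate a simultaneous capture are all assumptions made for this example.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# gphoto2 --auto-detect lists each camera with a port such as "usb:001,005".
PORT_PATTERN = re.compile(r"usb:\d+,\d+")


def parse_ports(auto_detect_output):
    """Extract the USB port identifiers from gphoto2 --auto-detect output."""
    return PORT_PATTERN.findall(auto_detect_output)


def detect_ports():
    """Ask gphoto2 which cameras are attached (requires gphoto2 installed)."""
    result = subprocess.run(
        ["gphoto2", "--auto-detect"],
        capture_output=True, text=True, check=True,
    )
    return parse_ports(result.stdout)


def capture_command(port, filename):
    """Build the gphoto2 command that captures one image on one camera."""
    return [
        "gphoto2",
        "--port", port,
        "--capture-image-and-download",
        "--filename", filename,
    ]


def run_capture(command):
    """Run a single capture command, raising if gphoto2 reports an error."""
    return subprocess.run(command, check=True)


def capture_pair(ports, index):
    """Trigger both cameras at nearly the same moment, one thread per camera.

    The file naming scheme (0001_a.jpg, 0001_b.jpg) is hypothetical.
    """
    names = ["%04d_%s.jpg" % (index, side) for side in ("a", "b")]
    commands = [capture_command(p, n) for p, n in zip(ports, names)]
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(run_capture, commands))
    return names
```

With two supported cameras attached, capture_pair(detect_ports(), 1) would be called once per page turn. The threads only approximate a simultaneous capture; a hardware trigger wired to both cameras would be tighter.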
Currently we are writing simple scripts that perform Optical Character Recognition (OCR) and produce searchable full-text PDF documents; see the GitHub code repository (https://github.com/andrewdefries/TesseractOCR). Check it out and fork our repos.
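For readers who want to try the OCR step, the following sketch shows one way to turn a folder of page images into searchable PDFs with the tesseract command-line tool. It is an illustration under stated assumptions, not the contents of the repository above: the function names and the .png file pattern are hypothetical, and the pdf output option requires a Tesseract version that supports PDF output.

```python
import subprocess
from pathlib import Path


def ocr_command(image_path, language="eng"):
    """Build a tesseract command that writes a searchable PDF.

    tesseract takes an output base name without extension and, with the
    "pdf" config, writes <base>.pdf next to the input image.
    """
    image = Path(image_path)
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


def ocr_directory(directory, pattern="*.png", language="eng"):
    """Run OCR on every matching image in a directory, in sorted order."""
    pdf_paths = []
    for image in sorted(Path(directory).glob(pattern)):
        subprocess.run(ocr_command(image, language), check=True)
        pdf_paths.append(image.with_suffix(".pdf"))
    return pdf_paths
```

Calling ocr_directory("scans") would produce one PDF per page image; merging those into a single book PDF is left to a separate tool.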
CURRENT BOOK SCANNING PROJECTS
Digitization of the Mycological Society of San Francisco collection
- Danny Newman, librarian for the Mycological Society of San Francisco (http://www.mssf.org/), will be digitizing the society's journal and book collection.
The Pesticide Files
- Andrew Defries PhD has been assembling a rare, custom digital corpus of bound books on pesticides used in agriculture. Please see the GitHub front page for a detailed description: The Pesticide Files (https://github.com/andrewdefries/ThePesticideFiles). This corpus will not be made available for download or public use, though access may be granted in the future via collaborative research agreements.
Related links
diy book scanner http://diybookscanner.org http://vimeo.com/29184137
linear book scanner project http://linearbookscanner.org
the internet archive http://archive.org/about/
prelinger archives http://archive.org/details/prelinger
new alexandria archive http://www.newalexandria.org/archive/
using archival information in interesting ways http://maptcha.org/
a film about nitrate film decomposition http://www.youtube.com/watch?v=r-FJyJjH6IE
The Digital Public Library of America (DPLA) http://dp.la/
For details about the book scanner, see: Bookscanner
Meeting notes
2015-07-22 - replaced camera firmware, installed spreads on beaglebone
2015-06-24 - installed chdk on new cameras, prepared for library events
2015-06-17 - moved new scanner into darkroom, installed and wired up led lights
2015-05-13 - discussed maker faire demo
2015-04-29 - assembled body of new scanner
2015-04-08 - discussed new scanner
2015-04-01 - discussed mycological society library and scanner location
2015-03-18 - painted reetz 2.0 scanner parts
2015-02-25 - training session for scanner, looked at linear book scanner prototype
2015-02-11 - scanner demo, discussion
2015-01-25 - discussed Andrew Defries' scanning booth and post-processing station, experimented with Vinux
2014-09-24 - reassembled and demoed the scanner to visitors, some linear book scanner updates, discussed funding and directions for the group
2014-08-28 - social meeting at Open Drinks
2014-07-17 - discussed scanning stand design at Muddy Waters
2014-06-19 - short meeting and 5mof
2014-06-12 - discussed plans for a portable phone-based scanner, assembled cardboard prototype of prism scanner
2014-06-05 - discussed improvements to the scanner and ocr pipeline
2014-05-29 - main topics were portable scanners for the blind and OCR comparison algorithms. full notes: Digital Archivists 2014-05-29
2014-05-22 - discussed lessons learned from DIY Scanner 1.0 and a range of new initiatives. full notes: Digital Archivists 2014-05-22
2014-01-09 - scanned books, investigated glare and alignment issues
2013-12-19 - bug fixes in the code, reviewed linear scanners from Michigan, improved documentation, discussed larger book scanning network
2013-12-12 - replaced LED shields with black cloth, more scanning
2013-12-05 - scanned books
2013-11-21 - scanned a book, wrote tesseract ocr script, looked at old computers, 5mof presentation
2013-11-14 - improved scanning gui, added LED shields
2013-11-07 - replaced bungees, mounted LEDs, tried to run spreads
2013-10-31 - reassembled scanner, looked at 'spreads' software, screen readers
2013-10-24 - disassembled and painted scanner parts
2013-10-17 - investigated glare and LED positioning, plan to paint black, 5mof presentation
2013-10-10 - mounted cameras, built trigger mechanism, wrote capture script
2013-10-03 - picked up scanner and equipment from internet archive
2013-07-27 - internet archive borrows the book scanner
2013-07-18 - 5mof presentation on book scanning
2013-07-08 - discussed scanner location, security, transportation, reimbursement, promotion. full notes: Digital Archivists 2013-07-08
2013-05-23 - scavenged and cut glass from flatbed scanners, plan to attend internet archive friday lunch
2013-04-30 - assembled scanner kit. needs cameras, glass, trigger mechanism, software
2013-04-21 - looked at reetz scanner and prism scanner, discussed content and licensing. will order reetz kit. full notes: Digital Archivists 2013-04-21