Testing Kris Shaffer's script for extracting Hypothes.is annotations to Jekyll
The script can be found on Kris’s GitHub. Here, I’m taking his output and banging it into my open notebook templates. I’ll need to fix some layout stuff, I’m sure.
Here goes!
Experiment - Determining Bad OCR via Automated Spellcheck
all editions
6600 files were downloaded; 333 files appear to be these missing editions with the placeholder text. I have not yet manually verified all of this… which is partly the point, right? (shawn.graham)
Experiment - Determining Bad OCR via Automated Spellcheck
each text file
6267 print editions, from 1893 to 2010 (shawn.graham)
Experiment - Determining Bad OCR via Automated Spellcheck
.75 range
that is, from .72 to .9. They all have the same placeholder text, but the OCR quality produces some consistent errors, which is interesting. (shawn.graham)
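For my own reference, here is a rough sketch of the idea behind those three annotations: score each text file by the share of its words that a spellchecker recognizes, then flag the files whose score lands in the .72 to .9 band as likely placeholder pages. This is not the code from the experiment. The tiny word list, the function names, and the sample texts are all made up for illustration, and I'm assuming the score is simply "recognized words divided by total words".

```python
import re

# Hypothetical stand-in for a real dictionary (an assumption for illustration).
KNOWN_WORDS = {"the", "this", "edition", "is", "not", "available", "page"}


def spellcheck_score(text, known_words=KNOWN_WORDS):
    """Return the fraction of word tokens found in known_words (0.0 if no words)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in known_words for w in words) / len(words)


def flag_placeholders(texts, low=0.72, high=0.9):
    """Return the sorted names of texts whose score falls within [low, high]."""
    return sorted(
        name
        for name, text in texts.items()
        if low <= spellcheck_score(text) <= high
    )


# Made-up sample texts, not the real placeholder wording.
samples = {
    "placeholder.txt": "this edition is not available tbe",  # one OCR slip
    "clean.txt": "the edition is available",
    "garbled.txt": "xq zz the",
}
flagged = flag_placeholders(samples)
```

With these samples only `placeholder.txt` is flagged: five of its six words are recognized, so it scores about .83. The counts in the annotations also line up, since 6600 downloaded files minus 333 flagged placeholders leaves 6267.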
Created: 31 May 2016 | Modified: 31 May 2016
- Tags:
- experiment
- Annotations