The lords of search over at Google recently announced an interesting new feature for PDFs created from scanned pages.
Searchable PDF files are nothing new – and neither are searchable PDF files produced from scanned pages. Simply run OCR and voilà – your scanned PDFs are now searchable.
But let’s say you didn’t OCR your files. Maybe you didn’t want to take the time, maybe it’s impractical, or maybe you didn’t even WANT your files to be searchable (my legal friends should take note here).
Post those PDFs on a publicly accessible site and now Google will OCR and index them for you, no extra charge.
I’m sure there are some limits here. Google isn’t saying, but I’m guessing it won’t download a 500 MB PDF just to discover that there’s no text to index.
I’m also unsure as to the quality of the OCR. I’d have to believe that it’s super-quick and therefore less than super-accurate, but then again, Google has computing resources that defy my paltry imagination, so no bets there either.
I’ll be running some tests before long, but I’m curious to know what you think.
Do you WANT your scanned PDFs indexed by Google? Are you tempted to post oceans of scanned content online? Or is this a big yawn, something you thought Google was doing all along, so what’s the big deal?
Originally posted on Duff Johnson’s PDF Perspective blog for acrobatusers.com.
By Duff Johnson