Thursday, June 24, 2010

Google Docs Now Does OCR for Images & PDFs

OCR (optical character recognition) fans who are frustrated with the current crop of online services may be pleased to learn that Google Docs will now extract text from images and PDFs quickly and free of charge. According to the blog Google Operating System, Google quietly pushed the new feature live after several months of experimentation and development. But will it replace commercial software or online solutions?
When uploading files to their account, Docs users will now see an option to run an OCR scan, which extracts the characters and places them in a new text document. As far as accuracy goes, PDFs fare much better than images, especially those with basic black text on a white background.
I uploaded a picture of my business card, and Google Docs had trouble recognizing the largest and clearest text on the card but, surprisingly, did better with the smaller text. A test with a PDF document produced nearly perfect recognition, but Google Docs strips out almost all of the formatting and returns the text as an unbroken stream of letters and spaces. Other examples from Google Operating System produced decent results, but they were far from perfect or truly useful.
Additionally, when scanning a PDF, Google Docs does not keep a copy of the original file, so converting a PDF to text and also saving the original requires two separate uploads. The feature is great for casual OCR users who want to quickly grab text from PDFs and from some images, such as business cards. Those who rely heavily on OCR will likely be disappointed with its capabilities and may get better results from commercial solutions.
