Tuesday, December 22, 2015

How do you extract textual information in an image file?

The Text Extract app from the Microsoft Store can be used to extract textual information from an image file.
It is a useful app for recognizing characters in an image (optical character recognition, or OCR). It is limited to the Roman (Latin) alphabet and is therefore useful for English, French and other languages written in that alphabet.

For languages such as Japanese, Hindi and Chinese, which do not use the Latin alphabet, it is not useful. It works best for English and is less reliable for French, for example, because it does not faithfully retrieve letters with accents (e.g. ó, ò, ñ).
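If you want a quick way to tell whether a piece of text falls inside the Latin-alphabet range described above, the following is a minimal sketch in Python. It is not part of the Text Extract app; the function name is_latin_text is made up for illustration, and the check is only a heuristic based on Unicode character names.

```python
import unicodedata


def is_latin_text(text: str) -> bool:
    """Heuristic: True if every alphabetic character is a Latin-script letter.

    Hypothetical helper for illustration; it is not related to the app itself.
    """
    # Keep only alphabetic characters; digits, spaces, punctuation and
    # combining marks are ignored.
    letters = [ch for ch in text if ch.isalpha()]
    # Unicode names of Latin letters start with "LATIN", accented ones
    # included (e.g. "LATIN SMALL LETTER E WITH ACUTE").
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


print(is_latin_text("Une décision justifiée"))  # French, Latin alphabet
print(is_latin_text("日本語"))                   # Japanese
```

Accented letters still count as Latin here, so a True result only says the alphabet is supported, not that every accent will be extracted faithfully.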

You can download this app from here:
https://www.microsoft.com/en-us/store/apps/text-extract/9wzdncrdt09w



It is very easy to use. Double-click the app in the All Apps view to launch it, as shown.
Then use Open Image to locate the image file in your folders, and the extraction should work immediately, as shown. The image is in the top pane and the extracted text (which you can copy and paste into another text file) is in the bottom pane. In this file only the link is in the Latin alphabet, so only the link is extracted.



This one is in French, and you can see that all of the text in the image file is extracted to the bottom pane.

 

This is the original page on the Internet. 


This is the extracted text, copied over below:

  Dollar : le retour bienvenu ä
I 'orthodoxie
Editorial. Pour la premiere fois depuis la crise de
2008, la Réserve fédérale améncaine a relevé,
mercredi 16 décembre, ses taux d'intérät d'un quart
de point. Une décision justifiée par les circonstances
macroeconom qu es_

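There are a few differences. The accent grave in "à" and the circumflex in "intérêt" both come out as "ä", the accent grave in "première" is absent, and some letters are misread altogether ("américaine" becomes "améncaine").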
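Differences of this kind can also be checked in code rather than by eye. The following is a rough sketch in Python, using only the standard library. The "expected" words are the standard French spellings, which I am assuming match the original page; the "extracted" words are taken from the output above. The function names strip_accents and classify are made up for illustration.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Remove diacritical marks, e.g. 'première' -> 'premiere'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify(expected: str, extracted: str) -> str:
    """Describe how an extracted word differs from the expected spelling."""
    # Normalize both sides so composed and decomposed forms compare equal.
    expected = unicodedata.normalize("NFC", expected)
    extracted = unicodedata.normalize("NFC", extracted)
    if expected == extracted:
        return "exact"
    if strip_accents(expected) == extracted:
        return "accent dropped"
    if strip_accents(expected) == strip_accents(extracted):
        return "accent changed"
    return "letters changed"


# (expected spelling, text produced by the app); the expected side is an assumption.
pairs = [
    ("à", "ä"),
    ("première", "premiere"),
    ("intérêt", "intérät"),
    ("américaine", "améncaine"),
    ("fédérale", "fédérale"),
]

for expected, extracted in pairs:
    print(f"{expected} -> {extracted}: {classify(expected, extracted)}")
```

Note that "intérät" counts as a changed letter rather than a changed accent, because the base letter went from e to a as well.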

Now review this image in the top pane and the extracted text in the bottom pane. You can see that the extraction is very good when the language is English.

I also tested a few other languages besides English.
