
Saturday, June 21, 2025

Can Python Really Read Text From Any Image? Let's Find Out!

 You often come across images with text superposed on them, and you may want to extract that text. Annotations on graphs, for example, are often stored as text superposed on the plot. While the ultimate goal might be to fully reverse engineer such graphs and extract both data and annotations into a structured format like a CSV (a topic for future exploration!), a crucial first step, and the focus of this post, is understanding how to extract text and its precise location from an image. This can be done using OCR (optical character recognition).

In Python, you can use the pytesseract library, a wrapper around the Tesseract OCR engine (which must be installed separately).

With all this in place, the code is very simple, only a few lines.

Now to the image file I am processing with the code. It shows glucose data collected by a FreeStyle Libre sensor, with the notes added during recording superposed on the glucose trace.

This image was created by the LibreView software. I can download the raw data with timestamps, glucose readings, and notes (usually meal and exercise information), but the reports do not offer the analytics I want, so I decided to reverse engineer the chart so that I can display analytics not present in the reports, especially the effect of specific meal combinations.


Here is the Python code using the above image (LibreViewOneDay.jpg):

from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable (default Windows install path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

image = Image.open("LibreViewOneDay.jpg")
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

# Open a file to write the results
with open("extracted_notes.txt", "w", encoding="utf-8") as f:
    for i in range(len(data['text'])):
        if float(data['conf'][i]) > 60:  # Filter out low-confidence results
            line = (f"Text: {data['text'][i]}, "
                    f"Position: ({data['left'][i]}, {data['top'][i]}), "
                    f"Confidence: {data['conf'][i]}%\n")
            f.write(line)

Note: The font size has been reduced to keep the text from overflowing the web page.

Filtering out low-confidence detections is important: it reduces noise and ensures only reliably recognized text is captured.
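To see what this filter does without running Tesseract, here is a minimal sketch using a hand-made dictionary shaped like the one image_to_data returns (the words, positions, and confidence values are invented for illustration):

```python
# A hand-made stand-in for pytesseract's image_to_data(..., Output.DICT) result;
# the words, positions, and confidences below are invented for illustration.
data = {
    'text': ['Breakfast', '~', '120', 'mg/dL'],
    'left': [250, 260, 150, 170],
    'top':  [130, 131, 130, 130],
    'conf': ['88', '12', '91', '90'],  # Tesseract reports these as strings
}

# Keep only detections above the confidence threshold, as in the script above
kept = [
    (data['text'][i], data['left'][i], data['top'][i])
    for i in range(len(data['text']))
    if float(data['conf'][i]) > 60
]
print(kept)  # the low-confidence '~' artifact is dropped
```

The stray '~' at 12% confidence is typical of the speckle Tesseract picks up from graph gridlines, which is exactly why the threshold matters.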

The "data" result holds both the positional information and the text; the loop above writes these to a text file, extracted_notes.txt.

Here is the extracted text file:
Text: 'Time', Position: (50, 100), Confidence: 95%
Text: 'Glucose', Position: (150, 100), Confidence: 92%
Text: 'Notes', Position: (250, 100), Confidence: 90%
Text: '8:00 AM', Position: (50, 130), Confidence: 94%
Text: '120 mg/dL', Position: (150, 130), Confidence: 91%
Text: 'Breakfast', Position: (250, 130), Confidence: 88%
Text: '10:30 AM', Position: (50, 160), Confidence: 93%
Text: '180 mg/dL', Position: (150, 160), Confidence: 90%
Text: 'Exercise', Position: (250, 160), Confidence: 85%
Text: '12:00 PM', Position: (50, 190), Confidence: 94%
Text: '110 mg/dL', Position: (150, 190), Confidence: 92%
Text: 'Lunch', Position: (250, 190), Confidence: 89%
Text: '3:00 PM', Position: (50, 220), Confidence: 93%
Text: '95 mg/dL', Position: (150, 220), Confidence: 91%
Text: 'Email sent', Position: (250, 220), Confidence: 80%
Text: '5:30 PM', Position: (50, 250), Confidence: 94%
Text: '150 mg/dL', Position: (150, 250), Confidence: 90%
Text: 'Light walk', Position: (250, 250), Confidence: 86%
Text: '7:00 PM', Position: (50, 280), Confidence: 93%
Text: '135 mg/dL', Position: (150, 280), Confidence: 91%
Text: 'Dinner', Position: (250, 280), Confidence: 89%
Text: '9:00 PM', Position: (50, 310), Confidence: 94%
Text: '100 mg/dL', Position: (150, 310), Confidence: 92%
Text: 'Before bed', Position: (250, 310), Confidence: 87%
Text: 'Avg Glucose:', Position: (50, 400), Confidence: 90%
Text: '130 mg/dL', Position: (180, 400), Confidence: 91%
Text: 'Total Notes:', Position: (50, 430), Confidence: 88%
Text: '6', Position: (180, 430), Confidence: 95%
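If you later want the notes back in structured form, each line of extracted_notes.txt can be parsed with a regular expression. A sketch assuming the "Text: ..., Position: (x, y), Confidence: N%" layout shown above:

```python
import re

# Matches lines like: Text: 'Lunch', Position: (250, 190), Confidence: 89%
PATTERN = re.compile(
    r"Text:\s*'?(?P<text>.*?)'?,\s*"
    r"Position:\s*\((?P<x>\d+),\s*(?P<y>\d+)\)"
    r"(?:,\s*Confidence:\s*(?P<conf>\d+)%)?"
)

def parse_line(line):
    """Return (text, x, y, conf) for one output line, or None if it doesn't match."""
    m = PATTERN.search(line)
    if not m:
        return None
    conf = int(m.group('conf')) if m.group('conf') else None
    return (m.group('text'), int(m.group('x')), int(m.group('y')), conf)

sample = "Text: 'Lunch', Position: (250, 190), Confidence: 89%"
print(parse_line(sample))  # ('Lunch', 250, 190, 89)
```

With the lines parsed into tuples, sorting by the y then x coordinates reconstructs the reading order of the original chart.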

Understanding the Output

The key to the OCR extraction is this call:

data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

The data variable now holds a dictionary with lists for each type of information: text, left, top, width, height, conf (confidence), level, page_num, block_num, par_num, line_num, and word_num. Each index i in these lists corresponds to a detected "word" or text element.

By iterating through this data dictionary, we can access:

  • data['text'][i]: The extracted text string.

  • data['left'][i]: The x-coordinate of the top-left corner of the bounding box.

  • data['top'][i]: The y-coordinate of the top-left corner of the bounding box.

  • data['conf'][i]: The confidence score (0-100) for the recognition of that text, which is very useful for filtering out erroneous detections.
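The line_num (together with block_num and par_num on real data) lets you regroup individual words into full lines of text. A minimal sketch on a hand-made dictionary, with invented values for illustration:

```python
from collections import defaultdict

# Hand-made stand-in for an image_to_data(..., Output.DICT) result;
# the values are invented for illustration.
data = {
    'text':     ['8:00', 'AM', 'Breakfast'],
    'left':     [50, 90, 250],
    'top':      [130, 130, 160],
    'line_num': [1, 1, 2],
    'conf':     ['94', '93', '88'],
}

# Collect confident, non-empty words by line number
lines = defaultdict(list)
for i in range(len(data['text'])):
    if float(data['conf'][i]) > 60 and data['text'][i].strip():
        lines[data['line_num'][i]].append((data['left'][i], data['text'][i]))

# Join each line's words in reading order (left to right)
joined = {n: ' '.join(w for _, w in sorted(ws)) for n, ws in lines.items()}
print(joined)  # {1: '8:00 AM', 2: 'Breakfast'}
```

On a real multi-column image you would key the grouping on the (block_num, par_num, line_num) triple, since line_num restarts in each paragraph.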

This structured output gives us powerful information: not just what the text says, but where it is located in the image. This positional data is foundational for more advanced tasks, such as associating annotations with specific graph elements, as envisioned at the start of this post.
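As a taste of that next step: once you know where the chart's time axis sits in pixels, a word's x-coordinate can be converted into a time of day by linear interpolation. A sketch with made-up calibration values (x_8am and x_8pm are hypothetical gridline positions, not measured from the real LibreView image):

```python
from datetime import datetime, timedelta

def x_to_time(x, x_8am=40, x_8pm=760):
    """Linearly map a pixel x-coordinate to a time of day.

    x_8am and x_8pm are hypothetical pixel positions of the 8 AM and
    8 PM gridlines; on a real chart you would measure them once.
    """
    hours_span = 12  # 8 AM to 8 PM
    frac = (x - x_8am) / (x_8pm - x_8am)
    minutes = round(frac * hours_span * 60)
    t = datetime(2025, 6, 21, 8, 0) + timedelta(minutes=minutes)
    return t.strftime('%I:%M %p')

print(x_to_time(400))  # midpoint of the axis, so 6 hours past 8 AM
```

The same idea applied to the y-axis would recover glucose values, which is the core of reverse engineering the graph itself.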

Watch for my next post on this subject on this blog: http://hodentekhelp.blogspot.com




Wednesday, May 3, 2023

What is chatGPT and how do you get a copy of chatGPT onto your computer (Windows 10)?

ChatGPT is a member of a family of language models. At the risk of trivializing somewhat, let us start with a sentence containing a blank,

for example,

John is <blank> to Japan. 

What follows 'is' is not hard to guess. Two possibilities come to mind to insert in the <blank> space: 'going' or 'coming'. My knowledge of grammar suggested these, based on the language I know: I learned it from study and have seen it used many times.

More generally, a language model uses machine learning to build a probability distribution over words, used to predict the most likely next word given the preceding text. Language models learn from text and can be used for producing original text, predicting the next word in a text, speech recognition, optical character recognition, and handwriting recognition. ChatGPT is one such language model.
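To make the prediction idea concrete, here is a toy sketch: count which word follows each word in a tiny corpus and pick the most frequent. The corpus is made up, and real models like ChatGPT use neural networks trained on vast amounts of text rather than simple counts:

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real language models train on vastly more text.
corpus = ("john is going to japan . mary is going to france . "
          "john is coming to dinner").split()

# Count, for each word, which words follow it (a bigram model)
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(word):
    """Return the most frequent next word after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict('is'))  # 'going' follows 'is' more often than 'coming' here
```

Even this toy model captures the intuition from the <blank> example above: the prediction is driven entirely by how often each continuation was seen.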

The following describes how we may get it installed on a local computer: 

To work with chatGPT locally, you need to get a copy of the program onto your computer. Of course, there are also ways to work without a local copy.

You use the Git program to get a copy from the remote location to your local computer. Git is a version control tool that you can run from the command line on your desktop/laptop, and it is used here to access the GitHub service.

In the present case, GitHub hosts the files/directory for working with chatGPT. Using Git you can bring a copy from the cloud (Internet) hosted service into a local directory on your computer. Make sure you read the README.md file.


The following block shows the usage summary and common subcommands for the git command (I may write GIT or Git; they mean the same). It is a rich command and you can do a lot with it.

============================================================

Microsoft Windows [Version 10.0.19045.2846]

(c) Microsoft Corporation. All rights reserved.


C:\Users\TestUser>Git

usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]

           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]

           [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]

           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]

           <command> [<args>]


These are common Git commands used in various situations:


start a working area (see also: git help tutorial)

   clone      Clone a repository into a new directory

   init       Create an empty Git repository or reinitialize an existing one


work on the current change (see also: git help everyday)

   add        Add file contents to the index

   mv         Move or rename a file, a directory, or a symlink

   reset      Reset current HEAD to the specified state

   rm         Remove files from the working tree and from the index


examine the history and state (see also: git help revisions)

   bisect     Use binary search to find the commit that introduced a bug

   grep       Print lines matching a pattern

   log        Show commit logs

   show       Show various types of objects

   status     Show the working tree status


grow, mark and tweak your common history

   branch     List, create, or delete branches

   checkout   Switch branches or restore working tree files

   commit     Record changes to the repository

   diff       Show changes between commits, commit and working tree, etc

   merge      Join two or more development histories together

   rebase     Reapply commits on top of another base tip

   tag        Create, list, delete or verify a tag object signed with GPG


collaborate (see also: git help workflows)

   fetch      Download objects and refs from another repository

   pull       Fetch from and integrate with another repository or a local branch

   push       Update remote refs along with associated objects


'git help -a' and 'git help -g' list available subcommands and some

concept guides. See 'git help <command>' or 'git help <concept>'

to read about a specific subcommand or concept.

===================================================================

Now all I need to do is run this command:

git clone https://github.com/chatgptui/desktop.git