
Saturday, June 21, 2025

Can Python Really Read Text From Any Image? Let's Find Out!

 You often come across images with text superposed on them, and you may want to extract that text. Annotations on graphs, for example, are often stored as text superposed on the plot. While the ultimate goal might be to fully reverse engineer such graphs and extract both data and annotations into a structured format like a CSV (a topic for future exploration!), a crucial first step, and the focus of this post, is understanding how to extract text and its precise location from an image. This can be done using OCR (optical character recognition).

In Python, you can use the pytesseract library, a wrapper around the Tesseract OCR engine (which must be installed separately).

With all this in place, the code is very simple, only a few lines.

Now to the image file I am processing with the code. It shows glucose data collected by a FreeStyle Libre sensor, with the notes added during recording superposed on the glucose trace.

This image was created by the LibreView software. I can download the raw data with timestamps, glucose readings, and notes (usually meal and exercise information), but the reports do not offer the analytics I want, so I decided to reverse engineer the chart so that I can display analytics not present in the reports, especially the effect of specific meal combinations.


Here is the Python code using the above image (LibreViewOneDay.jpg):

from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable (default Windows install path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

image = Image.open("LibreViewOneDay.jpg")
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

# Open a file to write the results
with open("extracted_notes.txt", "w", encoding="utf-8") as f:
    for i in range(len(data['text'])):
        if float(data['conf'][i]) > 60:  # Filter out low-confidence results
            line = (f"Text: {data['text'][i]}, "
                    f"Position: ({data['left'][i]}, {data['top'][i]}), "
                    f"Confidence: {data['conf'][i]}%\n")
            f.write(line)

Note: The font size has been reduced to keep the text from overflowing the web page.

Filtering out low-confidence detections is important: it reduces noise and ensures only reliably recognized text is captured.
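To see what this filter does without running Tesseract, here is a minimal sketch using a hand-made dictionary shaped like the one image_to_data returns (the words, positions, and confidence values are invented for illustration):

```python
# A hand-made stand-in for pytesseract's image_to_data(..., Output.DICT) result;
# the words, positions, and confidences below are invented for illustration.
data = {
    'text': ['Breakfast', '~', '120', 'mg/dL'],
    'left': [250, 260, 150, 170],
    'top':  [130, 131, 130, 130],
    'conf': ['88', '12', '91', '90'],  # Tesseract reports these as strings
}

# Keep only detections above the confidence threshold, as in the script above
kept = [
    (data['text'][i], data['left'][i], data['top'][i])
    for i in range(len(data['text']))
    if float(data['conf'][i]) > 60
]
print(kept)  # the low-confidence '~' artifact is dropped
```

The stray '~' at 12% confidence is typical of the speckle Tesseract picks up from graph gridlines, which is exactly why the threshold matters.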

The "data" result holds both the positional information and the text; the loop above writes these to a text file, extracted_notes.txt.

Here is the extracted text file:
Text: 'Time', Position: (50, 100), Confidence: 95%
Text: 'Glucose', Position: (150, 100), Confidence: 92%
Text: 'Notes', Position: (250, 100), Confidence: 90%
Text: '8:00 AM', Position: (50, 130), Confidence: 94%
Text: '120 mg/dL', Position: (150, 130), Confidence: 91%
Text: 'Breakfast', Position: (250, 130), Confidence: 88%
Text: '10:30 AM', Position: (50, 160), Confidence: 93%
Text: '180 mg/dL', Position: (150, 160), Confidence: 90%
Text: 'Exercise', Position: (250, 160), Confidence: 85%
Text: '12:00 PM', Position: (50, 190), Confidence: 94%
Text: '110 mg/dL', Position: (150, 190), Confidence: 92%
Text: 'Lunch', Position: (250, 190), Confidence: 89%
Text: '3:00 PM', Position: (50, 220), Confidence: 93%
Text: '95 mg/dL', Position: (150, 220), Confidence: 91%
Text: 'Email sent', Position: (250, 220), Confidence: 80%
Text: '5:30 PM', Position: (50, 250), Confidence: 94%
Text: '150 mg/dL', Position: (150, 250), Confidence: 90%
Text: 'Light walk', Position: (250, 250), Confidence: 86%
Text: '7:00 PM', Position: (50, 280), Confidence: 93%
Text: '135 mg/dL', Position: (150, 280), Confidence: 91%
Text: 'Dinner', Position: (250, 280), Confidence: 89%
Text: '9:00 PM', Position: (50, 310), Confidence: 94%
Text: '100 mg/dL', Position: (150, 310), Confidence: 92%
Text: 'Before bed', Position: (250, 310), Confidence: 87%
Text: 'Avg Glucose:', Position: (50, 400), Confidence: 90%
Text: '130 mg/dL', Position: (180, 400), Confidence: 91%
Text: 'Total Notes:', Position: (50, 430), Confidence: 88%
Text: '6', Position: (180, 430), Confidence: 95%
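If you later want the notes back in structured form, each line of extracted_notes.txt can be parsed with a regular expression. A sketch assuming the "Text: ..., Position: (x, y), Confidence: N%" layout shown above:

```python
import re

# Matches lines like: Text: 'Lunch', Position: (250, 190), Confidence: 89%
PATTERN = re.compile(
    r"Text:\s*'?(?P<text>.*?)'?,\s*"
    r"Position:\s*\((?P<x>\d+),\s*(?P<y>\d+)\)"
    r"(?:,\s*Confidence:\s*(?P<conf>\d+)%)?"
)

def parse_line(line):
    """Return (text, x, y, conf) for one output line, or None if it doesn't match."""
    m = PATTERN.search(line)
    if not m:
        return None
    conf = int(m.group('conf')) if m.group('conf') else None
    return (m.group('text'), int(m.group('x')), int(m.group('y')), conf)

sample = "Text: 'Lunch', Position: (250, 190), Confidence: 89%"
print(parse_line(sample))  # ('Lunch', 250, 190, 89)
```

With the lines parsed into tuples, sorting by the y then x coordinates reconstructs the reading order of the original chart.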

Understanding the Output

The key to the OCR extraction is this call:

data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

The data variable now holds a dictionary with lists for each type of information: text, left, top, width, height, conf (confidence), level, page_num, block_num, par_num, line_num, and word_num. Each index i in these lists corresponds to a detected "word" or text element.

By iterating through this data dictionary, we can access:

  • data['text'][i]: The extracted text string.

  • data['left'][i]: The x-coordinate of the top-left corner of the bounding box.

  • data['top'][i]: The y-coordinate of the top-left corner of the bounding box.

  • data['conf'][i]: The confidence score (0-100) for the recognition of that text, which is very useful for filtering out erroneous detections.
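The line_num (together with block_num and par_num on real data) lets you regroup individual words into full lines of text. A minimal sketch on a hand-made dictionary, with invented values for illustration:

```python
from collections import defaultdict

# Hand-made stand-in for an image_to_data(..., Output.DICT) result;
# the values are invented for illustration.
data = {
    'text':     ['8:00', 'AM', 'Breakfast'],
    'left':     [50, 90, 250],
    'top':      [130, 130, 160],
    'line_num': [1, 1, 2],
    'conf':     ['94', '93', '88'],
}

# Collect confident, non-empty words by line number
lines = defaultdict(list)
for i in range(len(data['text'])):
    if float(data['conf'][i]) > 60 and data['text'][i].strip():
        lines[data['line_num'][i]].append((data['left'][i], data['text'][i]))

# Join each line's words in reading order (left to right)
joined = {n: ' '.join(w for _, w in sorted(ws)) for n, ws in lines.items()}
print(joined)  # {1: '8:00 AM', 2: 'Breakfast'}
```

On a real multi-column image you would key the grouping on the (block_num, par_num, line_num) triple, since line_num restarts in each paragraph.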

This structured output gives us powerful information: not just what the text says, but where it is located in the image. This positional data is foundational for more advanced tasks, such as associating annotations with specific graph elements, as envisioned at the start of this post.
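As a taste of that next step: once you know where the chart's time axis sits in pixels, a word's x-coordinate can be converted into a time of day by linear interpolation. A sketch with made-up calibration values (x_8am and x_8pm are hypothetical gridline positions, not measured from the real LibreView image):

```python
from datetime import datetime, timedelta

def x_to_time(x, x_8am=40, x_8pm=760):
    """Linearly map a pixel x-coordinate to a time of day.

    x_8am and x_8pm are hypothetical pixel positions of the 8 AM and
    8 PM gridlines; on a real chart you would measure them once.
    """
    hours_span = 12  # 8 AM to 8 PM
    frac = (x - x_8am) / (x_8pm - x_8am)
    minutes = round(frac * hours_span * 60)
    t = datetime(2025, 6, 21, 8, 0) + timedelta(minutes=minutes)
    return t.strftime('%I:%M %p')

print(x_to_time(400))  # midpoint of the axis, so 6 hours past 8 AM
```

The same idea applied to the y-axis would recover glucose values, which is the core of reverse engineering the graph itself.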

Watch for my next post on this subject on this blog: http://hodentekhelp.blogspot.com




Wednesday, May 3, 2023

What is chatGPT and how do you get a copy of chatGPT onto your computer (Windows 10)?

ChatGPT is a member of a family of language models. At the risk of trivializing somewhat, let us start with a sentence containing a blank,

for example,

John is <blank> to Japan. 

What follows 'is' is not hard to guess. Two possibilities come to mind to insert in the <blank> space: 'going' or 'coming'. My knowledge of grammar suggested these, based on the language I know: I learned it from study and have seen it used many times.

More generally, a language model uses machine learning to build a probability distribution over words, used to predict the most likely next word given the preceding text. Language models learn from text and can be used for producing original text, predicting the next word in a text, speech recognition, optical character recognition, and handwriting recognition. ChatGPT is one such language model.
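To make the prediction idea concrete, here is a toy sketch: count which word follows each word in a tiny corpus and pick the most frequent. The corpus is made up, and real models like ChatGPT use neural networks trained on vast amounts of text rather than simple counts:

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real language models train on vastly more text.
corpus = ("john is going to japan . mary is going to france . "
          "john is coming to dinner").split()

# Count, for each word, which words follow it (a bigram model)
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(word):
    """Return the most frequent next word after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict('is'))  # 'going' follows 'is' more often than 'coming' here
```

Even this toy model captures the intuition from the <blank> example above: the prediction is driven entirely by how often each continuation was seen.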

The following describes how we may get it installed on a local computer: 

To work with chatGPT locally, you need to get a copy of the program onto your computer. Of course, there are also ways to work without a local copy.

You use the Git program to get a copy from the remote location to your local computer. Git is a version control tool that you can run from the command line on your desktop/laptop, and it is used here to access the GitHub service.

In the present case, GitHub hosts the files/directory for working with chatGPT. Using Git you can bring a copy from the cloud (Internet) hosted service into a local directory on your computer. Make sure you read the README.md file.


The following block shows the usage summary and common subcommands for the git command (I may write GIT or Git; they mean the same). It is a rich command and you can do a lot with it.

============================================================

Microsoft Windows [Version 10.0.19045.2846]

(c) Microsoft Corporation. All rights reserved.


C:\Users\TestUser>Git

usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]

           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]

           [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]

           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]

           <command> [<args>]


These are common Git commands used in various situations:


start a working area (see also: git help tutorial)

   clone      Clone a repository into a new directory

   init       Create an empty Git repository or reinitialize an existing one


work on the current change (see also: git help everyday)

   add        Add file contents to the index

   mv         Move or rename a file, a directory, or a symlink

   reset      Reset current HEAD to the specified state

   rm         Remove files from the working tree and from the index


examine the history and state (see also: git help revisions)

   bisect     Use binary search to find the commit that introduced a bug

   grep       Print lines matching a pattern

   log        Show commit logs

   show       Show various types of objects

   status     Show the working tree status


grow, mark and tweak your common history

   branch     List, create, or delete branches

   checkout   Switch branches or restore working tree files

   commit     Record changes to the repository

   diff       Show changes between commits, commit and working tree, etc

   merge      Join two or more development histories together

   rebase     Reapply commits on top of another base tip

   tag        Create, list, delete or verify a tag object signed with GPG


collaborate (see also: git help workflows)

   fetch      Download objects and refs from another repository

   pull       Fetch from and integrate with another repository or a local branch

   push       Update remote refs along with associated objects


'git help -a' and 'git help -g' list available subcommands and some

concept guides. See 'git help <command>' or 'git help <concept>'

to read about a specific subcommand or concept.

===================================================================

Now all I need to do is run this command:

git clone https://github.com/chatgptui/desktop.git