System supporting data collection and processing using language models and OCR technology

Authors

  • Łukasz Łapiak, Kazimierz Wielki University
  • Piotr Kotlarz, Kazimierz Wielki University

DOI:

https://doi.org/10.34767/SIMIS.2026.01.02

Keywords:

OCR, Language models, Text analysis, Image processing, Flask, MySQL, Tesseract, GPT-4, Gemini, Levenshtein

Abstract

This article presents a system that supports real-time collection and processing of visual data using OCR technology and language models. The study compares six text recognition tools: two local engines (Tesseract, EasyOCR), one external service (OCR.space), and three multimodal language models (GPT-4, Gemini 1.5 Flash, Claude 3 Haiku). Recognition accuracy was measured with the Levenshtein distance metric. The results show that effectiveness depends on the data type: multimodal language models achieved significantly higher accuracy on complex data such as handwriting, whereas for standard digital text the local OCR engines offered comparable precision with much shorter processing times. A Flask-based web application backed by a MySQL database handles data management. The results support a hybrid approach that combines the speed of traditional OCR with the semantic capabilities of modern AI models.
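As an illustration of the accuracy measurement mentioned in the abstract, the following is a minimal sketch of the Levenshtein distance and a similarity score derived from it. The function names `levenshtein` and `similarity`, and the normalization by the longer string's length, are assumptions made for this example; the abstract does not state how the authors normalized or implemented the metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(recognized: str, reference: str) -> float:
    """Hypothetical accuracy score in [0, 1]: 1.0 means the OCR output
    matches the reference text exactly (normalization is an assumption)."""
    longest = max(len(recognized), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(recognized, reference) / longest
```

Under this assumed normalization, an OCR output that confuses a single character in a 12-character reference (for example the letter "O" read in place of the digit "0") would score 1 - 1/12.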

References

Anthropic documentation - docs.anthropic.com

MySQL database documentation - dev.mysql.com/doc

Flask documentation - flask.palletsprojects.com/en/stable/

Gemini documentation - ai.google.dev/gemini-api

OpenAI documentation - platform.openai.com/docs/concepts

Python documentation - docs.python.org/3

Grinberg M., Flask. Tworzenie aplikacji internetowych w Pythonie, Helion.

Levenshtein distance - text comparison algorithm, en.wikipedia.org/wiki/Levenshtein_distance

Matplotlib, a library for generating plots - matplotlib.org/3.5.3/index.html

OCR Space - text recognition tool - ocr.space

EasyOCR project on GitHub - github.com/JaidedAI/EasyOCR

Tesseract OCR project on GitHub - github.com/tesseract-ocr/tesseract

Published

2026-04-16

How to Cite

Łapiak, Ł., & Kotlarz, P. (2026). System supporting data collection and processing using language models and OCR technology. Studies and Materials in Applied Computer Science, 18(1), 11-15. https://doi.org/10.34767/SIMIS.2026.01.02