System supporting data collection and processing using language models and OCR technology
DOI:
https://doi.org/10.34767/SIMIS.2026.01.02
Keywords:
OCR, Language models, Text analysis, Image processing, Flask, MySQL, Tesseract, GPT-4, Gemini, Levenshtein
Abstract
This article presents a comprehensive system for real-time collection and processing of visual data using OCR technology and language models. The study compares six text recognition tools: local engines (Tesseract, EasyOCR), an external service (OCR.space), and multimodal language models (GPT-4, Gemini 1.5 Flash, Claude 3 Haiku). The results show that effectiveness depends on the data type: multimodal language models achieved significantly higher accuracy on complex data such as handwriting, whereas for standard digital text, local OCR solutions offered comparable precision with significantly faster processing. A Flask-based web application with a MySQL database provides efficient data management. Accuracy was measured with the Levenshtein distance metric. The results support a hybrid approach that combines the speed of traditional OCR with the semantic capabilities of modern AI models.
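As an illustration of the accuracy measurement mentioned in the abstract, the following is a minimal sketch of the Levenshtein distance between a reference text and an OCR result. The function names and the normalization by the longer string's length are assumptions made for this example; the abstract does not state how the authors converted the distance into an accuracy score.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def accuracy(reference: str, recognized: str) -> float:
    """Hypothetical normalization (an assumption, not the article's formula):
    1.0 means identical strings, 0.0 means nothing in common."""
    longest = max(len(reference), len(recognized))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(reference, recognized) / longest


if __name__ == "__main__":
    reference = "Hello world"
    recognized = "He1lo w0rld"  # made-up OCR output with two misread characters
    print(levenshtein(reference, recognized))
    print(round(accuracy(reference, recognized), 3))
```

A score of this kind allows the output of each tool to be compared against the same ground-truth text on a common scale, regardless of text length.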
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.