Tesseract vs EasyOCR
In this article, we set up Tesseract and EasyOCR, two popular free OCR engines, and compare their accuracy.
WHY DO WE NEED OCR
Optical Character Recognition (OCR) is becoming more popular as document digitization evolves. More and more companies are looking to automate document handling, and OCR plays a vital role in processing image-based documents. Common use cases are:
- Extracting text from pictures
- Converting handwritten messages or notes
GETTING STARTED WITH TESSERACT OCR
The following steps guide you through setting up Tesseract on Linux and running OCR on images.
1. Installing Tesseract
To install Tesseract on a Debian or Ubuntu Linux distribution, use apt as shown below:
apt-get install tesseract-ocr
2. Installing Languages (Optional)
Tesseract usually comes with the English language pack by default. If you want all the language packs to be downloaded, you can run the following command:
apt-get install tesseract-ocr-all
3. Usage
tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
For example, to OCR an image and print the result to the console:
tesseract /home/auriga/Desktop/picture.png stdout
For more options, such as language, OCR engine mode, and page segmentation mode, see:
https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html
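The command line shown above can also be driven from a script. The following is a minimal Python sketch, not part of Tesseract itself: the helper names build_tesseract_cmd and run_tesseract are hypothetical, and it assumes the tesseract binary is on the PATH. It only uses the flags from the usage line (-l, --oem, --psm).

```python
import subprocess


def build_tesseract_cmd(image, outputbase="stdout", lang=None, oem=None, psm=None):
    """Assemble the argument list: tesseract imagename outputbase [-l lang] [--oem N] [--psm N]."""
    cmd = ["tesseract", str(image), str(outputbase)]
    if lang is not None:
        cmd += ["-l", lang]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def run_tesseract(image, **options):
    """Run tesseract with outputbase 'stdout' and return the recognized text."""
    cmd = build_tesseract_cmd(image, "stdout", **options)
    # check=True raises CalledProcessError if tesseract exits with an error.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


# Example call (requires tesseract to be installed):
# text = run_tesseract("/home/auriga/Desktop/picture.png", lang="eng", psm=6)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.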
4. Language-Specific Wrappers
Tess4J:
A Java JNA wrapper for the Tesseract OCR API. Tess4J is released and distributed under the Apache License, v2.0, and is also available from the Maven Central Repository.
Maven Dependency:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>5.2.0</version>
</dependency>
Features:
The library provides optical character recognition (OCR) support for:
- TIFF, JPEG, GIF, PNG, and BMP image formats
- Multi-page TIFF images
- PDF document format
PYTESSERACT:
Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine. It is also useful as a stand-alone invocation script to tesseract.
Pip Installation:
pip install pytesseract
Or, to install the latest version directly from GitHub:
pip install -U git+https://github.com/madmaze/pytesseract.git
Features:
- It can read all image types supported by the Pillow and Leptonica imaging libraries, including JPEG, PNG, GIF, BMP, TIFF, and others.
- When used as a script, Python-tesseract prints the recognized text instead of writing it to a file.
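As a rough illustration of how pytesseract is typically called from Python, here is a minimal sketch. The helper names build_config and ocr_image are hypothetical; pytesseract.image_to_string and its lang and config parameters come from the library. It assumes Tesseract, pytesseract, and Pillow are installed.

```python
def build_config(psm=None, oem=None):
    """Build the extra-options string that pytesseract forwards to Tesseract."""
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    if psm is not None:
        parts.append(f"--psm {psm}")
    return " ".join(parts)


def ocr_image(path, lang="eng", psm=None, oem=None):
    """Return the text recognized in the image at `path`."""
    # Imported here so build_config can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang, config=build_config(psm, oem))


# Example call (requires Tesseract, pytesseract and Pillow):
# print(ocr_image("/home/auriga/Desktop/picture.png", psm=6))
```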
GETTING STARTED WITH EasyOCR
The following steps guide you through setting up EasyOCR on Linux and running OCR on images.
Prerequisite: Python and PyTorch must be installed before you continue.
1. Installing Pip
To install the pip package manager on a Debian or Ubuntu Linux distribution, use apt as shown below:
apt install python3-pip
2. Installing EasyOCR
To install EasyOCR on a Debian or Ubuntu Linux distribution, use pip as shown below:
pip install easyocr
Or
pip install git+https://github.com/jaidedai/easyocr.git
3. Usage
easyocr -l en --detail 0 --paragraph True -f image.png
For example, to OCR an image and print the result to the console:
easyocr -l en --detail 0 -f /home/auriga/Downloads/temp.jpg
For more options, such as the language list, detector, and recognizer settings, see:
https://www.jaided.ai/easyocr/documentation/
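EasyOCR can also be used as a Python library. The sketch below is a minimal example under stated assumptions: the helper names keep_confident and read_text and the min_confidence threshold are hypothetical, while easyocr.Reader and readtext are the library's own calls. With the default detail level, readtext returns one (bounding box, text, confidence) entry per detected text region.

```python
def keep_confident(results, min_confidence):
    """Keep only the text of detections whose confidence meets the threshold."""
    return [text for _bbox, text, conf in results if conf >= min_confidence]


def read_text(path, languages=("en",), gpu=False, min_confidence=0.0):
    """Run EasyOCR on the image at `path` and return the recognized strings."""
    # Imported here so keep_confident can be used without EasyOCR installed.
    import easyocr

    # Creating a Reader loads the models, so reuse it when processing many images.
    reader = easyocr.Reader(list(languages), gpu=gpu)
    results = reader.readtext(path)  # list of (bbox, text, confidence)
    return keep_confident(results, min_confidence)


# Example call (requires EasyOCR and PyTorch):
# print(read_text("/home/auriga/Downloads/temp.jpg", min_confidence=0.4))
```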
Conclusions
Based on my testing:
- Tesseract is preferable on CPU-only machines, whereas EasyOCR is preferable on machines with a GPU.
- Tesseract works better at the character level, while EasyOCR does a better job on words.
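One common way to put numbers on character-level and word-level accuracy is to compute the character error rate (CER) and word error rate (WER) against a known reference text. The sketch below is an illustration only and not necessarily the method used for the testing above. The function names are hypothetical, and both rates are based on a plain Levenshtein edit distance.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Character edits needed, divided by the reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def word_error_rate(reference, hypothesis):
    """Word edits needed, divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

For instance, if the reference is "hello world" and an engine returns "hello w0rld", one character out of eleven is wrong, but one word out of two is wrong, so the same mistake weighs much more heavily in the word-level score.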