Install Tesseract for Windows - Your Trusted Assistant for Windows OS

What is Tesseract OCR?

Tesseract is an open-source software library released under the Apache License 2.0. It was originally developed by Hewlett-Packard in the 1980s. It is a text recognition (OCR) tool used primarily to identify and extract text from images. Tesseract OCR provides a command-line interface for this functionality.

How to Download Tesseract OCR in Windows

Download Tesseract Installer for Windows
Install Tesseract OCR
Add installation path to Environment Variables
Run Tesseract OCR

1. Download Tesseract Installer for Windows

To use the tesseract command on Windows, we first need to download the Tesseract OCR Windows installer (.exe).

The latest version of Tesseract OCR can be downloaded from several places. One such place is UB Mannheim, whose repository is a fork of tesseract-ocr/tesseract (the main repository).

Tesseract Wiki

Download the tesseract-ocr-w64-setup-5.3.0.20221222.exe (64 bit) Windows Installer.

On macOS, Tesseract can be installed from the terminal using either Homebrew or MacPorts, respectively:

brew install tesseract

sudo port install tesseract

2. Install Tesseract OCR

Next, we’ll install Tesseract using the .exe file that we downloaded in the previous step. Launch the .exe installer to start Tesseract installation.

Installer Language

Once the unpacking of the setup is completed, the installer’s language data dialog will appear. You can install Tesseract to use multiple languages by selecting additional language packs, but here we’ll just install the language data for the English language.

Tesseract Installer

Click OK and the Installer language for Tesseract OCR is set.

Tesseract OCR Setup

Next, the setup wizard will appear. This Setup Wizard will guide the Tesseract installation for Windows.

Figure 3: Tesseract OCR Setup Wizard

Click Next to continue the installation.

Accept License Agreement

Tesseract OCR is licensed under the Apache License, Version 2.0. Because it is open source and free to use, you can modify and redistribute Tesseract without any royalty concerns.

Figure 4: Tesseract OCR is licensed under the Apache License v2.0. Accept this license to continue with the installation.

Click I Agree to proceed to installation.

Choose Users

You can choose to install Tesseract for multiple users or for a single user.

Choose to install Tesseract OCR for the Current User (you) or for all user accounts

Click Next to choose components to install with Tesseract.

Choose Components

In the list of components to install, ScrollView, Training Tools, Shortcuts creation, and Language data are all selected by default. We will keep the defaults. You can deselect any component you do not need, but in most cases the full default set is worth installing.

Figure 6: Here, you can choose to include or exclude Tesseract OCR components. For the best results, continue the installation with the default components selected.

Click Next to choose installation location.

Choose Installation Location

Next, we’ll choose the location to install Tesseract. Make sure you copy the destination folder path. We will need this later to add the installation location to the machine’s path Environment Variable.

Figure 7: Select an install location for the Tesseract OCR library, and remember this location for later.

Click Next to continue setting up the installation.

In this last step, we create the shortcuts in the Start menu. You can name the folder anything, but I’ve kept the default.

Figure 8: Choose the name of Tesseract OCR’s Start Menu folder

Now, click Install and wait for the installation to complete. Once the installation is done, the following screen will appear. Click Finish to complete the installation of Tesseract OCR on Windows.

Figure 9: Tesseract OCR installation is now complete.

3. Add Installation Path to System Environment Variables

Now, we will add the Tesseract installation path to Windows’ Environment Variables.

In the Start menu, type "environment variables" or "advanced system settings".

Figure 10: The Windows System Properties dialog box

System Properties

Once the System Properties dialog box opens, click the Advanced tab, and then click the Environment Variables button, located toward the bottom right of the dialog.

The Environment Variables dialog box will open.

Environment Variables

Under System variables, click on the Path variable.

Accessing the Windows’ System Environment Variables

Now, click Edit.

Add Tesseract OCR for Windows Installation Directory to Environment Variables

From the Edit environment variable dialog box, click New. Paste the installation location path which was copied during the second step, and click OK.

Figure 12: Edit the Windows Path system environment variable by adding an entry containing the absolute path to the Tesseract OCR installation.

That’s it! We have successfully downloaded and installed Tesseract OCR and set the environment variable on a Windows machine.

4. Run Tesseract OCR

To check that Tesseract OCR for Windows was successfully installed and added to the environment variables, open Command Prompt (cmd) on your Windows machine and run the "tesseract" command. If everything worked, a short usage guide is displayed, listing the OCR options and single options such as the Tesseract version.

Figure 13: Run the tesseract command in the Windows command line (or Windows PowerShell) to make sure that the above installation steps were done correctly. The console output is the expected result of a successful Windows installation.
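
The same check can also be scripted. The following is a minimal Python sketch, not part of the original walkthrough: it assumes Python 3 is installed, and the function names find_tesseract and tesseract_version_banner are hypothetical helpers introduced only for illustration. It looks for the executable on PATH and, if found, prints whatever version text it reports.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def tesseract_version_banner(executable_path):
    """Run `<executable> --version` and return whatever text it prints."""
    completed = subprocess.run(
        [executable_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: some releases may write the banner to stderr instead of
    # stdout, so both streams are combined here.
    return (completed.stdout + completed.stderr).strip()


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; re-check the Path entry from step 3.")
    else:
        print(f"Found: {path}")
        print(tesseract_version_banner(path))
```

If the script reports that tesseract was not found, open a new Command Prompt first: a console that was already open before the Path variable was edited does not see the change.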

Congratulations! We have successfully installed Tesseract OCR for Windows.

IronOCR Library

IronOCR is a Tesseract-based C# library that allows .NET software developers to identify and extract text from images and PDF documents. It is purely built in .NET, using the most advanced Tesseract engine known anywhere.

Install with NuGet Package Manager

Installing IronOCR from Visual Studio or from the command line with the NuGet Package Manager is straightforward. In Visual Studio, navigate to:

Tools > NuGet Package Manager > Package Manager Console

Then, in the console, type the following command:

Install-Package IronOcr

This installs IronOCR, and you can now start using it.

You can also download other IronOCR NuGet Packages for different platforms:

Windows: https://www.nuget.org/packages/IronOcr
Linux: https://www.nuget.org/packages/IronOcr.Linux
macOS: https://www.nuget.org/packages/IronOcr.MacOs
macOS ARM: https://www.nuget.org/packages/IronOcr.MacOs.ARM

IronOCR with Tesseract 5

The below sample code shows how easy it is to use IronOCR Tesseract to read text from an image and perform OCR using C#.

using System;
using IronOcr;

string Text = new IronTesseract().Read(@"test-files/redacted-employmentapp.png").Text;
Console.WriteLine(Text); // Printed text

Dim Text As String = (New IronTesseract()).Read("test-files/redacted-employmentapp.png").Text
Console.WriteLine(Text) ' Printed text

If you want more robust code, then the following should help you in achieving the same task:

using System;
using IronOcr;

var Ocr = new IronTesseract();
// OcrInput is disposable, so it is wrapped in a using block
using (var Input = new OcrInput()){
    Input.AddImage("test-files/redacted-employmentapp.png");
    // you can add any number of images
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}

Imports IronOcr

Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
	Input.AddImage("test-files/redacted-employmentapp.png")
	' you can add any number of images
	Dim Result = Ocr.Read(Input)
	Console.WriteLine(Result.Text)
End Using

Input Image

Sample input image for IronOCR processing

Output Image

The output is printed on the console as:

Figure 15: The console output returned from the execution of IronOCR on the sample image.

Why Choose IronOCR?

IronOCR is very easy to install. It provides a complete and well-documented .NET software library.

IronOCR achieves a 99.8% text-detection accuracy rate without the need for other third-party libraries or web services.

It also provides multithreading support. Most importantly, IronOCR can work with well over 125 international languages.

Conclusion

In this tutorial, we learned how to download and install Tesseract OCR on a Windows machine. Tesseract OCR is excellent software for C++ developers, but it has some limits. It is not fully developed for .NET. Scanned or photographed images need to be preprocessed and standardized to a high resolution, free from digital noise. Only then can Tesseract work on them accurately.

In contrast, IronOCR can work with any image provided, whether scanned or photographed, with just a single line of code. IronOCR also uses Tesseract as its internal OCR engine, but it is finely tuned to get the best out of Tesseract and built specifically for C#, with high performance and improved features.

You can download the IronOCR software product from this link.

Source

Tesseract is an open source text recognition (OCR) Engine, available under the Apache 2.0 license. It can be used directly, or (for programmers) using an API to extract printed text from images. It supports a wide variety of languages.

Tesseract doesn’t have a built-in GUI, but there are several available from the 3rdParty page.

Installation

There are two parts to install: the engine itself and the traineddata for the languages.

Tesseract is available directly from many Linux distributions. The package is generally called ‘tesseract’ or ‘tesseract-ocr’ — search your distribution’s repositories to find it.

Packages for over 130 languages and over 35 scripts are also available directly from the Linux distributions. The language traineddata packages are called ‘tesseract-ocr-langcode’ and ‘tesseract-ocr-script-scriptcode’, where langcode is a three-letter language code and scriptcode is a four-letter script code.

Examples: tesseract-ocr-eng (English), tesseract-ocr-ara (Arabic), tesseract-ocr-chi-sim (Simplified Chinese), tesseract-ocr-script-latn (Latin Script), tesseract-ocr-script-deva (Devanagari script), etc.
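
The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the underscore-to-hyphen mapping (chi_sim becoming chi-sim) is an assumption inferred from the Simplified Chinese example in the list, not a documented rule.

```python
def language_package(langcode):
    """Build a language package name such as 'tesseract-ocr-eng'."""
    # Assumption: traineddata names with an underscore (chi_sim) map to
    # package names with a hyphen (chi-sim), as in the examples above.
    return "tesseract-ocr-" + langcode.lower().replace("_", "-")


def script_package(scriptcode):
    """Build a script package name such as 'tesseract-ocr-script-latn'."""
    return "tesseract-ocr-script-" + scriptcode.lower()


def apt_install_command(langcodes=(), scriptcodes=()):
    """Return an argument list for installing the requested packages with apt."""
    packages = [language_package(code) for code in langcodes]
    packages += [script_package(code) for code in scriptcodes]
    return ["sudo", "apt", "install"] + packages


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is installed by accident.
    print(" ".join(apt_install_command(["eng", "ara"], ["Latn"])))
```

Whether a given package actually exists still depends on your distribution, so search its repositories before relying on a generated name.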

** FOR EXPERTS ONLY. **

If you are experimenting with OCR Engine modes, you will need to manually install language training data beyond what is available in your Linux distribution.

Various types of training data can be found on GitHub. Unpack and copy the .traineddata file into a ‘tessdata’ directory. The exact directory will depend both on the type of training data, and your Linux distribution. Possibilities are /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata or /usr/share/tesseract-ocr/4.00/tessdata.
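
Because the exact directory varies, it can help to probe the candidates. The Python sketch below is an illustration under stated assumptions: the candidate list simply repeats the three paths mentioned above, and the helper names find_tessdata_dir and installed_languages are hypothetical.

```python
from pathlib import Path

# The three possibilities named in the text; your distribution may use another.
CANDIDATE_TESSDATA_DIRS = [
    "/usr/share/tesseract-ocr/tessdata",
    "/usr/share/tessdata",
    "/usr/share/tesseract-ocr/4.00/tessdata",
]


def find_tessdata_dir(candidates=CANDIDATE_TESSDATA_DIRS):
    """Return the first candidate that exists as a directory, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


def installed_languages(tessdata_dir):
    """List language codes by stripping the .traineddata suffix from file names."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


if __name__ == "__main__":
    tessdata = find_tessdata_dir()
    if tessdata is None:
        print("No tessdata directory found among the known candidates.")
    else:
        print(tessdata, installed_languages(tessdata))
```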

Training data for obsolete Tesseract versions <= 3.02 resides in another location.

Platforms

If Tesseract is not available for your distribution, or you want to use a newer version than they offer, you can compile your own.

Ubuntu

You can install Tesseract and its developer tools on Ubuntu by simply running:

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

Note for Ubuntu users: if apt is unable to find the package, try adding a universe entry to the sources.list file, as shown below.

sudo vi /etc/apt/sources.list

Copy the first line, "deb http://archive.ubuntu.com/ubuntu bionic main", paste it on the next line, and change main to universe, as shown below.
If you are using a different release of Ubuntu, replace bionic with the respective release name.

deb http://archive.ubuntu.com/ubuntu bionic universe

Debian packages

Tesseract 4
Tesseract 5
Tesseract 5 (devel)

Raspbian packages

Tesseract 4
Tesseract 5
Tesseract 5 (devel)

Ubuntu packages

Tesseract 4
Tesseract 5
Tesseract 5 (devel)

Ubuntu ppa

Tesseract 4
Tesseract 5
Tesseract 5 (devel-daily)

RHEL/CentOS/Scientific Linux, Fedora, openSUSE packages

Tesseract 4
Tesseract 5

See Installation on OpenSuse page for detailed instructions.

AppImage

Instructions

Download the AppImage from the releases page
Open your terminal application, if not already open
Browse to the location of the AppImage
Make the AppImage executable:
chmod a+x tesseract*.AppImage
Run it:
./tesseract*.AppImage -l eng page.tif page.txt

AppImage compatibility

Debian: ≥ 10
Fedora: ≥ 29
Ubuntu: ≥ 18.04
CentOS: ≥ 8
openSUSE Tumbleweed

Included traineddata files

deu — German
eng — English
fin — Finnish
fra — French
osd — Script and orientation
por — Portuguese
rus — Russian
spa — Spanish

snap

For distributions that are supported by snapd, you can also run the following command to install the prebuilt Tesseract binaries:

sudo snap install --channel=edge tesseract

The traineddata is currently not shipped with the snap package and must be placed manually in ~/snap/tesseract/current.

macOS

You can install Tesseract using either MacPorts or Homebrew.

A macOS wrapper for the Tesseract API is also available at Tesseract macOS.

MacPorts

To install Tesseract run this command:

sudo port install tesseract

To install any language data, run:

sudo port install tesseract-<langcode>

List of available langcodes can be found on MacPorts tesseract page.

Homebrew

To install Tesseract run this command:

brew install tesseract

The tesseract directory can then be found using brew info tesseract,
e.g. /usr/local/Cellar/tesseract/3.05.02/share/tessdata/.

Windows

Installers for Windows for Tesseract 3.05, Tesseract 4 and Tesseract 5 are available from Tesseract at UB Mannheim. These include the training tools. Both 32-bit and 64-bit installers are available.

An installer for the OLD version 3.02 is available for Windows from our download page.
This includes the English training data.
If you want to use another language, download the appropriate training data,
unpack it using 7-zip, and copy the .traineddata file into the ‘tessdata’ directory, probably C:\Program Files\Tesseract-OCR\tessdata.

To access tesseract-OCR from any location, you may have to add the directory where the tesseract-OCR binaries are located to the Path environment variable, probably C:\Program Files\Tesseract-OCR.
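
When editing the system-wide Path is not an option, a script can extend the variable for its own process only. The following Python sketch illustrates that idea; the helper names are hypothetical, and the install directory in the usage line is the "probably" location from the text, so adjust it to your machine.

```python
import os


def path_contains(directory, path_value, sep=os.pathsep):
    """Return True if directory is already one of the entries in path_value."""
    target = os.path.normcase(os.path.normpath(directory))
    entries = [entry for entry in path_value.split(sep) if entry]
    return any(os.path.normcase(os.path.normpath(entry)) == target for entry in entries)


def with_directory_prepended(directory, path_value, sep=os.pathsep):
    """Return path_value with directory added at the front, unless already present."""
    if path_contains(directory, path_value, sep):
        return path_value
    return directory + sep + path_value if path_value else directory


if __name__ == "__main__":
    # Assumed default install location; change it if you installed elsewhere.
    install_dir = r"C:\Program Files\Tesseract-OCR"
    os.environ["PATH"] = with_directory_prepended(install_dir, os.environ.get("PATH", ""))
    print(os.environ["PATH"])
```

This changes PATH only for the running Python process and the child processes it starts; the system setting is left untouched.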

Experts can also get binaries built with Visual Studio from the build artifacts of the Appveyor Continuous Integration.

Cygwin

Released versions >= 3.02 of tesseract-ocr are part of Cygwin.

The latest version available is 4.1.0. Please see announcement.

MSYS2

Install tesseract-OCR:

 pacman -S mingw-w64-{i686,x86_64}-tesseract-ocr

and the data files:

 pacman -S mingw-w64-{i686,x86_64}-tesseract-data-eng

In the above command, “eng” may be replaced with the ISO 639 3-letter language code for supported languages. For a list of available language packages use:

  pacman -Ss tesseract-data

Other Platforms

Tesseract may work on more exotic platforms too. You can either try compiling it yourself, or take a look at the list of other projects using Tesseract.

Running Tesseract

Tesseract is a command-line program, so first open a terminal or command prompt. The command is used like this:

  tesseract imagename outputbase [-l lang] [--psm pagesegmode] [configfile...]

So basic usage to do OCR on an image called ‘myscan.png’ and save the result to ‘out.txt’ would be:

  tesseract myscan.png out

Or to do the same with German:

  tesseract myscan.png out -l deu

It can even be used with the traineddata of multiple languages at a time, e.g. English and German:

  tesseract myscan.png out -l eng+deu

Tesseract also includes an hOCR mode, which produces a special HTML file with the coordinates of each word. This can be used to create a searchable PDF, using a tool such as Hocr2PDF. To use it, pass the ‘hocr’ config option, like this:

  tesseract myscan.png out hocr

You can also create a searchable PDF directly from tesseract (versions >= 3.03):

  tesseract myscan.png out pdf
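
The command forms above follow one pattern, which the Python sketch below assembles as an argument list. It is an illustration only: build_tesseract_command is a hypothetical helper, and the page segmentation flag is spelled --psm on the assumption that Tesseract 4 or later is installed.

```python
import subprocess


def build_tesseract_command(image, outputbase, langs=None, psm=None, configs=()):
    """Assemble: tesseract imagename outputbase [-l lang] [--psm N] [configfile...]"""
    command = ["tesseract", str(image), str(outputbase)]
    if langs:
        # Several languages are joined with '+', e.g. eng+deu.
        command += ["-l", "+".join(langs)]
    if psm is not None:
        # Assumption: Tesseract 4 or later, where the flag is spelled --psm.
        command += ["--psm", str(psm)]
    # Config names such as 'hocr' or 'pdf' go last.
    command += list(configs)
    return command


if __name__ == "__main__":
    command = build_tesseract_command("myscan.png", "out", langs=["eng", "deu"], configs=["pdf"])
    print(" ".join(command))
    # To actually run it (requires Tesseract on PATH and an existing myscan.png):
    # subprocess.run(command, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.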

More information about the various options is available in the Tesseract manpage.

Other Languages

Tesseract has been trained for many languages; check for your language in the Tessdata repository.

It can also be trained to support other languages and scripts; for more details see TrainingTesseract.

Development

Tesseract can also be used in your own project, under the terms of the Apache License 2.0. It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the 3rdParty page for a sample of what has been done with it. Note that as yet there are very few 3rdParty Tesseract OCR projects being developed for Mac (the only one being Tesseract macOS), although there are several online OCR services that can be used on Mac that may use Tesseract as their OCR engine.

Also, it is free software, so if you want to pitch in and help, please do!
If you find a bug and fix it yourself, the best thing to do is to attach the patch to your bug report in the Issues List.

Support

First read the documentation, particularly the FAQ to see if your problem is addressed there.
If not, search the Tesseract user forum or the
Tesseract developer forum, and if you still can’t find what you need, please ask us there.

Source

In this tutorial, we will configure our development environment for OCR. Once your machine is configured, we’ll start writing Python code to perform OCR, paving the way for you to develop your own OCR applications.

A text-image dataset is useful when installing and testing Tesseract and PyTesseract. It helps in verifying the successful installation and allows for the initial exploration of these OCR tools.

To learn how to configure your development environment, just keep reading.

Learning Objectives

In this tutorial, you will:

Learn how to install the Tesseract OCR engine on your machine
Learn how to create a Python virtual environment (a best practice in Python development)
Install the necessary Python packages you need to run the examples in this tutorial (and develop OCR projects of your own)

OCR Development Environment Configuration

In the first part of this tutorial, you will learn how to install the Tesseract OCR engine on your system. From there, you’ll learn how to create a Python virtual environment and then install OpenCV, PyTesseract, and all the other necessary Python libraries you’ll need for OCR, computer vision, and deep learning.

A Note on Install Instructions

The Tesseract OCR engine has existed for over 30 years. The install instructions for Tesseract OCR are fairly stable. Therefore I have included the steps.

With that said, let’s install the Tesseract OCR engine on your system!

Installing Tesseract

Inside this tutorial, you will learn how to install Tesseract on your machine.

Installing Tesseract on macOS

Installing the Tesseract OCR engine on macOS is quite simple if you use the Homebrew package manager.

Use the link above to install Homebrew on your system if it is not already installed.

From there, all you need to do is use the brew command to install Tesseract:

 $ brew install tesseract

Provided that the above command does not exit with an error, you should now have Tesseract installed on your macOS machine.

Installing Tesseract on Ubuntu

Installing Tesseract on Ubuntu 18.04 is easy. All we need to do is use apt:

 $ sudo apt install tesseract-ocr

The apt package manager will automatically install any prerequisite libraries or packages required for Tesseract.

Installing Tesseract on Windows

Please note that the PyImageSearch team and I do not officially support Windows, except for customers who use our pre-configured Jupyter/Colab Notebooks, which you can find at PyImageSearch University. These notebooks run on all environments, including macOS, Linux, and Windows.

We instead recommend using a Unix-based machine such as Linux/Ubuntu or macOS, both of which are better suited for developing computer vision, deep learning, and OCR projects.

That said, if you wish to install Tesseract on Windows, we recommend that you follow the official Windows install instructions put together by the Tesseract team.

Verifying Your Tesseract Install

Provided that you were able to install Tesseract on your operating system, you can verify that Tesseract is installed by using the tesseract command:

 $ tesseract -v
 tesseract 4.1.1
  leptonica-1.79.0
   libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
  Found AVX2
  Found AVX
  Found FMA
  Found SSE

Your output should look similar to mine.
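
If you want to check the version from a script rather than by eye, the first line of that output can be parsed. The sketch below is an illustration: parse_tesseract_version is a hypothetical helper, and the optional leading "v" is an assumption made to tolerate builds that print the version as "v5.x".

```python
import re


def parse_tesseract_version(banner):
    """Extract the version from `tesseract -v` output as a tuple of ints, or None."""
    match = re.search(r"^\s*tesseract\s+v?(\d+(?:\.\d+)*)", banner, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    sample = "tesseract 4.1.1\n leptonica-1.79.0\n"
    version = parse_tesseract_version(sample)
    print(version)
    print("At least version 4:", version is not None and version >= (4,))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings, where "10" would sort before "9".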

Creating a Python Virtual Environment for OCR

Python virtual environments are a best practice for Python development, and we recommend using them to have more reliable development environments.

Instructions for installing the packages needed for Python virtual environments, and for creating your first Python virtual environment, can be found in our pip Install OpenCV tutorial. We recommend you follow that tutorial to create your first Python virtual environment.

Installing OpenCV and PyTesseract

Now that you have your Python virtual environment created and ready, we can install both OpenCV and PyTesseract, the Python package that interfaces with the Tesseract OCR engine.

Both of these can be installed using the following commands:

 $ workon <name_of_your_env> # required if using virtual envs
 $ pip install numpy opencv-contrib-python
 $ pip install pytesseract

Next, we’ll install other Python packages we’ll need for OCR, computer vision, deep learning, and machine learning.

Installing Other Computer Vision, Deep Learning, and Machine Learning Libraries

Let’s now install some other supporting computer vision and machine learning/deep learning packages that we’ll need throughout the rest of this tutorial:

 $ pip install pillow scipy
 $ pip install scikit-learn scikit-image
 $ pip install imutils matplotlib
 $ pip install requests beautifulsoup4
 $ pip install h5py tensorflow textblob
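
Once the installs finish, a quick way to confirm that everything is importable is to look each package up by its import name, which sometimes differs from its pip name. The sketch below is an illustration run inside the activated virtual environment; missing_packages is a hypothetical helper, and the mapping lists the packages installed in this tutorial.

```python
import importlib.util

# pip name -> import name (they differ for several packages)
IMPORT_NAMES = {
    "numpy": "numpy",
    "opencv-contrib-python": "cv2",
    "pytesseract": "pytesseract",
    "pillow": "PIL",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "imutils": "imutils",
    "matplotlib": "matplotlib",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "h5py": "h5py",
    "tensorflow": "tensorflow",
    "textblob": "textblob",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose import name cannot be found in this environment."""
    return sorted(
        pip_name
        for pip_name, module_name in import_names.items()
        if importlib.util.find_spec(module_name) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages are importable.")
```

The check only confirms that each module can be located; it does not import the packages or verify that the Tesseract engine itself is installed.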

Summary

In this tutorial, you learned how to install the Tesseract OCR engine on your machine. You also learned how to install the required Python packages you will need to perform OCR, computer vision, and image processing.

Now that your development environment is configured, we will write OCR code in our next tutorial!

Source