How to Install Languages in gImageReader (Fix “Data Not Found”)
gImageReader relies on Tesseract .traineddata files. By default, only English is installed to keep the download size small. You must manually download additional language packs (like Spanish, French, German, or Russian) from the official GitHub repository and place them in the tessdata folder inside your installation directory.
You have just installed gImageReader, ready to convert your scanned PDF documents or images into editable text. You import a document, perhaps in German, French, or Italian, and you look at the “Source Language” dropdown menu.
But there is a problem: The list is empty, or it only shows “English”.
Don’t panic. Your software is not broken. This is a common hurdle faced by almost every new user of gImageReader. Because gImageReader is a front-end for the Tesseract OCR engine, it requires specific data files to “teach” the AI how to recognize different alphabets and vocabularies. These files can be large, so they are not bundled with the initial installer.
In this comprehensive guide, we will walk you through exactly how to find, download, and install these language packs. We will cover methods for Windows (Installer & Portable) and Linux, and explain which data models give you the best accuracy.
Understanding the “Brain” of OCR: What are .traineddata Files?
Before we dive into the folders, it is important to understand what we are actually installing. Tesseract, the engine powering gImageReader, uses neural networks (specifically LSTM – Long Short-Term Memory) to recognize text.
A neural network needs to be “trained”. It needs to see millions of examples of the letter “A” in different fonts, sizes, and noise levels to learn what an “A” looks like. The result of this massive training process is saved into a single file with the extension .traineddata.
For example:
- eng.traineddata contains the brain for reading English.
- deu.traineddata contains the brain for reading German (Deutsch).
- fra.traineddata contains the brain for reading French.
- spa.traineddata contains the brain for reading Spanish.
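The naming rule is simple enough to express in a few lines. The following is only an illustrative sketch: the helper name traineddata_filename and the LANGUAGE_CODES table are made up for this example and are not part of gImageReader or Tesseract.

```python
# Illustrative only: these four entries are the examples used in this guide.
LANGUAGE_CODES = {
    "English": "eng",
    "German": "deu",
    "French": "fra",
    "Spanish": "spa",
}


def traineddata_filename(code: str) -> str:
    """Return the file name that holds the model for a language code."""
    return f"{code}.traineddata"


for name, code in LANGUAGE_CODES.items():
    print(f"{name}: {traineddata_filename(code)}")
```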
The Critical Choice: “Fast” vs. “Best” Models
Here is a detail that most tutorials miss, but it is crucial for your OCR accuracy. Tesseract offers three different types of language data. Choosing the right one depends on your hardware and your needs.
1. Tessdata_Best (Recommended for gImageReader)
These models use floating-point arithmetic. They are slow but offer the highest possible accuracy. Since gImageReader is a desktop application where you usually process documents one by one (not millions per second), speed is rarely an issue. We strongly recommend using the “Best” models to ensure complex layouts and old fonts are recognized correctly.
2. Tessdata_Fast
These models use “integer” arithmetic. They are significantly smaller in file size and faster, but they sacrifice some accuracy. If you are running gImageReader on a very old laptop or a Raspberry Pi, you might choose this.
3. Tessdata (Standard/Legacy)
This is the standard version. It supports both the modern neural network engine and the older legacy engine. Unless you have a specific compatibility need, you generally don’t need this.
Our Verdict: For most gImageReader users on Windows 10 or 11, the Tessdata_Best files are the right choice. The extra processing time per page is usually worth the reduction in recognition errors.
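Each of the three variants lives in its own repository under the tesseract-ocr organisation on GitHub. The sketch below maps a variant to its repository and builds a direct file URL. The helper names are hypothetical, and the branch name "main" and the /raw/ URL pattern are assumptions you should confirm in your browser before relying on them.

```python
# Repository names under the tesseract-ocr GitHub organisation.
MODEL_REPOS = {
    "best": "tessdata_best",  # floating-point models: slowest, most accurate
    "fast": "tessdata_fast",  # integer models: smaller and faster
    "standard": "tessdata",   # supports both the legacy and the LSTM engine
}


def repo_url(variant: str = "best") -> str:
    """Return the GitHub repository URL for a model variant."""
    if variant not in MODEL_REPOS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"https://github.com/tesseract-ocr/{MODEL_REPOS[variant]}"


def raw_file_url(code: str, variant: str = "best", branch: str = "main") -> str:
    """Build a direct download URL for one language file.

    Assumption: the repository's default branch is called "main" and
    GitHub serves the binary under /raw/<branch>/<file>.
    """
    return f"{repo_url(variant)}/raw/{branch}/{code}.traineddata"


print(raw_file_url("fra"))
```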
Method A: Installing Languages on Windows
The process on Windows involves manually downloading the file and moving it into a specific system folder. gImageReader does not currently have a “one-click download” button inside the app settings, so we must do this manually.
Step 1: Download the Language Data (.traineddata)
First, we need to acquire the official data files. As discussed in the previous section, we will use the “Best” models for optimal results.
Navigate to the official tessdata_best repository. For safety and security, always download these files from the official source.
🔗 Direct Link: github.com/tesseract-ocr/tessdata_best
The files are named using ISO 639-2 three-letter codes. You need to scroll down (or use Ctrl+F) to find the language you need. Here are some common examples:
- English: eng.traineddata
- French: fra.traineddata
- Spanish: spa.traineddata
- German: deu.traineddata
- Italian: ita.traineddata
- Russian: rus.traineddata
- Japanese: jpn.traineddata (and jpn_vert.traineddata for vertical text)
This is where many users go wrong. Do not right-click the link in the file list and choose “Save link as”. This will save a GitHub HTML webpage, not the data file.
The Correct Way:
- Click on the filename (e.g., fra.traineddata).
- You will see a “Download” button on the right side.
- Click Download (or right-click the “Raw” button and select “Save link as”).
- The file size should typically be between 10 MB and 25 MB. If it is only a few kilobytes, you downloaded the webpage by mistake.
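If you want to check a download automatically, something along these lines may help. It is a heuristic sketch, not an official validation routine: the function names are hypothetical, and the 1 MB threshold is a rule of thumb that a small but legitimate model could in principle fall below.

```python
from pathlib import Path

# Rule-of-thumb threshold: anything smaller is suspicious.
MIN_PLAUSIBLE_BYTES = 1_000_000


def looks_like_html(path: Path) -> bool:
    """Return True if the file starts like an HTML page."""
    with path.open("rb") as handle:
        head = handle.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def check_download(path) -> str:
    """Classify a downloaded file as 'ok' or name the likely problem."""
    path = Path(path)
    if not path.name.endswith(".traineddata"):
        return "wrong extension"
    if looks_like_html(path):
        return "html page"
    if path.stat().st_size < MIN_PLAUSIBLE_BYTES:
        return "too small"
    return "ok"
```

Any result other than "ok" means you should repeat the download using the Download or Raw button.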
Step 2: Place the File in the Correct Directory
Now that you have the file, you need to move it to where gImageReader looks for it. This location depends on whether you installed the software or are using the portable version.
Scenario 1: You used the Installer (.exe)
If you installed gImageReader normally to your Program Files, the directory is usually protected. You will need Administrator permission to copy files here.
Instructions:
- Open File Explorer and navigate to the tessdata folder inside your gImageReader installation directory.
- Drag and drop your downloaded .traineddata file into this folder.
- Windows will ask for permission: “You’ll need to provide administrator permission to copy to this folder”.
- Click Continue.
Scenario 2: You are using the Portable Version (.zip)
If you are running gImageReader from a USB drive or a folder on your Desktop, the logic is simpler. You need to find the folder relative to where you extracted the ZIP file.
For example, if you extracted gImageReader to your Desktop, go to Desktop > gImageReader > share > tessdata. Drag and drop your file there.
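The copy step can also be scripted. The sketch below uses a hypothetical helper, install_traineddata, and assumes the target tessdata folder already exists. For a folder under Program Files, the script would have to run from an elevated (Administrator) session, otherwise the copy fails with a PermissionError.

```python
import shutil
from pathlib import Path


def install_traineddata(downloaded_file, tessdata_dir) -> Path:
    """Copy one .traineddata file into a tessdata folder and return the new path."""
    src = Path(downloaded_file)
    dest_dir = Path(tessdata_dir)
    if src.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {src.name}")
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {dest_dir}")
    # copy2 keeps the file's timestamps and returns the destination path.
    return Path(shutil.copy2(src, dest_dir / src.name))


# Example call for the portable layout described above. The Downloads
# location is an assumption; adjust both paths to your own system.
# install_traineddata(
#     Path.home() / "Downloads" / "fra.traineddata",
#     Path.home() / "Desktop" / "gImageReader" / "share" / "tessdata",
# )
```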
If you prefer to keep your language files on a separate drive (e.g., to save space on C: drive), you can create a custom folder anywhere. Then, open gImageReader, go to Settings, and change the “Tessdata Prefix” path to point to your new custom folder.
Method B: Installing Languages on Linux
Linux users have it much easier. You typically do not need to manually download files from GitHub. Instead, you can use your distribution’s package manager to install the official Tesseract language packages, and gImageReader will automatically detect them.
Option 1: Using the Package Manager (Recommended)
Open your terminal and run the command corresponding to your distribution. Replace [lang] with the 3-letter language code (e.g., fra for French, deu for German).
For Ubuntu / Debian / Linux Mint:
user@linux:~$ sudo apt install tesseract-ocr-[lang]
# Example for German:
user@linux:~$ sudo apt install tesseract-ocr-deu
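For Fedora:
user@linux:~$ sudo dnf install tesseract-langpack-[lang]
For Arch Linux:
user@linux:~$ sudo pacman -S tesseract-data-[lang]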
Option 2: Manual Location (If needed)
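If you prefer to download the .traineddata files manually (for example, to use a custom model that isn’t in the repositories), you should place them in the system-wide tessdata directory. The exact path depends on your distribution and Tesseract version; running tesseract --list-langs typically prints the directory currently in use.
Or, if you installed Tesseract locally (for example, built from source), use the tessdata directory under that installation's prefix instead.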
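If you are unsure which directory applies on your machine, a small search can narrow it down. This is a sketch under stated assumptions: find_tessdata_dirs is a hypothetical helper, and the candidate paths in the example are locations commonly seen on some distributions, not a guaranteed list. Treat them as guesses to verify.

```python
from pathlib import Path


def find_tessdata_dirs(candidates):
    """Return the candidate directories that exist and hold .traineddata files."""
    found = []
    for candidate in candidates:
        directory = Path(candidate)
        if directory in found:
            continue
        if directory.is_dir() and any(directory.glob("*.traineddata")):
            found.append(directory)
    return found


# Assumed example locations; your system may use a different path.
EXAMPLE_CANDIDATES = [
    "/usr/share/tesseract-ocr/5/tessdata",
    "/usr/share/tessdata",
    "/usr/local/share/tessdata",
]

for directory in find_tessdata_dirs(EXAMPLE_CANDIDATES):
    print(directory)
```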
Final Step: Verifying the Installation
Whether you are on Windows or Linux, moving the file is only half the battle. You must tell gImageReader to refresh its database.
✅ The Verification Checklist
- Restart the Application: If gImageReader was open while you copied the files, close it completely and open it again. It only scans the directory on startup.
- Check the Dropdown: Look at the “Source Language” toolbar located at the top (Qt version) or side (Gtk version).
- Click the Arrow: Click the dropdown arrow. You should now see your new language (e.g., “German” or “deu”) listed alongside English.
- Test Recognition: Import a document in that language and try to recognize a small area. If the output contains correct accented characters (like ü, é, ñ), congratulations! You have successfully installed the language pack.
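You can also confirm what is in the folder without opening the application. The sketch below lists the language codes found in a tessdata directory. The function name is hypothetical, and it only inspects file names, so it says nothing about whether a file is a valid model.

```python
from pathlib import Path


def installed_languages(tessdata_dir, include_osd: bool = False):
    """Return the sorted language codes found in a tessdata folder."""
    codes = [
        path.stem  # "fra.traineddata" -> "fra"
        for path in Path(tessdata_dir).glob("*.traineddata")
        if path.is_file()
    ]
    if not include_osd:
        # osd.traineddata handles orientation detection, not a language.
        codes = [code for code in codes if code != "osd"]
    return sorted(codes)
```

If the code you just copied is missing from the returned list, the file is in the wrong folder or has the wrong extension.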
Double-check the file size. If your .traineddata file is less than 1MB, you likely downloaded the GitHub HTML page instead of the raw binary file. Go back to Step 1 and re-download.
Frequently Asked Questions about Language Packs
osd.traineddata is not a language pack. OSD stands for “Orientation and Script Detection”. Tesseract uses this file to automatically rotate pages that are scanned upside down or sideways. You generally do not need to select it manually; the engine uses it in the background.
If the downloaded file ends in a .txt extension, your browser or Windows added it because it thinks the file is text. You must rename the file. Right-click it, select “Rename”, and remove the .txt part so it ends strictly in .traineddata.
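The rename can be done in code as well. This is a minimal sketch with a hypothetical helper name. It refuses to overwrite an existing file and leaves correctly named files untouched.

```python
from pathlib import Path


def strip_txt_suffix(path) -> Path:
    """Rename 'xxx.traineddata.txt' to 'xxx.traineddata' and return the final path."""
    path = Path(path)
    if not path.name.lower().endswith(".traineddata.txt"):
        return path  # nothing to fix
    target = path.with_name(path.name[: -len(".txt")])
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    return path.rename(target)
```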
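fra is for modern French. frk is easy to confuse with it: although the code itself stands for Frankish, Tesseract uses frk for German text printed in Fraktur (blackletter) script, which appears in older German books and newspapers. Make sure you download the correct code for your document’s language and era.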
Conclusion
Installing language packs for gImageReader might seem like a hassle initially, but it is a one-time process. Once you have copied the correct .traineddata files into your tessdata folder, you unlock the full potential of the Tesseract 5 engine.
Remember, the quality of your OCR output depends heavily on two things: the quality of your source image (always aim for 300 DPI) and using the correct language model. By using the tessdata_best models we linked above, you are ensuring the highest possible accuracy for your projects.