Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
^5.0.4
Describe the bug
I am trying to recognize text that contains a lot of dots. Sometimes other characters are added instead of the dots, and sometimes the dots are followed by garbage made up of random letters.
I believe what you are describing is a limitation of the Tesseract recognition model(s) rather than something specific to Tesseract.js. Tesseract.js is the JavaScript/WebAssembly port of Tesseract, so making changes to the model is outside the scope of this repo.
From personal experience, I can confirm that the LSTM model (oem value 1) is prone to hallucinating text when given dots or squiggles. The Legacy model (oem value 0) is less prone to hallucinating words that are completely at odds with what you see on the page, as it relies more on the shape of the individual letters, so you could try using that. However, in addition to being less accurate in general, the Legacy model is known to struggle with italics, which your image contains. Therefore, I would not expect either model to give excellent results for your image without pre- or post-processing to filter out the junk.
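As a rough sketch of both suggestions, assuming Tesseract.js v5 running in Node.js: the worker below is created with the Legacy engine (oem value 0), which in v5 also needs the `legacyCore` and `legacyLang` options, and the result is post-processed by dropping low-confidence words. The helper names and the threshold of 60 are hypothetical and would need tuning against your own images.

```javascript
// Hypothetical helper: keep only words whose confidence (0-100) meets the threshold.
// Note: joining with spaces discards the original line breaks.
function dropLowConfidenceWords(words, minConfidence = 60) {
  return words
    .filter((word) => word.confidence >= minConfidence)
    .map((word) => word.text)
    .join(' ');
}

// Hypothetical helper: recognize an image with the Legacy engine (oem 0)
// and filter the output. Assumes tesseract.js v5 is installed.
async function recognizeWithLegacy(image) {
  const { createWorker } = require('tesseract.js'); // loaded lazily
  const worker = await createWorker('eng', 0, {
    legacyCore: true, // load the core build that includes the Legacy engine
    legacyLang: true, // load language data that includes the Legacy model
  });
  try {
    const { data } = await worker.recognize(image);
    return dropLowConfidenceWords(data.words);
  } finally {
    await worker.terminate();
  }
}
```

Filtering by confidence is only one possible post-processing step. It will not remove junk that the model reports with high confidence, so it is worth checking the per-word confidence values on a few sample images first.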
Tesseract offers many different configuration settings that you can experiment with, so it is possible that there is some setting that would help in this case. These options would be documented in the main Tesseract documentation or repo, rather than here in the Tesseract.js repo. Every configuration setting for Tesseract can also be used in Tesseract.js.
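For illustration, here is a minimal sketch of passing a Tesseract setting through Tesseract.js with `worker.setParameters`. The helper name `recognizeWithParams` and the file name `input.png` are hypothetical, and the page segmentation mode shown is just one example of a setting to experiment with, not a known fix for this image.

```javascript
// Hypothetical helper: apply Tesseract parameters to an existing worker,
// then recognize the image and return the plain text.
async function recognizeWithParams(worker, image, params) {
  await worker.setParameters(params);
  const { data } = await worker.recognize(image);
  return data.text;
}

// Example usage (assumes tesseract.js v5 is installed; 'input.png' is a placeholder).
async function main() {
  const { createWorker } = require('tesseract.js'); // loaded lazily
  const worker = await createWorker('eng');
  try {
    const text = await recognizeWithParams(worker, 'input.png', {
      // '6' is PSM.SINGLE_BLOCK: treat the image as a single uniform block of text.
      tessedit_pageseg_mode: '6',
    });
    console.log(text);
  } finally {
    await worker.terminate();
  }
}
```

Calling `main()` runs the example. Other parameter names can be taken from the main Tesseract documentation and passed in the same object.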
To Reproduce
I use this code:
Expected behavior
A clear and concise description of what you expected to happen.
Device Version:
source image:
The result looks like: