The words and symbols that are seen as valid are determined by the language data loaded (spa.traineddata in this instance). By default, Tesseract.js loads .traineddata files provided by the main Tesseract project--we use integerized versions of the tessdata_best data. If you were to find or create a .traineddata file that does not have this issue, you can use it by setting the langPath argument.
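As a rough illustration of the `langPath` setting mentioned above, here is a minimal sketch. It assumes Tesseract.js v5 or later, where `createWorker(langs, oem, options)` accepts `langPath`, `gzip`, and `cacheMethod` options; older versions use a different call sequence. The URL and the helper names `buildWorkerOptions` and `recognizeWithCustomData` are hypothetical placeholders, not part of the library.

```javascript
// Sketch only: assumes Tesseract.js v5+ and a custom spa.traineddata file
// hosted at a location you control. The URL below is a hypothetical placeholder.
const CUSTOM_LANG_PATH = 'https://example.com/tessdata';

// Hypothetical helper: builds the options object passed to createWorker.
function buildWorkerOptions(langPath) {
  return {
    langPath,            // directory/URL where <lang>.traineddata is looked up
    gzip: false,         // assumption: the file is served uncompressed
    cacheMethod: 'none', // skip the cache so a previously stored spa file is not reused
  };
}

// Hypothetical helper: recognizes an image using the custom Spanish data.
async function recognizeWithCustomData(image) {
  // Loaded lazily so the options helper can be used without the library installed.
  const { createWorker } = require('tesseract.js');
  // 1 is the LSTM-only engine mode (OEM.LSTM_ONLY).
  const worker = await createWorker('spa', 1, buildWorkerOptions(CUSTOM_LANG_PATH));
  try {
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    await worker.terminate();
  }
}
```

Whether the ₡ symbol is then recognized depends entirely on the replacement `.traineddata` file; this only changes where the language data is loaded from.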
Issues related to language data are outside of the scope of this repo. The goal of Tesseract.js is to bring the Tesseract OCR engine to the browser--we do not make any edits to the recognition engine or .traineddata provided by Tesseract. If you are interested in learning more about how language data works, and what tools exist to modify it, you should look at the documentation provided by the main Tesseract project. Their website is here and repo is here.
With the Spanish ('spa') language data, Tesseract interprets the currency symbol ₡ as the number 2.
Tesseract interprets a character as a number.
With the English language data, Tesseract recognizes it correctly as a character, but I need it to work with 'spa'.
Is there any way to improve or correct this "error"?
Additional context
Number with a currency symbol
thanks