Upgrading from v2 to v5 Guide #771
[Archive] v2 to v4 Guide
The following comment contains the old guide for upgrading from v2 to v4. Users are encouraged to update to the latest version (v5), but this guide is still provided for informational purposes.
Changes Impacting Most Users
Changes Impacting Fewer Users
Balearica changed the title from "Upgrading from v2 to v4 Guide" to "Upgrading from v2 to v5 Guide" on Sep 29, 2023.
Overview
According to npm download statistics (and GitHub issues), many users are still using Tesseract.js v2. Version 2 was released in 2019 and contains many bugs, memory leaks, and performance issues that have been fixed in subsequent versions (in some cases v2 is 20x slower than the current version), so updating is strongly recommended. Additionally, v2 is no longer supported, so updating is required to receive support in GitHub issues.
While the changes made in each release are fully documented, the guide below collects every change that v2 users may need to make to use the latest version, to make upgrading as easy as possible. It describes upgrading from v2 to v5. If (for whatever reason) you wish to upgrade from v2 to v4, see the archived v2 to v4 guide comment.
Changes Impacting Most Users
1. `createWorker` is now async
   - `worker = Tesseract.createWorker()` should be replaced with `worker = await Tesseract.createWorker()`
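A minimal sketch of the v5 worker lifecycle, assuming Node.js with the `tesseract.js` package installed. The hypothetical `ocrImage` helper takes the module as a parameter (in an application, `require('tesseract.js')`) so it can also be run against a stub; `createWorker`, `worker.recognize`, and `worker.terminate` are existing Tesseract.js calls, while the helper name and image path are illustrative only.

```javascript
// Sketch of the v5 lifecycle. `Tesseract` is the tesseract.js module,
// passed in so this hypothetical helper can also run against a stub.
async function ocrImage(Tesseract, image, lang = 'eng') {
  // v2: const worker = Tesseract.createWorker();
  // v5: createWorker returns a Promise, so it must be awaited.
  // No worker.load / loadLanguage / initialize calls are needed.
  const worker = await Tesseract.createWorker(lang);
  try {
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Release the worker even if recognition throws.
    await worker.terminate();
  }
}
```

In an application this would be called as `ocrImage(require('tesseract.js'), './image.png').then(console.log)`, where the path is a placeholder.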
2. The arguments for `createWorker` have changed: the first two arguments are now the language and `oem` (OCR Engine Mode), followed by the options object
   - For example: `createWorker('eng', 1, { logger: m => console.log(m) })`
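To make the new argument order concrete, here is a hypothetical helper (not part of Tesseract.js) that maps a v2-style setup, where the options object was the only `createWorker` argument and the language and OEM were supplied later, onto the v5 order `(language, oem, options)`. It assumes the options object itself (e.g. `logger`) can be carried over unchanged.

```javascript
// Hypothetical helper, not a Tesseract.js API: reorders v2-style inputs
// into the v5 createWorker argument order (language, oem, options).
function toV5CreateWorkerArgs(v2Options = {}, lang = 'eng', oem = 1) {
  // OEM 1 selects Tesseract's LSTM engine; the options object moves
  // to the third position. A shallow copy avoids mutating the caller's object.
  return [lang, oem, { ...v2Options }];
}
```

Usage: `const worker = await Tesseract.createWorker(...toV5CreateWorkerArgs({ logger: m => console.log(m) }, 'eng', 1));`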
3. `worker.load`, `worker.loadLanguage`, and `worker.initialize` are no longer needed (the language and `oem` are now passed to `createWorker`)
Changes Impacting Fewer Users
1. The `getPDF` function has been removed
   - Use the `pdf` recognize option instead (GetPDF() with Scheduler returns the same PDF file #488)
2. `cacheMethod: 'none'` or `cacheMethod: 'refresh'` as workaround for caching bug
3. Changes to the `corePath` argument
   - `corePath` must point to a directory containing all 4 of the following files from Tesseract.js-core v5:
     - `tesseract-core.wasm.js`
     - `tesseract-core-simd.wasm.js`
     - `tesseract-core-lstm.wasm.js`
     - `tesseract-core-simd-lstm.wasm.js`
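Because a `corePath` directory with only some of these files is an easy mistake to make, a pre-flight check can help. This is a minimal Node.js sketch with a hypothetical function name; it assumes the core files live on the local filesystem (for example, copied from an installed tesseract.js-core v5 package before being served as static files) and only checks that the four file names listed above exist.

```javascript
const fs = require('fs');
const path = require('path');

// The four Tesseract.js-core v5 files that corePath must contain.
const REQUIRED_CORE_FILES = [
  'tesseract-core.wasm.js',
  'tesseract-core-simd.wasm.js',
  'tesseract-core-lstm.wasm.js',
  'tesseract-core-simd-lstm.wasm.js',
];

// Hypothetical check: returns the required file names missing from
// a local directory (an empty array means all four are present).
function missingCoreFiles(coreDir) {
  return REQUIRED_CORE_FILES.filter(
    (name) => !fs.existsSync(path.join(coreDir, name)),
  );
}
```

If `missingCoreFiles(dir)` returns an empty array, the directory (or the URL it is served from) can be passed as `corePath` in the `createWorker` options.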
4. Using the `worker.detect` function
   - Requires setting `legacyCore: true` and `legacyLang: true` in the `createWorker` options
     - `Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true})`
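A minimal sketch of detection in v5, assuming Node.js with `tesseract.js` installed. The hypothetical `detectImage` helper takes the module as a parameter so it can also be run against a stub; it creates the worker with the legacy options shown above and returns the `data` object from `worker.detect` without assuming which fields it contains.

```javascript
// Sketch: worker.detect in v5 with the legacy engine and legacy
// language data enabled. `detectImage` is a hypothetical helper name.
async function detectImage(Tesseract, image) {
  const worker = await Tesseract.createWorker('eng', 1, {
    legacyCore: true,
    legacyLang: true,
  });
  try {
    const { data } = await worker.detect(image);
    return data;
  } finally {
    // Release the worker even if detection throws.
    await worker.terminate();
  }
}
```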