Releases: naptha/tesseract.js
v5.0.4
What's Changed
- Fixed support for setting "init only" parameters using the config option of createWorker (#862)
  - For example, load_number_dawg is an "init only" parameter that cannot be set using either worker.setParameters or the options argument of worker.recognize.
  - However, load_number_dawg can be set by the following createWorker statement: worker.initialize('eng', "0", {load_number_dawg: "0"});
- Improvements to documentation
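As a rough illustration of setting an "init only" parameter at worker creation, the sketch below builds the argument list for createWorker. It assumes the config object is the fourth argument of createWorker (after language, OEM, and options); the helper name initOnlyWorkerArgs is hypothetical and not part of Tesseract.js.

```javascript
// Sketch only: builds the arguments for a createWorker call that sets
// "init only" parameters. Assumption: createWorker accepts
// (language, OEM, options, config), with config applied before initialization.
// `initOnlyWorkerArgs` is a hypothetical helper, not a Tesseract.js function.
function initOnlyWorkerArgs(lang, initConfig) {
  const oem = 1;      // 1 selects the LSTM engine
  const options = {}; // worker options (logger, paths, ...) left at defaults
  return [lang, oem, options, initConfig];
}

// Usage (requires the tesseract.js package):
// const { createWorker } = require('tesseract.js');
// const args = initOnlyWorkerArgs('eng', { load_number_dawg: '0' });
// const worker = await createWorker(...args);
// await worker.terminate();
```

Keeping the argument construction separate from the call makes it easy to check which parameters are being sent at creation time rather than through worker.setParameters.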
New Contributors
Full Changelog: v5.0.3...v5.0.4
v5.0.3
What's Changed
- Minor changes to types, documentation, and build
New Contributors
- @dora-micha made their first contribution in #843
Full Changelog: v5.0.2...v5.0.3
v5.0.2
What's Changed
- Fixed bugs with wrong lang data being loaded per #834 and #835 by @Balearica in #836
Version 5.0.1 is nearly identical to 5.0.2 and was the latest version for under a day, so it does not have its own release notes.
Full Changelog: v5.0.0...v5.0.2
v5.0.0
What's Changed
Major New Features
- Significantly smaller file sizes
  - 54% smaller file sizes for English, 73% smaller for Chinese (see #806 for details)
  - This results in a ~50% decrease in runtime for first-time users (who do not yet have the data downloaded/cached)
- Significantly lower memory usage
  - Worker memory utilization in the web benchmark is reduced from 311 MB to 164 MB (47% reduction)
  - The lower memory footprint makes it feasible to use more workers, significantly improving performance for projects that utilize schedulers for parallel processing
- Compatible with iOS 17 (using default settings)
  - iOS 17 broke compatibility with Tesseract.js v4; upgrading to v5 should resolve this
  - See discussion section below for details
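To make the scheduler point concrete, here is a minimal sketch of spreading recognition jobs across several workers. The helper name recognizeAll and the default of four workers are illustrative assumptions; the Tesseract.js module is passed in as a parameter so the function does not depend on how the package is imported.

```javascript
// Sketch only: run recognition jobs in parallel through a scheduler.
// `recognizeAll` and the default workerCount are hypothetical choices;
// `tesseract` is the imported tesseract.js module.
async function recognizeAll(tesseract, images, workerCount = 4) {
  const scheduler = tesseract.createScheduler();
  try {
    for (let i = 0; i < workerCount; i++) {
      scheduler.addWorker(await tesseract.createWorker('eng'));
    }
    // Jobs are queued and handed to whichever worker is free.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminating the scheduler also terminates the workers added to it.
    await scheduler.terminate();
  }
}

// Usage (requires the tesseract.js package):
// const Tesseract = require('tesseract.js');
// const texts = await recognizeAll(Tesseract, ['page1.png', 'page2.png']);
```

A suitable worker count depends on available memory and CPU cores, so the value here is only a starting point.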
Breaking Changes Impacting Many Users
- createWorker arguments changed
  - Setting non-default language and OEM now happens in createWorker
    - E.g. createWorker("chi_sim", 1)
- worker.initialize and worker.loadLanguage functions now do nothing and can be deleted from code
  - Loading the language and initialization now occurs in createWorker
  - Workers can be re-initialized with different settings using worker.reinitialize
In other words, code should be modified from this:
const worker = await Tesseract.createWorker();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const ret = await worker.recognize(file);
To this:
const worker = await Tesseract.createWorker("eng");
const ret = await worker.recognize(file);
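For code that previously switched languages by calling loadLanguage and initialize again, a minimal sketch using worker.reinitialize might look like the following. The helper name recognizeInTwoLanguages and the choice of languages are illustrative assumptions; createWorker is passed in as a parameter.

```javascript
// Sketch only: reuse one worker for a second language via worker.reinitialize.
// `recognizeInTwoLanguages` is a hypothetical helper; `createWorker` is the
// function exported by tesseract.js.
async function recognizeInTwoLanguages(createWorker, firstImage, secondImage) {
  const worker = await createWorker('eng');
  try {
    const first = await worker.recognize(firstImage);
    // Switch language and OEM without creating a new worker.
    await worker.reinitialize('chi_sim', 1);
    const second = await worker.recognize(secondImage);
    return [first.data.text, second.data.text];
  } finally {
    await worker.terminate();
  }
}

// Usage (requires the tesseract.js package):
// const { createWorker } = require('tesseract.js');
// const [english, chinese] = await recognizeInTwoLanguages(createWorker, 'en.png', 'zh.png');
```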
Breaking Changes Impacting Fewer Users
- Users who manually set corePath will need to update the contents of their corePath directory
  - corePath should point to a directory that contains all 4 of the files below from Tesseract.js-core v5:
    - tesseract-core.wasm.js
    - tesseract-core-simd.wasm.js
    - tesseract-core-lstm.wasm.js
    - tesseract-core-simd-lstm.wasm.js
  - Tesseract.js will automatically select the correct version to use
- worker.detect function disabled by default
  - Orientation + script detection is a function of the Legacy model only, which is no longer included by default
  - To enable, set arguments legacyCore: true and legacyLang: true in createWorker options
    - E.g. Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true});
- Language of progress logs standardized
  - This should only impact users who parse status logs (e.g. to update a loading bar)
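For projects that self-host the core files, a small Node.js check like the sketch below can confirm that a corePath directory contains all four files listed above. The helper name missingCoreFiles is hypothetical, and the check only tests for file presence on the local file system; it does not validate file contents or versions.

```javascript
// Sketch only: report which of the four Tesseract.js-core v5 files are
// missing from a local corePath directory. `missingCoreFiles` is a
// hypothetical helper, not part of Tesseract.js.
const fs = require('node:fs');
const path = require('node:path');

const REQUIRED_CORE_FILES = [
  'tesseract-core.wasm.js',
  'tesseract-core-simd.wasm.js',
  'tesseract-core-lstm.wasm.js',
  'tesseract-core-simd-lstm.wasm.js',
];

function missingCoreFiles(corePathDir) {
  return REQUIRED_CORE_FILES.filter(
    (name) => !fs.existsSync(path.join(corePathDir, name))
  );
}

// Usage:
// const missing = missingCoreFiles('./public/tesseract-core');
// if (missing.length > 0) console.warn('corePath is missing:', missing);
```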
Non-Breaking Changes
- Language data loaded from jsdelivr by default (rather than GitHub pages)
  - This should result in improved performance and uptime
- Separate "development" build (that produced tesseract.dev.js and worker.dev.js) removed
- Documentation and examples were modified to prevent new users from using Tesseract.recognize and Tesseract.detect
  - Users who already use these functions are encouraged to modify their code to use worker.recognize and worker.detect instead
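One possible shape for code that moves from Tesseract.recognize and Tesseract.detect to the worker-based functions is sketched below. The helper name detectAndRecognize is hypothetical. Because worker.detect is disabled by default in v5, the sketch passes the legacyCore and legacyLang options described above; the contents of the detection result are returned as-is rather than assumed.

```javascript
// Sketch only: run detection and recognition on one worker instead of the
// static Tesseract.detect / Tesseract.recognize functions.
// `detectAndRecognize` is a hypothetical helper; `createWorker` is the
// function exported by tesseract.js.
async function detectAndRecognize(createWorker, image) {
  // Legacy core and language data are needed for worker.detect in v5.
  const worker = await createWorker('eng', 1, { legacyCore: true, legacyLang: true });
  try {
    const detection = await worker.detect(image);
    const recognition = await worker.recognize(image);
    return { detection: detection.data, text: recognition.data.text };
  } finally {
    await worker.terminate();
  }
}

// Usage (requires the tesseract.js package):
// const { createWorker } = require('tesseract.js');
// const { detection, text } = await detectAndRecognize(createWorker, 'scan.png');
```

If only recognition is needed, the legacy options can be dropped, which keeps the smaller default downloads.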
Considering upgrading from v2 to v5? See #771 for a full guide for updating.
Full Changelog: v4.1.3...v5.0.0
v4.1.4
What's Changed
- Restored compatibility with certain versions of Node.js v14
Full Changelog: v4.1.3...v4.1.4
v4.1.3
What's Changed
- Detect browsers in a Deno-compatible way by @yudai-nkt in #821
- Minor changes (#821, ff173ce)
New Contributors
- @yudai-nkt made their first contribution in #821
Full Changelog: v4.1.2...v4.1.3
v4.1.2
What's Changed
- Fixed bug causing excessive memory use when using FS + writeFile function (#812)
- Fixed bug where setting output option debug: true was forcing recognition to be run (#788)
- Added warning message when setParameters is used to set options that can only be set during initialize (#816)
- Minor edits to reduce memory use (#815)
- Minor changes to documentation, types, and example code (#575, #791, #803, #805, #810, #817)
New Contributors
- @Simple7575 made their first contribution in #791
- @jiakuan made their first contribution in #805
- @omarboulbaze made their first contribution in #575
Full Changelog: v4.1.1...v4.1.2
v4.1.1
What's Changed
- Fixed detection of image orientation metadata (#783)
  - Allows Tesseract.js to work with images taken on iOS devices
  - See this comment for explanation
- Minor changes to documentation and types (#781, #782, #778)
New Contributors
- @racosa made their first contribution in #781
- @Tshetrim made their first contribution in #782
- @l2ysho made their first contribution in #778
Full Changelog: v4.1.0...v4.1.1
v4.1.0
What's Changed
- Added ability to run layout analysis without recognition (#656)
  - See this comment for instructions
- Added support for OffscreenCanvas in browser version by @nathanbabcock (#766)
- Fixed bug where recognize was running OCR even when not necessary (#769)
- Fixed bug where certain valid langPath URLs caused errors in browser version (#558)
- Removed problematic file-type and resolve-url dependencies (#773, #711)
Full Changelog: v4.0.6...v4.1.0
v4.0.6
What's Changed
- Invalid langData (.traineddata files) are now cleared from cache (#753)
  - Note: setting cacheMethod: 'none' or cacheMethod: 'refresh' to prevent invalid files from being cached should no longer be necessary
    - See this comment for an explanation
- Added source maps to esm build (#761)
- Various updates to documentation
Full Changelog: v4.0.5...v4.0.6