eScriptorium

Initial release: 2018
Stable release: v1.0.0[1] / 30 January 2026
Operating system: platform independent

eScriptorium is a platform for manual or automated segmentation and text recognition of historical manuscripts and prints.

Details

Screenshot: eScriptorium transcription of Johann Reinhold Forster's diary, Journal of a Voyage on Board the Resolution 1772-1774, Vol. 1

eScriptorium is open-source software developed at Paris Sciences et Lettres University as part of the Scripta[2] and RESILIENCE[3] projects, with contributions from other institutions. It is partly funded by the EU's Horizon 2020 programme and by a grant from the Andrew W. Mellon Foundation.

Scanned pages of manuscripts and prints can be imported into eScriptorium, and the results can be exported in various formats (plain text, ALTO or PAGE XML, TEI). The text regions and the text lines within them are first identified in each image, either manually or automatically (segmentation). The text lines are then transcribed, again either manually or automatically.[4]
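To illustrate what an ALTO XML export contains, the following minimal sketch reads the transcribed text lines from an ALTO document using only the Python standard library. The sample document is a hand-written, hypothetical ALTO v4 fragment, not actual eScriptorium output, and the function name extract_lines is an assumption made for this example. Real exports carry additional metadata and may use a different ALTO schema version, in which case the namespace would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO v4 fragment (assumption: real exports
# contain more metadata and possibly a different schema version).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="page_1" WIDTH="1000" HEIGHT="1400">
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1" HPOS="100" VPOS="120" WIDTH="800" HEIGHT="40">
            <String CONTENT="Journal of a Voyage"/>
          </TextLine>
          <TextLine ID="line_2" HPOS="100" VPOS="170" WIDTH="800" HEIGHT="40">
            <String CONTENT="on Board"/>
            <SP/>
            <String CONTENT="the Resolution"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""

# Namespace of the ALTO v4 schema, mapped to a short prefix for lookups.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}


def extract_lines(alto_xml: str) -> list[tuple[str, str]]:
    """Return (line ID, text) pairs for every TextLine, in document order."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        # Each String element holds one chunk of text in its CONTENT attribute.
        chunks = [s.get("CONTENT", "") for s in line.findall("alto:String", ALTO_NS)]
        lines.append((line.get("ID", ""), " ".join(chunks)))
    return lines


if __name__ == "__main__":
    for line_id, text in extract_lines(SAMPLE_ALTO):
        print(f"{line_id}: {text}")
```

Joining the String chunks with a single space is a simplification: it ignores hyphenation and any explicit spacing information, which may matter for a faithful transcription.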

Models for both automatic segmentation and text recognition can be trained on manually created or corrected examples (ground truth). The resulting models can be shared and reused by others.[5]

eScriptorium is built on Kraken, a free OCR engine by Benjamin Kiessling that is derived from OCRopus. Kraken handles both handwritten and printed text and supports right-to-left scripts such as Hebrew and Arabic.[6]

Programs offering similar functionality include OCR4All[7] and Transkribus.

References

  1. ^ "Release eScriptorium v1.0.0 — first stable release featuring the new UI, Kraken 6 support and other features". Retrieved 7 February 2026.
  2. ^ "Scripta-PSL. History and practices of writing". Retrieved 2022-03-13.
  3. ^ "RESILIENCE - The Religious Studies Research Infrastructure". Retrieved 2022-03-13.
  4. ^ "eScriptorium Documentation". Retrieved 2024-01-21.
  5. ^ "Export data - eScriptorium Documentation". Retrieved 2024-01-21.
  6. ^ "mittagessen/kraken: OCR engine for all the languages". Retrieved 2022-03-13.
  7. ^ "OCR4all | forTEXT". Retrieved 2023-06-20.
