Analyzed Layout and Text Object
Analyzed Layout and Text Object (ALTO) is an open XML schema originally developed by the EU-funded METAe project.[1] ALTO files describe the placement, size, and style of text in an image of a digitized document, as well as other elements of the document's layout, such as margins, headings, columns, and illustrations.
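In ALTO, each recognized word is a `String` element whose position is given by attributes: `HPOS`/`VPOS` for the upper-left corner and `WIDTH`/`HEIGHT` for its extent. These values are expressed in the unit named by the file's `MeasurementUnit` (`pixel`, `mm10` for tenths of a millimetre, or `inch1200` for 1/1200 inch). The minimal Python sketch below converts such a box to pixel coordinates. The sample word is invented, XML namespaces are ignored, and the scan resolution `dpi` is an assumed input that would in practice come from the image metadata.

```python
import xml.etree.ElementTree as ET

# How many ALTO units make up one inch (None means the values are already pixels).
UNITS_PER_INCH = {"pixel": None, "mm10": 254.0, "inch1200": 1200.0}

def to_pixels(value: float, unit: str, dpi: float) -> float:
    """Convert a coordinate from the given ALTO measurement unit to pixels at `dpi`."""
    per_inch = UNITS_PER_INCH[unit]
    return value if per_inch is None else value * dpi / per_inch

def string_box(string_el: ET.Element, unit: str, dpi: float) -> tuple:
    """Return (left, top, right, bottom) in pixels for an ALTO <String> element."""
    x, y, w, h = (float(string_el.get(a)) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
    left, top = to_pixels(x, unit, dpi), to_pixels(y, unit, dpi)
    return (left, top, left + to_pixels(w, unit, dpi), top + to_pixels(h, unit, dpi))

# Hypothetical word measured in 1/1200 inch, from a page scanned at 300 dpi.
word = ET.fromstring('<String CONTENT="ALTO" HPOS="1200" VPOS="2400" WIDTH="600" HEIGHT="240"/>')
print(string_box(word, "inch1200", 300))  # → (300.0, 600.0, 450.0, 660.0)
```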
The text and placement information in ALTO files is usually generated by specialized optical character recognition (OCR) software. ALTO is often used together with the Metadata Encoding and Transmission Standard (METS): METS describes the larger digitized object (such as a book) and links the ALTO files for its parts (such as pages), for example to record a reading sequence.
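Within each ALTO page, the recognized text sits in the `CONTENT` attribute of `String` elements, which are grouped into `TextLine` elements inside `TextBlock`s. The Python sketch below assumes the page order has already been read from the METS file's structural map and is simply passed in as a list of ALTO documents; the function names and sample pages are hypothetical, and hyphenation (`HYP`) and explicit space (`SP`) elements are skipped for brevity.

```python
import xml.etree.ElementTree as ET

def local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]

def page_lines(alto_xml: str) -> list[str]:
    """Return the text of each TextLine on one ALTO page, in document order."""
    root = ET.fromstring(alto_xml)
    lines = []
    for el in root.iter():
        if local(el.tag) == "TextLine":
            # Only direct <String> children carry words; <SP>/<HYP> are ignored here.
            words = [s.get("CONTENT", "") for s in el if local(s.tag) == "String"]
            lines.append(" ".join(words))
    return lines

def document_text(pages_in_order: list[str]) -> str:
    """Join pages in the order supplied (assumed to come from a METS structMap)."""
    return "\n\n".join("\n".join(page_lines(p)) for p in pages_in_order)

# Two hypothetical single-line pages.
PAGE_1 = """<alto><Layout><Page><PrintSpace><TextBlock>
<TextLine><String CONTENT="Chapter"/><SP/><String CONTENT="One"/></TextLine>
</TextBlock></PrintSpace></Page></Layout></alto>"""
PAGE_2 = """<alto><Layout><Page><PrintSpace><TextBlock>
<TextLine><String CONTENT="It"/><SP/><String CONTENT="begins"/></TextLine>
</TextBlock></PrintSpace></Page></Layout></alto>"""

print(document_text([PAGE_1, PAGE_2]))
```

Stripping namespaces with `local` lets the same code read files with or without an ALTO namespace declaration.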
From version 1.0 in June 2004 to 1.4 in 2007, ALTO was developed and maintained by Content Conversion Specialists (CCS) GmbH, Hamburg. In August 2009, maintenance of the schema was transferred to the Library of Congress, and it has since been overseen by an editorial board created for that purpose.[2]
Structure
An ALTO file consists of three major sections as children of the root <alto> element:[3]
- <Description> contains metadata about the ALTO file itself and processing information on how the file was created.
- <Styles> contains the text and paragraph styles with their individual descriptions:
  - <TextStyle> holds font descriptions.
  - <ParagraphStyle> holds paragraph descriptions, e.g. alignment information.
- <Layout> contains the content information. It is subdivided into <Page> elements.
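The styles are connected to the layout by ID references: a `String` (or a `TextBlock`) can name style IDs in its `STYLEREFS` attribute, and a `TextStyle` carries font attributes such as `FONTFAMILY` and `FONTSIZE`. The Python sketch below resolves each word's font this way; it assumes the `STYLEREFS` mechanism of recent ALTO versions, ignores namespaces, and uses a hypothetical `styled_words` function and invented sample data.

```python
import xml.etree.ElementTree as ET

def local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}TextStyle' -> 'TextStyle'."""
    return tag.rsplit("}", 1)[-1]

def styled_words(alto_xml: str) -> list:
    """Return (word, font family, font size) for every String in the document."""
    root = ET.fromstring(alto_xml)
    # Index all <TextStyle> definitions from the <Styles> section by their ID.
    styles = {el.get("ID"): el.attrib for el in root.iter() if local(el.tag) == "TextStyle"}
    result = []
    for el in root.iter():
        if local(el.tag) != "String":
            continue
        # STYLEREFS may list several IDs; use the first one that is a TextStyle.
        refs = el.get("STYLEREFS", "").split()
        style = next((styles[r] for r in refs if r in styles), {})
        result.append((el.get("CONTENT"), style.get("FONTFAMILY"), style.get("FONTSIZE")))
    return result

SAMPLE = """<alto>
  <Styles>
    <TextStyle ID="TS1" FONTFAMILY="Times New Roman" FONTSIZE="10"/>
    <ParagraphStyle ID="PS1" ALIGN="Block"/>
  </Styles>
  <Layout><Page><PrintSpace><TextBlock STYLEREFS="PS1"><TextLine>
    <String CONTENT="Hello" STYLEREFS="TS1"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(styled_words(SAMPLE))  # → [('Hello', 'Times New Roman', '10')]
```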
<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>
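To show how this skeleton maps onto a parse tree, the Python sketch below reads it with the standard-library `xml.etree.ElementTree` module and lists the children of each top-level section and of each `Page`. Real ALTO files declare a namespace (for example `http://www.loc.gov/standards/alto/ns-v4#` for version 4), so the sketch compares local tag names only; the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

def local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}Page' -> 'Page'."""
    return tag.rsplit("}", 1)[-1]

def outline(alto_xml: str) -> dict:
    """Map each child section of the root <alto> element to its child tag names."""
    root = ET.fromstring(alto_xml)
    return {local(sec.tag): [local(c.tag) for c in sec] for sec in root}

def page_regions(alto_xml: str) -> list:
    """List the region elements (margins, PrintSpace) found under each <Page>."""
    root = ET.fromstring(alto_xml)
    return [[local(c.tag) for c in page] for page in root.iter() if local(page.tag) == "Page"]

SKELETON = """<?xml version="1.0"?>
<alto>
  <Description><MeasurementUnit/><sourceImageInformation/><Processing/></Description>
  <Styles><TextStyle/><ParagraphStyle/></Styles>
  <Layout><Page><TopMargin/><LeftMargin/><RightMargin/><BottomMargin/><PrintSpace/></Page></Layout>
</alto>"""

print(outline(SKELETON))
print(page_regions(SKELETON))
```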
See also
- Metadata Encoding and Transmission Standard (METS)
- Dublin Core, an ISO metadata standard
- Preservation Metadata: Implementation Strategies (PREMIS)
- Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- hOCR
- PAGE (XML)
References
- ^ Stehno, Birgit; Egger, Alexander; Retti, Gregor (April 2003). "METAe—Automated Encoding of Digitized Texts". Literary and Linguistic Computing. 18 (1): 77–88. doi:10.1093/llc/18.1.77.
- ^ "ALTO News". Library of Congress. Retrieved 8 October 2025.
- ^ Structure of ALTO Files
External links
- ALTO: Technical Metadata for Layout and Text Objects at the Library of Congress
- ALTO XML GitHub website
- The METAe project website at the Wayback Machine (archived 2016-03-18)
- METS / ALTO Introduction by CCS GmbH at the Wayback Machine (archived 2014-09-04)
- XSLT-Transformations from and to ALTO at GitHub
Content Disclaimer
This information is summarized from Wikipedia and republished for educational purposes. The content is available under the CC BY-SA 3.0 license. We accept no responsibility for inaccuracies in data sourced from these public contributions.
- The information displayed on this website is sourced in part or in whole from Wikipedia and has been adapted for the purpose of restating it. We strive to provide accurate and relevant information, however:
- There is no guarantee of absolute accuracy. Wikipedia is an open, collaborative project that can be edited by anyone, so information is subject to change.
- It is not intended to constitute professional advice. The content displayed is for informational and educational purposes only. For important decisions (e.g., medical, legal, or financial), please consult a professional.
- Content copyright. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike License (CC BY-SA). This means that content may be reused with appropriate attribution and shared under a similar license.
- Responsible use. Any risk arising from the use of information from this website is entirely the responsibility of the user.