
16.0.1

📅 2021-02-26

Version 16.0.1 introduces a new export format: ALTO XML (Analyzed Layout and Text Object).

This open XML schema provides a standardized way to describe OCR and layout information for digitized material. See https://www.loc.gov/standards/alto for details.
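To give a feel for the structure ALTO describes, the following minimal sketch builds a one-word ALTO document by hand. It is an illustration of the public schema only, not the output of the SDK: the `Word` struct and `makeMinimalAlto` function are hypothetical names introduced here, the element set is reduced to a bare skeleton, and the result is not validated against the schema.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper type for this sketch: one recognized word
// and its bounding box, expressed in pixels.
struct Word {
    std::string text;
    int hpos;    // distance from the left edge of the page
    int vpos;    // distance from the top edge of the page
    int width;
    int height;
};

// Builds a simplified single-word ALTO document as a string.
// Assumptions: ALTO v4 namespace, pixel units, no XML escaping of the text.
std::string makeMinimalAlto(const Word& w, int pageWidth, int pageHeight) {
    assert(pageWidth > 0 && pageHeight > 0);
    std::ostringstream xml;
    xml << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        << "<alto xmlns=\"http://www.loc.gov/standards/alto/ns-v4#\">\n"
        << "  <Description>\n"
        << "    <MeasurementUnit>pixel</MeasurementUnit>\n"
        << "  </Description>\n"
        << "  <Layout>\n"
        << "    <Page ID=\"P1\" PHYSICAL_IMG_NR=\"1\""
        << " WIDTH=\"" << pageWidth << "\" HEIGHT=\"" << pageHeight << "\">\n"
        << "      <PrintSpace>\n"
        << "        <TextBlock ID=\"TB1\">\n"
        << "          <TextLine>\n"
        // Each recognized word becomes a String element carrying its text and box.
        << "            <String CONTENT=\"" << w.text << "\""
        << " HPOS=\"" << w.hpos << "\" VPOS=\"" << w.vpos << "\""
        << " WIDTH=\"" << w.width << "\" HEIGHT=\"" << w.height << "\"/>\n"
        << "          </TextLine>\n"
        << "        </TextBlock>\n"
        << "      </PrintSpace>\n"
        << "    </Page>\n"
        << "  </Layout>\n"
        << "</alto>\n";
    return xml.str();
}
```

Calling `makeMinimalAlto(Word{"Hello", 100, 200, 180, 40}, 2550, 3300)` returns a document in which the layout hierarchy (page, print space, text block, text line, string) carries both the recognized text and its coordinates. A real export would contain many more elements and attributes than this skeleton.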

The format is available as an additional export type, IDRS_EXPORT_TYPE::IDRS_EXPORT_FORMAT_XML_ALTO (C++) or ExportType.XmlAlto (.NET), which can be passed as an argument to the CExport class constructor.

In addition, class CExport gains a new member that allows the export of a new page to be appended to an existing ALTO XML document. Set it with the method CExport::SetAppendMode() (C++) or the property CExport.AppendMode (.NET).

PDF loading with graphical zones coordinates


The SDK can now retrieve the location of graphical zones and segments when loading the content of a PDF.

Graphical zones are loaded whenever text loading is requested. To reflect this, the methods CImageLoadOptionsPdf::Get/SetLoadTextContent (C++) are renamed to CImageLoadOptionsPdf::Get/SetLoadPageContent, and the property CImageLoadOptionsPdf.LoadTextContent (.NET) is renamed to CImageLoadOptionsPdf.LoadPageContent.

You can now select the resolution at which a PDF input page is rasterized. A lower resolution reduces output size, while a higher resolution maximizes quality.

To do so, use the method CImageLoadOptionsPdf::SetLoadingResolution() (C++) or the property CImageLoadOptionsPdf.LoadingResolution (.NET).

The default value is 300 dpi, the resolution used by the previous version of the SDK; it offers a good compromise between size and quality.
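The effect of the loading resolution on output size can be estimated with simple arithmetic. The sketch below assumes the default PDF unit of 72 points per inch and ignores page rotation, crop boxes and custom user units; `RasterSize`, `rasterSizeForPage` and `uncompressedBytes` are hypothetical helpers written for this example, not SDK functions, and the SDK may compute sizes differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Default PDF user space unit: 1 point = 1/72 inch.
constexpr double kPointsPerInch = 72.0;

// Pixel dimensions of a rasterized page (hypothetical helper type).
struct RasterSize {
    long width;
    long height;
};

// Converts a page size in points to pixels at the given resolution:
// pixels = points * dpi / 72, rounded to the nearest integer.
RasterSize rasterSizeForPage(double widthPt, double heightPt, int dpi) {
    assert(dpi > 0);
    return RasterSize{
        std::lround(widthPt * dpi / kPointsPerInch),
        std::lround(heightPt * dpi / kPointsPerInch)
    };
}

// Size in bytes of the uncompressed bitmap (for example 3 bytes per pixel for 24-bit RGB).
std::uint64_t uncompressedBytes(const RasterSize& size, int bytesPerPixel) {
    return static_cast<std::uint64_t>(size.width) *
           static_cast<std::uint64_t>(size.height) *
           static_cast<std::uint64_t>(bytesPerPixel);
}
```

For a US Letter page (612 x 792 points), `rasterSizeForPage(612.0, 792.0, 300)` gives 2550 x 3300 pixels, while 150 dpi gives 1275 x 1650 pixels. Halving the resolution therefore divides the pixel count, and the uncompressed bitmap size, by four.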

N/A

N/A

| ID | Description |
| --- | --- |
| IDRSRD-5921 | The iDRS should retrieve graphical zones coordinates when loading PDF’s content |
| IDRSRD-5911 | The iDRS fails to export OCR results to XML FMT for specific documents |
| IDRSRD-5902 | The iDRS should allow an integrator to choose PDF loading resolution |
| IDRSRD-5899 | The iDRS does not properly detect font size for Korean language |
| IDRSRD-5895 | The iDRS does not serialize CPageParagraphStyle.FontStyle member properly |
| IDRSRD-5888 | The iDRS loads Pdf text layer with incorrect font sizes |
| IDRSRD-5887 | The iDRS is not rasterizing Pdfs having forms with fillable fields |
| IDRSRD-5874 | The iDRS should propose exporting OCR results to ALTO standard XML format |
| IDRSRD-5868 | The iDRS charset limitation feature is broken for Korean language |
| IDRSRD-5861 | The iDRS fails to create PDF document when recognizing a specific Korean image |
| IDRSRD-5749 | The iDRS finds an incorrect orientation for specific Greek and Hebrew images |

N/A