16.0.1
📅 2021-02-26
New features
Export OCR results to ALTO XML format
Version 16.0.1 introduces a new export format: ALTO XML, which stands for Analyzed Layout and Text Object.
This open XML schema aims at providing a standardized way of describing OCR and layout information for digitized material. Go to https://www.loc.gov/standards/alto for more details.
The new format is available as an additional export type: IDRS_EXPORT_TYPE::IDRS_EXPORT_FORMAT_XML_ALTO (C++) or ExportType.XmlAlto (.NET). It can therefore be passed as an argument to the CExport class constructor.
In addition, a new member has been added to the CExport class to allow appending the export of a new page to an existing ALTO XML file. You can set it via the method CExport::SetAppendMode() (C++) or the property CExport.AppendMode (.NET).
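To give an idea of what an ALTO document describes, the following is a minimal sketch that builds a one-word ALTO-style document by hand. It does not use the SDK and is not the output produced by CExport; the names Word, EscapeXml and BuildMinimalAlto are hypothetical helpers introduced for illustration, the namespace shown assumes ALTO version 4, and the result has not been validated against the schema.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper type: one recognized word and its bounding box,
// expressed in the measurement unit declared in the document (pixels here).
struct Word {
    std::string text;
    int hpos;    // horizontal position of the left edge
    int vpos;    // vertical position of the top edge
    int width;
    int height;
};

// Escapes the characters that are not allowed verbatim in an XML attribute.
std::string EscapeXml(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Builds a minimal ALTO-style document: Layout > Page > PrintSpace >
// TextBlock > TextLine > String. Illustrative only, not SDK output.
std::string BuildMinimalAlto(const Word& w) {
    assert(w.width >= 0 && w.height >= 0);
    std::ostringstream xml;
    xml << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        << "<alto xmlns=\"http://www.loc.gov/standards/alto/ns-v4#\">\n"
        << "  <Description>\n"
        << "    <MeasurementUnit>pixel</MeasurementUnit>\n"
        << "  </Description>\n"
        << "  <Layout>\n"
        << "    <Page ID=\"P1\" PHYSICAL_IMG_NR=\"1\">\n"
        << "      <PrintSpace>\n"
        << "        <TextBlock ID=\"TB1\">\n"
        << "          <TextLine>\n"
        << "            <String CONTENT=\"" << EscapeXml(w.text) << "\""
        << " HPOS=\"" << w.hpos << "\" VPOS=\"" << w.vpos << "\""
        << " WIDTH=\"" << w.width << "\" HEIGHT=\"" << w.height << "\"/>\n"
        << "          </TextLine>\n"
        << "        </TextBlock>\n"
        << "      </PrintSpace>\n"
        << "    </Page>\n"
        << "  </Layout>\n"
        << "</alto>\n";
    return xml.str();
}
```

Each page of a document is a Page element under Layout, and every recognized word carries both its text (CONTENT) and its position on the page, which is what makes the format suitable for describing OCR and layout information together.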
Improvements
PDF loading with graphical zones coordinates
The SDK is now able to retrieve the location of graphical zones and segments when loading a PDF’s content.
Graphical zones are loaded whenever text loading is also requested. To reflect this, the C++ methods CImageLoadOptionsPdf::Get/SetLoadTextContent are renamed to CImageLoadOptionsPdf::Get/SetLoadPageContent, and the .NET property CImageLoadOptionsPdf.LoadTextContent is renamed to CImageLoadOptionsPdf.LoadPageContent.
PDF loading resolution
You can now select the resolution at which a PDF input page is rasterized. This is useful to reduce output size (lower resolution) or to maximize quality (higher resolution).
To do so, use method CImageLoadOptionsPdf::SetLoadingResolution() (C++) or property CImageLoadOptionsPdf.LoadingResolution (.NET).
The default value is 300 dpi, the same as in the previous version of the SDK; it offers the best compromise between size and quality.
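To get a feel for the size/quality trade-off, the sketch below estimates the pixel dimensions of a rasterized page from its size in PDF points (1 point = 1/72 inch). It does not call the SDK: PixelSize and RasterSize are hypothetical names introduced here, and the assumption is that the rasterized size follows the usual points-to-pixels relation; the exact rounding applied by the SDK is not stated in these notes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: dimensions of a rasterized page, in pixels.
struct PixelSize {
    long width;
    long height;
};

// Converts a page size given in PDF points (1/72 inch) to pixel
// dimensions at the requested resolution, rounding to the nearest pixel.
PixelSize RasterSize(double widthPt, double heightPt, int dpi) {
    assert(dpi > 0);
    return { std::lround(widthPt * dpi / 72.0),
             std::lround(heightPt * dpi / 72.0) };
}
```

For a US Letter page (612 x 792 points), this gives 2550 x 3300 pixels at 300 dpi and 1275 x 1650 pixels at 150 dpi. Halving the resolution therefore divides the pixel count by four, which is why a lower loading resolution noticeably reduces output size.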
Deprecated/removed features
N/A
Added/removed resources
N/A
Fixed bugs
| ID | Description |
|---|---|
| IDRSRD-5921 | The iDRS should retrieve graphical zones coordinates when loading PDF’s content |
| IDRSRD-5911 | The iDRS fails to export OCR results to XML FMT for specific documents |
| IDRSRD-5902 | The iDRS should allow an integrator to choose PDF loading resolution |
| IDRSRD-5899 | The iDRS does not properly detect font size for Korean language |
| IDRSRD-5895 | The iDRS does not serialize CPageParagraphStyle.FontStyle member properly |
| IDRSRD-5888 | The iDRS loads PDF text layer with incorrect font sizes |
| IDRSRD-5887 | The iDRS is not rasterizing PDFs having forms with fillable fields |
| IDRSRD-5874 | The iDRS should propose exporting OCR results to ALTO standard XML format |
| IDRSRD-5868 | The iDRS charset limitation feature is broken for Korean language |
| IDRSRD-5861 | The iDRS fails to create PDF document when recognizing a specific Korean image |
| IDRSRD-5749 | The iDRS finds an incorrect orientation for specific Greek and Hebrew images |
Known issues
N/A