16.0.5

📅 2021-07-22

`New features`

N/A

`Improvements`

High Quality OCR for Latin alphabet

High Quality OCR support is extended to most languages that use the Latin alphabet.
This change is transparent to you and improves OCR accuracy, especially on challenging documents such as newspapers, magazines, poor-quality scans or photos.

This applies to the following languages:

Albanian, Azeri Latin, Basque, Breton, Bosnian Latin, Catalan, Cebuano, Corsican, Croatian, Czech, Danish, Esperanto, Estonian, Faroese, Finnish, Frisian, Galician, Greenlandic, Haitian Creole, Hungarian, Icelandic, Irish Gaelic, Kurdish, Latvian, Lithuanian, Maltese, Norwegian, Polish, Rhaeto-Romance, Romanian, Sardinian, Slovak, Scottish Gaelic, Slovenian, Swedish, Turkish, and Welsh.

Improved PDF image loading

The quality of the image generated by the iDRS when rasterizing a PDF page is improved, resulting in a better visual appearance and improved OCR.

However, this improvement requires extra processing to detect original black and white content.
For this reason, a new enum, CImageLoadOptionsPdf.eBlackAndWhiteDetectionMode, is introduced.

Possible values are:

BLACK_AND_WHITE_DETECTION_DISABLED (C++) or eBlackAndWhiteDetectionMode.Disabled (.NET)
Black and white content is not detected, but loaded as greyscale only. This is the fastest mode.
BLACK_AND_WHITE_DETECTION_FAST (C++) or eBlackAndWhiteDetectionMode.Fast (.NET)
The PDF page is inspected and loaded as black and white only if it contains such images. In all other cases, the page is loaded as greyscale.
BLACK_AND_WHITE_DETECTION_ACCURATE (C++) or eBlackAndWhiteDetectionMode.Accurate (.NET)
The PDF page raster is analyzed in detail to detect black and white content. As this may involve two rasterizations (with and without smoothing), this mode is the slowest but the most accurate at detecting black and white pages. This is the default mode.

The black and white detection mode can be accessed or modified via the method CImageLoadOptionsPdf::Get/SetBlackAndWhiteDetectionMode() (C++) or the property CImageLoadOptionsPdf.BlackAndWhiteDetectionMode (.NET).
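
The trade-off between the three modes can be illustrated with a small, self-contained sketch. It does not use the iDRS headers: MockImageLoadOptionsPdf is a hypothetical stand-in that only mirrors the Get/SetBlackAndWhiteDetectionMode() accessors and the default value described above, the enum type name is an assumption, and ChooseMode() and MakeFastOptions() are illustrative helpers that are not part of the SDK.

```cpp
#include <cassert>

// Stand-in for the detection modes listed above. The constant names come from
// these release notes; the enum type name is an assumption, and the real
// declaration lives in the iDRS SDK headers (not reproduced here).
enum BlackAndWhiteDetectionMode {
  BLACK_AND_WHITE_DETECTION_DISABLED,  // always load as greyscale (fastest)
  BLACK_AND_WHITE_DETECTION_FAST,      // black and white only if the page holds such images
  BLACK_AND_WHITE_DETECTION_ACCURATE   // detailed raster analysis (slowest)
};

// Hypothetical mock of CImageLoadOptionsPdf, reduced to the one setting
// discussed here. It is NOT the SDK class.
class MockImageLoadOptionsPdf {
 public:
  BlackAndWhiteDetectionMode GetBlackAndWhiteDetectionMode() const {
    return mode_;
  }
  void SetBlackAndWhiteDetectionMode(BlackAndWhiteDetectionMode mode) {
    mode_ = mode;
  }

 private:
  // The release notes state that the accurate mode is the default.
  BlackAndWhiteDetectionMode mode_ = BLACK_AND_WHITE_DETECTION_ACCURATE;
};

// Illustrative helper (not part of the SDK): choose a mode from two
// application-level requirements.
BlackAndWhiteDetectionMode ChooseMode(bool wantBlackAndWhite,
                                      bool speedCritical) {
  if (!wantBlackAndWhite) {
    return BLACK_AND_WHITE_DETECTION_DISABLED;
  }
  return speedCritical ? BLACK_AND_WHITE_DETECTION_FAST
                       : BLACK_AND_WHITE_DETECTION_ACCURATE;
}

// Usage sketch: start from the default options and switch to the fast mode.
MockImageLoadOptionsPdf MakeFastOptions() {
  MockImageLoadOptionsPdf options;
  assert(options.GetBlackAndWhiteDetectionMode() ==
         BLACK_AND_WHITE_DETECTION_ACCURATE);
  options.SetBlackAndWhiteDetectionMode(BLACK_AND_WHITE_DETECTION_FAST);
  return options;
}
```

In a real integration, the same pattern would presumably apply to the SDK's own CImageLoadOptionsPdf object before loading the PDF page; consult the iDRS API reference for the exact type names and namespaces.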

`Deprecated/removed features`

Output formats WordML and XPS

The output formats WordML and XPS are removed from the iDRS API, preventing you from creating such documents.

The XML Format of Microsoft Office Word 2003, or WordML, has been replaced since 2007 by the new Office Open XML formats (DOCX, XLSX, PPTX).

Microsoft XML Paper Specification, or XPS, is also deprecated because it has low business value compared to its immediate competitor, PDF.

`Added/removed resources`

N/A

`Fixed bugs`

ID	Description
IDRSRD-5666	The iDRS PDF loading may erase some parts of the text on the rasterized image
IDRSRD-5747	The iDRS takes a long time to analyze a specific image
IDRSRD-5925	The iDRS can create invalid PDF files when integrators specify custom fonts with PostScript names containing spaces
IDRSRD-5927	The iDRS can recognize diacritics without base characters, leading to PDF creation failure
IDRSRD-5955	The High Quality OCR engine does not find all characters on a specific image
IDRSRD-5958	The iDRS fails to create output PDF when OCR engine recognizes Arial Unicode symbols
IDRSRD-5970	The iDRS should allow creating an image with dimensions larger than OCR limitations
IDRSRD-5971	The page analysis takes too much time when processing a specific image
IDRSRD-5977	The iDRS is not able to load a specific PDF
IDRSRD-5980	The iDRS license installer does not check for the correct Visual Studio redistributable
IDRSRD-5981	DOCX files created with the Editable display do not indicate the expected document language when no text is selected
IDRSRD-5983	Implementations of IFontProviderCallback provided by integrators via the .NET API are not called by the iDRS
IDRSRD-5984	The iDRS does not set the BaseLine property in CPageTextLine when loading content from a PDF file
IDRSRD-5985	The iDRS may leak memory when the idrsbarcodeext engine encounters a timeout
IDRSRD-5986	The iDRS cannot load a specific PNG image
IDRSRD-5987	The iDRS does not include information about the PDF extension in the output PDF files
IDRSRD-5989	The iDRS is generating a non-compliant PDF/A-1b document
IDRSRD-5991	When the iDRS updates an existing PDF with several signatures, all signatures have the same title, which is incorrect
IDRSRD-5992	The iDRS does not properly load the text layer of a specific PDF document
IDRSRD-5993	The iDRS can request font data with incorrect bold and italic properties when generating a PDF document
IDRSRD-6004	The PDF loading with page content throws an exception when a PDF object has coordinates outside the bounds of the page
IDRSRD-6007	The PDF loading with page content throws an exception when a text element is outside the bounds of the page
IDRSRD-6009	The iDRS is setting the DropCapFont property for a paragraph when loading page content
IDRSRD-6017	The iDRS cannot use the CPageResultsParser on a CPage without a source image

`Known issues`

ID	Description
IDRSRD-6019	When the iDRS applies several signatures to a PDF, in some cases only the last one is valid
IDRSRD-5968	The iDRS should apply all supported features when creating PDF output with IStreamFactory interface