16.0.5
📅 2021-07-22
New features
N/A
Improvements
High Quality OCR for Latin alphabet
High Quality OCR support is extended to most languages that use the Latin alphabet.
This change requires no action on your part and improves OCR accuracy, especially on challenging documents such as newspapers, magazines, poor-quality scans and photos.
It applies to the following languages:
Albanian, Azeri Latin, Basque, Breton, Bosnian Latin, Catalan, Cebuano, Corse, Croatian, Czech, Danish, Esperanto, Estonian, Faroese, Finnish, Frisian, Galician, Greenlandic, Haitian Creole, Hungarian, Icelandic, Irish Gaelic, Kurdish, Latvian, Lithuanian, Maltese, Norwegian, Polish, Rhaeto_roman, Romanian, Sardinian, Slovak, Scottish Gaelic, Slovenian, Swedish, Turkish, and Welsh.
Improved PDF image loading
The quality of the image that the iDRS generates when rasterizing a PDF page is improved, resulting in a better visual appearance and more accurate OCR.
However, this improvement requires extra processing to detect black and white content in the original document.
For this reason, a new enum, CImageLoadOptionsPdf.eBlackAndWhiteDetectionMode, is introduced.
Possible values are:
BLACK_AND_WHITE_DETECTION_DISABLED (C++) or eBlackAndWhiteDetectionMode.Disabled (.NET)
Black and white content is not detected; the page is loaded as greyscale only. This is the fastest mode.
BLACK_AND_WHITE_DETECTION_FAST (C++) or eBlackAndWhiteDetectionMode.Fast (.NET)
The PDF page is inspected and loaded as black and white only if it contains black and white images. In all other cases, the page is loaded as greyscale.
BLACK_AND_WHITE_DETECTION_ACCURATE (C++) or eBlackAndWhiteDetectionMode.Accurate (.NET)
The rasterized PDF page is analyzed in detail to detect black and white content. Because this may involve two rasterizations (with and without smoothing), this mode is the slowest but the most accurate at detecting black and white pages. This is the default mode.
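To make the accuracy/speed trade-off more concrete, the sketch below shows one simple, generic way to decide whether an 8-bit greyscale raster is essentially black and white: count the pixels whose value falls between a dark and a light threshold. This is an assumption-based illustration, not the iDRS detection algorithm; the function LooksBlackAndWhite and its threshold defaults are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT the iDRS algorithm: returns true when an 8-bit
// greyscale raster is "essentially" black and white, i.e. when at most
// maxGreyFraction of its pixels fall strictly between the dark and light
// thresholds.
bool LooksBlackAndWhite(const std::vector<std::uint8_t>& pixels,
                        std::uint8_t darkThreshold = 64,
                        std::uint8_t lightThreshold = 192,
                        double maxGreyFraction = 0.01) {
    if (pixels.empty()) {
        return false;  // nothing to classify
    }
    std::size_t greyCount = 0;
    for (std::uint8_t p : pixels) {
        if (p > darkThreshold && p < lightThreshold) {
            ++greyCount;  // intermediate tone, e.g. a photo or an anti-aliased edge
        }
    }
    const double greyFraction =
        static_cast<double>(greyCount) / static_cast<double>(pixels.size());
    return greyFraction <= maxGreyFraction;
}
```

For example, a page of pure black pixels on white returns true, while a uniform mid-grey page returns false. Smoothed (anti-aliased) rendering adds intermediate grey tones along glyph edges, which can push a genuinely black and white page over the grey-pixel budget; this is one plausible reason why a detailed analysis may need a rasterization without smoothing.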
The black and white detection mode can be read or modified with the methods CImageLoadOptionsPdf::GetBlackAndWhiteDetectionMode() and CImageLoadOptionsPdf::SetBlackAndWhiteDetectionMode() (C++), or with the property CImageLoadOptionsPdf.BlackAndWhiteDetectionMode (.NET).
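Choosing a mode is a speed/accuracy trade-off. The sketch below encodes that trade-off in a small policy function. BwDetectionMode and ChooseBwDetectionMode are hypothetical, illustration-only names that are not part of the iDRS API, and the sketch deliberately does not include iDRS headers. In an actual integration, you would pass the corresponding iDRS enumerator to the setter (C++) or property (.NET) described above.

```cpp
// Hypothetical stand-in mirroring the three documented values:
//   Disabled -> BLACK_AND_WHITE_DETECTION_DISABLED / eBlackAndWhiteDetectionMode.Disabled
//   Fast     -> BLACK_AND_WHITE_DETECTION_FAST     / eBlackAndWhiteDetectionMode.Fast
//   Accurate -> BLACK_AND_WHITE_DETECTION_ACCURATE / eBlackAndWhiteDetectionMode.Accurate
enum class BwDetectionMode { Disabled, Fast, Accurate };

// Hypothetical policy helper (not part of the iDRS API) that encodes the
// documented trade-off between the three modes.
BwDetectionMode ChooseBwDetectionMode(bool inputMayBeBlackAndWhite,
                                      bool favourSpeed) {
    if (!inputMayBeBlackAndWhite) {
        return BwDetectionMode::Disabled;  // load as greyscale only: fastest
    }
    if (favourSpeed) {
        return BwDetectionMode::Fast;      // only checks for black and white images in the page
    }
    return BwDetectionMode::Accurate;      // iDRS default: slowest, most accurate
}
```

Keeping Accurate as the fallback matches the iDRS default, so the policy only departs from the default when the caller explicitly knows its input or prefers throughput.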
Deprecated/removed features
Output formats WordML and XPS
The WordML and XPS output formats are removed from the iDRS API; you can no longer create documents in these formats.
WordML, the XML format of Microsoft Office Word 2003, was superseded in 2007 by the Office Open XML formats (DOCX, XLSX, PPTX).
Microsoft XML Paper Specification (XPS) is also deprecated, as it offers little business value compared to its direct competitor, PDF.
Added/removed resources
N/A
Fixed bugs
| ID | Description |
|---|---|
| IDRSRD-5666 | The iDRS PDF loading may erase some parts of the text on the rasterized image |
| IDRSRD-5747 | The iDRS takes a long time to analyze a specific image |
| IDRSRD-5925 | The iDRS can create invalid PDF files when integrators specify custom fonts with PostScript names containing spaces |
| IDRSRD-5927 | The iDRS can recognize diacritics without base characters, leading to PDF creation failure |
| IDRSRD-5955 | The High Quality OCR engine does not find all characters on a specific image |
| IDRSRD-5958 | The iDRS fails to create output PDF when OCR engine recognizes Arial Unicode symbols |
| IDRSRD-5970 | The iDRS should allow creating an image with dimensions larger than OCR limitations |
| IDRSRD-5971 | Page analysis takes too long to process a specific image |
| IDRSRD-5977 | The iDRS is not able to load a specific PDF |
| IDRSRD-5980 | The iDRS license installer does not check for the correct Visual Studio redistributable |
| IDRSRD-5981 | DOCX documents created with Editable display do not indicate the expected document language when no text is selected |
| IDRSRD-5983 | Implementations of IFontProviderCallback provided by integrators via the .NET API are not called by the iDRS |
| IDRSRD-5984 | The iDRS does not set the BaseLine property in CPageTextLine when loading content from a PDF file |
| IDRSRD-5985 | The iDRS may leak memory when the idrsbarcodeext engine encounters a timeout |
| IDRSRD-5986 | The iDRS cannot load a specific png image |
| IDRSRD-5987 | The iDRS does not include information about the PDF extension in output PDF files |
| IDRSRD-5989 | The iDRS generates a non-compliant PDF/A-1b document |
| IDRSRD-5991 | When the iDRS updates an existing PDF with several signatures, all signatures have the same title, which is incorrect |
| IDRSRD-5992 | The iDRS does not properly load the text layer of a specific PDF document |
| IDRSRD-5993 | The iDRS can request font data with incorrect bold and italic properties when generating a PDF document |
| IDRSRD-6004 | PDF loading with page content throws an exception when a PDF object has coordinates outside the page bounds |
| IDRSRD-6007 | PDF loading with page content throws an exception when a text element is outside the page bounds |
| IDRSRD-6009 | The iDRS sets the DropCapFont property for a paragraph when loading page content |
| IDRSRD-6017 | The iDRS cannot use the CPageResultsParser on a CPage without source image |
Known issues
| ID | Description |
|---|---|
| IDRSRD-6019 | When the iDRS applies several signatures to a PDF, in some cases only the last one is valid |
| IDRSRD-5968 | The iDRS should apply all supported features when creating PDF output with IStreamFactory interface |