Loading a PDF with its content
The IRISOCR™ SDK can load the page content (i.e. the text layer and the graphical zones) from PDFs without running OCR.
For this, two parameters from the CImageLoadOptionsPdf class are available:
| Parameter | Default value | Explanation |
|---|---|---|
| LoadPageContent | IDRS_FALSE | Enables or disables the loading of the PDF page content. |
| AllowIncompleteTextLoading | IDRS_TRUE | The iDRS does not support Unicode characters with a value higher than U+FFFF. If set to IDRS_TRUE, any such character that is encountered is replaced by U+FFFD (the replacement character). If set to IDRS_FALSE, an exception with error code IDRS_ERROR_IMAGE_FILE_PDF_UNSUPPORTED_CHARACTER is thrown. |
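
To make the AllowIncompleteTextLoading behaviour concrete, the following is a minimal standalone sketch of the rule described in the table. It is not iDRS code: the function name `SanitizeToBmp`, the use of `std::u32string`, and `std::runtime_error` (standing in for the exception carrying IDRS_ERROR_IMAGE_FILE_PDF_UNSUPPORTED_CHARACTER) are all assumptions made for illustration only.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, NOT part of the iDRS API. It mimics the documented
// handling of characters above U+FFFF on a UTF-32 string.
std::u32string SanitizeToBmp(const std::u32string& text, bool allowIncompleteTextLoading)
{
    constexpr char32_t kMaxSupported = 0xFFFF; // highest value the text layer can hold
    constexpr char32_t kReplacement  = 0xFFFD; // Unicode replacement character

    std::u32string result;
    result.reserve(text.size());
    for (char32_t codePoint : text)
    {
        if (codePoint <= kMaxSupported)
        {
            // Supported character: keep it unchanged.
            result.push_back(codePoint);
        }
        else if (allowIncompleteTextLoading)
        {
            // Equivalent of IDRS_TRUE: substitute and continue loading.
            result.push_back(kReplacement);
        }
        else
        {
            // Equivalent of IDRS_FALSE: abort. The real SDK reports
            // IDRS_ERROR_IMAGE_FILE_PDF_UNSUPPORTED_CHARACTER instead.
            throw std::runtime_error("unsupported character above U+FFFF");
        }
    }
    return result;
}
```

With the flag enabled, a string containing U+1F600 keeps its length and has that one character replaced by U+FFFD. With the flag disabled, the same input raises an error and no partial text is returned.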
Code Snippet(s)
CIDRS objIdrs = CIDRS::Create();
// Set PDF load options
CImageLoadOptionsPdf objImageLoadOptionsPdf = CImageLoadOptionsPdf::Create();
objImageLoadOptionsPdf.SetLoadPageContent(IDRS_TRUE);
// Do load operation
CImageIO objImageIO = CImageIO::Create(objIdrs);
objImageIO.SetPdfLoadOptions(objImageLoadOptionsPdf);
CPage objPage = objImageIO.LoadPage("myfile.pdf");
CPageContent objPageContent = objPage.GetPageContent();
// Use objPageContent to get the text layer and the graphical zones