Loading a PDF with its content

The IRISOCR™ SDK allows you to load the page content (i.e. the text layer and the graphical zones) directly from PDFs, without running OCR.

For this, two parameters from the CImageLoadOptionsPdf class are available:

| Parameter | Default value | Explanation |
| --- | --- | --- |
| LoadPageContent | IDRS_FALSE | Enables or disables the loading of the PDF page content. |
| AllowIncompleteTextLoading | IDRS_TRUE | The iDRS does not support Unicode characters with a value higher than U+FFFF. If set to IDRS_TRUE, such characters are replaced by U+FFFD (replacement character) when they are encountered. If set to IDRS_FALSE, an exception with error code IDRS_ERROR_IMAGE_FILE_PDF_UNSUPPORTED_CHARACTER is thrown instead. |
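
To make the effect of AllowIncompleteTextLoading concrete, the sketch below mimics the documented behavior on a plain sequence of Unicode code points. It is only an illustration in standard C++ and does not use or reproduce the SDK: the function name SanitizeCodePoints, its boolean parameter, and the use of std::runtime_error are assumptions made for this example, standing in for the option and for the IDRS_ERROR_IMAGE_FILE_PDF_UNSUPPORTED_CHARACTER exception.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, NOT part of the iDRS SDK. It imitates the behavior
// described for AllowIncompleteTextLoading on a list of code points.
std::u32string SanitizeCodePoints(const std::u32string& text, bool allowIncomplete)
{
    constexpr char32_t kMaxSupported = 0xFFFF; // highest supported code point per the table
    constexpr char32_t kReplacement = 0xFFFD;  // U+FFFD REPLACEMENT CHARACTER

    std::u32string result;
    result.reserve(text.size());

    for (char32_t codePoint : text)
    {
        if (codePoint <= kMaxSupported)
        {
            // Supported character: keep it unchanged.
            result.push_back(codePoint);
        }
        else if (allowIncomplete)
        {
            // Analogue of IDRS_TRUE: substitute the replacement character.
            result.push_back(kReplacement);
        }
        else
        {
            // Analogue of IDRS_FALSE: fail instead of silently losing text.
            // std::runtime_error is a stand-in for the SDK exception.
            throw std::runtime_error("unsupported character above U+FFFF");
        }
    }
    return result;
}
```

With the permissive setting, a character such as U+1F600 in the input comes back as U+FFFD while the rest of the text is preserved. With the strict setting, the same input causes an exception, which is the safer choice when the loaded text must be complete.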

Code Snippet(s)

CIDRS objIdrs = CIDRS::Create();
// Set PDF load options
CImageLoadOptionsPdf objImageLoadOptionsPdf = CImageLoadOptionsPdf::Create();
objImageLoadOptionsPdf.SetLoadPageContent(IDRS_TRUE);
// Do load operation
CImageIO objImageIO = CImageIO::Create(objIdrs);
objImageIO.SetPdfLoadOptions(objImageLoadOptionsPdf);
CPage objPage = objImageIO.LoadPage("myfile.pdf");
CPageContent objPageContent = objPage.GetPageContent();
// Use objPageContent to access the text layer and the graphical zones