Loading a PDF with its content

The IRISOCR™ SDK lets you load the page content of a PDF (i.e., its text layer and graphical zones) directly, without having to run OCR.

Two parameters of the CImageLoadOptionsPdf class control this behavior:

Parameter	Default value	Explanation
`LoadPageContent`	IDRS_FALSE	Enables or disables loading of the PDF page content (text layer and graphical zones).
`AllowIncompleteTextLoading`	IDRS_TRUE	The iDRS does not support Unicode characters with code points above U+FFFF (that is, characters outside the Basic Multilingual Plane). If set to IDRS_TRUE, any such characters encountered are replaced by U+FFFD (REPLACEMENT CHARACTER). If set to IDRS_FALSE, an exception with error code IDRS_ERROR_IMAGE_FILE_PDF_UNSUPPORTED_CHARACTER is thrown instead.
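The rule applied by AllowIncompleteTextLoading can be illustrated with a small standalone C++ sketch. It does not call the iDRS API and is not the SDK's implementation: NormalizeLoadedText and UnsupportedCharacterError are hypothetical names, and the loaded text is assumed to be available as UTF-32 code points. The sketch only mirrors the documented behavior: code points up to U+FFFF pass through, and anything above is either replaced by U+FFFD or rejected with an exception.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the error raised when AllowIncompleteTextLoading
// is IDRS_FALSE. The real SDK reports
// IDRS_ERROR_IMAGE_FILE_PDF_UNSUPPORTED_CHARACTER through its own exception
// type, which is not modelled here.
struct UnsupportedCharacterError : std::runtime_error {
    explicit UnsupportedCharacterError(char32_t cp)
        : std::runtime_error("unsupported character above U+FFFF"), codePoint(cp) {}
    char32_t codePoint;  // the offending code point
};

// Applies the documented rule to a UTF-32 string:
//   allowIncomplete == true  -> code points above U+FFFF become U+FFFD
//   allowIncomplete == false -> the first such code point raises an error
std::u32string NormalizeLoadedText(const std::u32string& text, bool allowIncomplete) {
    constexpr char32_t kMaxSupported = 0xFFFF;  // last code point of the BMP
    constexpr char32_t kReplacement = 0xFFFD;   // REPLACEMENT CHARACTER
    std::u32string result;
    result.reserve(text.size());
    for (char32_t cp : text) {
        if (cp <= kMaxSupported) {
            result.push_back(cp);            // supported: keep as-is
        } else if (allowIncomplete) {
            result.push_back(kReplacement);  // unsupported: substitute U+FFFD
        } else {
            throw UnsupportedCharacterError(cp);  // unsupported: reject
        }
    }
    return result;
}
```

For example, NormalizeLoadedText(U"a\U0001F600b", true) yields U"a\uFFFDb", while the same call with false throws, which corresponds to leaving AllowIncompleteTextLoading at its default versus setting it to IDRS_FALSE.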

Code Snippet

CIDRS objIdrs = CIDRS::Create();

// Configure the PDF load options so that the page content is loaded
CImageLoadOptionsPdf objImageLoadOptionsPdf = CImageLoadOptionsPdf::Create();
objImageLoadOptionsPdf.SetLoadPageContent(IDRS_TRUE);

// Load the page from the PDF using the options configured above
CImageIO objImageIO = CImageIO::Create(objIdrs);
objImageIO.SetPdfLoadOptions(objImageLoadOptionsPdf);
CPage objPage = objImageIO.LoadPage("myfile.pdf");
CPageContent objPageContent = objPage.GetPageContent();
// objPageContent now holds the text layer and graphical zones read from the PDF, without running OCR