Extract plain text from selected pages in a PDF document
With this building block you can extract the text of a PDF document.
You can select the pages from which to extract the text.
- Extract values from incoming invoices and automatically update your records
Click on given file in the title of the building block to pick the file in your Google Drive from which the text will be extracted.
You can deselect the picked file by clicking on the x button to the right of the selected filename.
If no file has been picked, Ultradox will load the given file that is stored in the input variable.
When loading files stored in a variable, make sure that the input prefix matches the output prefix of the building block that provides the document.
Click on the bold part of the title of the building block to open the configuration dialog, where you can choose the pages from which the text will be extracted.
Enter the page numbers to be extracted, separated by commas.
Page numbers start at 1. If you enter 1,3,5 the resulting document will contain the text from the first, the third and the fifth page of the given PDF document.
You can also specify ranges of pages, e.g. 2,4-6 will extract the text from page 2 and from pages 4, 5 and 6 into the target document.
If the entered page numbers are greater than the number of pages, text extraction will end at the last page. For example, if you enter 2-999 and your PDF document has only 5 pages, the text from all pages except the first page will be extracted.
If you enter negative values, the pages are counted from the end of the document. For example, entering -3--1 will extract the text of the last two pages of the document.
Make sure not to include any spaces in the list of pages!
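The page list rules above can be illustrated with a small Python sketch. This is not Ultradox code: the function name parse_page_spec and its parameters are hypothetical, and the sketch covers only positive page numbers and ranges. Negative values are deliberately rejected here because their exact semantics are not spelled out in enough detail on this page.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a page list such as "2,4-6" into 1-based page numbers.

    Hypothetical helper for illustration only; it mirrors the rules
    described above for positive page numbers and ranges.
    """
    # The page list must not contain any whitespace.
    if any(ch.isspace() for ch in spec):
        raise ValueError("page list must not contain spaces")

    pages: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            # A range such as "4-6"; negative entries fail in int() below.
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"unsupported page entry: {part!r}")
        # Entries beyond the end of the document stop at the last page.
        last = min(last, page_count)
        pages.extend(range(first, last + 1))
    return pages


if __name__ == "__main__":
    print(parse_page_spec("1,3,5", 5))
    print(parse_page_spec("2,4-6", 10))
    print(parse_page_spec("2-999", 5))
```

With a 5-page document, the entry 2-999 expands to pages 2 through 5, which matches the behaviour described above.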
The extracted text contains return/newline characters “\r\n”. If you want to output the content to an HTML document, you’ll have to use “(wrap)” to convert them to HTML breaks.
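To show what such a conversion amounts to, here is a minimal Python sketch that turns return/newline pairs into HTML breaks. It is not the Ultradox implementation of wrap; the function name crlf_to_html_breaks is hypothetical, and escaping the text first is an added assumption so that characters such as < display literally in the HTML output.

```python
import html


def crlf_to_html_breaks(text: str) -> str:
    """Convert return/newline pairs in extracted text to HTML breaks.

    Hypothetical helper for illustration only.
    """
    # Escape HTML special characters so the text is shown literally.
    escaped = html.escape(text)
    # Replace each "\r\n" with a <br> tag, keeping a newline for readability.
    return escaped.replace("\r\n", "<br>\n")


if __name__ == "__main__":
    print(crlf_to_html_breaks("Invoice 42\r\nTotal: 10 < 20"))
```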
Last Updated: 04.03.19