Saving a PDF file as plain text or HTML; text selection tools

Asked by fri

The problem to solve: the PDF I have can be displayed but cannot be searched. It is a lexicon, so I need to find text in it, and currently I can't. The only option that works is scrolling through it, which is slow. (After a one-column introduction, some tables, etc., the text is laid out in two columns.)

My solution would be to save it as plain text and, if needed, reformat it into one column. I suppose a newly created PDF would then be searchable, but I don't know what is wrong with the existing PDF, or whether it is simply too complex for the viewers, since none of the readers (including Acrobat Reader) can search it. I also don't know how to load or import the PDF correctly: the apps other than viewers that can open it, like LibreOffice, don't display the characters correctly.

A text selection tool in qpdfview would be welcome.

For conversion from PDF to HTML, I discovered an interesting app called pdf2htmlEX (http://coolwanglu.github.io/pdf2htmlEX/) that could solve the issue, but I am not able to get it to work (there is some kind of problem on my system when I try to open any PDF with it via the recommended command).

Question information

Language: English
Status: Answered
For: qpdfview
Assignee: No assignee
Adam Reichold (adamreichold) said :
#1

Hello Pavel,

As a first possibility: did you try the utilities included with Poppler, i.e. "pdftotext" and "pdftohtml"? Depending on the distribution you use, they are sometimes shipped in a separate package, "poppler-utils". Maybe they are easier to get working. But to be honest, if Poppler-based viewers can't access the textual content, then "pdftotext" etc. probably won't either. Maybe what you need is an OCR application or service? Is it a scanned document?
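A minimal Python sketch of that first check, assuming "pdftotext" from Poppler is installed and on the PATH. It calls pdftotext with "-" as the output file so the text goes to stdout, tests whether anything other than whitespace came back, and, if so, lists the pages containing a search term (pdftotext separates pages with form feed characters by default). The function names are hypothetical helpers written for this illustration, not part of Poppler or qpdfview.

```python
import subprocess


def build_pdftotext_cmd(pdf_path, layout=False):
    """Return the argument list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if layout:
        # -layout keeps the physical page layout (columns stay side by side);
        # without it pdftotext tries to emit the text in reading order.
        cmd.append("-layout")
    # "-" as the output file name sends the extracted text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=False):
    """Run pdftotext and return its output (raises if the tool fails)."""
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def has_text_layer(text):
    """True if the extracted text contains anything besides whitespace."""
    return bool(text.strip())


def find_pages(text, needle):
    """Return 1-based page numbers whose text contains needle (case-insensitive)."""
    pages = text.split("\f")  # pdftotext puts a form feed between pages
    wanted = needle.lower()
    return [n for n, page in enumerate(pages, start=1) if wanted in page.lower()]


def report(pdf_path, needle):
    """Print either an OCR hint or the pages on which needle occurs."""
    text = extract_text(pdf_path)
    if not has_text_layer(text):
        print("No text extracted; the PDF is probably scanned and needs OCR.")
    else:
        print("Found on pages:", find_pages(text, needle))
```

For example, report("lexicon.pdf", "word") (both arguments are placeholders) would print the matching page numbers if the PDF has a usable text layer. If the characters come out garbled, as they reportedly do in LibreOffice, the text layer is likely unusable and OCR remains the fallback.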

Concerning text selection in qpdfview, we are currently waiting for tagged PDF support to be integrated into Poppler before rethinking the text-selection support, so selection by rectangle using "Copy to clipboard" is currently the only way. But we do plan to improve this, and progress is tracked in bug #958634.

Best regards, Adam.

JonyGreen (jonygreen) said :
#2

You can try this free online PDF-to-text converter (http://www.online-code.net/pdf-to-word.html) to convert the PDF to a plain text file online.
