How can I search the contents of PDF files in a directory and its subdirectories? I am looking for command line tools. It seems that grep cannot search PDF files.
While the pdfgrep option works well for quick and simple searches, I often want some context, as a single matching line is not useful enough.
Recoll is a great full-text GUI search application for Unix/Linux that supports dozens of formats, including PDF. It can even pass the exact page number and search term of a query to the document viewer, which lets you jump to the result right from its GUI.
I had the same problem, so I wrote a script that searches all PDF files in the specified folder for a string and prints the PDF files which matched the query string.
There is an open source "common resource grep" tool, crgrep, which searches within PDF files but also other resources such as content nested in archives, database tables, image meta-data, POM file dependencies and web resources, as well as combinations of these, including recursive search.
First convert all your PDF files to text files, then use grep as usual. This is especially good when you have multiple queries and a lot of PDF files, because the conversion happens only once and every later search is fast.
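The convert-once, search-many approach could be sketched in Python roughly as follows. This is only an illustration, assuming the pdftotext command from poppler-utils is installed and on the PATH; the function names convert_pdfs and search_texts are made up for this example, and the search is a plain case-insensitive substring match rather than a full grep replacement.

```python
import subprocess
from pathlib import Path


def convert_pdfs(root: Path) -> None:
    """Write a .txt file next to every PDF under root (assumes pdftotext is installed)."""
    for pdf in root.rglob("*.pdf"):
        txt = pdf.with_suffix(".txt")
        # Reconvert only when the text file is missing or older than the PDF,
        # so repeated runs stay cheap.
        if not txt.exists() or txt.stat().st_mtime < pdf.stat().st_mtime:
            subprocess.run(["pdftotext", str(pdf), str(txt)], check=True)


def search_texts(root: Path, term: str) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for every line containing term, ignoring case."""
    needle = term.lower()
    hits = []
    for txt in sorted(root.rglob("*.txt")):
        lines = txt.read_text(encoding="utf-8", errors="replace").split("\n")
        for lineno, line in enumerate(lines, start=1):
            if needle in line.lower():
                hits.append((txt, lineno, line.strip()))
    return hits
```

Calling convert_pdfs(Path(".")) once and then search_texts(Path("."), "some phrase") for each query mirrors the two steps above; running grep over the generated .txt files works just as well.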
If you are using a programming language, there are most likely PDF libraries written for this purpose, e.g. http://search.cpan.org/dist/CAM-PDF/ for Perl.
You need a tool like pdf2text to first convert your PDF to a text file, and then search inside the text. (You will most likely lose some information or symbols in the conversion.)
Recoll can search PDF files. It has a command line mode, but the GUI is more helpful in showing where the matches occur, and it lets you click to open the file at the right position.
By default, pdftotext inserts form feed characters (0xC) between pages. You can count them along with the occurrences of the word you are searching for to work out which page each match is on.
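A minimal sketch of the form feed counting idea in Python, assuming the text was produced by pdftotext with its default page breaks. The function name pages_containing is hypothetical, and a phrase that is split across a line break in the extracted text will not be found by this simple substring test.

```python
def pages_containing(text: str, term: str) -> list[int]:
    """Return the 1-based page numbers whose text contains term, ignoring case."""
    # Each form feed (0x0C, "\f") marks a page break in the extracted text.
    pages = text.split("\f")
    needle = term.lower()
    return [n for n, page in enumerate(pages, start=1) if needle in page.lower()]
```

The input text can be obtained by running pdftotext with "-" as the output file, which writes the extracted text to standard output, and capturing that output.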
I want to search for some text in a PDF file. Where is the phrase "go to" in my PDF? And if it is found, what page is it on?