How can I search the contents of PDF files in a directory and its subdirectories? I am looking for command-line tools. It seems that grep cannot search PDF files.
While pdfgrep works well for quick and simple searches, I often want some context, as a single line is not always useful enough.
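One way to get that context is to work on the extracted text and return a few lines around each match. The following is only a rough sketch: the function name `grep_with_context` and its `context` parameter are made up for illustration, and it assumes the PDF text has already been extracted into a string by some other tool.

```python
import re
from typing import Iterator, List, Tuple


def grep_with_context(text: str, pattern: str,
                      context: int = 2) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line_number, surrounding_lines) for each line matching pattern.

    `text` is assumed to be text already extracted from a PDF.
    `context` is the number of lines kept before and after each match.
    """
    # Split on "\n" only: str.splitlines() would also split on form feeds,
    # which some extractors use as page separators.
    lines = text.split("\n")
    regex = re.compile(pattern, re.IGNORECASE)
    for index, line in enumerate(lines):
        if regex.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            yield index + 1, lines[start:end]
```

Each result carries the 1-based line number of the match plus the neighbouring lines, so the caller can print as much or as little as needed.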
Recoll is a great full-text GUI search application for Unix/Linux that supports dozens of different formats, including PDF. It can even pass the exact page number and search term of a query to the document viewer, which lets you jump to the result right from its GUI.
I had the same problem, so I wrote a script that searches all PDF files in a specified folder for a string and prints the PDF files that match the query string.
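The script itself is not shown here, so the following is not the author's code, just a minimal sketch of the same idea. It assumes the `pdftotext` program is installed and on the PATH; the names `extract_text` and `find_matching_pdfs` are invented for this example.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def extract_text(pdf: Path) -> str:
    """Return the text of one PDF via pdftotext (assumed to be on PATH)."""
    result = subprocess.run(
        ["pdftotext", "-q", str(pdf), "-"],  # "-" writes the text to stdout
        capture_output=True, text=True, errors="replace", check=False,
    )
    return result.stdout


def find_matching_pdfs(folder, needle: str,
                       extract: Callable[[Path], str] = extract_text) -> List[Path]:
    """Return every *.pdf under `folder` whose text contains `needle`."""
    needle = needle.lower()
    matches = []
    # rglob descends into subdirectories; the "*.pdf" pattern is
    # case-sensitive on most Linux file systems.
    for pdf in sorted(Path(folder).rglob("*.pdf")):
        if needle in extract(pdf).lower():
            matches.append(pdf)
    return matches
```

Calling `find_matching_pdfs("some/folder", "search string")` and printing each returned path gives the behaviour described above. The `extract` parameter is there only so a different extractor can be swapped in.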
There is an open source common resource grep tool, crgrep, which searches within PDF files but also other resources such as content nested in archives, database tables, image metadata, POM file dependencies and web resources, as well as combinations of these, including recursive search.
First convert all your PDF files to text files. Then use grep as usual. This is especially good when you have many queries and a lot of PDF files, as it is fast.
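The convert-once, search-many approach could look roughly like this. It is a sketch under assumptions: `pdftotext` is installed, each `report.pdf` gets a sibling `report.pdf.txt`, and the function names `convert_all` and `search_texts` are hypothetical.

```python
import re
import subprocess
from pathlib import Path
from typing import Iterator, Tuple


def convert_all(folder) -> None:
    """Write a .pdf.txt file next to every PDF under `folder`."""
    for pdf in Path(folder).rglob("*.pdf"):
        txt = pdf.with_name(pdf.name + ".txt")  # report.pdf -> report.pdf.txt
        # Skip PDFs whose text file already exists and is not older.
        if not txt.exists() or txt.stat().st_mtime < pdf.stat().st_mtime:
            subprocess.run(["pdftotext", "-q", str(pdf), str(txt)], check=False)


def search_texts(folder, pattern: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (text_file, line_number, line) for every matching line."""
    regex = re.compile(pattern)
    for txt in sorted(Path(folder).rglob("*.pdf.txt")):
        with txt.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield txt, number, line.rstrip("\n")
```

The conversion is the slow part and only runs once per PDF; every later query only reads plain text files, which is why this pays off with many queries.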
You need a tool like pdf2text to first convert your PDF to a text file and then search inside the text. (You will probably lose some information or symbols.)
If you are using a programming language, there are probably PDF libraries written for this purpose, e.g. http://search.cpan.org/dist/CAM-PDF/ for Perl.
I want to search for some text in a PDF file. Where is the word “go to” in my PDF? If it is found, what page is it on?
By default, pdftotext places form feed characters (0x0C) between pages. You can count them up to the occurrence of the word you are searching for.
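Counting form feeds can be sketched in a few lines. This assumes the text was produced by pdftotext with its default page separators; the function name `pages_containing` is made up for illustration.

```python
from typing import List


def pages_containing(text: str, needle: str) -> List[int]:
    """Return the 1-based page numbers on which `needle` occurs.

    `text` is assumed to be pdftotext output, where a form feed ("\f",
    byte 0x0C) separates one page from the next.
    """
    needle = needle.lower()
    found = []
    for number, page in enumerate(text.split("\f"), start=1):
        # Collapse runs of whitespace so a phrase wrapped across a
        # line break ("go\nto") still matches.
        normalized = " ".join(page.split()).lower()
        if needle in normalized:
            found.append(number)
    return found
```

Splitting on the form feed is equivalent to counting the form feeds that precede each match: the text before the first form feed is page 1, the text after it is page 2, and so on.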
Recoll can search PDF documents. It has a command-line mode, but the GUI is more helpful in showing where the matches occur, and it lets you click to open the file at the right position.