I have a folder with about 9,000 PDF documents. I want to read the raw source (not the rendered content) of each one using cat and store it in a separate text file with the same base name as the PDF.

For example:

cat 030.pdf > 030.txt

I want to do this for all the files in the folder. How can I do it?

BlackSwan
  • Those downvoting may explain their motives. The question and the help means a lot to me. If you think that the question is worthless, first answer and then downvote. Don't go around downvoting coz you have enough rep to sustain a -1 for you! – BlackSwan Jan 17 '18 at 18:37
  • Downvoting a question doesn't cost any reputation, FYI. – Benjamin W. Jan 17 '18 at 18:50
  • Also, why do you assume that is why your question was downvoted? Did you consider, just for a second, that maybe your question was poorly researched? – Félix Adriyel Gagnon-Grenier Jan 17 '18 at 18:51
  • @FélixGagnon-Grenier What kind of a research should I do here? I read the man pages, I googled but all they show is concatenating the files. – BlackSwan Jan 17 '18 at 18:52
  • For a start, researching how to put the content of a PDF file into a text file. The first 10 results when googling "bash put pdf into text file" are all about batch processing PDF files into text files. The very first Google result has an answer for your problem! – Félix Adriyel Gagnon-Grenier Jan 17 '18 at 18:55
  • @FélixGagnon-Grenier I am talking about the source code of a PDF file and not the content as I mentioned in the question but I'll try what you've suggested. – BlackSwan Jan 17 '18 at 18:58
  • @FélixGagnon-Grenier I am examining the static features of a PDF as a part of the project. I need to have a look at the keywords in the code. – BlackSwan Jan 17 '18 at 19:03
  • You might be interested in https://unix.stackexchange.com/q/17220/72867 – Félix Adriyel Gagnon-Grenier Jan 17 '18 at 19:04

2 Answers

Instead of using cat, use cp; it's more efficient.

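# $(basename {}) would be expanded by the shell before find runs, so build the name inside sh -c
find . -name '*.pdf' -exec sh -c 'cp "$1" "${1%.pdf}.txt"' sh {} \;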
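
With roughly 9,000 files, starting one process per file adds up. As a rough sketch (not the answerer's own code), the same copy can be batched with find's `-exec ... {} +` form, which passes many paths to a single `sh`. The function name `pdf_to_txt` is hypothetical, and the sketch assumes every file of interest ends in a lowercase `.pdf`.

```shell
#!/bin/sh
# pdf_to_txt DIR: copy every *.pdf under DIR (default: current directory)
# to a sibling file with the .pdf suffix replaced by .txt.
# pdf_to_txt is a hypothetical helper name, not part of any tool.
pdf_to_txt() {
  # "{} +" hands many paths to one sh invocation instead of one per file.
  find "${1:-.}" -type f -name '*.pdf' -exec sh -c '
    for f in "$@"; do
      # ${f%.pdf} strips only the trailing .pdf from the path.
      cp -- "$f" "${f%.pdf}.txt"
    done
  ' sh {} +
}
```

Calling `pdf_to_txt .` from the folder with the PDFs would leave `030.txt` next to `030.pdf`. Because each path is passed as a separate argument and quoted, file names with spaces should be handled as well.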
Stephen M. Webb

You could create a shell script that loops through the files in the directory, runs cat on each one, and strips the .pdf suffix from the filename before adding .txt:

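search_dir=.   # directory holding the PDFs (assumed here: the current directory)
for entry in "$search_dir"/*.pdf
do
  # Skip anything that is not a regular file (including an unmatched glob)
  if [ -f "$entry" ]; then
    # Strip only the trailing .pdf; %%.* would cut at the first dot in the path
    cat "$entry" > "${entry%.pdf}.txt"
  fi
done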
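
The suffix-stripping expansions `%` and `%%` behave differently once a path contains more than one dot, which matters when the directory is given as `.` or has a dot in its name. A small sketch comparing the forms on a made-up sample path (`txt_name` is a hypothetical helper used only for illustration):

```shell
#!/bin/sh
# txt_name PATH: print PATH with a trailing .pdf replaced by .txt.
# txt_name is a hypothetical helper name used only for this illustration.
txt_name() {
  printf '%s\n' "${1%.pdf}.txt"
}

entry="./reports.2018/030.pdf"    # made-up sample path

# %%.* removes the longest suffix matching ".*", i.e. from the first dot on.
printf '%s\n' "${entry%%.*}.txt"  # prints: .txt

# %.* removes the shortest suffix matching ".*", i.e. the last extension.
printf '%s\n' "${entry%.*}.txt"   # prints: ./reports.2018/030.txt

# %.pdf removes only a literal trailing ".pdf".
txt_name "$entry"                 # prints: ./reports.2018/030.txt
```

The first form would send every file's output to the same `.txt` file here, so the `%.pdf` form is the safer choice for this task.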
jmoney