I've just joined Stack Overflow after searching around for a few days. I'm working on a small project on my RasPi: sorting my PDF documents and giving them descriptive filenames.
I want to use pdfgrep to extract the company name and the date from various documents.
Here is the code:
#!/bin/bash
# set work directory
workpath=~pi/Documents
find "$workpath" -iname '*.pdf' -print | while IFS= read -r FILENAME
do
if pdfgrep -i --max-count 1 'company1' "${FILENAME}";
then
echo "$FILENAME";
pdfgrep --max-count 1 '[0-9]{1,2}\.\s*(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)\s+[1-9][0-9]{3}' "${FILENAME}";
echo "company1";
elif pdfgrep -i --max-count 1 'company2' "${FILENAME}";
then
echo "$FILENAME";
pdfgrep --max-count 1 'Datum:\s+[0-9]{1,2}\.[0-9]{1,2}\.[1-9][0-9]{3}' "${FILENAME}";
echo "company2";
else
echo "$FILENAME";
echo "undefined document -- Error!!";
fi
done
For each file I get different output, for example:
companyname
paper of conduct companyname
companyname and companyaddress
and more different stuff
The date also appears in different formats:
dd.mm.yyyy
date: dd.mm.yyyy
some text dd. month yyyy
_______________________dd.month yyyy
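To show what I mean by cutting away the surrounding text, here is a rough sketch that runs plain `grep -oE` on made-up sample lines instead of real pdfgrep output. The function name `extract_date` and the sample lines are invented for illustration. I'm assuming the same pattern could be passed to pdfgrep together with its `--only-matching` option, but I have not tried that on my documents.

```shell
#!/bin/bash
# Hypothetical helper: print only the date part of a line, dropping the text around it.
# Covers "dd.mm.yyyy" as well as "dd. Monat yyyy" and "dd.Monat yyyy".
extract_date() {
    local months='Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember'
    local numeric='[0-9]{1,2}\.[0-9]{1,2}\.[1-9][0-9]{3}'
    local named="[0-9]{1,2}\.[[:space:]]*(${months})[[:space:]]+[1-9][0-9]{3}"
    # -o prints only the matched text, -E enables extended regex,
    # head keeps the first match if a line contains several dates
    printf '%s\n' "$1" | grep -oE "(${numeric})|(${named})" | head -n 1
}

# Sample lines (made up) in the formats listed above
extract_date 'Datum: 03.11.2018'                   # prints 03.11.2018
extract_date 'some text 7. März 2019'              # prints 7. März 2019
extract_date '_______________________12.Mai 2020'  # prints 12.Mai 2020
```

The output of such a helper could then be captured with `$(...)` instead of being printed to the terminal.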
I'm looking for a way to store only the needed content, without the surrounding text, in variables such as:
comp=companyname
datey=yyyy
datem=mm (here I also need an idea for how to translate the month name to mm)
dated=dd
result should be: yyyymmdd-companyname.pdf
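To make the target more concrete, this is roughly the kind of conversion I imagine, as an untested sketch rather than a finished solution. The function name `to_yyyymmdd` is made up, `company1` is just the placeholder from my script, and I'm assuming the date has already been isolated as either `dd.mm.yyyy` or `dd. Monat yyyy`.

```shell
#!/bin/bash
# Hypothetical helper: turn "dd.mm.yyyy" or "dd. Monat yyyy" into "yyyymmdd".
to_yyyymmdd() {
    local d m y
    # Turn dots into spaces and let word splitting yield: day month year
    # (unquoted on purpose; date strings contain no glob characters)
    set -- $(printf '%s' "$1" | tr '.' ' ')
    d=$1 m=$2 y=$3
    # Translate German month names to numbers; numeric months pass through unchanged
    case "$m" in
        Januar) m=1 ;;    Februar) m=2 ;;  März) m=3 ;;      April) m=4 ;;
        Mai) m=5 ;;       Juni) m=6 ;;     Juli) m=7 ;;      August) m=8 ;;
        September) m=9 ;; Oktober) m=10 ;; November) m=11 ;; Dezember) m=12 ;;
    esac
    # Strip one leading zero so printf does not read "08" or "09" as octal
    printf '%04d%02d%02d\n' "$y" "${m#0}" "${d#0}"
}

comp='company1'                       # placeholder company name
stamp=$(to_yyyymmdd '7. März 2019')   # stamp=20190307
echo "${stamp}-${comp}.pdf"           # prints 20190307-company1.pdf
```

The echoed name is what I would then hand to `mv` to rename the file.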
I started with bash scripting because that is how I got pdfgrep working, and I'm not very familiar with programming languages. At most I have written a few lines of Python :S
Your help will be very welcome!
cheers, bdream