
I have a list of PDF file names, for example:

  1. Financial_Statement_Q1_2015_En belongs to Quarterly Report.
  2. Financial_Statement_Yealy_2015 belongs to Not Quarterly Report.

I need to classify the PDF names into Quarterly and Not Quarterly reports. What approach should I take, and which tool is appropriate for this task?

Sir Cornflakes
Raja
  • What languages do you know? You can do this in pretty much any language by parsing the file name, provided you used a naming convention – johnny 5 Aug 28 '15 at 14:27
  • I know C++, Java, and PHP. – Raja Aug 28 '15 at 14:37
  • @johnny, do I need to train on a dataset? – Raja Aug 28 '15 at 14:37
  • Java is probably better for this. You just need to pull all the PDF file names from the directory and iterate over them, building two lists, quarterly and non-quarterly: if the name contains Q1, Q2, Q3, or Q4 it's quarterly, otherwise put it in the non-quarterly list – johnny 5 Aug 28 '15 at 15:08
  • Are all the filenames structured like the examples you gave? – Bob Dillon Aug 28 '15 at 15:08

1 Answer


When your files are named as transparently as in your example, simple pattern matching will work just fine. No training data is needed.

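Here's a minimal Python sketch, where filename holds one of the names:

import re

# "_Q", one digit, "_" marks a quarterly report
if re.search(r"_Q\d_", filename):
    print(filename, "belongs to quarterly reports")
else:
    print(filename, "does not belong to quarterly reports")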
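
To sort a whole list of names into two groups, as suggested in the comments, the same check can be wrapped in a loop. The following is only a sketch: the names split_reports and QUARTER_TOKEN are made up for illustration, and it assumes that a quarterly report always carries a Q1 to Q4 token set off by underscores (or sitting at the start or end of the name), which holds for the two examples in the question but may not hold for every file.

```python
import re

# Assumption: quarterly reports carry a "Q1".."Q4" token delimited by
# underscores (or by the start/end of the name), as in the question's examples.
QUARTER_TOKEN = re.compile(r"(?:^|_)Q[1-4](?:_|$)", re.IGNORECASE)


def split_reports(names):
    """Return (quarterly, not_quarterly) lists for the given file names."""
    quarterly, not_quarterly = [], []
    for name in names:
        # Drop a trailing ".pdf" so a token at the very end still matches.
        stem = re.sub(r"\.pdf$", "", name, flags=re.IGNORECASE)
        if QUARTER_TOKEN.search(stem):
            quarterly.append(name)
        else:
            not_quarterly.append(name)
    return quarterly, not_quarterly


names = [
    "Financial_Statement_Q1_2015_En",
    "Financial_Statement_Yealy_2015",
]
quarterly, not_quarterly = split_reports(names)
print("Quarterly:", quarterly)
print("Not quarterly:", not_quarterly)
```

If some of your files mark the quarter differently (for example "Quarter1" or "1Q"), the pattern would need to be extended to cover those spellings.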
Sir Cornflakes