
I have a few sentences, each stored in its own row of a dataframe, and I want to extract the dates from them. I came across the package "datefinder".

When I pass a single sentence as 'string_with_dates', it extracts all the dates correctly and returns them:

    import datefinder
    string_with_dates = '''  They have released Proposals for period October 1, 2018 ’ September 30, 2019. Manufacturers are encouraged to submit proposals for stores located basis throughout the fiscal year ending September 30, 2018, pending availability of funds., '''

    matches = datefinder.find_dates(string_with_dates)
    for match in matches:
        match = str(match)
        print(match)

Output:

    2018-10-01 00:00:00
    2019-09-30 00:00:00
    2018-09-30 00:00:00

But when I loop over multiple sentences in a dataframe with a "for" loop, the result is wrong: a cell does not end up holding all the dates (if there are several) found in its sentence. description_df is the name of my dataframe. Column 9 holds the sentences, and column 13 is where I want to store the extracted dates.

    import datefinder
    for i in range (len(description_df)):
        string_with_dates = description_df.iloc[i,9]
        matches = datefinder.find_dates(string_with_dates)
        for match in matches:
            match = str(match)
            print(match)
            description_df.iloc[i,13] =  match

Output of the extracted date column of the dataframe:
2019-09-30 00:00:00
2019-05-07 00:00:00
""
0310-08-07 00:00:00
2019-08-07 00:00:00
developer
  • For every match you overwrite the contents of the cell. Is that your problem? – IcedLance Aug 07 '19 at 09:42
  • @IcedLance, thanks for the reply. I didn't understand what you asked. The issue is that some sentences have multiple dates, but datefinder isn't extracting all of them. Sometimes it returns the current date, and sometimes just one of the several dates present in the sentence. – developer Aug 07 '19 at 09:47
  • Datefinder finds all the dates and returns them as `matches`. You then take that apart with `for match in matches` and, for every date present, write that date alone into your dataframe cell. An assignment like `a = match` does not add `match` to a list; it replaces the old value with the new one. Because you do this in a loop, each date you write to the cell is overwritten by the next one, and only the last one remains. – IcedLance Aug 07 '19 at 09:52
  • @IcedLance How do I make it take all dates for a particular sentence at once and store it in a dataframe? – developer Aug 07 '19 at 10:06
  • `description_df.iloc[i,13] = [str(date) for date in matches]` – IcedLance Aug 07 '19 at 10:13
  • @IcedLance, that gave an error: `"dateutil\parser\_parser.py", line 1227, in _build_naive naive = default.replace(**repl) OverflowError: Python int too large to convert to C long` – developer Aug 08 '19 at 04:19
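
Putting the comments above together, one possible approach is to collect every match for a row first and only then write the cell once. The sketch below is a rough illustration, not a confirmed fix: `extract_dates` and `fill_date_column` are hypothetical helper names, the `"; "` separator is an arbitrary choice, and it assumes the `OverflowError` is raised while iterating over the result of `datefinder.find_dates`, so it keeps whatever dates were found before the failure. Column positions 9 and 13 are taken from the question.

    from typing import Callable, List, Optional


    def extract_dates(text, finder: Optional[Callable] = None) -> List[str]:
        """Return every date found in `text` as a list of strings (hypothetical helper)."""
        if not isinstance(text, str):
            return []  # e.g. NaN in an empty cell
        if finder is None:
            import datefinder  # imported lazily so a stub finder can be passed instead
            finder = datefinder.find_dates
        found: List[str] = []
        try:
            for match in finder(text):
                found.append(str(match))  # accumulate instead of overwriting
        except (OverflowError, ValueError):
            # Assumption: the parser failed on a number that looked like a date.
            # Keep the dates collected so far rather than losing the whole row.
            pass
        return found


    def fill_date_column(df, text_pos: int = 9, date_pos: int = 13, finder=None):
        """Store all dates of column `text_pos` in column `date_pos`, joined by '; '."""
        target = df.columns[date_pos]  # assumes the target column already exists
        df[target] = df.iloc[:, text_pos].apply(
            lambda s: "; ".join(extract_dates(s, finder))
        )
        return df

With this, `fill_date_column(description_df)` would replace the explicit `for i in range(...)` loop. Joining the dates into one string avoids assigning a Python list to a single cell through `iloc`; if a list per cell is preferred, the `"; ".join(...)` can be dropped so that `apply` returns the lists directly.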

0 Answers