I want to extract a date from a string using Python's dateutil package. The date comes in different formats, but the day part is never present in any of these strings.

When the month is written alphabetically it precedes the year, as in Sep 2016; when it is written numerically it follows the year, as in 2016-09 or 201609.

import dateutil.parser as dparser
print(dparser.parse("The file is for month Sep 2016.",fuzzy=True).month)
   9
print(dparser.parse("The file is for month Sept-2016.",fuzzy=True).month)
   9
print(dparser.parse("The file is for month 2016-09.",fuzzy=True).month)
   9

How do I handle the case where there is no hyphen between the year and the month, as shown below?

print(dparser.parse("The file is for month 201609.",fuzzy=True).month)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-123-566e083c8313> in <module>()
----> 1 dparser.parse("The file is for month 201609.",fuzzy=True).month

~\AppData\Local\Continuum\anaconda3\lib\site-packages\dateutil\parser.py in parse(timestr, parserinfo, **kwargs)
   1180         return parser(parserinfo).parse(timestr, **kwargs)
   1181     else:
-> 1182         return DEFAULTPARSER.parse(timestr, **kwargs)
   1183 
   1184 

Is there an option inside this library to do so?

cph_sto

1 Answer

A completely different solution, if you don't mind using an external package, is dateparser (https://pypi.org/project/dateparser/), which can parse dates in multiple formats (it even includes some NLP features).

Otherwise you can use a regular expression to extract this date format (e.g. '[0-9]{6}') and then separate the year from the month. This only works if the year precedes the month.
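As a rough sketch of the regular-expression route, the snippet below pulls a six-digit YYYYMM token out of the text and splits it into year and month. The helper name `extract_year_month`, the lookarounds that reject longer digit runs, and the 1 to 12 month check are my own assumptions, not something dateutil provides, and it assumes the year always comes first.

```python
import re

# Exactly six digits, not embedded in a longer run of digits.
# Group 1 is the year (YYYY), group 2 is the month (MM).
YEAR_MONTH = re.compile(r"(?<!\d)(\d{4})(\d{2})(?!\d)")


def extract_year_month(text):
    """Return (year, month) for the first plausible YYYYMM token, else None.

    Hypothetical helper: assumes the year precedes the month.
    """
    for match in YEAR_MONTH.finditer(text):
        year, month = int(match.group(1)), int(match.group(2))
        if 1 <= month <= 12:  # skip tokens such as 201613
            return year, month
    return None


print(extract_year_month("The file is for month 201609."))
# (2016, 9)
```

One way to combine the two approaches is to try this helper first and fall back to `dparser.parse(..., fuzzy=True)` only when it returns `None`.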

  • Can you provide examples demonstrating that it works with `dateparser`? – cph_sto Sep 26 '19 at 14:59
  • You can import it with `from dateparser.search import search_dates`, then pass your strings to `search_dates` along with the desired settings, e.g. `search_dates("The file is for month Sept-2016.", settings={"PREFER_DAY_OF_MONTH": "first"})`. `PREFER_DAY_OF_MONTH` is set to `first` because the day is missing and dateparser will always return a full date. – Blue Three Wheeler Sep 26 '19 at 15:40
  • Well, the problem is with formats like 2016-09 or 201609. It doesn't work for those. – cph_sto Sep 26 '19 at 17:23