I'm trying to parse a large set of files whose records include dates in Spanish, in formats like 'Ago 01, 2022'. For this task I'm using the parse function from the dateparser module. In the past I used this function successfully for a similar purpose, but now it fails on Spanish strings, even if I set the languages or locales parameters of parse.

I import the function parse with this line:

from dateparser import parse
  1. If I call the function with a date in English, it runs successfully, as I expect:
parse('Aug 01, 2021', date_formats=['%b %d, %Y'] )

# Returns
datetime.datetime(2021, 8, 1, 0, 0)
  2. If I call the function with a date in Spanish and no other parameter, it fails, which I also expect:

    (August in Spanish is Agosto):

parse('Ago 01, 2021', date_formats=['%b %d, %Y'] )

# Raises an exception in regex that ends with:

~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
   1735                 return False, [value]
   1736 
-> 1737         raise error("bad escape \\%s" % ch, source.string, source.pos)
   1738 
   1739     if isinstance(source.sep, bytes):

error: bad escape \d at position 7

I suppose this error is related to a regex pattern for Spanish, but I can't tell what the problem is beyond the language.

  3. Passing a languages parameter to parse doesn't change the result.
parse('Ago 01, 2021', date_formats=['%b %d, %Y'], languages=['es'])

# Raises the same exception that ends with:

~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
   1735                 return False, [value]
   1736 
-> 1737         raise error("bad escape \\%s" % ch, source.string, source.pos)
   1738 
   1739     if isinstance(source.sep, bytes):

error: bad escape \d at position 7

  4. The same thing happens if I set the locales parameter.
parse('Ago 01, 2021', date_formats=['%b %d, %Y'], locales=['es'])

# Raises the same exception that ends with:

~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
   1735                 return False, [value]
   1736 
-> 1737         raise error("bad escape \\%s" % ch, source.string, source.pos)
   1738 
   1739     if isinstance(source.sep, bytes):

error: bad escape \d at position 7
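
All three tracebacks end in `_compile_replacement`, which suggests the failure happens while a replacement template (the second argument of a `sub`-style call) is being compiled, not while the date string itself is matched. Below is a minimal sketch of that class of error using the standard-library `re` module, which also rejects unknown escapes such as `\d` in a replacement template on Python 3.7 and later. The helper name `replacement_error` is hypothetical, and this is only an illustration, not dateparser's actual code:

```python
import re


def replacement_error(template):
    """Return the error message raised for a replacement template, or None."""
    try:
        # The template is the *replacement* argument, not the search pattern.
        re.sub("x", template, "x")
    except re.error as exc:
        return str(exc)
    return None


# "\d" is valid in a search pattern but is not a valid replacement escape.
print(replacement_error(r"\d"))

# Doubling the backslash makes it a literal backslash, so no error is raised.
print(replacement_error(r"\\d"))
```

If this is what happens inside the library, the date string is probably not at fault; a template built from the format string would be the more likely culprit.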


I'm not sure whether the problem is related to an update or a change in the module, but I should mention that the first time I call parse, I get this warning:

~\anaconda3\lib\site-packages\dateparser\utils\__init__.py:130: PytzUsageWarning: The localize
method is no longer necessary, as this time zone supports the fold attribute (PEP 495). 
For more details on migrating to a PEP 495-compliant implementation, see 
https://pytz-deprecation-shim.readthedocs.io/en/latest/migration.html
date_obj = tz.localize(date_obj)

Looking for insight, I tried the dateparser demo at https://dateparser-demo.netlify.app/, which is linked from the GitHub repository https://github.com/scrapinghub/dateparser, which in turn is linked from PyPI https://pypi.org/project/dateparser/. In that demo, my Spanish string is parsed successfully. I assumed I had an old version of dateparser, so I checked, but I have the latest version available on PyPI.

  • I'm using Python 3.7.3 and dateparser 1.1.1 (currently the latest) on a machine with Windows 10 in Spanish.
Augusto Sisa
  • For what it's worth: I ran your examples 1 and 3 on macOS (English) with Python 3.7.13 and dateparser 1.1.1 without error. I do get the `PytzUsageWarning`, but so far that's just a warning, and the resulting date is correct. Makes me wonder if Windows is to blame here. – 9769953 Sep 20 '22 at 23:15
  • The examples on the PyPI page don't show any use of the `date_formats` argument. Since the error mentions `\d`, and there is a `%d` in the format, could you try without it: `parse('Ago 01, 2021', languages=['es'])`? – 9769953 Sep 20 '22 at 23:18
  • Apparently related: https://github.com/scrapinghub/dateparser/issues/1052 . A fix is suggested in https://github.com/scrapinghub/dateparser/pull/1067 , but that is not merged yet, and it is also well past the release of 1.1.1. – 9769953 Sep 20 '22 at 23:21
  • Given the comments in the GitHub issues, you could try to downgrade the `regex` module by a few (minor) versions. I can't tell you which one exactly, because the internal versioning doesn't match what is on PyPI, so while my local regex version is not the most recent (and works properly), I don't know which PyPI version it is. – 9769953 Sep 20 '22 at 23:30
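
While the fix is unreleased and the right `regex` version is unclear, a standard-library fallback for this one format can serve as a stopgap. The following is a minimal sketch, assuming the dates always look like 'Ago 01, 2021'. The function name `parse_spanish_date` and the abbreviation table are my own assumptions, not anything taken from dateparser:

```python
import re
from datetime import datetime

# Assumed table of Spanish month abbreviations (not taken from dateparser).
SPANISH_MONTHS = {
    "ene": 1, "feb": 2, "mar": 3, "abr": 4, "may": 5, "jun": 6,
    "jul": 7, "ago": 8, "sep": 9, "set": 9, "oct": 10, "nov": 11, "dic": 12,
}

# Matches "<month abbreviation>[.] <day>, <year>", e.g. "Ago 01, 2021".
_DATE_RE = re.compile(r"^\s*([A-Za-z]{3,4})\.?\s+(\d{1,2}),\s*(\d{4})\s*$")


def parse_spanish_date(text):
    """Return a datetime for strings like 'Ago 01, 2021', or None."""
    match = _DATE_RE.match(text)
    if match is None:
        return None
    # Only the first three letters are looked up, so "Sept" maps to "sep".
    month = SPANISH_MONTHS.get(match.group(1).lower()[:3])
    if month is None:
        return None
    try:
        return datetime(int(match.group(3)), month, int(match.group(2)))
    except ValueError:
        # Impossible calendar dates such as day 30 in February.
        return None


print(parse_spanish_date("Ago 01, 2021"))
```

This handles only the single format from the question and returns None for anything else, so it is no replacement for dateparser once a fixed release is available.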

1 Answer

This has been fixed in recent versions (dateparser 1.1.3).
Can you check that everything works as expected now?

>>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'] )
datetime.datetime(2021, 8, 1, 0, 0)
>>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'], languages=['es'])
datetime.datetime(2021, 8, 1, 0, 0)
>>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'], locales=['es'])
datetime.datetime(2021, 8, 1, 0, 0)
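
To confirm which versions are actually installed before re-testing, a small check along these lines may help. It is a sketch that assumes Python 3.8 or later, where `importlib.metadata` exists; on older interpreters such as 3.7 it simply returns None. The helper names `installed_version` and `version_tuple` are hypothetical:

```python
import re


def installed_version(package):
    """Return the installed version string of a package, or None if unknown."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # importlib.metadata is only available on Python 3.8+.
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn '1.1.3' into (1, 1, 3), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


for name in ("dateparser", "regex"):
    print(name, installed_version(name))

found = installed_version("dateparser")
if found is not None:
    # 1.1.3 is the version named in this answer.
    print("has fix:", version_tuple(found) >= (1, 1, 3))
```

If the version reported is older than 1.1.3, upgrading with `pip install --upgrade dateparser` should pick up the fix.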
Serhii