I have working code, but I think my logic isn't on the right path (although it works). I just need some help optimizing it: essentially, I want to know whether what I did is an acceptable way of doing this or whether there's a better way. I am rooting for the latter, because I know what I did isn't the "right" way.
I have a `pd` column of strings, most of which contain a year, and I am trying to extract that year. The problem is that a few entries do not have a year listed. So something like this:
Index | string_values |
---|---|
0 | String A (1995) |
1 | String B (1995) |
2 | String C (1995) |
3 | String D has no year |
4 | String E has (something in braces) AND also the year (2003) |
`re.search('\d{4}', df['string_values'][0]).group(0)` works on a single entry, but in a `for` loop it throws this error (I guess when it hits a string without a 4-digit number): `AttributeError: 'NoneType' object has no attribute 'group'`. I think this because `len(_temp)` gives `15036` and it does have the years listed; it's just that the loop throws this error.
Here's the `for` loop:

    _temp = []
    for i in df['string_values']:
        year = re.search("\d{4}", i)
        if year.group():
            _temp.append(year.group())
        else:
            _temp.append(None)
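For reference, `re.search` returns `None` when nothing matches, so calling `.group()` on the result inside the `if` is what raises the `AttributeError`. A minimal sketch of the same loop that tests the match object itself, assuming a small hypothetical `sample` list in place of the real column:

```python
import re

# Hypothetical sample mirroring the table above (not the real data).
sample = [
    "String A (1995)",
    "String D has no year",
    "String E has (something in braces) AND also the year (2003)",
]

years = []
for text in sample:
    match = re.search(r"\d{4}", text)
    # re.search returns None when nothing matches, so check the match
    # object before calling .group() on it.
    years.append(match.group() if match else None)

print(years)  # ['1995', None, '2003']
```

This keeps one output entry per input row, so the result has the same length as the column.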
Then I also tried the try-except way of doing it, and that works: `len(<var>)` gives `62423`, which is also the total number of rows in the df. Here's the code:

    _without_year = []
    _with_year = []
    for i in df['string_values']:
        year = re.search("\d{4}", i)
        try:
            if year.group():
                # _with_year.append(year.group())
                pass
        except:
            _without_year.append(i)
I just need to know if what I did is acceptable. It works, like I said: `_without_year` does display all the entries without a year.
The thing with the try-except block is that I am `pass`ing in the `if` branch and relying on the `except` to catch the error for the `i`th entry.
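One possible alternative that avoids both the loop and the try-except is pandas' vectorized `Series.str.extract`. The following is only a sketch under assumptions: the tiny `df` is made-up data, the `year` column name is hypothetical, and the pattern assumes the year always appears as four digits inside parentheses, which may not hold for every row of the real data.

```python
import pandas as pd

# Made-up rows standing in for the real 62423-row DataFrame.
df = pd.DataFrame({"string_values": [
    "String A (1995)",
    "String D has no year",
    "String E has (something in braces) AND also the year (2003)",
]})

# Assumption: the year is four digits wrapped in parentheses.
# expand=False returns a Series; rows without a match become NaN.
df["year"] = df["string_values"].str.extract(r"\((\d{4})\)", expand=False)

# Rows where nothing matched, and the years that were found.
without_year = df.loc[df["year"].isna(), "string_values"].tolist()
with_year = df["year"].dropna().tolist()
```

Here `without_year` plays the role of `_without_year` above, and the new column stays aligned with the original rows, so no separate length check is needed.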