1

So I have the Edition Column which contains data in unevenly pattern, as some have ',' followed by the date and some have ',-' pattern.

df.head()

17                Paperback,– 1 Nov 2016
18    Mass Market Paperback,– 1 Jan 1991
19                      Paperback,– 2016
20               Hardcover,– 24 Nov 2018
21        Paperback,– Import, 4 Oct 2018

How can I extract the date to a separate column. I tried using str.split() but can't find specific pattern to extract.Is there any method I could do it?

Suhas Mucherla
  • 1,383
  • 1
  • 5
  • 17
Kushagra
  • 61
  • 2
  • 11

2 Answers2

3
obj = df['Edition']
obj.str.split('((?:\d+\s+\w+\s+)?\d{4}$)', expand=True)

or

obj.str.split('[,–]+').str[0]
obj.str.split('[,–]+').str[-1] # date
Ferris
  • 5,325
  • 1
  • 14
  • 23
2

Try using dateutil

from dateutil.parser import parse
 
df['Dt']=[parse(i, fuzzy_with_tokens=True)[0] for i in df['column']]
Suhas Mucherla
  • 1,383
  • 1
  • 5
  • 17