2

How to change a date that is expressed as an interval to be expressed as a regular %Y-%M-%D format.

I originally had a df that looked like this:

   Id       Date  Quantity
1000A 2018-03-22      20.0
1000A 2018-03-29       8.0
1000A 2018-03-27       4.0
1000A 2018-03-28      10.0


or:

all_data = pd.DataFrame({'Id': ['1000A','1000A','1000A','1000A'], 'Date': ['2018-03-28', '2018-04-12', '2018-05-02', '2018-06-28'], 'Quantity' : [20.0, 8.0, 4.0, 10.0]})

all_data.Date = pd.to_datetime(all_data.Date)

I applied stuff on it to get basically monthly-grouped dates going backward from today's date.

today1 = pd.to_datetime('today').normalize()
frequency1 = '30D'
Nbin1 = (today1 - all_data['Date'].min()) // pd.Timedelta(frequency1) + 1  # Number of bins
bins1 = [today1 - n * pd.Timedelta(frequency1) for n in range(Nbin1, -1, -1)]
data11 = all_data.groupby(['Id', pd.cut(all_data['Date'], bins=bins1)]).sum().fillna(0).reset_index()

the output looks like this:

        Id                      Date  Quantity
0   1000A  (2018-03-02, 2018-04-01]     20.0
1   1000A  (2018-04-01, 2018-05-01]      8.0
2   1000A  (2018-05-01, 2018-05-31]      4.0
3   1000A  (2018-05-31, 2018-06-30]     10.0
....
n.  1000A  (2020-11-16, 2020-12-16]     0.0

I cannot find a way to convert the date column back into regular date format like this:

      Date   
2018-04-01   
2018-05-01   
2018-05-31   
2018-06-30   

I feel like I have tried every tool I could find but nothing does the tric, any help would be appreciated.

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Murcielago
  • 905
  • 1
  • 8
  • 30

1 Answers1

2

You have a categorial dtype in Date. One way to handle it is by converting it to str so that you can extract the pattern you want, then convert it to datetime:

data11['Date'] = data11.Date.astype(str).str.extract(', (.+?)]').astype('datetime64[ns]')
Cainã Max Couto-Silva
  • 4,839
  • 1
  • 11
  • 35
  • 1
    miracle! thank you! i have tried using ```astype(str).split()``` to solve it but it would not work, could you tell me why repeating ```str``` is the way to go? – Murcielago Dec 16 '20 at 01:02
  • Sure! To access any `str` method in a pandas Series, you need to use it, just like you have to use `dt` for accessing the datetime properties in a Series. – Cainã Max Couto-Silva Dec 16 '20 at 01:05
  • On the other hand, if you're accessing a single value from a Series, like `data11.Date.astype(str)[0]`, then you can (and should) use the `str` method directly. – Cainã Max Couto-Silva Dec 16 '20 at 01:08
  • I understand, super useful, thank you! – Murcielago Dec 16 '20 at 01:10
  • 1
    @Peyo and Caina this is not the correct way to extract values from an `Interval`: `type(data11.iloc[0, 1]) → pandas._libs.interval.Interval`. See the duplicate. `data11.Date = pd.IntervalIndex(data11.Date).right` or `data11.Date = data11.Date.apply(lambda x: x.right)`, however the first option is vectorized. – Trenton McKinney Dec 16 '20 at 01:30
  • 1
    Hi @TrentonMcKinney! Thanks for pointing out a better way and more appropriate way to get it done. I totally agree with you on that, although I also disagree with statements like "the correct way" since multiple approaches can work just fine to achieve the same goal (like in this case). That said, you just increased our knowledge, so thank you again! – Cainã Max Couto-Silva Dec 16 '20 at 01:41
  • 1
    Glad that's informative. Also, I say _correct way_ in the vein that it is the correct way to use the `.dt` accessor to work with datetime data. As such, it's the "correct way" to use `Interval` methods to extract the left or right limit. Acknowledged, there may be more than one way, but it reference to _The Zen of Python_ (`import this`), **There should be one-- and preferably only one --obvious way to do it.** Happy Holidays! – Trenton McKinney Dec 16 '20 at 01:50