I have a dataset with some incomplete dates: the usual format is "2020-03-20", but some dates only have the year (e.g. "2020").

In these year-only cases, pd.DatetimeIndex(["2020"]).month (or .day) returns 1, i.e. the missing month and day are automatically set to 01. I'd rather have it return NaN instead. I feel like this should be fairly easy to do, but I can't seem to find a way via Google. Any pointers on how to solve this would be greatly appreciated.

Is there maybe a way to identify "year only" dates easily so I can skip them when calculating the months?

Thanks!

Franka

1 Answer

Here is one way to approach this problem. Suppose you have a DataFrame -

import pandas as pd

df=pd.DataFrame({"Date":["2020-02-01","2020-01-01","2020"]})

Create another column, "Count", holding the number of dash-separated parts in each date string, with the following line of code -

df.loc[:,"Count"]=df.loc[:,"Date"].apply(lambda x: len(x.split('-')))

Now you can easily separate the values that have a complete date (Count is 3) from those that have only a year (Count is 1). The following line gets you the indices of the rows where only the year is present.

indices=df[df.loc[:,"Count"]==1].index
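
To get NaN for the month and day of year-only rows, as the question asks, the same idea can be taken one step further. The following is only a minimal sketch, not part of the original answer: it assumes every entry is either "YYYY-MM-DD" or a bare four-digit "YYYY", and the names year_only, parsed, Year, Month and Day are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020-02-01", "2020-01-01", "2020"]})

# Flag rows whose string is a bare four-digit year (assumed format: "YYYY").
year_only = df["Date"].str.match(r"^\d{4}$")

# Blank out the year-only strings before parsing, so they become NaT
# instead of being silently completed to January 1st.
parsed = pd.to_datetime(df["Date"].where(~year_only), format="%Y-%m-%d")

# The year is always present, so take it from the raw string.
df["Year"] = df["Date"].str[:4].astype(int)

# Month and day are NaN wherever the parsed value is NaT.
df["Month"] = parsed.dt.month
df["Day"] = parsed.dt.day
```

Because the year-only rows are masked before parsing, the result does not depend on how pd.to_datetime treats partial dates. The year_only mask can also be used directly to skip those rows, e.g. df[~year_only].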
Dharman
Anant Kumar
Thanks so much Dharman and Anant Kumar, this was very helpful indeed, and so quick as well! Appreciate it – Franka Jul 31 '20 at 11:04