1

My dataframe until now, enter image description here

and I am trying to convert cols which is a list of all columns from 0 to 188 ( cols = list(hdata.columns[ range(0,188) ]) ) which are in this format yyyy-mm to datetimeIndex. There are other few columns as well which are 'string' Names and can't be converted to dateTime hence,so I tried doing this,

hdata[cols].columns = pd.to_datetime(hdata[cols].columns) #convert columns to **datetimeindex**

But this is not working. Can you please figure out what is wrong here?

Edit: A better way to work on this type of data is to use Split-Apply-Combine method.

Step 1: Split the data which you want to perform some specific operation.

nonReqdf = hdata.iloc[:,188:].sort_index()
reqdf= reqdf.drop(['CountyName','Metro','RegionID','SizeRank'],axis=1)

Step 2: do the operations. In my case it was converting the dataframe columns with year and months to datetimeIndex. And resample it quarterly.

reqdf.columns = pd.to_datetime(reqdf.columns)
reqdf = reqdf.resample('Q',axis=1).mean()
reqdf = reqdf.rename(columns=lambda x: str(x.to_period('Q')).lower()).sort_index() # renaming so that string is yyyy**q**<1/2/3/4> like 2012q1 or 2012q2 likewise

Step 3: Combine the two splitted dataframe.(merge can be used but may depend on what you want)

reqdf = pd.concat([reqdf,nonReqdf],axis=1)
Anurag Dhadse
  • 1,722
  • 1
  • 13
  • 26

1 Answers1

1

In order to modify some of the labels from an Index (be it for rows or columns), you need to use df.rename as in

for i in range(188):
    df.rename({df.columns[i]: pd.to_datetime(df.columns[i])},
        axis=1, inplace=True)

Or you can avoid looping by building a full sized index to cover all the columns with

df.columns = (
    pd.to_datetime(cols) # pass the list with strings to get a partial DatetimeIndex
    .append(df.columns.difference(cols)) # complete the index with the rest of the columns
)
RichieV
  • 5,103
  • 2
  • 11
  • 24
  • On splitting the dataframe, `nonReqdf` and `dfwithdate&time` both are having **10730** rows. It's after `merge` which does the rows are added.. there are some duplicates. But one thing is that the dataframe both `outer` and `inner` merge produce the same dataframe with 11208 rows. The merge is done on index 'State' & 'RegionName' but both df rows were same before merging. – Anurag Dhadse Aug 20 '20 at 07:19
  • Nothing showed up.`MultiIndex([], names=['State', 'RegionName'])` & checking `len` gave me length `0` – Anurag Dhadse Aug 20 '20 at 07:57
  • 1
    Do `pd.concat([df1, df2], axis=1)` instead of merge, the rows will be aligned on index before concatenating – RichieV Aug 20 '20 at 08:03
  • I required `concat` and **not** `merge` – Anurag Dhadse Aug 20 '20 at 08:07