0

I am new to Python and working my way through my crawling project. I have two questions regarding few pandas module.

Below is my data table "js"

              apple     banana
period      
2017-01-01  100.00000   22.80130
2017-02-01  94.13681    16.28664
2017-03-01  85.34201    13.68078
2017-04-01  65.79804    9.77198
2017-05-01  43.32247    13.35504
2017-06-01  72.63843    9.44625
2017-07-01  78.82736    9.77198
2017-08-01  84.03908    10.09771
2017-09-01  90.55374    13.35504
2017-10-01  86.64495    9.12052

Below is my code to apply apple and banana values to new DataFrame.

import pandas as pd
from datetime import datetime, timedelta

dd = pd.date_range('2017-01-01',datetime.now().date() - timedelta(1))
df = pd.DataFrame.set_index(dd) #this part has error

first step is to set my df index as data_range ('2017-01-01' to yesterday (daily)). And the error message is saying I am missing 1 required positional argument: 'keys'. Is it possible to set index as daily dates from '2017-01-01' to yesterday?

After that is solved, I am trying to put my "js" Data such as 'apple' and 'banana' as column, and put each value respect to df index dates. This example only shows 'apple' and 'banana' columns, but in my real data set, I have thousands more...

Please let me know the efficient way to solve my problem. Thanks in advance!

------------------EDIT------------------------ The date indexing works perfect with @COLDSPEED answer.

dd = pd.date_range('2017-01-01',datetime.now().date() - timedelta(1))
df.index = pd.to_datetime(df.index)   # ignore if not needed
df = df.reindex(dd, fill_value=0.0)

One problem is that if I have another dataframe "js2"(below) and combine these data in a single df (above) I believe it will not work. Any sugguestions?

        kiwi    mango
period      
2017-01-01  9.03614     100.00000
2017-02-01  5.42168     35.54216
2017-03-01  7.83132     50.00000
2017-04-01  10.24096    55.42168
2017-05-01  10.84337    60.84337
2017-06-01  12.04819    65.66265
2017-07-01  17.46987    34.93975
2017-08-01  9.03614     30.72289
2017-09-01  9.63855     56.02409
2017-10-01  12.65060    45.18072
EJ Kang
  • 455
  • 2
  • 5
  • 17

1 Answers1

1

You can use pd.to_datetime and pd.Timedelta -

idx = pd.date_range('2017-01-01', pd.to_datetime('today') - pd.Timedelta(days=1))    
idx

DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10',
               ...
               '2017-11-18', '2017-11-19', '2017-11-20', '2017-11-21',
               '2017-11-22', '2017-11-23', '2017-11-24', '2017-11-25',
               '2017-11-26', '2017-11-27'],
              dtype='datetime64[ns]', length=331, freq='D')

This, you can then use to reindex your dataframe -

df.index = pd.to_datetime(df.index)   # ignore if not needed
df = df.reindex(idx, fill_value=0.0)

If your date are day-first (day first, followed by month), make sure you specify that when converting your index -

df.index = pd.to_datetime(df.index, dayfirst=True)
cs95
  • 379,657
  • 97
  • 704
  • 746
  • @kang `reindex` will do that automatically for you, or am I mistaken? What is your current output looking like and what do you need? – cs95 Nov 29 '17 at 01:53
  • Your code works perfectly. Only question is that how does your code fill value from my "js" DataFrame? I do not see any js in the code..@COLDSPEED I'm asking this question because if I have two set of data from with different value, then what is my step to combine those two DataFrame to df? – EJ Kang Nov 29 '17 at 02:16
  • I edit my question above with your code. Please take a look and let me know the next step. @COLDSPEED – EJ Kang Nov 29 '17 at 02:54
  • @kang Okay, so you want `df = pd.concat([js, js2], axis=1)` and then do the reindexing? – cs95 Nov 29 '17 at 04:29