0

i have dataframe like this:

tst=
Date    % on Merchant   % on Customer   Merchants   Location    
2021-08-04  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-05  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-06  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-01  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-02  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-03  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-04  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-05  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-06  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    

uni_ind= ['% on Merchant','% on Customer','Merchants','Location']

i am looking for output:

Date    % on Merchant   % on Customer   Merchants   Location    
2021-08-01  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-02  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-03  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-04  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-05  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-06  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-01  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-02  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-03  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-04  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-05  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-06  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    

tst.groupby(uni_ind).resample('D').bfill()..reset_index(level=(0,1,2,3),drop= True).reset_index()
U13-Forward
  • 69,221
  • 14
  • 89
  • 114

2 Answers2

0
  • create date range for days at bringing of month for Merchants where they are missing
  • outer join to original data frame and fillna(method="bfill")
import pandas as pd
import io

df = pd.read_csv(io.StringIO("""Date    % on Merchant   % on Customer   Merchants   Location    
2021-08-04  0.0  0.10    Zwarma - The Shawarma Maker  Palani  
2021-08-05  0.0  0.10    Zwarma - The Shawarma Maker  Palani  
2021-08-06  0.0  0.10    Zwarma - The Shawarma Maker  Palani  
2021-08-01  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi    
2021-08-02  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi    
2021-08-03  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi    
2021-08-04  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi    
2021-08-05  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi    
2021-08-06  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi """), sep="\s\s+", engine="python")
df["Date"] = pd.to_datetime(df["Date"])

df = (
    df.merge(
        df.groupby(
            [df["Date"].dt.year, df["Date"].dt.month, "Merchants", "Location"], as_index=False
        )
        .agg({"Date": "min"})
        .loc[lambda d: d["Date"].dt.day.gt(1)]
        .apply(
            lambda r: pd.Series(
                {
                    "Date": list(
                        pd.date_range(
                            r["Date"] - pd.offsets.MonthBegin(1),
                            r["Date"] - pd.Timedelta(days=1),
                        )
                    ),
                    "Merchants": r["Merchants"],
                    "Location": r["Location"]
                }
            ),
            axis=1,
        )
        .explode("Date"),
        on=["Date", "Merchants", "Location"],
        how="outer",
    )
    .sort_values(["Merchants", "Location", "Date"])
    .fillna(method="bfill")
)

df

Date % on Merchant % on Customer Merchants Location
9 2021-08-01 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
10 2021-08-02 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
11 2021-08-03 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
0 2021-08-04 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
1 2021-08-05 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
2 2021-08-06 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
3 2021-08-01 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
4 2021-08-02 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
5 2021-08-03 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
6 2021-08-04 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
7 2021-08-05 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
8 2021-08-06 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
  • while importing IO, location column got dropped and it got merged with merchant. The solution is working for that but i dont want to merge the merchant and location column.. Please let me know.. It would be great if the same solution can fill forward for missing dates at the month end for by picking the latest available value.. – Vamshi Krishna Sep 06 '21 at 11:02
  • from you sample data I couldn't see how to delim merchant and location. is it the final space? solution really is the same, add location to groupby and construction of series – Rob Raymond Sep 06 '21 at 11:13
  • updated yo include location as well, just systematically added – Rob Raymond Sep 06 '21 at 11:18
-1

Got a simpler version of answer below.

step 1 : Getting the first date of month by resmaple Month start
tst1 = tst.groupby(uni_ind).resample('MS').bfill().reset_index(level=(0,1,2,3,4,5),drop= True).reset_index()
Step 2 : Appending the month first with original df
tst3 = tst.reset_index().append(tst1)
Step 3 : removing duplicates as there might be month starts for few months
tst3.drop_duplicates( inplace= True, ignore_index= False , keep= 'first')
step 4 : setting Date as index for resample function to use
tst3.set_index('Date',inplace=True)
step 5 : Resample the df
tst3.groupby(uni_ind , dropna= False).resample('D').ffill().reset_index( level=(0,1,2,3,4,5),drop= True).reset_index()