0

I have an excel file with multiple sheets containing when and where employees went for a sale. The columns are different in all sheets. Example

Sheet 1

 Date/Place      PlaceA  PlaceB   PlaceD   PlaceE   PlaceF  PlaceG
2019-03-01        A        B        C        D        E        F
2019-03-02        A        B        C        D        E        F
2019-03-03        A        B        C        D        E        F
2019-03-08        A        B        C        D        E        F
2019-03-05        A        B        C        T        E        F
2019-03-06        G        B        K        D        N        Y

Sheet 2

 Date/Place         PlaceE   PlaceF  PlaceG PlaceH PlaceI
2019-03-05           T        E        U      A     B
2019-03-06           X        P        Y      N     H

I need to find the duplicates, corresponding to the Date and Place and replace it with an @ symbol using python

The output will be an excel will the same sheets

Sheet 1

 Date/Place      PlaceA  PlaceB   PlaceD   PlaceE   PlaceF  PlaceG
2019-03-01        A        B        C        D        E        F
2019-03-02        A        B        C        D        E        F
2019-03-03        A        B        C        D        E        F
2019-03-08        A        B        C        D        E        F
2019-03-05        A        B        C        T        E        F
2019-03-06        G        B        K        D        N        Y

Sheet 2

 Date/Place         PlaceE   PlaceF  PlaceG PlaceH PlaceI
2019-03-05           @        @        U      A     B
2019-03-06           X        P        @      N     H

The data - T was in PlaceE on 2019-03-05 is seen in both Sheet1 and Sheet2, hence in Sheet2 it is replaced with @. There are multiple sheets in the file and the duplicates needs to checked across all sheets. Thanks in advance for the answer!

I tried using the df.duplicated function from pandas, but it checks only the same sheet and also, the since this is not direct row duplication, the duplication is based on the first column, it did not work out

Zoey Nightshade
  • 126
  • 2
  • 5

1 Answers1

0

Assuming df1/df2, you can align df1 on df2 and mask the cells that are different, except in Date/Place:

tmp = df2.set_index('Date/Place')

out = tmp.mask(tmp.eq(df1.set_index('Date/Place')), '@').reset_index()

Or:

out = df2.mask(
 df2[['Date/Place']].merge(df1, how='left')
 .set_axis(df2.index)
 .eq(df2).assign(**{'Date/Place': False}),
 '@'
 )

Output:

   Date/Place PlaceE PlaceF PlaceG PlaceH PlaceI
0  2019-03-05      @      @      U      A      B
1  2019-03-06      X      P      @      N      H
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Thanks for the answer! This is really helpful. This is a case where there are two dataframes yeah; the excel file has multiple sheets and after the replacement I need the output excel file to also have the same sheets but with the replaced values. Each sheet has to be checked with every other sheet. 1 to be check with 2,3,...n. Then 2 is be checked with 1,3,4,..n. Basically duplication across everything, so maybe like joining all df or something, but the columns are different in all sheets. A bit confused in that area. Any tips on how to do that? – Zoey Nightshade Jul 12 '23 at 06:04