2

I have a header DataFrame df_h

df_h = pd.DataFrame({'k':[1,2],'h1':['a','d'],'h2':['b','e'],'h3':['c','f']})

    k  h1  h2  h3
0   1   a   b   c
2   2   d   e   f

and a detail DataFrame df_d

df_d = pd.DataFrame({'k':[1,1,2,2],'d1':[10,11,12,13],'d2':[20,21,22,23],'d3':[30,31,32,33]})

    k  d1  d2  d3
0   1  10  20  30
1   1  11  21  31
2   2  12  22  32
3   2  13  23  33

Then I left joined them in df on k

df = df_h.merge(df_d, how='left', on='k')

    k  h1  h2  h3  d1  d2  d3
0   1   a   b   c  10  20  30
1   1   a   b   c  11  21  31
2   2   d   e   f  12  22  32
2   2   d   e   f  13  23  33

I want to_csv it, but I need it to end up in this way:

    k  h1  h2  h3  d1  d2  d3
0   1   a   b   c  10  20  30
1                  11  21  31
2   2   d   e   f  12  22  32
2                  13  23  33

with no repeated df_h rows.

Is there an option on to_csv or what do I need to do on df to make it look that way?

Notes.

There may be repeated values on h1, h2, h3, d1, d2, or d3.

kis not repeated on df_h.

Rosa Alejandra
  • 732
  • 5
  • 21
  • Possible duplicate of [Replace duplicate values across columns in Pandas](https://stackoverflow.com/questions/39907315/replace-duplicate-values-across-columns-in-pandas) – user3483203 Mar 08 '18 at 23:23

1 Answers1

3

With your result, you can use pd.DataFrame.duplicated to apply your mask and .loc accessor to update:

cols = ['k', 'h1', 'h2', 'h3']
df.loc[df.duplicated(cols), cols] = ''

#    k h1 h2 h3  d1  d2  d3
# 0  1  a  b  c  10  20  30
# 1              11  21  31
# 2  2  d  e  f  12  22  32
# 2              13  23  33

Then just use df.to_csv() as normal.

jpp
  • 159,742
  • 34
  • 281
  • 339