0

I have a data such as below:

x x y y
0 1 2 3

I want to merge duplicate columns one below another row wise like this

x y
0 2
1 3

Is there a way I can achieve this in pandas dataframe?

  • 1
    What have you tried so far? – FloLie Jul 22 '21 at 12:22
  • 1
    Is it guaranteed that the number of rows for each column name will be equal in the end? So is N(x1) + N(x2) + N(xn) == N(y1) + N(y2).. – FloLie Jul 22 '21 at 12:23
  • Beny has the perfect answer for this. Check this answer: https://stackoverflow.com/a/50083006/6660373. `df.groupby(df.columns.values, axis=1).agg(lambda x: x.values.tolist()).sum().apply(pd.Series).T ` – Pygirl Jul 22 '21 at 13:09
  • Does this answer your question? [Unpivot multiple columns with same name in pandas dataframe](https://stackoverflow.com/questions/50082846/unpivot-multiple-columns-with-same-name-in-pandas-dataframe) – Pygirl Jul 22 '21 at 13:10
  • @Pygirl correct me if I am wrong, but that does not work, if the pivoted data is read from file right? – FloLie Jul 22 '21 at 13:14
  • @FloLie: I guess yes it won;t work. It's only work when you are having a dataframe in this form. – Pygirl Jul 22 '21 at 13:17

1 Answers1

0

If the data is the result of a groupby / pivot on an existing dataframe @Pygirl comment is corrent and I refer to her/his link as Link

However it does not work if the data is to be read from file, as pandas does not allow for duplicate named columns and modifies them as name, name.1, name.n.

My answer refers to that case with dummy input test_import

X,X,Y
1,2,1
3,4,2
5,6,3

It basically finds all associated columns x -> (X, X.1,..) Creates a dict with unique keys, and concatinates the associated columns together.

import pandas as pd
import re

df = pd.read_csv("test_import", sep=",")
pattern = r"([^\.]*)\.{0,1}.*"
unique_header = set(re.match(pattern, i).groups()[0] for i in df.columns)
result = dict()

for header in unique_header:
    result[header] = []
    for column in df.columns:
        if column[:len(header)] == header:
            result[header].append(df[column])
    result[header] = pd.concat(result[header])
    
final = pd.DataFrame()
for key, value in result.items():
    final[key] = value

print(final)

I do admit, that this is not a sexy answer and I am happy for input for a leaner way, but I didn't come up with one.

   X  Y
0  1  1
1  3  2
2  5  3
0  2  1
1  4  2
2  6  3
FloLie
  • 1,820
  • 1
  • 7
  • 19