Combine repeated columns in the same Dataframe

Question

I have data from various csv files I am trying to put together. I put it all in one Dataframe. How can I combine the data into the corresponding A, B, C columns and include a header for each row?

for data_base in data:
    base_data.append(data_base['A'])
    base_data.append(data_base[' B'])
    base_data.append(data_base[' C'] )
#    np.append(base_data, np.nan)
df_name = pd.DataFrame(name_join)
df_data = pd.DataFrame(base_data)
trp = np.transpose(df_data)

Actual:

A           B       C       A       B       C       A       B       C
0.7283  0.743   0.01    0.7283  0.7512  0.02    0.7283  0.7456  0.02
0.5165  0.488   0.03    0.5165  0.4756  0.04    0.5165  0.4707  0.05
0.5087  0.4781  0.03    0.5087  0.4611  0.05    0.5087  0.4467  0.06
0.4598  0.4834  0.02    0.4598  0.4938  0.03    0.4598  0.4793  0.02
0.4883  0.5235  0.04    0.4883  0.5173  0.03    0.4883  0.5278  0.04
0.5993  0.6229  0.02    0.5993  0.6223  0.02    0.5993  0.6258  0.03
0.5351  0.5983  0.06    0.5351  0.6029  0.07    0.5351  0.613   0.08
0.6105  0.6314  0.02    0.6105  0.6434  0.03    0.6105  0.6361  0.03
0.5946  0.6495  0.05    0.5946  0.6452  0.05    0.5946  0.6463  0.05
0.7335  0.7506  0.02    0.7335  0.7559  0.02    0.7335  0.7497  0.02

Expected:

    A       B       C
Cow 0.7283  0.743   0.01
    0.5165  0.488   0.03
    0.5087  0.4781  0.03
    0.4598  0.4834  0.02
    0.4883  0.5235  0.04
    0.5993  0.6229  0.02
    0.5351  0.5983  0.06
    0.6105  0.6314  0.02
    0.5946  0.6495  0.05
    0.7335  0.7506  0.02
Cat 0.7283  0.7512  0.02
    0.5165  0.4756  0.04
    0.5087  0.4611  0.05
    0.4598  0.4938  0.03
    0.4883  0.5173  0.03
    0.5993  0.6223  0.02
    0.5351  0.6029  0.07
    0.6105  0.6434  0.03
    0.5946  0.6452  0.05
    0.7335  0.7559  0.02
Dog 0.7283  0.7456  0.02
    0.5165  0.4707  0.05
    0.5087  0.4467  0.06
    0.4598  0.4793  0.02
    0.4883  0.5278  0.04
    0.5993  0.6258  0.03
    0.5351  0.613   0.08
    0.6105  0.6361  0.03
    0.5946  0.6463  0.05
    0.7335  0.7497  0.02

Does each of your csv have the same amount of columns and the same names ? — Benoit Drogou, Aug 14 '19 at 19:34
Why have the `nan` row? Probably easier to deal with the `MultiIndex` — ALollz, Aug 14 '19 at 20:03
That's a bad idea. Your columns become object and then it becomes painful to do simple arithmetic operations like `.sum()`. You should really use a MultiIndex and store the names in the first level: `pd.concat(data, keys=['name_1', 'name_2', 'name_3'])` — ALollz, Aug 14 '19 at 20:16
Thanks for the heads up. I re-phrased my question a bit. I also don't plan to much arithmetic, if any, functions on my data. I just want to allocate my data into one file rather than clicking through each folder one by one — Nycbros, Aug 14 '19 at 20:26
@Nycbros You can read each csv into a seperate dataframe and then merge on a unique identifier, see: [pandas documentation on merging dataframes] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas-dataframe-merge] — campellcl, Aug 14 '19 at 20:43
I still learning the in's and out's of python. How would I do that? — Nycbros, Aug 14 '19 at 22:35
I understand what you are saying but I don't know how to loop my files into individual Dataframes — Nycbros, Aug 14 '19 at 22:36

score 0 · Answer 1 · answered Aug 15 '19 at 19:14

Here's a solution based on Nycbros comment.

import pandas as pd

# Dummy data
data_double = pd.DataFrame(data=[{'x': x, 'y': 2 * x} for x in range(5)])
data_triple = pd.DataFrame(data=[{'x': x, 'y': 3 * x} for x in range(5)])

print(data_double)

Output:

print(data_triple)

Output:

# You will need to get a list of keys which equate to your data
data = [data_double, data_triple]
keys = ['Double', 'Triple']

# Concatenate the dataframes in your data array, give it the keys to index with
combo = pd.concat(data, keys=keys)
print(combo)

Output:

          x   y
Double 0  0   0
       1  1   2
       2  2   4
       3  3   6
       4  4   8
Triple 0  0   0
       1  1   3
       2  2   6
       3  3   9
       4  4  12

# If you don't want the original indexes, you can drop them
combo = combo.reset_index(level=1, drop=True)
print(combo)

Output:

        x   y
Double  0   0
Double  1   2
Double  2   4
Double  3   6
Double  4   8
Triple  0   0
Triple  1   3
Triple  2   6
Triple  3   9
Triple  4  12

Combine repeated columns in the same Dataframe

1 Answers1