1

I have data from various csv files I am trying to put together. I put it all in one Dataframe. How can I combine the data into the corresponding A, B, C columns and include a header for each row?

for data_base in data:
    base_data.append(data_base['A'])
    base_data.append(data_base[' B'])
    base_data.append(data_base[' C'] )
#    np.append(base_data, np.nan)
df_name = pd.DataFrame(name_join)
df_data = pd.DataFrame(base_data)
trp = np.transpose(df_data)

Actual:

A           B       C       A       B       C       A       B       C
0.7283  0.743   0.01    0.7283  0.7512  0.02    0.7283  0.7456  0.02
0.5165  0.488   0.03    0.5165  0.4756  0.04    0.5165  0.4707  0.05
0.5087  0.4781  0.03    0.5087  0.4611  0.05    0.5087  0.4467  0.06
0.4598  0.4834  0.02    0.4598  0.4938  0.03    0.4598  0.4793  0.02
0.4883  0.5235  0.04    0.4883  0.5173  0.03    0.4883  0.5278  0.04
0.5993  0.6229  0.02    0.5993  0.6223  0.02    0.5993  0.6258  0.03
0.5351  0.5983  0.06    0.5351  0.6029  0.07    0.5351  0.613   0.08
0.6105  0.6314  0.02    0.6105  0.6434  0.03    0.6105  0.6361  0.03
0.5946  0.6495  0.05    0.5946  0.6452  0.05    0.5946  0.6463  0.05
0.7335  0.7506  0.02    0.7335  0.7559  0.02    0.7335  0.7497  0.02

Expected:

    A       B       C
Cow 0.7283  0.743   0.01
    0.5165  0.488   0.03
    0.5087  0.4781  0.03
    0.4598  0.4834  0.02
    0.4883  0.5235  0.04
    0.5993  0.6229  0.02
    0.5351  0.5983  0.06
    0.6105  0.6314  0.02
    0.5946  0.6495  0.05
    0.7335  0.7506  0.02
Cat 0.7283  0.7512  0.02
    0.5165  0.4756  0.04
    0.5087  0.4611  0.05
    0.4598  0.4938  0.03
    0.4883  0.5173  0.03
    0.5993  0.6223  0.02
    0.5351  0.6029  0.07
    0.6105  0.6434  0.03
    0.5946  0.6452  0.05
    0.7335  0.7559  0.02
Dog 0.7283  0.7456  0.02
    0.5165  0.4707  0.05
    0.5087  0.4467  0.06
    0.4598  0.4793  0.02
    0.4883  0.5278  0.04
    0.5993  0.6258  0.03
    0.5351  0.613   0.08
    0.6105  0.6361  0.03
    0.5946  0.6463  0.05
    0.7335  0.7497  0.02
Nycbros
  • 95
  • 8
  • You can use `concat` – ansev Aug 14 '19 at 19:33
  • Does each of your csv have the same amount of columns and the same names ? – Benoit Drogou Aug 14 '19 at 19:34
  • yes they are all the same format – Nycbros Aug 14 '19 at 19:47
  • Why have the `nan` row? Probably easier to deal with the `MultiIndex` – ALollz Aug 14 '19 at 20:03
  • I was planning to replace nan with a custom name – Nycbros Aug 14 '19 at 20:06
  • 1
    That's a bad idea. Your columns become object and then it becomes painful to do simple arithmetic operations like `.sum()`. You should really use a MultiIndex and store the names in the first level: `pd.concat(data, keys=['name_1', 'name_2', 'name_3'])` – ALollz Aug 14 '19 at 20:16
  • Thanks for the heads up. I re-phrased my question a bit. I also don't plan to much arithmetic, if any, functions on my data. I just want to allocate my data into one file rather than clicking through each folder one by one – Nycbros Aug 14 '19 at 20:26
  • @Nycbros You can read each csv into a seperate dataframe and then merge on a unique identifier, see: [pandas documentation on merging dataframes] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas-dataframe-merge] – campellcl Aug 14 '19 at 20:43
  • I still learning the in's and out's of python. How would I do that? – Nycbros Aug 14 '19 at 22:35
  • I understand what you are saying but I don't know how to loop my files into individual Dataframes – Nycbros Aug 14 '19 at 22:36

1 Answers1

0

Here's a solution based on Nycbros comment.

import pandas as pd

# Dummy data
data_double = pd.DataFrame(data=[{'x': x, 'y': 2 * x} for x in range(5)])
data_triple = pd.DataFrame(data=[{'x': x, 'y': 3 * x} for x in range(5)])

print(data_double)

Output:

   x  y
0  0  0
1  1  2
2  2  4
3  3  6
4  4  8
print(data_triple)

Output:

   x   y
0  0   0
1  1   3
2  2   6
3  3   9
4  4  12

# You will need to get a list of keys which equate to your data
data = [data_double, data_triple]
keys = ['Double', 'Triple']

# Concatenate the dataframes in your data array, give it the keys to index with
combo = pd.concat(data, keys=keys)
print(combo)

Output:

          x   y
Double 0  0   0
       1  1   2
       2  2   4
       3  3   6
       4  4   8
Triple 0  0   0
       1  1   3
       2  2   6
       3  3   9
       4  4  12
# If you don't want the original indexes, you can drop them
combo = combo.reset_index(level=1, drop=True)
print(combo)

Output:

        x   y
Double  0   0
Double  1   2
Double  2   4
Double  3   6
Double  4   8
Triple  0   0
Triple  1   3
Triple  2   6
Triple  3   9
Triple  4  12
alexbclay
  • 1,389
  • 14
  • 19