
I have a data frame built by starting from a blank data frame and concatenating multiple data frames onto it in a loop, using the following: `final = pd.concat([final, out], axis=1, sort=True)`. That gave me something similar to this (a minimal sketch of the loop follows the table):

Date    Count   Date    Count   Date    Count   Date    Count
1/1/2019    1   1/1/2019    1   N/A N/A 1/1/2019    1
1/2/2019    1   1/2/2019    1   1/2/2019    1   1/2/2019    1
1/3/2019    1   1/3/2019    1   1/3/2019    1   1/3/2019    1
N/A N/A 1/4/2019    1   1/4/2019    1   1/4/2019    1
1/5/2019    1   1/5/2019    1   1/5/2019    1   1/5/2019    1
1/6/2019    1   1/6/2019    1   1/6/2019    1   N/A N/A
N/A N/A 1/7/2019    1   1/7/2019    1   1/7/2019    1
1/8/2019    1   1/8/2019    1   N/A N/A 1/8/2019    1
1/9/2019    1   1/9/2019    1   1/9/2019    1   1/9/2019    1
N/A N/A N/A N/A 1/10/2019   1   1/10/2019   1
1/11/2019   1   1/11/2019   1   1/11/2019   1   1/11/2019   1
1/12/2019   1   1/12/2019   1   1/12/2019   1   1/12/2019   1
1/13/2019   1   1/13/2019   1   1/13/2019   1   N/A N/A
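For reference, here is a minimal sketch of the kind of loop I am describing; the small `out` frames below are made-up stand-ins for the real ones, which come from elsewhere in my script.

import pandas as pd

outs = [
    pd.DataFrame({'Date': ['1/1/2019', '1/2/2019'], 'Count': [1, 1]}),
    pd.DataFrame({'Date': ['1/2/2019', '1/3/2019', '1/4/2019'], 'Count': [1, 1, 1]}),
]

final = pd.DataFrame()
for out in outs:
    # axis=1 pastes the columns side by side, aligning on the row index
    # rather than on Date, so the Date columns drift out of sync
    final = pd.concat([final, out], axis=1, sort=True)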

However, my goal is to get this:

Date    Count   Count   Count   Count
1/1/2019    1   1   N/A 1
1/2/2019    1   1   1   1
1/3/2019    1   1   1   1
1/4/2019    N/A 1   1   1
1/5/2019    1   1   1   1
1/6/2019    1   1   1   N/A
1/7/2019    N/A 1   1   1
1/8/2019    1   1   N/A 1
1/9/2019    1   1   1   1
1/10/2019   N/A N/A 1   1
1/11/2019   1   1   1   1
1/12/2019   1   1   1   1
1/13/2019   1   1   1   N/A
  • Does this help? https://stackoverflow.com/questions/13078751/combine-duplicated-columns-within-a-dataframe/54300430#54300430 You can use `df.groupby(level=1, axis=1).max()`. – cs95 Jan 28 '19 at 20:32
  • `pd.concat([df[['Date']].ffill().iloc[:,-1], df[['Count']]], 1)`, and please avoid using duplicate column names in a `DataFrame`. – BENY Jan 28 '19 at 20:32

2 Answers


You are using `concat` when you want to be using `merge`. I'm assuming that `out` holds each iteration's data, with some Date values missing from run to run. Each round of the loop should instead be:

 final = final.merge(out, on='Date', how='outer')

You also might want to use suffixes that make sense for your data, for example `suffixes=['', 'new_data']` in the merge (e.g. `final = final.merge(out, on='Date', how='outer', suffixes=['', 'new_data'])`). That will help you understand which data came from where.
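If it helps, here is a rough sketch of what the whole loop could look like with merge; `outs` is just a placeholder for however your individual frames are produced, and the numeric suffix is only illustrative:

import pandas as pd

final = pd.DataFrame(columns=['Date'])
for i, out in enumerate(outs):  # `outs` stands in for your source of frames
    # the outer merge lines everything up on Date instead of on the row index,
    # and the suffix keeps each run's Count column distinguishable
    final = final.merge(out, on='Date', how='outer', suffixes=['', f'_run{i}'])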

Polkaguy6000
  • This would be the correct answer if I had not already used a loop to concat each individual piece one at a time. If I were to redo my approach, this would have been the right one. I'd mark it, but I don't have enough reputation. – Brandon Campbell Feb 07 '19 at 16:18

From what I can see, you want to combine your Date columns so there are no missing values in the first Date column.

Here is the input data

df = pd.read_clipboard()
print(df)
         Date  Count     Date.1  Count.1     Date.2  Count.2     Date.3  Count.3
0    1/1/2019    1.0   1/1/2019      1.0        NaN      NaN   1/1/2019      1.0
1    1/2/2019    1.0   1/2/2019      1.0   1/2/2019      1.0   1/2/2019      1.0
2    1/3/2019    1.0   1/3/2019      1.0   1/3/2019      1.0   1/3/2019      1.0
3         NaN    NaN   1/4/2019      1.0   1/4/2019      1.0   1/4/2019      1.0
4    1/5/2019    1.0   1/5/2019      1.0   1/5/2019      1.0   1/5/2019      1.0
5    1/6/2019    1.0   1/6/2019      1.0   1/6/2019      1.0        NaN      NaN
6         NaN    NaN   1/7/2019      1.0   1/7/2019      1.0   1/7/2019      1.0
7    1/8/2019    1.0   1/8/2019      1.0        NaN      NaN   1/8/2019      1.0
8    1/9/2019    1.0   1/9/2019      1.0   1/9/2019      1.0   1/9/2019      1.0
9         NaN    NaN        NaN      NaN  1/10/2019      1.0  1/10/2019      1.0
10  1/11/2019    1.0  1/11/2019      1.0  1/11/2019      1.0  1/11/2019      1.0
11  1/12/2019    1.0  1/12/2019      1.0  1/12/2019      1.0  1/12/2019      1.0
12  1/13/2019    1.0  1/13/2019      1.0  1/13/2019      1.0        NaN      NaN

One possible approach is then to fill the NaNs in the first Date column from the other Date columns, one at a time (with this data, Date.3 turns out not to be needed):

# fill gaps in the first Date column from the later Date columns, then drop them
df['Date'].fillna(df['Date.1'], inplace=True)
df['Date'].fillna(df['Date.2'], inplace=True)
df = df.drop(['Date.1','Date.2','Date.3'], axis=1)

Output

print(df)
         Date  Count  Count.1  Count.2  Count.3
0    1/1/2019    1.0      1.0      NaN      1.0
1    1/2/2019    1.0      1.0      1.0      1.0
2    1/3/2019    1.0      1.0      1.0      1.0
3    1/4/2019    NaN      1.0      1.0      1.0
4    1/5/2019    1.0      1.0      1.0      1.0
5    1/6/2019    1.0      1.0      1.0      NaN
6    1/7/2019    NaN      1.0      1.0      1.0
7    1/8/2019    1.0      1.0      NaN      1.0
8    1/9/2019    1.0      1.0      1.0      1.0
9   1/10/2019    NaN      NaN      1.0      1.0
10  1/11/2019    1.0      1.0      1.0      1.0
11  1/12/2019    1.0      1.0      1.0      1.0
12  1/13/2019    1.0      1.0      1.0      NaN
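As a possible shortcut, the same idea could be written more compactly by working on the frame read with `read_clipboard()` above, before any columns are dropped: back-fill across the Date columns so the first non-missing date in each row is kept, then reattach the Count columns.

date_cols = df.filter(like='Date')      # Date, Date.1, Date.2, Date.3
count_cols = df.filter(like='Count')    # Count, Count.1, Count.2, Count.3
# first non-missing date in each row, followed by all of the Count columns
combined = pd.concat([date_cols.bfill(axis=1).iloc[:, 0], count_cols], axis=1)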
edesz
  • Sorry to everyone for the delay in marking the answer. This is probably not the best approach, but it is the simplest way to solve my problem without changing how the rest of my script works. – Brandon Campbell Feb 07 '19 at 16:14