
This is more of a question than a problem I have.

I wanted to .append() some pandas Series together and, without thinking, I just did total = series1 + series2 + series3.

The lengths of the series are 2199902, 171175, and 178989 respectively, and sum(pd.isnull(i) for i in total) = 2214596.

P.S. All 3 Series had no null values to start with. Is it something to do with merging 3 Series of different lengths, which creates missing values? Even if that is the case, why are 2,214,596 null values created?

cs95
mystery man

2 Answers


If you're trying to append series, you're doing it wrong. The + operator calls .add, which adds the corresponding elements of the series, matched by index label. If your series are not aligned, this results in a lot of NaNs being generated.
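To see the alignment at work, here is a minimal sketch using three small Series whose index labels only partly overlap (the values and labels are made up for illustration). Any label missing from at least one operand comes out as NaN, and the result has one row per label in the union of the indexes, so the NaN count can exceed the length of any single input.

```python
import pandas as pd

# Illustrative data: the index labels deliberately do not line up.
s1 = pd.Series([1, 2, 4, 5])                # labels 0, 1, 2, 3
s2 = pd.Series([4, 7], index=[10, 11])      # labels 10, 11
s3 = pd.Series([40, 70], index=[2, 4])      # labels 2, 4

# + aligns on index labels; no label is present in all three Series.
total = s1 + s2 + s3

print(len(total))          # 7 (size of the union of all labels)
print(total.isna().sum())  # 7 (every label is missing from some operand)
```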

If you're looking to append these together into one long series, you can use pd.concat:

pd.concat([s1, s2, s3], ignore_index=True)
0     1
1     2
2     4
3     5
4     4
5     7
6    40
7    70
dtype: int64

If you're going to use append (note that Series.append was deprecated in pandas 1.4 and removed in 2.0), you can do this in a loop, or with reduce:

s = s1

for i in [s2, s3]:
    s = s.append(i, ignore_index=True)

s
0     1
1     2
2     4
3     5
4     4
5     7
6    40
7    70
dtype: int64

from functools import reduce

reduce(lambda x, y: x.append(y, ignore_index=True), [s1, s2, s3])

0     1
1     2
2     4
3     5
4     4
5     7
6    40
7    70
dtype: int64

Both solutions generalise to multiple series quite nicely, but they are slow in comparison to pd.concat or np.concatenate.
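A rough way to check the speed claim yourself is sketched below. It is only an assumption-laden benchmark: the data is random, the helper names (pairwise, one_shot, via_numpy) are hypothetical, and pairwise uses repeated pd.concat as a stand-in for the append loop, since Series.append is not available in pandas 2.0 and later. Timings depend on your machine and pandas version, so no expected numbers are shown.

```python
import timeit
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical test data: 200 Series of 1,000 random integers each.
rng = np.random.default_rng(0)
parts = [pd.Series(rng.integers(0, 100, size=1_000)) for _ in range(200)]

def pairwise(series_list):
    # Stand-in for the append loop: a new Series is built at every step.
    return reduce(lambda x, y: pd.concat([x, y], ignore_index=True), series_list)

def one_shot(series_list):
    # A single concatenation over the whole list.
    return pd.concat(series_list, ignore_index=True)

def via_numpy(series_list):
    # Concatenate the underlying arrays, then wrap the result in a Series.
    return pd.Series(np.concatenate([s.to_numpy() for s in series_list]))

# All three approaches should produce the same Series.
assert pairwise(parts).equals(one_shot(parts))
assert via_numpy(parts).equals(one_shot(parts))

for fn in (pairwise, one_shot, via_numpy):
    seconds = timeit.timeit(lambda: fn(parts), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

The pairwise version copies the accumulated data on every step, which is the usual reason it falls behind as the number of series grows.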

cs95

When you sum Series, they are aligned on their index. So if an index label exists in series1 but not in another Series, you get NaN for that label.

So you need to use add with fill_value=0:

s = s1.add(s2, fill_value=0).add(s3, fill_value=0)

Sample:

import pandas as pd

s1 = pd.Series([1,2,4,5])
s2 = pd.Series([4,7], index=[10,11])
s3 = pd.Series([40,70], index=[2,4])

s = s1.add(s2, fill_value=0).add(s3, fill_value=0)
print (s)
0      1.0
1      2.0
2     44.0
3      5.0
4     70.0
10     4.0
11     7.0
dtype: float64
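The chained .add calls above can also be folded over a list, which may be handier when there are more than three Series. This is a sketch rather than the answer's own method: the names summed and summed_int are made up here, and the cast back to integers is only safe because no NaN remains in this particular result.

```python
from functools import reduce

import pandas as pd

s1 = pd.Series([1, 2, 4, 5])
s2 = pd.Series([4, 7], index=[10, 11])
s3 = pd.Series([40, 70], index=[2, 4])

# Fold .add over the list; fill_value=0 treats a label that is missing
# from one side as 0 instead of producing NaN.
summed = reduce(lambda a, b: a.add(b, fill_value=0), [s1, s2, s3])

# The sample output above is float64; cast back to int since no NaN is left.
summed_int = summed.astype("int64")
print(summed_int.to_dict())
# {0: 1, 1: 2, 2: 44, 3: 5, 4: 70, 10: 4, 11: 7}
```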

But if you need to append them together (or use concat, as cᴏʟᴅsᴘᴇᴇᴅ mentioned):

s = s1.append(s2, ignore_index=True).append(s3, ignore_index=True)
print (s)
0     1
1     2
2     4
3     5
4     4
5     7
6    40
7    70
dtype: int64

And a NumPy alternative:

import numpy as np

# alternative, thanks cᴏʟᴅsᴘᴇᴇᴅ: np.concatenate([s1, s2, s3])
s = pd.Series(np.concatenate([s1.values, s2.values, s3.values]))

print (s)
0     1
1     2
2     4
3     5
4     4
5     7
6    40
7    70
dtype: int64

If you want to use + for appending, then you need to convert the Series to lists first:

s = pd.Series(s1.tolist() + s2.tolist() + s3.tolist())
print (s)
0     1
1     2
2     4
3     5
4     4
5     7
6    40
7    70
dtype: int64
jezrael