3

i'm new to Python so go easy!

I have a Dataframe like the following. I would like to forward fill the NaN's in the shares_owned column but stop when the string in df['ticker'] changes. And only start when another number appears in shares_owned again.

date ticker shares_owned price
01/01/2020 EZY NaN £2
02/01/2020 EZY 10 £2.1
03/01/2020 EZY NaN £2.12
04/01/2020 EZY NaN £12.5
01/01/2020 FTSE NaN £11
02/01/2020 FTSE NaN £12
03/01/2020 FTSE 2 £12.5
04/01/2020 FTSE NaN £12.5

For example, the output table would look like this:

date ticker shares_owned price
01/01/2020 EZY NaN £2
02/01/2020 EZY 10 £2.1
03/01/2020 EZY 10 £2.12
04/01/2020 EZY 10 £12.5
01/01/2020 FTSE NaN £11
02/01/2020 FTSE NaN £12
03/01/2020 FTSE 2 £12.5
04/01/2020 FTSE 2 £12.5

So far I have been trying to use the .fillna(method='ffill') to no avail.

  • Hi @Barnaby, so since you're new, you should read up on providing a [minimum reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) meaning you should provide enough here to show that you've worked the problem, a starting point for your solution, what you expected, and what you got instead. Errors are gold. – mayosten Feb 19 '21 at 16:38
  • You can group the dataframe on `ticker` then `ffill` the `shares_column` per group: `df['shares_owned'] = df.groupby('ticker')['shares_owned'].ffill()`. – Shubham Sharma Feb 19 '21 at 16:40
  • And this could be helpful... looks like they're trying to do the same thing: https://stackoverflow.com/questions/58181262/groupby-with-ffill-deletes-group-and-does-not-put-group-in-index – mayosten Feb 19 '21 at 16:42

1 Answers1

3
  • you note groups, hence groupby() does the grouping
  • within the group fillna(method="fill") within a transform()
df = pd.read_csv(io.StringIO("""date    ticker  shares_owned    price
01/01/2020  EZY NaN £2
02/01/2020  EZY 10  £2.1
03/01/2020  EZY NaN £2.12
04/01/2020  EZY NaN £12.5
01/01/2020  FTSE    NaN £11
02/01/2020  FTSE    NaN £12
03/01/2020  FTSE    2   £12.5
04/01/2020  FTSE    NaN £12.5"""), sep="\t")

df["shares_owned"] = df.groupby("ticker")["shares_owned"].transform(lambda s: s.fillna(method="ffill"))

output

date ticker shares_owned price
0 01/01/2020 EZY nan £2
1 02/01/2020 EZY 10 £2.1
2 03/01/2020 EZY 10 £2.12
3 04/01/2020 EZY 10 £12.5
4 01/01/2020 FTSE nan £11
5 02/01/2020 FTSE nan £12
6 03/01/2020 FTSE 2 £12.5
7 04/01/2020 FTSE 2 £12.5
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30