I have the following dataframe (called items), for example:
| index | itemID | maintopic | subtopics |
|:----- |:------:|:---------:| ------------------:|
| 1 | 235 | FBR | [FZ, 1RH, FL] |
| 2 | 1787 | NaN | [1RH, YRS, FZ, FL] |
| 3 | 2454 | NaN | [FZX, 1RH, FZL] |
| 4 | 3165 | NaN | [YHS] |
I would like to fill the NaN values in the maintopic column with the first element of the subtopics list that starts with a letter. Does anyone have an idea? (Question No 1)
I tried this, but it didn't work:

    import pandas as pd
    import string
    alphabet = list(string.ascii_lowercase)
    items['maintopic'] = items['maintopic'].apply(lambda x : items['maintopic'].fillna(items['subtopics'][x][0]) if items['subtopics'][x][0].lower().startswith(tuple(alphabet)) else x)

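For reference, the behaviour asked for in Question No 1 could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the subtopics column holds real Python lists of strings (not strings that look like lists), it rebuilds the example table by hand, and `first_alpha` is a hypothetical helper name. Note that `str.isalpha()` also accepts non-ASCII letters, which may or may not be wanted.

```python
import numpy as np
import pandas as pd

# Example data rebuilt from the table above (assumption: subtopics are lists of str).
items = pd.DataFrame(
    {
        "itemID": [235, 1787, 2454, 3165],
        "maintopic": ["FBR", np.nan, np.nan, np.nan],
        "subtopics": [
            ["FZ", "1RH", "FL"],
            ["1RH", "YRS", "FZ", "FL"],
            ["FZX", "1RH", "FZL"],
            ["YHS"],
        ],
    },
    index=[1, 2, 3, 4],
)


def first_alpha(subtopics):
    """Hypothetical helper: first element starting with a letter, else NaN."""
    return next((s for s in subtopics if s[:1].isalpha()), np.nan)


# Compute a candidate per row, then use it only where maintopic is missing.
items["maintopic"] = items["maintopic"].fillna(items["subtopics"].apply(first_alpha))
print(items)
```

With the example data, the existing value FBR is kept and the other rows get YRS, FZX and YHS.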
Advanced (Question No 2): Even better would be to look at all elements of the subtopics list and, if several elements share the first letter or even the first two letters, to use that shared prefix. For example, in row 2 there are FZ and FL, so I would like to fill the maintopic in this row with F. In row 3 there are FZX and FZL, so I would like to fill the maintopic with FZ. But if this is way too complicated, I would also be very happy with an answer to Question No 1.
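One possible reading of Question No 2, sketched under stated assumptions: only elements starting with a letter are considered, every pair of them is compared, the longest shared prefix (capped at two characters) wins, and if nothing is shared the first letter-initial element is used as in Question No 1. The helper name `shared_prefix` and the `max_len` parameter are hypothetical, and how ties between equally long prefixes should be resolved is not specified in the question (this sketch takes the first one found).

```python
from itertools import combinations
from os.path import commonprefix  # character-wise common prefix of strings

import numpy as np
import pandas as pd

# Example data rebuilt from the table above (assumption: subtopics are lists of str).
items = pd.DataFrame(
    {
        "itemID": [235, 1787, 2454, 3165],
        "maintopic": ["FBR", np.nan, np.nan, np.nan],
        "subtopics": [
            ["FZ", "1RH", "FL"],
            ["1RH", "YRS", "FZ", "FL"],
            ["FZX", "1RH", "FZL"],
            ["YHS"],
        ],
    },
    index=[1, 2, 3, 4],
)


def shared_prefix(subtopics, max_len=2):
    """Hypothetical helper: longest prefix (up to max_len chars) shared by any
    two letter-initial elements; falls back to the first such element."""
    candidates = [s for s in subtopics if s[:1].isalpha()]
    if not candidates:
        return np.nan
    # Common prefix of every pair of candidates, capped at max_len characters.
    prefixes = [
        commonprefix([a, b])[:max_len] for a, b in combinations(candidates, 2)
    ]
    best = max(prefixes, key=len, default="")
    # Nothing shared (or only one candidate): behave like Question No 1.
    return best if best else candidates[0]


# Fill only the rows where maintopic is missing.
items["maintopic"] = items["maintopic"].fillna(items["subtopics"].apply(shared_prefix))
print(items)
```

With the example data this keeps FBR in row 1 and fills rows 2 to 4 with F, FZ and YHS respectively.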
I appreciate any help!