
I have a DataFrame in which I would like to rearrange the data of a given column.

What I have:

    text                                                KEYWORD
0   Fetch.ai will transform economies, healthcare,...   supplies chain issues
1                                                       self
2                                                       secured key partnership
3                                                       real world challenge
4                                                       autonomous economic agent
5                                                       learning traffic signal
6                                                       autonomous machine learning
7                                                       disruptive ai tech
8                                                       parking issues
9                                                       traffic reduction
10      
11      
12  The two most popular cryptocurrencies on the p...   bitcoin
13                                                      limited supplies
14                                                      ethereum
    

What I would like:

    text                                                KEYWORD
0   Fetch.ai will transform economies, healthcare,...   supplies chain issues, self, secured key partnership, real world challenge, autonomous economic agent, learning traffic signal, autonomous machine learning, disruptive ai tech, parking issues, traffic reduction
1   The two most popular cryptocurrencies on the p...   bitcoin, limited supplies, ethereum

Each text is displayed in the "text" column. The "text" column has been analyzed, and the keywords extracted from it are displayed in the "KEYWORD" column. The annoying part is that if 10 keywords are extracted from a text, 10 rows are created, with 1 keyword per row. I would like to join all of these keywords into a single row (the one holding the corresponding text).

Unfortunately, I do not have access to the keyword extraction process, which was done by a piece of software.
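For anyone who wants to reproduce the layout, here is a minimal sketch of the input as code. It assumes the blank cells are empty strings and keeps the texts truncated exactly as they appear in the display above; the variable name `df` is only illustrative.

```python
import pandas as pd

# Texts are kept truncated ("...") exactly as displayed; blank cells are "".
fetch = "Fetch.ai will transform economies, healthcare,..."
crypto = "The two most popular cryptocurrencies on the p..."

df = pd.DataFrame({
    # One text, nine blank cells, two fully blank rows, then the second text.
    "text": [fetch] + [""] * 11 + [crypto] + [""] * 2,
    "KEYWORD": [
        "supplies chain issues", "self", "secured key partnership",
        "real world challenge", "autonomous economic agent",
        "learning traffic signal", "autonomous machine learning",
        "disruptive ai tech", "parking issues", "traffic reduction",
        "", "",  # rows 10 and 11 are empty in both columns
        "bitcoin", "limited supplies", "ethereum",
    ],
})
```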

rpanai
Zion
  • Please don't post your data as screenshots. Read it (using `pd.read_csv` or whatever you prefer) and post the output as code. Is the Text in rows 3, 4, etc. empty strings like "" or `NaN`? – not_speshal Nov 11 '21 at 15:06
  • @not_speshal sorry about that. They are empty strings "" – Zion Nov 11 '21 at 15:09

1 Answer


Try with groupby:

import numpy as np

# replace blank (empty or whitespace-only) cells with NaN
df = df.replace(r"^\s*$", np.nan, regex=True)

# drop rows that are all NaN, then forward fill the text
df = df.dropna(how="all").ffill()

# group by text and join the keywords of each group
output = df.groupby("text", as_index=False)["KEYWORD"].agg(", ".join)

>>> output
                                                text                                            KEYWORD
0  Fetch.ai will transform economies, healthcare,...  supplies chain issues, self, secured key partn...
1  The two most popular cryptocurrencies on the p...                bitcoin, limited supplies, ethereum
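Note that grouping on the text itself merges any two blocks that happen to share the same text. If that matters, one possible variant is to group on a running block counter instead. This is only a sketch under the assumption that every block starts with a row holding its text; the helper name `collapse_keywords` and the placeholder values in `demo` are made up for illustration.

```python
import numpy as np
import pandas as pd

def collapse_keywords(df, text_col="text", kw_col="KEYWORD"):
    """Join the keywords of each block of rows into one row (hypothetical helper)."""
    # Treat empty or whitespace-only cells as missing, then drop fully empty rows.
    cleaned = df.replace(r"^\s*$", np.nan, regex=True).dropna(how="all")
    # A new block starts at every row that carries its own text.
    block = cleaned[text_col].notna().cumsum().rename("block")
    return (
        cleaned.groupby(block)
        .agg({text_col: "first", kw_col: lambda s: ", ".join(s.dropna())})
        .reset_index(drop=True)
    )

# Placeholder data: two separate blocks that share the same text "A".
demo = pd.DataFrame({
    "text": ["A", "", "", "", "A", ""],
    "KEYWORD": ["k1", "k2", "", "", "k3", "k4"],
})
result = collapse_keywords(demo)
```

Here `result` keeps the two "A" blocks as separate rows, in their original order, whereas grouping on `text` would combine all four keywords into one row.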
not_speshal
  • Thanks for your help. Unfortunately it does not work. With the real df, it replicates the text with the number of keywords. I will add info in the question above. – Zion Nov 11 '21 at 15:51
  • @Loremima - Yes, it replicates the text in the input DataFrame. But I thought you only need the `output`. How does it matter? – not_speshal Nov 11 '21 at 16:01