I have a pandas dataframe in Python listing a bunch of tweets with their id, creation time and the ids of the tweets each one has interacted with, e.g. 006 retweeted 004 and 003 quoted 999 (999 is an old tweet, not listed here). The table is sorted by creation time.
+----------+---------------------+-------------+--------------+-----------+
| tweet_id | created_at | reply_to_id | retweeted_id | quoted_id |
+----------+---------------------+-------------+--------------+-----------+
| 001 | 2020-02-24 15:51:17 | nan | 000 | nan |
| 002 | 2020-02-24 15:52:17 | nan | nan | nan |
| 003 | 2020-02-24 15:53:17 | nan | nan | 999 |
| 004 | 2020-02-24 15:54:17 | 001 | nan | nan |
| 005 | 2020-02-24 15:55:17 | nan | nan | nan |
| 006 | 2020-02-24 15:56:17 | nan | 004 | 003 |
| 007 | 2020-02-24 15:57:17 | nan | nan | 003 |
| 008 | 2020-02-24 15:58:17 | nan | nan | 006 |
| 009 | 2020-02-24 15:59:17 | 006 | nan | nan |
| 010 | 2020-02-24 16:00:17 | nan | 008 | nan |
+----------+---------------------+-------------+--------------+-----------+
I am trying to write a function to find the interaction history of a single tweet, e.g. 010 retweeted 008, 008 quoted 006, 006 retweeted 004 and also quoted 003, 004 replied to 001, and 003 quoted 999. I would like this function to return a list of tweets that traces back 010's history.
In other words, I would like:
input: '010'
output: ['008', '006', '004', '003', '001', '999']
Code to generate this toy dataframe:
import numpy as np
import pandas as pd

# Note: np.array stores every cell as a string, so np.nan becomes the string 'nan'.
# Rows 004 and 006 are filled in to match the table above.
df = pd.DataFrame(np.array(
    [['001', '2020-02-24 15:51:17', np.nan, '000', np.nan],
     ['002', '2020-02-24 15:52:17', np.nan, np.nan, np.nan],
     ['003', '2020-02-24 15:53:17', np.nan, np.nan, '999'],
     ['004', '2020-02-24 15:54:17', '001', np.nan, np.nan],
     ['005', '2020-02-24 15:55:17', np.nan, np.nan, np.nan],
     ['006', '2020-02-24 15:56:17', np.nan, '004', '003'],
     ['007', '2020-02-24 15:57:17', np.nan, np.nan, '003'],
     ['008', '2020-02-24 15:58:17', np.nan, np.nan, '006'],
     ['009', '2020-02-24 15:59:17', '006', np.nan, np.nan],
     ['010', '2020-02-24 16:00:17', np.nan, '008', np.nan]]),
    columns=['tweet_id', 'created_at', 'reply_to_id', 'retweeted_id', 'quoted_id'])
I guess it might involve some kind of recursive search? I can only handle the case where there is a single type of interaction (for example, if tweets could only reply to each other). I am not sure how to handle a tweet like 006, which interacted with two tweets and so creates two branches. Hope to get some help from you guys!
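One way to think about the branching is to treat the table as a directed graph (each tweet points at the tweets it replied to, retweeted or quoted) and walk it breadth-first, so that both branches from 006 are followed. The following is only a minimal sketch under assumptions, not a definitive solution: the names LINK_COLS, build_links and trace_history are made up for illustration, the data is retyped from the table above without the created_at column, and missing values are assumed to be either real NaN or the string 'nan'.

```python
from collections import deque

import numpy as np
import pandas as pd

# Toy data mirroring the table in the question (only the id columns matter here).
df = pd.DataFrame(
    [['001', np.nan, '000', np.nan],
     ['002', np.nan, np.nan, np.nan],
     ['003', np.nan, np.nan, '999'],
     ['004', '001', np.nan, np.nan],
     ['005', np.nan, np.nan, np.nan],
     ['006', np.nan, '004', '003'],
     ['007', np.nan, np.nan, '003'],
     ['008', np.nan, np.nan, '006'],
     ['009', '006', np.nan, np.nan],
     ['010', np.nan, '008', np.nan]],
    columns=['tweet_id', 'reply_to_id', 'retweeted_id', 'quoted_id'])

# Hypothetical constant: the columns that hold links to other tweets.
LINK_COLS = ['reply_to_id', 'retweeted_id', 'quoted_id']


def build_links(frame):
    """Map each tweet_id to the ids it replied to, retweeted, or quoted."""
    links = {}
    for row in frame.itertuples(index=False):
        targets = []
        for col in LINK_COLS:
            value = getattr(row, col)
            # Skip real NaN as well as the string 'nan'.
            if pd.notna(value) and str(value) != 'nan':
                targets.append(str(value))
        links[str(row.tweet_id)] = targets
    return links


def trace_history(frame, tweet_id):
    """Breadth-first walk over every tweet reachable from tweet_id."""
    links = build_links(frame)
    seen = {tweet_id}
    queue = deque([tweet_id])
    history = []
    while queue:
        current = queue.popleft()
        # Tweets missing from the table (e.g. '999') have no outgoing links.
        for target in links.get(current, []):
            if target not in seen:
                seen.add(target)
                history.append(target)
                queue.append(target)
    return history


print(trace_history(df, '010'))
# ['008', '006', '004', '003', '001', '999', '000']
```

Note that this sketch also returns '000', because the table shows 001 retweeting 000; the expected output listed in the question leaves that id out. The seen set keeps a tweet from being listed twice when two branches lead to the same tweet.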