Extracting a list within a list in a tuple which happens to be in a pd.Series

Question

x =
[[(some text,[a]), (some text,[b]), (some text,[c]).........]]
[[(some text,[d]), (some text,[e]), (some text,[f]).........]]
[[(some text,[g]), (some text,[h]), (some text,[k]).........]]
[[(some text,[i]), (some text,[x]), (some text,[y]).........]]
[[(some text,[z]), (some text,[t]), (some text,[w]).........]]
[[(some text,[t]), (some text,[g]), (some text,[u]).........]]

type(x)

pandas.core.series.Series

I want to create a series that contains only the values of the lists inside the tuples, such as [a], [u] or [w].

How can I extract? Thank you.

UPDATE: I realized the way I phrased the question was confusing, so I changed it; it now represents my problem better. Basically, I need to extract all of the inner lists such as [a], [u] or [w], row by row. This is tokenized text data: they are the words in sentences. Sorry for the confusion.
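To make the nesting concrete, here is a minimal sketch that builds a small Series shaped like `x` above and pulls the tokens out row by row with a plain list comprehension. The `"some text"` strings and single-letter tokens are placeholders, not real data:

```python
import pandas as pd

# Hypothetical sample data with the same nesting as x:
# each row is a list holding one list of (text, [token]) tuples
x = pd.Series([
    [[("some text", ["a"]), ("some text", ["b"]), ("some text", ["c"])]],
    [[("some text", ["d"]), ("some text", ["e"]), ("some text", ["f"])]],
])

# For each row: unwrap the outer list, take the second item of every tuple,
# and flatten the singleton token lists into one list of tokens
tokens = x.apply(lambda row: [tok for _, toks in row[0] for tok in toks])
print(tokens)
```

Each row of `tokens` is then a plain list of words (`['a', 'b', 'c']`, `['d', 'e', 'f']`), which can be joined or counted as needed.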

Try `pd.Series([i for _, i in x])`? – Chris Feb 07 '22 at 05:48

keramat · Answer 1 · score 1 · 2022-02-07T07:28:04.020

Use:

import pandas as pd

x = [('a', ['b']), ('c', ['d']), ('e', ['f'])]
x1 = pd.Series(x)
x1.apply(lambda y: y[1])  # second element (the list) of each tuple

The result is a Series holding each tuple's list: `['b']`, `['d']`, `['f']`.

Based on your comment:

import ast
temp = pd.Series(["[[('aaaa', ['bbbb']), ('cccc', ['ddddd'])]]", "[[('a',['b']), ('c',['d']), ('e',['f'])]]"])
temp.apply(lambda s: [t[1] for t in ast.literal_eval(s)[0]])  # literal_eval parses the string rows more safely than eval

And the result is one list per row: `[['bbbb'], ['ddddd']]` and `[['b'], ['d'], ['f']]`.

edited Feb 07 '22 at 07:28

answered Feb 07 '22 at 05:46

keramat

Thank you for the reply. I realized the variable is in another list too. I edited. Can you check it again? Sorry. – Oner Yigit Feb 07 '22 at 06:06
Check the answer now. – keramat Feb 07 '22 at 07:38
Thank you, my question was not clear enough. I updated the question. I hope it is clear now. – Oner Yigit Feb 07 '22 at 14:27

Shivang Kakkar · Answer 2 · score 1 · 2022-02-07T07:27:45.050

This should work:

old = [[('a', ['b']), ('c', ['d']), ('e', ['f'])]]


def main():
    for item in old:
        for sub_item in item:
            yield sub_item[1]


for x in main():
    print(x)
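The generator above walks a single row. To get one result per row of the Series, as the follow-up comment asks, a minimal sketch wraps the same loops in a function and applies it row by row. It assumes every row has the `[[(text, [token]), ...]]` shape; `extract_tokens` is a hypothetical name, and unlike the original it uses `yield from` to flatten the singleton lists into plain tokens:

```python
import pandas as pd


def extract_tokens(row):
    """Yield every token from one row shaped like [[(text, [token]), ...]]."""
    for item in row:                # the outer list (usually a single element)
        for sub_item in item:       # each (text, [token]) tuple
            yield from sub_item[1]  # flatten the token list into single tokens


# Hypothetical sample rows
s = pd.Series([
    [[("a", ["b"]), ("c", ["d"]), ("e", ["f"])]],
    [[("g", ["h"]), ("i", ["j"])]],
])

# Materialize the generator for each row so every row keeps all of its tokens
tokens = s.apply(lambda row: list(extract_tokens(row)))
print(tokens)
```

Because the function is applied per row, rows with different numbers of tuples are handled without any reshaping.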

edited Feb 07 '22 at 07:27

answered Feb 07 '22 at 06:14

Shivang Kakkar

It was very close. :) A little background: this is text data, and each row contains this --> old = [[('a', ['b']), ('c', ['d']), ('e', ['f'])]]. Your suggestion only gives one value from each row. – Oner Yigit Feb 07 '22 at 06:39
@OnerYigit Please check the edit which uses vanilla python – Shivang Kakkar Feb 07 '22 at 07:28
thank you, my question was not clear enough. I updated the question. I hope it is clear now. – Oner Yigit Feb 07 '22 at 14:26

score 0 · Accepted Answer · 2022-02-07T21:13:15.340

Given Series s,

s = pd.Series(x)

We can first take the first element out of each row (since each row is a nested list), explode it, and use the `str` accessor to get the second element of each tuple; then take the elements out of the singleton lists to get the raw tokens. Finally, group by the index and join the tokens.

out = s.str[0].explode().str[1].str[0].groupby(level=0).apply(','.join)

Output:

0    a,b,c
1    d,e,f
2    g,h,k
3    i,x,y
4    z,t,w
5    t,g,u
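To see what each link of that chain does, here is a minimal runnable sketch that reproduces the first two output rows. The `"txt"` strings stand in for the real text, and the intermediate names `step1` to `step4` are only for illustration:

```python
import pandas as pd

# Hypothetical rows with the question's nesting
s = pd.Series([
    [[("txt", ["a"]), ("txt", ["b"]), ("txt", ["c"])]],
    [[("txt", ["d"]), ("txt", ["e"]), ("txt", ["f"])]],
])

step1 = s.str[0]         # unwrap the outer list -> list of tuples per row
step2 = step1.explode()  # one tuple per row; the original index repeats
step3 = step2.str[1]     # second tuple item -> singleton list, e.g. ['a']
step4 = step3.str[0]     # unwrap the singleton -> plain token 'a'
out = step4.groupby(level=0).apply(",".join)  # rejoin tokens per original row
print(out)
```

The repeated index left behind by `explode` is what lets `groupby(level=0)` put the tokens back into their original rows.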

edited Feb 07 '22 at 21:13

answered Feb 07 '22 at 06:26

row1: ---> `[[(aaaa, ['bbbb']), (cccc, ['ddddd'])......]]` – Oner Yigit Feb 07 '22 at 06:43
row2: ---> `[[(eeee, ['ffff']), (gggg, ['hhhh'])......]]`, and so on. I need to get every list element within the tuples for each row. Think of it as the lemmatized form of a document: a cell contains many lemmatized words, and the lists within the tuples are the lemmatized forms. Your code only gives one word for each cell. – Oner Yigit Feb 07 '22 at 06:46
Thank you very much. I had to groupby by index and apply a lambda, it worked. :) – Oner Yigit Feb 07 '22 at 14:10
thank you, my question was not clear enough. I updated the question. I hope it is clear now. – Oner Yigit Feb 07 '22 at 14:27
That is terrific. I cannot thank you enough. It worked. :) – Oner Yigit Feb 07 '22 at 20:44

score 0 · Answer 4 · answered Feb 07 '22 at 16:28

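import pandas as pd

s = pd.Series(x)  # keeps x's name, here assumed to be 'column1'

# explode the outer list, then the list of tuples; take each tuple's second
# item (a singleton list) and explode it into plain tokens
a = s.explode().explode().str[1].explode()

b = pd.DataFrame(a)

# rejoin the tokens of each original row (the index repeats after exploding)
b.groupby(b.index)['column1'].apply(lambda x: ','.join(x.astype(str)))

That code worked.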
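One difference between this approach and the accepted answer: `s.str[0]` keeps only the first inner list of each row, while `s.explode().explode()` flattens every inner list. That only matters if a row ever holds more than one inner list (the rows in the question hold exactly one). A small sketch with a hypothetical two-list row shows the difference:

```python
import pandas as pd

# Hypothetical row whose outer list holds TWO lists of (text, [token]) tuples
s = pd.Series([[[("t", ["a"]), ("t", ["b"])], [("t", ["c"])]]])

# Accepted-answer style: .str[0] looks at the first inner list only
first_only = s.str[0].explode().str[1].str[0].groupby(level=0).apply(",".join)

# Double explode: both inner lists are flattened before extracting tokens
all_lists = s.explode().explode().str[1].explode().groupby(level=0).apply(",".join)

print(first_only.tolist())
print(all_lists.tolist())
```

If some rows can contain several sentences (several inner lists), the double-explode version is the one that keeps every token.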

answered Feb 07 '22 at 16:28

Oner Yigit