x =
    [[(some text,[a]), (some text,[b]), (some text,[c]).........]]
    [[(some text,[d]), (some text,[e]), (some text,[f]).........]]
    [[(some text,[g]), (some text,[h]), (some text,[k]).........]]
    [[(some text,[i]), (some text,[x]), (some text,[y]).........]]
    [[(some text,[z]), (some text,[t]), (some text,[w]).........]]
    [[(some text,[t]), (some text,[g]), (some text,[u]).........]]

type(x)

pandas.core.series.Series

I want to create a Series that contains only the values of the lists inside the tuples, such as [a], [u] or [w].

How can I extract them? Thank you.

UPDATE: I realized the way I phrased the question was confusing, so I changed it to better represent my problem. Basically, I need to extract every [a], [u], [w], etc. row by row. This is tokenized text data; the values are words in sentences. Sorry for the confusion.
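A minimal reproducible version of this data may make the three levels of nesting easier to see. This is a sketch under an assumption: that each cell is a list wrapping one list of (text, [token]) tuples, with placeholder strings standing in for the real text and tokens.

```python
import pandas as pd

# Hypothetical sample in the shape described above: each cell is a list that
# wraps one list of (text, [token]) tuples. The strings are placeholders.
x = pd.Series([
    [[("some text", ["a"]), ("some text", ["b"]), ("some text", ["c"])]],
    [[("some text", ["d"]), ("some text", ["e"]), ("some text", ["f"])]],
])

cell = x.iloc[0]          # [[(...), (...), (...)]], the first row
pairs = cell[0]           # [("some text", ["a"]), ...], the list of tuples
token_list = pairs[1][1]  # ["b"], the list inside the second tuple
token = token_list[0]     # "b", the raw token
print(type(x), token)     # <class 'pandas.core.series.Series'> b
```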

4 Answers

1

Use:

import pandas as pd

x = [('a', ['b']), ('c', ['d']), ('e', ['f'])]
x1 = pd.Series(x)
x1.apply(lambda y: y[1])  # second element of each tuple

The result is a Series holding the second element of each tuple: ['b'], ['d'] and ['f'].
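As an alternative to the lambda, the pandas str accessor also indexes into lists and tuples (not only strings), so .str[1] picks the second element of each tuple and a further .str[0] unwraps the singleton lists. A sketch on the same toy data:

```python
import pandas as pd

x = [('a', ['b']), ('c', ['d']), ('e', ['f'])]
x1 = pd.Series(x)

via_apply = x1.apply(lambda y: y[1])  # second element of each tuple
via_str = x1.str[1]                   # same selection through the str accessor

# .str[0] then takes the single token out of each one-element list
flat = x1.str[1].str[0]
print(flat.tolist())  # ['b', 'd', 'f']
```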

Based on your comment:

import ast
import pandas as pd
temp = pd.Series(["[[('aaaa', ['bbbb']), ('cccc', ['ddddd'])]]", "[[('a',['b']), ('c',['d']), ('e',['f'])]]"])
temp.apply(lambda s: [t[1] for t in ast.literal_eval(s)[0]])  # literal_eval parses the string safely, unlike eval

And the result is a Series of lists, one per row: [['bbbb'], ['ddddd']] and [['b'], ['d'], ['f']].
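Since each cell holds many tokens, one way to get a single string per row is to parse each cell and join its tokens directly, without a groupby. This sketch assumes, as in the example above, that the cells are stored as strings; join_tokens is a hypothetical helper name, and ast.literal_eval is used instead of eval because it only accepts Python literals.

```python
import ast
import pandas as pd

temp = pd.Series([
    "[[('aaaa', ['bbbb']), ('cccc', ['ddddd'])]]",
    "[[('a',['b']), ('c',['d']), ('e',['f'])]]",
])

def join_tokens(cell: str, sep: str = ",") -> str:
    """Parse one stringified cell and join every token it contains (hypothetical helper)."""
    pairs = ast.literal_eval(cell)[0]  # the outer list wraps a single list of tuples
    # flatten each [token] list, so lists holding several tokens also work
    return sep.join(token for _text, tokens in pairs for token in tokens)

row_text = temp.apply(join_tokens)
print(row_text.tolist())  # ['bbbb,ddddd', 'b,d,f']
```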

keramat
1

This should work:

old = [[('a', ['b']), ('c', ['d']), ('e', ['f'])]]


def main():
    for item in old:
        for sub_item in item:
            yield sub_item[1]


for x in main():
    print(x)
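The generator above flattens everything into one stream, so the row boundaries are lost. A sketch that reuses the same two loops but keeps one result per row, assuming the data sits in a Series whose cells have the nested shape of old (the sample values and the helper name second_elements are hypothetical):

```python
import pandas as pd

# Hypothetical Series: each cell has the same nested shape as `old` above
old = pd.Series([
    [[('a', ['b']), ('c', ['d']), ('e', ['f'])]],
    [[('g', ['h']), ('i', ['j'])]],
])

def second_elements(cell):
    """Yield the second element of every tuple in one cell (same loops as main())."""
    for item in cell:
        for sub_item in item:
            yield sub_item[1]

# Materialize the generator separately for each row to keep rows apart
per_row = old.apply(lambda cell: list(second_elements(cell)))
print(per_row.tolist())  # [[['b'], ['d'], ['f']], [['h'], ['j']]]
```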
Shivang Kakkar
0

Given Series s,

s = pd.Series(x)

we can first take out the first element of each row (since each row is a nested list), explode it, and use the str accessor to get the second element of each tuple; then take the elements out of the singleton lists to get the raw tokens. Finally, group by the index and join the tokens.

out = s.str[0].explode().str[1].str[0].groupby(level=0).apply(','.join)

Output:

0    a,b,c
1    d,e,f
2    g,h,k
3    i,x,y
4    z,t,w
5    t,g,u
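To see what each step of that chain does, here is the same expression split into named intermediate steps. This is a sketch on a hypothetical two-row sample; the intermediate variable names are only for illustration.

```python
import pandas as pd

# Hypothetical two-row sample in the question's shape
s = pd.Series([
    [[("some text", ["a"]), ("some text", ["b"]), ("some text", ["c"])]],
    [[("some text", ["d"]), ("some text", ["e"]), ("some text", ["f"])]],
])

inner = s.str[0]             # unwrap the outer list: one list of tuples per row
pairs = inner.explode()      # one (text, [token]) tuple per row; the index repeats
token_lists = pairs.str[1]   # second element of each tuple: ['a'], ['b'], ...
tokens = token_lists.str[0]  # unwrap the singleton lists: 'a', 'b', ...
out = tokens.groupby(level=0).apply(",".join)  # one joined string per original row

print(out.tolist())  # ['a,b,c', 'd,e,f']
```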
  • row1: ---> `[[(aaaa, ['bbbb']), (cccc, ['ddddd'])......]]` – Oner Yigit Feb 07 '22 at 06:43
  • row2: ---> `[[(eeee, ['ffff']), (gggg, ['hhhh'])......]]` and so on. I need to get every list element within the tuples for each row. Think of it as the lemmatized form of a document: a cell contains many lemmatized words, and the lists within the tuples are the lemmatized forms. Your code only gives one word for each cell. – Oner Yigit Feb 07 '22 at 06:46
  • Thank you very much. I had to group by the index and apply a lambda, and it worked. :) – Oner Yigit Feb 07 '22 at 14:10
  • Thank you. My question was not clear enough, so I updated it. I hope it is clear now. – Oner Yigit Feb 07 '22 at 14:27
  • That is terrific. I cannot thank you enough. It worked. :) – Oner Yigit Feb 07 '22 at 20:44
0
import pandas as pd

s = pd.Series(x)

a = s.explode().explode().str[1].explode()

b = a.to_frame('column1')  # name the column so the groupby below can select it

b.groupby(b.index)['column1'].apply(lambda v: ','.join(v.astype(str)))

That code worked.
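The DataFrame step is not strictly needed: the exploded Series can be grouped by its index directly, which also avoids depending on the column name. A sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample in the question's shape
s = pd.Series([
    [[("some text", ["a"]), ("some text", ["b"])]],
    [[("some text", ["c"]), ("some text", ["d"])]],
])

# Explode twice to reach the tuples, take each [token] list, explode once more
a = s.explode().explode().str[1].explode()

# Group on the original row index and join that row's tokens
joined = a.groupby(level=0).apply(lambda v: ",".join(v.astype(str)))
print(joined.tolist())  # ['a,b', 'c,d']
```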