
I have a DataFrame in which one column holds a list at each index. I want to concatenate these lists into a single list. I am using

ids = df.loc[0:index, 'User IDs'].values.tolist()

However, this results in ['[1,2,3,4......]'], which is a list containing one string. Somehow each value in my list column is of type str. I have tried converting it with list() and literal_eval(), but neither works. list() splits each string into individual characters, e.g. "[12,13,14...]" becomes ['[', '1', '2', ',', '1', '3', ...].

How can I concatenate a pandas column of list values into one list? Kindly help out; I have been banging my head on this for several hours.
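
A minimal reproduction of the symptom, assuming each cell holds a string such as "[12, 13, 14]" (a hypothetical value; list-like cells often come back as strings after a round trip through a spreadsheet). list() iterates over the characters of a string, whereas ast.literal_eval parses one string into a real list, so literal_eval has to be applied to each cell rather than to the whole column:

```python
from ast import literal_eval

cell = "[12, 13, 14]"  # hypothetical cell value: a string, not a list

# list() on a string iterates over its characters
chars = list(cell)
print(chars[:4])  # ['[', '1', '2', ',']

# literal_eval parses the string as a Python literal
parsed = literal_eval(cell)
print(parsed, type(parsed))  # [12, 13, 14] <class 'list'>
```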

SarwatFatimaM

2 Answers


Consider the dataframe df:

import pandas as pd

df = pd.DataFrame(dict(col1=[[1, 2, 3]] * 2))
print(df)

        col1
0  [1, 2, 3]
1  [1, 2, 3]

Simplest pandas answer

df.col1.sum()

[1, 2, 3, 1, 2, 3]

numpy.concatenate

import numpy as np

np.concatenate(df.col1)

array([1, 2, 3, 1, 2, 3])
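
np.concatenate returns a NumPy array rather than a list. If a plain Python list is needed, a minimal sketch is to call .tolist() on the result; here the column is first turned into a list of lists with Series.tolist(), which np.concatenate accepts as a sequence of arrays:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(col1=[[1, 2, 3]] * 2))

# Concatenate the inner lists into one array, then convert the array
# (and its NumPy integers) back to a plain Python list of ints
flat = np.concatenate(df.col1.tolist()).tolist()
print(flat)  # [1, 2, 3, 1, 2, 3]
```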

chain

from itertools import chain

list(chain(*df.col1))

[1, 2, 3, 1, 2, 3]
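
A variant of the same idea using itertools.chain.from_iterable, which takes the column as a single iterable instead of unpacking every row into separate arguments as chain(*df.col1) does:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame(dict(col1=[[1, 2, 3]] * 2))

# Iterating a Series yields its values (here, the inner lists);
# chain.from_iterable walks through them one after another
flat = list(chain.from_iterable(df.col1))
print(flat)  # [1, 2, 3, 1, 2, 3]
```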

Response to comments:
I think your column holds strings rather than lists. If so, convert them with ast.literal_eval:

from ast import literal_eval

df.col1 = df.col1.apply(literal_eval)
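
Since the data comes from a spreadsheet, another option is to parse the cells while reading: pd.read_csv and pd.read_excel both accept a converters mapping from column name to function. A sketch using read_csv on an in-memory string, with hypothetical column names and values standing in for the real file:

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV text standing in for the real file; the quoted
# cells are list-like strings, as they come back from a spreadsheet
csv_text = '''id,User IDs
a,"[12, 13, 14]"
b,"[15, 16]"
'''

# converters applies literal_eval to every cell of 'User IDs' while reading
df = pd.read_csv(io.StringIO(csv_text), converters={'User IDs': literal_eval})
print(type(df.loc[0, 'User IDs']))  # <class 'list'>
print(df['User IDs'].sum())  # [12, 13, 14, 15, 16]
```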

If instead your column holds strings that look like lists:

df = pd.DataFrame(dict(col1=['[1, 2, 3]'] * 2))
print(df)  # will look the same

        col1
0  [1, 2, 3]
1  [1, 2, 3]

However, pd.Series.sum does not work the same way; it concatenates the strings:

df.col1.sum()

'[1, 2, 3][1, 2, 3]'

We need to evaluate each string as a Python literal and then sum:

df.col1.apply(literal_eval).sum()

[1, 2, 3, 1, 2, 3]
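
If a column might mix real lists and list-like strings (for example after part of it has already been converted), calling literal_eval on a list raises a ValueError. A sketch with a hypothetical helper, as_list, that only parses strings:

```python
from ast import literal_eval

import pandas as pd


def as_list(cell):
    """Hypothetical helper: return cell unchanged if it is already a list,
    otherwise parse it as a Python literal (e.g. '[1, 2, 3]')."""
    if isinstance(cell, list):
        return cell
    return literal_eval(cell)


# A column mixing a real list and a list-like string
df = pd.DataFrame(dict(col1=[[1, 2, 3], '[4, 5]']))
flat = df.col1.apply(as_list).sum()
print(flat)  # [1, 2, 3, 4, 5]
```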
piRSquared
  • Thank you. The first method is simplest, but how do I use it if I want to concatenate only the first few lists instead of all the lists in the column? I had already tried np.concatenate(), but I got the same thing as ['[1,2,3...]']. – SarwatFatimaM Mar 20 '17 at 17:44
  • @SarwatFatimaM you can do several things. Try `df.col1.iloc[:3].sum()` to combine just the first 3. – piRSquared Mar 20 '17 at 17:47
  • Yes, I have tried this: `ids = pd.DataFrame(GCM.loc[0:2, 'User IDs'])` `ids = uninstall_ids['User IDs'].sum()` But the problem is that its type is str, which causes problems later in the program. If I use list() or tolist(), it converts [12,13,14,15] to something like ['[', '1', '2', ',', '1', '3', ...]. I need this to be a list because I am using Counter() from collections to compare two lists later in the program. I have also tried `df.col1.iloc[:3].sum()`, but with the same issue. – SarwatFatimaM Mar 20 '17 at 18:03
  • I am not sure how my pandas column was converted to type str, because I did not do it on my own. I am loading the data from an Excel sheet, though. – SarwatFatimaM Mar 20 '17 at 18:06
  • @SarwatFatimaM ahh, I'm pretty sure those are strings and not lists. I'll update my post with a possible solution. Hopefully it helps. – piRSquared Mar 20 '17 at 18:32
  • Thank you. I am looking forward to it. – SarwatFatimaM Mar 20 '17 at 18:38
  • @SarwatFatimaM I've updated my post further. If this doesn't solve your problem then the issue is with you not providing sufficient information. Consider providing a sample csv or code that produces your initial dataframe. Read [this post](http://stackoverflow.com/help/mcve) for further guidance. – piRSquared Mar 20 '17 at 18:53
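
Putting the comments together: parse only the rows you need, then sum. A sketch with hypothetical data shaped like the question's 'User IDs' column; note that iloc[:3] is positional and excludes position 3, whereas loc[0:2] is label-based and includes label 2:

```python
from ast import literal_eval
from collections import Counter

import pandas as pd

# Hypothetical data shaped like the question's 'User IDs' column (strings)
df = pd.DataFrame({'User IDs': ['[12, 13, 14]', '[15, 12]', '[16]', '[99]']})

# Parse only the first three cells, then concatenate them into one list
ids = df['User IDs'].iloc[:3].apply(literal_eval).sum()
print(ids)  # [12, 13, 14, 15, 12, 16]

# ids is now a real list of ints, so Counter counts IDs, not characters
print(Counter(ids)[12])  # 2
```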

If you want to flatten the lists, this is a Pythonic way to do it:

import pandas as pd

df = pd.DataFrame({'A': [[1,2,3], [4,5,6]]})

a = df['A'].tolist()
# flatten: for each inner list j, take each element i
a = [i for j in a for i in j]
print(a)
zipa
  • But this results in something like ['[', '1', '2', ',', ' ', '4', '2', ',', ' ', '4', '9', '2', ',', ' ', '1', ...]. – SarwatFatimaM Mar 20 '17 at 17:50
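
The comprehension iterates over whatever each cell holds, so when the cells are strings, as in the comment above, it yields characters. A sketch that parses each cell with ast.literal_eval before flattening, assuming every cell is a list-like string:

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({'A': ['[1, 2, 3]', '[4, 5, 6]']})

# Parse each string into a list first, then flatten with the comprehension
a = [literal_eval(s) for s in df['A'].tolist()]
a = [i for j in a for i in j]
print(a)  # [1, 2, 3, 4, 5, 6]
```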