I am reading a dataframe column that contains comments. Processing the data with the code below takes forever. Is there a way to make this faster?

for val in df.Description:
    val = str(val)
    tokens = val.split()  
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()  
        for words in tokens:
            comment = comment + words + ''

df.Description is a column of comments (basically email text)

misguided
  • 1
    Can you make this clearer, for example by showing what `val` in `df.Description` looks like and what `comment` is? – Hu Xixi Sep 04 '19 at 01:38
  • 2
    `comment = comment + words + ''` is a very inefficient way to build up a string. Build up a list of strings then `''.join()` it at the end – juanpa.arrivillaga Sep 04 '19 at 02:29
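
The list-then-join approach suggested in the comment above might look like the sketch below. The sample dataframe is hypothetical and only stands in for the real `df.Description`; the empty-string separator is copied from the question's code, which may have intended a space instead.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df.Description column.
df = pd.DataFrame({"Description": ["Hello World", "Second EMAIL body"]})

parts = []  # collect lowercased tokens here instead of growing a string
for val in df.Description:
    # str(val) mirrors the question's handling of non-string cells
    parts.extend(token.lower() for token in str(val).split())

# A single join at the end; '' is the separator used in the question
comment = "".join(parts)
print(comment)
```

Each token is visited once, so the work grows linearly with the amount of text rather than with the square of the token count, as the nested loop in the question does.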

1 Answer

Update: Assuming df.Description is your column, this might be helpful:

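# Convert the column to a plain Python list of strings once, up front.
arr_string = df.Description.astype(str).values.tolist()
# Concatenate all descriptions with a single join instead of
# rebuilding the string one character at a time.
comment = ''.join(arr_string)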

Take a look at this.
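
If the lowercasing and whitespace splitting from the question are also needed, a vectorized variant is one option. This is a minimal sketch under assumptions: the sample dataframe is hypothetical, and the result matches the question's loop only because the question joins tokens with an empty separator.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df.Description column.
df = pd.DataFrame({"Description": ["Hello World", "Second EMAIL body"]})

# Lowercase every cell with the vectorized .str accessor, then
# concatenate all rows into one string (str.cat with no arguments).
text = df["Description"].astype(str).str.lower().str.cat()

# Splitting on whitespace and joining with '' removes all whitespace,
# which is what the question's token loop ends up producing.
comment = "".join(text.split())
print(comment)
```

This avoids the explicit Python loop over rows; whether it is faster in practice depends on the size of the column, so it is worth timing on the real data.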

Scott