Why is converting words into singular from plural in a for loop taking so long (Python 3)?

Question

This is my code to read text from a CSV file and convert every word in one of its columns from plural to singular:

import pandas as pd
from textblob import TextBlob as tb
data = pd.read_csv(r'path\to\data.csv')

for i in range(len(data)):
    blob = tb(data['word'][i])
    singular = blob.words.singularize()  # This makes singular a list
    data['word'][i] = ''.join(singular)  # Converting the list back to a string

But this code has been running for minutes now (and might keep running for hours if I don't stop it)! Why is that? When I checked a few words individually, the conversion happened instantly. There are only 1060 rows (words to convert) in the file.

EDIT: It finished running in about 10-12 minutes.

Here's some sample data:

Input:

word
development
investment
funds
slow
company
commit
pay
claim
finances
customers
claimed
insurance
comment
rapid
bureaucratic
affairs
reports
policyholders
detailed

Output:

word
development
investment
fund
slow
company
commit
pay
claim
finance
customer
claimed
insurance
comment
rapid
bureaucratic
affair
report
policyholder
detailed

You are iterating over a data frame. Performance will be terrible. — rafaelc, Jul 10 '18 at 02:10
@RafaelC Oh! I didn't know that! Why is that so? And what should I use to store the file if not a dataframe? I find multidimensional lists a pain in the a** to work with in Python - it's not as intuitive as, say, in C. — Kristada673, Jul 10 '18 at 02:12
Because you're constantly shuffling your data across the Python/C threshold, which is expensive. Also, `.words` is a pretty complex operation; `.singularize` might be the fastest thing you have in your code. — Amadan, Jul 10 '18 at 02:15
Can you provide some sample input/output? Also, you are doing `data['word'][i]` and probably getting a warning that you're changing a copy and not your df? — rafaelc, Jul 10 '18 at 02:21
@RafaelC Yes, it did throw that warning! I edited the question to include a portion of the input and output. — Kristada673, Jul 10 '18 at 02:28

score 1 · Answer 1 · answered Jul 12 '18 at 11:30

What about this?

In [1]: import pandas as pd

In [2]: from textblob import Word

In [3]: s = pd.read_csv('text', squeeze=True, memory_map=True)

In [4]: type(s)
Out[4]: pandas.core.series.Series

In [5]: s = s.apply(lambda w: Word(w).singularize())

In [6]: s
Out[6]:
0      development
1       investment
2             fund
3             slow
4          company
5           commit
6              pay
7            claim
8          finance
9         customer
10         claimed
11       insurance
12         comment
13           rapid
14    bureaucratic
15          affair
16          report
17    policyholder
18        detailed
Name: word, dtype: object

I pass squeeze=True so that read_csv returns a Series instead of a DataFrame, since the word file has only one column. In addition, memory_map=True can be used if the word file is large.
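
A note for newer pandas: the squeeze argument of read_csv was deprecated in pandas 1.4 and removed in 2.0, so the call above only works on older versions. A minimal sketch of two ways to get the same Series on a recent pandas follows. The in-memory csv_text is a hypothetical stand-in for the real file, not data from the question.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the one-column CSV file.
csv_text = "word\nfunds\nslow\ncustomers\n"

# Option 1: read a DataFrame and select the single column by name.
s1 = pd.read_csv(io.StringIO(csv_text))["word"]

# Option 2: read a DataFrame and squeeze its only column into a Series.
s2 = pd.read_csv(io.StringIO(csv_text)).squeeze("columns")

print(type(s1).__name__, type(s2).__name__)
print(s1.tolist() == s2.tolist())
```

With a real file you would pass its path instead of the StringIO object. Selecting by name is probably the safer choice, since it keeps working if the file later gains more columns.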

Can you test the performance with your data?
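
If it helps, here is a rough way to time a row-by-row loop against apply. This is only a sketch under some assumptions: the word list is made up from a few of the sample words, the loop writes through .loc rather than the chained data['word'][i] from the question (so it avoids the copy warning), and the fallback singularize is a naive hypothetical stand-in used only when textblob is not installed.

```python
import time

import pandas as pd

try:
    from textblob import Word

    def singularize(w):
        # Word.singularize() returns a Word (a str subclass); convert to plain str.
        return str(Word(w).singularize())
except ImportError:
    # Hypothetical fallback, NOT textblob's algorithm: strips a trailing "s".
    def singularize(w):
        return w[:-1] if w.endswith("s") and not w.endswith("ss") else w

# Made-up test data: a few of the sample words, repeated to get 1000 rows.
words = ["funds", "slow", "customers", "reports", "detailed"] * 200
data = pd.DataFrame({"word": words})

# Approach 1: explicit loop, one .loc read and one .loc write per row.
looped = data.copy()
start = time.perf_counter()
for i in range(len(looped)):
    looped.loc[i, "word"] = singularize(looped.loc[i, "word"])
loop_seconds = time.perf_counter() - start

# Approach 2: a single apply over the column.
start = time.perf_counter()
applied = data["word"].apply(singularize)
apply_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.4f} s")
print(f"apply: {apply_seconds:.4f} s")
print(looped["word"].tolist() == applied.tolist())
```

The absolute numbers will depend on your machine and pandas version, so treat them as a relative comparison only. This sketch also leaves out the TextBlob(...).words step from the question, which may account for part of the original running time.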
