Need to write scraped data into csv file (threading)

Question

Here is my code:

from download1 import download
import threading
import lxml.html

# The field names are the same for every page, so define them once.
FIELDS = ['country', 'area', 'population', 'iso', 'capital', 'continent', 'tld', 'currency_code',
          'currency_name', 'phone',
          'postal_code_format', 'postal_code_regex', 'languages', 'neighbours']

def getInfo(initial, ending):
    # Scrape pages initial .. ending-1 (range() excludes its end value).
    for Number in range(initial, ending):
        url = 'http://example.webscraping.com/places/default/view/%d' % Number
        html = download(url)
        tree = lxml.html.fromstring(html)
        results = []
        for field in FIELDS:
            x = tree.cssselect('table > tr#places_%s__row > td.w2p_fw' % field)[0].text_content()
            results.append(x)  # should I start writing here?

downloadthreads = []
for i in range(1, 252, 63):  # create 4 threads
    # End at i + 63 (capped at 252): with i + 62, pages 63, 126 and 189 were skipped.
    downloadThread = threading.Thread(target=getInfo, args=(i, min(i + 63, 252)))
    downloadthreads.append(downloadThread)
    downloadThread.start()

for threadobj in downloadthreads:
    threadobj.join()  # wait for each thread to finish

print("Done")

So results will hold the value of each entry in Fields. I need to write Fields as the top row (only once), followed by the values in results, to a CSV file. I am not sure I can open the file inside the function, because the threads would then open it several times simultaneously.

Note: I know threading isn't desirable when crawling, but I am just testing.

Have you tried keeping the file open and just appending to it? – wishmaster, Feb 26 '19 at 17:32

Janekx · Answer 1 · 2019-02-26T17:53:41.747

1

I think you should consider using some kind of queue or a thread pool. Thread pools are really useful if you want to create many threads but run only a few at a time (I think you would end up with more than 4 threads in total, but only 4 running at once).
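As a rough sketch of the thread-pool idea (written for Python 3, not the Python 2 of the question): the pool runs a few workers at a time, each returning one row, and only the main thread touches the CSV file. `scrape_page` is a hypothetical stand-in for the `download` and lxml parsing in the question, `scrape_to_csv` is an illustrative name, and the field list is shortened.

```python
import csv
from concurrent.futures import ThreadPoolExecutor

# Shortened field list for the sketch; the question uses 14 fields.
FIELDS = ['country', 'area', 'population']


def scrape_page(number):
    # Hypothetical stand-in for download(url) + lxml parsing of one page.
    # It returns one value per field, in the same order as FIELDS.
    return ['%s_%d' % (field, number) for field in FIELDS]


def scrape_to_csv(path, first, last, workers=4):
    # At most `workers` pages are scraped at the same time.
    # pool.map returns the rows in the order of the page numbers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(scrape_page, range(first, last + 1)))

    # Only the main thread writes, so no locking is needed.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)   # header row, written once
        writer.writerows(rows)
    return len(rows)
```

A call such as `scrape_to_csv('countries.csv', 1, 251)` would then replace the manual thread bookkeeping. The trade-off is that all rows are held in memory until the pool has finished.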

An example of the Queue technique can be found here.
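A minimal sketch of the queue approach, assuming Python 3 (the module is called `Queue` in Python 2): the scraping threads only put rows on a queue, and one dedicated writer thread owns the file. `scrape_page`, `scrape_range`, `write_rows` and `run` are illustrative names, and `scrape_page` is a hypothetical stand-in for the real download and parsing.

```python
import csv
import queue
import threading

FIELDS = ['country', 'area', 'population']  # shortened for the sketch
DONE = object()  # sentinel that tells the writer to stop


def scrape_page(number):
    # Hypothetical stand-in for download(url) + lxml parsing of one page.
    return ['%s_%d' % (field, number) for field in FIELDS]


def scrape_range(first, last, out_queue):
    # Producer: scrape pages first .. last-1 and hand each row to the writer.
    for number in range(first, last):
        out_queue.put(scrape_page(number))


def write_rows(path, out_queue):
    # Consumer: the only thread that ever opens or writes the file.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)  # header row, written once
        while True:
            row = out_queue.get()
            if row is DONE:
                break
            writer.writerow(row)


def run(path, ranges):
    out_queue = queue.Queue()
    writer_thread = threading.Thread(target=write_rows, args=(path, out_queue))
    writer_thread.start()

    workers = [threading.Thread(target=scrape_range, args=(a, b, out_queue))
               for a, b in ranges]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    out_queue.put(DONE)  # all producers are finished, so stop the writer
    writer_thread.join()
```

Rows arrive in whatever order the threads produce them, so the CSV is not sorted by page number. If the order matters, the page number could be written as an extra column and sorted afterwards.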

Of course, you can also have each thread write to its own file, labelled with the thread's id, for example "results_1.txt", "results_2.txt" and so on. Then you can merge them after all threads have finished.
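The per-thread files idea could look roughly like this (Python 3 assumed). Each thread writes its rows to its own part file, and the main thread merges the parts under a single header once every thread has joined. The part-file naming and the helper names are illustrative, and `scrape_page` is a hypothetical stand-in for the real scraping.

```python
import csv
import threading

FIELDS = ['country', 'area', 'population']  # shortened for the sketch


def scrape_page(number):
    # Hypothetical stand-in for download(url) + lxml parsing of one page.
    return ['%s_%d' % (field, number) for field in FIELDS]


def scrape_to_part(part_path, first, last):
    # Each thread owns its own file, so the threads never share a file handle.
    with open(part_path, 'w', newline='') as f:
        writer = csv.writer(f)
        for number in range(first, last):
            writer.writerow(scrape_page(number))


def merge_parts(part_paths, out_path):
    # Runs in the main thread after all workers have finished.
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(FIELDS)  # header row, written once
        for part_path in part_paths:
            with open(part_path, newline='') as part:
                writer.writerows(csv.reader(part))


def run(out_path, ranges):
    # One part file per thread, e.g. out.csv.part1, out.csv.part2, ...
    part_paths = ['%s.part%d' % (out_path, i + 1) for i in range(len(ranges))]
    threads = [threading.Thread(target=scrape_to_part, args=(p, a, b))
               for p, (a, b) in zip(part_paths, ranges)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    merge_parts(part_paths, out_path)
    return part_paths
```

Because the parts are merged in thread order, the final file keeps the page order. The part files are left on disk and would need to be deleted separately.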

You can also use basic synchronization primitives such as Lock, Monitor, and so on; however, I am not the biggest fan of them. An example of locking can be found here.
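For completeness, here is a hedged sketch of the locking variant in Python 3. The file is opened once in the main thread, which also writes the header, and every thread shares the same `csv.writer` but only calls it while holding a `threading.Lock`. The helper names are illustrative, and `scrape_page` is again a hypothetical stand-in for the real scraping.

```python
import csv
import threading

FIELDS = ['country', 'area', 'population']  # shortened for the sketch
write_lock = threading.Lock()


def scrape_page(number):
    # Hypothetical stand-in for download(url) + lxml parsing of one page.
    return ['%s_%d' % (field, number) for field in FIELDS]


def scrape_range(writer, first, last):
    for number in range(first, last):
        row = scrape_page(number)   # slow part, done outside the lock
        with write_lock:            # only one thread writes at a time
            writer.writerow(row)


def run(path, ranges):
    # The file is opened exactly once, by the main thread.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)     # header row, written once
        threads = [threading.Thread(target=scrape_range, args=(writer, a, b))
                   for a, b in ranges]
        for t in threads:
            t.start()
        for t in threads:
            t.join()                # the file stays open until all threads finish
```

Keeping the download outside the `with write_lock:` block matters: the lock is then held only for the short write, so the threads still fetch pages in parallel.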

edited Feb 26 '19 at 17:53

answered Feb 26 '19 at 17:37

Janekx


Added some examples to my original answer. – Janekx Feb 26 '19 at 17:53
If you could edit my code to do it, that would be great; these examples seem pretty hard to understand and then adapt to my actual code. – timmy turner Feb 26 '19 at 18:13
