
What I am trying to do:

I am trying to use `open` in Python, and this is the script I am trying to execute. I want to give a restaurant name as input and have a file (reviews.txt) saved.

Script: (in short, the script goes to a page and scrapes the reviews)

from bs4 import BeautifulSoup
from urllib import urlopen
queries = 0
while queries <201:
    stringQ = str(queries)
    page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringQ)

    soup = BeautifulSoup(page)
    reviews = soup.findAll('p', attrs={'itemprop':'description'})
    authors = soup.findAll('span', attrs={'itemprop':'author'})

    flag = True
    indexOf = 1
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if(endOf+1 == len(dirtyEntry)):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf]+dirtyEntry[endOf+1:]
        f=open("reviews.txt", "a")
        f.write(cleanEntry)
        f.write("\n")
        f.close

    queries = queries + 40

Problem: The script uses append mode 'a'. According to the documentation, 'w' is the write mode, which overwrites. When I change it to 'w', nothing happens.

f=open("reviews.txt", "w") #does not work!

Actual Question: EDIT: Let me clear the confusion.

I just want ONE reviews.txt file with all the reviews. Every time I run the script, I want it to overwrite the existing reviews.txt with new reviews according to my input.

Thank you,

Dark Knight
  • What goes wrong when you use 'w' mode? – Aleksei Zyrianov Apr 16 '14 at 09:11
  • Not sure what you are asking. You write your file in a nested loop, so it will always only hold the last review of the last query (using `w`). Is that what you want? Or do you want to put all the reviews in one file? Or create one file per review? – tobias_k Apr 16 '14 at 09:11
  • When I use 'a', I get multiple reviews from that page and the file size is approximately 130 KB. When I use 'w' mode, I get just one review and the file size is about 1 KB. – Dark Knight Apr 16 '14 at 09:12
  • @tobias_k I want to get all the reviews, and every time I run the script I don't want to append but to create a new file altogether. – Dark Knight Apr 16 '14 at 09:12
  • That is exactly what "w" does: it rewrites the whole file every time, leaving you with the latest written instance. – bosnjak Apr 16 '14 at 09:13
  • @Lawrence Yes, but when I use 'w' I get just one review instead of the multiple reviews I get with 'a' (even though I run the script only once). Sorry, I'm a newbie with Python. – Dark Knight Apr 16 '14 at 09:14
  • Please check my answer, it creates exactly one `review.txt` file with all the reviews, doesn't it? It also does overwrite the file created by previous script run. – Aleksei Zyrianov Apr 16 '14 at 09:33

2 Answers


If I understand properly what behavior you want, then this should be the right code:

with open("reviews.txt", "w") as f:
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if(endOf+1 == len(dirtyEntry)):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf]+dirtyEntry[endOf+1:]
        f.write(cleanEntry)
        f.write("\n")

This opens the file for writing only once and writes all the entries to it. If `open` is instead nested inside the for loop, the file is reopened in 'w' mode for each review, so each review overwrites the previous one.

The with statement ensures that the file is closed when the program leaves the block. It also makes the code easier to read.
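
One caveat worth noting: in the question's script the `for review in reviews` loop sits inside the `while queries <201` page loop, so a `with open(..., "w")` wrapped around the inner loop alone would presumably still be reopened, and truncated, once per page. Below is a minimal sketch of opening the file once around both loops. `fetch_reviews`, `FAKE_PAGES`, and `save_all_reviews` are hypothetical stand-ins for the scraping and tag-stripping code, not part of the original script, and the output goes to a temporary directory.

```python
import os
import tempfile

# Hypothetical stand-in for the urlopen/BeautifulSoup/tag-stripping steps:
# maps a page offset to the already-cleaned review texts of that page.
FAKE_PAGES = {
    0: ["Great park.", "Nice benches."],
    40: ["Too crowded."],
}

def fetch_reviews(offset):
    return FAKE_PAGES.get(offset, [])

def save_all_reviews(path, offsets):
    # Open once, before both loops: "w" truncates the file a single time,
    # then every write inside the loops goes to the same open file.
    with open(path, "w") as f:
        for offset in offsets:
            for review in fetch_reviews(offset):
                f.write(review + "\n")

out_path = os.path.join(tempfile.mkdtemp(), "reviews.txt")
save_all_reviews(out_path, [0, 40])
save_all_reviews(out_path, [0, 40])  # a second run replaces the file, it does not append

with open(out_path) as f:
    saved_lines = f.read().splitlines()
```

With this layout each run of the script starts from an empty file, yet all pages of a single run end up in it.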


I'd also suggest dropping the parentheses in the if statement, so instead of

if(endOf+1 == len(dirtyEntry)):

it's better to use just

if endOf + 1 == len(dirtyEntry):
Aleksei Zyrianov

If you want to write every record to a different new file, you must give each file a different name; otherwise you keep overwriting your old data with new data and are left with only the latest record.

You could increment your filename like this:

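# at the beginning, above the loop:
i = 1

# inside the loop, in place of the existing open/write/close:
f = open("reviews_{0}.txt".format(i), "a")
f.write(cleanEntry)
f.write("\n")
f.close()  # the parentheses are needed to actually call close
i += 1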
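
If one file per record really were the goal, the counter could also come from `enumerate`. This is only a sketch: `clean_entries` is a hypothetical list standing in for the `cleanEntry` values the script produces, and the files are written to a temporary directory.

```python
import os
import tempfile

# Hypothetical cleaned reviews; in the script these would be the cleanEntry values.
clean_entries = ["first review", "second review", "third review"]

out_dir = tempfile.mkdtemp()
written = []
# enumerate(..., 1) yields 1, 2, 3, ... so each record gets its own filename.
for i, entry in enumerate(clean_entries, 1):
    name = "reviews_{0}.txt".format(i)
    with open(os.path.join(out_dir, name), "w") as f:
        f.write(entry + "\n")
    written.append(name)
```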

UPDATE

According to your recent update, I see that this is not what you want. To achieve what you want, you just need to move `f=open("reviews.txt", "w")` and `f.close()` outside of the for loop. That way, you won't be opening the file multiple times inside the loop and overwriting your previous entries each time:

f=open("reviews.txt", "w")
for review in reviews:
        # ... other code here ... #

        f.write(cleanEntry)
        f.write("\n")
f.close()

But, I encourage you to use with open("reviews.txt", "w") as described in Alexey's answer.
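
To see why the placement of `open` matters, here is a small sketch comparing the two layouts. The function names and sample entries are made up for illustration, and the files are written to a temporary directory.

```python
import os
import tempfile

entries = ["review A", "review B", "review C"]  # hypothetical sample data

def open_inside_loop(path):
    # Reopening with "w" truncates the file on every iteration,
    # so only the last entry survives.
    for entry in entries:
        f = open(path, "w")
        f.write(entry + "\n")
        f.close()  # with parentheses; a bare f.close does not call the method

def open_outside_loop(path):
    # Opened once, so all entries end up in the file.
    f = open(path, "w")
    for entry in entries:
        f.write(entry + "\n")
    f.close()

def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()

tmp = tempfile.mkdtemp()
inside_path = os.path.join(tmp, "inside.txt")
outside_path = os.path.join(tmp, "outside.txt")
open_inside_loop(inside_path)
open_outside_loop(outside_path)
inside_result = read_lines(inside_path)
outside_result = read_lines(outside_path)
```

The first layout leaves a single line in the file, which matches the roughly 1 KB file described in the comments; the second keeps every entry.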

Community
bosnjak
  • Thanks for your reply. What if I just want to overwrite the file every time instead of writing every record to a new file? I just want all my reviews in reviews.txt, and the file must be replaced (overwritten) according to my input. (Every time I give new input, I get a new text file.) – Dark Knight Apr 16 '14 at 09:17
  • I don't understand you. You want new records in a new file, and at the same time you want them all in one file. Can you describe it in more detail? Maybe you want to **update** the data that is already in the file? – bosnjak Apr 16 '14 at 09:19
  • Sure, sorry for the confusion. I just want **ONE review.txt** with all the reviews in it. Every time I fire a new query, the file must be overwritten instead of appended to. (This is because I have a different query every time, and appending won't make sense.) Sorry for my English. – Dark Knight Apr 16 '14 at 09:22