
I'm very new to Python and currently stumped. The first several rows work correctly, print, and write to the file just fine, but once the loop gets to row 11 it throws a "list index out of range" error, and it does the same for the remaining rows as well.

I can't figure out for the life of me how, once it fails at row 11, to start over on row 12 and run the try again (it should succeed on the remaining rows). A sketch of what I'm aiming for follows the script.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://coinmarketcap.com/"

# Fetch the page HTML
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# Parse it and grab every row of the coin table
page_soup = soup(page_html, "html.parser")
rows = page_soup.find("tbody").findAll("tr")

filename = "coins.csv"
f = open(filename, "w")

headers = "Rank, Name, Price, Circulating Supply\n"

f.write(headers)

for row in rows:

    try:
        # Pull the cells I care about by position
        rank_number = row.findAll('td')[1].text
        coin_name = row.findAll('td')[2].text
        coin_price = row.findAll('td')[3].text
        supply = row.findAll('td')[8].text

        print(rank_number + "," + coin_name + "," + coin_price.replace(",", "") + "," + supply.replace(",", "") + "\n")
        f.write(rank_number + "," + coin_name + "," + coin_price.replace(",", "") + "," + supply.replace(",", "") + "\n")

    except Exception as e:
        # This fires from row 11 onward with "list index out of range"
        print(e)
        continue

f.close()
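
To be clear, this is roughly the behavior I'm after (just a sketch that would replace the loop above, assuming the same column positions): guard the cell count before indexing, catch only IndexError instead of a bare Exception, and keep going with the next row.

# Sketch of the intended loop: look up the cells once per row,
# skip rows that are too short, and catch only IndexError.
for row in rows:
    cells = row.findAll('td')
    if len(cells) < 9:   # the script above indexes up to cells[8]
        continue         # e.g. ad rows or rows that never filled in
    try:
        rank_number = cells[1].text
        coin_name = cells[2].text
        coin_price = cells[3].text
        supply = cells[8].text
    except IndexError as e:
        print(e)
        continue
    line = rank_number + "," + coin_name + "," + coin_price.replace(",", "") + "," + supply.replace(",", "") + "\n"
    print(line)
    f.write(line)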

Any help would be greatly appreciated!

Thanks!

  • Hi Rob, the `rows` variable holds [1, 2, 3, 4] and so on, if I'm not wrong. Please correct me. – Joel Company Mar 09 '21 at 03:55
  • Yikes. Can you give a small sample of the data you're working with? Are you trying to read from *and* write to a file at the same time? Obviously the index error is being caused by your subscripting, so clearly your regex isn't working as expected. – ddejohn Mar 09 '21 at 03:55
  • Rows 12-15 are missing some data (perhaps not the _same_ data that is missing from row 11) – John Gordon Mar 09 '21 at 03:56
  • Eh? Please just provide the data. – ddejohn Mar 09 '21 at 03:57
  • Also, generally speaking, you should only ever `except` specific exceptions, and you should also limit your `try` clauses to ONLY the code which can potentially fail. – ddejohn Mar 09 '21 at 03:58
  • Hey, sorry all, I've just put the entire script in there, not sure why I was trying to dumb it down. Feel free to give it a run. Just trying to practice web scraping! – Rob Mar 09 '21 at 04:00
  • The page doesn't load the table data after a certain number of rows (after the advert row, most probably), because the rest is rendered by JavaScript. You may want to use Selenium for this. – Rishabh Kumar Mar 09 '21 at 04:26
  • Rishabh, how so? If I look at `rows`, it has the data from the entire table, so I'm not understanding what you mean by it not loading the table. – Rob Mar 09 '21 at 04:29
  • @blorgon - any thoughts now that the actual script is posted? I know it's sloppy; I'm just trying to get it to run the entire table and spit out data first before I go back and clean it up (learning purposes). I assume you should be able to run the same script and get the same results? – Rob Mar 09 '21 at 15:53
  • This might be trickier than you think. It looks like https://www.coinmarketcap.com is loading only the 10 coins with the largest market cap, and everything else only gets loaded when you start scrolling (AKA "lazy loading"). There are workarounds for that; see for example [this answer](https://stackoverflow.com/a/21008335/15282257) to "[How to load all entries in an infinite scroll at once to parse the HTML in python](https://stackoverflow.com/q/21006940/15282257)". – Paul P Mar 09 '21 at 20:33
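
Following the lazy-loading comments above, here's a rough sketch of a Selenium-based workaround: scroll to the bottom repeatedly so the JavaScript renders the remaining rows, then hand the final HTML to BeautifulSoup. This is not verified against the current page; the scroll count, delay, and Chrome setup are all assumptions.

from selenium import webdriver
from bs4 import BeautifulSoup as soup
import time

driver = webdriver.Chrome()  # assumes a Chrome driver is available
driver.get("https://coinmarketcap.com/")

# Scroll to the bottom a few times so the lazy-loaded rows render
for _ in range(10):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the JavaScript time to append more rows

page_soup = soup(driver.page_source, "html.parser")
driver.quit()

rows = page_soup.find("tbody").findAll("tr")
print(len(rows))  # should now be well past the first 10 coins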
