I'm scraping data from a website where the product ID is the last part of the URL. The output repeats the same data on every row, as if the data for the next product were never appended. I don't know exactly what is going on: whether my first for loop is wrong, or the indentation. I tried it before without the dictionary, and it was appending, but everything ended up on the same line; I transposed it, but that didn't work as I wanted. So I did it this way, and now it doesn't append the next rows. Help, please.

data_cols = []
cols = {'pro_header': [],
        'pro_id': [],
        .
        .
        .
        'pro_uns5': []
        }
#the id for each product
fileID = open('idProductsList.txt', 'r')
proIDS = fileID.read().split()
for proID in proIDS:
    url = 'https://website.com/mall/es/mx/Catalog/Product/' + proID
    html = urllib2.urlopen(url).read()
    soup = bs.BeautifulSoup(html , 'lxml')
    table = soup.find("table",{"class": "ProductDetailsTable"})
    rows = table.find_all('tr')
    for row in rows:
        labels.append(str(row.find_all('td')[0].text))
        try:
            data.append(str(row.find_all('td')[1].text))
        except IndexError:
            data.append('')

    cols['pro_header'].append(data[0])
    cols['pro_id'].append(data[1])
    .
    .
    .
    cols['pro_uns5'].append(data[43])
    df = pd.DataFrame(cols)
    df.set_index
    #df.reindex()
    df.to_csv('sample1.csv')

The actual output is:

pro_id  pro_priceCostumer   pro_priceData
1FK7011-5AK24-1AA3  " Mostrar precios
"   PM300:Producto activo
1FK7011-5AK24-1AA3  " Mostrar precios
"   PM300:Producto activo
1FK7011-5AK24-1AA3  " Mostrar precios
"   PM300:Producto activo

Should be something like this (This is just a small representation of the data):

pro_id  pro_priceCostumer   pro_priceData
1FK7011-5AK24-1AA3  " Mostrar precios
"   PM300:Producto activo
1FK7011-5AK24-1JA3  " Mostrar precios
"   PM300:Producto activo
1FK7022-5AK21-1UA0  " Mostrar precios
"   PM300:Producto activo
Ivan Barba
  • Can you share the URL? I'm thinking it might be quicker/easier to access the data API. Or, since you are grabbing `<table>` tags, just go with pandas to pull it.
    – chitown88 Nov 06 '19 at 16:53
  • I'm not sure what the difference is here between your actual output and the desired output. They look the same – chitown88 Nov 06 '19 at 16:53

1 Answer

I guess labels is being used without first being defined as a list. To append to it, it needs to be a list: add labels = list() at the top of your code as a global variable. The same should be done for data.
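
One caveat, offered as a sketch rather than a confirmed diagnosis: if labels and data are created only once at the top and never emptied, data[0], data[1], ... keep pointing at the first product's values on every pass, which would match the repeated rows in the question. Creating the lists fresh inside the outer loop avoids that. In the example below, collect_columns, fetch_rows and fake_pages are hypothetical names, and the cell values are made up so the code runs without network access.

```python
def collect_columns(pro_ids, fetch_rows, column_names):
    """Return a dict of column name -> list with one entry per product.

    fetch_rows is a hypothetical callable: given a product ID, it returns
    the table rows of that product page as lists of cell texts.
    """
    cols = {name: [] for name in column_names}
    for pro_id in pro_ids:
        # Fresh lists for every product. Lists shared across iterations
        # would keep the first product's values at data[0], data[1], ...
        labels, data = [], []
        for cells in fetch_rows(pro_id):
            if not cells:
                continue  # skip rows without any cells
            labels.append(cells[0])
            # Same fallback as the question's IndexError handler.
            data.append(cells[1] if len(cells) > 1 else '')
        for position, name in enumerate(column_names):
            # Pad with '' if this page has fewer rows than expected.
            cols[name].append(data[position] if position < len(data) else '')
    return cols


# Made-up stand-in for the scraped pages (assumption, not real site data).
fake_pages = {
    '1FK7011-5AK24-1AA3': [['Header', 'header A'],
                           ['Product ID', '1FK7011-5AK24-1AA3']],
    '1FK7011-5AK24-1JA3': [['Header', 'header B'],
                           ['Product ID', '1FK7011-5AK24-1JA3']],
}

cols = collect_columns(list(fake_pages), fake_pages.get,
                       ['pro_header', 'pro_id'])
print(cols['pro_id'])
```

In the script from the question, fetch_rows would wrap the urlopen and BeautifulSoup calls and return the text of each td per tr. The resulting dict can then be passed to pd.DataFrame(cols) and written with to_csv once, after the loop has finished, instead of on every iteration.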

Nazim Kerimbekov
Araf