I have tried many times, but it does not work:

import requests
from lxml import html, etree
from selenium import webdriver
import time, json

#how many page do you want to scan
page_numnotint = input("how many page do you want to scan")
page_num = int(page_numnotint)
file_name = 'jd_goods_data.json'


url = 'https://list.jd.com/list.html?cat=1713,3264,3414&page=1&delivery=1&sort=sort_totalsales15_desc&trans=1&JL=4_10_0#J_main'
driver = webdriver.Chrome()
driver.get(url)
base_html = driver.page_source
selctor = etree.HTML(base_html)
date_info = []
name_data, price_data = [], []
jd_goods_data = {}
for q in range(page_num):
    i = int(1)
    while True:
        name_string = '//*[@id="plist"]/ul/li[%d]/div/div[3]/a/em/text()' %(i)
        price_string = '//*[@id="plist"]/ul/li[%d]/div/div[2]/strong[1]/i/text()' %(i)
        if i == 60:
            break
        else:
            i += 1
        name = selctor.xpath(name_string)[0]
        name_data.append(name)
        price = selctor.xpath(price_string)[0]
        price_data.append(price)
        jd_goods_data[name] = price

        print(name_data)
        with open(file_name, 'w') as f:
            json.dump(jd_goods_data, f)
    time.sleep(2)
    driver.find_element_by_xpath('//*[@id="J_bottomPage"]/span[1]/a[10]').click()
    time.sleep(2)

    # for k, v in jd_goods_data.items():
    #     print(k,v)

I am trying to download product details, but it doesn't work. If you enter 2 as the number of pages to scan, it downloads the details of only one page, but twice!

R. Gadeev
周义翔
  • Where is your variable `q` used (the one assigned in `for q in range(page_num):`)? I guess you set `page_num` to `2` (via your `input` call), but if you want to load details from the second page, your script has to depend on `q`. – keepAlive May 07 '17 at 12:50
  • I used the variable `q` to make the range work, which in turn makes the loop run. – 周义翔 May 07 '17 at 12:59

1 Answer

OK, you define `q` but never actually use it. In that case, the convention is to name the unused variable `_`. That is, instead of writing

for q in range(page_num):

you should do

for _ in range(page_num):

That way, other programmers immediately see that you do not use the loop variable and only want the operation to be repeated.
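As a small illustration of this convention, here is a minimal sketch. The function names `repeat_action` and `label_pages` are hypothetical and exist only for this example; they are not part of the asker's script.

```python
def repeat_action(action, times):
    """Call action() `times` times and collect the results."""
    results = []
    for _ in range(times):  # the counter is never read, so it is named _
        results.append(action())
    return results


def label_pages(times):
    """Here the counter is actually used, so it gets a real name."""
    return ["page %d" % (q + 1) for q in range(times)]


print(repeat_action(lambda: "scan", 2))
print(label_pages(2))
```

The first loop only repeats an operation, so `_` is appropriate; the second one derives a value from the counter, so a named variable such as `q` is the better choice.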

As for the same page being scraped twice: even if the line `driver.find_element_by_xpath('//*[@id="J_bottomPage"]/span[1]/a[10]').click()` executes correctly, its effect is never seen, because `base_html` and `selctor` are computed once before the loop, so every iteration parses the same snapshot of the first page. There is surely a way to make the click approach work, but your URL contains a parameter named `page`, and I recommend using it instead. That means actually using the variable `q`, as follows:

from lxml import etree
from selenium import webdriver
import json

# How many pages do you want to scan?
page_num = int(input("How many pages do you want to scan? "))
file_name = 'jd_goods_data.json'

driver = webdriver.Chrome()
name_data, price_data = [], []
jd_goods_data = {}
for q in range(page_num):
    # page=0 and page=1 reportedly return the same listing on JD.com
    # (see the asker's comment), so the page parameter starts at 1.
    url = 'https://list.jd.com/list.html?cat=1713,3264,3414&page={page}&delivery=1&sort=sort_totalsales15_desc&trans=1&JL=4_10_0#J_main'.format(page=q + 1)
    driver.get(url)
    # Re-read and re-parse the page source on every iteration.
    base_html = driver.page_source
    selctor = etree.HTML(base_html)
    # The original loop stopped before li[60]; this one includes it and
    # skips any index for which the XPath finds nothing.
    for i in range(1, 61):
        name_string = '//*[@id="plist"]/ul/li[%d]/div/div[3]/a/em/text()' % i
        price_string = '//*[@id="plist"]/ul/li[%d]/div/div[2]/strong[1]/i/text()' % i
        names = selctor.xpath(name_string)
        prices = selctor.xpath(price_string)
        if not names or not prices:
            continue
        name_data.append(names[0])
        price_data.append(prices[0])
        jd_goods_data[names[0]] = prices[0]

    # Print the names collected so far, once per page.
    print(name_data)

# Write the file once, after all pages have been scraped.
with open(file_name, 'w') as f:
    json.dump(jd_goods_data, f)

driver.quit()
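To make the idea of driving the scan through the `page` parameter easier to check in isolation, here is a minimal sketch that only builds the list of URLs, without starting a browser. The names `URL_TEMPLATE` and `page_urls` are hypothetical, and the choice to start counting at 1 is an assumption based on the asker's report that `page=0` and `page=1` show the same listing.

```python
# The listing URL from the question, with the page number left as a placeholder.
URL_TEMPLATE = (
    'https://list.jd.com/list.html?cat=1713,3264,3414&page={page}'
    '&delivery=1&sort=sort_totalsales15_desc&trans=1&JL=4_10_0#J_main'
)


def page_urls(page_count):
    """Return one URL per page, numbering pages from 1 (assumed)."""
    return [URL_TEMPLATE.format(page=q + 1) for q in range(page_count)]


for url in page_urls(2):
    print(url)
```

Each URL could then be passed to `driver.get(url)` inside the loop, so that every iteration loads and parses a different page.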
keepAlive
  • You are right, I was thinking too narrowly: I only knew how to use click to go to the next page. This was the best lesson for me. However, the website has a quirk that keeps the program from working correctly: page=1 and page=0 return the same page on JD.com, so I had to add q += 1. Thank you very much for solving my problem, which had been driving me crazy for two days! I am Chinese and do not know how to express it more deeply: thank you very much. – 周义翔 May 07 '17 at 13:52
  • Thank you very much. I am new here; can you tell me whether it worked? (I chose your answer as the best one.) – 周义翔 May 07 '17 at 14:08
  • @周义翔. Actually, to get a precise idea of how Stack Overflow (SO) works, I recommend reading [Welcome to Stack Overflow](https://stackoverflow.com/tour). Also, if you have any other questions about how SO functions, I recommend exploring [Meta Stack Exchange](https://meta.stackexchange.com/tour). Furthermore, note that SO is a Q&A site for professional and enthusiast programmers, [but there are many other equivalent versions of this site](https://stackexchange.com/sites?view=grid#) for biology, economics, etc. – keepAlive May 07 '17 at 14:18