I'm trying to extract data from this website, but every piece of text I try to select comes back empty. I'm using Xidel to scrape the data.

xidel -e '//span[@class="main-price"]/text()' 'https://www.tokopedia.com/emas/harga-hari-ini'
**** Retrieving (GET): https://www.tokopedia.com/emas/harga-hari-ini ****
**** Processing: https://www.tokopedia.com/emas/harga-hari-ini/ ****

It should at least return "Rp" or some numbers, but I'm not sure why it returns null. The other websites I tried worked just fine.

CuriousNewbie
  • I'm afraid you won't be able to do it with xidel. The website is dynamically loaded using JavaScript, and you'll need a tool like Selenium to deal with it. – Jack Fleeting Oct 05 '20 at 13:18
  • Ah, XPath got mad, I see. – CuriousNewbie Oct 05 '20 at 13:20
  • So you are saying that this website has protection against XPath web scraping, right? – CuriousNewbie Oct 05 '20 at 13:23
  • Selenium runs in a browser. What I need is to scrape web data without a browser, or at least data that can be scraped using IMPORTXML in Google Sheets. – CuriousNewbie Oct 05 '20 at 13:25
  • No, it has nothing to do with XPath. Many (most?) websites now load their content dynamically. There are a few ways to handle that and get the relevant HTML/XML before processing it with XPath. But in any case, this can't be done with xidel directly. Search around for scraping dynamically loaded pages; there's a lot to learn... – Jack Fleeting Oct 05 '20 at 13:26
  • If you are 100% sure that xidel can't handle dynamic websites, then please add an answer so I can give you points. – CuriousNewbie Oct 05 '20 at 13:30

1 Answer

The target website is one of those sites whose content is loaded dynamically using JavaScript. A simple way to confirm this is to open the page, then disable JavaScript in your browser and reload it. In the case of this particular page, you'll see it's entirely blank.
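The same check can be done without a browser by looking at the HTML exactly as the server sends it, before any script runs. Below is a minimal sketch using only the Python standard library. The names `ClassTextCollector`, `static_matches` and `fetch_raw_html` are made up for this illustration, and the `User-Agent` header is an assumption (some servers reject requests without one). Note that it matches any span whose class list contains `main-price`, which is slightly looser than the exact `@class="main-price"` test in the question.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ClassTextCollector(HTMLParser):
    """Collect the text inside every <span> whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0     # > 0 while we are inside a matching <span>
        self.matches = []  # one text string per matching <span>

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth > 0:
            # Nested <span> inside a match: track it so the end tags balance.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.matches[-1] += data


def static_matches(html, class_name="main-price"):
    """Return the text of matching spans found in the HTML string as given."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


def fetch_raw_html(url):
    """Download the HTML as served; no JavaScript is executed."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})  # assumed header
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `static_matches(fetch_raw_html('https://www.tokopedia.com/emas/harga-hari-ini'))` and getting an empty list back would be consistent with what xidel printed in the question: the span simply is not in the served HTML, so no XPath expression can find it there.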

There are a couple of ways to handle it, but unless I'm sorely mistaken, xidel isn't one of them. Start by taking a look at this.

Jack Fleeting