
There is a site that I want to scrape: https://tse.ir/MarketWatch.html

I know that I have to use:

file_get_contents("https://examplesite.html")

to get the HTML of the page, but how can I find a specific part of it? For example, this element does not show up in the text file I saved:

<td title="دالبر" class="txtTag namad">دالبر</td>

When I open the text file, I never see this part, and I think that is because the website builds it with JavaScript. How can I get all the information on the page, including every part I want?

TheFaultInOurStars

2 Answers


The content is loaded by an AJAX request made from JavaScript. This means you can't get the data by simply grabbing the page contents, because file_get_contents does not execute scripts.

There are two ways of collecting data you need:

  1. Use a solution based on Selenium WebDriver to load the page in a real browser (which will execute the JS), and collect the data from the rendered DOM.
  2. Find out which requests the website sends to get this data. You can use the network activity tab in the browser dev tools, for example in Chrome; other browsers have the same or a similar tool. Then send the same request yourself and parse the response according to your needs.

In your specific case, you can probably use this URL: https://tseest.ir/json/MarketWatch/data_211111.json to access the JSON object with the data you need.
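The second approach can be sketched in PHP roughly as follows. This is a minimal sketch, not a tested client for this site: the function names fetchJson and decodeJson are made up for illustration, the User-Agent and Accept headers are assumptions about what the server may expect, and the structure of the returned JSON is unknown, so inspect it before relying on any key.

```php
<?php
// Hypothetical helper: download a JSON endpoint found in the browser's
// network tab and return it as a PHP array.
function fetchJson(string $url): array
{
    // Assumed headers; some servers reject requests without a User-Agent.
    $context = stream_context_create([
        'http' => [
            'timeout' => 10,
            'header'  => "User-Agent: Mozilla/5.0\r\nAccept: application/json\r\n",
        ],
    ]);

    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Request failed: $url");
    }

    return decodeJson($body);
}

// Hypothetical helper: decode a JSON string into an associative array,
// failing loudly if the body is not a JSON object or array.
function decodeJson(string $body): array
{
    $data = json_decode($body, true);
    if (!is_array($data)) {
        throw new RuntimeException('Response is not a JSON object or array');
    }

    return $data;
}
```

A call such as `$data = fetchJson('https://tseest.ir/json/MarketWatch/data_211111.json'); print_r(array_keys($data));` would then show which top-level keys the response contains, assuming the endpoint answers a plain GET request.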

nklen

You have three options for scraping the data:

  1. There's an export to an Excel file: https://tse.ir/json/MarketWatch/MarketWatch_1.xls?1582392259131. Parse it, and keep in mind that the number after the ? is a Unix timestamp in milliseconds: its first 10 digits are the seconds since 1970-01-01 UTC, which encode the date and time (1582392259 is 2020-02-22 17:24:19 UTC).

  2. The .js files loaded by the page probably also contain one or more refresh functions for the market data. Find them and see if you can connect directly to their data source (usually a .json file).

  3. Download the page at an interval of your choice and scrape each table row using PHP's DOMXPath::query.
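The third option could look something like the sketch below. The function name extractTableRows is hypothetical, and the XPath expressions `//tr` and `./td` are assumptions about the markup, so adjust them to the real table. It can only return rows that are actually present in the downloaded HTML; rows that the browser adds later with JavaScript will not be there.

```php
<?php
// Hypothetical helper: return the trimmed text of every <td> in every <tr>
// of an HTML string, one inner array per table row.
function extractTableRows(string $html): array
{
    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding prefix is a common hint that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];

    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        // The second argument makes the query relative to the current row.
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Rows without <td> cells (for example header rows) are skipped.
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}
```

Usage would be along the lines of `$rows = extractTableRows(file_get_contents('https://tse.ir/MarketWatch.html'));`, run from a cron job or a loop with `sleep()` at whatever interval you need.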

1000Gbps