Page Source

<html>
<title>Example Web</title>
<script>

$(document).ready(function(){
    document.getElementById('output').textContent = "Hi There";
});

</script>

<body>
<div id='output'></div>
</body>
</html>

As expected, the page DOM after loading will be:

<html>
<title>Example Web</title>
<script>

$(document).ready(function(){
    document.getElementById('output').textContent = "Hi There";
});

</script>

<body>
<div id='output'>Hi There</div>
</body>
</html>

It seems that when crawling sites with Scrapy, the response is the page source rather than the page DOM. How do I make Scrapy request the page DOM so that I can extract the 'Hi There' string from the body?

Danny

1 Answer

You cannot make Scrapy request the page DOM instead of the page source, because Scrapy is not a browser and cannot execute JavaScript. It simply builds an element tree from the raw response it receives, so anything a script would insert later is never there.
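To illustrate the point, here is a minimal sketch that parses the question's page source without running any script. It uses Python's standard `html.parser` as a stand-in for Scrapy's own parser (an assumption made so the example has no dependencies); the class name `OutputDivText` and the helper `output_text` are hypothetical names introduced for this example only.

```python
from html.parser import HTMLParser

# The page source from the question, as a server would send it.
PAGE_SOURCE = """<html>
<title>Example Web</title>
<script>
$(document).ready(function(){
    document.getElementById('output').textContent = "Hi There";
});
</script>
<body>
<div id='output'></div>
</body>
</html>"""


class OutputDivText(HTMLParser):
    """Collects the text found inside <div id="output">."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "output":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        # Script bodies arrive here too, but only as inert text outside the div.
        if self.inside:
            self.chunks.append(data)


def output_text(html):
    """Return the text inside the 'output' div of the given markup."""
    parser = OutputDivText()
    parser.feed(html)
    return "".join(parser.chunks).strip()


if __name__ == "__main__":
    # The script is never executed, so the div is still empty.
    print(repr(output_text(PAGE_SOURCE)))
```

A static parser only sees the markup that was sent over the wire; the 'Hi There' text would appear only after something actually executes the script.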

Refer to the Google Group discussion on Scrapy supporting JavaScript (https://groups.google.com/forum/#!topic/scrapy-users/tOVH-X7H3DI) and another Stack Overflow discussion on the same topic.

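However, you might consider using ScrapyJS, an external middleware by ScrapingHub that routes requests through Splash, a JavaScript rendering service, so that your spider receives the rendered page.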
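As a rough sketch of what that rendering step looks like, the snippet below calls Splash's `render.html` HTTP endpoint directly using only the standard library, rather than going through the ScrapyJS middleware. It assumes a Splash instance is already running at `http://localhost:8050` (its default port); `SPLASH_URL`, `render_url`, and `fetch_rendered_html` are hypothetical names chosen for this example, and the `wait` value is an arbitrary guess at how long the script needs.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is running locally on its default port.
SPLASH_URL = "http://localhost:8050"


def render_url(page_url, wait=0.5, splash_url=SPLASH_URL):
    """Build the render.html request URL for the given page."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def fetch_rendered_html(page_url, wait=0.5):
    """Ask Splash to load the page, run its scripts, and return the resulting HTML."""
    with urlopen(render_url(page_url, wait)) as response:
        return response.read().decode("utf-8")
```

The HTML returned by `fetch_rendered_html` should reflect the DOM after scripts have run, so it can then be parsed the same way as any other response. Inside a Scrapy project the middleware is probably the more convenient route, since it keeps requests within Scrapy's scheduler.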

Girish