I would like to download the HTML source code of a web page. Can I do this with the HTTP Client step? If so, do I need a Generate Rows step before it? I am using Pentaho Data Integration 6. Thanks.
To download the HTML from a web page, use the HTTP Client step. From the documentation:
Q: The HTTP client step doesn't do anything, how do I make it work?
A: The HTTP client step needs to be triggered. Use a Row generator step generating e.g. 1 empty row and link that with a hop to the HTTP client step.
So the step needs incoming rows before it does anything. For instance, put a Generate Rows or Data Grid step in front of it that outputs the URLs you want to fetch.
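The reason a step like Generate Rows or Data Grid is needed can be modelled in a few lines of plain Python. This is only an illustration of the row-driven idea, not PDI code: `http_client_step` and `generate_rows` are hypothetical names, and the fake fetcher stands in for a real HTTP request. The step body runs once per incoming row, so with no input rows nothing is fetched at all. The optional `fixed_url` mirrors typing the URL directly into the step (where an empty generated row is enough); otherwise the URL is read from a field of each row.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, str]


def http_client_step(rows: Iterable[Row], fetch: Callable[[str], str],
                     fixed_url: Optional[str] = None,
                     url_field: str = "url",
                     result_field: str = "result") -> Iterator[Row]:
    """Hypothetical model of a row-driven step: it only does work per incoming row."""
    for row in rows:
        # Use the URL configured on the step, or read it from the incoming row.
        url = fixed_url if fixed_url is not None else row[url_field]
        out = dict(row)  # copy so the input row is not mutated
        out[result_field] = fetch(url)
        yield out


def generate_rows(url: str, limit: int = 1) -> List[Row]:
    """Hypothetical stand-in for Generate Rows / Data Grid: `limit` rows holding the URL."""
    return [{"url": url} for _ in range(limit)]


if __name__ == "__main__":
    def fake_fetch(u: str) -> str:
        return "<html><!-- fetched " + u + " --></html>"  # no network needed

    print(list(http_client_step([], fake_fetch)))  # no input rows: nothing happens
    print(list(http_client_step(generate_rows("http://example.com"), fake_fetch)))
```

The generator design is deliberate: like a transformation step, it produces output only as input rows flow through it.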
If you simply enter the URL of the page you want in the HTTP Client step, the HTML will be put in the result field.
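For comparison, what ends up in that result field is just the response body as text. A minimal plain-Python analogue using the standard library's `urllib.request` is sketched below. It is not how PDI implements the step. `fetch_html` and `add_result_field` are hypothetical names, and falling back to UTF-8 when the server sends no charset is an assumption of this sketch.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text (plain-Python analogue, not PDI code)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Assumption: fall back to UTF-8 when the response declares no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def add_result_field(row: dict, url: str) -> dict:
    """Hypothetical helper: copy the row and put the fetched HTML in a 'result' field."""
    return {**row, "result": fetch_html(url)}
```

For example, `add_result_field({}, "https://example.com")` would return a dict whose `"result"` key holds that page's HTML.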