
I would like to download the HTML source code of a web page. Can I do this with the HTTP Client step? If so, do I have to use a Generate Rows step before it? I am using Pentaho Data Integration 6. Thanks.


1 Answer


To download the HTML from a web page, use the HTTP Client step. From the documentation:

The HTTP client step doesn't do anything

Q: The HTTP client step doesn't do anything, how do I make it work?

A: The HTTP client step needs to be triggered. Use a Row generator step generating e.g. 1 empty row and link that with a hop to the HTTP client step.

So you need incoming rows first, for instance from a Generate Rows or Data Grid step containing the URLs you want to fetch.
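The FAQ answer comes down to how PDI steps are driven: a step like HTTP Client does its work once per incoming row, so with no input rows it never runs. The Python sketch below is only an analogy of that row-stream model, not Kettle code. The functions `generate_rows` and `http_client_step` and the `url` field name are hypothetical, and `fetch` is a stand-in for the real HTTP request.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


def generate_rows(limit: int, fields: Row) -> Iterator[Row]:
    """Hypothetical analogue of Generate Rows: emit `limit` copies of a constant row."""
    for _ in range(limit):
        yield dict(fields)


def http_client_step(
    rows: Iterable[Row],
    fetch: Callable[[str], str],
    url_field: str = "url",
    result_field: str = "result",
) -> Iterator[Row]:
    """Hypothetical analogue of HTTP Client: one request per incoming row."""
    for row in rows:
        out = dict(row)  # copy so the incoming row is not mutated
        out[result_field] = fetch(row[url_field])
        yield out


if __name__ == "__main__":
    calls: List[str] = []

    def fake_fetch(url: str) -> str:
        # Records each requested URL instead of doing real network I/O.
        calls.append(url)
        return "<html>...</html>"

    # No incoming rows: the step never fetches anything.
    print(list(http_client_step([], fake_fetch)))
    print(calls)

    # One generated row triggers exactly one request.
    out = list(http_client_step(generate_rows(1, {"url": "http://example.com/"}), fake_fetch))
    print(out[0]["result"])
    print(calls)
```

A Data Grid with several URLs corresponds to passing a list such as `[{"url": a}, {"url": b}]`, which in this model yields one request, and one output row, per URL.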

[Image: the example transformation]

If you simply enter the URL of the web page in the HTTP Client step, the HTML is written to the `result` field.
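For comparison, the request itself amounts to an HTTP GET whose response body is decoded and stored in a field. Below is a minimal sketch using only the Python standard library, assuming a plain GET with no authentication or custom headers. `fetch_html` is a hypothetical helper name, and the `result` key in the usage line only mirrors the field name mentioned above.

```python
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """GET `url` and return the response body decoded as text."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header if present, else assume UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage, in the same spirit as one row flowing through the step: `row = {"url": "http://example.com/"}` followed by `row["result"] = fetch_html(row["url"])`.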
