
To add to the title: I now have a working workflow consisting of two steps.

1) I extract the HTML search result pages for every keyword given in an input.txt file, e.g.:

SAP; 
Business Intelligence;

Talend saves those results and writes them as HTML to keywords_SAP.txt and keywords_Business Intelligence.txt. Attached is an image of the Talend job.

Talend Workflow
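
The keyword-to-file mapping from step 1 can be sketched in plain Java, outside of Talend. This is only an illustration, assuming input.txt holds semicolon-separated keywords as in the example above; the class name KeywordFiles and both method names are hypothetical and not part of the actual job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Talend job itself.
final class KeywordFiles {

    // Splits the content of input.txt on ';' and drops empty entries.
    static List<String> parseKeywords(String content) {
        List<String> keywords = new ArrayList<>();
        for (String part : content.split(";")) {
            String keyword = part.trim();
            if (!keyword.isEmpty()) {
                keywords.add(keyword);
            }
        }
        return keywords;
    }

    // Builds the result file name following the keywords_<keyword>.txt pattern.
    static String resultFileName(String keyword) {
        return "keywords_" + keyword + ".txt";
    }

    public static void main(String[] args) {
        // Same content as the input.txt example above.
        String content = "SAP; \nBusiness Intelligence;";
        for (String keyword : parseKeywords(content)) {
            System.out.println(resultFileName(keyword));
        }
    }
}
```

Trimming each entry matters here because the example input has a trailing space and a line break after each semicolon.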

2) I use Java code to import these files one by one and parse the data out of the DOM structure using the JSoup library. The data is then written straight into a MySQL database.

Here is my problem: It all works fine for now, but the requirement is to completely automate the process in the future, so it can run on a server periodically.

Therefore I thought of including my Java code in Talend, which got me stuck, because I wasn't able to import the MySQL connector and jsoup.jar.

What I need help with is either advice on how to connect my Java code to my existing Talend workflow, or an easier solution that I'm just not thinking of right now.

I have to add, I'm quite new to coding, and it was a big leap to come this far with parsing and writing into a DB. With your help throughout the process, I got more comfortable though. I hope you can help me solve this problem. Thank you in advance for your time spent.

ZedBrannigan
  • 1
    Are you saying that you need help importing external jars such as the MySQL connector and jsoup.jar? This can be done via the tLibraryLoad component and by putting the external jar in the ...talendinstalldire/lib/java folder. – garpitmzn Jul 23 '14 at 12:24
  • Yes, that's one step. How do I connect this tLibraryLoad to the flow then? Do I need to use the arrows there, or keep the components unconnected on the process page? – ZedBrannigan Jul 23 '14 at 12:28
  • You can use a trigger (onSubJobOk or onComponentOK) to connect to the next component. tLibraryLoad should be the first thing you do in your job. You can also import classes/methods in tJava and tJavaRow under the advanced properties: import xxx.yyy....... – garpitmzn Jul 23 '14 at 12:36
  • Thanks so far, you've helped me dig into it. Can I use tJavaFlex too, to do it all at once? I'm not totally sure where to put my code now. – ZedBrannigan Jul 23 '14 at 12:38
  • 1
    You should be able to do all of this inside a tJava/tJavaRow (depending on your data flow) component or (potentially better) put all of the code inside a routine and call this in the tJava/tJavaRow. I'd suggest doing the database stuff with the provided database connectors though (such as tMySqlOutput) as it will better handle errors and it will be easier for people to maintain your job. – ydaetskcoR Jul 23 '14 at 12:43
  • 1
    As a general rule, I'd only put custom Java code into a job if it can't be done using Talend's provided components. It helps to keep a separation of logic to a reasonable degree. – ydaetskcoR Jul 23 '14 at 12:44
  • That sounds helpful. I might return for further questions. Thank you so far! – ZedBrannigan Jul 23 '14 at 12:57

3 Answers

3

This can be done by using the tLibraryLoad component and putting the external jar file in <talendInstallDir>/lib/java.

You can use the onSubJobOk or onComponentOK connections to connect to the next components.

Your tLibraryLoad component(s) should be the first thing you do in your job.

You can also import classes/methods in tJava, tJavaRow under Advanced Properties in the component view and then use something like:

import org.apache.commons.lang3.math.NumberUtils;

to import the specific class you need (in this case, the Apache Commons NumberUtils).

ydaetskcoR
garpitmzn
2

You can use tLibraryLoad in your flow. Remember to use OnSubjobOk to connect it to the tJava component that holds your code.

Brij
0

Although this thread is 2 years old and you might have already solved this problem, I recently did a similar mini-project that may help you. I use plain string manipulation instead of the JSoup library. There is also an associated video with step-by-step instructions. Hope it helps.

Talend project to parse webpage
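
To give an idea of what plain string manipulation can look like, here is a minimal sketch. It is not the code from the linked project; the class name HtmlText, the method between, and the sample HTML are made up for illustration. A static method like this could live in a Talend routine and be called from a tJava or tJavaRow component, as suggested in the comments on the question.

```java
// Hypothetical helper illustrating plain string manipulation on HTML.
final class HtmlText {

    // Returns the trimmed text between the first occurrence of 'open'
    // and the next occurrence of 'close', or null if either is missing.
    static String between(String html, String open, String close) {
        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Made-up sample page, only for demonstration.
        String html = "<html><head><title> SAP Jobs </title></head></html>";
        System.out.println(between(html, "<title>", "</title>"));
    }
}
```

This only works reliably when the markers are known and appear in a fixed form; for nested or irregular markup a real parser such as JSoup is the safer choice.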