1

I have an ETL job that writes text files as output, and I need to check, using Pentaho, whether the content of those files contains the word "error" or "bad".

Is there any simple way to find it?

NamingException
Cynosure

2 Answers

1

If you are trying to process a number of files, you can use a Get Filenames step to get all the filenames. Then, if your text files are small, you can use a Get File Content step to read each whole file as one row, and a Java Filter or other matching step (RegEx, for example) to search for the words.

If your text files are too big for that but are line-based or otherwise in a fixed format (which they likely are if you wrote them with a Text File Output step), you can use a Text File Input step to read the lines, then a matching step (as above) to find the words in each line. After that, use a Filter Rows step to keep just the rows that contain the words, Select Values to keep just the filename, Sort Rows on the filename, and finally Unique Rows. The result should be a list of filenames whose contents contain the search words.

This may seem like a lot of steps, but Pentaho Data Integration, or PDI (aka Kettle), is designed as a flow of steps with distinct (and very reusable) functionality. A more compact but less "PDI" method is to write a User Defined Java Class (or other scripting) step that does all the work. That solution has fewer steps but is not very configurable or reusable.
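To make the logic of that flow concrete, here is a minimal sketch in plain Python rather than PDI itself. It mirrors the same stages: list the files, read them line by line, match the words, and reduce to a sorted, unique list of filenames. The function name `files_containing_words`, the `*.txt` glob, and the choice of whole-word, case-insensitive matching are assumptions for illustration, not anything PDI prescribes.

```python
import re
from pathlib import Path

# Assumption: whole-word, case-insensitive match for the two words in the question.
PATTERN = re.compile(r"\b(error|bad)\b", re.IGNORECASE)


def files_containing_words(directory, glob="*.txt", pattern=PATTERN):
    """Return sorted, unique names of files that have at least one matching line."""
    matches = set()  # a set plays the role of Sort Rows + Unique Rows
    for path in Path(directory).glob(glob):  # comparable to listing the filenames
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            # Read line by line so large files never have to fit in memory.
            if any(pattern.search(line) for line in handle):
                matches.add(path.name)
    return sorted(matches)
```

Calling `files_containing_words("/path/to/output")` on a hypothetical output directory would return the names of the files worth inspecting. Dropping the `\b` anchors would also match the words inside longer words such as "badge", which may or may not be what you want.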

mattyb
1

If you're writing these files out yourself, then don't you already know the content? If so, scan the fields at the point where you already have them in memory.

If you're trying to see if Pentaho has written an error to the file, then you should use error handling on the output step.

Finally, PDI is not a text-searching tool. If you really need to do this, your best bet is probably good old grep.
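For reference, on the command line this would be something like `grep -n -i -E 'error|bad' yourfile.txt`, where the filename is a placeholder. If grep is not available, roughly the same behaviour can be sketched in a few lines of Python. The helper name `grep_lines` and the default pattern are assumptions for illustration.

```python
import re


def grep_lines(path, pattern=r"error|bad"):
    """Yield (line_number, line) for each matching line, case-insensitively."""
    regex = re.compile(pattern, re.IGNORECASE)
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                # Strip the trailing newline so callers can print the line as-is.
                yield number, line.rstrip("\n")
```

Unlike a whole-word search, this default pattern matches substrings too, so a line containing "badge" would be reported.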

Codek