I want to use Smooks to process a large text file that arrives as plain text. The file contains a large number of lines, and I have to split each line at fixed character positions to extract the information from it.

E.g., in Java I do the following:

row.substring(0, 4) 
row.substring(4, 64) 

I have to convert the text content to a CSV file.
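For reference, the plain-Java version of this conversion can be sketched as follows. This is only an illustration of the intended result, not Smooks configuration: the class name `FixedWidthToCsv`, the two field widths (4 and 60, taken from the `substring` calls above) and the CSV quoting rule are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names, field widths and quoting rules are illustrative assumptions.
class FixedWidthToCsv {

    // Cuts one fixed-width row into the two fields from the question.
    // Assumes the row is at least 64 characters long; shorter rows would throw.
    static List<String> splitRow(String row) {
        List<String> fields = new ArrayList<>();
        fields.add(row.substring(0, 4).trim());
        fields.add(row.substring(4, 64).trim());
        return fields;
    }

    // Quotes a value if it contains a comma, a double quote or a line break.
    static String escape(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Turns one fixed-width row into one CSV line.
    static String toCsvLine(String row) {
        List<String> escaped = new ArrayList<>();
        for (String field : splitRow(row)) {
            escaped.add(escape(field));
        }
        return String.join(",", escaped);
    }

    public static void main(String[] args) {
        // Build a 64-character sample row: 4-character code + 60-character name.
        String row = String.format("%-4s%-60s", "A001", "Smith, John");
        System.out.println(toCsvLine(row)); // prints: A001,"Smith, John"
    }
}
```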

  • Can I do exactly the same string manipulation in the Smooks configuration? I believe I can use fixed-length processing for that?

  • How do I add an if/else condition in the Smooks configuration? Like this in Java:

    if (row.length() == 900) { /* DO */ } else { /* DO */ }
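To make the second requirement concrete, here is a minimal plain-Java sketch of the length-based selection. It is not Smooks configuration; the class name `RowLengthFilter` and the helper `filled` are hypothetical, and 900 is the length used in the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: keeps only rows of the expected length and skips the rest.
class RowLengthFilter {

    static final int EXPECTED_LENGTH = 900;

    static List<String> selectRows(List<String> rows) {
        List<String> selected = new ArrayList<>();
        for (String row : rows) {
            if (row.length() == EXPECTED_LENGTH) {
                selected.add(row); // the "if" branch: rows to be processed
            }
            // the "else" branch: rows of any other length are ignored here
        }
        return selected;
    }

    // Builds a dummy row of the given length for the demo below.
    static String filled(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'x');
        return new String(chars);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(filled(900), filled(300), filled(900));
        System.out.println(selectRows(rows).size()); // prints: 2
    }
}
```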

Ratha
  • Perhaps it was just a bad choice of words but, if it's XML you're processing, then parsing by "line" is not what you want to be doing. There are loads of Smooks examples of processing XML (see the examples pages on smooks.org). There's definitely one example there about processing huge XML files. The basic flow would be to bind the relevant XML fragment (the one that corresponds to your row/record) to a Java model (this can be a simple java.util.Map) and then apply a FreeMarker template to each instance, outputting a line (for CSV) per execution. – Tom Fennelly May 15 '14 at 08:49
  • Sorry, it is a text file with fixed-length fields, so I need to identify the columns by fixed character positions. But the file might contain both 200-character lines and 300-character lines, and I need to selectively process only the 200-character lines. – Ratha May 15 '14 at 15:59

2 Answers

We can do the string manipulation using the fixed-length reader [1], but I still have not found a way to do a condition check.

E.g., if/else

[1] http://www.smooks.org/mediawiki/index.php?title=V1.4:Smooks_v1.4_User_Guide#XML

Ratha

If the format does not fit the flat-file reader, then you might be able to use the regex reader: https://github.com/smooks/smooks/tree/v1.5.1/smooks-examples/flatfile-to-xml-regex/
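To illustrate why a regex can work for this kind of input, here is a minimal sketch using plain `java.util.regex`. It is not the regex reader's configuration (see the linked example for that); the class name `RegexRowParser` and the 4 + 60 field layout are assumptions borrowed from the question. Because the pattern is anchored at both ends, it splits the row and rejects rows of any other length in one step.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one anchored pattern both splits and length-checks a row.
class RegexRowParser {

    // Exactly 4 characters followed by exactly 60 characters, nothing else.
    private static final Pattern ROW = Pattern.compile("^(.{4})(.{60})$");

    // Returns the two trimmed fields, or empty if the row is not exactly 64 characters.
    static Optional<String[]> parse(String row) {
        Matcher matcher = ROW.matcher(row);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { matcher.group(1).trim(), matcher.group(2).trim() });
    }

    public static void main(String[] args) {
        String row = String.format("%-4s%-60s", "A001", "Widget");
        System.out.println(parse(row).isPresent());       // 64 characters: matches
        System.out.println(parse(row + "x").isPresent()); // 65 characters: rejected
    }
}
```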

As for the conditional stuff... you really need to bind the data fragments into a Java model of some sort (real or virtual) and then conditionally process those fragments, either by adding elements on the visitors being applied or by routing the fragments to another process that handles them in parallel, which is a far better way of processing a huge data stream.
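The idea of binding first and deciding afterwards can be sketched in plain Java like this. It does not use any Smooks API; the class name `RecordModel`, the map keys (`code`, `description`, `rowLength`) and the expected length of 64 are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical "virtual" model: each row is bound to a Map before any decision is made.
class RecordModel {

    // Safe substring: never reads past the end of a short row.
    static String slice(String row, int from, int to) {
        int start = Math.min(from, row.length());
        int end = Math.min(to, row.length());
        return row.substring(start, end).trim();
    }

    // Binds one row to a simple map-based record.
    static Map<String, String> bind(String row) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("code", slice(row, 0, 4));
        record.put("description", slice(row, 4, 64));
        record.put("rowLength", String.valueOf(row.length()));
        return record;
    }

    // The condition is evaluated against the bound model, not the raw text.
    static boolean shouldProcess(Map<String, String> record) {
        return "64".equals(record.get("rowLength"));
    }

    public static void main(String[] args) {
        Map<String, String> record = bind(String.format("%-4s%-60s", "A001", "Widget"));
        if (shouldProcess(record)) {
            System.out.println(record.get("code") + "," + record.get("description"));
        }
    }
}
```

Keeping the condition separate from the parsing means the same bound record could later be handed to a different consumer without re-reading the raw line.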

Tom Fennelly
  • I should add... and if the regex reader doesn't work then you might need to consider writing a custom reader, which might be the easiest thing if you're not a regex ninja. – Tom Fennelly May 28 '14 at 11:04