
Quick question on Smooks transforms: has anyone had experience with this? If so, time to shine!

It's simple, really: I have a (very large) .csv file and I want to transform it into another .csv format (columns switched, etc.).

The Smooks config file is below. (A bit of background: it's going through WSO2, if that makes any difference; that bit is working fine.)

<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.2.xsd" xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd">
    <params>
        <param name="stream.filter.type">SAX</param>
    </params>
    <csv:reader fields="ParentSKU,AttributeSKU,WarehouseID,Published,Stock,SellingPrice,InventoryValue" rootElementName="records" recordElementName="row"/>

    <resource-config selector="row">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>

    <ftl:freemarker applyOnElement="row">
        <ftl:template><![CDATA[${row.ParentSKU},${row.AttributeSKU},${row.WarehouseID},${row.Published},${row.Stock},${row.SellingPrice},${row.InventoryValue}]]></ftl:template>
        <param name="quote">"</param>
        <param name="includeFieldNames">true</param>
        <param name="csvFields">ParentSKU,AttributeSKU,WarehouseID,Published,Stock,SellingPrice,InventoryValue</param>
        <param name="seperator">,</param>
        <param name="messageType">CSV</param>
    </ftl:freemarker>
</smooks-resource-list>

The input file looks something like:

Parent SKU,Attribute SKU,Warehouse ID,Published,Stock,Selling Price,Inventory Value
23551288,,fc,0,0,119.99,0
78234225,,fc,0,0,39.99,0
85275286,,fc,0,0,9.99,0
71235376,7.14034E+12,fc,1,4,24,96
45340656,,fc,0,0,6,0
12343674,,fc,0,0,79.99,0
78049868,,fc,0,0,39.99,0
12082748,,fc,0,0,69.99,0
18302384,,fc,0,0,19.99,0
31366094,,fc,0,0,19.99,0

The problem is that I get the record tags in the output. How can I stop this? I have been trying different things for the last 24 hours.
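As far as I understand it, the csv:reader doesn't emit text: it turns the file into an XML event stream, shaped by the rootElementName/recordElementName settings above. A rough sketch of that intermediate stream for my first data line:

```xml
<records>
    <row>
        <ParentSKU>23551288</ParentSKU>
        <AttributeSKU></AttributeSKU>
        <WarehouseID>fc</WarehouseID>
        <Published>0</Published>
        <Stock>0</Stock>
        <SellingPrice>119.99</SellingPrice>
        <InventoryValue>0</InventoryValue>
    </row>
    <!-- ...one row element per CSV line... -->
</records>
```

The FreeMarker template only replaces each row element with plain text, which would explain why the enclosing records element still reaches the output below.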

<records>Parent SKU,Attribute SKU,Warehouse ID,Published,Stock,Selling Price,Inventory Value
23551288,,fc,0,0,119.99,0
78234225,,fc,0,0,39.99,0
85275286,,fc,0,0,9.99,0
71235376,7.14034E+12,fc,1,4,24,96 
45340656,,fc,0,0,6,0
12343674,,fc,0,0,79.99,0
78049868,,fc,0,0,39.99,0
12082748,,fc,0,0,69.99,0
18302384,,fc,0,0,19.99,0
31366094,,fc,0,0,19.99,0
</records>

Ideally I would prefer to use Smooks configuration only, so that I can give this to developers who are not Java-aware.

I have also tried using

<csv:reader fields="ParentSKU,AttributeSKU,WarehouseID,Published,Stock,SellingPrice,InventoryValue" recordElementName="record" rootElementName="row" skipLines="1">
    <csv:singleBinding beanId="row" class="java.util.HashMap"/> 
  </csv:reader>

in place of the resource-config node, but it does the same thing.
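For reference, the per-row column switching itself is trivial outside Smooks; this is only an illustrative plain-Java sketch (the index order here just swaps the first two columns as an example, not my real mapping), useful as a sanity check against what the template should be doing per record:

```java
// Illustrative sketch: reorder CSV columns per line, the way the
// FreeMarker template does per <row> element.
class RowMapper {
    // Output column order, as indices into the input row.
    // {1, 0, 2, ...} swaps ParentSKU and AttributeSKU as an example.
    static final int[] ORDER = {1, 0, 2, 3, 4, 5, 6};

    static String map(String line) {
        // limit -1 keeps trailing/empty fields, e.g. "23551288,,fc,0,0,119.99,0"
        String[] in = line.split(",", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < ORDER.length; i++) {
            if (i > 0) out.append(',');
            out.append(in[ORDER[i]]);
        }
        return out.toString();
    }
}
```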

Thanks in advance.

Tim Teece
  • Been looking further at this and tried using two ftl:freemarker tags, but this is not matching the nodes; if I put two in, it only matches records and will not process the rows. I am wondering if this is something to do with the XML processing libraries in the WSO2 service bus... very confused now. – Tim Teece Mar 11 '16 at 10:02
  • Also tried and that just results in an empty file :( – Tim Teece Mar 14 '16 at 16:03

2 Answers


OK, it looks like there is nothing out there to solve this (feel free to answer if you know differently); however, I have managed to get somewhere.

Step 1. Disable Smooks output in the WSO2 ESB by adding the following to the <inSequence>, before the Smooks config:

<property name="DISABLE_SMOOKS_RESULT_PAYLOAD" value="true" scope="default" type="STRING"/>

That stops Smooks from wasting time and memory outputting anything through the default Smooks stream.

Step 2. In the smooks config file write to an output file not to the default output stream

<ftl:freemarker applyOnElement="row">
    <ftl:template><![CDATA[${row.ParentSKU},${row.AttributeSKU},${row.WarehouseID},${row.Published},${row.Stock},${row.SellingPrice},${row.InventoryValue}
    ]]></ftl:template>
    <ftl:use>
        <ftl:outputTo outputStreamResource="outputStream" />
    </ftl:use>
</ftl:freemarker>

<file:outputStream resourceName="outputStream"
    openOnElement="records">
    <file:fileNamePattern>file_output.csv</file:fileNamePattern>
    <file:destinationDirectoryPattern>/vfs/test</file:destinationDirectoryPattern>
</file:outputStream>

This causes a file to be opened when the 'records' element is found, and routes all the CSV output to that stream.

OK, whilst this works it's really not the answer, as it stops me from using any of the ESB functionality; specifically, it stops me putting the file anywhere useful, e.g. using a VFS with handy file names. In fact, as it's Smooks 1.1, not 1.4, we cannot even use the inbuilt beans to name the file with a timestamp or a random number.

But I could add another proxy to move the file on once it has been generated, or create my own Java bean for a date and/or random number to generate a unique file name from Smooks directly.
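On that last point, a minimal sketch of such a bean could look like the following (the class and method names are hypothetical, and wiring it into the Smooks 1.1 bean context / FreeMarker model is a separate exercise):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

// Hypothetical helper bean: builds a unique file name from a timestamp
// plus a short random suffix, e.g. file_output_20160315-095200_1a2b3c4d.csv
class UniqueName {
    static String next(String prefix, String extension) {
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        String random = UUID.randomUUID().toString().substring(0, 8);
        return prefix + "_" + stamp + "_" + random + "." + extension;
    }
}
```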

If anyone comes up with a better answer please let me know as it would still help!

Tim Teece
  • Only with large files: if you leave the output step connected and have the <property name="DISABLE_SMOOKS_RESULT_PAYLOAD" value="true" scope="default" type="STRING"/> in place, oddly you get the correct output through the ESB as well - that's weird... – Tim Teece Mar 14 '16 at 22:04
  • And finally... rename the output file via the fileconnector (download this from WSO2 and install it!), then add file:///vfs/test/ file_output.csv {fn:concat(fn:substring-after(get-property('MessageID'), 'urn:uuid:'), '.txt')}, then make the proxy work sequentially by adding the ifs setting... and we have a complete working large CSV file proxy :) hurrah! – Tim Teece Mar 15 '16 at 09:52

Have you tried to read the data using DataServices then transform into CSV?

  • Hi MiniMini, thanks - I looked at that last night and thought you had it for a moment! :) Unfortunately, whilst I could use DataServices to expose the CSV file as a service and then read from it, what I ideally want is to do the transform as I read it and output to another CSV. I guess I could run a query against that datasource and then transform the results, but that would have a lot of overhead, and as the file is very large I was trying to avoid that as much as possible. – Tim Teece Mar 11 '16 at 08:00
  • I could do a similar thing using just the ESB and two proxy services: one to transform the CSV into an XML file using Smooks, and then a second service to take that XML and transform it to the second CSV format using a stylesheet or similar. – Tim Teece Mar 11 '16 at 08:00