2

I have a stream that watches the output of multiple files in a directory, processes the data, and puts it into HDFS. Here is my stream create command:

stream create --name fileHdfs --definition "file --dir=/var/log/supervisor/ --pattern=tracker.out-*.log --outputType=text/plain | logHdfsTransformer | hdfs --fsUri=hdfs://192.168.1.115:8020 --directory=/data/log/appsync --fileName=log --partitionPath=path(dateFormat('yyyy/MM/dd'))" --deploy

The problem is that the source:file module sends all the data read from a file to the log processing module at once instead of one line at a time. Because of that, the payload string has millions of characters and I can't process it. For example:

--- PAYLOAD LENGTH---- 9511284

Please tell me how to read line by line when using the source:file module. Thanks!

Tu Pham

4 Answers

3

It's not currently supported, but it would be easy to write a custom source using a Spring Integration inbound-channel-adapter to invoke a POJO that reads a line at a time.

Please open a new feature JIRA issue.

You could also do it with a job instead of a stream in XD.
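A minimal sketch of the kind of POJO such a custom source might invoke, assuming plain `java.io` and nothing from Spring XD itself. The class name `LineAtATimeReader` and the method name `nextLine` are hypothetical, chosen only for illustration. Each call returns one line, and `null` once the input is exhausted, which a polled Spring Integration inbound-channel-adapter treats as "no message this time".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical POJO: the names here are illustrative and not part of Spring XD.
public class LineAtATimeReader {
    private final BufferedReader reader;

    public LineAtATimeReader(Reader source) {
        this.reader = new BufferedReader(source);
    }

    // Returns the next line without its line terminator,
    // or null when there is nothing left to read.
    public String nextLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A StringReader stands in for a real log file in this sketch.
        LineAtATimeReader r =
                new LineAtATimeReader(new StringReader("first\nsecond\r\nthird"));
        String line;
        while ((line = r.nextLine()) != null) {
            System.out.println(line.length() + ": " + line);
        }
    }
}
```

Because `BufferedReader.readLine()` accepts `\n`, `\r`, and `\r\n` as terminators, each payload stays small regardless of how the file was written. Wiring this into a module definition is left out here, since it depends on the XD version in use.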

Gary Russell
3

I know that this may be late, but for any googlers out there looking for the solution:

Even though there is no module or option that does it automatically, it's as simple as adding a splitter that separates the incoming message into multiple outgoing messages.

Please note that you have to decide between using \n and \r\n. Check your files to see what they're using.

Example:

stream create --name filetest --definition "file --outputType=text/plain --dir=/tmp/examplefiles/| splitter --expression=payload.split('\\n') | log" --deploy
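To illustrate why the choice between \n and \r\n matters, here is a small standalone Java sketch, not tied to Spring XD. The class name `LineSplitDemo` and the helper `splitLines` are hypothetical. It assumes only that the splitter expression ends up calling `String.split`, which takes a regular expression, so a pattern with an optional `\r` can cover both conventions. How that pattern needs to be escaped inside the XD shell is not verified here.

```java
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
public class LineSplitDemo {

    // Splits on Unix (\n) or Windows (\r\n) line endings.
    static String[] splitLines(String payload) {
        return payload.split("\\r?\\n");
    }

    public static void main(String[] args) {
        String mixed = "one\r\ntwo\nthree";

        // With the optional \r, both kinds of line ending are consumed.
        System.out.println(Arrays.toString(splitLines(mixed)));

        // Splitting on \n alone leaves a stray carriage return on "one".
        String[] naive = mixed.split("\\n");
        System.out.println(naive[0].endsWith("\r"));
    }
}
```

If the files use Windows line endings and the split is done on \n alone, every line carries a trailing carriage return that downstream modules then have to strip.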

Cheers!

1

Spring Integration has a FileSplitter that splits a text file into lines. You can use this to create a custom processor module, let's call it file-split:

  1. Create file-split.xml file with the following content:

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:int="http://www.springframework.org/schema/integration"
       xsi:schemaLocation="http://www.springframework.org/schema/beans        
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://www.springframework.org/schema/integration 
       http://www.springframework.org/schema/integration/spring-integration.xsd">
    
    <int:splitter input-channel="input" output-channel="output">
            <bean class="org.springframework.integration.file.splitter.FileSplitter">
                    <constructor-arg value="false"/>
            </bean>
    </int:splitter>
    
    <int:channel id="output"/> 
    </beans>
    
  2. Copy the file to ${XD_HOME}/xd/modules/processor/file-split/config/ (create the path if needed)

  3. Sample usage:

    stream create --name splitFile --definition "file --dir=/data --ref=true | file-split | log" --deploy
    

You can further customize the module to take more options if needed.

Khoa Nguyen
0

You can try the --mode=lines option of the file source when creating the stream. See the documentation for reference: http://docs.spring.io/spring-xd/docs/current/reference/html/#file

Hope this helps!

Cheers, Pratik

Pratik