
I have a .csv file that is 25 GB in total. I am attempting to read it in (line by line for now); however, I keep running into an OutOfMemoryError: Java heap space and I can't figure out why. After googling around for a while, I came up with the following code:

from("file:/home/justin/data/?fileName=in.csv&noop=true")//.streamCaching()
    .split().tokenize("\n", 10000000).streaming()
    .unmarshal(csv)
    .process(new CsvParserProcess())
    .marshal(csv)
    .to("file:/home/justin/data/?fileName=out.csv").log("Finished Transformation").end();

The OutOfMemoryError shows up after about 5 seconds of running.

My intuition would tell me, "Oh, when you reach near-complete memory saturation, flush out old unused contents." However, I am unsure how to do this in the context of Apache Camel (or really manually in Java, for that matter; I've been migrating from C).
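For reference, the "flush as you go" behavior in plain Java doesn't need explicit memory management at all: read one line, write it out, let the old line become garbage. A minimal stdlib sketch (the file paths and the uppercase transformation are placeholders), independent of Camel:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineCopy {
    // Reads the input line by line and writes each transformed line
    // immediately, so only one line is held in memory at a time
    // regardless of total file size.
    public static void transform(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in);
             BufferedWriter writer = Files.newBufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.toUpperCase()); // placeholder transformation
                writer.newLine();
            }
        }
    }
}
```

Memory stays constant here because each `line` becomes unreachable (and collectible) as soon as the next one is read; nothing accumulates.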

My other solution was a very expensive brute-force option of just piping the file into a stream one line at a time from Camel's stream endpoint, which works, maybe? I just haven't wanted to sit around and wait for it to finish.

from("stream:file?fileName=/home/justin/data/in.csv")
    .streamCaching().split().tokenize("\n")
    .unmarshal(csv)
    .process(new CsvParserProcess())
    .marshal(csv)
    .to("file:/home/justin/data/?fileName=out.csv&fileExist=Append").log("done").end();

Does anyone have any ideas of how I can avoid the OutOfMemoryError?

Edit: I forgot that my "improved" code had .streaming() after I tokenized the file. It still results in the same error, however. :(

j-money
  • similar question asked here: https://stackoverflow.com/questions/8122748/best-strategy-for-processing-large-csv-files-in-apache-camel – Craig Smith Jul 19 '18 at 10:50
  • I came across that solution while googling around (guess I should've mentioned that, my mistake!), but my solution does include `.streaming()` – j-money Jul 19 '18 at 10:53
  • You use a different route, and I wonder what your CsvParserProcess does. – Craig Smith Jul 19 '18 at 11:01

1 Answer


Maybe before I ripped out my hair (and went to places on the internet I can never unsee) I should've done a little research on Occam's razor... It turns out that I cannot count as well as I originally thought, and the buffer I was creating of size 10000000 should actually have been 1000000.
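The back-of-the-envelope arithmetic makes the mistake concrete: the second argument to tokenize groups that many lines into a single exchange, so the per-exchange payload is roughly groupSize times the average line length. A rough sketch (the ~100-byte average line is an assumption, not measured from the actual file):

```java
public class GroupSizeMath {
    // Approximate in-memory payload of one grouped exchange:
    // number of lines in the group times average bytes per line.
    public static long approxBytesPerExchange(long groupSize, long avgLineBytes) {
        return groupSize * avgLineBytes;
    }

    public static void main(String[] args) {
        // With the typo: 10,000,000 lines * ~100 bytes ~= 1 GB held in a
        // single exchange, easily blowing a default-sized heap.
        System.out.println(approxBytesPerExchange(10_000_000L, 100L));
        // With the intended 1,000,000 lines: ~100 MB per exchange.
        System.out.println(approxBytesPerExchange(1_000_000L, 100L));
    }
}
```

So streaming mode was never the problem; each individual "chunk" was simply an order of magnitude larger than intended.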

j-money