
I'd like to understand the behaviour of a lazy sequence when I iterate over it with doseq but hold onto part of the first element.

(with-open [log-file-reader (clojure.java.io/reader (clojure.java.io/file input-file-path))]
  ; parse-line returns some kind of representation of the line
  (let [parsed-lines (map parse-line (line-seq log-file-reader))
        first-item (first parsed-lines)]
    ; iterate over the parsed lines
    (doseq [line parsed-lines]
      ; do something with a side effect
      )))

I don't want to retain any of the sequence; I just want to perform a side effect with each element. I believe that without first-item there would be no problem.

I'm having memory issues in my program, and I think that retaining a reference to something at the start of the parsed-lines sequence perhaps means that the whole sequence is stored.

What's the defined behaviour here? If the sequence is being stored, is there a generic way to take a copy of an object and enable the realised portion of the sequence to be garbage collected?

Joe

2 Answers


The sequence-holding occurs here

...
(let [parsed-lines (map parse-line (line-seq log-file-reader))
...

The sequence of lines in the file is being lazily produced and parsed, but the entire sequence is held onto within the scope of the let. This sequence is realized in the doseq, but doseq is not the problem; it does not do sequence-holding.

...
(doseq [line parsed-lines]
 ; Do something
...

You wouldn't necessarily care about sequence-holding in a let, because the scope of a let is limited; but here, presumably, your file is large and/or you stay within the dynamic scope of the let for a while, or perhaps return a closure over the sequence from the "do something" section.

Note that holding onto any given element of the sequence, including the first, does not hold the sequence. The term head-holding is a bit of a misnomer if you consider head to be the first element as in "head of the list" in Prolog. The problem is holding onto a reference to the sequence.
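For reference, here is a minimal sketch of the shape that sidesteps the issue entirely (`parse-line` and `process-line!` are placeholders for the asker's real functions, not actual code from the question): the lazy sequence is created in the doseq binding itself, so no local in an enclosing let ever names it, and the realised portion can be collected as iteration advances.

(with-open [rdr (clojure.java.io/reader (clojure.java.io/file input-file-path))]
  ; the seq exists only inside the doseq's binding vector, so nothing
  ; in an enclosing let can pin its head
  (doseq [line (map parse-line (line-seq rdr))]
    (process-line! line)))

If the first element is needed as well, binding just that element is fine by the argument above: it is an element, not the sequence.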

A. Webb
  • FYI I'm dealing with multi-gigabyte files. You're saying that a `let` retains the sequence even if it's lazily produced and consumed? What syntax should I use to do the above then? I've just re-written this with a loop + recur on tail of the line-seq and RAM usage was significantly smaller. But it doesn't look nearly as nice. Is there a way of getting a lazily-evaluated map in scope without seq-holding? – Joe Jan 24 '14 at 14:12
  • No need to answer that if you don't want to; I realised that the binding is onto a cons-cell-type element rather than the lazy sequence as a whole. The way I should have done this was to put the `parsed-lines` into the `doseq` binding. – Joe Jan 24 '14 at 14:42
  • 1
    The compiler should release `parsed-lines` here as soon as it's last referenced, ie when the doseq starts. The code posted here doesn't require a lot of memory, unless the commented-out lines also refer to the large lazy sequence. – amalloy Jan 24 '14 at 20:13
  • @amalloy I never realized the compiler did that. I can reproduce the growing memory consumption most of the time with a side effect that doesn't reference the sequence, but it's not in the Java heap. I suspect the OS (Win 7 64-bit, Cygwin here). – A. Webb Jan 25 '14 at 19:16
  • @Joe See above. Are you running out of Java heap or just consuming system memory? If heap, are you referencing the sequence? Clojure version (amalloy's comment applies to v1.2+ I believe, which surely you have)? If system, OS? – A. Webb Jan 25 '14 at 19:20
  • http://stackoverflow.com/questions/10902296/holding-onto-the-head-of-sequence-when-using-rest/10902739#10902739 – A. Webb Jan 25 '14 at 19:52
  • Mac OS, Java 1.7.0_45 Java HotSpot(TM) 64-Bit Server VM, latest Clojure 1.5.1 – Joe Jan 25 '14 at 19:56
  • Actually I'm doing something a bit more complicated here but didn't want to overburden the question (perhaps mistakenly). I'm calling `partition-by` on the list and then iterating over the partitions with a doseq and writing the stream to a file. I believe (again, perhaps mistakenly) `partition-by` is sufficiently lazy that this doesn't make too much difference to the substance of the question. – Joe Jan 25 '14 at 20:00
  • So just using up system memory? I think you must be sufficiently lazy and iterating without holding then or you'd exhaust Java heap and throw an exception. You might want to unaccept this answer and edit question to the smallest complete example that produces the behavior. – A. Webb Jan 25 '14 at 20:06
  • Yes, growing the heap very large, but freeing a lot in a suspicious manner so as to make me think this. I'll post something on Monday. Thanks very much for your help. – Joe Jan 25 '14 at 21:23

The JVM will never return memory to the OS once it becomes part of the Java heap, and unless you configure it differently the default max heap size is pretty large (1/4 of available RAM, usually). So if you're only experiencing vague issues like "Gosh, this takes up a lot of memory" rather than "Well, the JVM threw an OutOfMemoryError", you probably just haven't tuned the JVM the way you'd like it to act.

`partition-by` is a little eager, in that it holds one or two partitions in memory at once, but unless your partitions are huge, you shouldn't be running out of heap space with this code. Try setting `-Xmx100m`, or whatever you think is a reasonable heap size for your program, and see if you have problems.
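To make the `partition-by` point concrete, here is a rough sketch of the shape described in the comments (`group-key`, `parse-line`, and `write-line!` are hypothetical placeholders): the outer sequence of partitions is lazy, but one or two whole partitions are held in memory at a time, so ~1 GB partitions mean roughly that much live data on the heap regardless of how lazily the outer sequence is consumed.

(with-open [rdr (clojure.java.io/reader input-file-path)]
  (doseq [part (partition-by group-key (map parse-line (line-seq rdr)))]
    ; partition-by keeps one or two whole partitions in memory at a time,
    ; so a very large partition means that much live data on the heap
    (doseq [line part]
      (write-line! line))))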

amalloy
  • Thanks for your help. Yes, the partitions are very large (on the order of 1 GB). I've iterated a couple of times on the code, so I'll have to go back, confirm the characteristics, and update on Monday. The JVM returns a lot of RAM after this operation is completed, which is why I'm suspicious. My heap is set sufficiently large (4 GB), but I just see the usage increase with the size of the data, and extrapolating the consumption to large real-world data, I think there's something up. – Joe Jan 25 '14 at 21:26
  • Re `partition-by`: I'd heard that it returned lazy seqs of lazy seqs. You're saying that the partitions themselves are realised and stored? If the partitions are stored whole, that would answer my question (they're very large). – Joe Jan 25 '14 at 21:42
  • Lazy seq of eager seqs, indeed. – amalloy Jan 25 '14 at 22:03