From a data supplier I download roughly 75 images + 40 pages of details in one job using RestClient.
The job goes like this (a simplified sketch follows the list):
- Authenticate to the supplier's service and store the cookie jar in a variable.
- Download the XML.
- The XML contains roughly 40 assets.
- For each asset, download its list of image URLs (0-10 images per asset).
- Download the images.
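A stripped-down sketch of that flow (the full code is in the gists linked at the bottom; the URLs, XML element names, and environment variables here are just placeholders):

```ruby
require 'rest-client'
require 'nokogiri'

# 1. Authenticate and keep the supplier's session cookies
login   = RestClient.post('https://supplier.example.com/login',
                          user: ENV['SUPPLIER_USER'], pass: ENV['SUPPLIER_PASS'])
cookies = login.cookie_jar

# 2. Download the day's XML (~40 assets)
xml = RestClient.get('https://supplier.example.com/assets.xml', cookies: cookies)
doc = Nokogiri::XML(xml.body)

# 3. For each asset, download its detail page and 0-10 images
doc.xpath('//asset').each do |asset|
  details = RestClient.get(asset.at('detail_url').text, cookies: cookies) # one "page of details" per asset

  asset.xpath('.//image_url').each do |image_url|
    image = RestClient.get(image_url.text, cookies: cookies)
    File.binwrite(File.basename(image_url.text), image.body)
  end
end
```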
My total download size is 148.14 MB in 37.58 seconds across 115 unique requests. My memory consumption is:
Total allocated: 1165532095 bytes (295682 objects)
Total retained: 43483 bytes (212 objects)
(measured with the memory_profiler gem). That's just over 1 GB of memory allocated to download ~150 MB of data?
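For reference, the numbers above come from wrapping the whole job in memory_profiler, roughly like this (`download_one_day` is just a placeholder for the job sketched above):

```ruby
require 'memory_profiler'

report = MemoryProfiler.report do
  download_one_day   # placeholder for the whole download job described above
end

report.pretty_print  # prints the "Total allocated" / "Total retained" lines quoted above
```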
My big concern is that I need to download even more data; this is just 1 out of 15 days of data. When I run 2 days of data, the download size and memory use double; 3 days triples them, and so on. It even looks like the memory consumption rises exponentially until I run out of memory and my server crashes.
Why is garbage collection not kicking in here? I've tried running GC.start between each day of data I download; that brings memory_profiler's numbers down, but my server still ends up crashing when I add too many days of data.
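Roughly what I tried (`days` and `download_day` are placeholders for my actual loop):

```ruby
days.each do |day|
  download_day(day)  # placeholder for the per-day job sketched above
  GC.start           # force a full GC between days
end
```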
So my questions are:
- Why is the memory consumption so high compared to the amount of data I'm actually downloading?
- Since I overwrite the variables holding the downloaded data between downloads, shouldn't garbage collection clear the memory used by the previous download?
- Any tips and tricks to keep memory consumption down?
Versions: Ruby: 2.4.4p296, RestClient: 2.0.2, OS: Ubuntu 16.04
Example code:
Using RestClient: https://gist.github.com/mtrolle/96f55822122ecabd3cc46190a6dc18a5
Using HTTParty: https://gist.github.com/mtrolle/dbd2cdf70f77a83b4178971aa79b6292
Thanks