Can someone explain in detail how NiFi processors like GetFile or QueryDatabaseTable store rows when the next processor is not available to receive or process data? Does the data get queued in memory and then swapped to disk when the size exceeds some threshold? Is there a risk of running out of memory or losing data?
1 Answer
I would recommend reading the Apache NiFi documentation, specifically the "Apache NiFi in Depth" document to understand how data is stored and passed through NiFi:
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
The short answer is that data is always written to disk in NiFi's internal repositories. A flow file has attributes that are persisted to the flow file repository and content that is persisted to the content repository. The content is not held in memory unless a processor chooses to read the entire content into memory to perform some processing.
When flow files are in a queue, none of the content is held in memory, just flow file objects that know where the content lives on disk. When the queue reaches a certain size, these flow file objects are themselves swapped to disk, which allows a queue to hold millions of flow files without actually keeping millions of flow file objects in memory.
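For reference, the swap threshold is configurable in `nifi.properties`. A sketch of the relevant setting, assuming a recent NiFi version (the value shown is the documented default, but check the System Administrator's Guide for your release):

```properties
# Number of flow files a queue can hold in memory before
# the excess flow file objects are swapped out to disk.
nifi.queue.swap.threshold=20000
```

Raising this keeps more flow file objects in heap (faster, more memory); lowering it swaps sooner.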
There is also a concept of back-pressure to limit the maximum size of a queue, based on either the number of flow files or the total size of all flow files in the queue. When the threshold is reached, the upstream processor is no longer scheduled to run until the queue drains below it.
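Back-pressure thresholds are set per connection in the UI, but the defaults applied to new connections can be sketched in `nifi.properties` as below. This is an assumption based on recent NiFi releases (these properties were not present in early 1.x versions, which hard-coded the defaults):

```properties
# Default back-pressure thresholds applied to newly created connections.
# Either condition triggers back-pressure on the upstream processor.
nifi.queue.backpressure.count=10000
nifi.queue.backpressure.size=1 GB
```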
