I have a small test HDFS cluster that I use as snapshot backup space for Flink. Flink creates and deletes roughly 1000 small files per second. The NameNode handles this without problems at first, but over time the "Number of Blocks Pending Deletion" builds up until the file system is full. When I stop my Flink job (i.e. no further create/delete/… operations), the number of pending blocks decreases by only about 1.2e6 per hour.
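To put those numbers side by side, here is a rough back-of-the-envelope calculation. It assumes that each small file occupies exactly one block and that the drain rate I see on the idle cluster also applies while the job is running; both are my assumptions, not something I have measured.

```python
SECONDS_PER_HOUR = 3600


def drain_rate_per_second(blocks_per_hour: float) -> float:
    """Convert an hourly block-deletion rate into blocks per second."""
    return blocks_per_hour / SECONDS_PER_HOUR


def backlog_growth_per_second(deletes_per_second: float,
                              drain_per_second: float) -> float:
    """Net growth of the pending-deletion backlog, in blocks per second."""
    return deletes_per_second - drain_per_second


# Observed: pending blocks drop by about 1.2e6 per hour once the job is stopped.
observed_drain = drain_rate_per_second(1.2e6)

# Assumption: one block per file, so ~1000 file deletes/s is ~1000 blocks/s.
growth = backlog_growth_per_second(1000, observed_drain)

print(f"drain rate:     {observed_drain:.0f} blocks/s")
print(f"backlog growth: {growth:.0f} blocks/s")
```

Under those assumptions the cluster removes roughly a third of the blocks that the job marks for deletion each second, which would explain why the backlog never shrinks while the job is running.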
What I'd like to know is: which component is responsible for this slowness? The NameNode, the DataNodes, or the JournalNodes? Is this deletion rate to be expected, or is there some configuration I can tune to make it orders of magnitude faster?