Mismatch between content node storage ratio and server storage

Question

I am using Vespa in a Docker container with a single content node on an Ubuntu server. The total storage is:

[root@vespa-container /]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay         485G  118G  343G  26% /

Clearly, 26% is far below the default 80% (0.8) limit in the Vespa settings, but I still got a NO_SPACE error:

ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: memory on node 0 [vespa-container] (0.802 > 0.800))

How can I fix this? Thanks!

score 1 · Accepted Answer · answered Aug 19 '21 at 10:56


In this case you are limited by memory:

memory on node 0 [vespa-container] (0.802 > 0.800)
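To see the two numbers side by side, here is a minimal Python sketch that reports disk and memory utilization separately. It assumes a Linux host or container where `/proc/meminfo` is readable. The function names are made up for illustration, and Vespa's own memory accounting may differ from this rough estimate.

```python
import shutil


def disk_used_ratio(path="/"):
    """Fraction of the filesystem at `path` that is in use (roughly what df reports)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def memory_used_ratio(meminfo_text):
    """Fraction of RAM in use, estimated from the contents of /proc/meminfo.

    Assumption: "used" is taken as MemTotal - MemAvailable. This is only an
    approximation and not necessarily how Vespa computes its own ratio.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return (total - available) / total


def report(path="/"):
    """Print both ratios so disk and memory are not confused."""
    with open("/proc/meminfo") as f:
        mem = memory_used_ratio(f.read())
    print(f"disk used:   {disk_used_ratio(path):.3f}")
    print(f"memory used: {mem:.3f}")
```

Calling `report()` inside the container prints both ratios. The feed block in the error message refers to the memory figure, not the disk figure.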


Jo Kristian Bergum


But I saw only 26% of memory actually used when running `df -h .`. Why is there a mismatch between 0.26 and 0.802? – Kexin Wang Aug 19 '21 at 11:06
df -- display free _disk_ space. Use e.g. top to find memory use – Kristian Aune Aug 19 '21 at 12:46
Oh, sorry that I wrongly took "memory" here as disk storage. I found my RAM was indeed nearly full. – Kexin Wang Aug 20 '21 at 07:38
To reduce the RAM burden, I switched to bfloat16 and dropped the HNSW index. But I am wondering why uploading the embeddings (with vespa-feed-client.jar) is still very slow (e.g. 317.71 docs/sec). As I understand it, it should simply be loading the embeddings into memory, shouldn't it? – Kexin Wang Aug 20 '21 at 07:44
Using bfloat16 does reduce the memory footprint, but inserts into the HNSW graph become a bit slower because the bfloat16 type has no HW-accelerated distance computation, whereas regular float does. Ingestion speed depends on HW (number of vCPUs, disk speed) and the concurrency setting, and for HNSW indexing the HNSW parameters greatly impact performance. Note that even if attribute data is in memory, data also needs to be written to disk (transaction log) for durability in case of node failure (e.g. power outage). – Jo Kristian Bergum Aug 20 '21 at 11:12
See https://docs.vespa.ai/en/reference/services-content.html#sync-transactionlog and https://docs.vespa.ai/en/reference/services-content.html#feeding-concurrency – Jo Kristian Bergum Aug 20 '21 at 11:14
Thanks. But I mean, I am using no HNSW and no other ANN here. But the uploading speed is still very low (as slow as using HNSW). Do you think it is possible and why? And any way to make it faster? @JoKristianBergum – Kexin Wang Aug 25 '21 at 12:28
So what type of HW do you run this setup on? How do you feed? If you have content nodes on slow disk media such as EBS, I recommend looking at the sync-transactionlog setting described here: https://docs.vespa.ai/en/reference/services-content.html#sync-transactionlog – Jo Kristian Bergum Aug 25 '21 at 16:23
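As a rough illustration of the bfloat16 versus float trade-off discussed in the comments, the sketch below estimates raw tensor cell storage only. The document count and embedding dimension are hypothetical, and real memory use in Vespa will be higher because of per-document and index overhead that is not modelled here.

```python
# Bytes per tensor cell: float is 32-bit, bfloat16 is 16-bit.
BYTES_PER_CELL = {"float": 4, "bfloat16": 2}


def tensor_bytes(num_docs, dimensions, cell_type):
    """Raw bytes needed to hold one dense embedding per document."""
    return num_docs * dimensions * BYTES_PER_CELL[cell_type]


def to_gib(num_bytes):
    """Convert bytes to GiB for readability."""
    return num_bytes / 2**30


# Hypothetical corpus: 1,000,000 documents with 768-dimensional embeddings.
for cell_type in ("float", "bfloat16"):
    size = tensor_bytes(1_000_000, 768, cell_type)
    print(f"{cell_type:9s} {to_gib(size):.2f} GiB")
```

Halving the cell size halves this raw figure, which matches the reduced footprint reported above, but it says nothing about feed throughput.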
