When trying to insert an array of ~10k items (10810 to be exact) into my local Weaviate instance (running via Docker Compose), I ran into this error:

FetchError: request to http://localhost:8080/v1/batch/objects failed, reason: socket hang up
    at ClientRequest.<anonymous> (/Users/bram/Dropbox/PARA/Projects/weaviate-kindle/node_modules/node-fetch/index.js:133:11)
    at ClientRequest.emit (node:events:527:28)
    at Socket.socketOnEnd (node:_http_client:478:9)
    at Socket.emit (node:events:539:35)
    at endReadableNT (node:internal/streams/readable:1344:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'ECONNRESET',
  code: 'ECONNRESET'
}

However, some of the objects did upload successfully. When I ran the meta count query in the Weaviate console, I found 1233 objects (see image):

[Image: Weaviate GraphQL Aggregate query result]
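
(For reference, the meta count query looks roughly like this via the JS client; the class name "Clipping" is an assumption based on the import code below, so adjust it to your schema.)

// Rough sketch of the meta count query using the Weaviate JS client.
// The class name "Clipping" is an assumption; replace it with your own class.
async function countObjects() {
  const res = await client.graphql
    .aggregate()
    .withClassName('Clipping')
    .withFields('meta { count }')
    .do();

  console.log(JSON.stringify(res, null, 2));
}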

Here's the relevant batching code used to import the clippings:

async function importClippings() {
  // Get the data from the data.json file
  const data = await getJsonData();

  // Prepare a batcher
  let batcher = client.batch.objectsBatcher();
  let counter = 0;

  data.clippings.forEach((clipping) => {
    // Construct an object with a class, id, properties and vector
    const obj = generateClippingObject(clipping);

    // add the object to the batch queue
    batcher = batcher.withObject(obj);

    // When the batch counter reaches 20, push the objects to Weaviate
    if (counter++ == 20) {
      // flush the batch queue
      batcher
        .do()
        .then((res) => {
          console.log(res);
        })
        .catch((err) => {
          console.error(err);
        });

      // restart the batch queue
      counter = 0;
      batcher = client.batch.objectsBatcher();
    }
  });

  // Flush the remaining objects
  batcher
    .do()
    .then((res) => {
      console.log(res);
    })
    .catch((err) => {
      console.error(err);
    });
}

EDIT: This error surfaced in the Docker Compose logs as well:

weaviate-kindle-t2v-transformers-1  | INFO:     172.18.0.5:53942 - "POST /vectors/ HTTP/1.1" 200 OK

weaviate-kindle-weaviate-1          | {
  "description": "An I/O timeout occurs when the request takes longer than the specified server-side timeout.",
  "error": "write tcp 172.18.0.5:8080->172.18.0.1:61056: i/o timeout",
  "hint": "Either try increasing the server-side timeout using e.g. '--write-timeout 600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.",
  "level": "error",
  "method": "POST",
  "msg": "i/o timeout",
  "path": {"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},
  "time": "2022-11-03T05:33:30Z"
}
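
The hint above mentions raising the server-side timeout with a '--write-timeout' flag. A minimal sketch of how that could look in the docker-compose.yml, assuming the standard Weaviate compose layout with an explicit command: list (the service name and the 600s value are illustrative):

services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
      # illustrative: raise the server-side write timeout
      - --write-timeout
      - 600s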

1 Answer

The issue is visible in the Docker logs :)

"An I/O timeout occurs when the request takes longer than the specified server-side timeout."

This means that you send the request and that it takes Weaviate longer to complete it than the server-side timeout allows (the error only shows up because the client-side timeout is longer than the server-side one).

This usually has one of two causes:

  1. If you bring your own vectors (i.e., you're using Weaviate stand-alone without vectorization modules), the batch size is too high.
  2. If you use vectorization modules, the module itself is most likely the bottleneck.
  • For the first one, lowering the batch size fixes this (see the sketch after this list).
  • For the latter one, if you run the ML models yourself you really want to do this on a GPU because vectorization (i.e., ML-model inference) is practically undoable on a CPU.
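
A minimal sketch of what a smaller, strictly sequential import could look like, reusing the client, getJsonData() and generateClippingObject() helpers from the question (the batch size of 10 is just an illustrative starting point):

// Sketch only: sequential import with a smaller batch size, reusing the
// client, getJsonData() and generateClippingObject() helpers from the question.
// Awaiting each batch before queueing the next keeps a single request in flight.
async function importClippingsSequentially(batchSize = 10) {
  const data = await getJsonData();

  let batcher = client.batch.objectsBatcher();
  let counter = 0;

  for (const clipping of data.clippings) {
    batcher = batcher.withObject(generateClippingObject(clipping));
    counter++;

    if (counter === batchSize) {
      // Wait for this batch to finish before starting the next one
      console.log(await batcher.do());
      counter = 0;
      batcher = client.batch.objectsBatcher();
    }
  }

  // Flush any remaining objects
  if (counter > 0) {
    console.log(await batcher.do());
  }
}

This is slower overall, but each individual request is much cheaper, which is the point of the hint about reducing batch size.
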
Bob van Luijt
  • "> For the latter one, if you run the ML models yourself you really want to do this on a GPU because vectorization (i.e., ML-model inference) is practically undoable on a CPU." Is there any reason as to why some items vectorized successfully and others failed to? I'm using `text2vec-transformers` locally on a Apple Silicon M1 MBP. It doesnt have the beefiest GPU, but it's no slouch! XD – Bram Adams Nov 03 '22 at 16:51
  • Is there any way to trade time for efficiency? Meaning, can I process text2vec for all 10,000 items more slowly than I'm currently batching, to guarantee they all make it through the endpoint? – Bram Adams Nov 03 '22 at 16:53
  • Hi @BramAdams (sorry for the late response). We will soon have the ONNX update ready, which might help. Another way is to use the [text2vec-huggingface](https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/text2vec-huggingface.html) module _only_ for importing. This creates the vectors needed. After this, you can create [a backup](https://weaviate.io/developers/weaviate/current/configuration/backups.html) and store them. – Bob van Luijt Nov 15 '22 at 10:58