When trying to insert an array of ~10k items (10810 to be exact) into my local Weaviate instance (running via Docker Compose), I ran into this error:

FetchError: request to http://localhost:8080/v1/batch/objects failed, reason: socket hang up
    at ClientRequest.<anonymous> (/Users/bram/Dropbox/PARA/Projects/weaviate-kindle/node_modules/node-fetch/index.js:133:11)
    at ClientRequest.emit (node:events:527:28)
    at Socket.socketOnEnd (node:_http_client:478:9)
    at Socket.emit (node:events:539:35)
    at endReadableNT (node:internal/streams/readable:1344:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'ECONNRESET',
  code: 'ECONNRESET'
}

However, some of the objects did upload successfully. When I ran the meta count query in the Weaviate console, I found 1233 objects (see image):

[Image: Weaviate GraphQL Aggregate query result]
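
(For reference, the meta count query looks roughly like this via the JS client; the class name "Clipping" is an assumption based on the import code below, so adjust it to your schema.)

// Rough sketch of the meta count query using the Weaviate JS client.
// The class name "Clipping" is an assumption; replace it with your own class.
async function countObjects() {
  const res = await client.graphql
    .aggregate()
    .withClassName('Clipping')
    .withFields('meta { count }')
    .do();

  console.log(JSON.stringify(res, null, 2));
}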

Here's the relevant batching code used to import the clippings:

async function importClippings() {
  // Get the data from the data.json file
  const data = await getJsonData();

  // Prepare a batcher
  let batcher = client.batch.objectsBatcher();
  let counter = 0;

  data.clippings.forEach((clipping) => {
    // Construct an object with a class, id, properties and vector
    const obj = generateClippingObject(clipping);

    // add the object to the batch queue
    batcher = batcher.withObject(obj);

    // When the batch counter reaches 20, push the objects to Weaviate
    if (counter++ == 20) {
      // flush the batch queue
      batcher
        .do()
        .then((res) => {
          console.log(res);
        })
        .catch((err) => {
          console.error(err);
        });

      // restart the batch queue
      counter = 0;
      batcher = client.batch.objectsBatcher();
    }
  });

  // Flush the remaining objects
  batcher
    .do()
    .then((res) => {
      console.log(res);
    })
    .catch((err) => {
      console.error(err);
    });
}

EDIT: This error surfaced in the Docker Compose logs as well:

weaviate-kindle-t2v-transformers-1  | INFO:     172.18.0.5:53942 - "POST /vectors/ HTTP/1.1" 200 OK

weaviate-kindle-weaviate-1          | {
  "description": "An I/O timeout occurs when the request takes longer than the specified server-side timeout.",
  "error": "write tcp 172.18.0.5:8080->172.18.0.1:61056: i/o timeout",
  "hint": "Either try increasing the server-side timeout using e.g. '--write-timeout 600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.",
  "level": "error",
  "method": "POST",
  "msg": "i/o timeout",
  "path": {"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},
  "time": "2022-11-03T05:33:30Z"
}
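
The hint above mentions raising the server-side timeout with a '--write-timeout' flag. A minimal sketch of how that could look in the docker-compose.yml, assuming the standard Weaviate compose layout with an explicit command: list (the service name and the 600s value are illustrative):

services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
      # illustrative: raise the server-side write timeout
      - --write-timeout
      - 600s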

1 Answer

The issue is visible in the Docker logs :)

"An I/O timeout occurs when the request takes longer than the specified server-side timeout."

This means that you send the request and that it takes Weaviate longer to complete it than the server-side timeout allows (the error only shows up because the client-side timeout is longer than the server-side one).

This usually has one of two causes:

  1. If you bring your own vectors (i.e., you're using Weaviate stand-alone without vectorization modules), the batch size is too high.
  2. If you use vectorization modules, the module itself is most likely the bottleneck.
  • For the first one, lowering the batch size fixes this (see the sketch after this list).
  • For the latter one, if you run the ML models yourself you really want to do this on a GPU because vectorization (i.e., ML-model inference) is practically undoable on a CPU.
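
A minimal sketch of what a smaller, strictly sequential import could look like, reusing the client, getJsonData() and generateClippingObject() helpers from the question (the batch size of 10 is just an illustrative starting point):

// Sketch only: sequential import with a smaller batch size, reusing the
// client, getJsonData() and generateClippingObject() helpers from the question.
// Awaiting each batch before queueing the next keeps a single request in flight.
async function importClippingsSequentially(batchSize = 10) {
  const data = await getJsonData();

  let batcher = client.batch.objectsBatcher();
  let counter = 0;

  for (const clipping of data.clippings) {
    batcher = batcher.withObject(generateClippingObject(clipping));
    counter++;

    if (counter === batchSize) {
      // Wait for this batch to finish before starting the next one
      console.log(await batcher.do());
      counter = 0;
      batcher = client.batch.objectsBatcher();
    }
  }

  // Flush any remaining objects
  if (counter > 0) {
    console.log(await batcher.do());
  }
}

This is slower overall, but each individual request is much cheaper, which is the point of the hint about reducing batch size.
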
Bob van Luijt
  • "> For the latter one, if you run the ML models yourself you really want to do this on a GPU because vectorization (i.e., ML-model inference) is practically undoable on a CPU." Is there any reason as to why some items vectorized successfully and others failed to? I'm using `text2vec-transformers` locally on a Apple Silicon M1 MBP. It doesnt have the beefiest GPU, but it's no slouch! XD – Bram Adams Nov 03 '22 at 16:51
  • Is there any way to trade time for efficiency? Meaning, can I process text2vec for all 10,000 items more slowly than I'm currently batching, to guarantee they all make it through the endpoint? – Bram Adams Nov 03 '22 at 16:53
  • Hi @BramAdams (sorry for the late response). We will soon have the ONNX update ready, which might help. Another way is to use the [text2vec-huggingface](https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/text2vec-huggingface.html) module _only_ for importing. This creates the vectors needed. After this, you can create [a backup](https://weaviate.io/developers/weaviate/current/configuration/backups.html) and store them. – Bob van Luijt Nov 15 '22 at 10:58