
I'm trying to add records in a batch, but after I add my objects to the batch, I always get a JSONDecodeError at the point where (I assume) the batch is sent to my Weaviate class.

client.batch.configure(batch_size=100, dynamic=False, timeout_retries=3,
                       callback=weaviate.util.check_batch_result,
                       consistency_level=weaviate.data.replication.ConsistencyLevel.ALL)
with client.batch as batch:
    for el_idx, el in enumerate(send_to_weaviate):
        batch.add_data_object(el, "MyClass")

Records look like this:

send_to_weaviate[0]
{'my_id': '3c2466b7e7da201c66f42ea362874343',
 'post_timestamp': ['1644883202000', '1644883242000'],
 'dist_metric': [0, 0]}

Schema looks like this:

class_obj = {
    "class": "MyClass",
    "description": "Description",
    "properties": [{
        "dataType": ["text"],
        "description": "ID",
        "name": "my_id"
    }, {
        "dataType": ["text[]"],
        "description": "Timestamps",
        "name": "post_timestamp"
    }, {
        "dataType": ["int[]"],
        "description": "Description",
        "name": "dist_metric"
    }]
}

Error message:

File ~/opt/anaconda3/envs/scripts/lib/python3.9/site-packages/weaviate/batch/crud_batch.py:644, in Batch._create_data(self, data_type, batch_request)
    642     connection_count += 1
    643 else:
--> 644     response_json = response.json()
    645     if (
    646         self._weaviate_error_retry is not None
    647         and batch_error_count < self._weaviate_error_retry.number_retries
    648     ):
    649         batch_to_retry, response_json_successful = self._retry_on_error(
    650             response_json, data_type
    651         )

File ~/opt/anaconda3/envs/scripts/lib/python3.9/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
    971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

All of the Googling I've done indicates that a JSONDecodeError occurs when something isn't valid JSON. Weaviate supports lists of common data types as properties (e.g., the int[] and text[] data types), so I initially suspected my list-valued properties were the problem. However, when I JSON-serialize my send_to_weaviate variable myself, I have no problems, so this may not be the true cause?

import json
json.loads(json.dumps(send_to_weaviate))  # No errors
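
Looking at the traceback more closely, the error is raised by response.json() while parsing the server's reply, not while serializing my payload; an empty (non-JSON) response body reproduces the exact same message:

import json
json.loads("")  # JSONDecodeError: Expecting value: line 1 column 1 (char 0)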

Can anyone help me figure out why my batch fails to add to Weaviate?

EDIT: Here is a small example that reproduces my issue. I'm using weaviate-client 3.22.1 in a Python 3.9 conda environment.

import weaviate
client = weaviate.Client(URL_TO_WEAVIATE_ENDPOINT)
class_obj = {
    "class": "MyClass",
    "description": "Description",
    "properties": [{
        "dataType": ["text"],
        "description": "ID",
        "name": "my_id"
    }, {
        "dataType": ["text[]"],
        "description": "Timestamps",
        "name": "post_timestamp"
    }, {
        "dataType": ["int[]"],
        "description": "Description",
        "name": "dist_metric"
    }]
}
client.schema.create_class(class_obj)
client.batch.configure(batch_size=100, dynamic=False, timeout_retries=3)
# Just try to add 1 doc
doc = {'my_id': '3c2466b7e7da201c66f42ea362874343','post_timestamp': ['1644883202000', '1644883242000'], 'dist_metric': [0, 0]}
with client.batch() as batch:
    batch.add_data_object(doc, "MyClass")
lrthistlethwaite

2 Answers


Can anyone help me figure out why my batch fails to add to Weaviate?

You can start by using manual batching and printing el_idx before creating each object in Weaviate, to make sure you do not have any malformed elements.
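
For example, something along these lines should work with the v3 client (reusing the client and send_to_weaviate names from your question; batch_size=None disables automatic flushing, so each chunk is sent explicitly):

client.batch.configure(batch_size=None)  # manual mode: nothing is sent automatically
for el_idx, el in enumerate(send_to_weaviate):
    print(el_idx, el)  # inspect each element before it is queued
    client.batch.add_data_object(el, "MyClass")
    if (el_idx + 1) % 10 == 0:
        results = client.batch.create_objects()  # send this chunk now
        print(results)  # per-object status, including any errors
client.batch.flush()  # send whatever is still queued

With manual flushing, create_objects() returns the per-object results, so a malformed element shows up next to its index instead of surfacing only as a JSONDecodeError at the end.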

If all the elements are what you expected, then you'll need to provide a minimal reproducible example for further assistance.

hsm207
  • Just added a small reproducible example to the bottom of the post. Looking at your manual batching now. So far batch_size=None has not changed the behavior – still getting that JSONDecodeError for line 1 column 1 (char 0). When I print el_idx, all objects seem to add to the batch just fine; it's when the context manager exits that the error is thrown. For example, I print(batch.shape) after each object is added, and the tuple reports that all elements of send_to_weaviate were added to the batch. The error occurs upon exiting the context manager. – lrthistlethwaite Jul 26 '23 at 14:04
  • So I just figured out that if I remove the client.batch.configure() command, it works! Any idea why the parameters I set would cause this behavior? – lrthistlethwaite Jul 26 '23 at 15:40
  • I ran your example using Python 3.11.4, weaviate-client 3.22.1 and weaviate 1.20.3. I did not get any errors. – hsm207 Jul 26 '23 at 16:28
  • Thanks for taking the time to test it out on your machine! I found that lowering the batch_size to 10 seemed to help, but I don't fully understand why. My server version is a bit older though (1.17.2), so that's interesting. – lrthistlethwaite Jul 26 '23 at 20:01
  • What is stopping you from upgrading the server to the latest version? It could be the case that your issue is due to a bug that was fixed in later versions, so it is beneficial to always be on the latest stable version. – hsm207 Jul 26 '23 at 23:41
  • It's a different part of the company that controls the server side, but I'll ask them about this in particular and see if we can't bump up to the most recent release. It very well may be the case! – lrthistlethwaite Aug 02 '23 at 23:10

When I set batch_size=10, things started working again. I think because my records were quite character-rich, batches needed to be smaller than the batch_size=100 I had configured. You can set batch_size using client.batch.configure(batch_size=10) or in the context manager itself with client.batch(batch_size=10) as batch: (see the sketch below).
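
For reference, both ways of setting it look roughly like this (reusing the client and send_to_weaviate objects from the question):

# Option 1: configure the shared batch object with a smaller batch size
client.batch.configure(batch_size=10, dynamic=False, timeout_retries=3)
with client.batch as batch:
    for el in send_to_weaviate:
        batch.add_data_object(el, "MyClass")

# Option 2: pass batch_size directly to the context manager
with client.batch(batch_size=10) as batch:
    for el in send_to_weaviate:
        batch.add_data_object(el, "MyClass")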

Also, weirdly, deleting the "client" object and reestablishing the connection seemed to help, but I can't understand why.

lrthistlethwaite