
I would like to investigate the possibility of enriching data ingested into Splunk by using Weaviate's Automatic Classification in the streaming ingestion pipeline.

This is only viable if the Automatic Classification process has a minor impact on the ingestion rate.

Is there any benchmarking data available for the Automatic Classification process (e.g. for varying text sizes, schema complexity, etc.)?
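If no published numbers exist, one option is to measure the classification step yourself for texts of different sizes. The following is a minimal, generic timing harness, not a Weaviate API: `classify` is a hypothetical callable that you would replace with whatever actually performs the classification, and the demo uses a dummy classifier only to show the output shape.

```python
import math
import statistics
import time
from typing import Callable, Iterable


def benchmark(classify: Callable[[str], object], texts: Iterable[str]) -> dict:
    """Call `classify` once per text and summarize latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for text in texts:
        t0 = time.perf_counter()
        classify(text)  # hypothetical: replace with the real classification call
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    if not latencies:
        raise ValueError("no texts to benchmark")
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]
    return {
        "items": len(latencies),
        "mean_s": statistics.mean(latencies),
        "p95_s": p95,
        "items_per_s": len(latencies) / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Dummy classifier (word count) standing in for the real call
    dummy = lambda text: len(text.split())
    for words in (10, 100, 1000):
        sample = ["word " * words] * 200
        print(words, benchmark(dummy, sample))
```

Running this per text size (and per schema variant) against a real classification call gives mean latency, tail latency and items per second, which can then be compared with the pipeline's required ingestion rate.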

  • Two questions: (1) By "automatic classification", do you mean the [Contextual Classification](https://www.semi.technology/documentation/weaviate/current/classification/contextual-classification.html) as opposed to the [kNN-Classification](https://www.semi.technology/documentation/weaviate/current/classification/knn-classification.html), or classifications in general? (2) By benchmarks, do you mean speed benchmarks such as "classifications per second" or "mean time per item", or are you interested in quality benchmarks (e.g. 80% correctly classified)? – etiennedi Jul 09 '20 at 14:25
  • (1) I believe kNN-Classification fits our purposes best: classifying new data based on a custom training set. (2) I mean a speed benchmark. Is it correct that the following steps are needed to classify new data? API call 1: upload to Weaviate (create the "Thing"); API call 2: run the automatic classification for the newly created "Thing". – Aniel Parbhoe Jul 13 '20 at 16:42
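The two calls described in the last comment can be sketched as follows, using only the standard library. This is a hedged sketch, not a verified client: the endpoint paths (`/v1/things`, `/v1/classifications`), the request field names, the `"running"` status value and the local `BASE_URL` are assumptions based on pre-1.0 Weaviate and changed between versions, so check them against the API reference of the version you run. The class `LogEvent` and the properties `content` and `ofCategory` are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: local Weaviate instance


def thing_payload(class_name, properties):
    # Assumption: pre-1.0 Weaviate nested property values under "schema"
    return {"class": class_name, "schema": properties}


def knn_classification_payload(class_name, classify_props, based_on_props, k=3):
    # Assumption: field names as in the kNN classification docs of that era
    return {
        "class": class_name,
        "type": "knn",
        "k": k,
        "classifyProperties": list(classify_props),
        "basedOnProperties": list(based_on_props),
    }


def post_json(path, body):
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_json(path):
    with urllib.request.urlopen(BASE_URL + path) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ingest_and_classify(text, poll_interval=0.1, timeout=30.0):
    # API call 1: create the Thing (hypothetical class and property names)
    post_json("/things", thing_payload("LogEvent", {"content": text}))
    # API call 2: start a kNN classification job for the class
    job = post_json(
        "/classifications",
        knn_classification_payload("LogEvent", ["ofCategory"], ["content"]),
    )
    # Assumption: the job runs asynchronously, so poll until it leaves "running"
    deadline = time.monotonic() + timeout
    while True:
        status = get_json("/classifications/" + job["id"])
        if status.get("status") != "running":
            return status
        if time.monotonic() > deadline:
            raise TimeoutError("classification did not finish in time")
        time.sleep(poll_interval)
```

Starting one classification job per ingested event, as here, is the most literal reading of the two steps; `ingest_and_classify` could also be passed as the callable to a timing harness to measure the end-to-end cost per event. Grouping several events per job might amortize per-job overhead, but whether that fits depends on the pipeline's latency budget.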
