
I am using chromadb version 0.4.5. I can store my chromadb vector store locally. However, when I try to store it in DBFS I get "OperationalError: disk I/O error" just by running:

import chromadb

vector_db_path = '/dbfs/FileStore/HuggingFace/data/demo_langchain/test_vector_db/'
client = chromadb.PersistentClient(path=vector_db_path)

I found this post that deals with the same error, and the accepted solution there was:

conn = sqlite3.connect('data.sqlite')
curr = conn.execute('PRAGMA locking_mode = EXCLUSIVE')

Ok, I tried:

import sqlite3
import chromadb

conn = sqlite3.connect("/dbfs/FileStore/HuggingFace/data/demo_langchain/test_vector_db/chroma.sqlite3")
curr = conn.execute('PRAGMA locking_mode = EXCLUSIVE')
client = chromadb.PersistentClient(path=vector_db_path)

OperationalError: disk I/O error

My guess is that this is a new issue with chromadb, since they recently switched from duckdb to sqlite.

David Makovoz

1 Answer


As it turns out I was using the wrong format for the path.

Databricks offers two path formats:

Spark API Format: dbfs:/FileStore/demo_langchain/hf_embed

File API Format: /dbfs/FileStore/demo_langchain/hf_embed
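
Both forms point at the same DBFS location and differ only in prefix; the hypothetical helper below (my own illustration, not a Databricks utility) just makes that relationship explicit.

# Hypothetical helper: the two formats differ only in their prefix.
def to_spark_api_format(path: str) -> str:
    # '/dbfs/FileStore/x' (File API) -> 'dbfs:/FileStore/x' (Spark API)
    if path.startswith('/dbfs/'):
        return 'dbfs:' + path[len('/dbfs'):]
    return path

print(to_spark_api_format('/dbfs/FileStore/demo_langchain/hf_embed'))
# dbfs:/FileStore/demo_langchain/hf_embed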

Normally both work, but for chromadb only the first one works:

import chromadb

vector_db_path = 'dbfs:/FileStore/HuggingFace/data/demo_langchain/test_vector_db/'
client = chromadb.PersistentClient(path=vector_db_path)

No need to connect to sqlite directly.
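
For completeness, here is a minimal sketch of using the client once it is created with the dbfs:/ path; the collection name and documents are illustrative assumptions, not part of the original setup:

import chromadb

vector_db_path = 'dbfs:/FileStore/HuggingFace/data/demo_langchain/test_vector_db/'
client = chromadb.PersistentClient(path=vector_db_path)

# "demo_docs" and the documents below are illustrative only.
collection = client.get_or_create_collection(name="demo_docs")
collection.add(
    ids=["doc1", "doc2"],
    documents=["first example document", "second example document"],
)
results = collection.query(query_texts=["example document"], n_results=1)
print(results["documents"])

Reopening a PersistentClient against the same path later should load the stored collection without re-adding the documents.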

David Makovoz