This is my scenario:

  1. The client has an Azure SQL database with a profiles table with demographic information.
  2. We created an Azure Cognitive Search service and indexed that database. We concatenated all fields into a single field called content because, according to the documentation, everything needs to be in one field: https://python.langchain.com/docs/modules/data_connection/retrievers/integrations/azure_cognitive_search

Now we are creating a chatbot with LangChain where we can ask questions like: Who is John Smith? How old is Jane Smith? Who likes gardening?

The approach I found is described here: https://shweta-lodha.medium.com/integrating-azure-cognitive-search-with-azure-openai-and-langchain-51280d1026f2

Basically, Cognitive Search is queried first and returns some documents. Those documents are then stored as vectors in ChromaDB. Finally, ChromaDB is queried and the results are returned in plain English using LangChain and OpenAI.

However, ChromaDB is very slow: this step takes about 50 seconds.

So I wanted to try Weaviate, but then I get very weird errors like:

[ERROR] Batch ConnectionError Exception occurred! Retrying in 2s. [1/3]
{'error': [{'message': "'@search.score' is not a valid property name. Property names in Weaviate are restricted to valid GraphQL names, which must be “/[_A-Za-z][_0-9A-Za-z]*/”., no such prop with name '@search.score' found in class 'LangChain_df32d6b6d10c4bb895db75f88aaabd75' in the schema. Check your schema files for which properties in this class are available"}]}

My code looks like this:

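# Import paths as in the LangChain releases available at the time of writing.
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.retrievers import AzureCognitiveSearchRetriever
from langchain.vectorstores import Weaviate

# timer, get_text and get_relevant_documents are my own helpers (not shown).
# The upper-case names are configuration constants defined elsewhere.

@timer
def from_documentsWeaviate(docs, embeddings):
    return Weaviate.from_documents(docs, embeddings, weaviate_url=WEAVIATE_URL, by_text=False)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
embeddings = OpenAIEmbeddings(deployment=OPENAI_EMBEDDING_DEPLOYMENT_NAME, model=OPENAI_EMBEDDING_MODEL_NAME, chunk_size=1)
user_input = get_text()
retriever = AzureCognitiveSearchRetriever(content_key="content")

llm = AzureChatOpenAI(
    openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
    openai_api_version=OPENAI_API_VERSION,
    deployment_name=OPENAI_DEPLOYMENT_NAME,
    openai_api_key=OPENAI_API_KEY,
    openai_api_type=OPENAI_API_TYPE,
    model_name=OPENAI_MODEL_NAME,
    temperature=0)

# Query Cognitive Search first, then load the returned documents into the vector store.
docs = get_relevant_documents(retriever, user_input)
#vectorstore = from_documentsChromaDb(docs=docs, embedding=embeddings)
vectorstore = from_documentsWeaviate(docs, embeddings)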
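
For reference, the timer decorator is not shown above. A minimal sketch of what such a decorator might look like, assuming it only prints the elapsed wall-clock time of each call (slow_add is a made-up function used purely for illustration):

```python
import functools
import time


def timer(func):
    """Print how long each call to func takes and return its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed as well.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.2f}s")
    return wrapper


@timer
def slow_add(a, b):
    """Hypothetical stand-in for a slow step such as building a vector store."""
    time.sleep(0.01)
    return a + b


result = slow_add(1, 2)
```

Wrapping each stage separately (the search query, the embedding and upload, the final question) shows which one accounts for the 50 seconds.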

I wonder if I should first index all rows from the table and avoid the Cognitive Search part?

Luis Valencia
  • Hello Luis, as suggested below, the error is caused by the property name @search.score being invalid. Since you are already using Cognitive Search, you could also try its vector search, so you don't need a separate vector store. Here is the link: https://github.com/Azure/cognitive-search-vector-pr/blob/main/docs/vector-search-overview.md. Note that this is in private preview. The public preview will come out next week and the link will change. I hope it helps. – Gia Mondragon - MSFT Jun 22 '23 at 13:29

1 Answer

but then I get very weird errors like:

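The error means that one of your properties has an invalid name. Weaviate turns each key in a document's metadata into a property, and the documents returned by Cognitive Search most likely carry the search score under the key @search.score. That name is invalid because it contains "@" and ".", so it does not comply with this regex: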

/[_A-Za-z][_0-9A-Za-z]*/
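
One way to get past the error is to rename the offending metadata keys before the documents reach Weaviate. The following is only a sketch, assuming the invalid names come from the document metadata; sanitize_key and sanitize_metadata are hypothetical helper names, not part of LangChain or Weaviate:

```python
import re

# The pattern from the error message; a name must match it in full.
VALID_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def sanitize_key(name: str) -> str:
    """Return a variant of name that is a valid GraphQL / Weaviate property name."""
    # Replace every character outside the allowed set with an underscore.
    cleaned = re.sub(r"[^_0-9A-Za-z]", "_", name)
    # A name may not be empty and may not start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def sanitize_metadata(metadata: dict) -> dict:
    """Return a copy of metadata with every key made valid.

    Note: two different keys can map to the same cleaned name,
    in which case the later one wins.
    """
    return {sanitize_key(key): value for key, value in metadata.items()}


example = sanitize_metadata({"@search.score": 1.5, "id": "7"})
```

With LangChain documents this could be applied as doc.metadata = sanitize_metadata(doc.metadata) for each document before calling Weaviate.from_documents. Dropping the score key altogether would also work if you don't need it.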

I wonder if I should first index all rows from the table and avoid the Cognitive Search part?

In my opinion, the Azure Cognitive Search part is overkill for this use case and should be replaced with a pipeline that takes the rows from Azure SQL, combines each row into a single text field and uploads it to Weaviate. The rows are then embedded once, up front, instead of on every question.
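
A rough sketch of the "combine into a single field" step of such a pipeline. The column names and the sample rows are made up for illustration, and reading from Azure SQL and uploading to Weaviate are left out:

```python
from typing import Any, Dict, List, Tuple


def row_to_text(row: Dict[str, Any]) -> str:
    """Combine all non-null columns of one row into a single text field."""
    return "; ".join(f"{col}: {val}" for col, val in row.items() if val is not None)


def rows_to_texts_and_metadatas(
    rows: List[Dict[str, Any]], id_column: str = "id"
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Build the text to embed plus minimal metadata (only the row id) per row."""
    texts = [row_to_text(row) for row in rows]
    # Keep metadata keys simple so that they are valid Weaviate property names.
    metadatas = [{id_column: row[id_column]} for row in rows]
    return texts, metadatas


# Made-up sample rows standing in for the profiles table.
rows = [
    {"id": 1, "name": "John Smith", "age": 42, "hobbies": "gardening"},
    {"id": 2, "name": "Jane Smith", "age": 37, "hobbies": None},
]

texts, metadatas = rows_to_texts_and_metadatas(rows)

# Upload step (not run here, and an assumption about your setup): with LangChain
# this would be something along the lines of
# Weaviate.from_texts(texts, embeddings, metadatas=metadatas, weaviate_url=WEAVIATE_URL)
```

This would run once as an indexing job, and again whenever the table changes. The chatbot then only queries the existing Weaviate index at question time.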

hsm207