I'm building an AI app with LangChain that loads a private JSON file and uses a text splitter, embeddings, a vector store, and a retriever to run a Q&A session. However, when I ask about the items in a JSON array, the LLM's answer contains only a subset of the array's items. Here's the JSON data:

{"Skill" : {
    "Computer Languages" : ["Java", "Python", "C#", "SQL", "JavaScript", "Typescript", "HTML/CSS", "MATLAB"]
  }
}

Here's the Python code:

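import json

from langchain.chains import RetrievalQA
from langchain.docstore.document import Document
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

def read_json(file_path):
    # Load the file and re-serialize it as a single JSON string
    with open(file_path, "r") as f:
        json_object = json.load(f)
        json_string = json.dumps(json_object)
    return json_string

# Wrap the whole JSON string in one Document
jsonFile = read_json(json_path)
jsonData = []
document = Document(page_content=jsonFile)
jsonData.append(document)

# Split, embed, and index the document
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
texts = text_splitter.split_documents(jsonData)
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
vectordb = Chroma.from_documents(texts, embedding=embeddings)

# Build the retrieval QA chain and ask a question
hf = HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.8, "max_length": 16192})
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
qa = RetrievalQA.from_chain_type(llm=hf, chain_type="stuff", retriever=vectordb.as_retriever(search_kwargs={"k": 1}))
question = input("Enter your question: ")
response = qa(question)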

When the question is "Give me all the computer languages", the answer is "Java", "Python", "C#", "SQL", "JavaScript".

So "Typescript", "HTML/CSS", "MATLAB" are missing from the answer.
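To make the gap explicit, here is a minimal sketch that compares the reported answer with the source array. It uses only the standard library; the helper name `missing_items` is hypothetical, and it assumes the answer comes back as a comma-separated string like the one quoted above.

```python
import json

# Source data exactly as shown in the question.
source = json.loads(
    '{"Skill": {"Computer Languages": ["Java", "Python", "C#", "SQL", '
    '"JavaScript", "Typescript", "HTML/CSS", "MATLAB"]}}'
)
expected = source["Skill"]["Computer Languages"]

# The model's answer as reported above (assumed to be a plain string).
answer = '"Java", "Python", "C#", "SQL", "JavaScript"'


def missing_items(expected_items, answer_text):
    """Return the expected items absent from a comma-separated answer, in source order."""
    # Compare whole tokens rather than substrings, so "Java" is not
    # accidentally matched by "JavaScript".
    returned = {part.strip().strip('"') for part in answer_text.split(",")}
    return [item for item in expected_items if item not in returned]


missing = missing_items(expected, answer)
print(missing)
```

The same check can be rerun after each change to the chain to see whether the set of missing items shrinks.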

What could be wrong in the code?
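One way to narrow this down is to rule out the splitting step first. The sketch below is a standard-library check, not part of the LangChain pipeline. It assumes the file contains only the snippet shown above (the real private file may be larger) and that a text no longer than `chunk_size` is kept as a single chunk by the splitter.

```python
import json

CHUNK_SIZE = 500  # same value passed to RecursiveCharacterTextSplitter above

# The JSON snippet from the question; the real file may hold more data.
data = {
    "Skill": {
        "Computer Languages": [
            "Java", "Python", "C#", "SQL",
            "JavaScript", "Typescript", "HTML/CSS", "MATLAB",
        ]
    }
}
languages = data["Skill"]["Computer Languages"]

# Same serialization that read_json() performs.
json_string = json.dumps(data)

# If the whole string fits in one chunk, the splitter has nothing to cut.
fits_in_one_chunk = len(json_string) <= CHUNK_SIZE

# Every language should appear, quoted, in the serialized text.
all_present = all(json.dumps(lang) in json_string for lang in languages)

print(len(json_string), fits_in_one_chunk, all_present)
```

If both checks pass for the real file as well, the retrieved context probably already contains all eight items, and the truncation is more likely to happen at the generation step than during splitting or retrieval. That is a hypothesis to test, not a confirmed diagnosis.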

ArtC