Either Python or JS LangChain would be fine. I am trying to wrap my head around how I could create a fantasy-language-learning AI tutor. This question focuses only on the first part: loading dictionary definitions into long-term memory.
I have a list of ~4,000 words in a spreadsheet: one column holds the fantasy-language word, and another holds a one- or two-word English definition. How can I "teach" an AI the meaning of each word, extrapolating my short definitions into a general sense of the word?
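For concreteness, the spreadsheet exported to CSV would look something like this (the rows below are made-up placeholders):

word,definition
kasha,to sing
velan,mountain spirit
brin,river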
What are the key pieces I need to focus on for loading such structured data (the dictionary definitions, as CSV or JSON; the format doesn't matter to me) into a Q&A-style chatbot? For simplicity's sake, I would like to be able to ask the AI "what is the definition of X in the fantasy language called Foo" and have it give me a definition that is slightly different each time (written in natural English) while still conveying the general meaning. X will be a fantasy word, and the AI should respond with a definition in English.
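To make the target behavior concrete, the prompt I imagine eventually sending to the LLM would be built from a template roughly like this (just a sketch; the template wording and the "kasha" example are my own guesses):

import { PromptTemplate } from "langchain/prompts";

// Template that asks the model to paraphrase a short gloss into natural English.
const definitionPrompt = PromptTemplate.fromTemplate(
  `You are a tutor for the fantasy language Foo.
The Foo word "{word}" has the short English gloss "{gloss}".
Explain the meaning of "{word}" in one or two natural English sentences,
phrased differently each time but staying faithful to the gloss.`
);

// format() is async in LangChain JS and returns the filled-in prompt string.
const prompt = await definitionPrompt.format({ word: "kasha", gloss: "to sing" });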
I've been reading through various resources to get up to speed with building AI chatbots, but I am still missing a sense of how to load structured data into the chatbot system. Say I use Pinecone for long-term memory, OpenAI for the LLM, and LangChain as the main API.
The best similar example I have seen is basically this:
import { PineconeClient } from "@pinecone-database/pinecone";
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import * as dotenv from "dotenv";
import { createPineconeIndex } from "./1-createPineconeIndex.js";
import { updatePinecone } from "./2-updatePinecone.js";
import { queryPineconeVectorStoreAndQueryLLM } from "./3-queryPineconeAndQueryGPT.js";

// Load environment variables
dotenv.config();

// Set up DirectoryLoader to load documents from the ./documents directory
const loader = new DirectoryLoader("./documents", {
  ".txt": (path) => new TextLoader(path),
  ".pdf": (path) => new PDFLoader(path),
});
const docs = await loader.load();

// Set up variables for the question and index settings
const question = "Who is Mr. Gatsby?";
const indexName = "your-pinecone-index-name";
const vectorDimension = 1536;

// Initialize Pinecone client with API key and environment
const client = new PineconeClient();
await client.init({
  apiKey: process.env.PINECONE_API_KEY,
  environment: process.env.PINECONE_ENVIRONMENT,
});

// Check if the Pinecone index exists and create it if necessary
await createPineconeIndex(client, indexName, vectorDimension);
// Update the Pinecone vector store with document embeddings
await updatePinecone(client, indexName, docs);
// Query the Pinecone vector store and the GPT model for an answer
await queryPineconeVectorStoreAndQueryLLM(client, indexName, question);
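If I went this route, my rough idea for adapting it to the dictionary would be to turn each CSV row into its own small Document before the updatePinecone step, something like this (a sketch; the dictionary.csv path, the word/definition column names, and the choice to keep the word in metadata are all my own assumptions):

import * as fs from "fs";
import { parse } from "csv-parse/sync";
import { Document } from "langchain/document";

// Read the exported spreadsheet; assumes header columns named "word" and "definition".
const rows = parse(fs.readFileSync("./dictionary.csv", "utf8"), { columns: true });

// One tiny Document per dictionary entry, with the Foo word kept in metadata
// so retrieval could match on it later.
const dictionaryDocs = rows.map(
  (row) =>
    new Document({
      pageContent: `${row.word}: ${row.definition}`,
      metadata: { word: row.word },
    })
);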
Would I need to do something with a LangChain JSON file loader (if my dictionary defs were in JSON)? Or should I just send a bunch of messages to my AI using system prompts, where each message defines a term from the dictionary? I am not really sure how I am supposed to bootstrap the system with the structured data I have, or how the various pieces come into play.
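The system-prompt version I am imagining looks roughly like the sketch below (depending on the LangChain version, the message classes may be named SystemChatMessage/HumanChatMessage instead), though I suspect stuffing ~4,000 definitions into the context window won't scale:

import { ChatOpenAI } from "langchain/chat_models/openai";
import { SystemMessage, HumanMessage } from "langchain/schema";

// Naive version: stuff (some of) the dictionary into the system prompt.
// With ~4,000 entries this would likely blow past the context window,
// which is why I suspect a vector store is the better tool.
const chat = new ChatOpenAI({ temperature: 0.7 });
const response = await chat.call([
  new SystemMessage(
    "You are a tutor for the fantasy language Foo. Dictionary excerpt:\n" +
      "kasha: to sing\nvelan: mountain spirit"
  ),
  new HumanMessage("What is the definition of kasha in Foo?"),
]);
console.log(response.content);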
I'm hoping for a high-level overview of what needs to be implemented for a Q&A chatbot that responds with definitions of the terms I give it. No UI is necessary; I can just run this from Node.js in the console at first.
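For the console part, I am picturing nothing fancier than a readline loop like this, where answerQuestion is a hypothetical wrapper around the retrieval + LLM call from the example above:

import * as readline from "node:readline/promises";
import { stdin, stdout } from "node:process";

// Minimal console REPL; answerQuestion is a hypothetical function wrapping
// the Pinecone query + LLM call.
const rl = readline.createInterface({ input: stdin, output: stdout });
while (true) {
  const q = await rl.question("Ask about a Foo word: ");
  if (!q) break;
  console.log(await answerQuestion(q));
}
rl.close();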