When I use meta-llama/Llama-2-13b-chat-hf, the answers the model gives are not good. I think I am using the prompt wrong. Below is my code:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS, Chroma
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
import transformers
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory,ConversationBufferMemory,ConversationSummaryMemory
import torch
import os
from langchain import OpenAI
os.environ['OPENAI_API_KEY'] = 'My key'
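# 4-bit NF4 quantization (bitsandbytes) so the 13B chat model fits in GPU memory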
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_id = 'meta-llama/Llama-2-13b-chat-hf'
hf_auth = '***'
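# Fetch the model config from the Hub (Llama-2 is gated, so the auth token is required)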
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)
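# Load the quantized model and spread it across the available devices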
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
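# Matching tokenizer for the same checkpoint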
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)
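# Text-generation pipeline; temperature=0.01 makes decoding close to greedy, up to 512 new tokens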
instruct_pipeline = transformers.pipeline(
    task='text-generation',
    model=model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    tokenizer=tokenizer,
    return_full_text=True,
    max_new_tokens=512,
    top_p=0.99,
    top_k=50,
    repetition_penalty=1.1,
    temperature=0.01
)
hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)
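# Prompt template in the Llama-2 [INST]/<<SYS>> style; the retrieved passage is pasted in directly for this test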
prompt_template1 = """<s>[INST] <<SYS>>
{{ You are an AI chatbot having a conversation with a human. The following has three parts. The first part is extracted parts of a long document. The second part is the conversation between you and the human. The third part is the human's question.
If the human's question cannot be answered from the extracted parts, just chat normally with the human. If it can, base your answer on the extracted parts.
Extracted parts:
###
There are 5 steps to find password back.
STEP 1
Go to MEMBER CENTER Click SECURITY CENTER
STEP 2
Select SECURITY CENTER
STEP 3
Select TRANSACTION PASSWORD
STEP 4
Select FORGOT PASSWORD
STEP 5
for bound EMAIL
Enter your BOUND E-MAIL
then you will receive an email with your new password
for bound PHONE NUMBER
Enter your BOUND PHONE NUMBER
###
Previous Conversation:
'''
{history}
'''
Human's question: ```{input}``` }}
<</SYS>>
"""
prompt = PromptTemplate(template=prompt_template1, input_variables=['input', 'history'])
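# Summarize older turns with an OpenAI model once the buffer exceeds 20 tokens (this is why the OpenAI key is set above)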
summary_memory = ConversationSummaryBufferMemory(llm=OpenAI(), max_token_limit=20)
conversation = ConversationChain(
    prompt=prompt,
    llm=hf_pipe,
    verbose=True,
    memory=summary_memory,
)
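To see exactly what string the model receives, I render the template with placeholder values (the history and question below are just made up for the check):

print(prompt.format(
    history="Human: hi\nAI: Hello, how can I help you?",
    input="How do I get my transaction password back?"
))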
The extracted parts are normally retrieved with embeddings; I just typed one in by hand to test. I created a ConversationSummaryBufferMemory to keep track of the conversation, but it seems like meta-llama has its own prompt format for conversation history, and I do not know how to use it.
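My understanding of the Llama-2 chat format is that the system prompt goes inside <<SYS>> once, each past exchange is wrapped in its own [INST] ... [/INST] pair, and the new question ends with [/INST]. Here is a rough sketch of that layout (the build_llama2_prompt helper and the sample turns are just my own illustration, not something from LangChain):

# Rough sketch of the Llama-2 chat layout as I understand it (helper name is mine):
# <s>[INST] <<SYS>> system <</SYS>> user_1 [/INST] answer_1 </s><s>[INST] user_2 [/INST] ...
def build_llama2_prompt(system, turns, question):
    """turns: list of (user, assistant) pairs from earlier in the conversation."""
    text = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    first = True
    for user, assistant in turns:
        if first:
            text += f"{user} [/INST] {assistant} </s>"
            first = False
        else:
            text += f"<s>[INST] {user} [/INST] {assistant} </s>"
    if turns:
        text += f"<s>[INST] {question} [/INST]"
    else:
        text += f"{question} [/INST]"
    return text

print(build_llama2_prompt(
    'You are a helpful assistant.',
    [('hi', 'Hello! How can I help you?')],
    'How do I get my transaction password back?'
))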
What I want is for meta-llama to answer from the extracted parts when the user's question is related to them, and otherwise just chat with the user normally. Please help me!