You can build your chain the same way you would with Hugging Face, passing local_files_only=True. Here is an example:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model from local files only (no network calls)
tokenizer = AutoTokenizer.from_pretrained(your_tokenizer, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    your_model_PATH,
    device_map=device_map,
    torch_dtype=torch.float16,
    max_memory=max_mem,
    quantization_config=quantization_config,
    local_files_only=True,
)
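The placeholders device_map, max_mem, and quantization_config aren't defined in the snippet above; here is one possible way to set them (the memory budget and 4-bit settings are just an illustration, adjust them to your hardware):

import torch
from transformers import BitsAndBytesConfig

device_map = "auto"
max_mem = {0: "10GiB", "cpu": "30GiB"}  # hypothetical per-device memory budget
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)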
Then you build the pipeline:
from transformers import pipeline
from langchain.llms import HuggingFacePipeline

# Wrap the local model and tokenizer in a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15,
)
local_llm = HuggingFacePipeline(pipeline=pipe)
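At this point you can already call the wrapped model directly as a quick sanity check (the prompt text is just an illustration):

# Quick check that generation works before wiring it into a chain
print(local_llm("Explain quantization in one sentence."))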
Now you can feed the pipeline to LangChain:
from langchain.chains import LLMChain
llm_chain = LLMChain(prompt=prompt, llm=local_llm)
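Here prompt is a regular LangChain PromptTemplate that you define before creating the chain; a minimal sketch (the template text and question are just examples):

from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer the question below.\n\nQuestion: {question}\nAnswer:",
)

# Run the chain
print(llm_chain.run(question="What does local_files_only=True do?"))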