Question Answering (RAG) with Hugging Face
Written by Richa Deshpande

Atlas Vector Search powers Retrieval-Augmented Generation (RAG) in your application. Run the following code to create a question-answering chain that uses documents from your vector store as context for the LLM.

Create a Hugging Face account and generate an access token: https://huggingface.co/settings/tokens
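If you'd rather not paste the token into your code, you can export it as an environment variable and read it at runtime. A minimal sketch, assuming you export it under the name HUGGINGFACEHUB_API_TOKEN (which LangChain's Hugging Face integration also checks by default):

import os

# Assumes you ran: export HUGGINGFACEHUB_API_TOKEN="hf_..."
HUGGINGFACE_TOKEN = os.environ["HUGGINGFACEHUB_API_TOKEN"]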

Install libraries:

pip install pymongo langchain langchain-community langchain-mongodb sentence-transformers pypdf

Code sample:

from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
import pprint

# Set your Hugging Face access token and Atlas connection string
HUGGINGFACE_TOKEN = "<your access token>"
MONGODB_URI = "<your connection string>"

# Instantiate Hugging Face as an LLM
llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    max_new_tokens=512,
    top_k=30,
    temperature=0.1,
    repetition_penalty=1.03,
    huggingfacehub_api_token=HUGGINGFACE_TOKEN,
)

# Connect to the Atlas collection that holds your embedded documents
client = MongoClient(MONGODB_URI, server_api=ServerApi("1"))
collection = client["<database>"]["<collection>"]

# Instantiate Atlas Vector Search as the vector store; the index name must
# match the vector search index on the collection, and the embedding model
# must match the one used to embed the documents
vector_search = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    index_name="<your vector search index name>",
)

# Instantiate Atlas Vector Search as a retriever
qa_retriever = vector_search.as_retriever()

# Define a basic question-answering prompt template
prompt_template = """

Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

# Create the question-answering model
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=qa_retriever,
    return_source_documents=True,
    chain_type="stuff",
    chain_type_kwargs={"prompt": PROMPT},
)

# Prompt the LLM
query = "How can I secure my MongoDB Atlas cluster?"
docs = qa_chain.invoke({"query": query})

print(docs["result"])
print("\nSource documents: ")
pprint.pprint(docs["source_documents"])
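The code above assumes the collection already contains embedded documents. If you haven't loaded any yet, the ingestion step might look like the following sketch, reusing the pypdf loader and text splitter installed earlier; the file path, chunk sizes, and index name are placeholder assumptions:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load a PDF and split it into overlapping chunks
loader = PyPDFLoader("<path to your PDF>")
docs = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150
).split_documents(loader.load())

# Embed each chunk and write it to the Atlas collection
vector_search = MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    collection=collection,
    index_name="<your vector search index name>",
)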

Tell me what this code does

This example does the following:

  • Instantiates HuggingFace as the LLM (Large Language Model) used to generate a response.

  • Instantiates Atlas Vector Search as a retriever to query for similar documents (a configuration sketch follows this list).

  • Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.

  • Uses the RetrievalQA chain to create a question-answering model that generates context-aware responses. It specifies the following:

    • HuggingFace as the LLM used to generate the response.

    • Atlas Vector Search as the retriever used to augment the data set.

    • The boolean value True to return the source documents used as context.

    • The stuff chain type, which specifies that the relevant documents should be inserted, or "stuffed," into the prompt.

    • The prompt template that you constructed.

  • Prompts the LLM with a sample query. The generated response might vary.
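By default, the retriever uses the vector store's default search settings. If you want to control how many documents are retrieved and stuffed into the prompt, as referenced in the list above, you can pass search options to as_retriever. A minimal sketch; the k value here is an arbitrary example, not a recommended setting:

# Retrieve the 5 most similar chunks for each query
qa_retriever = vector_search.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)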
