Create a Vector Store
Written by Richa Deshpande

The MongoDB document model combined with Atlas Vector Search is a powerful way to store data for AI applications. Vector Search is available alongside MongoDB's document database without any extra integrations.

We will load the MongoDB Atlas Best Practices PDF and chunk it for efficient lookup.
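Chunking splits the PDF text into overlapping windows so each piece fits comfortably into an embedding call while preserving context at the boundaries. Below is a minimal pure-Python sketch of the idea behind `chunk_size` and `chunk_overlap`; LangChain's `RecursiveCharacterTextSplitter` is smarter than this, preferring to split on paragraph and sentence boundaries rather than at fixed offsets:

```python
def chunk_text(text, chunk_size=200, chunk_overlap=20):
    """Split text into fixed-size chunks, each overlapping the previous one.

    Illustrative sketch only -- not the LangChain implementation.
    """
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 500, chunk_size=200, chunk_overlap=20)
print(len(chunks))  # 3 chunks: 200 + 200 + 140 characters
```

Each chunk repeats the last 20 characters of its predecessor, so a sentence cut at a boundary still appears whole in at least one chunk.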

Install libraries:

pip install --upgrade langchain langchain-mongodb langchain-openai pymongo pypdf

Code Sample:

from langchain_community.document_loaders import PyPDFLoader
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

uri = "<your connection string>"

# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi('1'))

OPENAI_API_KEY = "<your OpenAI API key>"

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

db = client["langchain_db"]
collection = db["test"]

# Load the PDF
loader = PyPDFLoader("https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4HkJP")
data = loader.load()

# Split PDF into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
docs = text_splitter.split_documents(data)

# Create the MongoDB Atlas Vector Search instance
vector_search = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string=uri,
    namespace="langchain_db.test",
    embedding=embeddings,
    index_name="vector_index_test",
)

# Run the documents through the embeddings and add to the vector store
vector_search.add_documents(docs)
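When you later query the vector store, Atlas ranks stored documents by the similarity between the query's embedding and each document's embedding. For intuition only, here is a minimal cosine-similarity sketch; Atlas computes this server-side, so this code is illustrative rather than part of the workflow above:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Text chunks whose embeddings point in similar directions score close to 1.0 and are returned first.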

The following parameters are used:

  • docs: The documents to store in the vector database.

  • uri: The Atlas connection string used to connect to the cluster.

  • namespace: langchain_db.test, the Atlas database and collection into which to insert the documents.

  • embedding: OpenAI's embedding model, used to convert text into vector embeddings stored in the embedding field.

  • index_name: vector_index_test, the Atlas Vector Search index to use when querying the vector store.
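Note that the vector_index_test index must exist in Atlas before queries return results. A definition along these lines can be created in the Atlas UI or via the Atlas Administration API; 1536 dimensions assumes OpenAI's default text-embedding-ada-002 model, so adjust numDimensions if you use a different embedding model:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
```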
