How to Build a RAG System Using Claude 3 Opus And MongoDB

Richmond Alake • 16 min read • Published Mar 07, 2024 • Updated Mar 07, 2024
AI • Pandas • Python • Atlas

Introduction

Anthropic, a provider of large language models (LLMs), recently introduced three state-of-the-art models classified under the Claude 3 model family. This tutorial utilises one of the Claude 3 models within a retrieval-augmented generation (RAG) system powered by the MongoDB vector database. Before diving into the implementation of the retrieval-augmented generation system, here's an overview of the latest Anthropic release:
Introduction of the Claude 3 model family:
  • Models: The family comprises Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, each designed to cater to different needs and applications.
  • Benchmarks: The Claude 3 models have established new standards in AI cognition, excelling in complex tasks, comprehension, and reasoning.
Capabilities and features:
  • Multilingual and multimodal support: Claude 3 models can generate code and text in languages other than English. The models are also multimodal, with the ability to understand images.
  • Long context window: The Claude 3 models initially offer a 200K-token context window, with the ability to extend up to one million tokens for specific use cases.
  • Near-perfect recall: The models demonstrate exceptional recall capabilities when analyzing extensive amounts of text.
Design considerations:
  • Balanced attributes: The development of the Claude 3 models was guided by three main factors — speed, intelligence, and cost-effectiveness. This gives consumers a variety of models to leverage for different use cases requiring a tradeoff on one of the factors for an increase in another.
That’s a quick update on the latest Anthropic release. Although the Claude 3 models have a large context window, a substantial cost is still associated with every call that reaches the upper thresholds of that window. RAG is a design pattern that leverages a knowledge source to provide additional information to LLMs by semantically matching the query input with data points within the knowledge store.
This tutorial implements a chatbot prompted to take on the role of a venture capital tech analyst. The chatbot is a naive RAG system with a collection of tech news articles acting as its knowledge source.
What to expect from this tutorial:
  • Gain insights into constructing a retrieval-augmented generation system by integrating Claude 3 models with MongoDB to enhance query response accuracy.
  • Follow a comprehensive tutorial on setting up your development environment, from installing necessary libraries to configuring a MongoDB database.
  • Learn efficient data handling methods, including creating vector search indexes and preparing data for ingestion and query processing.
  • Understand how to employ Claude 3 models within the RAG system for generating precise responses based on contextual information retrieved from the database.
All implementation code presented in this tutorial is located in this GitHub repository.

Step 1: Library installation, data loading, and preparation

This section covers the steps taken to prepare the development environment, and to source and clean the data utilised as the knowledge base for the venture capital tech analyst chatbot.
The following code installs all the required libraries:
pip install pymongo datasets pandas anthropic openai
Below are brief explanations of the tools and libraries utilised within the implementation code:
  • anthropic: This is the official Python library for Anthropic that enables access to state-of-the-art language models. This library provides access to the Claude 3 family models, which can understand text and images.
  • datasets: This library is part of the Hugging Face ecosystem. By installing datasets, we gain access to several pre-processed and ready-to-use datasets, which are essential for training and fine-tuning machine learning models or benchmarking their performance.
  • pandas: This data science library provides robust data structures and methods for data manipulation, processing, and analysis.
  • openai: This is the official Python client library for accessing OpenAI's embedding models.
  • pymongo: PyMongo is a Python toolkit for MongoDB. It enables interactions with a MongoDB database.
Tools like Pyenv and Conda can create isolated development environments to separate package versions and dependencies across your projects. In these environments, you can install specific versions of libraries, ensuring that each project operates with its own set of dependencies without interference. The implementation code presented in this tutorial is best executed within a Colab or notebook environment.
After importing the necessary libraries, the subsequent steps in this section involve loading the dataset that serves as the foundational knowledge base for the RAG system and chatbot. This dataset contains a curated collection of tech news articles from HackerNoon, supplemented with an additional column of embeddings. These embeddings were created by processing the descriptions of each article in the dataset. The embeddings for this dataset were generated using OpenAI’s embedding model "text-embedding-3-small," with an embedding dimension of 256. This information on the embedding model and dimension is crucial when handling and embedding user queries in later processes.
The tech-news-embedding dataset contains more than one million data points, mirroring the scale of data typically encountered in a production setting. However, for this particular application, only 228,012 data points are utilized.
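Below is a minimal sketch of the data-loading helper described in this section, reconstructed from the step-by-step breakdown that follows; the version in the companion GitHub repository may differ in small details.

import os  # interacting with the operating system
from io import BytesIO  # handle byte streams as in-memory files

import pandas as pd
import requests
from google.colab import userdata  # access secrets stored in Google Colab

def download_and_combine_parquet_files(parquet_file_urls, hf_token):
    """Download Parquet files from Hugging Face and combine them into a single DataFrame."""
    headers = {"Authorization": f"Bearer {hf_token}"}
    all_dataframes = []

    for parquet_file_url in parquet_file_urls:
        response = requests.get(parquet_file_url, headers=headers)
        if response.status_code == 200:
            # Treat the downloaded bytes as an in-memory file and read the Parquet content
            parquet_bytes = BytesIO(response.content)
            df = pd.read_parquet(parquet_bytes)
            all_dataframes.append(df)
        else:
            print(f"Failed to download Parquet file from {parquet_file_url}")

    if all_dataframes:
        # Concatenate all sub-collections into one reindexed DataFrame
        return pd.concat(all_dataframes, ignore_index=True)
    print("No dataframes to concatenate.")
    return None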
The code snippet above executes the following steps:
Import necessary libraries:
  • os for interacting with the operating system
  • requests for making HTTP requests
  • BytesIO from the io module to handle bytes objects like files in memory
  • pandas (as pd) for data manipulation and analysis
  • userdata from google.colab to enable access to environment variables stored in Google Colab secrets
Function definition: The download_and_combine_parquet_files function is defined with two parameters:
  • parquet_file_urls: a list of URLs as strings, each pointing to a Parquet file that contains a sub-collection of the tech-news-embedding dataset
  • hf_token: a string representing a Hugging Face authorization token; access tokens can be created or copied from the Hugging Face platform
Download and read Parquet files: The function iterates over each URL in parquet_file_urls. For each URL, it:
  • Makes a GET request using the requests.get method, passing the URL and the headers for authorization.
  • Checks if the response status code is 200 (OK), indicating the request was successful.
  • Reads (if successful) the content of the response into a BytesIO object (to handle it as a file in memory), then uses pandas.read_parquet to read the Parquet file from this object into a Pandas DataFrame.
  • Appends the DataFrame to the list all_dataframes.
Combine DataFrames: After downloading and reading all Parquet files into DataFrames, there’s a check to ensure that all_dataframes is not empty. If there are DataFrames to work with, then all DataFrames are concatenated into a single DataFrame using pd.concat, with ignore_index=True to reindex the new combined DataFrame. This combined DataFrame is the overall process output in the download_and_combine_parquet_files function.
Below is a list of the Parquet files required for this tutorial. The complete list of all files is located on Hugging Face. Each Parquet file represents approximately 45,000 data points.
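The snippet below sketches how the helper can be invoked; the URLs shown are placeholders for the actual Parquet file locations on Hugging Face, and the Hugging Face token is assumed to be stored in a Colab secret named HF_TOKEN.

# Placeholder URLs: replace with the actual Parquet file URLs from the
# tech-news-embedding dataset hosted on Hugging Face (five files of roughly
# 45,000 records each give the ~228,000 data points used in this tutorial)
parquet_file_urls = [
    "https://huggingface.co/datasets/<tech-news-embedding>/part_1.parquet",
    "https://huggingface.co/datasets/<tech-news-embedding>/part_2.parquet",
    "https://huggingface.co/datasets/<tech-news-embedding>/part_3.parquet",
    "https://huggingface.co/datasets/<tech-news-embedding>/part_4.parquet",
    "https://huggingface.co/datasets/<tech-news-embedding>/part_5.parquet",
]

hf_token = userdata.get("HF_TOKEN")  # Hugging Face access token stored as a Colab secret
combined_df = download_and_combine_parquet_files(parquet_file_urls, hf_token)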
In the code snippet above, a subset of the tech-news-embeddings dataset is grouped into a single DataFrame, which is then assigned to the variable combined_df.
As a final phase in data preparation, the code snippet below shows the step to remove the _id column from the grouped dataset, as it is unnecessary for subsequent steps in this tutorial. Additionally, the data within the embedding column for each data point is converted from a numpy array to a Python list to prevent errors related to incompatible data types during the data ingestion.
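A minimal sketch of this clean-up step, assuming the embedding column is named embedding:

# Remove the _id column, which is not needed for subsequent steps
combined_df = combined_df.drop(columns=["_id"])

# Convert each embedding from a numpy array to a Python list so MongoDB can ingest it
combined_df["embedding"] = combined_df["embedding"].apply(lambda emb: emb.tolist())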

Step 2: Database and collection creation

An approach to composing an AI stack focused on handling large data volumes and reducing data silos is to utilise the same database provider for your operational and vector data. MongoDB acts as both an operational and a vector database. It offers a database solution that efficiently stores, queries, and retrieves vector embeddings.
To create a new MongoDB database, set up a database cluster:
  1. Register for a free MongoDB Atlas account, or existing users can sign into MongoDB Atlas.
  2. Select the “Database” option on the left-hand pane, which will navigate to the Database Deployment page showing the deployment specifications of any existing clusters. Create a new database cluster by clicking on the +Create button.
  3. For assistance with database cluster setup and obtaining the unique resource identifier (URI), refer to our guide for setting up a MongoDB cluster and getting your connection string.
Note: Don’t forget to whitelist the IP for the Python host or 0.0.0.0/0 for any IP when creating proof of concepts.
At this point, you have created a database cluster, obtained a connection string to the database, and placed a reference to the connection string within the development environment. The next step is to create a database and collection through the MongoDB Atlas user interface.
Once you have created a cluster, navigate to the cluster page and create a database and collection within the MongoDB Atlas cluster by clicking + Create Database. The database will be named tech_news and the collection will be named hacker_noon_tech_news.
Creation of database and collections

Step 3: Vector search index creation

By this point, you have created a cluster, database, and collection.
The steps in this section are crucial to ensure that a vector search can be conducted using the queries entered into the chatbot and searched against the records within the hacker_noon_tech_news collection. The objective of this step is to create a vector search index. To achieve this, refer to the official vector search index creation guide.
In the creation of a vector search index using the JSON editor on MongoDB Atlas, ensure your vector search index is named vector_index and the vector search index definition is as follows:
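The definition below indexes the embedding field with 256 dimensions, matching the "text-embedding-3-small" embeddings generated for the dataset; cosine similarity is used here as a sensible default, though other similarity functions are available:

{
  "fields": [
    {
      "numDimensions": 256,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}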

Step 4: Data ingestion

To ingest data into the MongoDB database created in the previous steps, the following operations have to be carried out:
  • Connect to the database and collection.
  • Clear out the collection of any existing records.
  • Convert the Pandas DataFrame of the dataset into dictionaries before ingestion.
  • Ingest dictionaries into MongoDB using a batch operation.
This tutorial requires the cluster's URI. Grab the URI and copy it into the Google Colab Secrets environment in a variable named MONGO_URI, or place it in a .env file or equivalent.
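A sketch of the connection step, assuming the connection string is stored in a Colab secret named MONGO_URI:

import pymongo
from google.colab import userdata

def get_mongo_client(mongo_uri):
    """Establish a connection to the MongoDB cluster."""
    try:
        client = pymongo.MongoClient(mongo_uri)
        print("Connection to MongoDB successful")
        return client
    except pymongo.errors.ConnectionFailure as e:
        print(f"Connection failed: {e}")
        return None

mongo_uri = userdata.get("MONGO_URI")
mongo_client = get_mongo_client(mongo_uri)

# Database and collection names created in the previous step
DB_NAME = "tech_news"
COLLECTION_NAME = "hacker_noon_tech_news"

db = mongo_client[DB_NAME]
collection = db[COLLECTION_NAME]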
The code snippet above uses PyMongo to create a MongoDB client object, representing the connection to the cluster and enabling access to its databases and collections. The variables DB_NAME and COLLECTION_NAME are given the names set for the database and collection in the previous step. If you’ve chosen different database and collection names, ensure they are reflected in the implementation code.
The code snippet below guarantees that the current database collection is empty by executing the delete_many() operation on the collection.
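A one-line sketch of this step:

# Delete any existing records so ingestion starts from an empty collection
collection.delete_many({})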
Ingesting data into a MongoDB collection from a pandas DataFrame is a straightforward process that can be efficiently accomplished by converting the DataFrame into dictionaries and then utilising the insert_many method on the collection to pass the converted dataset records.
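A sketch of the ingestion step:

# Convert the DataFrame into a list of dictionaries (one per record) and ingest them in a single batch
documents = combined_df.to_dict("records")
collection.insert_many(documents)

print("Data ingestion into MongoDB completed")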
The data ingestion process should take less than a minute, and when it is completed, the IDs of the ingested records are returned.

Step 5: Vector search function creation

This section showcases the creation of a custom vector search function that accepts a user query, which corresponds to an entry made to the chatbot. The function also takes a second parameter, collection, which points to the database collection containing records against which the vector search operation should be conducted.
The vector_search function produces a vector search result derived from a series of operations outlined in a MongoDB aggregation pipeline. This pipeline includes the $vectorSearch and $project stages and performs queries based on the vector embeddings of user queries. It then formats the results, omitting any record attributes unnecessary for subsequent processes.
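Below is a sketch of the vector_search function; it relies on the get_embedding helper covered in Step 6, and the numCandidates and limit values are illustrative rather than prescriptive.

def vector_search(user_query, collection):
    """Perform a vector search in the MongoDB collection based on the user query."""

    # Generate an embedding for the user query
    query_embedding = get_embedding(user_query)
    if query_embedding is None:
        return "Invalid query or embedding generation failed."

    # Define the MongoDB aggregation pipeline for the vector search
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",         # name of the vector search index created earlier
                "queryVector": query_embedding,  # embedding of the user query
                "path": "embedding",             # document field containing the embeddings
                "numCandidates": 150,            # number of candidate documents to consider (illustrative)
                "limit": 5,                      # number of results to return (illustrative)
            }
        },
        {
            "$project": {
                "_id": 0,        # exclude the _id field from the results
                "embedding": 0,  # exclude the embedding field from the results
            }
        },
    ]

    # Execute the pipeline and convert the returned cursor into a list
    results = collection.aggregate(pipeline)
    return list(results)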
The code snippet above conducts the following operations to allow semantic search for tech news articles:
  1. Define the vector_search function that takes a user's query string and a MongoDB collection as inputs and returns a list of documents that match the query based on vector similarity search.
  2. Generate an embedding for the user's query by calling the get_embedding function (shown in Step 6), which converts the query string into a vector representation.
  3. Construct a pipeline for MongoDB's aggregate function, incorporating two main stages: $vectorSearch and $project.
  4. The $vectorSearch stage performs the actual vector search. The index field specifies the vector index to utilise for the vector search, and this should correspond to the name entered in the vector search index definition in previous steps. The queryVector field takes the embedding representation of the user query. The path field corresponds to the document field containing the embeddings. numCandidates specifies the number of candidate documents to consider, and limit caps the number of results to return.
  5. The $project stage formats the results to exclude the _id and the embedding field.
  6. The aggregate method executes the defined pipeline to obtain the vector search results. The final operation converts the returned cursor from the database into a list.

Step 6: Handling user queries with Claude 3 models

The final section of the tutorial outlines the sequence of operations performed as follows:
  • Accept a user query in the form of a string.
  • Utilize the OpenAI embedding model to generate embeddings for the user query.
  • Load the Anthropic Claude 3 model — specifically, the ‘claude-3-opus-20240229’ model — to serve as the base model, which is the large language model for the RAG system.
  • Execute a vector search using the embeddings of the user query to fetch relevant information from the knowledge base, which provides additional context for the base model.
  • Submit both the user query and the gathered additional information to the base model to generate a response.
The code snippet below focuses on generating new embeddings using OpenAI's embedding model. An OpenAI API key is required to ensure the successful completion of this step. More details on OpenAI's embedding models can be found on the official site.
An important note is that the dimensions of the user query embedding must match the dimensions set in the vector search index definition on MongoDB Atlas.
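A sketch of the embedding helper, assuming the OpenAI API key is stored in a Colab secret named OPENAI_API_KEY:

from openai import OpenAI
from google.colab import userdata

openai_client = OpenAI(api_key=userdata.get("OPENAI_API_KEY"))

EMBEDDING_MODEL = "text-embedding-3-small"

def get_embedding(text):
    """Generate a 256-dimensional embedding for the given text using OpenAI's embedding model."""
    if not text or not isinstance(text, str):
        return None
    try:
        # The dimensions argument must match the numDimensions value in the vector search index
        response = openai_client.embeddings.create(
            input=text,
            model=EMBEDDING_MODEL,
            dimensions=256,
        )
        return response.data[0].embedding
    except Exception as e:
        print(f"Error in get_embedding: {e}")
        return None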
The next step in this section is to import the Anthropic library and load the client to access Anthropic’s methods for handling messages and accessing Claude models. Ensure you obtain an Anthropic API key located within the settings page on the official Anthropic website.
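A sketch of loading the Anthropic client, assuming the API key is stored in a Colab secret named ANTHROPIC_API_KEY:

import anthropic
from google.colab import userdata

client = anthropic.Anthropic(api_key=userdata.get("ANTHROPIC_API_KEY"))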
The following code snippet introduces the function handle_user_query, which serves two primary purposes: It leverages a previously defined custom vector search function to query and retrieve relevant information from a MongoDB database, and it utilizes the Anthropic API via a client object to use one of the Claude 3 models for query response generation.
This function begins by executing the vector search against the specified MongoDB collection based on the user's input query. It then proceeds to format the retrieved information for further processing. Subsequently, the function invokes the Anthropic API, directing the request to a specific Claude 3 model.
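Below is a sketch of handle_user_query; the exact field names read from each search result (title, companyName, and so on) are assumptions based on the dataset's columns, and the system prompt wording is illustrative.

def handle_user_query(query, collection):
    # Retrieve relevant documents from the knowledge base via vector search
    get_knowledge = vector_search(query, collection)

    # Compile the retrieved documents into a human-readable context string
    search_result = ""
    for result in get_knowledge:
        search_result += (
            f"Title: {result.get('title', 'N/A')}, "
            f"Company Name: {result.get('companyName', 'N/A')}, "
            f"Company URL: {result.get('companyUrl', 'N/A')}, "
            f"Date Published: {result.get('published_at', 'N/A')}, "
            f"Article URL: {result.get('url', 'N/A')}, "
            f"Description: {result.get('description', 'N/A')}\n"
        )

    # Ask the Claude 3 Opus model to answer the query using the retrieved context
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        system="You are a Venture Capital Tech Analyst with access to some tech company articles and information. You use the information you are given to provide advice.",
        messages=[
            {
                "role": "user",
                "content": "Answer this user query: " + query + " with the following context: " + search_result,
            }
        ],
    )

    return (response.content[0].text), search_result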
Below is a more detailed description of the operations in the code snippet above:
  1. Vector search execution: The function begins by calling vector_search with the user's query and a specified collection as arguments. This performs a search within the collection, leveraging vector embeddings to find relevant information related to the query.
  2. Compile search results: search_result is initialized as an empty string to aggregate information from the search. The function compiles the search results by iterating over the results returned by vector_search and formatting each item's details (title, company name, URL, publication date, article URL, and description) into a human-readable string, appending this information to search_result with a newline character \n at the end of each entry.
  3. Generate response using Anthropic client: The function then constructs a request to the Anthropic API (through a client object, presumably an instance of the Anthropic client class created earlier). It specifies:
    • The model to use ("claude-3-opus-20240229"), which indicates a specific version of the Claude 3 model.
    • The maximum token limit for the generated response (max_tokens=1024).
    • A system description that guides the model to behave as a "Venture Capital Tech Analyst" with access to tech company articles and information, using this as context to advise.
    • The actual message for the model to process, which combines the user query with the aggregated search results as context.
  4. Return the generated response and search results: It extracts and returns the response text from the first item in the response's content alongside the compiled search results.
The final step in this tutorial is to initialize the query, pass it into the handle_user_query function, and print the response returned.
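A sketch of this final step:

query = "Give me the best tech stock to invest in and tell me why"
response, source_information = handle_user_query(query, collection)

print(f"Response: {response}")
print(f"Source Information: \n{source_information}")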
  1. Initialise query: The variable query is assigned a string value containing the user's request: "Give me the best tech stock to invest in and tell me why." This serves as the input for the handle_user_query function.
  2. Execute handle_user_query function: The function takes two parameters — the user's query and a reference to the collection from which information will be retrieved. It performs a vector search to find relevant documents within the collection and formats the results for further use. It then queries the Anthropic Claude 3 model, providing it with the query and the formatted search results as context to generate an informed response.
  3. Retrieve response and source information: The function returns two pieces of data: response and source_information. The response contains the model-generated answer to the user's query, while source_information includes detailed data from the collection used to inform the response.
  4. Display results: Finally, the code prints the response from the Claude 3 model, along with the source information that contributed to this response.
Response from Claude 3 Opus
Claude 3 models possess what seem like impressive reasoning capabilities. From the response in the screenshot, the model is able to consider expressive language as a factor in its decision-making and also provides a structured approach to its response.
More impressively, it gives a reason as to why other options in the search results are not candidates for the final selection. And if you notice, it factored the date into its selection as well.
Obviously, this is not going to replace any human tech analyst soon, but with a more extensive knowledge base and real-time data, this could very quickly become a co-pilot system for VC analysts.
Please remember that Opus's response is not financial advice and is only shown for illustrative purposes.

Conclusion

This tutorial has presented the essential steps of setting up your development environment, preparing your dataset, and integrating state-of-the-art language models with a powerful database system.
By leveraging the unique strengths of Claude 3 models and MongoDB, we've demonstrated how to create a RAG system that not only responds accurately to user queries but does so by understanding the context in depth. The impressive performance of the RAG system is a result of Opus's parametric knowledge and the semantic matching capabilities facilitated by vector search.
Building a RAG system with the latest Claude 3 models and MongoDB sets up an efficient AI infrastructure. It offers cost savings and low latency by combining operational and vector databases into one solution. The functionalities of the naive RAG system presented in this tutorial can be extended to do the following:
  • Get real-time news on the company returned from the search results.
  • Get additional information by extracting text from the URLs provided in accompanying search results.
  • Store additional metadata before data ingestion for each data point.
Some of the proposed functionality extensions can be achieved by utilising Anthropic's function calling capabilities or by leveraging search APIs. The key takeaway is that whether you aim to develop a chatbot, a recommendation system, or any application requiring nuanced AI responses, the principles and techniques outlined here will serve as a valuable starting point.
Want to leverage another state-of-the-art model for your RAG system? Check out our article that uses Google’s Gemma alongside open-source embedding models provided by Hugging Face.

FAQs

1. What are the Claude 3 models, and how do they enhance a RAG system?
The Claude 3 models (Haiku, Sonnet, Opus) are state-of-the-art large language models developed by Anthropic. They offer advanced features like multilingual support, multimodality, and long context windows up to one million tokens. These models are integrated into RAG systems to leverage their ability to understand and generate text, enhancing the system's response accuracy and comprehension.
2. Why is MongoDB chosen for a RAG system powered by Claude 3?
MongoDB is utilized for its dual capabilities as an operational and a vector database. It efficiently stores, queries, and retrieves vector embeddings, making it ideal for managing the extensive data volumes and real-time processing demands of AI applications like a RAG system.
3. How does the vector search function work within the RAG system?
The vector search function in the RAG system conducts a semantic search against a MongoDB collection using the vector embeddings of user queries. It relies on a MongoDB aggregation pipeline, including the $vectorSearch and $project stages, to find and format the most relevant documents based on query similarity.
4. What is the significance of data embeddings in the RAG system?
Data embeddings are crucial for matching the semantic content of user queries with the knowledge stored in the database. They transform text into a vector space, enabling the RAG system to perform vector searches and retrieve contextually relevant information to inform the model's responses.
5. How does the RAG system handle user queries with Claude 3 models?
The RAG system processes user queries by generating embeddings using an embedding model (e.g., OpenAI's "text-embedding-3-small") and conducting a vector search to fetch relevant information. This information and the user query are passed to a Claude 3 model, which generates a detailed and informed response based on the combined context.
