Building AI with MongoDB: How Patronus Automates LLM Evaluation to Boost Confidence in GenAI

Mat Keep

#genAI #Vector Search

It is hardly headline news that large language models can be unreliable. For some use cases, this can be inconvenient. For others — especially in regulated industries — the consequences are far more severe. Enter Patronus AI, the industry's first automated evaluation platform for LLMs.

Founded by machine learning experts from Meta AI and Meta Reality Labs, Patronus AI is on a mission to boost enterprise confidence in gen AI-powered apps, leading the way in shaping a trustworthy AI landscape.

Rebecca Qian, Patronus co-founder and CTO, explains, “Our platform enables engineers to score and benchmark LLM performance on real-world scenarios, generate adversarial test cases, monitor hallucinations, and detect PII and other unexpected and unsafe behavior. Customers use Patronus AI to detect LLM mistakes at scale and deploy AI products safely and confidently.”

In recently published and widely cited research based on the FinanceBench question answering (QA) evaluation suite, Patronus made a startling discovery. Researchers found that a range of widely used state-of-the-art LLMs frequently hallucinated, incorrectly answering or refusing to answer up to 81% of financial analysts’ questions! This error rate occurred despite the models’ context windows being augmented with context retrieved from an external vector store.

While retrieval augmented generation (RAG) is a common way of feeding models with up-to-date, domain-specific context, a key question faced by app owners is how to test the reliability of model outputs in a scalable way. This is where Patronus comes in. The company has partnered with the leading technologies in the gen AI ecosystem — from model providers and frameworks to vector store and RAG solutions — to provide managed evaluation services, test suites, and adversarial data sets.
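To make the RAG pattern concrete, here is a minimal sketch of the retrieval-and-augmentation step using MongoDB Atlas Vector Search directly through PyMongo. The database, collection, index, and field names are placeholders, and it assumes chunks were already embedded and stored with the same embedding model used for queries.

```python
# Minimal RAG retrieval sketch (placeholder names throughout).
# Assumes documents were previously chunked, embedded, and stored in an Atlas
# collection with a Vector Search index named "vector_index" on the "embedding" field.
import pymongo

client = pymongo.MongoClient("<your-atlas-connection-string>")
collection = client["sec_filings"]["chunks"]  # hypothetical db/collection

def retrieve_context(query_embedding, k=5):
    """Return the text of the k chunks most similar to the query embedding."""
    results = collection.aggregate([
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 100,
                "limit": k,
            }
        },
        {"$project": {"text": 1, "_id": 0}},
    ])
    return [doc["text"] for doc in results]

def build_prompt(question, query_embedding):
    """Prepend retrieved chunks to the question so the model answers from the
    source documents rather than from memory."""
    context = "\n\n".join(retrieve_context(query_embedding))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

It is exactly this retrieved context, and the answers generated from it, that an evaluation layer then needs to check at scale.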

“As we assessed the landscape to prioritize which partners to work with, we saw massive demand from our customers for MongoDB Atlas,” said Qian. “Through our Patronus RAG evaluation API, we help customers verify that their RAG systems built on top of MongoDB Atlas consistently deliver top-tier, dependable information.”

In its new 10-minute guide, Patronus takes developers through a workflow showcasing how to evaluate a MongoDB Atlas-based retrieval system. The guide focuses on evaluating hallucination and answer relevance against an SEC 10-K filing, simulating a financial analyst querying the document for analysis and insights. The workflow, sketched in code after the figure below, is built using:

  • The LlamaIndex data framework to ingest and chunk the source PDF document

  • Atlas Vector Search to store, index, and query each chunk’s metadata and embeddings

  • Patronus to score the model responses

The workflow is shown in the figure below.
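For orientation, here is a minimal sketch of the ingest-and-query side of that workflow; it is not the guide's exact code. It assumes the llama-index and llama-index-vector-stores-mongodb packages, an OpenAI API key for the default embedding model, and an Atlas Vector Search index already defined on the target collection. Database, collection, and index names are placeholders, and constructor parameter names can vary between LlamaIndex releases.

```python
# Sketch of the workflow: ingest a 10-K PDF, index it in Atlas Vector Search,
# and query it so the responses can be scored by Patronus. Names are placeholders.
import pymongo
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

client = pymongo.MongoClient("<your-atlas-connection-string>")

# 1. Ingest and chunk the source PDF with LlamaIndex.
documents = SimpleDirectoryReader(input_files=["sec_10k.pdf"]).load_data()

# 2. Store chunk text, metadata, and embeddings in Atlas Vector Search.
vector_store = MongoDBAtlasVectorSearch(
    client,
    db_name="sec_filings",             # placeholder database name
    collection_name="chunks",          # placeholder collection name
    vector_index_name="vector_index",  # must match the Atlas index definition
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# 3. Retrieve and generate: the question, retrieved context, and answer are
#    what the Patronus evaluation scores for hallucination and answer relevance.
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What was total revenue in the most recent fiscal year?")
print(response)
```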

Equipped with the results of an evaluation, developers can take a number of steps to improve the performance of a RAG system. These include exploring different indexes, modifying document chunking sizes, re-engineering prompts, and, for the most domain-specific apps, fine-tuning the embedding model itself. Review the 10-minute guide for a more detailed explanation of each of these steps.
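Chunking is often the cheapest of these levers to pull first. The sketch below re-chunks the same documents with a different chunk size and overlap using LlamaIndex's SentenceSplitter before rebuilding the index; the values shown are illustrative, not recommendations from the guide.

```python
# Re-chunk the ingested documents with different settings, rebuild the index,
# then re-run the evaluation to compare scores. Values are illustrative only.
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)  # 'documents' from the ingest step

index = VectorStoreIndex(nodes, storage_context=storage_context)
```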

As Qian goes on to say, “Regardless of which approach you take to debug and fix hallucinations, it’s always important to continuously test your RAG system to make sure performance improvements are maintained over time. Of course, you can use the Patronus API iteratively to confirm.” To learn more about LLM evaluation, reach out at contact@patronus.ai.
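In practice, that continuous testing can take the shape of a small regression loop run after every change to the pipeline. In the sketch below, score_with_patronus is a hypothetical stand-in for the Patronus RAG evaluation API; its real endpoint and payload are documented by Patronus and are not reproduced here.

```python
# Hypothetical regression loop: re-run a fixed question set through the RAG
# pipeline after every change and fail the build if quality drops.
# score_with_patronus() is a placeholder for the real Patronus evaluation call.

TEST_QUESTIONS = [
    "What was total revenue in the most recent fiscal year?",
    "How much debt matures within the next twelve months?",
]

def score_with_patronus(question: str, answer: str, context: str) -> bool:
    """Placeholder: call the Patronus RAG evaluation API and return pass/fail."""
    raise NotImplementedError("Wire this up to the Patronus API per its docs.")

def run_regression(query_engine, threshold: float = 0.9) -> None:
    passed = 0
    for question in TEST_QUESTIONS:
        response = query_engine.query(question)
        context = "\n".join(node.get_content() for node in response.source_nodes)
        if score_with_patronus(question, str(response), context):
            passed += 1
    pass_rate = passed / len(TEST_QUESTIONS)
    assert pass_rate >= threshold, f"RAG regression failed: pass rate {pass_rate:.0%}"
```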

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.