1. Introduction to Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is an advanced natural language processing technique that enhances large language model responses by dynamically retrieving relevant context from a document corpus. This approach allows chatbots to provide more accurate, contextually-aware answers by grounding responses in specific source documents.
2. System Architecture Overview
In this blog, we'll walk through creating a powerful Document QA System. This system enables users to upload PDF documents, process them into vector embeddings, and ask questions about their content. The architecture uses FastAPI for API creation, LangChain for the retrieval and generation pipeline, and FAISS for efficient document retrieval.
The RAG chatbot implementation consists of several key components:
- Document Ingestion: Process and parse PDF documents
- Text Embedding: Convert text chunks into dense vector representations
- Vector Storage: Efficient similarity search using FAISS
- Question Answering: Retrieve relevant context and generate responses
- API Layer: FastAPI for serving the RAG functionality
3. Prerequisites and System Preparation
System Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended, but optional)
- Linux/macOS/Windows
3.1 Environment Setup
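Set up an isolated Python environment and install the dependencies. The package list below is an assumption based on the technologies used in this build, so adjust names and versions to your setup:

```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Core dependencies (faiss-cpu is enough if you skip the GPU)
pip install fastapi uvicorn streamlit requests \
    langchain langchain-community sentence-transformers \
    faiss-cpu pypdf torch
```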
3.2 Key Technologies
- FastAPI: Web framework for building APIs
- HuggingFace Embeddings: Advanced text embedding model
- FAISS: Efficient vector similarity search
- Ollama: Local LLM inference
- RecursiveCharacterTextSplitter: Document text chunking
3.3 Document Processing Pipeline
The document processing involves several critical steps:
- PDF Text Extraction
- Text Cleaning
- Text Chunking
- Vector Embedding Generation
- Vector Store Creation
Code Breakdown
1. Importing Libraries and Setting Up Logging
Purpose: These are the libraries and modules used throughout the app.
- FastAPI: Essential for building the API and handling HTTP requests.
- File: Used for managing file uploads.
- UploadFile: Handles file operations in the API.
- HTTPException: Raises HTTP errors for invalid requests.
- CORSMiddleware: Enables Cross-Origin Resource Sharing (CORS) to allow communication between different origins.
- langchain and related imports: Facilitates text processing, embeddings, vector storage, and LLM operations.
- torch: Checks the availability of GPU or CPU for computations.
- logging: Logs key events and errors to assist with debugging.
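A representative import block looks like this; the langchain import paths are assumptions that vary across LangChain releases, so adjust them to your installed version:

```python
import logging

import torch
from fastapi import FastAPI, File, UploadFile, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from pypdf import PdfReader

# Log key events and errors to assist with debugging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
```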
2. Initializing the FastAPI App
- title: The name of the API.
- docs_url and openapi_url: Set to None to hide the default documentation so custom versions can be served later.
- root_path: Sets the root URL prefix for the API.
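A minimal sketch of the instance; the title string and root path values are assumptions:

```python
app = FastAPI(
    title="Document QA System",  # the name of the API (title is an assumption)
    docs_url=None,               # hide the default Swagger UI...
    openapi_url=None,            # ...and schema, so custom versions can be served
    root_path="",                # root URL prefix, e.g. when running behind a proxy
)
```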
3. Middleware for CORS
- CORS Settings: Allows requests from any origin (allow_origins=["*"]).
- Supports all HTTP methods (GET, POST, etc.) and headers.
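The registration itself is a single call:

```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],   # allow requests from any origin
    allow_methods=["*"],   # GET, POST, DELETE, ...
    allow_headers=["*"],
)
```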
4. Custom OpenAPI Schema and Documentation
- Purpose: These endpoints provide customized OpenAPI JSON and Swagger documentation.
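A sketch using FastAPI's built-in get_openapi and get_swagger_ui_html helpers (the version string is an assumption):

```python
from fastapi.openapi.docs import get_swagger_ui_html
from fastapi.openapi.utils import get_openapi

@app.get("/openapi.json", include_in_schema=False)
async def openapi_json():
    # Build the schema on demand from the registered routes
    return get_openapi(title=app.title, version="1.0.0", routes=app.routes)

@app.get("/docs", include_in_schema=False)
async def swagger_docs():
    # Serve Swagger UI pointing at the custom schema above
    return get_swagger_ui_html(openapi_url="/openapi.json", title=app.title)
```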
5. Configuring Device and Models
- Device Setup: Checks if a GPU (cuda) is available for faster computations; otherwise defaults to CPU.
- Model Configurations: HuggingFaceEmbeddings creates embeddings for text using the specified model, and Ollama configures the language model (mistral) for answering questions.
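A sketch of this configuration; the embedding model name is an assumption, so swap in whichever sentence-transformers model you prefer:

```python
# Prefer the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Text embedding model (the model name is an assumption)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": device},
)

# Local LLM served by Ollama
llm = Ollama(model="mistral")
```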
6. Prompt Templates
- qa_template: Guides the LLM to answer questions based on a given context.
- summary_template: Directs the LLM to summarize the provided text.
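The exact template wording below is illustrative; the point is to constrain the LLM to the retrieved context:

```python
qa_template = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question using only the context below.\n\n"
        "Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    ),
)

summary_template = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following text:\n\n{text}\n\nSummary:",
)
```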
7. Chains and Text Splitter
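The chains pair the LLM with each prompt template, and the splitter breaks documents into overlapping chunks. This sketch uses the classic LLMChain interface (newer LangChain versions favor prompt | llm pipelines), and the chunk sizes are illustrative defaults:

```python
# One chain per task: question answering and summarization
qa_chain = LLMChain(llm=llm, prompt=qa_template)
summary_chain = LLMChain(llm=llm, prompt=summary_template)

# Overlap preserves context across chunk boundaries
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
```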
8. Dictionaries to Store Data
Purpose: Stores document metadata (documents) and embeddings (vector_stores) using dictionaries.
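Two plain dictionaries keyed by a per-upload asset ID:

```python
# In-memory stores; a production app would persist these instead
documents: dict = {}      # asset_id -> metadata (filename, file path, ...)
vector_stores: dict = {}  # asset_id -> FAISS index for that document
```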
9. Utility Functions
Extract Text from PDF
- Extracts and cleans text from PDF files using PyPDF.
Clean Text
- Normalizes and formats text by fixing spacing and punctuation issues.
Process Document
- Cleans the text, splits it into chunks, and generates embeddings using FAISS.
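Minimal sketches of the three helpers; the cleaning rules shown are representative rather than exhaustive:

```python
import re

def extract_text_from_pdf(file_path: str) -> str:
    """Extract raw text from every page of a PDF using pypdf."""
    reader = PdfReader(file_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def clean_text(text: str) -> str:
    """Normalize whitespace and tidy spacing around punctuation."""
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    text = re.sub(r"\s+([.,!?])", r"\1", text)  # drop spaces before punctuation
    return text.strip()

def process_document(text: str, asset_id: str) -> None:
    """Clean, chunk, embed, and index a document's text."""
    chunks = text_splitter.split_text(clean_text(text))
    vector_stores[asset_id] = FAISS.from_texts(chunks, embeddings)
```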
10. Upload Endpoint
- Accepts PDF uploads, extracts text, generates embeddings, and stores data in dictionaries.
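A sketch of the endpoint; the route path and upload directory are assumptions:

```python
import os
import uuid

UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    if not file.filename.lower().endswith(".pdf"):
        raise HTTPException(status_code=400, detail="Only PDF files are supported")

    # Save the upload to disk under a fresh asset ID
    asset_id = str(uuid.uuid4())
    file_path = os.path.join(UPLOAD_DIR, f"{asset_id}.pdf")
    with open(file_path, "wb") as f:
        f.write(await file.read())

    # Extract, clean, chunk, and embed the text
    process_document(extract_text_from_pdf(file_path), asset_id)
    documents[asset_id] = {"filename": file.filename, "path": file_path}
    return {"asset_id": asset_id}
```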
11. Ask Question Endpoint
- Answers questions about uploaded documents using embeddings for context retrieval and LLM for generation.
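A sketch of the endpoint; the request schema and the k=4 retrieval depth are assumptions:

```python
from pydantic import BaseModel

class Question(BaseModel):
    question: str

@app.post("/ask/{asset_id}")
async def ask_question(asset_id: str, body: Question):
    if asset_id not in vector_stores:
        raise HTTPException(status_code=404, detail="Document not found")

    # Retrieve the chunks most similar to the question...
    docs = vector_stores[asset_id].similarity_search(body.question, k=4)
    context = "\n\n".join(doc.page_content for doc in docs)

    # ...and let the LLM answer from that context
    answer = qa_chain.run(context=context, question=body.question)
    return {"answer": answer}
```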
12. Delete Document Endpoint
- Deletes a document’s file and its associated data from storage.
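A sketch, continuing the same module:

```python
@app.delete("/documents/{asset_id}")
async def delete_document(asset_id: str):
    if asset_id not in documents:
        raise HTTPException(status_code=404, detail="Document not found")

    # Remove the file from disk, then drop the in-memory records
    os.remove(documents[asset_id]["path"])
    documents.pop(asset_id)
    vector_stores.pop(asset_id, None)
    return {"status": "deleted"}
```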
13. Running the Application
- Starts the FastAPI application using uvicorn.
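The entry point, matching the host and port used later in the setup script:

```python
import uvicorn

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```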
1. Setting Up the Project with Streamlit (UI)
First, we need to set up Streamlit, which is a powerful tool for building interactive web applications with minimal code. To get started, install Streamlit along with requests (the base64 module we also use ships with Python's standard library).
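For example:

```bash
pip install streamlit requests
```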
Now, let’s begin by importing the necessary libraries and configuring our Streamlit page layout:
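A minimal version of that setup (the page title comes straight from the app):

```python
import base64

import requests
import streamlit as st

# A wide layout gives the PDF viewer and the chat column room to breathe
st.set_page_config(page_title="Chat with Any PDF", layout="wide")
```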
The st.set_page_config method is used to set up the layout and title of the page. Here, we have chosen a wide layout and given the app the title "Chat with Any PDF".
2. Styling the Interface
Next, we style the interface to make it visually appealing. In the following code snippet, we apply custom CSS for the body, headers, and footer of the app, ensuring the design is responsive and modern.
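An abbreviated version of that styling; the exact CSS rules here are illustrative, not the app's full stylesheet:

```python
st.markdown(
    """
    <style>
    /* Gradient title and a footer pinned to the bottom of the page */
    .app-title {
        background: linear-gradient(90deg, #6a11cb, #2575fc);
        -webkit-background-clip: text;
        -webkit-text-fill-color: transparent;
        font-size: 2.5rem;
        text-align: center;
    }
    .footer {
        position: fixed;
        bottom: 0;
        width: 100%;
        text-align: center;
    }
    </style>
    <h1 class="app-title">Chat with Any PDF</h1>
    """,
    unsafe_allow_html=True,
)
```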
This custom CSS ensures that the headers, footers, and overall page have an elegant design, making the app both functional and visually appealing. The logo and title use gradient effects, and the footer ensures proper alignment at the bottom of the page.
3. Uploading and Displaying the PDF
The key feature of our app is the ability to upload and display PDFs. This is achieved with st.file_uploader which allows users to upload a PDF file. The uploaded file is then processed and displayed using an iframe:
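A sketch of that flow; the backend /upload call that returns the asset_id used later is an assumption about how the two halves connect:

```python
uploaded_file = st.file_uploader("Upload a PDF", type=["pdf"])

if uploaded_file is not None:
    pdf_bytes = uploaded_file.getvalue()

    # Embed the PDF in the page via a base64-encoded data URI
    b64_pdf = base64.b64encode(pdf_bytes).decode("utf-8")
    st.markdown(
        f'<iframe src="data:application/pdf;base64,{b64_pdf}" '
        'width="100%" height="600"></iframe>',
        unsafe_allow_html=True,
    )

    # Register the document with the backend, which returns an asset ID
    resp = requests.post(
        "http://localhost:8000/upload",
        files={"file": (uploaded_file.name, pdf_bytes, "application/pdf")},
    )
    st.session_state.asset_id = resp.json()["asset_id"]
```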
4. Interacting with the PDF Using AI
Once the PDF is uploaded, the user can ask questions about the document. To achieve this, we integrate an API that processes the document and responds to user queries.
We use a simple input field for the user to type their question. When the user asks a question, the app sends the question to an API endpoint using the requests library, which processes the PDF and provides an answer.
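A sketch of the question flow:

```python
if "messages" not in st.session_state:
    st.session_state.messages = []

question = st.text_input("Ask a question about the PDF")

if question and "asset_id" in st.session_state:
    # Forward the question to the backend's /ask endpoint
    resp = requests.post(
        f"http://localhost:8000/ask/{st.session_state.asset_id}",
        json={"question": question},
    )
    answer = resp.json().get("answer", "Sorry, something went wrong.")
    st.session_state.messages.append({"role": "user", "content": question})
    st.session_state.messages.append({"role": "assistant", "content": answer})
```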
In this code, the user’s question is sent to a local API (localhost:8000/ask/{asset_id}). The API processes the question, retrieves relevant information from the document, and responds with an answer.
5. Displaying the Q&A Conversation
As the conversation progresses, each exchange (question and answer) is displayed in a conversational format. We store each user’s question and the assistant’s response in st.session_state.messages, which is then rendered on the page.
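A sketch of the rendering loop; the bubble CSS classes are assumed to be defined in the stylesheet above:

```python
# Render the most recent exchange (the user's question and the assistant's reply)
for message in st.session_state.messages[-2:]:
    role = message["role"]
    st.markdown(
        f'<div class="{role}-bubble"><b>{role.title()}:</b> {message["content"]}</div>',
        unsafe_allow_html=True,
    )
```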
Here, we dynamically display the last two interactions between the user and the assistant. The st.markdown function is used to render each question and answer inside a styled div.
6. Final Touch: Footer and Custom Messages
Lastly, we add a footer that credits the AI-powered system and Streamlit. This footer helps reinforce the app’s branding.
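A sketch; the footer text is illustrative:

```python
st.markdown(
    '<div class="footer">Powered by AI &middot; Built with Streamlit</div>',
    unsafe_allow_html=True,
)
```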
This footer is displayed at the bottom of the app, making the interface complete.
Quick Guide to Running the Application
Follow these steps to set up and run the application, including the backend server, AI model, and frontend.
Setup Instructions:
To make the setup process easier, I have provided a setup.sh script that automates the entire process. Simply follow the steps below:
- Download the Setup Script: Download the setup.sh file to your local machine.
- Run the Setup Script: Give the script execution permissions and run it with the following commands:
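```bash
chmod +x setup.sh
./setup.sh
```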
What Does the setup.sh Script Do?
- Install Dependencies: It installs all the required Python libraries from the requirements.txt file using pip.
- Install Ollama: It downloads and installs Ollama, which is needed to pull the AI model.
- Start the Backend: The script launches the backend server using uvicorn, which will listen on 0.0.0.0 and port 8000.
- Pull the AI Model: It pulls the mistral model using Ollama, which is used to power the document Q&A functionality.
- Start the Frontend: Finally, it runs the Streamlit app, which serves the frontend at http://localhost:8501.
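For reference, a script along these lines performs those steps; the module and file names (main:app, app.py) are assumptions about how the project is laid out:

```bash
#!/bin/bash
# Install Python dependencies
pip install -r requirements.txt

# Install Ollama and pull the mistral model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral

# Start the backend API in the background on port 8000
uvicorn main:app --host 0.0.0.0 --port 8000 &

# Launch the Streamlit frontend at http://localhost:8501
streamlit run app.py
```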
Conclusion:
Congratulations! 🎉 You’ve successfully set up your AI-powered PDF chat application. Now you can upload PDFs, ask questions, and receive insightful answers powered by advanced AI models. This setup will streamline your document reading experience, transforming how you interact with your content.
To make things even easier, here’s a short demo video showcasing the application’s functionality in action. Watch how it works and see how you can leverage this tool for your document-related needs:
Troubleshooting:
- Ensure ports (like 8000) are open if you face connection issues with uvicorn.
- Verify the Ollama installation with ollama --version.
Customizations & Enhancements:
- Modify the UI with custom CSS or add features like saving chat history.
- Swap the model or add multilingual support to broaden the app’s capabilities.
Performance:
- For large PDFs, implement pagination to improve response time.
Final Thoughts:
This setup provides a great starting point for building a chat-enabled PDF assistant. Feel free to explore and enhance the functionality based on your needs. Happy coding!
For the complete code and setup, visit the repository: GitHub Repository Link
What’s Next?
- Explore more features like summarization, keyword extraction, and more!
- If you’re feeling adventurous, you can extend the functionality by adding more advanced AI capabilities.
Thanks for following along, and happy coding! 🚀
