Introduction

Modern Large Language Models (LLMs) exhibit impressive capabilities in generating human-like text across a wide range of domains. Yet, they are inherently limited by the data they were trained on, often lacking access to new or proprietary information. This shortfall has led to the growing popularity of Retrieval-Augmented Generation (RAG), which injects real-time context and knowledge into LLM prompts.

In this blog post, we will explore how to build a RAG chatbot using Oracle Database 23AI for vector-based storage, the OCI Generative AI service for embeddings and text generation, and Llama-Index for document chunking and retrieval orchestration.

Why Go Beyond Basic LLMs?

Limitations of LLMs

Static Training: LLMs do not automatically acquire new or organization-specific knowledge.

Context Window Constraints: Even large context windows can run out if your dataset is extensive.

Potential Hallucinations: Without explicit facts, models might generate inaccurate or fabricated content.

Private Data in Organizations

Most Data is Private: Enterprises deal with confidential documents, customer data, and internal knowledge bases.

Security and Compliance: External LLM services may not be suitable or allowed for storing sensitive information.

Need for Accuracy: RAG ensures the model references approved, up-to-date data rather than relying on its limited training corpus.

Importance of RAG

Retrieval-Augmented Generation is essential for:

Providing dynamic, domain-specific context to the model at inference time.

Keeping information updated and ensuring the chatbot references the most relevant chunks of knowledge.

Maintaining speed and scalability by offloading retrieval to specialized vector stores.

Oracle Database 23AI for RAG

Oracle Database 23AI introduces robust support for storing and querying vectors directly:

Embeddings Storage: Store high-dimensional embeddings in a VECTOR type column.

Similarity Search: Use built-in functions like VECTOR_DISTANCE with cosine similarity or other distance metrics.

Indexing & Performance: Specialized vector indexing methods yield fast, scalable retrieval.

These features greatly enhance RAG by enabling quick and accurate retrieval of text chunks before they’re fed into the LLM.

Building a RAG Chatbot Pipeline

Below is a high-level overview of how all components fit together:

  1. Data Ingestion and Chunking
  2. Embedding Generation
  3. Storing Chunks in Oracle Database 23AI
  4. Retrieval
  5. LLM Summarization/Answer Generation

We’ll walk through each step in detail.

Step 1: Creating the CHUNKS Table in Oracle Database 23AI

Before you start inserting embeddings, create a table to hold your data. The schema below assumes a parent BOOKS table with an ID primary key and a NAME column, which we reference again at query time. Here’s an example schema:

CREATE TABLE CHUNKS 
(
    "ID"       VARCHAR2(64) NOT NULL,
    "CHUNK"    CLOB,
    "VEC"      VECTOR(1024, FLOAT64),
    "PAGE_NUM" VARCHAR2(10),
    "BOOK_ID"  NUMBER,
    PRIMARY KEY ("ID"),
    CONSTRAINT fk_book
        FOREIGN KEY (BOOK_ID)
        REFERENCES BOOKS (ID)
);

What This Table Does

ID: Uniquely identifies each text chunk.

CHUNK: The actual piece of text (stored as a CLOB).

VEC: The embedding vector corresponding to the text chunk.

PAGE_NUM / BOOK_ID: Useful metadata for referencing the original source.
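To take advantage of the indexing support mentioned earlier, you can also create a vector index on the VEC column once data is loaded. The statement below is a minimal sketch assuming an IVF index; the index name and accuracy target are illustrative, so check the 23AI documentation for the options that suit your workload:

# Illustrative DDL: an IVF vector index for cosine distance on CHUNKS.VEC.
# Run it from any SQL client, or via python-oracledb once connected (see Step 4).
cursor.execute("""
    CREATE VECTOR INDEX CHUNKS_VEC_IVF
    ON CHUNKS (VEC)
    ORGANIZATION NEIGHBOR PARTITIONS
    DISTANCE COSINE
    WITH TARGET ACCURACY 95
""")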

Step 2: Chunking Your Documents with Llama-Index

Llama-Index (formerly GPT Index) makes it easy to split large documents into smaller pieces, known as chunks, so that embeddings stay accurate and retrieval is more focused.

from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser

# 1. Read your local documents
reader = SimpleDirectoryReader("docs")
documents = reader.load_data()

# 2. Split documents into chunks ("nodes"); tune chunk_size/chunk_overlap for your data
parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=50)
nodes = parser.get_nodes_from_documents(documents)

Why Chunking Matters

Improves Retrieval: Searching smaller pieces of text can yield more precise matches.

Manages Context: LLMs can handle smaller text sections more effectively.
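A quick way to verify the split is to inspect what the parser produced, for example:

# Peek at the chunking results
print(f"{len(nodes)} chunks created")
print(nodes[0].get_text()[:200])  # first 200 characters of the first chunk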

Step 3: Generating Embeddings with OCI Gen AI

We can use the OCI Generative AI service to generate embeddings for each chunk. The wrapper below sketches one way to call the service’s embed-text endpoint through the OCI Python SDK; the model ID, service endpoint, and compartment OCID are placeholders to replace with your own:

import oci

# Thin wrapper around the OCI Gen AI embed-text endpoint
class OCIGenAIEmbeddings:
    def __init__(self, gen_ai_client, compartment_id,
                 model_id="cohere.embed-english-v3.0"):
        self.gen_ai_client = gen_ai_client
        self.compartment_id = compartment_id
        self.model_id = model_id

    def get_embedding(self, text):
        details = oci.generative_ai_inference.models.EmbedTextDetails(
            inputs=[text],
            serving_mode=oci.generative_ai_inference.models.OnDemandServingMode(
                model_id=self.model_id),
            compartment_id=self.compartment_id,
        )
        response = self.gen_ai_client.embed_text(details)
        return response.data.embeddings[0]  # a list of floats

# Initialize the OCI Gen AI inference client from your local OCI config
config = oci.config.from_file()
compartment_id = "ocid1.compartment.oc1..example"  # your compartment OCID
gen_ai_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)
oci_embedding_fn = OCIGenAIEmbeddings(gen_ai_client, compartment_id)

# To integrate with llama-index, wrap this class in a llama_index BaseEmbedding
# subclass and pass it to ServiceContext.from_defaults(embed_model=...);
# the rest of this post calls get_embedding() directly.
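As a quick sanity check, confirm that the embedding dimensionality matches the VECTOR(1024, FLOAT64) column from Step 1 (cohere.embed-english-v3.0 returns 1024-dimensional vectors):

vec = oci_embedding_fn.get_embedding("Hello, world!")
print(len(vec))  # expect 1024, matching VECTOR(1024, FLOAT64)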

Benefits of OCI Gen AI

High-quality embeddings optimized for enterprise data.

Scalable and easy to integrate with other OCI services.

Secure environment for sensitive workloads.

Step 4: Storing Chunks and Embeddings in Oracle Database

Once you have your chunks and their embeddings, insert them into the CHUNKS table:

import array
import oracledb
from uuid import uuid4

connection = oracledb.connect(user="USER", password="PASSWORD", dsn="DSN_STRING")
cursor = connection.cursor()

insert_sql = """
INSERT INTO CHUNKS (ID, CHUNK, VEC, PAGE_NUM, BOOK_ID)
VALUES (:id, :chunk, :vec, :page_num, :book_id)
"""

for node in nodes:
    text_chunk = node.get_text()
    embedding_vector = oci_embedding_fn.get_embedding(text_chunk)

    cursor.execute(insert_sql, {
        "id": str(uuid4()),
        "chunk": text_chunk,
        # python-oracledb maps array.array("d", ...) to a FLOAT64 vector
        "vec": array.array("d", embedding_vector),
        "page_num": "1",          # Example metadata
        "book_id": 123            # Example; must match an existing BOOKS.ID
    })

connection.commit()

Tips for Success

Vector Type: Ensure your client library supports inserting vectors into Oracle Database 23AI.

Batch Inserts: For large datasets, consider batch insertion for better performance; see the sketch below.
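For example, the loop from Step 4 can be rewritten to insert all rows in a single round trip with executemany. This is a minimal sketch that reuses insert_sql, nodes, and oci_embedding_fn from the earlier snippets:

import array
from uuid import uuid4

# Build all bind dictionaries first, then insert them in one call
batch = []
for node in nodes:
    text_chunk = node.get_text()
    vec = array.array("d", oci_embedding_fn.get_embedding(text_chunk))
    batch.append({
        "id": str(uuid4()),
        "chunk": text_chunk,
        "vec": vec,
        "page_num": "1",  # example metadata
        "book_id": 123,   # example metadata
    })

cursor.executemany(insert_sql, batch)
connection.commit()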

Step 5: Retrieving Similar Documents at Query Time

When a user asks a question, you generate an embedding for their query and perform a vector search. Below is an example query using cosine distance:

user_query = "Explain the concept of zero-shot learning."
user_query_vec = array.array("d", oci_embedding_fn.get_embedding(user_query))

top_k = 5
approx_clause = ""  # or "APPROX" to run an approximate search against a vector index

sql_retrieve = f"""
    SELECT C.ID,
           C.CHUNK,
           C.PAGE_NUM,
           VECTOR_DISTANCE(C.VEC, :1, COSINE) AS d,
           B.NAME
    FROM CHUNKS C
    JOIN BOOKS B ON C.BOOK_ID = B.ID
    ORDER BY d
    FETCH {approx_clause} FIRST {top_k} ROWS ONLY
"""

cursor.execute(sql_retrieve, [user_query_vec])
rows = cursor.fetchall()
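To complete step 5 of the overview (answer generation), concatenate the retrieved chunks into a prompt and send it to an OCI Gen AI text-generation model. The sketch below reuses gen_ai_client and compartment_id from Step 3; the model ID, prompt template, and decoding parameters are assumptions to adapt for your deployment:

# Assemble a grounded prompt from the retrieved chunks.
# CHUNK is a CLOB, so each value may arrive as a LOB object that needs .read()
chunks = [r[1].read() if hasattr(r[1], "read") else r[1] for r in rows]
context = "\n\n".join(chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {user_query}\nAnswer:"
)

generate_details = oci.generative_ai_inference.models.GenerateTextDetails(
    compartment_id=compartment_id,
    serving_mode=oci.generative_ai_inference.models.OnDemandServingMode(
        model_id="cohere.command"),  # assumption: any generation model you can access
    inference_request=oci.generative_ai_inference.models.CohereLlmInferenceRequest(
        prompt=prompt,
        max_tokens=512,
        temperature=0.2,
    ),
)
response = gen_ai_client.generate_text(generate_details)
print(response.data.inference_response.generated_texts[0].text)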

Conclusion

Building a Retrieval-Augmented Generation chatbot is an excellent way to ensure your LLM has access to fresh, proprietary data while keeping everything private and secure. By leveraging:

Oracle Database 23AI for robust vector storage and similarity search

OCI Generative AI for high-quality embeddings and text generation

Llama-Index for seamless document chunking and retrieval orchestration

…you can deliver dynamic, context-aware responses to users with high accuracy and minimal latency.

Whether you’re modernizing an enterprise knowledge base or creating a sophisticated internal chatbot, this RAG pipeline provides the scalability, flexibility, and control you need to harness the full potential of next-generation language models.