Building a Private RAG System with Ollama, LangChain and Chroma

Part 4: RAG implementation

Posted by Kike Bodí in January 2026

Series Index

  1. Prerequisites
  2. Populate the Vector Database
  3. Vector Retriever
  4. RAG Implementation
  5. Chat UI
  6. Evaluation
  7. Performance improvements

4. RAG implementation

In addition to the Retriever, we will need our auto-regressive (conversational) LLM:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    temperature=0,
    model_name=MODEL,  # the Ollama model name set up earlier in the series
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint, so nothing leaves the machine
    api_key="ollama",  # Ollama ignores the key, but the client requires one
)
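
A quick smoke test before wiring it in (this assumes Ollama is already serving that model locally):

print(llm.invoke("Say hello in one short sentence.").content)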

Together with the Retriever we defined in the previous post, we have everything we need to build the RAG. If you are jumping in at this part, the sketch below is a minimal stand-in for that retriever.
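
The stand-in assumes the Chroma store populated in Part 2 with Ollama embeddings; the collection name, embedding model and path are illustrative, not the series' actual values:

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

RETRIEVAL_K = 4  # how many chunks to retrieve per query

vectorstore = Chroma(
    collection_name="desert_leaves",  # illustrative name
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="./chroma_db",  # wherever Part 2 persisted the store
)
retriever = vectorstore.as_retriever(search_kwargs={"k": RETRIEVAL_K})

With the retriever and the LLM in place, here is the RAG itself: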

from langchain_core.documents import Document
from langchain_core.messages import HumanMessage, SystemMessage, convert_to_messages

SYSTEM_PROMPT_TEMPLATE = """
You are a knowledgeable, friendly assistant representing Desert Leaves.
You are chatting internally with a technician from Desert Leaves.
If relevant, use the given context to answer any question.
If you don't know the answer, say so.
Context:
{context}
"""

def fetch_context(question: str) -> list[Document]:
    """
    Retrieve relevant context documents for a question.
    """
    # Assumes the retriever from Part 3 accepts k at invoke time; otherwise,
    # configure it once with search_kwargs={"k": RETRIEVAL_K} when creating it.
    return retriever.invoke(question, k=RETRIEVAL_K)


def combined_question(question: str, history: list[dict] | None = None) -> str:
    """
    Combine the user's prior messages and the new question into one retrieval query.
    """
    prior = [m["content"] for m in (history or []) if m["role"] == "user"]
    return "\n".join([*prior, question])


def answer_question(question: str, history: list[dict] | None = None) -> tuple[str, list[Document]]:
    """
    Answer the given question with RAG; return the answer and the context documents.
    """
    history = history or []
    combined = combined_question(question, history)
    docs = fetch_context(combined)
    context = "\n\n".join(doc.page_content for doc in docs)
    system_prompt = SYSTEM_PROMPT_TEMPLATE.format(context=context)
    messages = [SystemMessage(content=system_prompt)]
    messages.extend(convert_to_messages(history))
    messages.append(HumanMessage(content=question))
    response = llm.invoke(messages)
    return response.content, docs
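
Let's try it out. The question here is made up; ask anything your documents actually cover, and note that the "source" metadata key assumes it was stored when the database was populated in Part 2:

answer, docs = answer_question("How often should the greenhouse sensors be calibrated?")
print(answer)
print([doc.metadata.get("source") for doc in docs])  # where the context came from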

5. UI with Gradio

import gradio as gr

# answer_question returns (answer, docs); the chat UI only needs the text.
gr.ChatInterface(lambda m, h: answer_question(m, h)[0], type="messages").launch()

As simple as that.
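
And since answer_question already hands back the retrieved documents, a small optional variant surfaces the sources in each reply (again assuming a "source" metadata key from Part 2):

def chat_with_sources(message: str, history: list[dict]) -> str:
    answer, docs = answer_question(message, history)
    sources = sorted({doc.metadata.get("source", "unknown") for doc in docs})
    return f"{answer}\n\nSources: {', '.join(sources)}"

gr.ChatInterface(chat_with_sources, type="messages").launch()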

Nice work! Now we have our own private RAG system.

Next, a little tune-up for production in the coming sections: Part 6: Evaluation (available soon)

