Here are some notes and code snippets extracted from the LLM Engineering Course. I will expand them in the future.
1️⃣ Run Local Models with Ollama
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull Llama 3.2 (small)
ollama pull llama3.2
# Chat from CLI
ollama run llama3.2
import ollama

resp = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain transformers in 2 sentences"}]
)
print(resp["message"]["content"])
2️⃣ OpenAI API
from openai import OpenAI
import os
# set OPENAI_API_KEY in your environment or .env
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the transformer architecture."}
    ]
)
print(resp.choices[0].message.content)
Note: We can also stream the OpenAI answer by adding stream=True to the request.
3️⃣ Anthropic API (Stream)
import anthropic
claude = anthropic.Anthropic()
with claude.messages.stream(
    model="claude-3-haiku-20240307",
    max_tokens=200,
    temperature=0.7,
    system="You are a teacher",
    messages=[{"role": "user", "content": "Explain how LLMs work"}],
) as stream:
    response = ""
    for text in stream.text_stream:
        response += text
        print(text, end="", flush=True)
💬 Update:
We will build on top of these APIs, enabling tools, adding context and creating Agents in LangChain.
4️⃣ OpenRouter API
We can use OpenRouter to access models from many providers through a single, OpenAI-compatible API: OpenAI, Anthropic, Google, Mistral, DeepSeek and many others.
from openai import OpenAI

messages = [{"role": "user", "content": "Explain how LLMs work"}]
openrouter = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=openrouter_api_key)
response = openrouter.chat.completions.create(
    model="kwaipilot/kat-coder-pro:free",
    messages=messages
)
display(response.choices[0].message.content)
5️⃣ LangChain
LangChain helps you build agents, tools, retrieval pipelines and chain sequences with minimal boilerplate. Very useful when you want multi-step reasoning or tool usage.
langchain_basic.py
from langchain_openai import ChatOpenAI
from IPython.display import Markdown, display

messages = [{"role": "user", "content": "Explain how LLMs work"}]
llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(messages)
display(Markdown(response.content))
🔧 Example: Agent with a Python Tool
from langchain.agents import initialize_agent, Tool
from langchain_openai import ChatOpenAI

def get_temperature(city: str) -> str:
    return f"The temperature in {city} is 20ºC."

tools = [
    Tool(
        name="weather",
        func=get_temperature,
        description="Return the temperature of a city."
    )
]

llm = ChatOpenAI(model="gpt-4o")
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description"
)
print(agent.run("What's the temperature in Copenhagen?"))
Great for: multi-step reasoning, tool-enabled assistants, and structured agents.
6️⃣ LiteLLM
LiteLLM is a unified Python client for 100+ LLM providers (OpenAI, Anthropic, Groq, OpenRouter, Azure…).
You call completion() and change only the model string.
from litellm import completion
message = [{"role": "user", "content": "Explain how LLMs work"},]
response = completion(model="openai/gpt-4.1", messages=message)
reply = response.choices[0].message.content
display(Markdown(reply))
Note: We can also get extra information on tokens and costs.
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")
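To sanity-check the reported cost, you can recompute it from the token counts and your provider's published per-token prices. A small helper (the prices in the example are placeholders, not real rates):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Request cost in dollars from token usage and per-1K-token prices."""
    return (prompt_tokens / 1000) * input_price_per_1k \
         + (completion_tokens / 1000) * output_price_per_1k

# e.g. 1000 input + 500 output tokens at $0.0005 / $0.0015 per 1K tokens
print(f"{estimate_cost(1000, 500, 0.0005, 0.0015):.5f} dollars")  # 0.00125 dollars
```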
More LLM Engineering articles
- Building a Private RAG System with LangChain, Chroma, and Local LLMs – private, enterprise-ready RAG and vector database pipeline.
- LLM Engineering | Token optimization – caching, thin system prompts, and cost-optimized production usage.