LLM Engineering | Running local LLMs and APIs

LLM Engineering Course notes and code snippets

Posted by Kike Bodí in November 2025

Here are some notes and code snippets extracted from the LLM Engineering Course. I will expand this post in the future.

1️⃣ Run Local Models with Ollama

            # Install Ollama
            curl -fsSL https://ollama.com/install.sh | sh
            
            # Pull Llama 3.2 (small)
            ollama pull llama3.2
            
            # Chat from CLI
            ollama run llama3.2
          
            import ollama
          
            resp = ollama.chat(
              model="llama3.2",
              messages=[{"role": "user", "content": "Explain transformers in 2 sentences"}]
            )
            # ollama's response is not OpenAI-shaped: the reply lives under "message"
            print(resp["message"]["content"])
          

2️⃣ OpenAI API

            from openai import OpenAI
            import os

            # set OPENAI_API_KEY in your environment or .env
            client = OpenAI()

            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": "Summarize the transformer architecture."}
                ]
            )
            print(resp.choices[0].message.content)

Note: We can also stream the OpenAI answer by passing stream=True to the same call.
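As a sketch of that streaming loop (the helper name `stream_reply` is mine, not from the course): with stream=True the client yields chunks whose `delta.content` fragments concatenate into the full answer.

```python
def stream_reply(client, model="gpt-4o-mini", prompt="Summarize the transformer architecture."):
    """Stream a chat completion, printing chunks as they arrive, and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    reply = ""
    for chunk in stream:
        # Each chunk carries a small text delta; some chunks (e.g. the last) are empty
        text = chunk.choices[0].delta.content or ""
        print(text, end="", flush=True)
        reply += text
    return reply
```

Call it as `stream_reply(OpenAI())` with `OPENAI_API_KEY` set in your environment.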

3️⃣ Anthropic API (Stream)

            import anthropic

            claude = anthropic.Anthropic()
            response = ""
            # messages.stream() returns a context manager that streams the reply
            with claude.messages.stream(
                model="claude-3-haiku-20240307",
                max_tokens=200,
                temperature=0.7,
                system="You are a teacher",
                messages=[{"role": "user", "content": "Explain how LLMs work"}],
            ) as stream:
                for text in stream.text_stream:
                    response += text or ""
                    yield response  # yield partials (e.g. to a Gradio UI); needs a generator function
                

💬 Update:

We will build on top of these APIs, enabling tools, adding context and creating Agents in LangChain.
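As a minimal sketch of the "adding context" part (names are mine, not from the course): multi-turn memory with these chat APIs is just a list of role/content dicts that grows with each exchange and is resent in full on every call.

```python
def add_turn(history, user_text, assistant_text):
    """Append one user/assistant exchange to the running conversation context."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

# The whole history list is what you pass as `messages=` on the next API call
history = [{"role": "system", "content": "You are a helpful assistant."}]
add_turn(history, "Hi!", "Hello! How can I help?")
```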

4️⃣ OpenRouter API

We can use OpenRouter to access many different models through a single unified API. Available providers include OpenAI, Anthropic, Google, Mistral, DeepSeek and many others.

            from openai import OpenAI

            message = [{"role": "user", "content": "Explain how LLMs work"},]
            openrouter = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=openrouter_api_key)

            response = openrouter.chat.completions.create(model="kwaipilot/kat-coder-pro:free", messages=message)
            display(response.choices[0].message.content)
          

5️⃣ LangChain

LangChain helps you build agents, tools, retrieval pipelines and chain sequences with minimal boilerplate. Very useful when you want multi-step reasoning or tool usage.

            
            # langchain_basic.py
            from langchain_openai import ChatOpenAI
            from IPython.display import Markdown, display

            message = [{"role": "user", "content": "Explain how LLMs work"}]
            llm = ChatOpenAI(model="gpt-5-mini")
            response = llm.invoke(message)
            display(Markdown(response.content))

🔧 Example: Agent with a Python Tool

            from langchain.agents import AgentType, initialize_agent, Tool
            from langchain_openai import ChatOpenAI
            
            def get_temperature(city: str):
                return f"The temperature in {city} is 20ºC."
            
            tools = [
                Tool(
                    name="weather",
                    func=get_temperature,
                    description="Return temperature of a city."
                )
            ]
            
            llm = ChatOpenAI(model="gpt-4o")
            
            agent = initialize_agent(
                tools=tools,
                llm=llm,
                agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
            )
            
            print(agent.run("What's the temperature in Copenhagen?"))
          

Great for: multi-step reasoning, tool-enabled assistants, and structured agents.

6️⃣ LiteLLM

LiteLLM is a unified Python client for 100+ LLM providers (OpenAI, Anthropic, Groq, OpenRouter, Azure…). You call completion() (or acompletion() for async) and change only the model string.

            from litellm import completion

            message = [{"role": "user", "content": "Explain how LLMs work"},]

            response = completion(model="openai/gpt-4.1", messages=message)
            reply = response.choices[0].message.content
            display(Markdown(reply))
          

Note: We can also get extra information on tokens and costs.

          print(f"Input tokens: {response.usage.prompt_tokens}")
          print(f"Output tokens: {response.usage.completion_tokens}")
          print(f"Total tokens: {response.usage.total_tokens}")
          print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")
        
