Getting Started with Ollama - Running Local LLMs in Python
Author: Venkata Sudhakar
Ollama is an open-source tool that lets developers run large language models locally on their own machines, without cloud API keys or an internet connection. It supports popular models such as Llama 3, Mistral, Phi-3, and Gemma, and exposes both a simple REST API and a command-line interface. At ShopMax India, the engineering team uses Ollama for offline prototyping, testing prompts without incurring API costs, and processing sensitive internal data that cannot leave the local network.

Installing Ollama is straightforward on Windows, macOS, and Linux. After installation, models can be pulled from the Ollama model library with a single command. The Ollama server starts automatically and listens on port 11434 by default. The ollama Python package provides a convenient client wrapper around the REST API. The example below shows how to install Ollama, pull a model, and run a basic chat completion in Python.
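A minimal setup on macOS or Linux might look like the following sketch. The model tag llama3.2:3b is an assumption here; any tag from the Ollama model library works the same way.

```shell
# Install Ollama via the official install script (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model from the Ollama library (downloads the weights locally)
ollama pull llama3.2:3b

# Install the Python client library
pip install ollama
```

On Windows, the installer from the Ollama website replaces the first step; the pull and pip commands are the same.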
Once Ollama is running and the model is pulled, you can call it from Python as shown below.
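A minimal sketch of such a call, assuming the llama3.2:3b model has been pulled and the Ollama server is running on its default port. The ShopMax system prompt shown here is illustrative, not the team's actual prompt.

```python
import ollama  # client wrapper around the local REST API (pip install ollama)

# ollama.chat() sends the conversation to the local Ollama server
# (localhost:11434 by default) and blocks until generation finishes.
response = ollama.chat(
    model="llama3.2:3b",
    messages=[
        {"role": "system",
         "content": "You are a customer support assistant for ShopMax India."},
        {"role": "user",
         "content": "What is the return policy for laptops?"},
    ],
)

# The generated text is in the message content of the response
print(response["message"]["content"])
```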
This gives the following output:
ShopMax India offers a 7-day return policy for laptops from the date
of delivery, provided the product is in its original condition with
all accessories and packaging. Customers can initiate returns through
our website or by visiting any ShopMax service center in Mumbai,
Bangalore, Delhi, or Hyderabad.
The ollama.chat() function is synchronous by default and blocks until the model finishes generating the response. For real-time streaming output, pass stream=True and iterate over the response chunks. Because the model runs locally, response times depend on your hardware: a modern laptop with 16 GB of RAM can run the 3B-parameter Llama 3.2 model at a speed comfortable for development and testing.
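The streaming variant can be sketched as follows, using the same assumed model tag as above. Each chunk carries a partial piece of the reply, so printing them as they arrive gives a live, token-by-token display.

```python
import ollama

# With stream=True, ollama.chat() returns an iterator of partial chunks
# instead of a single completed response.
stream = ollama.chat(
    model="llama3.2:3b",
    messages=[
        {"role": "user",
         "content": "What is the return policy for laptops?"},
    ],
    stream=True,
)

# Print each fragment as soon as it arrives, without a trailing newline,
# so the reply appears to be typed out in real time.
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()  # final newline after the stream ends
```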