
Set up Ollama service configuration on Linux (common environment variables)

Setting environment variables on Linux

  • 1. Edit the systemd service by calling systemctl edit ollama.service. This will open an editor. Alternatively, you can open the service file directly with vim /etc/systemd/system/ollama.service.

  • 2. For each environment variable, add an Environment line under the [Service] section.

vim /etc/systemd/system/ollama.service
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_FLASH_ATTENTION=1"

For each environment variable, add an Environment line under the [Service] section. Commonly used settings:
1. OLLAMA_HOST=0.0.0.0                      Allow access from external networks
2. OLLAMA_MODELS=/mnt/data/.ollama/models   Default model download path
3. OLLAMA_KEEP_ALIVE=24h                    Keep the model loaded in memory for 24 hours (by default, a model is kept in memory for 5 minutes before being unloaded)
4. OLLAMA_HOST=0.0.0.0:8080                 Change the default port (11434) to 8080
5. OLLAMA_NUM_PARALLEL=2                    Handle 2 concurrent requests
6. OLLAMA_MAX_LOADED_MODELS=2               Load 2 models at the same time

# For the changes to take effect, reload the systemd configuration:
sudo systemctl daemon-reload
# Finally, restart the service to apply the changes:
sudo systemctl restart ollama
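
To confirm the new settings took effect, you can check the service and probe the server (assuming it listens on the default 11434 port used above):

systemctl status ollama          # should report active (running)
curl http://localhost:11434      # should respond with "Ollama is running"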

By default, Ollama stores models in the following directories:
macOS: `~/.ollama/models` 
Linux: `/usr/share/ollama/.ollama/models`  
Windows: `C:\Users\<username>\.ollama\models`
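
If OLLAMA_MODELS points to a custom location such as the /mnt/data path used earlier, the directory must exist and be writable by the user the service runs as; a standard Linux install runs the service as the ollama user. A minimal sketch under that assumption:

sudo mkdir -p /mnt/data/.ollama/models
sudo chown -R ollama:ollama /mnt/data/.ollama    # assumes the service runs as the ollama user
sudo systemctl restart ollama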

journalctl -u ollama | grep -i 'prompt='    # view prompt-related log lines
/set verbose                                # inside an interactive ollama run session: show token speed
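
Note that /set verbose is entered inside an interactive session rather than in the shell. A short example (llama3.2 is just a placeholder model name):

ollama run llama3.2         # placeholder model name
>>> /set verbose            # enable verbose mode for this session
>>> why is the sky blue?    # responses now end with timing statistics, including token speed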

Configure Ollama

Ollama provides a variety of environment variables for configuration:

OLLAMA_DEBUG: Whether to enable debug mode, default is false.
OLLAMA_FLASH_ATTENTION: Whether to enable flash attention, default is false.
OLLAMA_HOST: The host address the Ollama server listens on, default is 127.0.0.1:11434.
OLLAMA_KEEP_ALIVE: How long a model stays loaded in memory, default is 5m.
OLLAMA_LLM_LIBRARY: LLM library to use, default is empty.
OLLAMA_MAX_LOADED_MODELS: Maximum number of loaded models, default is 1.
OLLAMA_MAX_QUEUE: Maximum number of queued requests, default is empty.
OLLAMA_MAX_VRAM: Maximum VRAM (GPU memory) to use, default is empty.
OLLAMA_MODELS: Model storage directory, default is empty.
OLLAMA_NOHISTORY: Whether to disable prompt history, default is false.
OLLAMA_NOPRUNE: Whether to disable pruning of model blobs on startup, default is false.
OLLAMA_NUM_PARALLEL: Number of parallel requests, default is 1.
OLLAMA_ORIGINS: Allowed request origins (CORS), default is empty.
OLLAMA_RUNNERS_DIR: Runners directory, default is empty.
OLLAMA_SCHED_SPREAD: Whether to spread models across all GPUs, default is empty.
OLLAMA_TMPDIR: Temporary file directory, default is empty.
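
These variables can also be set ad hoc when starting the server manually rather than through systemd, which is handy for quick experiments (the values below are only illustrative):

OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=24h ollama serve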

Common Ollama commands:

ollama serve         # Start Ollama
ollama create        # Create a model from a Modelfile
ollama show          # Show model information
ollama run           # Run a model
ollama pull          # Pull a model from the registry
ollama push          # Push a model to the registry
ollama list          # List models
ollama cp            # Copy a model
ollama rm            # Delete a model
ollama help          # Get help for any command
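
A typical workflow, using llama3.2 purely as a placeholder model name:

ollama pull llama3.2                 # download the model
ollama list                          # confirm it is available locally
ollama run llama3.2 "Hello there"    # run a one-off prompt
ollama rm llama3.2                   # remove it when no longer needed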

Import Hugging Face models

Recent versions of Ollama support pulling models directly from the Hugging Face Hub, including community-created GGUF quantized models. You can run these models with a simple command-line instruction:

ollama run hf.co/{username}/{repository}

To choose a different quantization, just add a tag to the command:

ollama run hf.co/{username}/{repository}:{quantization}

For example (quantization names are case-insensitive):
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:IQ3_M
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0

You can also use the complete file name directly as the tag:
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Llama-3.2-3B-Instruct-IQ3_M.gguf

Manual installation

Download and extract the package

curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

Start Ollama:
ollama serve
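
With the server running, a quick sanity check from another terminal (assuming a default install):

ollama -v       # prints the installed Ollama version
ollama list     # lists locally available models (empty right after a fresh install)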

Install a specific version

The OLLAMA_VERSION environment variable is used with the installation script to install a specific version of Ollama, including pre-releases. Version numbers can be found on the releases page.

Releases page: https://github.com/ollama/ollama/releases

For example:
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.3.9 sh

View log

To view the logs for Ollama running as a service, run:

journalctl -e -u ollama
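
To follow the logs live while testing configuration changes, use journalctl's follow flag:

journalctl -u ollama -f    # stream new log entries as they arrive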

This is the end of this article about setting up Ollama service configuration (common environment variables) on Linux. For more related Ollama service configuration content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!