
A Full Guide to Local DeepSeek Deployment (Linux, Windows, and macOS)

1. Linux system deployment

1. Preparation

Hardware requirements: The server needs sufficient computing resources. NVIDIA GPUs such as the A100 or V100 are recommended to speed up model inference. At least 32 GB of RAM is recommended, along with a high-speed solid-state drive (SSD) to ensure efficient data reads and writes.
Software environment: Install a Linux operating system such as Ubuntu 20.04. Also install Python 3.8 or later, plus the related dependencies such as PyTorch and transformers. Taking CUDA 11.7 as an example, the command to install PyTorch is as follows:

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Install the transformers library:

pip install transformers
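
After installation, it is worth verifying that PyTorch actually sees the GPU before moving on. The following optional snippet is a minimal check, assuming only that the PyTorch install above succeeded:

import torch

# Report the PyTorch version and whether a CUDA-capable GPU is visible
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the first GPU, e.g. an A100 or V100
    print(torch.cuda.get_device_name(0))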

2. Download the DeepSeek model
Visit DeepSeek's official model download page and select the appropriate model version according to your needs. DeepSeek currently offers different parameter sizes to choose from, such as DeepSeek-7B and DeepSeek-13B.
Use the wget command to download the model file, replacing the placeholder with the actual download address from the official page:

wget <model_download_url>

After the download is complete, unzip the model archive (replace the placeholder with the actual file name):

tar -zxvf <model_archive>.tar.gz
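
Alternatively, if the weights are published on Hugging Face (DeepSeek models are generally hosted there, though the exact repository id below is an assumption and should be verified), they can be fetched with the huggingface_hub library after running pip install huggingface_hub:

from huggingface_hub import snapshot_download

# Download the whole model repository into a local directory.
# The repo id is an assumed example -- substitute the one you actually need.
snapshot_download(repo_id="deepseek-ai/deepseek-llm-7b-base", local_dir="DeepSeek-7B")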

3. Deployment steps
Create a project directory: Create a new directory locally to store the deployment-related files and scripts.

mkdir deepseek_deployment
cd deepseek_deployment

Writing the inference script: Write an inference script in Python (in this guide it is assumed to be saved as inference.py). In the script, import the necessary libraries, load the DeepSeek model and tokenizer, and implement the inference function. Sample code is as follows:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).cuda()

# Define the inference function
def generate_text(prompt, max_length=100):
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
    output = model.generate(input_ids, max_length=max_length, num_beams=5, early_stopping=True)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Please introduce the development trends of artificial intelligence"
generated_text = generate_text(prompt)
print(generated_text)
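
The script can then be run from the project directory (using the file name assumed above):

python inference.py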
Replace path/to/DeepSeek-7B with the actual model path.
Start the service: If you need to deploy the model as a service, you can use a framework such as FastAPI. First install FastAPI and uvicorn:
pip install fastapi uvicorn
Then write the service script (assumed here to be saved as service.py), for example:
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).cuda()

class PromptRequest(BaseModel):
    prompt: str
    max_length: int = 100

@app.post("/generate")
def generate_text(request: PromptRequest):
    input_ids = tokenizer(request.prompt, return_tensors='pt').input_ids.cuda()
    output = model.generate(input_ids, max_length=request.max_length, num_beams=5, early_stopping=True)
    return {"generated_text": tokenizer.decode(output[0], skip_special_tokens=True)}

Similarly, replace path/to/DeepSeek-7B with the actual path.
Start the service:

uvicorn service:app --host 0.0.0.0 --port 8000
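
Once the service is running, the /generate endpoint can be exercised with a short client script. This is only a usage sketch and assumes the requests library is installed (pip install requests):

import requests

# Send a test prompt to the locally running FastAPI service
response = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Please introduce the development trends of artificial intelligence", "max_length": 100},
)
print(response.json()["generated_text"])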

2. Windows system deployment

1. Preparation

Hardware requirements: Similar to Linux, an NVIDIA GPU is recommended, such as an RTX 30 series card or above, for better inference performance. 32 GB of RAM or more is recommended, along with a high-speed solid-state drive for storage.
Software environment: Install Python 3.8 or later; you can download the installer from the official Python website. During installation, check the "Add Python to PATH" option to make subsequent command-line operations easier. Also install the PyTorch and transformers libraries. Since CUDA installation on Windows is more complicated, it is recommended to use conda for environment management: first install Anaconda, then create a new conda environment and install the dependencies:

conda create -n deepseek_env python=3.8
conda activate deepseek_env
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install transformers

2. Download the DeepSeek model

Visit the official DeepSeek model download address and select the appropriate model version.
You can download the model file directly in a browser, or use wget (installed in advance) or curl on the command line. For example, use curl to download the DeepSeek-7B model, replacing the placeholder with the actual download address:

curl -O <model_download_url>

After the download completes, decompress the model file with a tool such as 7-Zip.

3. Deployment steps

Create a project directory: Create a new folder in File Explorer, such as "deepseek_deployment", to store the deployment-related files.
Writing the inference script: Use a text editor (such as Notepad++ or VS Code) to write a Python inference script similar to the Linux version:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).to('cuda')

# Define the inference function
def generate_text(prompt, max_length=100):
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to('cuda')
    output = model.generate(input_ids, max_length=max_length, num_beams=5, early_stopping=True)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Please introduce the development trends of artificial intelligence"
generated_text = generate_text(prompt)
print(generated_text)
Replace path/to/DeepSeek-7B with the actual model path.
Start the service: To deploy as a service, you can also use FastAPI and uvicorn. Activate the conda environment on the command line, then install the related libraries:
pip install fastapi uvicorn
Write the service script (again assumed to be saved as service.py); the content is similar to the Linux version:
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).to('cuda')

class PromptRequest(BaseModel):
    prompt: str
    max_length: int = 100

@app.post("/generate")
def generate_text(request: PromptRequest):
    input_ids = tokenizer(request.prompt, return_tensors='pt').input_ids.to('cuda')
    output = model.generate(input_ids, max_length=request.max_length, num_beams=5, early_stopping=True)
    return {"generated_text": tokenizer.decode(output[0], skip_special_tokens=True)}

Replace path/to/DeepSeek-7B with the actual path.
Start the service:

uvicorn service:app --host 0.0.0.0 --port 8000

3. macOS system deployment

1. Preparation

Hardware requirements: A Mac with an M1 or M2 chip can use its built-in computing power for deployment. For Intel-based Macs, a better graphics card (if a discrete GPU is present) is recommended. At least 16 GB of RAM is recommended, and storage should be a high-speed solid-state drive.
Software environment: Install Python 3.8 or later, which can be done through Homebrew. First install Homebrew, then install Python and the related dependency libraries:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install python
pip install torch torchvision torchaudio
pip install transformers

If you are using a Mac with an M1 or M2 chip, make sure to select a PyTorch build suitable for the ARM architecture when installing:

pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/torch_stable.html
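
To confirm that the Apple-silicon (MPS) backend is usable, a quick optional check with standard PyTorch calls looks like this:

import torch

# True on Apple-silicon Macs when the installed PyTorch build supports the MPS backend
print(torch.backends.mps.is_available())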

2. Download the DeepSeek model

Visit the official DeepSeek model download address and select the appropriate model version.
Use the curl command to download the model file, replacing the placeholder with the actual download address:

curl -O <model_download_url>

After the download is complete, unzip the model file:

tar -zxvf <model_archive>.tar.gz

3. Deployment steps

Create a project directory: In the terminal, create the project directory with the following commands:

mkdir deepseek_deployment
cd deepseek_deployment

Writing the inference script: Use a text editor (such as TextEdit or VS Code) to write a Python inference script similar to the previous ones:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")

# Prefer Apple's MPS backend, then CUDA, then fall back to CPU
if torch.backends.mps.is_available():
    device = 'mps'
elif torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).to(device)

# Define the inference function
def generate_text(prompt, max_length=100):
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(device)
    output = model.generate(input_ids, max_length=max_length, num_beams=5, early_stopping=True)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Please introduce the development trends of artificial intelligence"
generated_text = generate_text(prompt)
print(generated_text)
Replace path/to/DeepSeek-7B with the actual model path.
Start the service: To deploy as a service, install FastAPI and uvicorn:
pip install fastapi uvicorn
Write the service script (again assumed to be saved as service.py); the content is similar to the previous versions:
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")

# Prefer Apple's MPS backend, then CUDA, then fall back to CPU
if torch.backends.mps.is_available():
    device = 'mps'
elif torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).to(device)

class PromptRequest(BaseModel):
    prompt: str
    max_length: int = 100

@app.post("/generate")
def generate_text(request: PromptRequest):
    input_ids = tokenizer(request.prompt, return_tensors='pt').input_ids.to(device)
    output = model.generate(input_ids, max_length=request.max_length, num_beams=5, early_stopping=True)
    return {"generated_text": tokenizer.decode(output[0], skip_special_tokens=True)}

Replace path/to/DeepSeek-7B with the actual path.
Start the service:

uvicorn service:app --host 0.0.0.0 --port 8000

4. Optimization and precautions

Model quantization: To reduce memory usage and improve inference speed, the model can be quantized, for example with INT8 quantization (see the sketch after this list).
Security settings: When deploying services, pay attention to setting reasonable access rights and security policies to prevent the model from being maliciously called.
Performance monitoring: On Linux and Windows, the NVIDIA System Management Interface (nvidia-smi) can be used to monitor GPU usage; on macOS with M1/M2 chips, tools such as the top command or Activity Monitor can be used to monitor system resource usage and make sure the model runs in an optimal state.
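
As a rough illustration of INT8 loading (not part of the original steps): on Linux or Windows with an NVIDIA GPU, the transformers library can load the weights in 8-bit via bitsandbytes. This sketch assumes pip install bitsandbytes accelerate has been run and that the model path matches the one used earlier:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization handled by bitsandbytes
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
# device_map="auto" places the quantized layers on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/DeepSeek-7B",
    quantization_config=quant_config,
    device_map="auto",
)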

This concludes the full guide to deploying DeepSeek locally. For more content on local DeepSeek deployment, please search my previous articles. I hope you find it helpful and will continue to support me!