Docker Installation Guide

Getting Started with Cortex on Docker

This guide provides comprehensive instructions for installing and running Cortex in a Docker environment, including sensible defaults for security and performance.

Prerequisites

Before beginning, ensure you have:

Docker (version 20.10.0 or higher) or Docker Desktop
At least 8GB of RAM and 10GB of free disk space
For GPU support, install nvidia-container-toolkit. Here is an example of how to do so on Ubuntu:


# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg


# Add repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list


# Install
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
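
The Docker version requirement above can be checked from a script before going further. The following is a minimal bash sketch, not part of Cortex: the function names version_ge and check_docker_version are hypothetical, and the comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the local Docker Engine meets the 20.10.0 minimum.
check_docker_version() {
  local required="20.10.0" current
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    return 1
  fi
  # Ask the daemon for its version; this fails if the daemon is not running.
  if ! current="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"; then
    echo "could not reach the Docker daemon" >&2
    return 1
  fi
  if version_ge "$current" "$required"; then
    echo "Docker ${current} meets the ${required} minimum"
  else
    echo "Docker ${current} is older than ${required}" >&2
    return 1
  fi
}
```

Run check_docker_version after sourcing the snippet; it returns a non-zero status when Docker is missing, unreachable, or too old.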

Installation Methods

Method 1: Using Pre-built Image (Recommended)


# Pull the latest stable release
docker pull menloltd/cortex:latest


# Or pull a specific version (recommended for production)
docker pull menloltd/cortex:nightly-1.0.1-224

Version Tags

latest: Most recent stable release
nightly: Latest development build
x.y.z (e.g., 1.0.1): Specific version release
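
A deployment script can use the tag scheme above to refuse tags that move over time. This is a bash sketch under assumptions: classify_tag is a hypothetical helper, and the nightly-x.y.z-N pattern is inferred only from the example tag shown in this guide, not from a published naming rule.

```shell
# Classify an image tag using the naming scheme listed in this guide.
# Prints "floating" for tags that move over time (latest, nightly),
# "pinned" for tags that name one build, and "unknown" otherwise.
classify_tag() {
  local tag="$1"
  if [[ "$tag" == "latest" || "$tag" == "nightly" ]]; then
    echo "floating"
  elif [[ "$tag" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    # x.y.z release tag, e.g. 1.0.1
    echo "pinned"
  elif [[ "$tag" =~ ^nightly-[0-9]+\.[0-9]+\.[0-9]+-[0-9]+$ ]]; then
    # Assumed pattern, inferred from the example nightly-1.0.1-224
    echo "pinned"
  else
    echo "unknown"
  fi
}
```

For example, a production script could stop unless classify_tag prints "pinned" for the tag it is about to pull.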

Method 2: Building from Source

Clone the repo:


git clone https://github.com/janhq/cortex.cpp.git
cd cortex.cpp
git submodule update --init

Build the Docker image:

Latest build:


docker build -t menloltd/cortex:local \
  --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) \
  -f docker/Dockerfile .


Specific version:


docker build \
  --build-arg CORTEX_LLAMACPP_VERSION=0.1.34 \
  --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) \
  -t menloltd/cortex:local \
  -f docker/Dockerfile .

Running Cortex (Securely)

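[Optional] Create a dedicated user and data directory. The docker run commands in this section use CORTEX_UID and /opt/cortex/data, so if you skip this step, drop the --user flag and the /opt/cortex/data mount from those commands: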


# Create a dedicated user
sudo useradd -r -s /bin/false cortex
export CORTEX_UID=$(id -u cortex)


# Create data directory with proper permissions
sudo mkdir -p /opt/cortex/data
sudo chown -R ${CORTEX_UID}:${CORTEX_UID} /opt/cortex

Set up persistent storage:


docker volume create cortex_data

Launch the container:

GPU mode:


docker run --gpus all -d \
  --name cortex \
  --user ${CORTEX_UID}:${CORTEX_UID} \
  --memory=4g \
  --memory-swap=4g \
  --security-opt=no-new-privileges \
  -v cortex_data:/root/cortexcpp:rw \
  -v /opt/cortex/data:/data:rw \
  -p 127.0.0.1:39281:39281 \
  menloltd/cortex:latest


CPU mode:


docker run -d \
  --name cortex \
  --user ${CORTEX_UID}:${CORTEX_UID} \
  --memory=4g \
  --memory-swap=4g \
  --security-opt=no-new-privileges \
  -v cortex_data:/root/cortexcpp:rw \
  -v /opt/cortex/data:/data:rw \
  -p 127.0.0.1:39281:39281 \
  menloltd/cortex:latest
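
The GPU and CPU commands above differ only in the --gpus all flag, so a script can build both from one place. The following bash sketch assumes nothing beyond the flags already shown; cortex_run_args is a hypothetical helper name, and it prints one argument per line rather than running Docker itself.

```shell
# Print the docker arguments for the given mode, one per line.
# Usage: cortex_run_args gpu|cpu UID
# The flags are copied from the run commands in this guide.
cortex_run_args() {
  local mode="$1" uid="$2"
  local args=(run -d --name cortex)
  case "$mode" in
    gpu) args+=(--gpus all) ;;   # expose all GPUs to the container
    cpu) ;;                      # no extra flags in CPU mode
    *) echo "usage: cortex_run_args gpu|cpu UID" >&2; return 2 ;;
  esac
  args+=(
    --user "${uid}:${uid}"
    --memory=4g
    --memory-swap=4g
    --security-opt=no-new-privileges
    -v cortex_data:/root/cortexcpp:rw
    -v /opt/cortex/data:/data:rw
    -p 127.0.0.1:39281:39281
    menloltd/cortex:latest
  )
  printf '%s\n' "${args[@]}"
}
```

To launch the container, read the output into an array and pass it to Docker, for example: mapfile -t args < <(cortex_run_args gpu "$CORTEX_UID") followed by docker "${args[@]}". Printing the arguments first makes it easy to review the command before running it.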

Verification and Testing

Check container status:


docker ps | grep cortex
docker logs cortex

The logs should include lines like these:


Cortex server starting...
Initialization complete
Server listening on port 39281

Test the API:


curl http://127.0.0.1:39281/healthz
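
The server may need a few seconds after docker run before the health check succeeds, so scripts often poll instead of calling it once. Below is a small bash sketch of such a retry loop; wait_until is a hypothetical helper, and the attempt count and delay are arbitrary values to tune. It assumes the /healthz endpoint returns a successful HTTP status once the server is ready.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Run the probe quietly; success ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Skip the pause after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, wait_until 30 2 curl -fsS http://127.0.0.1:39281/healthz polls up to 30 times, two seconds apart. The -f flag makes curl exit with an error on HTTP failures, so the loop keeps waiting until the endpoint answers successfully.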

Working with Cortex

Once your container is running, here's how to interact with Cortex. Make sure you have curl installed on your system.

1. Check Available Engines


curl --request GET --url http://localhost:39281/v1/engines --header "Content-Type: application/json"

You'll see something like:


{
  "data": [
    {
      "description": "This extension enables chat completion API calls using the Onnx engine",
      "format": "ONNX",
      "name": "onnxruntime",
      "status": "Incompatible"
    },
    {
      "description": "This extension enables chat completion API calls using the LlamaCPP engine",
      "format": "GGUF",
      "name": "llama-cpp",
      "status": "Ready",
      "variant": "linux-amd64-avx2",
      "version": "0.1.37"
    }
  ],
  "object": "list",
  "result": "OK"
}

2. Download Models

First, set up event monitoring:

Install websocat following these instructions
Open a terminal and run: websocat ws://localhost:39281/events

Then, in a new terminal, download your desired model:

Pull a model from Cortex's Hugging Face hub:


curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'


Pull a model directly from a URL:


curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/blob/main/zephyr-smol_llama-100m-sft-full.q2_k.gguf"}'

To see your downloaded models:


curl --request GET --url http://localhost:39281/v1/models

3. Using the Model

First, start your model:


curl --request POST --url http://localhost:39281/v1/models/start --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'

Then, send it a query:


curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
    "frequency_penalty": 0.2,
    "max_tokens": 4096,
    "messages": [{"content": "Tell me a joke", "role": "user"}],
    "model": "tinyllama:gguf",
    "presence_penalty": 0.6,
    "stop": ["End"],
    "stream": true,
    "temperature": 0.8,
    "top_p": 0.95
  }'

4. Shutting Down

When you're done, stop the model:


curl --request POST --url http://localhost:39281/v1/models/stop --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
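
The pull, start, and stop requests in this guide share one shape: a POST to /v1/models/<action> with a {"model": ...} body. The bash sketch below wraps that shape in one function. The names cortex_model, CORTEX_URL, and DRY_RUN are hypothetical conveniences, not part of Cortex, and the helper rejects model ids containing quotes or backslashes rather than attempting JSON escaping.

```shell
# Base URL of the Cortex server; override it if you published a different port.
CORTEX_URL="${CORTEX_URL:-http://localhost:39281}"

# POST {"model": "<id>"} to /v1/models/<action>.
# Usage: cortex_model pull|start|stop MODEL
# With DRY_RUN=1 the request is printed instead of sent.
cortex_model() {
  local action="$1" model="$2"
  case "$action" in
    pull|start|stop) ;;
    *) echo "usage: cortex_model pull|start|stop MODEL" >&2; return 2 ;;
  esac
  # Keep the JSON body valid without implementing string escaping.
  if [[ "$model" == *'"'* || "$model" == *'\'* ]]; then
    echo "model id must not contain quotes or backslashes" >&2
    return 2
  fi
  local body="{\"model\": \"${model}\"}"
  local url="${CORTEX_URL}/v1/models/${action}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'POST %s %s\n' "$url" "$body"
  else
    curl --silent --show-error --fail --request POST --url "$url" \
      --header 'Content-Type: application/json' --data "$body"
  fi
}
```

A session then reads as cortex_model pull tinyllama:gguf, cortex_model start tinyllama:gguf, and finally cortex_model stop tinyllama:gguf. Setting DRY_RUN=1 first shows each request without contacting the server.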

Maintenance and Troubleshooting

Common Issues

Permission Denied Errors:


sudo chown -R ${CORTEX_UID}:${CORTEX_UID} /opt/cortex/data
docker restart cortex

Container Won't Start:


docker logs cortex
docker system info # Check available resources

Cleanup


# Stop and remove container
docker stop cortex
docker rm cortex


# Remove data (optional)
docker volume rm cortex_data
sudo rm -rf /opt/cortex/data


# Remove image
docker rmi menloltd/cortex:latest

Updating Cortex


# Pull latest version
docker pull menloltd/cortex:latest


# Stop and remove old container
docker stop cortex
docker rm cortex
# Start new container (use run command from above)

Best Practices

Always use specific version tags in production
Regularly back up your data volume
Monitor container resources using docker stats cortex
Keep your Docker installation updated
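
One common way to back up a named volume such as cortex_data is to mount it read-only in a throwaway container and archive it with tar. The bash sketch below follows that pattern. It is an illustration, not a Cortex feature: backup_volume and DRY_RUN are hypothetical names, the alpine image is an assumption (any small image that ships tar would do), and the destination must be an absolute host path.

```shell
# Archive a named Docker volume into DEST_DIR as <volume>-<timestamp>.tar.gz.
# Usage: backup_volume VOLUME DEST_DIR
# With DRY_RUN=1 the docker command is printed instead of executed.
backup_volume() {
  local volume="$1" dest="$2"
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${volume}-${stamp}.tar.gz"
  # Mount the volume read-only and the destination directory read-write.
  local cmd=(docker run --rm
    -v "${volume}:/source:ro"
    -v "${dest}:/backup"
    alpine tar czf "/backup/${archive}" -C /source .)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s ' "${cmd[@]}"
    printf '\n'
  else
    "${cmd[@]}"
  fi
}
```

For example, backup_volume cortex_data /opt/cortex/backups writes a timestamped archive into that directory. Stopping the cortex container first is probably the safer choice, since files may change while they are being archived.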
