Skip to main content

Docker Installation Guide

Getting Started with Cortex on Docker

This guide provides comprehensive instructions for installing and running Cortex in a Docker environment, including sensible defaults for security and performance.

Prerequisites

Before beginning, ensure you have:

  • Docker (version 20.10.0 or higher) or Docker Desktop
  • At least 8GB of RAM and 10GB of free disk space
  • For GPU support, make sure you install nvidia-container-toolkit. Here is an example on how to do so for Ubuntu:

    # Install NVIDIA Container Toolkit
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg


    # Add repository
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list


    # Install
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker

Installation Methods


# Pull the latest stable release
docker pull menloltd/cortex:latest


# Or pull a specific version (recommended for production)
docker pull menloltd/cortex:nightly-1.0.1-224

Version Tags
  • latest: Most recent stable release
  • nightly: Latest development build
  • x.y.z (e.g., 1.0.1): Specific version release

Method 2: Building from Source

  1. Clone the repo:

git clone https://github.com/janhq/cortex.cpp.git
cd cortex.cpp
git submodule update --init

  1. Build the Docker image:

docker build -t menloltd/cortex:local \
--build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) \
-f docker/Dockerfile .

Running Cortex (Securely)

  1. [Optional] Create a dedicated user and data directory:

# Create a dedicated user
sudo useradd -r -s /bin/false cortex
export CORTEX_UID=$(id -u cortex)


# Create data directory with proper permissions
sudo mkdir -p /opt/cortex/data
sudo chown -R ${CORTEX_UID}:${CORTEX_UID} /opt/cortex

  1. Set up persistent storage:

docker volume create cortex_data

  1. Launch the container:

docker run --gpus all -d \
--name cortex \
--user ${CORTEX_UID}:${CORTEX_UID} \
--memory=4g \
--memory-swap=4g \
--security-opt=no-new-privileges \
-v cortex_data:/root/cortexcpp:rw \
-v /opt/cortex/data:/data:rw \
-p 127.0.0.1:39281:39281 \
menloltd/cortex:latest

Verification and Testing

  1. Check container status:

docker ps | grep cortex
docker logs cortex

Expected output should show:


Cortex server starting...
Initialization complete
Server listening on port 39281

  1. Test the API:

curl http://127.0.0.1:39281/healthz

Working with Cortex

Once your container is running, here's how to interact with Cortex. Make sure you have curl installed on your system.

1. Check Available Engines


curl --request GET --url http://localhost:39281/v1/engines --header "Content-Type: application/json"

You'll see something like:


{
"data": [
{
"description": "This extension enables chat completion API calls using the Onnx engine",
"format": "ONNX",
"name": "onnxruntime",
"status": "Incompatible"
},
{
"description": "This extension enables chat completion API calls using the LlamaCPP engine",
"format": "GGUF",
"name": "llama-cpp",
"status": "Ready",
"variant": "linux-amd64-avx2",
"version": "0.1.37"
}
],
"object": "list",
"result": "OK"
}

2. Download Models

First, set up event monitoring:

  • Install websocat following these instructions
  • Open a terminal and run: websocat ws://localhost:39281/events

Then, in a new terminal, download your desired model:


curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'

To see your downloaded models:


curl --request GET --url http://localhost:39281/v1/models

3. Using the Model

First, start your model:


curl --request POST --url http://localhost:39281/v1/models/start --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'

Then, send it a query:


curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
"frequency_penalty": 0.2,
"max_tokens": 4096,
"messages": [{"content": "Tell me a joke", "role": "user"}],
"model": "tinyllama:gguf",
"presence_penalty": 0.6,
"stop": ["End"],
"stream": true,
"temperature": 0.8,
"top_p": 0.95
}'

4. Shutting Down

When you're done, stop the model:


curl --request POST --url http://localhost:39281/v1/models/stop --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'

Maintenance and Troubleshooting

Common Issues

  1. Permission Denied Errors:

sudo chown -R ${CORTEX_UID}:${CORTEX_UID} /opt/cortex/data
docker restart cortex

  1. Container Won't Start:

docker logs cortex
docker system info # Check available resources

Cleanup


# Stop and remove container
docker stop cortex
docker rm cortex


# Remove data (optional)
docker volume rm cortex_data
sudo rm -rf /opt/cortex/data


# Remove image
docker rmi cortexai/cortex:latest

Updating Cortex


# Pull latest version
docker pull cortexai/cortex:latest


# Stop and remove old container
docker stop cortex
docker rm cortex
# Start new container (use run command from above)

Best Practices
  • Always use specific version tags in production
  • Regularly backup your data volume
  • Monitor container resources using docker stats cortex
  • Keep your Docker installation updated