Engines

Engines in Cortex serve as execution drivers for machine learning models, providing the runtime and environment necessary for model operations. Each engine is optimized for hardware performance and ensures compatibility with its corresponding model types.

Supported Engines

Cortex currently supports two engines:

Engine         Source      Description
llama.cpp      ggerganov   Inference of models in GGUF format, written in pure C/C++
ONNX Runtime   Microsoft   Cross-platform, high-performance ML inference and training accelerator

Note: Cortex also supports building and adding your own custom engines.

Features

  • Engine Retrieval: Install the engines above or your own custom one with a single command.
  • Engine Management: Easily manage engines by type, variant, and version.
  • User-Friendly Interface: Manage your server, engines, and models via Cortex's CLI or via HTTP API.
  • Engine Selection: Depending on the model and its format, you can choose among different engines for the same model.

Installing an engine

To install an engine using the CLI, use the following command:


cortex engines install llama-cpp


Validating download items, please wait..
Start downloading..
llama-cpp 100%[==================================================] [00m:00s] 1.24 MB/1.24 MB
Engine llama-cpp downloaded successfully!

To install an engine using the HTTP API, use the following command:


curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
--request POST \
--header 'Content-Type: application/json'


{
"message": "Engine llama-cpp starts installing!"
}
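
Because the install endpoint starts the download asynchronously (note the "starts installing!" message), a script may want to wait until the engine is actually usable. Below is a minimal sketch that polls the list endpoint documented later in this section until the engine reports "Ready"; it assumes the default port shown above, that jq is installed, and that a two-second polling interval is acceptable.


ENGINE="llama-cpp"
BASE="http://127.0.0.1:39281/v1"

# Kick off the (asynchronous) install.
curl -s --request POST "$BASE/engines/$ENGINE/install" \
  --header 'Content-Type: application/json'

# Poll the engines list until our engine's status becomes "Ready".
while true; do
  STATUS=$(curl -s "$BASE/engines" \
    | jq -r --arg name "$ENGINE" '.data[] | select(.name == $name) | .status')
  echo "status: $STATUS"
  [ "$STATUS" = "Ready" ] && break
  sleep 2
done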

Listing engines

Cortex allows you to list current engines and their statuses. Each engine type can have different variants and versions, which are crucial for debugging and performance optimization. Different variants cater to specific hardware configurations, such as CUDA for NVIDIA GPUs and Vulkan for AMD GPUs on Windows, or AVX512 support for CPUs.

CLI

You can list the available engines using the following command:


cortex engines list


+---+--------------+-------------------+---------+-----------+--------------+
| # | Name | Supported Formats | Version | Variant | Status |
+---+--------------+-------------------+---------+-----------+--------------+
| 1 | onnxruntime | ONNX | | | Incompatible |
+---+--------------+-------------------+---------+-----------+--------------+
| 2 | llama-cpp | GGUF | 0.1.37 | mac-arm64 | Ready |
+---+--------------+-------------------+---------+-----------+--------------+

HTTP API

You can also retrieve the list of engines via the HTTP API:


curl http://127.0.0.1:39281/v1/engines


{
"data": [
{
"description": "This extension enables chat completion API calls using the Onnx engine",
"format": "ONNX",
"name": "onnxruntime",
"productName": "onnxruntime",
"status": "Incompatible",
"variant": "",
"version": ""
},
{
"description": "This extension enables chat completion API calls using the LlamaCPP engine",
"format": "GGUF",
"name": "llama-cpp",
"productName": "llama-cpp",
"status": "Ready",
"variant": "mac-arm64",
"version": "0.1.37"
}
],
"object": "list",
"result": "OK"
}
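
If you only care about engines that are ready to serve requests, the response shape above is easy to filter. A small sketch, assuming jq is available:


curl -s http://127.0.0.1:39281/v1/engines \
  | jq -r '.data[] | select(.status == "Ready") | "\(.name) \(.version) \(.variant)"'


This prints one line per ready engine, e.g. llama-cpp 0.1.37 mac-arm64 for the example response above.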

Getting detailed information about an engine

Cortex allows users to retrieve detailed information about a specific engine. This includes supported formats, versions, variants, and status. This information helps users understand the capabilities and compatibility of their engines.

CLI

To retrieve detailed information about an engine using the CLI, use the following command:


cortex engines get llama-cpp


+---+-----------+---------+----------------------------+-----------+
| # | Name | Version | Variant | Status |
+---+-----------+---------+----------------------------+-----------+
| 1 | llama-cpp | v0.1.49 | linux-amd64-avx2-cuda-12-0 | Installed |
+---+-----------+---------+----------------------------+-----------+

HTTP API

To retrieve detailed information about an engine using the HTTP API, send a GET request to the appropriate endpoint:


curl --location 'http://127.0.0.1:39281/v1/engines/llama-cpp'


[
{
"engine": "llama-cpp",
"name": "linux-amd64-avx2-cuda-12-0",
"version": "v0.1.49"
}
]
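
The response is an array with one entry per installed variant, so pulling out the fields you need is straightforward. A minimal sketch, assuming jq is installed and the endpoint returns the array shown above:


curl -s 'http://127.0.0.1:39281/v1/engines/llama-cpp' \
  | jq -r '.[0] | "variant: \(.name)  version: \(.version)"'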

Uninstalling an engine

Cortex provides an easy way to uninstall an engine, which is useful if you want to keep only the latest version rather than accumulating older ones.

CLI


cortex engines uninstall llama-cpp

HTTP API


curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
--request DELETE \
--header 'Content-Type: application/json'

Example response:


{
"message": "Engine llama-cpp uninstalled successfully!"
}
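
Combining the two endpoints above gives a simple "refresh" workflow for the keep-only-the-latest use case: uninstall the engine, then install it again. This sketch assumes that a fresh install fetches the newest available build; the port and engine name follow the earlier examples.


BASE="http://127.0.0.1:39281/v1"
ENGINE="llama-cpp"

# Remove the currently installed engine...
curl -s --request DELETE "$BASE/engines/$ENGINE/install" \
  --header 'Content-Type: application/json'

# ...then install it again to pick up the latest build.
curl -s --request POST "$BASE/engines/$ENGINE/install" \
  --header 'Content-Type: application/json'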

Upcoming Engine Features

  • Enhanced engine update mechanism with automated compatibility checks
  • Seamless engine switching between variants and versions
  • Improved Vulkan engine support with optimized performance