Engines

Engines in Cortex serve as execution drivers for machine learning models, providing the runtime and environment necessary for model operations. Each engine is optimized for hardware performance and ensures compatibility with its corresponding model types.

Supported Engines

Cortex currently supports two engines:

Engine         Source      Description
llama.cpp      ggerganov   Inference of models in GGUF format, written in pure C/C++
ONNX Runtime   Microsoft   Cross-platform, high-performance ML inference and training accelerator

Note: Cortex also supports building and adding your own custom engines.

Features

  • Engine Retrieval: Install the engines above or your own custom one with a single command.
  • Engine Management: Easily manage engines by type, variant, and version.
  • User-Friendly Interface: Manage your server, engines, and models via Cortex's CLI or via HTTP API.
  • Engine Selection: Depending on the model and its format, you can choose among different engines for the same model.

Installing an engine

To install an engine using the CLI, use the following command:


cortex engines install llama-cpp


Validating download items, please wait..
Start downloading..
llama-cpp 100%[==================================================] [00m:00s] 1.24 MB/1.24 MB
Engine llama-cpp downloaded successfully!

To install an engine using the HTTP API, use the following command:


curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
--request POST \
--header 'Content-Type: application/json'


{
"message": "Engine llama-cpp starts installing!"
}
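
Because the install endpoint starts the download asynchronously (note the "starts installing!" message), a script may want to wait until the engine is actually usable. Below is a minimal sketch that polls the list endpoint documented later in this section until the engine reports "Ready"; it assumes the default port shown above, that jq is installed, and that a two-second polling interval is acceptable.


ENGINE="llama-cpp"
BASE="http://127.0.0.1:39281/v1"

# Kick off the (asynchronous) install.
curl -s --request POST "$BASE/engines/$ENGINE/install" \
  --header 'Content-Type: application/json'

# Poll the engines list until our engine's status becomes "Ready".
while true; do
  STATUS=$(curl -s "$BASE/engines" \
    | jq -r --arg name "$ENGINE" '.data[] | select(.name == $name) | .status')
  echo "status: $STATUS"
  [ "$STATUS" = "Ready" ] && break
  sleep 2
done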

Listing engines

Cortex allows you to list current engines and their statuses. Each engine type can have different variants and versions, which are crucial for debugging and performance optimization. Different variants cater to specific hardware configurations, such as CUDA for NVIDIA GPUs and Vulkan for AMD GPUs on Windows, or AVX512 support for CPUs.

CLI

You can list the available engines using the following command:


cortex engines list


+---+--------------+-------------------+---------+-----------+--------------+
| # | Name | Supported Formats | Version | Variant | Status |
+---+--------------+-------------------+---------+-----------+--------------+
| 1 | onnxruntime | ONNX | | | Incompatible |
+---+--------------+-------------------+---------+-----------+--------------+
| 2 | llama-cpp | GGUF | 0.1.37 | mac-arm64 | Ready |
+---+--------------+-------------------+---------+-----------+--------------+

HTTP API

You can also retrieve the list of engines via the HTTP API:


curl http://127.0.0.1:39281/v1/engines


{
"data": [
{
"description": "This extension enables chat completion API calls using the Onnx engine",
"format": "ONNX",
"name": "onnxruntime",
"productName": "onnxruntime",
"status": "Incompatible",
"variant": "",
"version": ""
},
{
"description": "This extension enables chat completion API calls using the LlamaCPP engine",
"format": "GGUF",
"name": "llama-cpp",
"productName": "llama-cpp",
"status": "Ready",
"variant": "mac-arm64",
"version": "0.1.37"
}
],
"object": "list",
"result": "OK"
}
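
If you only care about engines that are ready to serve requests, the response shape above is easy to filter. A small sketch, assuming jq is available:


curl -s http://127.0.0.1:39281/v1/engines \
  | jq -r '.data[] | select(.status == "Ready") | "\(.name) \(.version) \(.variant)"'


This prints one line per ready engine, e.g. llama-cpp 0.1.37 mac-arm64 for the example response above.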

Getting detailed information about an engine

Cortex allows users to retrieve detailed information about a specific engine. This includes supported formats, versions, variants, and status. This information helps users understand the capabilities and compatibility of their engines.

CLI

To retrieve detailed information about an engine using the CLI, use the following command:


cortex engines get llama-cpp


+---+-----------+---------+----------------------------+-----------+
| # | Name | Version | Variant | Status |
+---+-----------+---------+----------------------------+-----------+
| 1 | llama-cpp | v0.1.49 | linux-amd64-avx2-cuda-12-0 | Installed |
+---+-----------+---------+----------------------------+-----------+

HTTP API

To retrieve detailed information about an engine using the HTTP API, send a GET request to the appropriate endpoint:


curl --location 'http://127.0.0.1:39281/v1/engines/llama-cpp'


[
{
"engine": "llama-cpp",
"name": "linux-amd64-avx2-cuda-12-0",
"version": "v0.1.49"
}
]
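
The response is an array with one entry per installed variant, so pulling out the fields you need is straightforward. A minimal sketch, assuming jq is installed and the endpoint returns the array shown above:


curl -s 'http://127.0.0.1:39281/v1/engines/llama-cpp' \
  | jq -r '.[0] | "variant: \(.name)  version: \(.version)"'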

Uninstalling an engine

Cortex provides an easy way to uninstall an engine, which is useful if you want to keep only the latest version rather than accumulating older ones.

CLI


cortex engines uninstall llama-cpp

HTTP API


curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
--request DELETE \
--header 'Content-Type: application/json'

Example response:


{
"message": "Engine llama-cpp uninstalled successfully!"
}
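
Combining the two endpoints above gives a simple "refresh" workflow for the keep-only-the-latest use case: uninstall the engine, then install it again. This sketch assumes that a fresh install fetches the newest available build; the port and engine name follow the earlier examples.


BASE="http://127.0.0.1:39281/v1"
ENGINE="llama-cpp"

# Remove the currently installed engine...
curl -s --request DELETE "$BASE/engines/$ENGINE/install" \
  --header 'Content-Type: application/json'

# ...then install it again to pick up the latest build.
curl -s --request POST "$BASE/engines/$ENGINE/install" \
  --header 'Content-Type: application/json'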

Upcoming Engine Features

  • Enhanced engine update mechanism with automated compatibility checks
  • Seamless engine switching between variants and versions
  • Improved Vulkan engine support with optimized performance