# Engines

Engines in Cortex serve as execution drivers for machine learning models, providing the runtime and environment necessary for model operations. Each engine is optimized for its target hardware and ensures compatibility with the model formats it supports.
## Supported Engines
Cortex currently supports two engines:
| Engine | Source | Description |
| --- | --- | --- |
| llama.cpp | ggerganov | Inference of models in GGUF format, written in pure C/C++ |
| ONNX Runtime | Microsoft | Cross-platform, high-performance ML inference and training accelerator |
Note: Cortex also supports building and adding your own custom engines.
## Features
- Engine Retrieval: Install the engines above, or your own custom engine, with a single command.
- Engine Management: Easily manage engines by type, variant, and version.
- User-Friendly Interface: Manage your server, engines, and models via Cortex's CLI or HTTP API (see the command overview below).
- Engine Selection: Depending on the model and its format, you can use different engines for the same model.
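For quick reference, these are the engine-management commands covered in the rest of this section:

```sh
cortex engines list                 # list engines and their statuses
cortex engines get llama-cpp        # show detailed information for one engine
cortex engines install llama-cpp   # download and install an engine
cortex engines uninstall llama-cpp # remove an installed engine
```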
## Installing an engine
To install an engine using the CLI, use the following command:
```sh
cortex engines install llama-cpp
```

Example output:

```
Validating download items, please wait..
Start downloading..
llama-cpp 100%[==================================================] [00m:00s] 1.24 MB/1.24 MB
Engine llama-cpp downloaded successfully!
```
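Once the download finishes, the engine shows up with a `Ready` status in `cortex engines list` (see Listing engines below).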
To install an engine using the HTTP API, use the following command:
```sh
curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
  --request POST \
  --header 'Content-Type: application/json'
```

Example response:

```json
{
  "message": "Engine llama-cpp starts installing!"
}
```
## Listing engines
Cortex allows you to list current engines and their statuses. Each engine type can have different variants and versions, which are crucial for debugging and performance optimization. Different variants cater to specific hardware configurations, such as CUDA for NVIDIA GPUs and Vulkan for AMD GPUs on Windows, or AVX512 support for CPUs.
### CLI
You can list the available engines using the following command:
```sh
cortex engines list
```

Example output:

```
+---+--------------+-------------------+---------+-----------+--------------+
| # | Name         | Supported Formats | Version | Variant   | Status       |
+---+--------------+-------------------+---------+-----------+--------------+
| 1 | onnxruntime  | ONNX              |         |           | Incompatible |
+---+--------------+-------------------+---------+-----------+--------------+
| 2 | llama-cpp    | GGUF              | 0.1.37  | mac-arm64 | Ready        |
+---+--------------+-------------------+---------+-----------+--------------+
```
### HTTP API
You can also retrieve the list of engines via the HTTP API:
```sh
curl http://127.0.0.1:39281/v1/engines
```

Example response:

```json
{
  "data": [
    {
      "description": "This extension enables chat completion API calls using the Onnx engine",
      "format": "ONNX",
      "name": "onnxruntime",
      "productName": "onnxruntime",
      "status": "Incompatible",
      "variant": "",
      "version": ""
    },
    {
      "description": "This extension enables chat completion API calls using the LlamaCPP engine",
      "format": "GGUF",
      "name": "llama-cpp",
      "productName": "llama-cpp",
      "status": "Ready",
      "variant": "mac-arm64",
      "version": "0.1.37"
    }
  ],
  "object": "list",
  "result": "OK"
}
```
## Getting detailed information about an engine
Cortex allows users to retrieve detailed information about a specific engine. This includes supported formats, versions, variants, and status. This information helps users understand the capabilities and compatibility of their engines.
### CLI
To retrieve detailed information about an engine using the CLI, use the following command:
```sh
cortex engines get llama-cpp
```

Example output:

```
+---+-----------+---------+----------------------------+-----------+
| # | Name      | Version | Variant                    | Status    |
+---+-----------+---------+----------------------------+-----------+
| 1 | llama-cpp | v0.1.49 | linux-amd64-avx2-cuda-12-0 | Installed |
+---+-----------+---------+----------------------------+-----------+
```
### HTTP API
To retrieve detailed information about an engine using the HTTP API, send a GET request to the appropriate endpoint:
```sh
curl --location 'http://127.0.0.1:39281/v1/engines/llama-cpp'
```
[ { "engine": "llama-cpp", "name": "linux-amd64-avx2-cuda-12-0", "version": "v0.1.49" }]
## Uninstalling an engine
Cortex provides an easy way to uninstall an engine, which is useful when you want to keep only the latest version rather than several older ones.
### CLI
```sh
cortex engines uninstall llama-cpp
```
### HTTP API
```sh
curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
  --request DELETE \
  --header 'Content-Type: application/json'
```
Example response:
{ "message": "Engine llama-cpp uninstalled successfully!"}
## Upcoming Engine Features
- Enhanced engine update mechanism with automated compatibility checks
- Seamless engine switching between variants and versions
- Improved Vulkan engine support with optimized performance