Cortex Basic Usage
Cortex has an API server that runs at localhost:39281. The port can be changed in .cortexrc via the apiServerPort parameter.
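For example, a .cortexrc entry overriding the port might look like the sketch below. Only apiServerPort is documented above; the apiServerHost key is shown purely as an illustrative assumption about the file's YAML layout.
# ~/.cortexrc (YAML)
apiServerHost: 127.0.0.1   # assumed key, shown for illustration
apiServerPort: 39281       # documented key controlling the API server port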
Server
By default, the server will be started on port 39281.
cortex start
Start the server with a different port number.
cortex -p <port_number>
To use a custom directory for storing logs and other files.
cortex --data_folder_path <your_directory>
To terminate the cortex server.
curl --request DELETE \
  --url http://127.0.0.1:39281/processManager/destroy
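To confirm whether the server is up or down, you can probe it over HTTP. The route below is an assumption about a simple health endpoint; if your build exposes a different one, substitute it here.
# Assumed health-check route; succeeds only while the server is running
curl --request GET \
  --url http://127.0.0.1:39281/healthz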
Engines
Cortex currently supports a general Python Engine for highly customised deployments, as well as two specialized engines for different multi-modal foundation models: llama.cpp and ONNXRuntime.
By default, Cortex installs llama.cpp as its main engine, since it runs on most laptops, desktop environments, and operating systems.
For more information, check out Engine Management.
Here are some commands to get you started.
List all available engines.
curl --request GET \
  --url http://127.0.0.1:39281/v1/engines
{ "llama-cpp": [ { "engine": "llama-cpp", "name": "linux-amd64-avx2-cuda-12-0", "version": "v0.1.49" } ]}
Install an Engine (e.g. llama-cpp)
curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
  --request POST \
  --header 'Content-Type: application/json'
{ "message": "Engine starts installing!"}
Models
Pull a Model
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/pull \
  -H "Content-Type: application/json" \
  --data '{"model": "tinyllama:1b-gguf-q3-km"}'
{ "message": "Model start downloading!", "task": { "id": "tinyllama:1b-gguf-q3-km", "items": [ { "bytes": 0, "checksum": "N/A", "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q3-km/metadata.yml", "downloadedBytes": 0, "id": "metadata.yml", "localPath": "/home/rpg/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q3-km/metadata.yml" }, { "bytes": 0, "checksum": "N/A", "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q3-km/model.gguf", "downloadedBytes": 0, "id": "model.gguf", "localPath": "/home/rpg/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q3-km/model.gguf" }, { "bytes": 0, "checksum": "N/A", "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q3-km/model.yml", "downloadedBytes": 0, "id": "model.yml", "localPath": "/home/rpg/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q3-km/model.yml" } ], "type": "Model" }}
If the model download was interrupted, this request will download the remainder of the model files.
The downloaded models are saved to the Cortex Data Folder.
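Using the localPath values from the pull response, you can check what has landed on disk. The path below assumes the default data folder shown in the example (~/cortexcpp); substitute your own --data_folder_path if you changed it.
# Inspect the downloaded model files (default data-folder layout assumed)
ls -lh ~/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q3-km/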
Stop Model Download
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/pull \
  --header 'Content-Type: application/json' \
  --data '{"taskId": "tinyllama:1b-gguf-q3-km"}'
List All Models
curl --request GET \
  --url http://127.0.0.1:39281/v1/models
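As with the engines endpoint, jq is handy for extracting just the model identifiers. The field names below are assumptions based on an OpenAI-style list response; inspect the raw output first if yours differs.
# Print model identifiers only (assumes { "data": [ { "id": ... } ] })
curl --silent --request GET \
  --url http://127.0.0.1:39281/v1/models | \
  jq -r '.data[].id'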
Delete a Model
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/tinyllama:1b-gguf-q3-km
{ "message":"Deleted successfully!"}
Run Models
Start Model
# Start the model
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/start \
  --header 'Content-Type: application/json' \
  --data '{"model": "llama3.1:8b-gguf-q4-km"}'
{ "message":"Started successfully!"}
Create Chat Completion
# Invoke the chat completions endpoint
curl --request POST \
  --url http://localhost:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "messages": [
      { "role": "user", "content": "Write a Haiku about cats and AI" }
    ],
    "model": "tinyllama:1b-gguf",
    "stream": false
  }'
{ "choices": [ { "finish_reason": "stop", "index": 0, "message": { "content": "Whiskers soft as code\nMachines mimic their gaze\nFurry, digital dreamer", "role": "assistant" } } ], "created": 1737722349, "id": "5vjsnGlRQfxw6CNzzkph", "model": "_", "object": "chat.completion", "system_fingerprint": "_", "usage": { "completion_tokens": 19, "prompt_tokens": 19, "total_tokens": 38 }}
Stop Model
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/stop \
  --header 'Content-Type: application/json' \
  --data '{ "model": "tinyllama:1b-gguf" }'
{ "message":"Stopped successfully!"}