Python Engine
🚧 Cortex.cpp is currently under active development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
The Python Engine manages the Python processes that run models via Cortex. Each Python program is treated as a model with its own `model.yml` configuration template. All requests are routed through Cortex over HTTP.
Python Engine Implementation

The Python Engine is a C++ package implementing the `EngineI` interface. It exposes these core methods:

- `LoadModel`: Starts the Python process and loads the model
- `UnloadModel`: Stops the process and unloads the model
- `GetModelStatus`: Performs a health check on running processes
- `GetModels`: Lists active Python models

Additional methods:

- `HandleInference`: Routes inference requests to the Python process
- `HandleRouteRequest`: Routes arbitrary requests to the Python process
The Python Engine is built into Cortex.cpp and loads automatically when needed.
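Conceptually, these lifecycle methods map onto ordinary OS process management. The following is a minimal Python sketch of that idea, an illustration only, not Cortex's actual C++ implementation; the class and method names here are made up:

```python
import subprocess
import sys

class PythonProcessManager:
    """Illustrative sketch of the engine's lifecycle methods (not Cortex code)."""

    def __init__(self):
        self.processes = {}  # model_id -> subprocess.Popen

    def load_model(self, model_id, command):
        # LoadModel: spawn the model's Python process
        self.processes[model_id] = subprocess.Popen(command)

    def get_model_status(self, model_id):
        # GetModelStatus: a None returncode means the process is still alive
        proc = self.processes.get(model_id)
        return proc is not None and proc.poll() is None

    def get_models(self):
        # GetModels: list model ids whose processes are still running
        return [m for m, p in self.processes.items() if p.poll() is None]

    def unload_model(self, model_id):
        # UnloadModel: terminate the process and drop it from the registry
        proc = self.processes.pop(model_id, None)
        if proc is not None:
            proc.terminate()
            proc.wait()

mgr = PythonProcessManager()
mgr.load_model("demo", [sys.executable, "-c", "import time; time.sleep(30)"])
running = mgr.get_model_status("demo")
mgr.unload_model("demo")
stopped = not mgr.get_model_status("demo")
```

In the real engine, the command and port for each process come from the model's `model.yml`, described next.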
Model Configuration

Each Python model requires a `model.yml` configuration file:

```yaml
id: ichigo-0.5:fp16-linux-amd64
model: ichigo-0.5:fp16-linux-amd64
name: Ichigo Wrapper
version: 1

port: 22310
script: src/app.py
log_path: ichigo-wrapper.log
log_level: INFO
command:
  - python
files:
  - /home/thuan/cortexcpp/models/cortex.so/ichigo-0.5/fp16-linux-amd64
depends:
  - ichigo-0.4:8b-gguf-q4-km
  - whispervq:fp16-linux-amd64
  - fish-speech:fp16-linux-amd64
engine: python-engine
extra_params:
  device_id: 0
  fish_speech_port: 22312
  ichigo_model: ichigo-0.4:8b-gguf-q4-km
  ichigo_port: 39281
  whisper_port: 3348
```
| Parameter | Description | Required |
|---|---|---|
| `id` | Unique identifier for the model; typically includes version and platform information. | Yes |
| `model` | The model variant, often denoting size or quantization details. | Yes |
| `name` | The human-readable name for the model, used as the `model_id`. | Yes |
| `version` | The version number of the model. | Yes |
| `port` | The network port on which the Python program listens for requests. | Yes |
| `script` | Path to the main Python script executed by the engine, relative to the model folder. | Yes |
| `log_path` | File where the Python program's logs are stored, relative to the Cortex data folder. | No |
| `log_level` | The logging level (e.g., INFO, DEBUG). | No |
| `command` | The command used to launch the Python program, typically starting with `python`. | Yes |
| `files` | Path to the folder containing the Python scripts, model binaries, and environment needed to run the program. | No |
| `depends` | Dependencies required by the model, specified by their model identifiers. | No |
| `engine` | The engine to use; for Python models this is `python-engine`. | Yes |
| `extra_params` | Additional parameters passed to the Python script at runtime. | No |
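As a quick sanity check, the required fields in the table can be verified after parsing `model.yml`. The helper below is a hypothetical Python sketch, not part of Cortex; only the field names come from the table above:

```python
# Required keys per the parameter table above (hypothetical helper, not Cortex code).
REQUIRED_FIELDS = ["id", "model", "name", "version", "port", "script", "command", "engine"]

def missing_required_fields(config: dict) -> list:
    """Return the required model.yml keys absent from a parsed config dict."""
    return [f for f in REQUIRED_FIELDS if f not in config]

config = {
    "id": "ichigo-0.5:fp16-linux-amd64",
    "model": "ichigo-0.5:fp16-linux-amd64",
    "name": "Ichigo Wrapper",
    "version": 1,
    "port": 22310,
    "script": "src/app.py",
    "command": ["python"],
    "engine": "python-engine",
}

complete = missing_required_fields(config)       # all required keys present
incomplete = missing_required_fields({"id": "x"})  # everything else is missing
```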
Example: Ichigo Python Model

Ichigo is a built-in Cortex Python model for chat with audio support.
Required Models
Ichigo requires these models:
- ichigo-0.5
- whispervq
- ichigo-0.4
- fish-speech (optional, for text-to-speech)
Download the models for your platform (Linux AMD64 in this example):

```sh
curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.4:8b-gguf-q4-km"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"whispervq:fp16-linux-amd64"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"fish-speech:fp16-linux-amd64"}'
```
Model Management

Start a model:

```sh
curl --location '127.0.0.1:39281/v1/models/start' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'
```

Check a model's status:

```sh
curl --location '127.0.0.1:39281/v1/models/status/fish-speech:fp16-linux-amd64'
```

Stop a model:

```sh
curl --location '127.0.0.1:39281/v1/models/stop' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'
```
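The same start/stop/pull calls can be made from Python with the standard library. The helper below is a hypothetical sketch that builds the equivalent POST request; the endpoint paths and payload shape come from the curl examples above:

```python
import json
import urllib.request

BASE = "http://127.0.0.1:39281"  # default Cortex API address used in these docs

def model_request(action: str, model: str) -> urllib.request.Request:
    """Build the POST request behind the curl examples (action: pull/start/stop)."""
    payload = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/v1/models/{action}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = model_request("start", "ichigo-0.5:fp16-linux-amd64")
# To actually send it (requires a running Cortex server):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```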
Inference

Example inference request:

```sh
curl --location '127.0.0.1:39281/v1/inference' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "ichigo-0.5:fp16-linux-amd64",
    "engine": "python-engine",
    "body": {
      "messages": [{
        "role": "system",
        "content": "you are a helpful assistant, you must answer questions short and concise!"
      }],
      "input_audio": {
        "data": "base64_encoded_audio_data",
        "format": "wav"
      },
      "model": "ichigo-0.4:8b-gguf-q4-km",
      "stream": true,
      "temperature": 0.7,
      "top_p": 0.9,
      "max_tokens": 2048,
      "presence_penalty": 0,
      "frequency_penalty": 0,
      "stop": ["<|eot_id|>"],
      "output_audio": true
    }
  }'
```
Route Requests

Generic request routing example:

```sh
curl --location '127.0.0.1:39281/v1/route/request' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "whispervq:fp16",
    "path": "/inference",
    "engine": "python-engine",
    "method": "post",
    "transform_response": "{ {%- set first = true -%} {%- for key, value in input_request -%} {%- if key == \"tokens\" -%} {%- if not first -%},{%- endif -%} \"{{ key }}\": {{ tojson(value) }} {%- set first = false -%} {%- endif -%} {%- endfor -%} }",
    "body": {
      "data": "base64 data",
      "format": "wav"
    }
  }'
```
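The `transform_response` field holds a Jinja-style template that reshapes the upstream response; the template in the example above keeps only the `tokens` field. A rough Python sketch of that behavior (an illustration of the template's effect, not the template engine Cortex uses):

```python
import json

def transform_response(input_request: dict) -> str:
    """Mimic the example template: keep only the "tokens" key
    from the upstream response and re-serialize it as JSON."""
    kept = {k: v for k, v in input_request.items() if k == "tokens"}
    return json.dumps(kept)

out = transform_response({"tokens": [1, 2, 3], "latency_ms": 42})
```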
Adding New Python Models

Implementation Requirements

Python models must expose at least two endpoints:

- `/health`: Server status check
- `/inference`: Model inference
Example server implementation:
```python
import argparse
import os
from pathlib import Path
from typing import List

import uvicorn
from dotenv import load_dotenv
from fastapi import APIRouter, FastAPI

from common.utility.logger_utility import LoggerUtility
from services.audio.audio_controller import AudioController
from services.audio.implementation.audio_service import AudioService
from services.health.health_controller import HealthController


def create_app() -> FastAPI:
    routes: List[APIRouter] = [
        HealthController(),
        AudioController()
    ]
    app = FastAPI()
    for route in routes:
        app.include_router(route)
    return app


def parse_argument():
    parser = argparse.ArgumentParser(description="Ichigo-wrapper Application")
    parser.add_argument('--log_path', type=str, default='Ichigo-wrapper.log',
                        help='The log file path')
    parser.add_argument('--log_level', type=str, default='INFO',
                        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'TRACE'])
    parser.add_argument('--port', type=int, default=22310)
    parser.add_argument('--device_id', type=str, default="0")
    parser.add_argument('--package_dir', type=str, default="")
    parser.add_argument('--whisper_port', type=int, default=3348)
    parser.add_argument('--ichigo_port', type=int, default=39281)
    parser.add_argument('--fish_speech_port', type=int, default=22312)
    parser.add_argument('--ichigo_model', type=str, default="ichigo:8b-gguf-q4-km")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_argument()
    LoggerUtility.init_logger(__name__, args.log_level, args.log_path)
    env_path = Path(os.path.dirname(os.path.realpath(__file__))) / "variables" / ".env"
    AudioService.initialize(args.whisper_port, args.ichigo_port,
                            args.fish_speech_port, args.ichigo_model)
    load_dotenv(dotenv_path=env_path)
    app = create_app()
    print("Server is running at: 0.0.0.0:", args.port)
    uvicorn.run(app=app, host="0.0.0.0", port=args.port)
```
Deployment

- Create the model files following the example above
- Add the required `requirements.txt` and `requirements.cuda.txt` files
- Trigger the Python Script Package CI
- Trigger the Python Venv Package CI

The CI pipelines build and publish your model to Hugging Face, where it can then be downloaded and used.