Python Engine
🚧 Cortex.cpp is currently under active development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
The Python Engine manages the Python processes that run models via Cortex. Each Python program is treated as a model with its own `model.yml` configuration template. All requests are routed through Cortex over HTTP.
Python Engine Implementation

The Python Engine is a C++ package implementing the `EngineI` interface. It exposes these core methods:

- `LoadModel`: Starts the Python process and loads the model
- `UnloadModel`: Stops the process and unloads the model
- `GetModelStatus`: Performs a health check on running processes
- `GetModels`: Lists active Python models

Additional methods:

- `HandleInference`: Routes inference requests to the Python process
- `HandleRouteRequest`: Routes arbitrary requests to the Python process
The Python Engine is built into Cortex.cpp and loads automatically when needed.
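Conceptually, these lifecycle methods map onto ordinary OS process management. The following is a minimal Python sketch of that idea, an illustration only, not Cortex's actual C++ implementation; the class and method names here are made up:

```python
import subprocess
import sys

class PythonProcessManager:
    """Illustrative sketch of the engine's lifecycle methods (not Cortex code)."""

    def __init__(self):
        self.processes = {}  # model_id -> subprocess.Popen

    def load_model(self, model_id, command):
        # LoadModel: spawn the model's Python process
        self.processes[model_id] = subprocess.Popen(command)

    def get_model_status(self, model_id):
        # GetModelStatus: a None returncode means the process is still alive
        proc = self.processes.get(model_id)
        return proc is not None and proc.poll() is None

    def get_models(self):
        # GetModels: list model ids whose processes are still running
        return [m for m, p in self.processes.items() if p.poll() is None]

    def unload_model(self, model_id):
        # UnloadModel: terminate the process and drop it from the registry
        proc = self.processes.pop(model_id, None)
        if proc is not None:
            proc.terminate()
            proc.wait()

mgr = PythonProcessManager()
mgr.load_model("demo", [sys.executable, "-c", "import time; time.sleep(30)"])
running = mgr.get_model_status("demo")
mgr.unload_model("demo")
stopped = not mgr.get_model_status("demo")
```

In the real engine, the command and port for each process come from the model's `model.yml`, described next.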
Model Configuration

Each Python model requires a `model.yml` configuration file:

```yaml
id: ichigo-0.5:fp16-linux-amd64
model: ichigo-0.5:fp16-linux-amd64
name: Ichigo Wrapper
version: 1

port: 22310
script: src/app.py
log_path: ichigo-wrapper.log
log_level: INFO
command:
  - python
files:
  - /home/thuan/cortexcpp/models/cortex.so/ichigo-0.5/fp16-linux-amd64
depends:
  - ichigo-0.4:8b-gguf-q4-km
  - whispervq:fp16-linux-amd64
  - fish-speech:fp16-linux-amd64
engine: python-engine
extra_params:
  device_id: 0
  fish_speech_port: 22312
  ichigo_model: ichigo-0.4:8b-gguf-q4-km
  ichigo_port: 39281
  whisper_port: 3348
```
| Parameter | Description | Required |
|---|---|---|
| `id` | Unique identifier for the model; typically includes version and platform information. | Yes |
| `model` | The model variant, often denoting size or quantization details. | Yes |
| `name` | The human-readable name for the model, used as the `model_id`. | Yes |
| `version` | The version number of the model. | Yes |
| `port` | The network port on which the Python program listens for requests. | Yes |
| `script` | Path to the main Python script executed by the engine, relative to the model folder. | Yes |
| `log_path` | File where the Python program's logs are stored, relative to the Cortex data folder. | No |
| `log_level` | The logging level (e.g., INFO, DEBUG). | No |
| `command` | The command used to launch the Python program, typically starting with `python`. | Yes |
| `files` | Path to the folder containing the Python scripts, model binaries, and environment needed to run the program. | No |
| `depends` | Dependencies required by the model, specified by their model identifiers. | No |
| `engine` | The engine to use; for Python models this is `python-engine`. | Yes |
| `extra_params` | Additional parameters passed to the Python script at runtime. | No |
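As a quick sanity check, the required fields in the table can be verified after parsing `model.yml`. The helper below is a hypothetical Python sketch, not part of Cortex; only the field names come from the table above:

```python
# Required keys per the parameter table above (hypothetical helper, not Cortex code).
REQUIRED_FIELDS = ["id", "model", "name", "version", "port", "script", "command", "engine"]

def missing_required_fields(config: dict) -> list:
    """Return the required model.yml keys absent from a parsed config dict."""
    return [f for f in REQUIRED_FIELDS if f not in config]

config = {
    "id": "ichigo-0.5:fp16-linux-amd64",
    "model": "ichigo-0.5:fp16-linux-amd64",
    "name": "Ichigo Wrapper",
    "version": 1,
    "port": 22310,
    "script": "src/app.py",
    "command": ["python"],
    "engine": "python-engine",
}

complete = missing_required_fields(config)       # all required keys present
incomplete = missing_required_fields({"id": "x"})  # everything else is missing
```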
Example: Ichigo Python Model

Ichigo is a built-in Cortex Python model for chat with audio support.
Required Models
Ichigo requires these models:
- ichigo-0.5
- whispervq
- ichigo-0.4
- fish-speech (optional, for text-to-speech)
Download the models for your platform (Linux AMD64 in this example):

```sh
curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.4:8b-gguf-q4-km"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"whispervq:fp16-linux-amd64"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"fish-speech:fp16-linux-amd64"}'
```
Model Management

Start a model:

```sh
curl --location '127.0.0.1:39281/v1/models/start' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'
```

Check a model's status:

```sh
curl --location '127.0.0.1:39281/v1/models/status/fish-speech:fp16-linux-amd64'
```

Stop a model:

```sh
curl --location '127.0.0.1:39281/v1/models/stop' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'
```
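The same start/stop/pull calls can be made from Python with the standard library. The helper below is a hypothetical sketch that builds the equivalent POST request; the endpoint paths and payload shape come from the curl examples above:

```python
import json
import urllib.request

BASE = "http://127.0.0.1:39281"  # default Cortex API address used in these docs

def model_request(action: str, model: str) -> urllib.request.Request:
    """Build the POST request behind the curl examples (action: pull/start/stop)."""
    payload = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/v1/models/{action}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = model_request("start", "ichigo-0.5:fp16-linux-amd64")
# To actually send it (requires a running Cortex server):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```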
Inference

Example inference request:

```sh
curl --location '127.0.0.1:39281/v1/inference' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "ichigo-0.5:fp16-linux-amd64",
    "engine": "python-engine",
    "body": {
      "messages": [{
        "role": "system",
        "content": "you are a helpful assistant, you must answer questions short and concise!"
      }],
      "input_audio": {
        "data": "base64_encoded_audio_data",
        "format": "wav"
      },
      "model": "ichigo-0.4:8b-gguf-q4-km",
      "stream": true,
      "temperature": 0.7,
      "top_p": 0.9,
      "max_tokens": 2048,
      "presence_penalty": 0,
      "frequency_penalty": 0,
      "stop": ["<|eot_id|>"],
      "output_audio": true
    }
  }'
```
Route Requests

Generic request routing example:

```sh
curl --location '127.0.0.1:39281/v1/route/request' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "whispervq:fp16",
    "path": "/inference",
    "engine": "python-engine",
    "method": "post",
    "transform_response": "{ {%- set first = true -%} {%- for key, value in input_request -%} {%- if key == \"tokens\" -%} {%- if not first -%},{%- endif -%} \"{{ key }}\": {{ tojson(value) }} {%- set first = false -%} {%- endif -%} {%- endfor -%} }",
    "body": {
      "data": "base64 data",
      "format": "wav"
    }
  }'
```
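The `transform_response` field holds a Jinja-style template that reshapes the upstream response; the template in the example above keeps only the `tokens` field. A rough Python sketch of that behavior (an illustration of the template's effect, not the template engine Cortex uses):

```python
import json

def transform_response(input_request: dict) -> str:
    """Mimic the example template: keep only the "tokens" key
    from the upstream response and re-serialize it as JSON."""
    kept = {k: v for k, v in input_request.items() if k == "tokens"}
    return json.dumps(kept)

out = transform_response({"tokens": [1, 2, 3], "latency_ms": 42})
```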
Adding New Python Models

Implementation Requirements

Python models must expose at least two endpoints:

- `/health`: Server status check
- `/inference`: Model inference
Example server implementation:
```python
import argparse
import os
from pathlib import Path
from typing import List

import uvicorn
from dotenv import load_dotenv
from fastapi import APIRouter, FastAPI

from common.utility.logger_utility import LoggerUtility
from services.audio.audio_controller import AudioController
from services.audio.implementation.audio_service import AudioService
from services.health.health_controller import HealthController


def create_app() -> FastAPI:
    routes: List[APIRouter] = [
        HealthController(),
        AudioController()
    ]
    app = FastAPI()
    for route in routes:
        app.include_router(route)
    return app


def parse_argument():
    parser = argparse.ArgumentParser(description="Ichigo-wrapper Application")
    parser.add_argument('--log_path', type=str, default='Ichigo-wrapper.log',
                        help='The log file path')
    parser.add_argument('--log_level', type=str, default='INFO',
                        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'TRACE'])
    parser.add_argument('--port', type=int, default=22310)
    parser.add_argument('--device_id', type=str, default="0")
    parser.add_argument('--package_dir', type=str, default="")
    parser.add_argument('--whisper_port', type=int, default=3348)
    parser.add_argument('--ichigo_port', type=int, default=39281)
    parser.add_argument('--fish_speech_port', type=int, default=22312)
    parser.add_argument('--ichigo_model', type=str, default="ichigo:8b-gguf-q4-km")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_argument()
    LoggerUtility.init_logger(__name__, args.log_level, args.log_path)
    env_path = Path(os.path.dirname(os.path.realpath(__file__))) / "variables" / ".env"
    AudioService.initialize(args.whisper_port, args.ichigo_port,
                            args.fish_speech_port, args.ichigo_model)
    load_dotenv(dotenv_path=env_path)
    app = create_app()
    print("Server is running at: 0.0.0.0:", args.port)
    uvicorn.run(app=app, host="0.0.0.0", port=args.port)
```
Deployment

- Create the model files following the example above
- Add the required `requirements.txt` and `requirements.cuda.txt` files
- Trigger the Python Script Package CI
- Trigger the Python Venv Package CI

The CI pipelines build and publish your model to Hugging Face, where it can then be downloaded and used.