# Python Engine

:::warning
🚧 Cortex.cpp is currently under active development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

The Python Engine manages the Python processes that run models via Cortex. Each Python program is treated as a model with its own `model.yml` configuration template, and all requests are routed to it through Cortex over HTTP.

## Python Engine Implementation

The Python Engine is a C++ component that implements the EngineI interface. It exposes these core methods:

- `LoadModel`: starts the Python process and loads the model
- `UnloadModel`: stops the process and unloads the model
- `GetModelStatus`: health check for a running process
- `GetModels`: lists active Python models

Additional methods:

- `HandleInference`: routes inference requests to the Python process
- `HandleRouteRequest`: routes arbitrary requests to the Python process

The Python Engine is built into Cortex.cpp and loads automatically when needed.
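
These methods are not called directly; they are reached through Cortex's HTTP API, as shown in the examples later on this page. As a rough sketch (assuming a local Cortex server on the default port 39281), the mapping looks like this:

```python
import requests

CORTEX = "http://127.0.0.1:39281"
MODEL = "ichigo-0.5:fp16-linux-amd64"

# LoadModel -> POST /v1/models/start
requests.post(f"{CORTEX}/v1/models/start", json={"model": MODEL})

# GetModelStatus -> GET /v1/models/status/<model>
requests.get(f"{CORTEX}/v1/models/status/{MODEL}")

# HandleInference -> POST /v1/inference
requests.post(f"{CORTEX}/v1/inference",
              json={"model": MODEL, "engine": "python-engine", "body": {}})

# UnloadModel -> POST /v1/models/stop
requests.post(f"{CORTEX}/v1/models/stop", json={"model": MODEL})
```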

## Model Configuration

Each Python model requires a `model.yml` configuration file:


```yaml
id: ichigo-0.5:fp16-linux-amd64
model: ichigo-0.5:fp16-linux-amd64
name: Ichigo Wrapper
version: 1
port: 22310
script: src/app.py
log_path: ichigo-wrapper.log
log_level: INFO
command:
  - python
files:
  - /home/thuan/cortexcpp/models/cortex.so/ichigo-0.5/fp16-linux-amd64
depends:
  - ichigo-0.4:8b-gguf-q4-km
  - whispervq:fp16-linux-amd64
  - fish-speech:fp16-linux-amd64
engine: python-engine
extra_params:
  device_id: 0
  fish_speech_port: 22312
  ichigo_model: ichigo-0.4:8b-gguf-q4-km
  ichigo_port: 39281
  whisper_port: 3348
```

| Parameter | Description | Required |
|---|---|---|
| `id` | Unique identifier for the model, typically including version and platform information. | Yes |
| `model` | The model variant, often denoting size or quantization details. | Yes |
| `name` | Human-readable name for the model, used as the `model_id`. | Yes |
| `version` | Version number of the model. | Yes |
| `port` | Network port on which the Python program listens for requests. | Yes |
| `script` | Path to the main Python script executed by the engine, relative to the model folder. | Yes |
| `log_path` | File where the Python program's logs are stored, relative to the Cortex data folder. | No |
| `log_level` | Logging verbosity (e.g., `INFO`, `DEBUG`). | No |
| `command` | Command used to launch the Python program, typically starting with `python`. | Yes |
| `files` | Path to the folder containing all Python scripts, model binaries, and the environment needed to run the program. | No |
| `depends` | Dependencies required by the model, specified by the identifiers of other models. | No |
| `engine` | Engine to use; for Python models this is `python-engine`. | Yes |
| `extra_params` | Additional parameters passed to the Python script at runtime. | No |
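
At load time the engine launches the configured command as a subprocess, passing the port and each `extra_params` entry to the script as command-line flags. The flag names below mirror the argparse options of the example server later on this page; the exact convention is an assumption of this sketch:

```python
# Hypothetical sketch of assembling the launch command from model.yml fields.
config = {
    "command": ["python"],
    "script": "src/app.py",
    "port": 22310,
    "extra_params": {"device_id": 0, "whisper_port": 3348},
}

cmd = list(config["command"]) + [config["script"], "--port", str(config["port"])]
for key, value in config["extra_params"].items():
    cmd += [f"--{key}", str(value)]

print(cmd)
# ['python', 'src/app.py', '--port', '22310', '--device_id', '0', '--whisper_port', '3348']
```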

## Example: Ichigo Python Model

Ichigo is a built-in Cortex Python model for chat with audio support.

### Required Models

Ichigo requires these models:

- `ichigo-0.5`
- `whispervq`
- `ichigo-0.4`
- `fish-speech` (optional, for text-to-speech)

Download the models for your platform (the examples below are for Linux AMD64):

```bash
curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.4:8b-gguf-q4-km"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"whispervq:fp16-linux-amd64"}'

curl --location '127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{"model":"fish-speech:fp16-linux-amd64"}'
```

### Model Management

Start a model:

```bash
curl --location '127.0.0.1:39281/v1/models/start' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'
```

Check model status:

```bash
curl --location '127.0.0.1:39281/v1/models/status/fish-speech:fp16-linux-amd64'
```

Stop a model:

```bash
curl --location '127.0.0.1:39281/v1/models/stop' \
  --header 'Content-Type: application/json' \
  --data '{"model":"ichigo-0.5:fp16-linux-amd64"}'
```

### Inference

Example inference request:


```bash
curl --location '127.0.0.1:39281/v1/inference' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "ichigo-0.5:fp16-linux-amd64",
    "engine": "python-engine",
    "body": {
      "messages": [{
        "role": "system",
        "content": "you are a helpful assistant, you must answer questions short and concise!"
      }],
      "input_audio": {
        "data": "base64_encoded_audio_data",
        "format": "wav"
      },
      "model": "ichigo-0.4:8b-gguf-q4km",
      "stream": true,
      "temperature": 0.7,
      "top_p": 0.9,
      "max_tokens": 2048,
      "presence_penalty": 0,
      "frequency_penalty": 0,
      "stop": ["<|eot_id|>"],
      "output_audio": true
    }
  }'
```
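
The `input_audio.data` field carries base64-encoded audio. A minimal Python client sketch that encodes a local WAV file and sends the same request (the file name is illustrative, and `stream` is disabled here so the response can be read as a single JSON object):

```python
import base64
import requests

# Encode a local WAV file for the input_audio field (path is illustrative).
with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "ichigo-0.5:fp16-linux-amd64",
    "engine": "python-engine",
    "body": {
        "messages": [{
            "role": "system",
            "content": "you are a helpful assistant, you must answer questions short and concise!"
        }],
        "input_audio": {"data": audio_b64, "format": "wav"},
        "model": "ichigo-0.4:8b-gguf-q4km",
        "stream": False,
        "max_tokens": 2048,
    },
}

resp = requests.post("http://127.0.0.1:39281/v1/inference", json=payload)
print(resp.json())
```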

### Route Requests

Generic request routing example:


```bash
curl --location '127.0.0.1:39281/v1/route/request' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "whispervq:fp16",
    "path": "/inference",
    "engine": "python-engine",
    "method": "post",
    "transform_response": "{ {%- set first = true -%} {%- for key, value in input_request -%} {%- if key == \"tokens\" -%} {%- if not first -%},{%- endif -%} \"{{ key }}\": {{ tojson(value) }} {%- set first = false -%} {%- endif -%} {%- endfor -%} }",
    "body": {
      "data": "base64 data",
      "format": "wav"
    }
  }'
```
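
The `transform_response` field is a template applied to the backend's JSON response before Cortex returns it to the caller; the template above keeps only the `tokens` key. A rough Python equivalent of that filtering logic:

```python
import json

def transform_response(backend_response: dict) -> str:
    # Keep only the "tokens" key, as the template above does.
    kept = {k: v for k, v in backend_response.items() if k == "tokens"}
    return json.dumps(kept)

print(transform_response({"tokens": [1, 2, 3], "latency_ms": 42}))
# {"tokens": [1, 2, 3]}
```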

## Adding New Python Models

### Implementation Requirements

Python models must expose at least two endpoints:

- `/health`: server status check
- `/inference`: model inference
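
For orientation, a minimal self-contained server exposing just these two endpoints might look like the sketch below (the request schema is a placeholder; the real fields depend on the model). The fuller example that follows shows how the built-in Ichigo wrapper structures the same thing.

```python
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    # Placeholder schema; the real fields depend on the model.
    data: str
    format: str = "wav"

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/inference")
def inference(req: InferenceRequest):
    # Run the model here; this echo is just a placeholder.
    return {"result": f"received {req.format} payload"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=22310)
```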

Example server implementation:


```python
import argparse
import os
from pathlib import Path
from typing import List

import uvicorn
from dotenv import load_dotenv
from fastapi import APIRouter, FastAPI

# These modules ship inside the model package itself.
from common.utility.logger_utility import LoggerUtility
from services.audio.audio_controller import AudioController
from services.audio.implementation.audio_service import AudioService
from services.health.health_controller import HealthController


def create_app() -> FastAPI:
    # Register the /health and /inference (audio) routers.
    routes: List[APIRouter] = [
        HealthController(),
        AudioController(),
    ]
    app = FastAPI()
    for route in routes:
        app.include_router(route)
    return app


def parse_argument():
    # These flags are supplied by the engine from model.yml (port, extra_params).
    parser = argparse.ArgumentParser(description="Ichigo-wrapper Application")
    parser.add_argument('--log_path', type=str, default='Ichigo-wrapper.log', help='The log file path')
    parser.add_argument('--log_level', type=str, default='INFO', choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'TRACE'])
    parser.add_argument('--port', type=int, default=22310)
    parser.add_argument('--device_id', type=str, default="0")
    parser.add_argument('--package_dir', type=str, default="")
    parser.add_argument('--whisper_port', type=int, default=3348)
    parser.add_argument('--ichigo_port', type=int, default=39281)
    parser.add_argument('--fish_speech_port', type=int, default=22312)
    parser.add_argument('--ichigo_model', type=str, default="ichigo:8b-gguf-q4-km")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_argument()
    LoggerUtility.init_logger(__name__, args.log_level, args.log_path)

    env_path = Path(os.path.dirname(os.path.realpath(__file__))) / "variables" / ".env"
    AudioService.initialize(args.whisper_port, args.ichigo_port, args.fish_speech_port, args.ichigo_model)
    load_dotenv(dotenv_path=env_path)

    app = create_app()
    print(f"Server is running at: 0.0.0.0:{args.port}")
    uvicorn.run(app=app, host="0.0.0.0", port=args.port)
```

### Deployment

1. Create the model files, following the example above.
2. Add the required `requirements.txt` and `requirements.cuda.txt` files.
3. Trigger the Python Script Package CI.
4. Trigger the Python Venv Package CI.

The CI pipelines build and publish your model to Hugging Face, from where it can be downloaded and used.