# cortex models

This command allows you to start, stop, and manage local and remote models within Cortex.

Usage:

> **Info:** You can use the `--verbose` flag to display more detailed output of the internal processes. To apply this flag, use the following format: `cortex --verbose [subcommand]`.


```sh
cortex models [options] [subcommand]
```
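For instance, to list models with verbose logging enabled (combining the flag format above with the `list` subcommand documented below):

```sh
# Show detailed internal logs while listing models
cortex --verbose models list
```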

Options:

| Option | Description | Required | Default value | Example |
|---|---|---|---|---|
| `-h`, `--help` | Display help information for the command. | No | - | `-h` |

Subcommands:

## cortex models get

> **Info:** This CLI command calls the corresponding API endpoint on the Cortex server.

This command returns the details of a model defined by a `model_id`.

Usage:


```sh
cortex models get <model_id>
```

For example, it returns the following:


```json
{
  "ai_template": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "created": 127638593791813,
  "ctx_len": 8192,
  "dynatemp_exponent": 1.0,
  "dynatemp_range": 0.0,
  "engine": "llama-cpp",
  "files": [
    "models/cortex.so/llama3.1/8b-gguf-q4-km/model.gguf"
  ],
  "frequency_penalty": 0.0,
  "gpu_arch": "",
  "id": "llama3.1:8b-gguf-q4-km",
  "ignore_eos": false,
  "max_tokens": 8192,
  "min_keep": 0,
  "min_p": 0.050000000000000003,
  "mirostat": false,
  "mirostat_eta": 0.10000000000000001,
  "mirostat_tau": 5.0,
  "model": "llama3.1:8b-gguf-q4-km",
  "n_parallel": 1,
  "n_probs": 0,
  "name": "llama3.1:8b-gguf-q4-km",
  "ngl": 33,
  "object": "",
  "os": "",
  "owned_by": "",
  "penalize_nl": false,
  "precision": "",
  "presence_penalty": 0.0,
  "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "quantization_method": "",
  "repeat_last_n": 64,
  "repeat_penalty": 1.0,
  "seed": -1,
  "size": 4920739981,
  "stop": [
    "<|end_of_text|>",
    "<|eot_id|>",
    "<|eom_id|>"
  ],
  "stream": true,
  "system_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n",
  "temperature": 0.59999999999999998,
  "text_model": false,
  "tfs_z": 1.0,
  "top_k": 40,
  "top_p": 0.90000000000000002,
  "typ_p": 1.0,
  "user_template": "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n",
  "version": "1"
}
```

> **Info:** This command uses a `model_id` from a model that you have downloaded or that is available in your file system.

Options:

| Option | Description | Required | Default value | Example |
|---|---|---|---|---|
| `model_id` | The identifier of the model you want to retrieve. | Yes | - | `mistral` |
| `-h`, `--help` | Display help information for the command. | No | - | `-h` |
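For example, to retrieve the details of the `llama3.1:8b-gguf-q4-km` model from the sample output above:

```sh
# Print the stored configuration for a downloaded model
cortex models get llama3.1:8b-gguf-q4-km
```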

## cortex models list

> **Info:** This CLI command calls the corresponding API endpoint on the Cortex server.

This command lists all the downloaded local and remote models.

Usage:


```sh
cortex models list [options]
```

For example, it returns the following:


```
+---------+---------------------------------------------------------------------------+
| (Index) | ID                                                                        |
+---------+---------------------------------------------------------------------------+
| 1       | llama3.2:3b-gguf-q4-km                                                    |
+---------+---------------------------------------------------------------------------+
| 2       | tinyllama:1b-gguf                                                         |
+---------+---------------------------------------------------------------------------+
| 3       | TheBloke:Mistral-7B-Instruct-v0.1-GGUF:mistral-7b-instruct-v0.1.Q2_K.gguf |
+---------+---------------------------------------------------------------------------+
```

Options:

| Option | Description | Required | Default value | Example |
|---|---|---|---|---|
| `-h`, `--help` | Display help for the command. | No | - | `-h` |
| `-e`, `--engine` | Display the engine used by each model. | No | - | `--engine` |
| `-v`, `--version` | Display the version of each model. | No | - | `--version` |
| `--cpu_mode` | Display CPU mode. | No | - | `--cpu_mode` |
| `--gpu_mode` | Display GPU mode. | No | - | `--gpu_mode` |
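As an illustrative sketch, the flags above can be passed to show extra detail; this assumes they can be combined in a single invocation:

```sh
# List downloaded models along with their engine and version
cortex models list --engine --version
```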

## cortex models start

> **Info:** This CLI command calls the corresponding API endpoint on the Cortex server.

This command starts a model defined by a `model_id`.

Usage:


```sh
cortex models start [options] <model_id>
```

> **Info:** This command uses a `model_id` from a model that you have downloaded or that is available in your file system.

Options:

| Option | Description | Required | Default value | Example |
|---|---|---|---|---|
| `model_id` | The identifier of the model you want to start. | Yes | Prompt to select from the available models | `mistral` |
| `--gpus` | List of GPUs to use. | No | - | `[0,1]` |
| `--ctx_len` | Maximum context length for inference. | No | `min(8192, max_model_context_length)` | `1024` |
| `-h`, `--help` | Display help information for the command. | No | - | `-h` |
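For example, to start the `llama3.1:8b-gguf-q4-km` model from the earlier sample output on GPUs 0 and 1 with a reduced context length (the flag values here are illustrative):

```sh
# Start the model on GPUs 0 and 1 with a 1024-token context window
cortex models start --gpus [0,1] --ctx_len 1024 llama3.1:8b-gguf-q4-km
```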

## cortex models stop

> **Info:** This CLI command calls the corresponding API endpoint on the Cortex server.

This command stops a model defined by a `model_id`.

Usage:


```sh
cortex models stop <model_id>
```

> **Info:** This command uses the `model_id` of a model that you have previously started.

Options:

| Option | Description | Required | Default value | Example |
|---|---|---|---|---|
| `model_id` | The identifier of the model you want to stop. | Yes | - | `mistral` |
| `-h`, `--help` | Display help information for the command. | No | - | `-h` |
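For example, to stop the model started in the previous section:

```sh
# Unload the running model from memory
cortex models stop llama3.1:8b-gguf-q4-km
```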

## cortex models delete

> **Info:** This CLI command calls the corresponding API endpoint on the Cortex server.

This command deletes a local model defined by a `model_id`.

Usage:


```sh
cortex models delete <model_id>
```

> **Info:** This command uses a `model_id` from a model that you have downloaded or that is available in your file system.

Options:

| Option | Description | Required | Default value | Example |
|---|---|---|---|---|
| `model_id` | The identifier of the model you want to delete. | Yes | - | `mistral` |
| `-h`, `--help` | Display help for the command. | No | - | `-h` |
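For example, to delete the `tinyllama:1b-gguf` model from the earlier `list` output:

```sh
# Remove the model and its files from local storage
cortex models delete tinyllama:1b-gguf
```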

## cortex models update

> **Info:** This CLI command calls the corresponding API endpoint on the Cortex server.

This command updates the `model.yaml` file of a local model.

Usage:


```sh
cortex models update [options]
```

Options:

| Option | Description | Required | Default value | Example |
|---|---|---|---|---|
| `-h`, `--help` | Display help for the command. | No | - | `-h` |
| `--model_id` | Unique identifier for the model. | Yes | - | `--model_id my_model` |
| `--name` | Name of the model. | No | - | `--name "GPT Model"` |
| `--model` | Model type or architecture. | No | - | `--model GPT-4` |
| `--version` | Version of the model to use. | No | - | `--version 1.2.0` |
| `--stop` | Stop token to terminate generation. | No | - | `--stop "</s>"` |
| `--top_p` | Sampling parameter for nucleus sampling. | No | - | `--top_p 0.9` |
| `--temperature` | Controls randomness in generation. | No | - | `--temperature 0.8` |
| `--frequency_penalty` | Penalizes repeated tokens based on frequency. | No | - | `--frequency_penalty 0.5` |
| `--presence_penalty` | Penalizes repeated tokens based on presence. | No | 0.0 | `--presence_penalty 0.6` |
| `--max_tokens` | Maximum number of tokens to generate. | No | - | `--max_tokens 1500` |
| `--stream` | Stream output tokens as they are generated. | No | false | `--stream true` |
| `--ngl` | Number of model layers to offload to the GPU. | No | - | `--ngl 4` |
| `--ctx_len` | Maximum context length in tokens. | No | - | `--ctx_len 1024` |
| `--engine` | Compute engine for running the model. | No | - | `--engine CUDA` |
| `--prompt_template` | Template for the prompt structure. | No | - | `--prompt_template "###"` |
| `--system_template` | Template for system-level instructions. | No | - | `--system_template "SYSTEM"` |
| `--user_template` | Template for user inputs. | No | - | `--user_template "USER"` |
| `--ai_template` | Template for AI responses. | No | - | `--ai_template "ASSISTANT"` |
| `--os` | Operating system environment. | No | - | `--os Ubuntu` |
| `--gpu_arch` | GPU architecture specification. | No | - | `--gpu_arch A100` |
| `--quantization_method` | Quantization method for model weights. | No | - | `--quantization_method int8` |
| `--precision` | Floating-point precision for computations. | No | float32 | `--precision float16` |
| `--tp` | Tensor parallelism degree. | No | - | `--tp 4` |
| `--trtllm_version` | Version of the TensorRT-LLM library. | No | - | `--trtllm_version 2.0` |
| `--text_model` | The model used for text generation. | No | - | `--text_model llama2` |
| `--files` | File path or resources associated with the model. | No | - | `--files config.json` |
| `--created` | Creation date of the model. | No | - | `--created 2024-01-01` |
| `--object` | The object type (e.g., model or file). | No | - | `--object model` |
| `--owned_by` | The owner or creator of the model. | No | - | `--owned_by "Company"` |
| `--seed` | Seed for random number generation. | No | - | `--seed 42` |
| `--dynatemp_range` | Range for dynamic temperature scaling. | No | - | `--dynatemp_range 0.7-1.0` |
| `--dynatemp_exponent` | Exponent for dynamic temperature scaling. | No | - | `--dynatemp_exponent 1.2` |
| `--top_k` | Top-K sampling to limit token selection. | No | - | `--top_k 50` |
| `--min_p` | Minimum probability threshold for tokens. | No | - | `--min_p 0.1` |
| `--tfs_z` | Tail-free sampling parameter (z). | No | - | `--tfs_z 0.5` |
| `--typ_p` | Typical sampling probability for token selection. | No | - | `--typ_p 0.9` |
| `--repeat_last_n` | Number of recent tokens to consider for the repetition penalty. | No | - | `--repeat_last_n 64` |
| `--repeat_penalty` | Penalty for repeating tokens. | No | - | `--repeat_penalty 1.2` |
| `--mirostat` | Mirostat sampling method for stable generation. | No | - | `--mirostat 1` |
| `--mirostat_tau` | Target entropy for Mirostat. | No | - | `--mirostat_tau 5.0` |
| `--mirostat_eta` | Learning rate for Mirostat. | No | - | `--mirostat_eta 0.1` |
| `--penalize_nl` | Penalize newlines in generation. | No | false | `--penalize_nl true` |
| `--ignore_eos` | Ignore the end-of-sequence token. | No | false | `--ignore_eos true` |
| `--n_probs` | Number of probability outputs to return. | No | - | `--n_probs 5` |
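As a sketch, the following adjusts a few generation parameters of a local model; the model id and values are illustrative:

```sh
# Lower the sampling temperature and shrink the context window
cortex models update --model_id llama3.1:8b-gguf-q4-km --temperature 0.8 --ctx_len 4096
```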

## cortex models import

This command imports a local model using the model's GGUF file.

Usage:

> **Info:** This CLI command calls the corresponding API endpoint on the Cortex server.

```sh
cortex models import --model_id <model_id> --model_path </path/to/your/model.gguf>
```

Options:

| Option | Description | Required | Default value | Example |
|---|---|---|---|---|
| `-h`, `--help` | Display help for the command. | No | - | `-h` |
| `--model_id` | The identifier of the model. | Yes | - | `mistral` |
| `--model_path` | The path of the model source file. | Yes | - | `/path/to/your/model.gguf` |
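For example, to register a GGUF file that is already on disk (the id and path below are placeholders):

```sh
# Create a model entry named my_model pointing at an existing GGUF file
cortex models import --model_id my_model --model_path /path/to/your/model.gguf
```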