Building Local AI Assistants

While Cortex doesn't yet support the full OpenAI Assistants API, we can build assistant-like functionality using the chat completions API. Here's how to create persistent, specialized assistants locally.

Get Started

First, fire up our model:


cortex run -d llama3.1:8b-gguf-q4-km

Set up your Python environment:


mkdir assistant-test
cd assistant-test
python -m venv .venv
source .venv/bin/activate
pip install openai

Creating an Assistant

Here's how to create an assistant-like experience using chat completions:


from typing import Dict, List

from openai import OpenAI


class LocalAssistant:
    """Chat-completions wrapper that keeps its own conversation history."""

    def __init__(self, name: str, instructions: str):
        # Point the OpenAI client at the local Cortex server; the API key is a placeholder
        self.client = OpenAI(
            base_url="http://localhost:39281/v1",
            api_key="not-needed"
        )
        self.name = name
        self.instructions = instructions
        self.conversation_history: List[Dict] = []

    def add_message(self, content: str, role: str = "user") -> str:
        # Prepare messages: system instructions, prior history, then the new message
        new_message = {"role": role, "content": content}
        messages = [
            {"role": "system", "content": self.instructions},
            *self.conversation_history,
            new_message
        ]
        # Get response
        response = self.client.chat.completions.create(
            model="llama3.1:8b-gguf-q4-km",
            messages=messages
        )
        # Record both turns only after the request succeeds, so a failed
        # call does not leave an unanswered message in the history
        assistant_message = response.choices[0].message.content
        self.conversation_history.append(new_message)
        self.conversation_history.append({"role": "assistant", "content": assistant_message})
        return assistant_message


# Create a coding assistant
coding_assistant = LocalAssistant(
    name="Code Buddy",
    instructions="""You are a helpful coding assistant who:
    - Explains concepts with practical examples
    - Provides working code snippets
    - Points out potential pitfalls
    - Keeps responses concise but informative"""
)

# Ask a question
response = coding_assistant.add_message("Can you explain Python list comprehensions with examples?")
print(response)

# Follow-up question (with conversation history maintained)
response = coding_assistant.add_message("Can you show a more complex example with filtering?")
print(response)
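
The history above lives only in memory, so it is lost when the script exits. A minimal sketch of saving and restoring it with the standard `json` module follows. The helper names `save_history` and `load_history` and the file path in the usage note are hypothetical, not part of Cortex or the OpenAI client.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_history(history: List[Dict], path: str) -> None:
    """Write the conversation history (a list of role/content dicts) to a JSON file."""
    Path(path).write_text(json.dumps(history, indent=2), encoding="utf-8")


def load_history(path: str) -> List[Dict]:
    """Read a saved history back, or return an empty list if the file does not exist."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))
```

With the `LocalAssistant` class, this could be used as `save_history(coding_assistant.conversation_history, "code_buddy.json")` at the end of a session and `coding_assistant.conversation_history = load_history("code_buddy.json")` at the start of the next one.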

Specialized Assistants

You can create different types of assistants by changing the instructions:


# Math tutor assistant
math_tutor = LocalAssistant(
    name="Math Buddy",
    instructions="""You are a patient math tutor who:
    - Breaks down problems step by step
    - Uses clear explanations
    - Provides practice problems
    - Encourages understanding over memorization"""
)

# Writing assistant
writing_assistant = LocalAssistant(
    name="Writing Buddy",
    instructions="""You are a writing assistant who:
    - Helps improve clarity and structure
    - Suggests better word choices
    - Maintains the author's voice
    - Explains the reasoning behind suggestions"""
)
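
Because these assistants differ only in their instructions, the prompt text can be generated from a role and a list of behaviors. The sketch below assumes a hypothetical helper, `build_instructions`, that reproduces the prompt format used above.

```python
from typing import List


def build_instructions(role: str, behaviors: List[str]) -> str:
    """Format a role and a list of behaviors as a system prompt."""
    bullets = "\n".join(f"- {behavior}" for behavior in behaviors)
    return f"You are {role} who:\n{bullets}"


math_instructions = build_instructions(
    "a patient math tutor",
    ["Breaks down problems step by step", "Uses clear explanations"],
)
print(math_instructions)
```

The resulting string can be passed as the `instructions` argument of `LocalAssistant`.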

Working with Context

Here's how to create an assistant that can work with context:


class ContextAwareAssistant(LocalAssistant):
    def __init__(self, name: str, instructions: str, context: str):
        super().__init__(name, instructions)
        self.context = context

    def add_message(self, content: str, role: str = "user") -> str:
        # Include context in the system message
        messages = [
            {"role": "system", "content": f"{self.instructions}\n\nContext:\n{self.context}"},
            *self.conversation_history,
            {"role": role, "content": content}
        ]
        response = self.client.chat.completions.create(
            model="llama3.1:8b-gguf-q4-km",
            messages=messages
        )
        assistant_message = response.choices[0].message.content
        self.conversation_history.append({"role": role, "content": content})
        self.conversation_history.append({"role": "assistant", "content": assistant_message})
        return assistant_message


# Example usage with code review context
code_context = """
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)
"""

code_reviewer = ContextAwareAssistant(
    name="Code Reviewer",
    instructions="You are a helpful code reviewer. Suggest improvements while being constructive.",
    context=code_context
)

response = code_reviewer.add_message("Can you review this code and suggest improvements?")
print(response)
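
The context string can also be assembled from several sources before it is handed to the assistant. The sketch below assumes a hypothetical helper, `format_context`, that labels each snippet and skips any snippet that would push the total past a character budget. The default of 4000 characters is an arbitrary assumption, not a Cortex or model limit.

```python
from typing import Dict


def format_context(snippets: Dict[str, str], max_chars: int = 4000) -> str:
    """Join labeled snippets into one context string of at most max_chars characters."""
    parts = []
    used = 0  # Characters consumed so far, including the blank-line separators
    for label, text in snippets.items():
        block = f"### {label}\n{text.strip()}"
        if used + len(block) > max_chars:
            continue  # Skip snippets that do not fit in the remaining budget
        parts.append(block)
        used += len(block) + 2  # Account for the "\n\n" separator that follows
    return "\n\n".join(parts)
```

The result can be passed as the `context` argument, for example `ContextAwareAssistant(name="Code Reviewer", instructions="...", context=format_context({"average.py": code_context}))`.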

Pro Tips

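Keep the conversation history focused: clear it when starting a new topic
Use specific instructions to get better responses
Tune temperature and max_tokens for the use case: a lower temperature gives more deterministic output, and max_tokens caps the length of the response
Remember that the chat completions API is stateless: the model sees only the messages sent with each request, so you must maintain context yourself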
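
As a rough sketch of the temperature and max_tokens tip, the request arguments can be built from named presets and then passed to `chat.completions.create`. The preset names, their values, and the `build_request` helper are illustrative assumptions to adjust, not recommended settings.

```python
from typing import Dict, List

# Illustrative presets; the values are assumptions to tune, not recommendations.
PRESETS: Dict[str, Dict] = {
    "precise": {"temperature": 0.2, "max_tokens": 512},    # e.g. code or factual answers
    "creative": {"temperature": 0.9, "max_tokens": 1024},  # e.g. brainstorming
}


def build_request(instructions: str, history: List[Dict], preset: str = "precise",
                  model: str = "llama3.1:8b-gguf-q4-km") -> Dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "system", "content": instructions}, *history],
        **PRESETS[preset],
    }


request = build_request(
    "You are a helpful coding assistant.",
    [{"role": "user", "content": "Explain list comprehensions."}],
)
# With the model running: client.chat.completions.create(**request)
```

Passing an empty list as `history` is the equivalent of clearing the conversation when a new topic starts.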

Memory Management

For longer conversations, you might want to limit the history:


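def trim_conversation_history(self, max_messages: int = 10):
    # Add this method to LocalAssistant. The system instructions are not stored
    # in the history (they are prepended on every request), so only the
    # last N messages need to be kept.
    if len(self.conversation_history) > max_messages:
        self.conversation_history = self.conversation_history[-max_messages:]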
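
Slicing by message count alone can leave an assistant reply at the start of the history without the question that prompted it. One possible variant, sketched here as a standalone function with the hypothetical name `trim_history`, drops leading messages until the kept history starts on a user turn.

```python
from typing import Dict, List


def trim_history(history: List[Dict], max_messages: int = 10) -> List[Dict]:
    """Return at most max_messages recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    trimmed = history[-max_messages:]
    # Drop leading non-user messages so no answer appears without its question
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

It returns a new list, so it could be applied as `assistant.conversation_history = trim_history(assistant.conversation_history)` before each request.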

That's it! While we don't have the full Assistants API yet, we can still create powerful assistant-like experiences using the chat completions API. The best part? It's all running locally on your machine.
