Building Local AI Assistants
While Cortex doesn't yet support the full OpenAI Assistants API, we can build assistant-like functionality using the chat completions API. Here's how to create persistent, specialized assistants locally.
Get Started
First, fire up our model:
cortex run -d llama3.1:8b-gguf-q4-km
Set up your Python environment:
mkdir assistant-test
cd assistant-test
python -m venv .venv
source .venv/bin/activate
pip install openai
Creating an Assistant
Here's how to create an assistant-like experience using chat completions:
from openai import OpenAI
from typing import List, Dict

class LocalAssistant:
    def __init__(self, name: str, instructions: str):
        self.client = OpenAI(
            base_url="http://localhost:39281/v1",
            api_key="not-needed"
        )
        self.name = name
        self.instructions = instructions
        self.conversation_history: List[Dict] = []

    def add_message(self, content: str, role: str = "user") -> str:
        # Add message to history
        self.conversation_history.append({"role": role, "content": content})

        # Prepare messages with system instructions and history
        messages = [
            {"role": "system", "content": self.instructions},
            *self.conversation_history
        ]

        # Get response
        response = self.client.chat.completions.create(
            model="llama3.1:8b-gguf-q4-km",
            messages=messages
        )

        # Add assistant's response to history
        assistant_message = response.choices[0].message.content
        self.conversation_history.append({"role": "assistant", "content": assistant_message})

        return assistant_message

# Create a coding assistant
coding_assistant = LocalAssistant(
    name="Code Buddy",
    instructions="""You are a helpful coding assistant who:
    - Explains concepts with practical examples
    - Provides working code snippets
    - Points out potential pitfalls
    - Keeps responses concise but informative"""
)

# Ask a question
response = coding_assistant.add_message("Can you explain Python list comprehensions with examples?")
print(response)

# Follow-up question (with conversation history maintained)
response = coding_assistant.add_message("Can you show a more complex example with filtering?")
print(response)
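The class above keeps its history in memory only, so the conversation is lost when the process exits. One way to make an assistant persist across runs is to write the history to disk. This is a minimal sketch, not part of Cortex or the openai package: the helper names save_history and load_history are hypothetical, and it assumes the history is the plain list of role/content dictionaries used above.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_history(history: List[Dict[str, str]], path: str) -> None:
    """Write the conversation history to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(history, indent=2), encoding="utf-8")


def load_history(path: str) -> List[Dict[str, str]]:
    """Read a saved history, or return an empty list if no file exists yet."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))
```

With these in place, you could assign load_history("code_buddy.json") to coding_assistant.conversation_history before the first question and call save_history(coding_assistant.conversation_history, "code_buddy.json") after each answer. The file name is only an example.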
Specialized Assistants
You can create different types of assistants by changing the instructions:
# Math tutor assistant
math_tutor = LocalAssistant(
    name="Math Buddy",
    instructions="""You are a patient math tutor who:
    - Breaks down problems step by step
    - Uses clear explanations
    - Provides practice problems
    - Encourages understanding over memorization"""
)

# Writing assistant
writing_assistant = LocalAssistant(
    name="Writing Buddy",
    instructions="""You are a writing assistant who:
    - Helps improve clarity and structure
    - Suggests better word choices
    - Maintains the author's voice
    - Explains the reasoning behind suggestions"""
)
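Each set of instructions above follows the same shape: a role followed by a bulleted list of behaviors. If you define many assistants, a small helper can build that string for you. The function name build_instructions is hypothetical and the sketch only does string formatting; it makes no calls to the model.

```python
from typing import List


def build_instructions(role: str, behaviors: List[str]) -> str:
    """Format a role and a list of behaviors as a bulleted system prompt."""
    bullets = "\n".join(f"- {behavior}" for behavior in behaviors)
    return f"You are {role} who:\n{bullets}"


math_instructions = build_instructions(
    "a patient math tutor",
    ["Breaks down problems step by step", "Uses clear explanations"],
)
print(math_instructions)
```

The result can be passed as the instructions argument when creating a LocalAssistant.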
Working with Context
To ground an assistant in a fixed piece of reference material, such as a code snippet, include that material in the system message on every request:
class ContextAwareAssistant(LocalAssistant):
    def __init__(self, name: str, instructions: str, context: str):
        super().__init__(name, instructions)
        self.context = context

    def add_message(self, content: str, role: str = "user") -> str:
        # Include context in the system message
        messages = [
            {"role": "system", "content": f"{self.instructions}\n\nContext:\n{self.context}"},
            *self.conversation_history,
            {"role": role, "content": content}
        ]

        response = self.client.chat.completions.create(
            model="llama3.1:8b-gguf-q4-km",
            messages=messages
        )

        assistant_message = response.choices[0].message.content
        self.conversation_history.append({"role": role, "content": content})
        self.conversation_history.append({"role": "assistant", "content": assistant_message})

        return assistant_message

# Example usage with code review context
code_context = """
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)
"""

code_reviewer = ContextAwareAssistant(
    name="Code Reviewer",
    instructions="You are a helpful code reviewer. Suggest improvements while being constructive.",
    context=code_context
)

response = code_reviewer.add_message("Can you review this code and suggest improvements?")
print(response)
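In practice the context often lives in files rather than in a string literal. The following sketch shows one way to assemble a context string from several files. The helper name load_context and the max_chars cap are assumptions made for illustration; the cap counts characters, not tokens, so treat it as a rough guard only.

```python
from pathlib import Path
from typing import List


def load_context(paths: List[str], max_chars: int = 4000) -> str:
    """Concatenate files into one context string, labeled by file name.

    max_chars is a rough, hypothetical cap that keeps the prompt small.
    """
    sections = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        sections.append(f"# File: {Path(p).name}\n{text}")
    context = "\n\n".join(sections)
    # Truncate by characters; this may cut the last file short.
    return context[:max_chars]
```

The returned string can be passed as the context argument of ContextAwareAssistant.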
Pro Tips
- Keep the conversation history focused - clear it when starting a new topic
- Use specific instructions to get better responses
- Adjust temperature and max_tokens to the use case: a lower temperature gives more deterministic output (useful for code), and max_tokens caps the response length
- Remember that the chat completions API is stateless: the server keeps no memory between requests, so you must resend the conversation history yourself
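The temperature and max_tokens tip can be sketched as a set of named presets that are merged into each request. Both are standard chat completions parameters, and this assumes the local server honors them the way the OpenAI API does. The preset names and values below are illustrative starting points rather than tuned recommendations, and build_request is a hypothetical helper.

```python
from typing import Any, Dict, List

# Illustrative starting points, not tuned recommendations.
PRESETS: Dict[str, Dict[str, Any]] = {
    "precise": {"temperature": 0.2, "max_tokens": 512},
    "creative": {"temperature": 0.9, "max_tokens": 1024},
}


def build_request(
    instructions: str,
    history: List[Dict[str, str]],
    preset: str = "precise",
    model: str = "llama3.1:8b-gguf-q4-km",
) -> Dict[str, Any]:
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        # The system message is rebuilt on every call because the API is stateless.
        "messages": [{"role": "system", "content": instructions}, *history],
        **PRESETS[preset],
    }
```

A call would then look like client.chat.completions.create(**build_request(instructions, history, preset="creative")). To start a new topic, clear the stored history with conversation_history.clear() before the next request.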
Memory Management
For longer conversations, the history can outgrow the model's context window, so you might want to cap it. Add a method like this to LocalAssistant:
def trim_conversation_history(self, max_messages: int = 10):
    if len(self.conversation_history) > max_messages:
        # Keep only the last N messages (the system message is added per request, not stored here)
        self.conversation_history = self.conversation_history[-max_messages:]
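One caveat with a plain slice: depending on when it runs and whether max_messages is odd or even, the trimmed history can begin with an assistant reply whose question was cut off. The sketch below is a standalone variant that drops such orphaned replies so the history always starts on a user turn. The function name trim_history is hypothetical, and it assumes each message is a dictionary with a "role" key, as in the classes above.

```python
from typing import Dict, List


def trim_history(
    history: List[Dict[str, str]], max_messages: int = 10
) -> List[Dict[str, str]]:
    """Return the most recent messages, starting on a user turn."""
    # history[-0:] would return the whole list, so handle zero explicitly.
    trimmed = history[-max_messages:] if max_messages > 0 else []
    # Drop leading assistant replies whose question was trimmed away.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

It returns a new list rather than modifying the input, so you would assign the result back to conversation_history.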
That's it! While we don't have the full Assistants API yet, we can still create powerful assistant-like experiences using the chat completions API. The best part? It's all running locally on your machine.