API Reference
This page provides detailed documentation for the local_llm_kit API.
LLMClient
- class LLMClient(model: str, **kwargs)
The main client class for interacting with local language models.
- Parameters:
model – The name or path of the model to use
model_path – Optional path to model weights
context_length – Maximum context length (default: 2048)
temperature – Sampling temperature (default: 0.7)
top_p – Top-p sampling parameter (default: 0.9)
backend – Model backend to use (‘transformers’ or ‘llama.cpp’)
Chat Completions
- LLMClient.chat.completions.create(**kwargs)
Create a chat completion.
- Parameters:
model – Model to use for completion
messages – List of message dictionaries
temperature – Sampling temperature
top_p – Top-p sampling parameter
max_tokens – Maximum tokens to generate
stream – Whether to stream the response
functions – List of function definitions
function_call – Function call behavior
response_format – Specify response format (e.g., JSON)
- Returns:
CompletionResponse object
Memory Management
- LLMClient.enable_memory(max_tokens: int = 1000)
Enable conversation memory management.
- Parameters:
max_tokens – Maximum tokens to store in memory
- LLMClient.add_to_memory(messages: List[Dict])
Add messages to conversation memory.
- Parameters:
messages – List of message dictionaries
- LLMClient.clear_memory()
Clear all stored conversation memory.
Response Objects
CompletionResponse
- class CompletionResponse
Represents a completion response.
- Parameters:
id – Response ID
object – Object type
created – Creation timestamp
model – Model used
choices – List of completion choices
usage – Token usage statistics
Choice
- class Choice
Represents a completion choice.
- Parameters:
index – Choice index
message – Message content
finish_reason – Reason for completion
Message
- class Message
Represents a chat message.
- Parameters:
role – Message role (user/assistant/system)
content – Message content
function_call – Optional function call
Usage
- class Usage
Token usage statistics.
- Parameters:
prompt_tokens – Number of tokens in prompt
completion_tokens – Number of tokens in completion
total_tokens – Total tokens used
Exceptions
- exception ModelNotFoundError
Raised when specified model is not found.
- exception InvalidRequestError
Raised when request parameters are invalid.
- exception TokenLimitError
Raised when token limit is exceeded.
Configuration
The following environment variables can be used to configure the client:
LOCAL_LLM_KIT_MODEL_PATH: Default path to model weightsLOCAL_LLM_KIT_BACKEND: Default backend to useLOCAL_LLM_KIT_CONTEXT_LENGTH: Default context lengthLOCAL_LLM_KIT_CACHE_DIR: Directory for caching model weights