API Reference

This page provides detailed documentation for the local_llm_kit API.

LLMClient

class LLMClient(model: str, **kwargs)

The main client class for interacting with local language models.

Parameters:

model – The name or path of the model to use
model_path – Optional path to model weights
context_length – Maximum context length (default: 2048)
temperature – Sampling temperature (default: 0.7)
top_p – Top-p sampling parameter (default: 0.9)
backend – Model backend to use (‘transformers’ or ‘llama.cpp’)

Chat Completions

LLMClient.chat.completions.create(**kwargs)

Create a chat completion.

Parameters:

model – Model to use for completion
messages – List of message dictionaries
temperature – Sampling temperature
top_p – Top-p sampling parameter
max_tokens – Maximum tokens to generate
stream – Whether to stream the response
functions – List of function definitions
function_call – Function call behavior
response_format – Specify response format (e.g., JSON)

Returns:

CompletionResponse object

Memory Management

LLMClient.enable_memory(max_tokens: int = 1000)

Enable conversation memory management.

Parameters:: max_tokens – Maximum tokens to store in memory

LLMClient.add_to_memory(messages: List[Dict])

Add messages to conversation memory.

Parameters:: messages – List of message dictionaries

LLMClient.clear_memory(): Clear all stored conversation memory.

Response Objects

CompletionResponse

class CompletionResponse

Represents a completion response.

Parameters:

id – Response ID
object – Object type
created – Creation timestamp
model – Model used
choices – List of completion choices
usage – Token usage statistics

Choice

class Choice

Represents a completion choice.

Parameters:

index – Choice index
message – Message content
finish_reason – Reason for completion

Message

class Message

Represents a chat message.

Parameters:

role – Message role (user/assistant/system)
content – Message content
function_call – Optional function call

Usage

class Usage

Token usage statistics.

Parameters:

prompt_tokens – Number of tokens in prompt
completion_tokens – Number of tokens in completion
total_tokens – Total tokens used

Exceptions

exception ModelNotFoundError: Raised when specified model is not found.

exception InvalidRequestError: Raised when request parameters are invalid.

exception TokenLimitError: Raised when token limit is exceeded.

Configuration

The following environment variables can be used to configure the client:

LOCAL_LLM_KIT_MODEL_PATH: Default path to model weights
LOCAL_LLM_KIT_BACKEND: Default backend to use
LOCAL_LLM_KIT_CONTEXT_LENGTH: Default context length
LOCAL_LLM_KIT_CACHE_DIR: Directory for caching model weights