API Reference

This page provides detailed documentation for the local_llm_kit API.

LLMClient

class LLMClient(model: str, **kwargs)

The main client class for interacting with local language models.

Parameters:
  • model – The name or path of the model to use

  • model_path – Optional path to model weights

  • context_length – Maximum context length (default: 2048)

  • temperature – Sampling temperature (default: 0.7)

  • top_p – Top-p sampling parameter (default: 0.9)

  • backend – Model backend to use (‘transformers’ or ‘llama.cpp’)

Chat Completions

LLMClient.chat.completions.create(**kwargs)

Create a chat completion.

Parameters:
  • model – Model to use for completion

  • messages – List of message dictionaries

  • temperature – Sampling temperature

  • top_p – Top-p sampling parameter

  • max_tokens – Maximum tokens to generate

  • stream – Whether to stream the response

  • functions – List of function definitions

  • function_call – Function call behavior

  • response_format – Specify response format (e.g., JSON)

Returns:

CompletionResponse object

Memory Management

LLMClient.enable_memory(max_tokens: int = 1000)

Enable conversation memory management.

Parameters:

max_tokens – Maximum tokens to store in memory

LLMClient.add_to_memory(messages: List[Dict])

Add messages to conversation memory.

Parameters:

messages – List of message dictionaries

LLMClient.clear_memory()

Clear all stored conversation memory.

Response Objects

CompletionResponse

class CompletionResponse

Represents a completion response.

Parameters:
  • id – Response ID

  • object – Object type

  • created – Creation timestamp

  • model – Model used

  • choices – List of completion choices

  • usage – Token usage statistics

Choice

class Choice

Represents a completion choice.

Parameters:
  • index – Choice index

  • message – Message content

  • finish_reason – Reason for completion

Message

class Message

Represents a chat message.

Parameters:
  • role – Message role (user/assistant/system)

  • content – Message content

  • function_call – Optional function call

Usage

class Usage

Token usage statistics.

Parameters:
  • prompt_tokens – Number of tokens in prompt

  • completion_tokens – Number of tokens in completion

  • total_tokens – Total tokens used

Exceptions

exception ModelNotFoundError

Raised when specified model is not found.

exception InvalidRequestError

Raised when request parameters are invalid.

exception TokenLimitError

Raised when token limit is exceeded.

Configuration

The following environment variables can be used to configure the client:

  • LOCAL_LLM_KIT_MODEL_PATH: Default path to model weights

  • LOCAL_LLM_KIT_BACKEND: Default backend to use

  • LOCAL_LLM_KIT_CONTEXT_LENGTH: Default context length

  • LOCAL_LLM_KIT_CACHE_DIR: Directory for caching model weights