Settings
Configuration
Tune model selection, retrieval parameters, and chunking strategy.
Settings are saved to your browser and applied to all subsequent Workspace queries. The model selection and top-K are passed directly to the backend API on each request.
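As a minimal sketch of what "passed directly to the backend API" could look like: the settings object below and the endpoint field names are assumptions for illustration, not the app's actual API. Per the text above, only the model selection and top-K accompany each query; the chunking settings apply when documents are ingested.

```python
import json

# Assumed shape of the settings saved in the browser (hypothetical keys).
settings = {
    "model": "llama-3-70b",
    "top_k": 5,
    "chunk_size": 512,
    "chunk_overlap": 64,
    "temperature": 0.10,
}

def build_query_payload(query: str, settings: dict) -> str:
    """Serialize the per-request payload: only the model selection and
    top-K travel with each query; chunking settings are ingestion-time."""
    payload = {
        "query": query,
        "model": settings["model"],
        "top_k": settings["top_k"],
    }
    return json.dumps(payload)
```

A real client would POST this JSON string to the backend on each Workspace query.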
Model Selection
LLaMA 3 70B: Most capable; best for complex reasoning. ~1.4 s avg latency.
LLaMA 3 8B: Faster; good for simple factual queries. ~0.6 s avg latency.
Mixtral 8×7B: Large context window; good for long documents. ~1.1 s avg latency.
Vector Store
FAISS is recommended for local use. ChromaDB supports persistence and metadata filtering.
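To make the vector store's role concrete, here is a toy, dependency-free sketch of the retrieval step both backends perform: rank stored chunk embeddings by cosine similarity to the query embedding and return the top-K indices. This is not the FAISS or ChromaDB API, just the underlying idea.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, chunk_vecs, k):
    # Indices of the k stored chunks most similar to the query,
    # best match first. A real vector store uses optimized indexes
    # instead of this brute-force scan.
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

FAISS accelerates exactly this search in memory; ChromaDB adds a persistence layer and lets you filter candidates by chunk metadata before ranking.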
Retrieval Parameters
Top-K Chunks
Number of chunks retrieved per query
Current: 5 (range 1–20)
Chunk Size
Tokens per document chunk
Current: 512 tokens (range 128–2048)
Chunk Overlap
Token overlap between adjacent chunks
Current: 64 tokens (range 0–256)
Temperature
LLM generation randomness (0 = deterministic)
Current: 0.10 (range 0.00–1.00)
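Chunk size and overlap interact: each new chunk starts chunk_size minus overlap tokens after the previous one, so a larger overlap means more chunks and more duplicated context. A minimal sketch (the chunk_tokens helper is hypothetical; a real pipeline would chunk tokenizer output):

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    # Split a token list into overlapping chunks. With the defaults
    # above, each chunk starts 512 - 64 = 448 tokens after the last.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk already covers the tail
    return chunks
```

Overlap 0 gives disjoint chunks; pushing overlap toward chunk_size inflates the index and retrieves near-duplicate chunks, which wastes the top-K budget.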