Settings
Configuration
Tune model selection, retrieval parameters, and chunking strategy.
Settings are saved to your browser and applied to all subsequent Workspace queries. The model selection and top-K are passed directly to the backend API on each request.
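As a minimal sketch of what "passed directly to the backend API" could look like: the settings object below and the endpoint field names are assumptions for illustration, not the app's actual API. Per the text above, only the model selection and top-K accompany each query; the chunking settings apply when documents are ingested.

```python
import json

# Assumed shape of the settings saved in the browser (hypothetical keys).
settings = {
    "model": "llama-3-70b",
    "top_k": 5,
    "chunk_size": 512,
    "chunk_overlap": 64,
    "temperature": 0.10,
}

def build_query_payload(query: str, settings: dict) -> str:
    """Serialize the per-request payload: only the model selection and
    top-K travel with each query; chunking settings are ingestion-time."""
    payload = {
        "query": query,
        "model": settings["model"],
        "top_k": settings["top_k"],
    }
    return json.dumps(payload)
```

A real client would POST this JSON string to the backend on each Workspace query.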
Model Selection
LLaMA 3 70B: Most capable; best for complex reasoning. ~1.4 s avg latency.
LLaMA 3 8B: Faster; good for simple factual queries. ~0.6 s avg latency.
Mixtral 8×7B: Large context window; good for long documents. ~1.1 s avg latency.
Vector Store
FAISS is recommended for local use. ChromaDB supports persistence and metadata filtering.
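To make the vector store's role concrete, here is a toy, dependency-free sketch of the retrieval step both backends perform: rank stored chunk embeddings by cosine similarity to the query embedding and return the top-K indices. This is not the FAISS or ChromaDB API, just the underlying idea.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, chunk_vecs, k):
    # Indices of the k stored chunks most similar to the query,
    # best match first. A real vector store uses optimized indexes
    # instead of this brute-force scan.
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

FAISS accelerates exactly this search in memory; ChromaDB adds a persistence layer and lets you filter candidates by chunk metadata before ranking.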
Retrieval Parameters
Top-K Chunks
Number of chunks retrieved per query
Current: 5 (range 1–20)
Chunk Size
Tokens per document chunk
Current: 512 tokens (range 128–2048)
Chunk Overlap
Token overlap between adjacent chunks
Current: 64 tokens (range 0–256)
Temperature
LLM generation randomness (0 = deterministic)
Current: 0.10 (range 0.00–1.00)
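Chunk size and overlap interact: each new chunk starts chunk_size minus overlap tokens after the previous one, so a larger overlap means more chunks and more duplicated context. A minimal sketch (the chunk_tokens helper is hypothetical; a real pipeline would chunk tokenizer output):

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    # Split a token list into overlapping chunks. With the defaults
    # above, each chunk starts 512 - 64 = 448 tokens after the last.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk already covers the tail
    return chunks
```

Overlap 0 gives disjoint chunks; pushing overlap toward chunk_size inflates the index and retrieves near-duplicate chunks, which wastes the top-K budget.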