# Native models used by AnythingLLM
This folder serves as the local cache and storage location for native models that can run on a CPU.
Currently, AnythingLLM uses this folder for the following parts of the application.
## Embedding
When your embedding engine preference is `native`, we will use the ONNX all-MiniLM-L6-v2 model built by Xenova on HuggingFace.co. This model is a quantized, WASM-compatible version of the popular all-MiniLM-L6-v2 and produces a 384-dimension vector.

If you are using the `native` embedding engine, your vector database should be configured to accept 384-dimension vectors if that parameter is directly editable (Pinecone only).
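
For reference, here is a minimal sketch of producing one of these 384-dimension vectors with the same Xenova model through the `@xenova/transformers` package. This illustrates the underlying model rather than AnythingLLM's internal embedder code, and the sample sentence is arbitrary.

```js
import { pipeline } from "@xenova/transformers";

// Load the quantized all-MiniLM-L6-v2 feature-extraction pipeline (downloads on first use).
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

// Mean-pool and normalize to get a single sentence-level vector.
const result = await embedder("AnythingLLM is a private document chatbot.", {
  pooling: "mean",
  normalize: true,
});

console.log(result.data.length); // 384 — the dimension your vector database must accept
```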
## Text generation (LLM selection)
> [!IMPORTANT]
> Use of a locally running LLM is experimental and may behave unexpectedly, crash, or not function at all. For production use of a local LLM, we suggest a purpose-built inference server like LocalAI or LMStudio.

> [!TIP]
> We recommend using at least a 4-bit or 5-bit quantized model for your LLM. Models with lower-bit quantization tend to output unreadable garbage.
If you would like to use a local Llama-compatible LLM for chatting, you can select any model from this HuggingFace search filter.
### Requirements
- Model must be in the latest `GGUF` format
- Model should be compatible with the latest `llama.cpp`
- You should have enough RAM to run such a model; the requirement depends on model size.
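
To make the requirements above concrete, the sketch below shows roughly what loading and chatting with a GGUF model through the `node-llama-cpp` bindings looks like. This is an illustration under those assumptions, not AnythingLLM's internal loader, and the model filename is hypothetical.

```js
import path from "path";
import { LlamaModel, LlamaContext, LlamaChatSession } from "node-llama-cpp";

// Hypothetical filename — point this at whichever GGUF model you downloaded.
const modelPath = path.resolve("storage/models/downloaded", "llama-2-7b-chat.Q5_K_M.gguf");

const model = new LlamaModel({ modelPath }); // loads the GGUF weights into memory
const context = new LlamaContext({ model });
const session = new LlamaChatSession({ context });

const reply = await session.prompt("Summarize what AnythingLLM does in one sentence.");
console.log(reply);
```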
### Where do I put my GGUF model?
> [!IMPORTANT]
> If running in Docker, you should run the container with a storage location on the host machine mounted as a volume so you can update the storage files directly without having to re-download or re-build your Docker container. See the suggested Docker config.

All local models you want to have available for LLM selection should be placed in the `storage/models/downloaded` folder. Only `.gguf` files can be selected from the UI.
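
For illustration only, here is a minimal sketch of how the models in that folder could be discovered, keeping only `.gguf` files. The helper and example filename are hypothetical and are not AnythingLLM's actual selection code.

```js
import fs from "fs";
import path from "path";

// Hypothetical helper: list the GGUF files that would be selectable in the UI.
const downloadedDir = path.resolve("storage", "models", "downloaded");

const selectableModels = fs
  .readdirSync(downloadedDir)
  .filter((file) => file.toLowerCase().endsWith(".gguf"));

console.log(selectableModels); // e.g. [ "llama-2-7b-chat.Q5_K_M.gguf" ]
```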