# Native models used by AnythingLLM
This folder is specifically created as a local cache and storage folder used for native models that can run on a CPU.

Currently, AnythingLLM uses this folder for the following parts of the application.
## Embedding
When your embedding engine preference is `native`, we use the ONNX **all-MiniLM-L6-v2** model built by [Xenova on HuggingFace.co](https://huggingface.co/Xenova/all-MiniLM-L6-v2). This model is a quantized, WASM-compatible version of the popular [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and produces a 384-dimension vector.

If you are using the `native` embedding engine, your vector database should be configured to accept 384-dimension vectors if that parameter is directly editable (Pinecone only).
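Below is a minimal sketch of how this model can be loaded with the `@xenova/transformers` (transformers.js) library. It is illustrative only, not AnythingLLM's internal code, and the sample sentence is made up.

```ts
import { pipeline } from "@xenova/transformers";

// First use downloads and caches the quantized ONNX weights; afterwards the
// model runs entirely on the local CPU.
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

// Mean-pooled, normalized sentence embedding.
const output = await embedder("AnythingLLM stores native models in this folder.", {
  pooling: "mean",
  normalize: true,
});

console.log(output.dims); // [1, 384] -> one 384-dimension vector per input string
```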
## Audio/Video transcription
AnythingLLM allows you to upload various audio and video formats as source documents. In all cases, the audio track is transcribed to text by a locally running ONNX model, **whisper-small**, built by [Xenova on HuggingFace.co](https://huggingface.co/Xenova/whisper-small). This is a smaller version of the OpenAI Whisper model, and since it runs locally on the CPU, larger files will take longer to transcribe.

Once transcribed, you can embed these transcriptions into your workspace like you would any other file!
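A minimal sketch of how such a transcription pipeline is typically driven, again assuming the `@xenova/transformers` library; the silent placeholder audio stands in for the decoded track of an uploaded file.

```ts
import { pipeline } from "@xenova/transformers";

// Loads the quantized whisper-small ONNX weights and runs them on the CPU.
const transcriber = await pipeline("automatic-speech-recognition", "Xenova/whisper-small");

// Whisper expects 16 kHz mono PCM samples as a Float32Array; in practice the
// collector decodes/resamples the uploaded audio or video track first.
const audio = new Float32Array(16_000 * 5); // placeholder: 5 seconds of silence

// Long recordings are handled in overlapping 30-second chunks.
const { text } = await transcriber(audio, { chunk_length_s: 30, stride_length_s: 5 });
console.log(text);
```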
## Text generation (LLM selection)
> [!IMPORTANT]
> Use of a locally running LLM model is **experimental** and may behave unexpectedly, crash, or not function at all.
> For production use of a local LLM, we suggest a purpose-built inference server like [LocalAI](https://localai.io) or [LMStudio](https://lmstudio.ai).

> [!TIP]
> We recommend at _least_ a 4-bit or 5-bit quantized model for your LLM. Lower-quantization models tend to output unreadable garbage.

If you would like to use a local Llama-compatible LLM for chatting, you can select any model from this [HuggingFace search filter](https://huggingface.co/models?pipeline_tag=text-generation&library=gguf&other=text-generation-inference&sort=trending).

**Requirements**

- Model must be in the latest `GGUF` format
- Model should be compatible with the latest `llama.cpp`
- You must have enough RAM to run the model; the requirement depends on the model size.
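As an illustration of what loading such a model looks like, here is a sketch using `node-llama-cpp` (a common Node.js binding for `llama.cpp`, v2-style API). The model filename is hypothetical, and this is not necessarily the exact code path AnythingLLM uses.

```ts
import path from "node:path";
import { LlamaModel, LlamaContext, LlamaChatSession } from "node-llama-cpp";

// Hypothetical filename; any llama.cpp-compatible .gguf file placed in the
// downloaded-models folder is loaded the same way.
const modelPath = path.resolve(
  "server/storage/models/downloaded/mistral-7b-instruct.Q4_K_M.gguf"
);

const model = new LlamaModel({ modelPath });
const context = new LlamaContext({ model });
const session = new LlamaChatSession({ context });

const reply = await session.prompt("In one sentence, what does AnythingLLM do?");
console.log(reply);
```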
### Where do I put my GGUF model?
> [!IMPORTANT]
> If running in Docker, you should run the container with a storage location mounted from the host machine so you can update the model files directly without having to re-download or re-build your Docker container. [See suggested Docker config](../../../README.md#recommended-usage-with-docker-easy)

> [!NOTE]
> `/server/storage/models/downloaded` is the default location for your model files.
> Your storage directory may differ if you changed the `STORAGE_DIR` environment variable.

All local models you want to have available for LLM selection should be placed in the `server/storage/models/downloaded` folder. Only `.gguf` files can be selected from the UI.
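For reference, a small sketch of how that folder could be scanned so that only `.gguf` files are surfaced; the exact server logic may differ, and the `STORAGE_DIR` fallback simply mirrors the note above.

```ts
import fs from "node:fs";
import path from "node:path";

// Default storage root, overridable via the STORAGE_DIR environment variable.
const storageDir = process.env.STORAGE_DIR ?? path.resolve("server/storage");
const downloadedDir = path.join(storageDir, "models", "downloaded");

// Only .gguf files are valid selections; anything else in the folder is ignored.
const selectableModels = fs
  .readdirSync(downloadedDir)
  .filter((file) => file.toLowerCase().endsWith(".gguf"));

console.log("Selectable local models:", selectableModels);
```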