anything-llm

mirror of https://github.com/Mintplex-Labs/anything-llm.git synced 2024-11-04 22:10:12 +01:00

Open-source multi-user ChatGPT for all LLMs, embedders, and vector databases. Unlimited documents, messages, and users in one privacy-focused app.

Go to file

Sean Hatfield a126b5f5aa Replace custom sqlite dbms with prisma (#239 ) * WIP converted all sqlite models into prisma calls * modify db setup and fix ApiKey model calls in admin.js * renaming function params to be consistent * converted adminEndpoints to utilize prisma orm * converted chatEndpoints to utilize prisma orm * converted inviteEndpoints to utilize prisma orm * converted systemEndpoints to utilize prisma orm * converted workspaceEndpoints to utilize prisma orm * converting sql queries to prisma calls * fixed default param bug for orderBy and limit * fixed typo for workspace chats * fixed order of deletion to account for sql relations * fix invite CRUD and workspace management CRUD * fixed CRUD for api keys * created prisma setup scripts/docs for understanding how to use prisma * prisma dependency change * removing unneeded console.logs * removing unneeded sql escape function * linting and creating migration script * migration from depreciated sqlite script update * removing unneeded migrations in prisma folder * create backup of old sqlite db and use transactions to ensure all operations complete successfully * adding migrations to gitignore * updated PRISMA.md docs for info on how to use sqlite migration script * comment changes * adding back migrations folder to repo * Reviewing SQL and prisma integraiton on fresh repo * update inline key replacement * ensure migration script executes and maps foreign_keys regardless of db ordering * run migration endpoint * support new prisma backend * bump version * change migration call --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>		2023-09-28 14:00:03 -07:00
.vscode	Add Qdrant support for embedding, chat, and conversation (#192 )	2023-08-15 15:26:44 -07:00
cloud-deployments	changing the build behavior in the aws cloudformation template from [… (#247 )	2023-09-25 20:51:39 -07:00
collector	Display better error messages from document processor (#243 )	2023-09-18 16:50:20 -07:00
docker	Replace custom sqlite dbms with prisma (#239 )	2023-09-28 14:00:03 -07:00
frontend	Replace custom sqlite dbms with prisma (#239 )	2023-09-28 14:00:03 -07:00
images	add feedback form, hosting link, update readme, show promo image	2023-08-11 17:28:30 -07:00
server	Replace custom sqlite dbms with prisma (#239 )	2023-09-28 14:00:03 -07:00
.dockerignore	dont ignore frontend prod envs during docker build	2023-06-13 16:57:26 -07:00
.editorconfig	Franzbischoff document improvements (#241 )	2023-09-18 16:21:37 -07:00
.gitattributes	dockerfile cleanup; enforce text LF line endings (#81 )	2023-06-17 20:18:01 -07:00
.gitignore	Replace custom sqlite dbms with prisma (#239 )	2023-09-28 14:00:03 -07:00
.nvmrc	add .nvmrc in root	2023-06-08 10:35:36 -07:00
clean.sh	resolves #14 (#15 )	2023-06-09 12:59:22 -07:00
LICENSE	inital commit ⚡	2023-06-03 19:28:07 -07:00
package.json	Replace custom sqlite dbms with prisma (#239 )	2023-09-28 14:00:03 -07:00
README.md	Update readme to not prefer Pinecone	2023-09-12 14:58:14 -07:00
SECURITY.md	Create SECURITY.md	2023-09-08 16:31:30 -07:00

README.md

AnythingLLM: A business-compliant document chatbot.
A hyper-efficient and open-source enterprise-ready document chatbot solution for all.

| | Docs | Hosted Instance

A full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use. Currently this project supports Pinecone, ChromaDB & more for vector storage and OpenAI for LLM/chatting.

view more screenshots

Watch the demo!

Product Overview

AnythingLLM aims to be a full-stack application where you can use commercial off-the-shelf LLMs or popular open source LLMs and vectorDB solutions.

Anything LLM is a full-stack product that you can run locally as well as host remotely and be able to chat intelligently with any documents you provide it.

AnythingLLM divides your documents into objects called workspaces. A Workspace functions a lot like a thread, but with the addition of containerization of your documents. Workspaces can share documents, but they do not talk to each other so you can keep your context for each workspace clean.

Some cool features of AnythingLLM

Multi-user instance support and oversight
Atomically manage documents in your vector database from a simple UI
Two chat modes conversation and query. Conversation retains previous questions and amendments. Query is simple QA against your documents
Each chat response contains a citation that is linked to the original content
Simple technology stack for fast iteration
100% Cloud deployment ready.
"Bring your own LLM" model. still in progress - openai support only currently
Extremely efficient cost-saving measures for managing very large documents. You'll never pay to embed a massive document or transcript more than once. 90% more cost effective than other document chatbot solutions.
Full Developer API for custom integrations!

Technical Overview

This monorepo consists of three main sections:

collector: Python tools that enable you to quickly convert online resources or local documents into LLM useable format.
frontend: A viteJS + React frontend that you can run to easily create and manage all your content the LLM can use.
server: A nodeJS + express server to handle all the interactions and do all the vectorDB management and LLM interactions.

Requirements

yarn and node on your machine
python 3.9+ for running scripts in collector/.
access to an LLM like GPT-3.5, GPT-4.
(optional) a vector database like Pinecone, qDrant, Weaviate, or Chroma*. *AnythingLLM by default uses a built-in vector db called LanceDB.

How to get started (Docker - simple setup)

Get up and running in minutes with Docker

How to get started (Development environment)

yarn setup from the project root directory.
- This will fill in the required .env files you'll need in each of the application sections. Go fill those out before proceeding or else things won't work right.
cd frontend && yarn install && cd ../server && yarn install from the project root directory.

To boot the server locally (run commands from root of repo):

ensure server/.env.development is set and filled out. yarn dev:server

To boot the frontend locally (run commands from root of repo):

ensure frontend/.env is set and filled out.
ensure VITE_API_BASE="http://localhost:3001/api" yarn dev:frontend

Next, you will need some content to embed. This could be a Youtube Channel, Medium articles, local text files, word documents, and the list goes on. This is where you will use the collector/ part of the repo.

Go set up and run collector scripts

Learn about documents

Learn about vector caching

Contributing

create issue
create PR with branch name format of <issue number>-<short name>
yee haw let's merge

Telemetry

AnythingLLM by Mintplex Labs Inc contains a telemetry feature that collects anonymous usage information.

Why?

We use this information to help us understand how AnythingLLM is used, to help us prioritize work on new features and bug fixes, and to help us improve AnythingLLM's performance and stability.

Opting out

Set DISABLE_TELEMETRY in your server or docker .env settings to "true" to opt out of telemetry.

DISABLE_TELEMETRY="true"

What do you explicitly track?

We will only track usage details that help us make product and roadmap decisions, specifically:

Version of your installation
When a document is added or removed. No information about the document. Just that the event occurred. This gives us an idea of use.
Type of vector database in use. Let's us know which vector database provider is the most used to prioritize changes when updates arrive for that provider.
Type of LLM in use. Let's us know the most popular choice and prioritize changes when updates arrive for that provider.
Chat is sent. This is the most regular "event" and gives us an idea of the daily-activity of this project across all installations. Again, only the event is sent - we have no information on the nature or content of the chat itself.

You can verify these claims by finding all locations Telemetry.sendTelemetry is called. Additionally these events are written to the output log so you can also see the specific data which was sent - if enabled. No IP or other identifying information is collected. The Telemetry provider is PostHog - an open-source telemetry collection service.