Commit Graph

129 Commits

Author SHA1 Message Date
timothycarambat
a7f6003277 fix: set lower maxChunk limit on native embedder to stay within resource constraints
chore: update comment for what embedding chunk means
2023-12-19 16:20:34 -08:00
Timothy Carambat
ecf4295537
Add ability to grab youtube transcripts via doc processor (#470)
* Add ability to grab youtube transcripts via doc processor

* dynamic imports
swap out Github for Youtube in placeholder text
2023-12-18 17:17:26 -08:00
Timothy Carambat
452582489e
GitHub loader extension + extension support v1 (#469)
* feat: implement github repo loading
fix: purge of folders
fix: rendering of sub-files

* noshow delete on custom-documents

* Add API key support because of rate limits

* WIP for frontend of data connectors

* wip

* Add frontend form for GitHub repo data connector

* remove console.logs
block custom-documents from being deleted

* remove _meta unused arg

* Add support for ignore pathing in request
Ignore path input via tagging

* Update hint
2023-12-18 15:48:02 -08:00
Timothy Carambat
65c7c0a518
fix: patch api key not persisting when setting LLM/Embedder (#458) 2023-12-16 10:21:36 -08:00
Timothy Carambat
61db981017
feat: Embed on-instance Whisper model for audio/mp4 transcribing (#449)
* feat: Embed on-instance Whisper model for audio/mp4 transcribing
resolves #329

* additional logging

* add placeholder for tmp folder in collector storage
Add cleanup of hotdir and tmp on collector boot to prevent hanging files
split loading of model and file conversion into concurrency

* update README

* update model size

* update supported filetypes
2023-12-15 11:20:13 -08:00
Timothy Carambat
719521c307
Document Processor v2 (#442)
* wip: init refactor of document processor to JS

* add NodeJs PDF support

* wip: parity with python processor
feat: add pptx support

* fix: forgot files

* Remove python scripts totally

* wip:update docker to boot new collector

* add package.json support

* update dockerfile for new build

* update gitignore and linting

* add more protections on file lookup

* update package.json

* test build

* update docker commands to use cap-add=SYS_ADMIN so web scraper can run
update all scripts to reflect this
remove docker build for branch
2023-12-14 15:14:56 -08:00
timothycarambat
5f6a013139 Change server bootup log 2023-12-14 13:52:11 -08:00
Timothy Carambat
1e98da07bc
docs: placeholder for model downloads folder (#446) 2023-12-14 10:31:14 -08:00
Timothy Carambat
37cdb845a4
patch: implement @lunamidori hotfix for LocalAI streaming chunk overflows (#433)
* patch: implement @lunamidori hotfix for LocalAI streaming chunk overflows
resolves #416

* change log to error log

* log trace

* lint
2023-12-12 16:20:06 -08:00
Timothy Carambat
d4f4d85492
patch: fix non-latin filenames being encoded improperly during upload and chat (#432)
patch: fix non-latin filenames being messed up during upload and chat
connect #169
resolves #427
2023-12-12 16:07:23 -08:00
Timothy Carambat
a84333901a
feat: implement questionnaire during onboarding (optional) (#429)
fix: PFP url check
2023-12-12 13:11:32 -08:00
Timothy Carambat
cba66150d7
patch: API key to localai service calls (#421)
connect #417
2023-12-11 14:18:28 -08:00
Timothy Carambat
8cc1455b72
feat: add support for variable chunk length (#415)
fix: cleanup code for embedding length clarify
resolves #388
2023-12-07 16:27:36 -08:00
Timothy Carambat
655ebd9479
[Feature] AnythingLLM use locally hosted Llama.cpp and GGUF files for inferencing (#413)
* Implement use of native embedder (all-MiniLM-L6-v2)
stop showing prisma queries during dev

* Add native embedder as an available embedder selection

* wrap model loader in try/catch

* print progress on download

* add built-in LLM support (experimental)

* Update to progress output for embedder

* move embedder selection options to component

* safety checks for modelfile

* update ref

* Hide selection when on hosted subdomain

* update documentation
hide localLlama when on hosted

* safety checks for storage of models

* update dockerfile to pre-build Llama.cpp bindings

* update lockfile

* add langchain doc comment

* remove extraneous --no-metal option

* Show data handling for private LLM

* persist model in memory for N+1 chats

* update import
update dev comment on token model size

* update primary README

* chore: more readme updates and remove screenshots - too much to maintain, just use the app!

* remove screenshot link
2023-12-07 14:48:27 -08:00
timothycarambat
fecfb0fafc chore: remove unused NO_DEBUG env 2023-12-07 14:14:30 -08:00
Sean Hatfield
fcb591d364
Add user PFP support and context to logo (#408)
* fix sizing of onboarding modals & lint

* fix extra scrolling on mobile onboarding flow

* added message to use desktop for onboarding

* linting

* add arrow to scroll to bottom (debounced) and fix chat scrolling to always scroll to very bottom on message history change

* fix for empty chat

* change mobile alert copy

* WIP adding PFP upload support

* WIP pfp for users

* edit account menu complete with change username/password and upload profile picture

* add pfp context to update all instances of usePfp hook on update

* linting

* add context for logo change to immediately update logo

* fix div with bullet points to use list-disc instead

* fix: small changes

* update multer file storage locations

* fix: use STORAGE_DIR for filepathing

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-12-07 14:11:51 -08:00
timothycarambat
33de34f8dc add embedding engine to telem 2023-12-07 08:53:37 -08:00
timothycarambat
79cdb8631a fix: fix logo fetching raising errors in server 2023-12-06 11:56:07 -08:00
Timothy Carambat
88cdd8c872
Add built-in embedding engine into AnythingLLM (#411)
* Implement use of native embedder (all-MiniLM-L6-v2)
stop showing prisma queries during dev

* Add native embedder as an available embedder selection

* wrap model loader in try/catch

* print progress on download

* Update to progress output for embedder

* move embedder selection options to component

* forgot import

* add Data privacy alert updates for local embedder
2023-12-06 10:36:22 -08:00
pritchey
732d07829f
401-Password Complexity Check Capability (#402)
* Added improved password complexity checking capability.

* Move password complexity checker as User.util
dynamically import required libraries depending on code execution flow
lint

* Ensure persistence of password requirements on restarts via env-dump
Copy example schema to docker env as well

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-12-05 09:13:06 -08:00
Timothy Carambat
6fa8b0ce93
Add API key option to LocalAI (#407)
* Add API key option to LocalAI

* add api key for model dropdown selector
2023-12-04 08:38:15 -08:00
Timothy Carambat
55d319b527
Rehash password for admin-user pwd updates (#398)
resolved #397
2023-11-27 12:47:07 -06:00
Sean Hatfield
7edfccaf9a
Adding url uploads to document picker (#375)
* WIP adding url uploads to document picker

* fix manual script for uploading url to custom-documents

* fix metadata for url scraping

* wip url parsing

* update how async link scraping works

* docker-compose defaults added
no autocomplete on URLs

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-11-16 17:15:01 -08:00
Sean Hatfield
5ad8a5f2d0
Allow use of any embedder for any llm/update data handling modal (#386)
* allow use of any embedder for any llm/update data handling modal

* Apply embedder override and fallback to OpenAI and Azure models

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-11-16 15:19:49 -08:00
Sean Hatfield
73f342eb19
Warning about switching embedder or vectordb (#385)
* added warning modal to LLM preference

* added warning modal for changing embedder

* remove warning from LLM preference & add warning to vector database selection

* linting

* remove comments and move warning modal to component

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-11-16 14:35:14 -08:00
Timothy Carambat
085745c5e4
Prevent lone-admin from locking themselves out the system (#376)
resolves #367
2023-11-14 14:43:40 -08:00
Sean Hatfield
1aa58dcb7b
Disable prisma logs on prod (#371)
* disable prisma logs on prod

* linting

* keep const top level

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-11-14 13:53:11 -08:00
Tobias Landenberger
a96a9d41a3
LocalAI for embeddings (#361)
* feature: add localAi as embedding provider

* chore: add LocalAI image

* chore: add localai embedding examples to docker .env.example

* update setting env
pull models from localai API

* update comments on embedder
Don't show cost estimation on UI

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-11-14 13:49:31 -08:00
Timothy Carambat
4bb99ab4bf
Support LocalAi as LLM provider by @tlandenberger (#373)
* feature: add LocalAI as llm provider

* update Onboarding/mgmt settings
Grab models from models endpoint for localai
merge with master

* update streaming for complete chunk streaming
update localAI LLM to be able to stream

* force schema on URL

---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
Co-authored-by: tlandenberger <tobiaslandenberger@gmail.com>
2023-11-14 12:31:44 -08:00
Timothy Carambat
6957bc3ec0
Robots.txt (#369)
* assume default model where appropriate

* merge with master and fix other model refs

* disallow robots

* add public file
2023-11-13 15:22:24 -08:00
Timothy Carambat
8743be679b
assume default model where appropriate (#366)
* assume default model where appropriate

* merge with master and fix other model refs
2023-11-13 15:17:22 -08:00
Timothy Carambat
c22c50cca8
Enable chat streaming for LLMs (#354)
* [Draft] Enable chat streaming for LLMs

* stream only, move sendChat to deprecated

* Update TODO deprecation comments
update console output color for streaming disabled
2023-11-13 15:07:30 -08:00
Sean Hatfield
fa29003a46
Create manager role and limit default role (#351)
* added manager role to options

* block default role from editing workspace settings on workspace and text input box

* block default user from accessing settings at all

* create manager route

* let pass through if in single user mode

* fix permissions for manager and admin roles in settings

* fix settings button for single user and remove unneeded console.logs

* rename routes and paths for clarity

* admin, manager, default roles complete

* remove unneeded comments

* consistency changes

* manage permissions for mum modes

* update sidebar for single-user mode

* update comment on middleware
Modify permission setting for admins

* update render conditional

* Add role usage hint to each role

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-11-13 14:51:16 -08:00
Timothy Carambat
2b17bf26a8
Posthog telemetry updates (#356)
track subuser anon
2023-11-10 16:02:46 -08:00
Tobias Landenberger
2914c09dd5
fix: adjust return type of addDocuments in case of no additions (#353) 2023-11-10 13:27:53 -08:00
Francisco Bischoff
f499f1ba59
Using OpenAI API locally (#335)
* Using OpenAI API locally

* Infinite prompt input and compression implementation (#332)

* WIP on continuous prompt window summary

* wip

* Move chat out of VDB
simplify chat interface
normalize LLM model interface
have compression abstraction
Cleanup compressor
TODO: Anthropic stuff

* Implement compression for Anthropic
Fix lancedb sources

* cleanup vectorDBs and check that lance, chroma, and pinecone are returning valid metadata sources

* Resolve Weaviate citation sources not working with schema

* comment cleanup

* disable import on hosted instances (#339)

* disable import on hosted instances

* Update UI on disabled import/export

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* Add support for gpt-4-turbo 128K model (#340)

resolves #336
Add support for gpt-4-turbo 128K model

* 315 show citations based on relevancy score (#316)

* settings for similarity score threshold and prisma schema updated

* prisma schema migration for adding similarityScore setting

* WIP

* Min score default change

* added similarityThreshold checking for all vectordb providers

* linting

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>

* rename localai to lmstudio

* forgot files that were renamed

* normalize model interface

* add model and context window limits

* update LMStudio tagline

* Fully working LMStudio integration

---------
Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>
2023-11-09 12:33:21 -08:00
Timothy Carambat
1ec774ab2e
[Chore] replace all React-feather icons with phosphor icons fully (#349)
replace all React-feather icons with phosphor icons fully
remove package-lock.json files - yarn only
2023-11-09 08:55:20 -08:00
Sean Hatfield
997482ef8f
added JSONL export to workspace chats (#345)
* added JSONL export to workspace chats

* change permissions for workspace chat settings

* change permissions for workspace chat settings

* Show error for correct limit on fine-tune
Change sidebar position and permission
Remove check for MUM

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-11-08 17:36:54 -08:00
Timothy Carambat
88d4808c52
315 show citations based on relevancy score (#316)
* settings for similarity score threshold and prisma schema updated

* prisma schema migration for adding similarityScore setting

* WIP

* Min score default change

* added similarityThreshold checking for all vectordb providers

* linting

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2023-11-06 16:49:29 -08:00
Timothy Carambat
d34ec68702
Add support for gpt-4-turbo 128K model (#340)
resolves #336
Add support for gpt-4-turbo 128K model
2023-11-06 14:22:19 -08:00
Timothy Carambat
be9d8b0397
Infinite prompt input and compression implementation (#332)
* WIP on continuous prompt window summary

* wip

* Move chat out of VDB
simplify chat interface
normalize LLM model interface
have compression abstraction
Cleanup compressor
TODO: Anthropic stuff

* Implement compression for Anthropic
Fix lancedb sources

* cleanup vectorDBs and check that lance, chroma, and pinecone are returning valid metadata sources

* Resolve Weaviate citation sources not working with schema

* comment cleanup
2023-11-06 13:13:53 -08:00
Timothy Carambat
0751fb1fdd
Fix missing import on API (#333) 2023-11-03 12:50:56 -07:00
timothycarambat
c3abbfbf27 Fix admin chat pagination 2023-11-02 16:12:29 -07:00
timothycarambat
24823cb5e2 patch workspace chat history windows to persist most recent chats, not the top n 2023-11-01 14:12:27 -07:00
Timothy Carambat
67c85f1550
Implement retrieval and use of fine-tune models (#314)
* Implement retrieval and use of fine-tune models
Cleanup LLM selection code
resolves #311

* Cleanup from PR bot
2023-10-31 11:38:28 -07:00
timothycarambat
745d2aeaff fix import path 2023-10-30 15:49:29 -07:00
Timothy Carambat
5d56ab623b
Anthropic claude 2 support (#305)
* WIP Anthropic support for chat, chat and query w/context

* Add onboarding support for Anthropic

* cleanup

* fix Anthropic answer parsing
move embedding selector to general util
2023-10-30 15:44:03 -07:00
Sean Hatfield
669d7a396d
282 return relevancy score with similarityresponse (#304)
* include score value in similarityResponse for weaviate

* include score value in similarityResponse for qdrant

* include score value in similarityResponse for pinecone

* include score value in similarityResponse for chroma

* include score value in similarityResponse for lancedb

* distance to similarity

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-10-30 12:46:38 -07:00
Sean Hatfield
27809b34b5
Added telemetry for onboarding completion (#295)
* added telemetry for onboarding completion

* minor changes

* linting and remove empty object

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-10-26 13:49:01 -07:00
Timothy Carambat
a8ec0d9584
Compensate for upper OpenAI embedding limit chunk size (#292)
Limit is due to POST body max size. Sufficiently large requests will abort automatically
We should report that error back on the frontend during embedding
Update vectordb providers to return on failed
2023-10-26 10:57:37 -07:00