anything-llm

mirror of https://github.com/Mintplex-Labs/anything-llm.git synced 2024-11-15 19:00:33 +01:00

Author	SHA1	Message	Date
Jason Zhang	fa4ab0f65f	fix: sanitize filename before writing (#1743 ) * fix: sanitize filename before writing Fixes: https://github.com/Mintplex-Labs/anything-llm/issues/1737 * fixup * fixup	2024-06-25 15:45:09 -07:00
Timothy Carambat	dc4ad6b5a9	[BETA] Live document sync (#1719 ) * wip bg workers for live document sync * Add ability to re-embed specific documents across many workspaces via background queue bgworker is gated behind experimental system setting flag that needs to be explicitly enabled UI for watching/unwatching documents that are embedded. TODO: UI to easily manage all bg tasks and see run results TODO: UI to enable this feature and background endpoints to manage it * create frontend views and paths Move elements to correct experimental scope * update migration to delete runs on removal of watched document * Add watch support to YouTube transcripts (#1716) * Add watch support to YouTube transcripts refactor how sync is done for supported types * Watch specific files in Confluence space (#1718) Add failure-prune check for runs * create tmp workflow modifications for beta image * create tmp workflow modifications for beta image * create tmp workflow modifications for beta image * dual build update copy of alert modals * update job interval * Add support for live-sync of Github files * update copy for document sync feature * hide Experimental features from UI * update docs links * [FEAT] Implement new settings menu for experimental features (#1735) * implement new settings menu for experimental features * remove unused context save bar --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> * don't run job on boot * unset workflow changes * Add persistent encryption service Relay key to collector so persistent encryption can be used Encrypt any private data in chunkSources used for replay during resync jobs * update jsDOC * Linting and organization * update modal copy for feature --------- Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2024-06-21 13:38:50 -07:00
Timothy Carambat	a598c8e04c	1347 human readable confluence url (#1706 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * refactor implementation of various types of Confluence URL patterns --------- Co-authored-by: Predrag Stojadinovic <predrag@stojadinovic.net> Co-authored-by: Predrag Stojadinović <cope@users.noreply.github.com> Co-authored-by: Predrag Stojadinovic <predrags@nvidia.com>	2024-06-17 16:04:20 -07:00
Chris Daniel	8a4dd2bdf5	[FEAT] add support for TSX files to be parsed as text (#1597 ) add support for TSX files to be parsed as text	2024-06-03 17:01:41 +08:00
Sean Hatfield	9a38b32c74	[FEAT] Add support for R files to be parsed as text (#1577 ) add support for R files to be parsed as text	2024-05-31 13:52:00 +08:00
Sean Hatfield	4324a8bb4f	[FEAT] Github repo loader bug fix (#1558 ) * fix project names with special characters for github repo data connector * linting	2024-05-29 17:01:29 +08:00
Timothy Carambat	a89812703b	repatch path normalization (#1516 )	2024-05-23 12:52:04 -07:00
timothycarambat	05488c81e0	undo path norm whitespace fix	2024-05-23 12:04:00 -07:00
timothycarambat	e208074ef4	patch path normalization	2024-05-22 11:50:01 -05:00
Timothy Carambat	1a5aacb001	Support multi-model whispers (#1444 )	2024-05-17 21:31:29 -07:00
Timothy Carambat	7e0b638a2c	Patch confluence URL patterns (#1426 ) * patch confluence patterns --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-05-16 14:15:59 -07:00
timothycarambat	87b41a60e9	refactor spaceKey url pattern for custom domains	2024-05-16 11:01:34 -07:00
Predrag Stojadinović	cf969adf37	1362 custom display confluence url (#1423 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: adding /display/ url matching to confluence data connector	2024-05-16 10:46:18 -07:00
timothycarambat	b5ac944475	patch: bulk-scraper, update when folder is made and path creation params	2024-05-14 12:57:23 -07:00
Sean Hatfield	612a7e1662	[FEAT] Website depth scraping data connector (#1191 ) * WIP website depth scraping, (sort of works) * website depth data connector stable + add maxLinks option * linting + loading small ui tweak * refactor website depth data connector for stability, speed, & readability * patch: remove console log Guard clause on URL validity check reasonable overrides --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-05-14 12:49:14 -07:00
jazelly	d71db22799	fix: skip undefined confluence pageContent (#1383 ) Refs: https://github.com/Mintplex-Labs/anything-llm/issues/1381 Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-05-14 10:22:13 -07:00
Predrag Stojadinović	78e3e35d27	[FEAT] Confluence Data Connector handles custom Confluence urls (#1362 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint	2024-05-14 10:21:04 -07:00
timothycarambat	2d215acb75	patch storage dirs for extensions	2024-05-02 14:03:10 -07:00
timothycarambat	1aa8e5766f	duplicate key (no impact)	2024-05-02 13:05:20 -07:00
Timothy Carambat	547d4859ef	Bump `openai` package to latest (#1234 ) * Bump `openai` package to latest Tested all except localai * bump LocalAI support with latest image * add deprecation notice * linting	2024-04-30 12:33:42 -07:00
Sean Hatfield	348b36bf85	[FEAT] Confluence data connector (#1181 ) * WIP Confluence data connector backend * confluence data connector complete * confluence citations * fix citation for confluence * Patch confluence integration * fix Citation Icon for confluence --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-04-25 17:53:38 -07:00
Sean Hatfield	af84b01482	[FIX] GitHub repo with periods in link fix (#1084 ) fix periods in github repo links bug	2024-04-12 14:56:59 -07:00
Timothy Carambat	2c6135aa54	patch file types as plaintext (#1095 ) resolves #1089	2024-04-12 14:54:33 -07:00
Timothy Carambat	1f8ab0d245	Remove YoutubeLoader dependency (#1050 ) * WIP data connector redesign * new UI for data connectors complete * remove old data connector page/cleanup imports * cleanup of UI and imports * Remove Youtube Transcript dep and move in-house * lang pref default to en --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-04-05 16:33:01 -07:00
timothycarambat	0b454016cf	patch comkey path to fallback	2024-04-04 10:47:26 -07:00
Timothy Carambat	4fb4aa2041	Add epub support for parsing (#1017 )	2024-04-02 14:25:52 -07:00
Timothy Carambat	752e3e22ed	Add more text file forced extensions (#1016 )	2024-04-02 14:13:11 -07:00
Timothy Carambat	f4088d9348	RSA-Signing on server<->collector communication via API (#1005 ) * WIP integrity check between processes * Implement integrity checking on document processor payloads	2024-04-01 13:56:35 -07:00
Timothy Carambat	0ada882991	Support external transcription providers (#909 ) * Support External Transcription providers * patch files * update docs * fix return data	2024-03-14 15:43:26 -07:00
Timothy Carambat	ec90060d36	Re-map some file mimes to support text (#842 ) re-map some file mimes to support text	2024-02-29 10:05:03 -08:00
Timothy Carambat	6d18d79bb7	Generic upload fallback as text file. (#808 ) * Do not block any file upload fallback unknown/unsupported types to text if possible * reduce call for frontend * patch	2024-02-26 13:43:54 -08:00
Timothy Carambat	d89610586a	improve error messages from YT scraping (#768 ) parse & enforce URL to allow multiple URL schemas	2024-02-21 10:47:10 -08:00
Timothy Carambat	49fbd09af4	Support more plaintext filetypes (#757 ) * Add more plaintext document types org-mode, asciidoc, and reStructuredText are all text formats Signed-off-by: Christian Romney <christian.a.romney@gmail.com> * lint --------- Signed-off-by: Christian Romney <christian.a.romney@gmail.com> Co-authored-by: Christian Romney <christian.a.romney@gmail.com>	2024-02-19 10:44:01 -08:00
Timothy Carambat	d52f8aafd4	689 links in citation (#715 ) * Include links in citations force ChunkSource key to retain this information old links will be unsupported * show special icons depending on source * remove console log * reset server documents writeTo	2024-02-13 14:11:57 -08:00
Sean Hatfield	288ff0d18c	fix vector cache not deleting cache after unembedding items with folders (#630 )	2024-01-22 13:03:05 -08:00
Timothy Carambat	0db6c3b2aa	Prevent private octets from link collection for self-hosted (#626 )	2024-01-19 10:49:40 -08:00
Timothy Carambat	b35feede87	570 document api return object (#608 ) * Add support for fetching single document in documents folder * Add document object to upload + support link scraping via API * hotfixes for documentation * update api docs	2024-01-16 16:04:22 -08:00
Timothy Carambat	1563a1b20f	Strict link protocol validation (#577 )	2024-01-11 12:29:00 -08:00
Timothy Carambat	ecf4295537	Add ability to grab youtube transcripts via doc processor (#470 ) * Add ability to grab youtube transcripts via doc processor * dynamic imports swap out Github for Youtube in placeholder text	2023-12-18 17:17:26 -08:00
Timothy Carambat	452582489e	GitHub loader extension + extension support v1 (#469 ) * feat: implement github repo loading fix: purge of folders fix: rendering of sub-files * noshow delete on custom-documents * Add API key support because of rate limits * WIP for frontend of data connectors * wip * Add frontend form for GitHub repo data connector * remove console.logs block custom-documents from being deleted * remove _meta unused arg * Add support for ignore pathing in request Ignore path input via tagging * Update hint	2023-12-18 15:48:02 -08:00
timothycarambat	d2e3506bb9	fix: transition on LLM and embedding screen linting	2023-12-15 12:40:11 -08:00
Timothy Carambat	61db981017	feat: Embed on-instance Whisper model for audio/mp4 transcribing (#449 ) * feat: Embed on-instance Whisper model for audio/mp4 transcribing resolves #329 * additional logging * add placeholder for tmp folder in collector storage Add cleanup of hotdir and tmp on collector boot to prevent hanging files split loading of model and file conversion into concurrency * update README * update model size * update supported filetypes	2023-12-15 11:20:13 -08:00
Timothy Carambat	719521c307	Document Processor v2 (#442 ) * wip: init refactor of document processor to JS * add NodeJs PDF support * wip: parity with python processor feat: add pptx support * fix: forgot files * Remove python scripts totally * wip:update docker to boot new collector * add package.json support * update dockerfile for new build * update gitignore and linting * add more protections on file lookup * update package.json * test build * update docker commands to use cap-add=SYS_ADMIN so web scraper can run update all scripts to reflect this remove docker build for branch	2023-12-14 15:14:56 -08:00

43 Commits