anything-llm

mirror of https://github.com/Mintplex-Labs/anything-llm.git synced 2024-11-15 10:50:31 +01:00

Author	SHA1	Message	Date
Sean Hatfield	a87014822a	[REFACTOR] Improve asPDF collector processor with pdfjs (#1791) * WIP replace langchain pdfloader with pdfjs and add more context to each page * remove extras from pdfjs and just replace langchain library * remove unneeded dep * fix console log in docs --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-07-03 14:26:48 -07:00
Sean Hatfield	f205d51fe9	[FIX] Confluence code snippet blocks not being extracted (#1804) implement custom confluence loader to extract code blocks properly from documents Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-07-03 14:00:44 -07:00
Sean Hatfield	fc375f4036	[FIX] Bulk link scraper bug fix (#1800) patch website depth data connector to work for other links that are not root url	2024-07-01 16:59:28 -07:00
Jason Zhang	fa4ab0f65f	fix: sanitize filename before writing (#1743) * fix: sanitize filename before writing Fixes: https://github.com/Mintplex-Labs/anything-llm/issues/1737 * fixup * fixup	2024-06-25 15:45:09 -07:00
Timothy Carambat	dc4ad6b5a9	[BETA] Live document sync (#1719) * wip bg workers for live document sync * Add ability to re-embed specific documents across many workspaces via background queue bgworkser is gated behind expieremental system setting flag that needs to be explictly enabled UI for watching/unwatching docments that are embedded. TODO: UI to easily manage all bg tasks and see run results TODO: UI to enable this feature and background endpoints to manage it * create frontend views and paths Move elements to correct experimental scope * update migration to delete runs on removal of watched document * Add watch support to YouTube transcripts (#1716) * Add watch support to YouTube transcripts refactor how sync is done for supported types * Watch specific files in Confluence space (#1718) Add failure-prune check for runs * create tmp workflow modifications for beta image * create tmp workflow modifications for beta image * create tmp workflow modifications for beta image * dual build update copy of alert modals * update job interval * Add support for live-sync of Github files * update copy for document sync feature * hide Experimental features from UI * update docs links * [FEAT] Implement new settings menu for experimental features (#1735) * implement new settings menu for experimental features * remove unused context save bar --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> * dont run job on boot * unset workflow changes * Add persistent encryption service Relay key to collector so persistent encryption can be used Encrypt any private data in chunkSources used for replay during resync jobs * update jsDOC * Linting and organization * update modal copy for feature --------- Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2024-06-21 13:38:50 -07:00
Timothy Carambat	a598c8e04c	1347 human readable confluence url (#1706) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * refactor implementation of various types of Confluence URL patterns --------- Co-authored-by: Predrag Stojadinovic <predrag@stojadinovic.net> Co-authored-by: Predrag Stojadinović <cope@users.noreply.github.com> Co-authored-by: Predrag Stojadinovic <predrags@nvidia.com>	2024-06-17 16:04:20 -07:00
Timothy Carambat	98cef508a6	Feature/devcontv2 (#1622) * Updated apt-packages source for devcontainer Switched the devcontainer's package source to a different repository to align with updated dependencies and package availability. The previous source from 'rocker-org' is replaced with 'devcontainers-contrib', which may offer more recent or relevant development tools. * Subject: Centralize prettier ignores and refine config Body: Centralized all prettier ignore rules by removing individual `.prettierignore` files in subprojects and updating the root `.prettierignore` to include previously ignored patterns, ensuring consistency across the workspace. Additionally, the prettier configuration was refined by making the file pattern for `.config.js` files consistent and adjusting quote styles for better readability. All lint scripts across the project were updated to respect the centralized ignore path, enhancing maintainability. The consolidation simplifies the process of managing ignore rules as the project scales, ensuring developers can focus on writing code without worrying about divergent formatting standards. These changes also align with introducing comprehensive linting across multiple environments to keep the codebase clean and consistent. This adjustment is a foundational step towards a more streamlined and unified code base, making it easier for new contributors to adhere to established coding standards and reducing the cognitive load associated with managing multiple configuration files across the project. * unset package json changes --------- Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com> Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>	2024-06-06 12:50:42 -07:00
Chris Daniel	8a4dd2bdf5	[FEAT] add support for TSX files to be parsed as text (#1597) add support for TSX files to be parsed as text	2024-06-03 17:01:41 +08:00
Sean Hatfield	9a38b32c74	[FEAT] Add support for R files to be parsed as text (#1577) add support for R files to be parsed as text	2024-05-31 13:52:00 +08:00
Sean Hatfield	4324a8bb4f	[FEAT] Github repo loader bug fix (#1558) * fix project names with special characters for github repo data connector * linting	2024-05-29 17:01:29 +08:00
Timothy Carambat	a89812703b	repatch path normalization (#1516)	2024-05-23 12:52:04 -07:00
timothycarambat	05488c81e0	undo path norm whitespace fix	2024-05-23 12:04:00 -07:00
timothycarambat	e208074ef4	patch path normalization	2024-05-22 11:50:01 -05:00
Timothy Carambat	1a5aacb001	Support multi-model whispers (#1444)	2024-05-17 21:31:29 -07:00
Timothy Carambat	7e0b638a2c	Patch confluence URL patterns(#1426) * patch confluence patterns --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-05-16 14:15:59 -07:00
timothycarambat	87b41a60e9	refactor spaceKey url pattern for custom domains	2024-05-16 11:01:34 -07:00
Predrag Stojadinović	cf969adf37	1362 custom display confluence url (#1423) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: adding /display/ url matching to confluence data connector	2024-05-16 10:46:18 -07:00
timothycarambat	b5ac944475	patch: bulk-scraper, update when folder is made and path creation params	2024-05-14 12:57:23 -07:00
Sean Hatfield	612a7e1662	[FEAT] Website depth scraping data connector (#1191) * WIP website depth scraping, (sort of works) * website depth data connector stable + add maxLinks option * linting + loading small ui tweak * refactor website depth data connector for stability, speed, & readability * patch: remove console log Guard clause on URL validitiy check reasonable overrides --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-05-14 12:49:14 -07:00
jazelly	d71db22799	fix: skip undefined confluence pageContent (#1383) Refs: https://github.com/Mintplex-Labs/anything-llm/issues/1381 Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-05-14 10:22:13 -07:00
Predrag Stojadinović	78e3e35d27	[FEAT] Confluence Data Connector handles custom Confluence urls (#1362) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint	2024-05-14 10:21:04 -07:00
timothycarambat	2d215acb75	patch storage dirs for extensions	2024-05-02 14:03:10 -07:00
timothycarambat	1aa8e5766f	duplicate key (no impact)	2024-05-02 13:05:20 -07:00
Timothy Carambat	547d4859ef	Bump `openai` package to latest (#1234) * Bump `openai` package to latest Tested all except localai * bump LocalAI support with latest image * add deprecation notice * linting	2024-04-30 12:33:42 -07:00
Timothy Carambat	94017e2b51	bump langchain deps (#1231) * bump langchain deps * patch native and ollama providers remove deprecated deps --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-04-30 12:04:24 -07:00
Sean Hatfield	348b36bf85	[FEAT] Confluence data connector (#1181) * WIP Confluence data connector backend * confluence data connector complete * confluence citations * fix citation for confluence * Patch confulence integration * fix Citation Icon for confluence --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-04-25 17:53:38 -07:00
Ken Kuang	a3b7239d05	Fix Cannot read properties of undefined (reading 'length') (#1145) Fix upload failed	2024-04-20 12:28:19 -07:00
Timothy Carambat	a5bb77f97a	Agent support for `@agent` default agent inside workspace chat (#1093) V1 of agent support via built-in `@agent` that can be invoked alongside normal workspace RAG chat.	2024-04-16 10:50:10 -07:00
Sean Hatfield	af84b01482	[FIX] GitHub repo with periods in link fix (#1084) fix periods in github repo links bug	2024-04-12 14:56:59 -07:00
Timothy Carambat	2c6135aa54	patch file types as plaintext (#1095) resolves #1089	2024-04-12 14:54:33 -07:00
Timothy Carambat	1f8ab0d245	Remove YoutubeLoader dependency (#1050) * WIP data connector redesign * new UI for data connectors complete * remove old data connector page/cleanup imports * cleanup of UI and imports * Remove Youtube Transcript dep and move in-house * lang pref default to en --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-04-05 16:33:01 -07:00
timothycarambat	0b454016cf	patch comkey path to fallback	2024-04-04 10:47:26 -07:00
timothycarambat	e524afae9e	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm	2024-04-02 14:30:27 -07:00
timothycarambat	117c3b2bfb	forgot epub file!	2024-04-02 14:30:20 -07:00
Timothy Carambat	4fb4aa2041	Add epub support for parsing (#1017)	2024-04-02 14:25:52 -07:00
Timothy Carambat	752e3e22ed	Add more text file forced extensions (#1016)	2024-04-02 14:13:11 -07:00
Timothy Carambat	f4088d9348	RSA-Signing on server<->collector communication via API (#1005) * WIP integrity check between processes * Implement integrity checking on document processor payloads	2024-04-01 13:56:35 -07:00
Sean Hatfield	45f50ce13c	[FIX] Update metadata tags in PDF collector script (#925) update title in pdf collector script to be the filename instead of metadata title	2024-03-19 18:14:34 -07:00
Timothy Carambat	0ada882991	Support external transcription providers (#909) * Support External Transcription providers * patch files * update docs * fix return data	2024-03-14 15:43:26 -07:00
Timothy Carambat	0f31e43fd4	bump YT metadata lib for YT api fix rot (#888)	2024-03-11 10:57:53 -07:00
Timothy Carambat	ec90060d36	Re-map some file mimes to support text (#842) re-map some file mimes to support text	2024-02-29 10:05:03 -08:00
Timothy Carambat	6d18d79bb7	Generic upload fallback as text file. (#808) * Do not block any file upload fallback unknown/unsupported types to text if possible * reduce call for frontend * patch	2024-02-26 13:43:54 -08:00
Timothy Carambat	d89610586a	improve error messages from YT scraping (#768) parse & enforce URL to allow multiple URL schemas	2024-02-21 10:47:10 -08:00
Timothy Carambat	49fbd09af4	Support more plaintext filetypes (#757) * Add more plaintext document types org-mode, asciidoc, and reStructuredText are all text formats Signed-off-by: Christian Romney <christian.a.romney@gmail.com> * lint --------- Signed-off-by: Christian Romney <christian.a.romney@gmail.com> Co-authored-by: Christian Romney <christian.a.romney@gmail.com>	2024-02-19 10:44:01 -08:00
Timothy Carambat	d52f8aafd4	689 links in citation (#715) * Include links in citations force ChunkSource key to retain this information old links will be unsupported * show special icons depending on source * remove console log * reset server documents writeTo	2024-02-13 14:11:57 -08:00
Timothy Carambat	48cb8f2897	Add support to upload rawText document via api (#692) * Add support to upload rawText document via api * update API doc endpoint with correct textContent key * update response swagger doc	2024-02-07 15:17:32 -08:00
Sean Hatfield	288ff0d18c	fix vector cache not deleting cache after unembedding items with folders (#630)	2024-01-22 13:03:05 -08:00
Timothy Carambat	0db6c3b2aa	Prevent private octets from link collection for self-hosted (#626)	2024-01-19 10:49:40 -08:00
Timothy Carambat	b35feede87	570 document api return object (#608) * Add support for fetching single document in documents folder * Add document object to upload + support link scraping via API * hotfixes for documentation * update api docs	2024-01-16 16:04:22 -08:00
Timothy Carambat	1563a1b20f	Strict link protocol validation (#577)	2024-01-11 12:29:00 -08:00

89 Commits