anything-llm

mirror of https://github.com/Mintplex-Labs/anything-llm.git synced 2024-11-19 12:40:09 +01:00

Author	SHA1	Message	Date
Timothy Carambat	a598c8e04c	1347 human readable confluence url (#1706 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * refactor implementation of various types of Confluence URL patterns --------- Co-authored-by: Predrag Stojadinovic <predrag@stojadinovic.net> Co-authored-by: Predrag Stojadinović <cope@users.noreply.github.com> Co-authored-by: Predrag Stojadinovic <predrags@nvidia.com>	2024-06-17 16:04:20 -07:00
timothycarambat	393772c4a5	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-06-12 09:05:57 -07:00
Timothy Carambat	98cef508a6	Feature/devcontv2 (#1622 ) * Updated apt-packages source for devcontainer Switched the devcontainer's package source to a different repository to align with updated dependencies and package availability. The previous source from 'rocker-org' is replaced with 'devcontainers-contrib', which may offer more recent or relevant development tools. * Subject: Centralize prettier ignores and refine config Body: Centralized all prettier ignore rules by removing individual `.prettierignore` files in subprojects and updating the root `.prettierignore` to include previously ignored patterns, ensuring consistency across the workspace. Additionally, the prettier configuration was refined by making the file pattern for `.config.js` files consistent and adjusting quote styles for better readability. All lint scripts across the project were updated to respect the centralized ignore path, enhancing maintainability. The consolidation simplifies the process of managing ignore rules as the project scales, ensuring developers can focus on writing code without worrying about divergent formatting standards. These changes also align with introducing comprehensive linting across multiple environments to keep the codebase clean and consistent. This adjustment is a foundational step towards a more streamlined and unified code base, making it easier for new contributors to adhere to established coding standards and reducing the cognitive load associated with managing multiple configuration files across the project. * unset package json changes --------- Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com> Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>	2024-06-06 12:50:42 -07:00
Chris Daniel	8a4dd2bdf5	[FEAT] add support for TSX files to be parsed as text (#1597 ) add support for TSX files to be parsed as text	2024-06-03 17:01:41 +08:00
Sean Hatfield	9a38b32c74	[FEAT] Add support for R files to be parsed as text (#1577 ) add support for R files to be parsed as text	2024-05-31 13:52:00 +08:00
Sean Hatfield	4324a8bb4f	[FEAT] Github repo loader bug fix (#1558 ) * fix project names with special characters for github repo data connector * linting	2024-05-29 17:01:29 +08:00
timothycarambat	6e8a327d98	merge with master	2024-05-23 12:58:36 -07:00
Timothy Carambat	a89812703b	repatch path normalization (#1516 )	2024-05-23 12:52:04 -07:00
timothycarambat	05488c81e0	undo path norm whitespace fix	2024-05-23 12:04:00 -07:00
timothycarambat	c6ad94d81a	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-05-22 13:43:09 -05:00
timothycarambat	e208074ef4	patch path normalization	2024-05-22 11:50:01 -05:00
timothycarambat	c65ab6d863	merge with master	2024-05-21 14:48:16 -05:00
Timothy Carambat	1a5aacb001	Support multi-model whispers (#1444 )	2024-05-17 21:31:29 -07:00
Timothy Carambat	7e0b638a2c	Patch confluence URL patterns(#1426 ) * patch confluence patterns --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-05-16 14:15:59 -07:00
timothycarambat	87b41a60e9	refactor spaceKey url pattern for custom domains	2024-05-16 11:01:34 -07:00
Predrag Stojadinović	cf969adf37	1362 custom display confluence url (#1423 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: adding /display/ url matching to confluence data connector	2024-05-16 10:46:18 -07:00
timothycarambat	d603d0fd51	patch:update storage for bulk-website scraper for render	2024-05-14 12:59:14 -07:00
timothycarambat	c8dac6177a	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-05-14 12:57:44 -07:00
timothycarambat	b5ac944475	patch: bulk-scraper, update when folder is made and path creation params	2024-05-14 12:57:23 -07:00
timothycarambat	72c9fda6c9	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-05-14 12:50:17 -07:00
Sean Hatfield	612a7e1662	[FEAT] Website depth scraping data connector (#1191 ) * WIP website depth scraping, (sort of works) * website depth data connector stable + add maxLinks option * linting + loading small ui tweak * refactor website depth data connector for stability, speed, & readability * patch: remove console log Guard clause on URL validitiy check reasonable overrides --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-05-14 12:49:14 -07:00
jazelly	d71db22799	fix: skip undefined confluence pageContent (#1383 ) Refs: https://github.com/Mintplex-Labs/anything-llm/issues/1381 Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-05-14 10:22:13 -07:00
Predrag Stojadinović	78e3e35d27	[FEAT] Confluence Data Connector handles custom Confluence urls (#1362 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint	2024-05-14 10:21:04 -07:00
timothycarambat	c60077a078	merge with master	2024-05-03 10:02:53 -07:00
timothycarambat	2d215acb75	patch storage dirs for extensions	2024-05-02 14:03:10 -07:00
timothycarambat	1aa8e5766f	duplicate key (no impact)	2024-05-02 13:05:20 -07:00
timothycarambat	6150ff41ea	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-05-01 13:33:07 -07:00
Timothy Carambat	547d4859ef	Bump `openai` package to latest (#1234 ) * Bump `openai` package to latest Tested all except localai * bump LocalAI support with latest image * add deprecation notice * linting	2024-04-30 12:33:42 -07:00
Timothy Carambat	94017e2b51	bump langchain deps (#1231 ) * bump langchain deps * patch native and ollama providers remove deprecated deps --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-04-30 12:04:24 -07:00
Sean Hatfield	348b36bf85	[FEAT] Confluence data connector (#1181 ) * WIP Confluence data connector backend * confluence data connector complete * confluence citations * fix citation for confluence * Patch confulence integration * fix Citation Icon for confluence --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-04-25 17:53:38 -07:00
timothycarambat	e1372a81d4	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-04-20 18:22:41 -07:00
Ken Kuang	a3b7239d05	Fix Cannot read properties of undefined (reading 'length') (#1145 ) Fix upload failed	2024-04-20 12:28:19 -07:00
timothycarambat	45505630a6	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-04-17 11:55:57 -07:00
Timothy Carambat	a5bb77f97a	Agent support for `@agent` default agent inside workspace chat (#1093 ) V1 of agent support via built-in `@agent` that can be invoked alongside normal workspace RAG chat.	2024-04-16 10:50:10 -07:00
timothycarambat	fde4e5400f	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-04-12 14:57:46 -07:00
Sean Hatfield	af84b01482	[FIX] GitHub repo with periods in link fix (#1084 ) fix periods in github repo links bug	2024-04-12 14:56:59 -07:00
Timothy Carambat	2c6135aa54	patch file types as plaintext (#1095 ) resolves #1089	2024-04-12 14:54:33 -07:00
timothycarambat	75ced7e65a	merge with master Patch LLM selection for native to be disabled	2024-04-07 14:55:18 -07:00
Timothy Carambat	1f8ab0d245	Remove YoutubeLoader dependency (#1050 ) * WIP data connector redesign * new UI for data connectors complete * remove old data connector page/cleanup imports * cleanup of UI and imports * Remove Youtube Transcript dep and move in-house * lang pref default to en --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-04-05 16:33:01 -07:00
timothycarambat	2638098d49	patch with master	2024-04-05 09:45:28 -07:00
timothycarambat	0b454016cf	patch comkey path to fallback	2024-04-04 10:47:26 -07:00
timothycarambat	a4c1d42e41	merge with master	2024-04-02 14:33:32 -07:00
timothycarambat	e524afae9e	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm	2024-04-02 14:30:27 -07:00
timothycarambat	117c3b2bfb	forgot epub file!	2024-04-02 14:30:20 -07:00
Timothy Carambat	4fb4aa2041	Add epub support for parsing (#1017 )	2024-04-02 14:25:52 -07:00
Timothy Carambat	752e3e22ed	Add more text file forced extensions (#1016 )	2024-04-02 14:13:11 -07:00
Timothy Carambat	f4088d9348	RSA-Signing on server<->collector communication via API (#1005 ) * WIP integrity check between processes * Implement integrity checking on document processor payloads	2024-04-01 13:56:35 -07:00
timothycarambat	971c54e2c8	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-03-26 14:12:09 -07:00
Sean Hatfield	45f50ce13c	[FIX] Update metadata tags in PDF collector script (#925 ) update title in pdf collector script to be the filename instead of metadata title	2024-03-19 18:14:34 -07:00
timothycarambat	540d18ec84	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-03-18 09:52:11 -07:00
Timothy Carambat	0ada882991	Support external transcription providers (#909 ) * Support External Transcription providers * patch files * update docs * fix return data	2024-03-14 15:43:26 -07:00
timothycarambat	429ea0c805	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-03-12 12:29:57 -07:00
Timothy Carambat	0f31e43fd4	bump YT metadata lib for YT api fix rot (#888 )	2024-03-11 10:57:53 -07:00
timothycarambat	65f8a01505	merge with master	2024-03-06 16:43:36 -08:00
Timothy Carambat	ec90060d36	Re-map some file mimes to support text (#842 ) re-map some file mimes to support text	2024-02-29 10:05:03 -08:00
timothycarambat	2b6e1db79b	merge with master	2024-02-27 23:12:09 -08:00
Timothy Carambat	6d18d79bb7	Generic upload fallback as text file. (#808 ) * Do not block any file upload fallback unknown/unsupported types to text if possible * reduce call for frontend * patch	2024-02-26 13:43:54 -08:00
timothycarambat	ae01785220	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-02-21 15:11:45 -08:00
Timothy Carambat	d89610586a	improve error messages from YT scraping (#768 ) parse & enforce URL to allow multiple URL schemas	2024-02-21 10:47:10 -08:00
Timothy Carambat	49fbd09af4	Support more plaintext filetypes (#757 ) * Add more plaintext document types org-mode, asciidoc, and reStructuredText are all text formats Signed-off-by: Christian Romney <christian.a.romney@gmail.com> * lint --------- Signed-off-by: Christian Romney <christian.a.romney@gmail.com> Co-authored-by: Christian Romney <christian.a.romney@gmail.com>	2024-02-19 10:44:01 -08:00
Timothy Carambat	d52f8aafd4	689 links in citation (#715 ) * Include links in citations force ChunkSource key to retain this information old links will be unsupported * show special icons depending on source * remove console log * reset server documents writeTo	2024-02-13 14:11:57 -08:00
Timothy Carambat	48cb8f2897	Add support to upload rawText document via api (#692 ) * Add support to upload rawText document via api * update API doc endpoint with correct textContent key * update response swagger doc	2024-02-07 15:17:32 -08:00
Sean Hatfield	288ff0d18c	fix vector cache not deleting cache after unembedding items with folders (#630 )	2024-01-22 13:03:05 -08:00
Timothy Carambat	0db6c3b2aa	Prevent private octets from link collection for self-hosted (#626 )	2024-01-19 10:49:40 -08:00
timothycarambat	addb3d0c3e	Update Render.com image for AnythingLLM to latest	2024-01-17 18:12:25 -08:00
Timothy Carambat	b35feede87	570 document api return object (#608 ) * Add support for fetching single document in documents folder * Add document object to upload + support link scraping via API * hotfixes for documentation * update api docs	2024-01-16 16:04:22 -08:00
Timothy Carambat	1563a1b20f	Strict link protocol validation (#577 )	2024-01-11 12:29:00 -08:00
timothycarambat	a48a5ad6ad	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2024-01-08 17:01:23 -08:00
Timothy Carambat	58971e8b30	Build & Publish AnythingLLM for ARM64 and x86 (#549 ) * Update build process to support multi-platform builds Bump @lancedb/vectordb to 0.1.19 for ARM&AMD compatibility Patch puppeteer on ARM builds because of broken chromium resolves #539 resolves #548 --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-01-08 16:15:01 -08:00
Francisco Bischoff	990a2e85bf	devcontainer v1 (#297 ) Implement support for GitHub codespaces and VSCode devcontainers --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2024-01-08 15:31:06 -08:00
timothycarambat	26549df6a9	touchup linting	2023-12-27 13:28:37 -08:00
timothycarambat	daadad3859	hoist var in extensions	2023-12-20 19:41:16 -08:00
timothycarambat	1ca06cc3e1	Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render	2023-12-19 16:23:19 -08:00
Timothy Carambat	f2fadd6d2e	Add placeholder collector ENV file (#476 ) resolves #474	2023-12-19 13:27:09 -08:00
timothycarambat	0eb2fe7248	Map .env to storage .env file map writeToServerDocuments to resolve to fixed storage mount for Render	2023-12-19 11:35:20 -08:00
Timothy Carambat	ecf4295537	Add ability to grab youtube transcripts via doc processor (#470 ) * Add ability to grab youtube transcripts via doc processor * dynamic imports swap out Github for Youtube in placeholder text	2023-12-18 17:17:26 -08:00
Timothy Carambat	452582489e	GitHub loader extension + extension support v1 (#469 ) * feat: implement github repo loading fix: purge of folders fix: rendering of sub-files * noshow delete on custom-documents * Add API key support because of rate limits * WIP for frontend of data connectors * wip * Add frontend form for GitHub repo data connector * remove console.logs block custom-documents from being deleted * remove _meta unused arg * Add support for ignore pathing in request Ignore path input via tagging * Update hint	2023-12-18 15:48:02 -08:00
timothycarambat	d2e3506bb9	fix: transition on LLM and embedding screen linting	2023-12-15 12:40:11 -08:00
Timothy Carambat	61db981017	feat: Embed on-instance Whisper model for audio/mp4 transcribing (#449 ) * feat: Embed on-instance Whisper model for audio/mp4 transcribing resolves #329 * additional logging * add placeholder for tmp folder in collector storage Add cleanup of hotdir and tmp on collector boot to prevent hanging files split loading of model and file conversion into concurrency * update README * update model size * update supported filetypes	2023-12-15 11:20:13 -08:00
Timothy Carambat	719521c307	Document Processor v2 (#442 ) * wip: init refactor of document processor to JS * add NodeJs PDF support * wip: partity with python processor feat: add pptx support * fix: forgot files * Remove python scripts totally * wip:update docker to boot new collector * add package.json support * update dockerfile for new build * update gitignore and linting * add more protections on file lookup * update package.json * test build * update docker commands to use cap-add=SYS_ADMIN so web scraper can run update all scripts to reflect this remove docker build for branch	2023-12-14 15:14:56 -08:00
Timothy Carambat	da0cec7aa2	patch: remove unidecode as it was transliterating non-latin chars (#434 ) resolves #298	2023-12-13 11:54:55 -08:00
Timothy Carambat	ce9233c258	feat: enable HTML uploads from UI (#422 ) resolves #418	2023-12-11 14:40:33 -08:00
timothycarambat	b583aa74fd	remove prints	2023-11-16 17:17:52 -08:00
Sean Hatfield	7edfccaf9a	Adding url uploads to document picker (#375 ) * WIP adding url uploads to document picker * fix manual script for uploading url to custom-documents * fix metadata for url scraping * wip url parsing * update how async link scraping works * docker-compose defaults added no autocomplete on URLs --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2023-11-16 17:15:01 -08:00
Sean Hatfield	f40309cfdb	Add id to all metadata to prevent errors in frontend document picker (#378 ) add id to all metadata to prevent errors in frontend docuemnt picker Co-authored-by: timothycarambat <rambat1010@gmail.com>	2023-11-16 14:36:26 -08:00
timothycarambat	1e3d82e184	patch collector script	2023-11-16 10:25:23 -08:00
timothycarambat	c5dc68633b	patch link scrape tool schema	2023-11-14 16:41:39 -08:00
Timothy Carambat	5441717294	normalize parser struct for all file types (#321 )	2023-11-01 16:44:02 -07:00
Francisco Bischoff	26dba59249	mbox parsing improvements v1 (#308 ) * mbox parsing improvements v1 * autobots roll out!	2023-10-30 11:57:33 -07:00
Timothy Carambat	18798c5b64	prevent deletion of documents not in hotdir via directory traversal (#258 ) resolves #257	2023-09-29 11:04:47 -07:00
Timothy Carambat	a505928934	Display better error messages from document processor (#243 ) pass messages to frontend on success/failure resolves #242	2023-09-18 16:50:20 -07:00
Timothy Carambat	3e78476739	Franzbischoff document improvements (#241 ) * cosmetic changes to be compatible to hadolint * common configuration for most editors until better plugins comes up * Changes on PDF metadata, using PyMuPDF (faster and more compatible) * small changes on other file ingestions in order to try to keep the fields equal * Lint, review, and review * fixed unknown chars * Use PyMuPDF for pdf loading for 200% speed increase linting --------- Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com> Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>	2023-09-18 16:21:37 -07:00
Melroy van den Berg	16b8330fbf	Update requirements.txt (#185 ) Upgrade fake-useragent to latest version (v1.2.1). Disclaimer: I'm the package maintainer.	2023-08-14 14:38:14 -07:00
Timothy Carambat	b42493c6de	Split large PDFS into subfolder in documents (#176 ) append time value to folder name to prevent duplicate uploads	2023-08-03 18:57:50 -07:00
AntonioCiolino	31e5db7490	Twitter Feature (#134 ) * . * twitter feature update * Key validation and operation	2023-07-06 14:05:50 -07:00
Timothy Carambat	d7315b0e53	be able to parse relative and FQDN links from root reliably (#138 )	2023-07-05 14:40:54 -07:00
mplawner	3efe55a720	Added mbox support (#106 ) * Update filetypes.py Added mbox format * Created new file Added support for mbox files as used by many email services, including Google Takeout's Gmail archive. * Update filetypes.py * Update as_mbox.py	2023-06-25 18:11:05 -07:00
AntonioCiolino	a52b0ae655	Updated Link scraper to avoid NoneType error. (#90 ) * Enable web scraping based on a urtl and a simple filter. * ignore yarn * Updated Link scraper to avoid NoneType error.	2023-06-19 12:07:26 -07:00
frasergr	4079020de0	dockerfile cleanup; enforce text LF line endings (#81 )	2023-06-17 20:18:01 -07:00
AntonioCiolino	e7ba028497	Enable web scraping based on a urtl and a simple filter. (#73 )	2023-06-16 17:29:11 -07:00

159 Commits