anything-llm

mirror of https://github.com/Mintplex-Labs/anything-llm.git synced 2024-11-19 12:40:09 +01:00

Author	SHA1	Message	Date
Timothy Carambat	30645831a1	1959 filetype filters (#2378 ) * Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly. * add @langchain/community to collector package.json * fix: Improve handling of complex ignore patterns in GitLabRepoLoader * refactor: use ignore package for simplified ignore logic * run yarn lint * add @langchain/community@^0.2.23 * remove unused dep lint --------- Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>	2024-09-26 12:50:35 -07:00
Timothy Carambat	04a0fc4ec9	Remove unused deps (#1938 ) * Remove unused deps * improve dependency	2024-07-25 10:21:03 -07:00
Timothy Carambat	42235fcd8a	GitLab Hosted and Local Connector (#1932 ) * Add support for GitLab repo collection as well as Github Repo collection * Refactor for repo collectors to be more compact --------- Co-authored-by: Emil Rofors <emirof@gmail.com>	2024-07-23 12:23:51 -07:00
Sean Hatfield	79656718b2	[FEAT] Create custom pdfloader (#1852 ) * implement custom PDFLoader to remove LC dep * remove unneeded comment * remove pdfjs as dep and fix page splitting using pdf-parse * linting + export rename for desktop compat --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-07-11 12:26:11 -07:00
Timothy Carambat	29c9eeaa5c	Add `winston` logging for production (#1811 )	2024-07-03 16:39:33 -07:00
Sean Hatfield	a87014822a	[REFACTOR] Improve asPDF collector processor with pdfjs (#1791 ) * WIP replace langchain pdfloader with pdfjs and add more context to each page * remove extras from pdfjs and just replace langchain library * remove unneeded dep * fix console log in docs --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-07-03 14:26:48 -07:00
Timothy Carambat	98cef508a6	Feature/devcontv2 (#1622 ) * Updated apt-packages source for devcontainer Switched the devcontainer's package source to a different repository to align with updated dependencies and package availability. The previous source from 'rocker-org' is replaced with 'devcontainers-contrib', which may offer more recent or relevant development tools. * Subject: Centralize prettier ignores and refine config Body: Centralized all prettier ignore rules by removing individual `.prettierignore` files in subprojects and updating the root `.prettierignore` to include previously ignored patterns, ensuring consistency across the workspace. Additionally, the prettier configuration was refined by making the file pattern for `.config.js` files consistent and adjusting quote styles for better readability. All lint scripts across the project were updated to respect the centralized ignore path, enhancing maintainability. The consolidation simplifies the process of managing ignore rules as the project scales, ensuring developers can focus on writing code without worrying about divergent formatting standards. These changes also align with introducing comprehensive linting across multiple environments to keep the codebase clean and consistent. This adjustment is a foundational step towards a more streamlined and unified code base, making it easier for new contributors to adhere to established coding standards and reducing the cognitive load associated with managing multiple configuration files across the project. * unset package json changes --------- Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com> Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>	2024-06-06 12:50:42 -07:00
Timothy Carambat	547d4859ef	Bump `openai` package to latest (#1234 ) * Bump `openai` package to latest Tested all except localai * bump LocalAI support with latest image * add deprecation notice * linting	2024-04-30 12:33:42 -07:00
Timothy Carambat	94017e2b51	bump langchain deps (#1231 ) * bump langchain deps * patch native and ollama providers remove deprecated deps --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-04-30 12:04:24 -07:00
Sean Hatfield	348b36bf85	[FEAT] Confluence data connector (#1181 ) * WIP Confluence data connector backend * confluence data connector complete * confluence citations * fix citation for confluence * Patch confulence integration * fix Citation Icon for confluence --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-04-25 17:53:38 -07:00
Timothy Carambat	1f8ab0d245	Remove YoutubeLoader dependency (#1050 ) * WIP data connector redesign * new UI for data connectors complete * remove old data connector page/cleanup imports * cleanup of UI and imports * Remove Youtube Transcript dep and move in-house * lang pref default to en --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-04-05 16:33:01 -07:00
Timothy Carambat	4fb4aa2041	Add epub support for parsing (#1017 )	2024-04-02 14:25:52 -07:00
Timothy Carambat	0ada882991	Support external transcription providers (#909 ) * Support External Transcription providers * patch files * update docs * fix return data	2024-03-14 15:43:26 -07:00
Timothy Carambat	0f31e43fd4	bump YT metadata lib for YT api fix rot (#888 )	2024-03-11 10:57:53 -07:00
Timothy Carambat	58971e8b30	Build & Publish AnythingLLM for ARM64 and x86 (#549 ) * Update build process to support multi-platform builds Bump @lancedb/vectordb to 0.1.19 for ARM&AMD compatibility Patch puppeteer on ARM builds because of broken chromium resolves #539 resolves #548 --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-01-08 16:15:01 -08:00
Timothy Carambat	ecf4295537	Add ability to grab youtube transcripts via doc processor (#470 ) * Add ability to grab youtube transcripts via doc processor * dynamic imports swap out Github for Youtube in placeholder text	2023-12-18 17:17:26 -08:00
Timothy Carambat	452582489e	GitHub loader extension + extension support v1 (#469 ) * feat: implement github repo loading fix: purge of folders fix: rendering of sub-files * noshow delete on custom-documents * Add API key support because of rate limits * WIP for frontend of data connectors * wip * Add frontend form for GitHub repo data connector * remove console.logs block custom-documents from being deleted * remove _meta unused arg * Add support for ignore pathing in request Ignore path input via tagging * Update hint	2023-12-18 15:48:02 -08:00
Timothy Carambat	61db981017	feat: Embed on-instance Whisper model for audio/mp4 transcribing (#449 ) * feat: Embed on-instance Whisper model for audio/mp4 transcribing resolves #329 * additional logging * add placeholder for tmp folder in collector storage Add cleanup of hotdir and tmp on collector boot to prevent hanging files split loading of model and file conversion into concurrency * update README * update model size * update supported filetypes	2023-12-15 11:20:13 -08:00
Timothy Carambat	719521c307	Document Processor v2 (#442 ) * wip: init refactor of document processor to JS * add NodeJs PDF support * wip: partity with python processor feat: add pptx support * fix: forgot files * Remove python scripts totally * wip:update docker to boot new collector * add package.json support * update dockerfile for new build * update gitignore and linting * add more protections on file lookup * update package.json * test build * update docker commands to use cap-add=SYS_ADMIN so web scraper can run update all scripts to reflect this remove docker build for branch	2023-12-14 15:14:56 -08:00

19 Commits