anything-llm

mirror of https://github.com/Mintplex-Labs/anything-llm.git synced 2024-11-05 06:20:10 +01:00

Author	SHA1	Message	Date
Timothy Carambat	1563a1b20f	Strict link protocol validation (#577)	2024-01-11 12:29:00 -08:00
Timothy Carambat	58971e8b30	Build & Publish AnythingLLM for ARM64 and x86 (#549) * Update build process to support multi-platform builds Bump @lancedb/vectordb to 0.1.19 for ARM&AMD compatibility Patch puppeteer on ARM builds because of broken chromium resolves #539 resolves #548 --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-01-08 16:15:01 -08:00
Francisco Bischoff	990a2e85bf	devcontainer v1 (#297) Implement support for GitHub codespaces and VSCode devcontainers --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2024-01-08 15:31:06 -08:00
timothycarambat	26549df6a9	touchup linting	2023-12-27 13:28:37 -08:00
timothycarambat	daadad3859	hoist var in extensions	2023-12-20 19:41:16 -08:00
Timothy Carambat	f2fadd6d2e	Add placeholder collector ENV file (#476) resolves #474	2023-12-19 13:27:09 -08:00
Timothy Carambat	ecf4295537	Add ability to grab youtube transcripts via doc processor (#470) * Add ability to grab youtube transcripts via doc processor * dynamic imports swap out Github for Youtube in placeholder text	2023-12-18 17:17:26 -08:00
Timothy Carambat	452582489e	GitHub loader extension + extension support v1 (#469) * feat: implement github repo loading fix: purge of folders fix: rendering of sub-files * noshow delete on custom-documents * Add API key support because of rate limits * WIP for frontend of data connectors * wip * Add frontend form for GitHub repo data connector * remove console.logs block custom-documents from being deleted * remove _meta unused arg * Add support for ignore pathing in request Ignore path input via tagging * Update hint	2023-12-18 15:48:02 -08:00
timothycarambat	d2e3506bb9	fix: transition on LLM and embedding screen linting	2023-12-15 12:40:11 -08:00
Timothy Carambat	61db981017	feat: Embed on-instance Whisper model for audio/mp4 transcribing (#449) * feat: Embed on-instance Whisper model for audio/mp4 transcribing resolves #329 * additional logging * add placeholder for tmp folder in collector storage Add cleanup of hotdir and tmp on collector boot to prevent hanging files split loading of model and file conversion into concurrency * update README * update model size * update supported filetypes	2023-12-15 11:20:13 -08:00
Timothy Carambat	719521c307	Document Processor v2 (#442) * wip: init refactor of document processor to JS * add NodeJs PDF support * wip: partity with python processor feat: add pptx support * fix: forgot files * Remove python scripts totally * wip:update docker to boot new collector * add package.json support * update dockerfile for new build * update gitignore and linting * add more protections on file lookup * update package.json * test build * update docker commands to use cap-add=SYS_ADMIN so web scraper can run update all scripts to reflect this remove docker build for branch	2023-12-14 15:14:56 -08:00
Timothy Carambat	da0cec7aa2	patch: remove unidecode as it was transliterating non-latin chars (#434) resolves #298	2023-12-13 11:54:55 -08:00
Timothy Carambat	ce9233c258	feat: enable HTML uploads from UI (#422) resolves #418	2023-12-11 14:40:33 -08:00
timothycarambat	b583aa74fd	remove prints	2023-11-16 17:17:52 -08:00
Sean Hatfield	7edfccaf9a	Adding url uploads to document picker (#375) * WIP adding url uploads to document picker * fix manual script for uploading url to custom-documents * fix metadata for url scraping * wip url parsing * update how async link scraping works * docker-compose defaults added no autocomplete on URLs --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2023-11-16 17:15:01 -08:00
Sean Hatfield	f40309cfdb	Add id to all metadata to prevent errors in frontend document picker (#378) add id to all metadata to prevent errors in frontend docuemnt picker Co-authored-by: timothycarambat <rambat1010@gmail.com>	2023-11-16 14:36:26 -08:00
timothycarambat	1e3d82e184	patch collector script	2023-11-16 10:25:23 -08:00
timothycarambat	c5dc68633b	patch link scrape tool schema	2023-11-14 16:41:39 -08:00
Timothy Carambat	5441717294	normalize parser struct for all file types (#321)	2023-11-01 16:44:02 -07:00
Francisco Bischoff	26dba59249	mbox parsing improvements v1 (#308) * mbox parsing improvements v1 * autobots roll out!	2023-10-30 11:57:33 -07:00
Timothy Carambat	18798c5b64	prevent deletion of documents not in hotdir via director traversal (#258) resolves #257	2023-09-29 11:04:47 -07:00
Timothy Carambat	a505928934	Display better error messages from document processor (#243) pass messages to frontend on success/failure resolves #242	2023-09-18 16:50:20 -07:00
Timothy Carambat	3e78476739	Franzbischoff document improvements (#241) * cosmetic changes to be compatible to hadolint * common configuration for most editors until better plugins comes up * Changes on PDF metadata, using PyMuPDF (faster and more compatible) * small changes on other file ingestions in order to try to keep the fields equal * Lint, review, and review * fixed unknown chars * Use PyMuPDF for pdf loading for 200% speed increase linting --------- Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com> Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>	2023-09-18 16:21:37 -07:00
Melroy van den Berg	16b8330fbf	Update requirements.txt (#185) Upgrade fake-useragent to latest version (v1.2.1). Disclaimer: I'm the package maintainer.	2023-08-14 14:38:14 -07:00
Timothy Carambat	b42493c6de	Split large PDFS into subfolder in documents (#176) append time value to folder name to prevent duplicate uploads	2023-08-03 18:57:50 -07:00
AntonioCiolino	31e5db7490	Twitter Feature (#134) * . * twitter feature update * Key validation and operation	2023-07-06 14:05:50 -07:00
Timothy Carambat	d7315b0e53	be able to parse relative and FQDN links from root reliabily (#138)	2023-07-05 14:40:54 -07:00
mplawner	3efe55a720	Added mbox support (#106) * Update filetypes.py Added mbox format * Created new file Added support for mbox files as used by many email services, including Google Takeout's Gmail archive. * Update filetypes.py * Update as_mbox.py	2023-06-25 18:11:05 -07:00
AntonioCiolino	a52b0ae655	Updated Link scraper to avoid NoneType error. (#90) * Enable web scraping based on a urtl and a simple filter. * ignore yarn * Updated Link scraper to avoid NoneType error.	2023-06-19 12:07:26 -07:00
frasergr	4079020de0	dockerfile cleanup; enforce text LF line endings (#81)	2023-06-17 20:18:01 -07:00
AntonioCiolino	e7ba028497	Enable web scraping based on a urtl and a simple filter. (#73)	2023-06-16 17:29:11 -07:00
timothycarambat	81b2159329	reorder docs	2023-06-16 17:26:42 -07:00
Timothy Carambat	c4eb46ca19	Upload and process documents via UI + document processor in docker image (#65) * implement dnd uploader show file upload progress write files to hotdirector build simple flaskAPI to process files one off * move document processor calls to util build out dockerfile to run both procs at the same time update UI to check for document processor before upload * disable pragma update on boot * dockerfile changes * add filetype restrictions based on python app support response and show rejected files in the UI * cleanup * stub migrations on boot to prevent exit condition * update CF template for AWS deploy	2023-06-16 16:01:27 -07:00
AntonioCiolino	537a6a91d2	Update __HOTDIR__.md (#70) fixed typo for text.	2023-06-16 11:17:18 -07:00
Skid Vis	4118c9dcf3	Blocks images in sitemaps from being parsed. (#56) * Adds ability to import sitemaps to include a website * adds example sitemap url * adds filter to bypass common image formats * moves filetype ignoring to sitemap script	2023-06-14 23:00:03 -07:00
Skid Vis	bd32f97a21	Adds ability to import sitemaps to include a website (#51) * Adds ability to import sitemaps to include a website * adds example sitemap url	2023-06-14 11:04:17 -07:00
frasergr	9f33b3dfcb	Docker support (#34) * Updates for Linux for frontend/server * frontend/server docker * updated Dockerfile for deps related to node vectordb * updates for collector in docker * docker deps for ODT processing * ignore another collector dir * storage mount improvements; run as UID * fix pypandoc version typo * permissions fixes	2023-06-13 11:26:11 -07:00
Fabio	d954d7a3d5	Fix pypandoc issue in requirements.txt (#23) Co-authored-by: Carvalho, Fabio <Fabio_Carvalho@comcast.com>	2023-06-12 11:21:11 -07:00
timothycarambat	728eaff773	fix typo	2023-06-09 11:23:53 -07:00
timothycarambat	27c58541bd	inital commit ⚡	2023-06-03 19:28:07 -07:00

40 Commits