Commit Graph

107 Commits

Author SHA1 Message Date
timothycarambat
2cb5caf633 Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-11-18 10:32:15 -08:00
Sean Hatfield
cf3b085a3a
Handle OpenAI whisper transcription edge case (#2621)
remove openai whisper transcription provider response_format option
2024-11-11 17:32:03 -08:00
timothycarambat
2cdf7877f2 bump render to latest (Nov 6, 2024) 2024-11-06 11:09:24 -08:00
Sean Hatfield
0bb47619dc
Allow 127.0.0.1 as valid URL for scraping (#2560)
* allow 127.0.0.1 as valid url for scraping

* update comments and lint

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-10-31 09:57:28 -07:00
timothycarambat
c870e31aaa add ino filetype to text/plain support 2024-10-28 11:44:15 -07:00
timothycarambat
a6a5084565 merge master 2024-10-22 14:08:46 -07:00
Sean Hatfield
0074ededdd
Github data connector improvements (#2439)
* fix tree/blob github urls from branches not being loaded

* improve ux of github data connector

* lint

* patch Github URL parser to just validate with `URL` native parser

* uncheck LocalStorage of PAT for security reasons

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2024-10-21 15:25:35 -07:00
timothycarambat
ab6f03ce1c linting 2024-10-18 11:44:14 -07:00
Sean Hatfield
41522cdfb4
Handle non-ascii characters in single and bulk link scraper URLs (#2495)
handle non-ascii characters in urls
2024-10-17 17:04:00 -07:00
Sean Hatfield
b658f5012d
Support XLSX files (#2403)
* support xlsx files

* lint

* create separate docs for each xlsx sheet

* lint

* use node-xlsx pkg for parsing xlsx files

* lint

* update error handling

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-10-03 13:45:23 -07:00
Timothy Carambat
93d64642f3
Add exception handling for special case files like Dockerfile and Jenkinsfile (#2410) 2024-10-02 15:13:31 -07:00
timothycarambat
6d7f8b71cf Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-09-27 09:51:13 -07:00
Timothy Carambat
30645831a1
1959 filetype filters (#2378)
* Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly.

* add @langchain/community to collector package.json

* fix: Improve handling of complex ignore patterns in GitLabRepoLoader

* refactor: use ignore package for simplified ignore logic

* run yarn lint

* add @langchain/community@^0.2.23

* remove unused dep
lint

---------

Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>
2024-09-26 12:50:35 -07:00
Blazej Owczarczyk
b2123b13b0
Added an option to fetch issues from gitlab. Made the file fetching a… (#2335)
* Added an option to fetch issues from gitlab. Made the file fetching asynchronous to improve performance. #2334

* Fixed a typo in loadGitlabRepo.

* Convert issues to markdown.

* Fixed an issue with time estimate field names in issueToMarkdown.

* handle rate limits more gracefully + update checkbox to toggle switch

* lint

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-09-26 11:45:18 -07:00
Timothy Carambat
961b567541
Add dropdown for confluence connector deployment (#2376) 2024-09-26 08:49:05 -07:00
timothycarambat
822d270aa7 Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-09-25 16:12:33 -07:00
Sean Hatfield
4488744850
Support more Confluence URL formats (#2118)
* support more confluence url formats

* use pattern matching for confluence urls and manual splitting as fallback

* rework entire Confluence flow to prevent issues with custom, local, and cloud spaces

* remove dep

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2024-09-25 16:12:17 -07:00
timothycarambat
cdfc83e38d merge with master 2024-09-25 14:27:30 -07:00
Sean Hatfield
5a3d55db67
Fix custom domain in confluence (#2328)
confluence custom domain fix
2024-09-19 15:36:07 -05:00
Timothy Carambat
4fa3d6d333
Load all branches in gitlab data connector (#2325)
* Fix gitlab data connector for self-hosted instances (#2315)

* Linting fix.

* Load all branches in the GitLab data connector #2319

* #2319 lint fixes.

* update fetch on fail

---------

Co-authored-by: Błażej Owczarczyk <blazeyy@gmail.com>
2024-09-19 13:34:38 -05:00
Blazej Owczarczyk
b25298c04a
Fix gitlab data connector for self-hosted instances (#2315) (#2316)
* Fix gitlab data connector for self-hosted instances (#2315)

* Linting fix.
2024-09-18 16:12:15 -05:00
timothycarambat
f8a40faeaf Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-09-12 16:09:16 -07:00
timothycarambat
9aa77dfb8d Add verbose logging to GH loader
connect #2243
2024-09-09 14:36:37 -07:00
timothycarambat
0c86a26601 merge with master 2024-08-15 15:30:57 -07:00
timothycarambat
619f6b3884 Ignore SSL errors for web scraper
resolves #2114
2024-08-14 09:11:22 -07:00
timothycarambat
b541623c9e add SSRF notice 2024-08-13 17:46:07 -07:00
timothycarambat
8fc547e78a Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-08-12 11:59:46 -07:00
Sean Hatfield
2797298507
Fix depth handling in bulk link scraper (#2096)
fix depth handling in bulk link scraper
2024-08-12 11:44:35 -07:00
Lea Anthony
3b6a2fd2fa
#2084 Support Go filetype (#2085)
Support Go filetype
2024-08-09 19:29:29 -07:00
Mehmet Ünlü
0d4560b9e4
2049 remove break that prevents fetching files from gitlab repo (#2050)
fix: remove unnecessary break

Remove unnecessary break that prevents checking next pages for blob objects.
2024-08-06 10:17:55 -07:00
Sean Hatfield
be3b0b4916
Youtube loader whitespace fix (#2051)
youtube loader whitespace fix
2024-08-06 10:16:17 -07:00
timothycarambat
714f88891d Merge conflicts 2024-07-23 12:48:25 -07:00
Timothy Carambat
42235fcd8a
GitLab Hosted and Local Connector (#1932)
* Add support for GitLab repo collection as well as Github Repo collection
* Refactor for repo collectors to be more compact

---------

Co-authored-by: Emil Rofors <emirof@gmail.com>
2024-07-23 12:23:51 -07:00
timothycarambat
f2ebca8f84 Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-07-19 18:36:48 -07:00
timothycarambat
f15529653f patch logger for full logs 2024-07-19 18:35:41 -07:00
timothycarambat
cec1a3d585 append stacktraces to winston 2024-07-19 18:13:54 -07:00
timothycarambat
766537180a linting 2024-07-19 15:25:09 -07:00
timothycarambat
a56c124543 Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-07-11 15:58:21 -07:00
Sean Hatfield
79656718b2
[FEAT] Create custom pdfloader (#1852)
* implement custom PDFLoader to remove LC dep

* remove unneeded comment

* remove pdfjs as dep and fix page splitting using pdf-parse

* linting + export rename for desktop compat

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-07-11 12:26:11 -07:00
timothycarambat
e6ee872136 Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-07-10 15:51:27 -07:00
timothycarambat
8658b1e7c7 linting 2024-07-03 18:25:44 -07:00
Timothy Carambat
29c9eeaa5c
Add winston logging for production (#1811) 2024-07-03 16:39:33 -07:00
Sean Hatfield
f205d51fe9
[FIX] Confluence code snippet blocks not being extracted (#1804)
implement custom confluence loader to extract code blocks properly from documents

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2024-07-03 14:00:44 -07:00
timothycarambat
86a31d7551 Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-07-01 17:08:59 -07:00
Sean Hatfield
fc375f4036
[FIX] Bulk link scraper bug fix (#1800)
patch website depth data connector to work for other links that are not root url
2024-07-01 16:59:28 -07:00
Jason Zhang
fa4ab0f65f
fix: sanitize filename before writing (#1743)
* fix: sanitize filename before writing

Fixes: https://github.com/Mintplex-Labs/anything-llm/issues/1737

* fixup

* fixup
2024-06-25 15:45:09 -07:00
Timothy Carambat
dc4ad6b5a9
[BETA] Live document sync (#1719)
* wip bg workers for live document sync

* Add ability to re-embed specific documents across many workspaces via background queue
bgworker is gated behind an experimental system setting flag that needs to be explicitly enabled
UI for watching/unwatching documents that are embedded.
TODO: UI to easily manage all bg tasks and see run results
TODO: UI to enable this feature and background endpoints to manage it

* create frontend views and paths
Move elements to correct experimental scope

* update migration to delete runs on removal of watched document

* Add watch support to YouTube transcripts (#1716)

* Add watch support to YouTube transcripts
refactor how sync is done for supported types

* Watch specific files in Confluence space (#1718)

Add failure-prune check for runs

* create tmp workflow modifications for beta image

* create tmp workflow modifications for beta image

* create tmp workflow modifications for beta image

* dual build
update copy of alert modals

* update job interval

* Add support for live-sync of Github files

* update copy for document sync feature

* hide Experimental features from UI

* update docs links

* [FEAT] Implement new settings menu for experimental features (#1735)

* implement new settings menu for experimental features

* remove unused context save bar

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* don't run job on boot

* unset workflow changes

* Add persistent encryption service
Relay key to collector so persistent encryption can be used
Encrypt any private data in chunkSources used for replay during resync jobs

* update jsDOC

* Linting and organization

* update modal copy for feature

---------

Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>
2024-06-21 13:38:50 -07:00
Timothy Carambat
a598c8e04c
1347 human readable confluence url (#1706)
* chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones

* chore: formatting as per yarn lint

* chore: fixing the human readable confluence url fetch baseUrl

* chore: fixing the human readable confluence url fetch baseUrl

* chore: fixing the human readable confluence url fetch baseUrl

* chore: fixing the human readable confluence url fetch baseUrl

* chore: fixing the human readable confluence url fetch baseUrl

* refactor implementation of various types of Confluence URL patterns

---------

Co-authored-by: Predrag Stojadinovic <predrag@stojadinovic.net>
Co-authored-by: Predrag Stojadinović <cope@users.noreply.github.com>
Co-authored-by: Predrag Stojadinovic <predrags@nvidia.com>
2024-06-17 16:04:20 -07:00
timothycarambat
393772c4a5 Merge branch 'master' of github.com:Mintplex-Labs/anything-llm into render 2024-06-12 09:05:57 -07:00
Chris Daniel
8a4dd2bdf5
[FEAT] add support for TSX files to be parsed as text (#1597)
add support for TSX files to be parsed as text
2024-06-03 17:01:41 +08:00