Commit Graph

84 Commits

Author SHA1 Message Date
Timothy Carambat
a598c8e04c
1347 human readable confluence url (#1706)
* chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones

* chore: formatting as per yarn lint

* chore: fixing the human readable confluence url fetch baseUrl

* refactor implementation of various types of Confluence URL patterns

---------

Co-authored-by: Predrag Stojadinovic <predrag@stojadinovic.net>
Co-authored-by: Predrag Stojadinović <cope@users.noreply.github.com>
Co-authored-by: Predrag Stojadinovic <predrags@nvidia.com>
2024-06-17 16:04:20 -07:00
Timothy Carambat
98cef508a6
Feature/devcontv2 (#1622)
* Updated apt-packages source for devcontainer

Switched the devcontainer's package source to a different repository to
align with updated dependencies and package availability. The previous
source from 'rocker-org' is replaced with 'devcontainers-contrib', which
may offer more recent or relevant development tools.

* Subject: Centralize prettier ignores and refine config

Body:
Centralized all prettier ignore rules by removing individual
`.prettierignore` files in subprojects and updating the root
`.prettierignore` to include previously ignored patterns, ensuring
consistency across the workspace. Additionally, the prettier
configuration was refined by making the file pattern for `.config.js`
files consistent and adjusting quote styles for better readability. All
lint scripts across the project were updated to respect the centralized
ignore path, enhancing maintainability.

The consolidation simplifies the process of managing ignore rules as the
project scales, ensuring developers can focus on writing code without
worrying about divergent formatting standards. These changes also align
with introducing comprehensive linting across multiple environments to
keep the codebase clean and consistent.

This adjustment is a foundational step towards a more streamlined and
unified code base, making it easier for new contributors to adhere to
established coding standards and reducing the cognitive load associated
with managing multiple configuration files across the project.

* unset package json changes

---------

Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com>
Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>
2024-06-06 12:50:42 -07:00
Chris Daniel
8a4dd2bdf5
[FEAT] add support for TSX files to be parsed as text (#1597)
add support for TSX files to be parsed as text
2024-06-03 17:01:41 +08:00
Sean Hatfield
9a38b32c74
[FEAT] Add support for R files to be parsed as text (#1577)
add support for R files to be parsed as text
2024-05-31 13:52:00 +08:00
Sean Hatfield
4324a8bb4f
[FEAT] Github repo loader bug fix (#1558)
* fix project names with special characters for github repo data connector

* linting
2024-05-29 17:01:29 +08:00
Timothy Carambat
a89812703b
repatch path normalization (#1516) 2024-05-23 12:52:04 -07:00
timothycarambat
05488c81e0 undo path norm whitespace fix 2024-05-23 12:04:00 -07:00
timothycarambat
e208074ef4 patch path normalization 2024-05-22 11:50:01 -05:00
Timothy Carambat
1a5aacb001
Support multi-model whispers (#1444) 2024-05-17 21:31:29 -07:00
Timothy Carambat
7e0b638a2c
Patch confluence URL patterns (#1426)
* patch confluence patterns

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-05-16 14:15:59 -07:00
timothycarambat
87b41a60e9 refactor spaceKey url pattern for custom domains 2024-05-16 11:01:34 -07:00
Predrag Stojadinović
cf969adf37
1362 custom display confluence url (#1423)
* chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones

* chore: formatting as per yarn lint

* chore: adding /display/ url matching to confluence data connector
2024-05-16 10:46:18 -07:00
timothycarambat
b5ac944475 patch: bulk-scraper, update when folder is made and path creation params 2024-05-14 12:57:23 -07:00
Sean Hatfield
612a7e1662
[FEAT] Website depth scraping data connector (#1191)
* WIP website depth scraping (sort of works)

* website depth data connector stable + add maxLinks option

* linting + loading small ui tweak

* refactor website depth data connector for stability, speed, & readability

* patch: remove console log
Guard clause on URL validity check
reasonable overrides

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2024-05-14 12:49:14 -07:00
jazelly
d71db22799
fix: skip undefined confluence pageContent (#1383)
Refs: https://github.com/Mintplex-Labs/anything-llm/issues/1381

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2024-05-14 10:22:13 -07:00
Predrag Stojadinović
78e3e35d27
[FEAT] Confluence Data Connector handles custom Confluence urls (#1362)
* chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones

* chore: formatting as per yarn lint
2024-05-14 10:21:04 -07:00
timothycarambat
2d215acb75 patch storage dirs for extensions 2024-05-02 14:03:10 -07:00
timothycarambat
1aa8e5766f duplicate key (no impact) 2024-05-02 13:05:20 -07:00
Timothy Carambat
547d4859ef
Bump openai package to latest (#1234)
* Bump `openai` package to latest
Tested all except localai

* bump LocalAI support with latest image

* add deprecation notice

* linting
2024-04-30 12:33:42 -07:00
Timothy Carambat
94017e2b51
bump langchain deps (#1231)
* bump langchain deps

* patch native and ollama providers remove deprecated deps

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-04-30 12:04:24 -07:00
Sean Hatfield
348b36bf85
[FEAT] Confluence data connector (#1181)
* WIP Confluence data connector backend

* confluence data connector complete

* confluence citations

* fix citation for confluence

* Patch confluence integration

* fix Citation Icon for confluence

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-04-25 17:53:38 -07:00
Ken Kuang
a3b7239d05
Fix Cannot read properties of undefined (reading 'length') (#1145)
Fix upload failed
2024-04-20 12:28:19 -07:00
Timothy Carambat
a5bb77f97a
Agent support for @agent default agent inside workspace chat (#1093)
V1 of agent support via built-in `@agent` that can be invoked alongside normal workspace RAG chat.
2024-04-16 10:50:10 -07:00
Sean Hatfield
af84b01482
[FIX] GitHub repo with periods in link fix (#1084)
fix periods in github repo links bug
2024-04-12 14:56:59 -07:00
Timothy Carambat
2c6135aa54
patch file types as plaintext (#1095)
resolves #1089
2024-04-12 14:54:33 -07:00
Timothy Carambat
1f8ab0d245
Remove YoutubeLoader dependency (#1050)
* WIP data connector redesign

* new UI for data connectors complete

* remove old data connector page/cleanup imports

* cleanup of UI and imports

* Remove Youtube Transcript dep and move in-house

* lang pref default to en

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-04-05 16:33:01 -07:00
timothycarambat
0b454016cf patch comkey path to fallback 2024-04-04 10:47:26 -07:00
timothycarambat
e524afae9e Merge branch 'master' of github.com:Mintplex-Labs/anything-llm 2024-04-02 14:30:27 -07:00
timothycarambat
117c3b2bfb forgot epub file! 2024-04-02 14:30:20 -07:00
Timothy Carambat
4fb4aa2041
Add epub support for parsing (#1017) 2024-04-02 14:25:52 -07:00
Timothy Carambat
752e3e22ed
Add more text file forced extensions (#1016) 2024-04-02 14:13:11 -07:00
Timothy Carambat
f4088d9348
RSA-Signing on server<->collector communication via API (#1005)
* WIP integrity check between processes

* Implement integrity checking on document processor payloads
2024-04-01 13:56:35 -07:00
Sean Hatfield
45f50ce13c
[FIX] Update metadata tags in PDF collector script (#925)
update title in pdf collector script to be the filename instead of metadata title
2024-03-19 18:14:34 -07:00
Timothy Carambat
0ada882991
Support external transcription providers (#909)
* Support External Transcription providers

* patch files

* update docs

* fix return data
2024-03-14 15:43:26 -07:00
Timothy Carambat
0f31e43fd4
bump YT metadata lib for YT api fix rot (#888) 2024-03-11 10:57:53 -07:00
Timothy Carambat
ec90060d36
Re-map some file mimes to support text (#842)
re-map some file mimes to support text
2024-02-29 10:05:03 -08:00
Timothy Carambat
6d18d79bb7
Generic upload fallback as text file. (#808)
* Do not block any file upload
fallback unknown/unsupported types to text if possible

* reduce call for frontend

* patch
2024-02-26 13:43:54 -08:00
Timothy Carambat
d89610586a
improve error messages from YT scraping (#768)
parse & enforce URL to allow multiple URL schemas
2024-02-21 10:47:10 -08:00
Timothy Carambat
49fbd09af4
Support more plaintext filetypes (#757)
* Add more plaintext document types

org-mode, asciidoc, and reStructuredText are all text formats

Signed-off-by: Christian Romney <christian.a.romney@gmail.com>

* lint

---------

Signed-off-by: Christian Romney <christian.a.romney@gmail.com>
Co-authored-by: Christian Romney <christian.a.romney@gmail.com>
2024-02-19 10:44:01 -08:00
Timothy Carambat
d52f8aafd4
689 links in citation (#715)
* Include links in citations
force ChunkSource key to retain this information
old links will be unsupported

* show special icons depending on source

* remove console log

* reset server documents writeTo
2024-02-13 14:11:57 -08:00
Timothy Carambat
48cb8f2897
Add support to upload rawText document via api (#692)
* Add support to upload rawText document via api

* update API doc endpoint with correct textContent key

* update response swagger doc
2024-02-07 15:17:32 -08:00
Sean Hatfield
288ff0d18c
fix vector cache not deleting cache after unembedding items with folders (#630) 2024-01-22 13:03:05 -08:00
Timothy Carambat
0db6c3b2aa
Prevent private octets from link collection for self-hosted (#626) 2024-01-19 10:49:40 -08:00
Timothy Carambat
b35feede87
570 document api return object (#608)
* Add support for fetching single document in documents folder

* Add document object to upload + support link scraping via API

* hotfixes for documentation

* update api docs
2024-01-16 16:04:22 -08:00
Timothy Carambat
1563a1b20f
Strict link protocol validation (#577) 2024-01-11 12:29:00 -08:00
Timothy Carambat
58971e8b30
Build & Publish AnythingLLM for ARM64 and x86 (#549)
* Update build process to support multi-platform builds
Bump @lancedb/vectordb to 0.1.19 for ARM&AMD compatibility
Patch puppeteer on ARM builds because of broken chromium
resolves #539
resolves #548

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-01-08 16:15:01 -08:00
Francisco Bischoff
990a2e85bf
devcontainer v1 (#297)
Implement support for GitHub codespaces and VSCode devcontainers
---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>
2024-01-08 15:31:06 -08:00
timothycarambat
26549df6a9 touchup linting 2023-12-27 13:28:37 -08:00
timothycarambat
daadad3859 hoist var in extensions 2023-12-20 19:41:16 -08:00
Timothy Carambat
f2fadd6d2e
Add placeholder collector ENV file (#476)
resolves #474
2023-12-19 13:27:09 -08:00