Chris Daniel
8a4dd2bdf5
[FEAT] add support for TSX files to be parsed as text ( #1597 )
...
add support for TSX files to be parsed as text
2024-06-03 17:01:41 +08:00
Sean Hatfield
9a38b32c74
[FEAT] Add support for R files to be parsed as text ( #1577 )
...
add support for R files to be parsed as text
2024-05-31 13:52:00 +08:00
Sean Hatfield
4324a8bb4f
[FEAT] Github repo loader bug fix ( #1558 )
...
* fix project names with special characters for github repo data connector
* linting
2024-05-29 17:01:29 +08:00
Timothy Carambat
a89812703b
repatch path normalization ( #1516 )
2024-05-23 12:52:04 -07:00
timothycarambat
05488c81e0
undo path norm whitespace fix
2024-05-23 12:04:00 -07:00
timothycarambat
e208074ef4
patch path normalization
2024-05-22 11:50:01 -05:00
Timothy Carambat
1a5aacb001
Support multi-model whispers ( #1444 )
2024-05-17 21:31:29 -07:00
Timothy Carambat
7e0b638a2c
Patch confluence URL patterns ( #1426 )
...
* patch confluence patterns
---------
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-05-16 14:15:59 -07:00
timothycarambat
87b41a60e9
refactor spaceKey url pattern for custom domains
2024-05-16 11:01:34 -07:00
Predrag Stojadinović
cf969adf37
1362 custom display confluence url ( #1423 )
...
* chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones
* chore: formatting as per yarn lint
* chore: adding /display/ url matching to confluence data connector
2024-05-16 10:46:18 -07:00
timothycarambat
b5ac944475
patch: bulk-scraper, update when folder is made and path creation params
2024-05-14 12:57:23 -07:00
Sean Hatfield
612a7e1662
[FEAT] Website depth scraping data connector ( #1191 )
...
* WIP website depth scraping, (sort of works)
* website depth data connector stable + add maxLinks option
* linting + loading small ui tweak
* refactor website depth data connector for stability, speed, & readability
* patch: remove console log
Guard clause on URL validity check
reasonable overrides
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2024-05-14 12:49:14 -07:00
jazelly
d71db22799
fix: skip undefined confluence pageContent ( #1383 )
...
Refs: https://github.com/Mintplex-Labs/anything-llm/issues/1381
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2024-05-14 10:22:13 -07:00
Predrag Stojadinović
78e3e35d27
[FEAT] Confluence Data Connector handles custom Confluence urls ( #1362 )
...
* chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones
* chore: formatting as per yarn lint
2024-05-14 10:21:04 -07:00
timothycarambat
2d215acb75
patch storage dirs for extensions
2024-05-02 14:03:10 -07:00
timothycarambat
1aa8e5766f
duplicate key (no impact)
2024-05-02 13:05:20 -07:00
Timothy Carambat
547d4859ef
Bump `openai` package to latest ( #1234 )
...
* Bump `openai` package to latest
Tested all except localai
* bump LocalAI support with latest image
* add deprecation notice
* linting
2024-04-30 12:33:42 -07:00
Timothy Carambat
94017e2b51
bump langchain deps ( #1231 )
...
* bump langchain deps
* patch native and ollama providers remove deprecated deps
---------
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-04-30 12:04:24 -07:00
Sean Hatfield
348b36bf85
[FEAT] Confluence data connector ( #1181 )
...
* WIP Confluence data connector backend
* confluence data connector complete
* confluence citations
* fix citation for confluence
* Patch confluence integration
* fix Citation Icon for confluence
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-04-25 17:53:38 -07:00
Ken Kuang
a3b7239d05
Fix Cannot read properties of undefined (reading 'length') ( #1145 )
...
Fix upload failed
2024-04-20 12:28:19 -07:00
Timothy Carambat
a5bb77f97a
Agent support for `@agent` default agent inside workspace chat ( #1093 )
...
V1 of agent support via built-in `@agent` that can be invoked alongside normal workspace RAG chat.
2024-04-16 10:50:10 -07:00
Sean Hatfield
af84b01482
[FIX] GitHub repo with periods in link fix ( #1084 )
...
fix periods in github repo links bug
2024-04-12 14:56:59 -07:00
Timothy Carambat
2c6135aa54
patch file types as plaintext ( #1095 )
...
resolves #1089
2024-04-12 14:54:33 -07:00
Timothy Carambat
1f8ab0d245
Remove YoutubeLoader dependency ( #1050 )
...
* WIP data connector redesign
* new UI for data connectors complete
* remove old data connector page/cleanup imports
* cleanup of UI and imports
* Remove Youtube Transcript dep and move in-house
* lang pref default to en
---------
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-04-05 16:33:01 -07:00
timothycarambat
0b454016cf
patch comkey path to fallback
2024-04-04 10:47:26 -07:00
timothycarambat
e524afae9e
Merge branch 'master' of github.com:Mintplex-Labs/anything-llm
2024-04-02 14:30:27 -07:00
timothycarambat
117c3b2bfb
forgot epub file!
2024-04-02 14:30:20 -07:00
Timothy Carambat
4fb4aa2041
Add epub support for parsing ( #1017 )
2024-04-02 14:25:52 -07:00
Timothy Carambat
752e3e22ed
Add more text file forced extensions ( #1016 )
2024-04-02 14:13:11 -07:00
Timothy Carambat
f4088d9348
RSA-Signing on server<->collector communication via API ( #1005 )
...
* WIP integrity check between processes
* Implement integrity checking on document processor payloads
2024-04-01 13:56:35 -07:00
Sean Hatfield
45f50ce13c
[FIX] Update metadata tags in PDF collector script ( #925 )
...
update title in pdf collector script to be the filename instead of metadata title
2024-03-19 18:14:34 -07:00
Timothy Carambat
0ada882991
Support external transcription providers ( #909 )
...
* Support External Transcription providers
* patch files
* update docs
* fix return data
2024-03-14 15:43:26 -07:00
Timothy Carambat
0f31e43fd4
bump YT metadata lib for YT api fix rot ( #888 )
2024-03-11 10:57:53 -07:00
Timothy Carambat
ec90060d36
Re-map some file mimes to support text ( #842 )
...
re-map some file mimes to support text
2024-02-29 10:05:03 -08:00
Timothy Carambat
6d18d79bb7
Generic upload fallback as text file. ( #808 )
...
* Do not block any file upload
fallback unknown/unsupported types to text if possible
* reduce call for frontend
* patch
2024-02-26 13:43:54 -08:00
Timothy Carambat
d89610586a
improve error messages from YT scraping ( #768 )
...
parse & enforce URL to allow multiple URL schemas
2024-02-21 10:47:10 -08:00
Timothy Carambat
49fbd09af4
Support more plaintext filetypes ( #757 )
...
* Add more plaintext document types
org-mode, asciidoc, and reStructuredText are all text formats
Signed-off-by: Christian Romney <christian.a.romney@gmail.com>
* lint
---------
Signed-off-by: Christian Romney <christian.a.romney@gmail.com>
Co-authored-by: Christian Romney <christian.a.romney@gmail.com>
2024-02-19 10:44:01 -08:00
Timothy Carambat
d52f8aafd4
689 links in citation ( #715 )
...
* Include links in citations
force ChunkSource key to retain this information
old links will be unsupported
* show special icons depending on source
* remove console log
* reset server documents writeTo
2024-02-13 14:11:57 -08:00
Timothy Carambat
48cb8f2897
Add support to upload rawText document via api ( #692 )
...
* Add support to upload rawText document via api
* update API doc endpoint with correct textContent key
* update response swagger doc
2024-02-07 15:17:32 -08:00
Sean Hatfield
288ff0d18c
fix vector cache not deleting cache after unembedding items with folders ( #630 )
2024-01-22 13:03:05 -08:00
Timothy Carambat
0db6c3b2aa
Prevent private octets from link collection for self-hosted ( #626 )
2024-01-19 10:49:40 -08:00
Timothy Carambat
b35feede87
570 document api return object ( #608 )
...
* Add support for fetching single document in documents folder
* Add document object to upload + support link scraping via API
* hotfixes for documentation
* update api docs
2024-01-16 16:04:22 -08:00
Timothy Carambat
1563a1b20f
Strict link protocol validation ( #577 )
2024-01-11 12:29:00 -08:00
Timothy Carambat
58971e8b30
Build & Publish AnythingLLM for ARM64 and x86 ( #549 )
...
* Update build process to support multi-platform builds
Bump @lancedb/vectordb to 0.1.19 for ARM&AMD compatibility
Patch puppeteer on ARM builds because of broken chromium
resolves #539
resolves #548
---------
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-01-08 16:15:01 -08:00
Francisco Bischoff
990a2e85bf
devcontainer v1 ( #297 )
...
Implement support for GitHub codespaces and VSCode devcontainers
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>
2024-01-08 15:31:06 -08:00
timothycarambat
26549df6a9
touchup linting
2023-12-27 13:28:37 -08:00
timothycarambat
daadad3859
hoist var in extensions
2023-12-20 19:41:16 -08:00
Timothy Carambat
f2fadd6d2e
Add placeholder collector ENV file ( #476 )
...
resolves #474
2023-12-19 13:27:09 -08:00
Timothy Carambat
ecf4295537
Add ability to grab youtube transcripts via doc processor ( #470 )
...
* Add ability to grab youtube transcripts via doc processor
* dynamic imports
swap out Github for Youtube in placeholder text
2023-12-18 17:17:26 -08:00
Timothy Carambat
452582489e
GitHub loader extension + extension support v1 ( #469 )
...
* feat: implement github repo loading
fix: purge of folders
fix: rendering of sub-files
* noshow delete on custom-documents
* Add API key support because of rate limits
* WIP for frontend of data connectors
* wip
* Add frontend form for GitHub repo data connector
* remove console.logs
block custom-documents from being deleted
* remove _meta unused arg
* Add support for ignore pathing in request
Ignore path input via tagging
* Update hint
2023-12-18 15:48:02 -08:00