inital commit

This commit is contained in:
timothycarambat 2023-06-03 19:28:07 -07:00
commit 27c58541bd
100 changed files with 5394 additions and 0 deletions

10
.gitignore vendored Normal file
View File

@ -0,0 +1,10 @@
v-env
.env
!.env.example
node_modules
__pycache__
v-env
*.lock
.DS_Store

21
LICENSE Normal file
View File

@ -0,0 +1,21 @@
The MIT License
Copyright (c) Mintplex Labs Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

59
README.md Normal file
View File

@ -0,0 +1,59 @@
# 🤖 AnythingLLM: A full-stack personalized AI assistant
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/tim.svg?style=social&label=Follow%20%40Timothy%20Carambat)](https://twitter.com/tcarambat) [![](https://dcbadge.vercel.app/api/server/6UyHPeGZAC?compact=true&style=flat)](https://discord.gg/6UyHPeGZAC)
A full-stack application and tool suite that enables you to turn any document, resource, or piece of content into a piece of data that any LLM can use as reference during chatting. This application runs with very minimal overhead as by default the LLM and vectorDB are hosted remotely, but can be swapped for local instances. Currently this project supports Pinecone and OpenAI.
![Chatting](/images/screenshots/chat.png)
[view more screenshots](/images/screenshots/SCREENSHOTS.md)
### Watch the demo!
_tbd_
### Product Overview
AnythingLLM aims to be a full-stack application where you can use commercial off-the-shelf LLMs with Long-term-memory solutions or use popular open source LLM and vectorDB solutions.
Anything LLM is a full-stack product that you can run locally as well as host remotely and be able to chat intelligently with any documents you provide it.
AnythingLLM divides your documents into objects called `workspaces`. A Workspace functions a lot like a thread, but with the addition of containerization of your documents. Workspaces can share documents, but they do not talk to each other so you can keep your context for each workspace clean.
Some cool features of AnythingLLM
- Atomically manage documents to be used in long-term-memory from a simple UI
- Two chat modes `conversation` and `query`. Conversation retains previous questions and amendments. Query is simple QA against your documents
- Each chat response contains a citation that is linked to the original content
- Simple technology stack for fast iteration
- Fully capable of being hosted remotely
- "Bring your own LLM" model and vector solution. _still in progress_
- Extremely efficient cost-saving measures for managing very large documents. you'll never pay to embed a massive document or transcript more than once. 90% more cost effective than other LTM chatbots
### Technical Overview
This monorepo consists of three main sections:
- `collector`: Python tools that enable you to quickly convert online resources or local documents into LLM useable format.
- `frontend`: A viteJS + React frontend that you can run to easily create and manage all your content the LLM can use.
- `server`: A nodeJS + express server to handle all the interactions and do all the vectorDB management and LLM interactions.
### Requirements
- `yarn` and `node` on your machine
- `python` 3.8+ for running scripts in `collector/`.
- access to an LLM like `GPT-3.5`, `GPT-4`*.
- a [Pinecone.io](https://pinecone.io) free account*.
*you can use drop in replacements for these. This is just the easiest to get up and running fast.
### How to get started
- `yarn setup` from the project root directory.
This will fill in the required `.env` files you'll need in each of the application sections. Go fill those out before proceeding or else things won't work right.
Next, you will need some content to embed. This could be a Youtube Channel, Medium articles, local text files, word documents, and the list goes on. This is where you will use the `collector/` part of the repo.
[Go set up and run collector scripts](./collector/README.md)
[Learn about documents](./server/documents/DOCUMENTS.md)
[Learn about vector caching](./server/documents/VECTOR_CACHE.md)
### Contributing
- create issue
- create PR with branch name format of `<issue number>-<short name>`
- yee haw let's merge

2
clean.sh Normal file
View File

@ -0,0 +1,2 @@
# Easily kill process on port because sometimes nodemon fails to reboot
kill -9 $(lsof -t -i tcp:5000)

1
collector/.env.example Normal file
View File

@ -0,0 +1 @@
GOOGLE_APIS_KEY=

6
collector/.gitignore vendored Normal file
View File

@ -0,0 +1,6 @@
outputs/*/*.json
hotdir/*
hotdir/processed/*
!hotdir/__HOTDIR__.md
!hotdir/processed

45
collector/README.md Normal file
View File

@ -0,0 +1,45 @@
# How to collect data for vectorizing
This process should be run first. This will enable you to collect a ton of data across various sources. Currently the following services are supported:
- [x] YouTube Channels
- [x] Medium
- [x] Substack
- [x] Arbitrary Link
- [x] Gitbook
- [x] Local Files (.txt, .pdf, etc) [See full list](./hotdir/__HOTDIR__.md)
_these resources are under development or require PR_
- Twitter
![Choices](../images/choices.png)
### Requirements
- [ ] Python 3.8+
- [ ] Google Cloud Account (for YouTube channels)
- [ ] `brew install pandoc` [pandoc](https://pandoc.org/installing.html) (for .ODT document processing)
### Setup
This example will be using python3.9, but will work with 3.8+. Tested on MacOs. Untested on Windows
- install virtualenv for python3.8+ first before any other steps. `python3.9 -m pip install virutalenv`
- `cd collector` from root directory
- `python3.9 -m virtualenv v-env`
- `source v-env/bin/activate`
- `pip install -r requirements.txt`
- `cp .env.example .env`
- `python main.py` for interactive collection or `python watch.py` to process local documents.
- Select the option you want and follow follow the prompts - Done!
- run `deactivate` to get back to regular shell
### Outputs
All JSON file data is cached in the `output/` folder. This is to prevent redundant API calls to services which may have rate limits to quota caps. Clearing out the `output/` folder will execute the script as if there was no cache.
As files are processed you will see data being written to both the `collector/outputs` folder as well as the `server/documents` folder. Later in this process, once you boot up the server you will then bulk vectorize this content from a simple UI!
If collection fails at any point in the process it will pick up where it last bailed out so you are not reusing credits.
### How to get a Google Cloud API Key (YouTube data collection only)
**required to fetch YouTube transcripts and data**
- Have a google account
- [Visit the GCP Cloud Console](https://console.cloud.google.com/welcome)
- Click on dropdown in top right > Create new project. Name it whatever you like
- ![GCP Project Bar](../images/gcp-project-bar.png)
- [Enable YouTube Data APIV3](https://console.cloud.google.com/apis/library/youtube.googleapis.com)
- Once enabled generate a Credential key for this API
- Paste your key after `GOOGLE_APIS_KEY=` in your `collector/.env` file.

View File

@ -0,0 +1,17 @@
### What is the "Hot directory"
This is the location where you can dump all supported file types and have them automatically converted and prepared to be digested by the vectorizing service and selected from the AnythingLLM frontend.
Files dropped in here will only be processed when you are running `python watch.py` from the `collector` directory.
Once converted the original file will be moved to the `hotdir/processed` folder so that the original document is still able to be linked to when referenced when attached as a source document during chatting.
**Supported File types**
- `.md`
- `.text`
- `.pdf`
__requires more development__
- `.png .jpg etc`
- `.mp3`
- `.mp4`

81
collector/main.py Normal file
View File

@ -0,0 +1,81 @@
import os
from whaaaaat import prompt, Separator
from scripts.youtube import youtube
from scripts.link import link, links
from scripts.substack import substack
from scripts.medium import medium
from scripts.gitbook import gitbook
def main():
if os.name == 'nt':
methods = {
'1': 'YouTube Channel',
'2': 'Article or Blog Link',
'3': 'Substack',
'4': 'Medium',
'5': 'Gitbook'
}
print("There are options for data collection to make this easier for you.\nType the number of the method you wish to execute.")
print("1. YouTube Channel\n2. Article or Blog Link (Single)\n3. Substack\n4. Medium\n\n[In development]:\nTwitter\n\n")
selection = input("Your selection: ")
method = methods.get(str(selection))
else:
questions = [
{
"type": "list",
"name": "collector",
"message": "What kind of data would you like to add to convert into long-term memory?",
"choices": [
"YouTube Channel",
"Substack",
"Medium",
"Article or Blog Link(s)",
"Gitbook",
Separator(),
{"name": "Twitter", "disabled": "Needs PR"},
"Abort",
],
},
]
method = prompt(questions).get('collector')
if('Article or Blog Link' in method):
questions = [
{
"type": "list",
"name": "collector",
"message": "Do you want to scrape a single article/blog/url or many at once?",
"choices": [
'Single URL',
'Multiple URLs',
'Abort',
],
},
]
method = prompt(questions).get('collector')
if(method == 'Single URL'):
link()
exit(0)
if(method == 'Multiple URLs'):
links()
exit(0)
if(method == 'Abort'): exit(0)
if(method == 'YouTube Channel'):
youtube()
exit(0)
if(method == 'Substack'):
substack()
exit(0)
if(method == 'Medium'):
medium()
exit(0)
if(method == 'Gitbook'):
gitbook()
exit(0)
print("Selection was not valid.")
exit(1)
if __name__ == "__main__":
main()

221
collector/requirements.txt Normal file
View File

@ -0,0 +1,221 @@
about-time==4.2.1
aiohttp==3.8.4
aiosignal==1.3.1
alive-progress==3.1.2
anyio==3.7.0
appdirs==1.4.4
argilla==1.8.0
async-timeout==4.0.2
attrs==23.1.0
backoff==2.2.1
beautifulsoup4==4.12.2
bs4==0.0.1
certifi==2023.5.7
cffi==1.15.1
chardet==5.1.0
charset-normalizer==3.1.0
click==8.1.3
commonmark==0.9.1
cryptography==41.0.1
cssselect==1.2.0
dataclasses-json==0.5.7
Deprecated==1.2.14
et-xmlfile==1.1.0
exceptiongroup==1.1.1
fake-useragent==1.1.3
frozenlist==1.3.3
grapheme==0.6.0
greenlet==2.0.2
h11==0.14.0
httpcore==0.16.3
httpx==0.23.3
idna==3.4
importlib-metadata==6.6.0
importlib-resources==5.12.0
install==1.3.5
joblib==1.2.0
langchain==0.0.189
lxml==4.9.2
Markdown==3.4.3
marshmallow==3.19.0
marshmallow-enum==1.5.1
monotonic==1.6
msg-parser==1.2.0
multidict==6.0.4
mypy-extensions==1.0.0
nltk==3.8.1
numexpr==2.8.4
numpy==1.23.5
olefile==0.46
openapi-schema-pydantic==1.2.4
openpyxl==3.1.2
packaging==23.1
pandas==1.5.3
parse==1.19.0
pdfminer.six==20221105
Pillow==9.5.0
prompt-toolkit==1.0.14
pycparser==2.21
pydantic==1.10.8
pyee==8.2.2
Pygments==2.15.1
pyobjc==9.1.1
pyobjc-core==9.1.1
pyobjc-framework-Accounts==9.1.1
pyobjc-framework-AddressBook==9.1.1
pyobjc-framework-AdSupport==9.1.1
pyobjc-framework-AppleScriptKit==9.1.1
pyobjc-framework-AppleScriptObjC==9.1.1
pyobjc-framework-ApplicationServices==9.1.1
pyobjc-framework-AudioVideoBridging==9.1.1
pyobjc-framework-AuthenticationServices==9.1.1
pyobjc-framework-AutomaticAssessmentConfiguration==9.1.1
pyobjc-framework-Automator==9.1.1
pyobjc-framework-AVFoundation==9.1.1
pyobjc-framework-AVKit==9.1.1
pyobjc-framework-BusinessChat==9.1.1
pyobjc-framework-CalendarStore==9.1.1
pyobjc-framework-CFNetwork==9.1.1
pyobjc-framework-CloudKit==9.1.1
pyobjc-framework-Cocoa==9.1.1
pyobjc-framework-Collaboration==9.1.1
pyobjc-framework-ColorSync==9.1.1
pyobjc-framework-Contacts==9.1.1
pyobjc-framework-ContactsUI==9.1.1
pyobjc-framework-CoreAudio==9.1.1
pyobjc-framework-CoreAudioKit==9.1.1
pyobjc-framework-CoreBluetooth==9.1.1
pyobjc-framework-CoreData==9.1.1
pyobjc-framework-CoreHaptics==9.1.1
pyobjc-framework-CoreLocation==9.1.1
pyobjc-framework-CoreMedia==9.1.1
pyobjc-framework-CoreMediaIO==9.1.1
pyobjc-framework-CoreMIDI==9.1.1
pyobjc-framework-CoreML==9.1.1
pyobjc-framework-CoreMotion==9.1.1
pyobjc-framework-CoreServices==9.1.1
pyobjc-framework-CoreSpotlight==9.1.1
pyobjc-framework-CoreText==9.1.1
pyobjc-framework-CoreWLAN==9.1.1
pyobjc-framework-CryptoTokenKit==9.1.1
pyobjc-framework-DeviceCheck==9.1.1
pyobjc-framework-DictionaryServices==9.1.1
pyobjc-framework-DiscRecording==9.1.1
pyobjc-framework-DiscRecordingUI==9.1.1
pyobjc-framework-DiskArbitration==9.1.1
pyobjc-framework-DVDPlayback==9.1.1
pyobjc-framework-EventKit==9.1.1
pyobjc-framework-ExceptionHandling==9.1.1
pyobjc-framework-ExecutionPolicy==9.1.1
pyobjc-framework-ExternalAccessory==9.1.1
pyobjc-framework-FileProvider==9.1.1
pyobjc-framework-FileProviderUI==9.1.1
pyobjc-framework-FinderSync==9.1.1
pyobjc-framework-FSEvents==9.1.1
pyobjc-framework-GameCenter==9.1.1
pyobjc-framework-GameController==9.1.1
pyobjc-framework-GameKit==9.1.1
pyobjc-framework-GameplayKit==9.1.1
pyobjc-framework-ImageCaptureCore==9.1.1
pyobjc-framework-IMServicePlugIn==9.1.1
pyobjc-framework-InputMethodKit==9.1.1
pyobjc-framework-InstallerPlugins==9.1.1
pyobjc-framework-InstantMessage==9.1.1
pyobjc-framework-Intents==9.1.1
pyobjc-framework-IOBluetooth==9.1.1
pyobjc-framework-IOBluetoothUI==9.1.1
pyobjc-framework-IOSurface==9.1.1
pyobjc-framework-iTunesLibrary==9.1.1
pyobjc-framework-LatentSemanticMapping==9.1.1
pyobjc-framework-LaunchServices==9.1.1
pyobjc-framework-libdispatch==9.1.1
pyobjc-framework-libxpc==9.1.1
pyobjc-framework-LinkPresentation==9.1.1
pyobjc-framework-LocalAuthentication==9.1.1
pyobjc-framework-MapKit==9.1.1
pyobjc-framework-MediaAccessibility==9.1.1
pyobjc-framework-MediaLibrary==9.1.1
pyobjc-framework-MediaPlayer==9.1.1
pyobjc-framework-MediaToolbox==9.1.1
pyobjc-framework-Metal==9.1.1
pyobjc-framework-MetalKit==9.1.1
pyobjc-framework-MetalPerformanceShaders==9.1.1
pyobjc-framework-ModelIO==9.1.1
pyobjc-framework-MultipeerConnectivity==9.1.1
pyobjc-framework-NaturalLanguage==9.1.1
pyobjc-framework-NetFS==9.1.1
pyobjc-framework-Network==9.1.1
pyobjc-framework-NetworkExtension==9.1.1
pyobjc-framework-NotificationCenter==9.1.1
pyobjc-framework-OpenDirectory==9.1.1
pyobjc-framework-OSAKit==9.1.1
pyobjc-framework-OSLog==9.1.1
pyobjc-framework-PencilKit==9.1.1
pyobjc-framework-Photos==9.1.1
pyobjc-framework-PhotosUI==9.1.1
pyobjc-framework-PreferencePanes==9.1.1
pyobjc-framework-PushKit==9.1.1
pyobjc-framework-Quartz==9.1.1
pyobjc-framework-QuickLookThumbnailing==9.1.1
pyobjc-framework-SafariServices==9.1.1
pyobjc-framework-SceneKit==9.1.1
pyobjc-framework-ScreenSaver==9.1.1
pyobjc-framework-ScriptingBridge==9.1.1
pyobjc-framework-SearchKit==9.1.1
pyobjc-framework-Security==9.1.1
pyobjc-framework-SecurityFoundation==9.1.1
pyobjc-framework-SecurityInterface==9.1.1
pyobjc-framework-ServiceManagement==9.1.1
pyobjc-framework-Social==9.1.1
pyobjc-framework-SoundAnalysis==9.1.1
pyobjc-framework-Speech==9.1.1
pyobjc-framework-SpriteKit==9.1.1
pyobjc-framework-StoreKit==9.1.1
pyobjc-framework-SyncServices==9.1.1
pyobjc-framework-SystemConfiguration==9.1.1
pyobjc-framework-SystemExtensions==9.1.1
pyobjc-framework-UserNotifications==9.1.1
pyobjc-framework-VideoSubscriberAccount==9.1.1
pyobjc-framework-VideoToolbox==9.1.1
pyobjc-framework-Vision==9.1.1
pyobjc-framework-WebKit==9.1.1
pypandoc==1.11
pyppeteer==1.0.2
pyquery==2.0.0
python-dateutil==2.8.2
python-docx==0.8.11
python-dotenv==0.21.1
python-magic==0.4.27
python-pptx==0.6.21
python-slugify==8.0.1
pytz==2023.3
PyYAML==6.0
regex==2023.5.5
requests==2.31.0
requests-html==0.10.0
rfc3986==1.5.0
rich==13.0.1
six==1.16.0
sniffio==1.3.0
soupsieve==2.4.1
SQLAlchemy==2.0.15
tenacity==8.2.2
text-unidecode==1.3
tiktoken==0.4.0
tqdm==4.65.0
typer==0.9.0
typing-inspect==0.9.0
typing_extensions==4.6.3
unstructured==0.7.1
urllib3==1.26.16
uuid==1.30
w3lib==2.1.1
wcwidth==0.2.6
websockets==10.4
whaaaaat==0.5.2
wrapt==1.14.1
xlrd==2.0.1
XlsxWriter==3.1.2
yarl==1.9.2
youtube-transcript-api==0.6.0
zipp==3.15.0

View File

View File

@ -0,0 +1,44 @@
import os, json
from langchain.document_loaders import GitbookLoader
from urllib.parse import urlparse
from datetime import datetime
from alive_progress import alive_it
from .utils import tokenize
from uuid import uuid4
def gitbook():
url = input("Enter the URL of the GitBook you want to collect: ")
if(url == ''):
print("Not a gitbook URL")
exit(1)
primary_source = urlparse(url)
output_path = f"./outputs/gitbook-logs/{primary_source.netloc}"
transaction_output_dir = f"../server/documents/gitbook-{primary_source.netloc}"
if os.path.exists(output_path) == False:os.makedirs(output_path)
if os.path.exists(transaction_output_dir) == False: os.makedirs(transaction_output_dir)
loader = GitbookLoader(url, load_all_paths= primary_source.path in ['','/'])
for doc in alive_it(loader.load()):
metadata = doc.metadata
content = doc.page_content
source = urlparse(metadata.get('source'))
name = 'home' if source.path in ['','/'] else source.path.replace('/','_')
output_filename = f"doc-{name}.json"
transaction_output_filename = f"doc-{name}.json"
data = {
'id': str(uuid4()),
'url': metadata.get('source'),
"title": metadata.get('title'),
"description": metadata.get('title'),
"published": datetime.today().strftime('%Y-%m-%d %H:%M:%S'),
"wordCount": len(content),
'pageContent': content,
'token_count_estimate': len(tokenize(content))
}
with open(f"{output_path}/{output_filename}", 'w', encoding='utf-8') as file:
json.dump(data, file, ensure_ascii=True, indent=4)
with open(f"{transaction_output_dir}/{transaction_output_filename}", 'w', encoding='utf-8') as file:
json.dump(data, file, ensure_ascii=True, indent=4)

139
collector/scripts/link.py Normal file
View File

@ -0,0 +1,139 @@
import os, json, tempfile
from urllib.parse import urlparse
from requests_html import HTMLSession
from langchain.document_loaders import UnstructuredHTMLLoader
from .link_utils import append_meta
from .utils import tokenize, ada_v2_cost
# Example Channel URL https://tim.blog/2022/08/09/nft-insider-trading-policy/
def link():
print("[NOTICE]: The first time running this process it will download supporting libraries.\n\n")
fqdn_link = input("Paste in the URL of an online article or blog: ")
if(len(fqdn_link) == 0):
print("Invalid URL!")
exit(1)
session = HTMLSession()
req = session.get(fqdn_link)
if(req.ok == False):
print("Could not reach this url!")
exit(1)
req.html.render()
full_text = None
with tempfile.NamedTemporaryFile(mode = "w") as tmp:
tmp.write(req.html.html)
tmp.seek(0)
loader = UnstructuredHTMLLoader(tmp.name)
data = loader.load()[0]
full_text = data.page_content
tmp.close()
link = append_meta(req, full_text, True)
if(len(full_text) > 0):
source = urlparse(req.url)
output_filename = f"website-{source.netloc}-{source.path.replace('/','_')}.json"
output_path = f"./outputs/website-logs"
transaction_output_filename = f"article-{source.path.replace('/','_')}.json"
transaction_output_dir = f"../server/documents/website-{source.netloc}"
if os.path.isdir(output_path) == False:
os.makedirs(output_path)
if os.path.isdir(transaction_output_dir) == False:
os.makedirs(transaction_output_dir)
full_text = append_meta(req, full_text)
tokenCount = len(tokenize(full_text))
link['pageContent'] = full_text
link['token_count_estimate'] = tokenCount
with open(f"{output_path}/{output_filename}", 'w', encoding='utf-8') as file:
json.dump(link, file, ensure_ascii=True, indent=4)
with open(f"{transaction_output_dir}/{transaction_output_filename}", 'w', encoding='utf-8') as file:
json.dump(link, file, ensure_ascii=True, indent=4)
else:
print("Could not parse any meaningful data from this link or url.")
exit(1)
print(f"\n\n[Success]: article or link content fetched!")
print(f"////////////////////////////")
print(f"Your estimated cost to embed this data using OpenAI's text-embedding-ada-002 model at $0.0004 / 1K tokens will cost {ada_v2_cost(tokenCount)} using {tokenCount} tokens.")
print(f"////////////////////////////")
exit(0)
def links():
links = []
prompt = "Paste in the URL of an online article or blog: "
done = False
while(done == False):
new_link = input(prompt)
if(len(new_link) == 0):
done = True
links = [*set(links)]
continue
links.append(new_link)
prompt = f"\n{len(links)} links in queue. Submit an empty value when done pasting in links to execute collection.\nPaste in the next URL of an online article or blog: "
if(len(links) == 0):
print("No valid links provided!")
exit(1)
totalTokens = 0
for link in links:
print(f"Working on {link}...")
session = HTMLSession()
req = session.get(link)
if(req.ok == False):
print(f"Could not reach {link} - skipping!")
continue
req.html.render()
full_text = None
with tempfile.NamedTemporaryFile(mode = "w") as tmp:
tmp.write(req.html.html)
tmp.seek(0)
loader = UnstructuredHTMLLoader(tmp.name)
data = loader.load()[0]
full_text = data.page_content
tmp.close()
link = append_meta(req, full_text, True)
if(len(full_text) > 0):
source = urlparse(req.url)
output_filename = f"website-{source.netloc}-{source.path.replace('/','_')}.json"
output_path = f"./outputs/website-logs"
transaction_output_filename = f"article-{source.path.replace('/','_')}.json"
transaction_output_dir = f"../server/documents/website-{source.netloc}"
if os.path.isdir(output_path) == False:
os.makedirs(output_path)
if os.path.isdir(transaction_output_dir) == False:
os.makedirs(transaction_output_dir)
full_text = append_meta(req, full_text)
tokenCount = len(tokenize(full_text))
link['pageContent'] = full_text
link['token_count_estimate'] = tokenCount
totalTokens += tokenCount
with open(f"{output_path}/{output_filename}", 'w', encoding='utf-8') as file:
json.dump(link, file, ensure_ascii=True, indent=4)
with open(f"{transaction_output_dir}/{transaction_output_filename}", 'w', encoding='utf-8') as file:
json.dump(link, file, ensure_ascii=True, indent=4)
else:
print(f"Could not parse any meaningful data from {link}.")
continue
print(f"\n\n[Success]: {len(links)} article or link contents fetched!")
print(f"////////////////////////////")
print(f"Your estimated cost to embed this data using OpenAI's text-embedding-ada-002 model at $0.0004 / 1K tokens will cost {ada_v2_cost(totalTokens)} using {totalTokens} tokens.")
print(f"////////////////////////////")
exit(0)

View File

@ -0,0 +1,14 @@
import json
from datetime import datetime
from dotenv import load_dotenv
load_dotenv()
def append_meta(request, text, metadata_only = False):
meta = {
'url': request.url,
'title': request.html.find('title', first=True).text if len(request.html.find('title')) != 0 else '',
'description': request.html.find('meta[name="description"]', first=True).attrs.get('content') if request.html.find('meta[name="description"]', first=True) != None else '',
'published':request.html.find('meta[property="article:published_time"]', first=True).attrs.get('content') if request.html.find('meta[property="article:published_time"]', first=True) != None else datetime.today().strftime('%Y-%m-%d %H:%M:%S'),
'wordCount': len(text.split(' ')),
}
return "Article JSON Metadata:\n"+json.dumps(meta)+"\n\n\nText Content:\n" + text if metadata_only == False else meta

View File

@ -0,0 +1,71 @@
import os, json
from urllib.parse import urlparse
from .utils import tokenize, ada_v2_cost
from .medium_utils import get_username, fetch_recent_publications, append_meta
from alive_progress import alive_it
# Example medium URL: https://medium.com/@yujiangtham or https://davidall.medium.com
def medium():
print("[NOTICE]: This method will only get the 10 most recent publishings.")
author_url = input("Enter the medium URL of the author you want to collect: ")
if(author_url == ''):
print("Not a valid medium.com/@author URL")
exit(1)
handle = get_username(author_url)
if(handle is None):
print("This does not appear to be a valid medium.com/@author URL")
exit(1)
publications = fetch_recent_publications(handle)
if(len(publications)==0):
print("There are no public or free publications by this creator - nothing to collect.")
exit(1)
totalTokenCount = 0
transaction_output_dir = f"../server/documents/medium-{handle}"
if os.path.isdir(transaction_output_dir) == False:
os.makedirs(transaction_output_dir)
for publication in alive_it(publications):
pub_file_path = transaction_output_dir + f"/publication-{publication.get('id')}.json"
if os.path.exists(pub_file_path) == True: continue
full_text = publication.get('pageContent')
if full_text is None or len(full_text) == 0: continue
full_text = append_meta(publication, full_text)
item = {
'id': publication.get('id'),
'url': publication.get('url'),
'title': publication.get('title'),
'published': publication.get('published'),
'wordCount': len(full_text.split(' ')),
'pageContent': full_text,
}
tokenCount = len(tokenize(full_text))
item['token_count_estimate'] = tokenCount
totalTokenCount += tokenCount
with open(pub_file_path, 'w', encoding='utf-8') as file:
json.dump(item, file, ensure_ascii=True, indent=4)
print(f"[Success]: {len(publications)} scraped and fetched!")
print(f"\n\n////////////////////////////")
print(f"Your estimated cost to embed all of this data using OpenAI's text-embedding-ada-002 model at $0.0004 / 1K tokens will cost {ada_v2_cost(totalTokenCount)} using {totalTokenCount} tokens.")
print(f"////////////////////////////\n\n")
exit(0)

View File

@ -0,0 +1,71 @@
import os, json, requests, re
from bs4 import BeautifulSoup
def get_username(author_url):
if '@' in author_url:
pattern = r"medium\.com/@([\w-]+)"
match = re.search(pattern, author_url)
return match.group(1) if match else None
else:
# Given subdomain
pattern = r"([\w-]+).medium\.com"
match = re.search(pattern, author_url)
return match.group(1) if match else None
def get_docid(medium_docpath):
pattern = r"medium\.com/p/([\w-]+)"
match = re.search(pattern, medium_docpath)
return match.group(1) if match else None
def fetch_recent_publications(handle):
rss_link = f"https://medium.com/feed/@{handle}"
response = requests.get(rss_link)
if(response.ok == False):
print(f"Could not fetch RSS results for author.")
return []
xml = response.content
soup = BeautifulSoup(xml, 'xml')
items = soup.find_all('item')
publications = []
if os.path.isdir("./outputs/medium-logs") == False:
os.makedirs("./outputs/medium-logs")
file_path = f"./outputs/medium-logs/medium-{handle}.json"
if os.path.exists(file_path):
with open(file_path, "r") as file:
print(f"Returning cached data for Author {handle}. If you do not wish to use stored data then delete the file for this author to allow refetching.")
return json.load(file)
for item in items:
tags = []
for tag in item.find_all('category'): tags.append(tag.text)
content = BeautifulSoup(item.find('content:encoded').text, 'html.parser')
data = {
'id': get_docid(item.find('guid').text),
'title': item.find('title').text,
'url': item.find('link').text.split('?')[0],
'tags': ','.join(tags),
'published': item.find('pubDate').text,
'pageContent': content.get_text()
}
publications.append(data)
with open(file_path, 'w+', encoding='utf-8') as json_file:
json.dump(publications, json_file, ensure_ascii=True, indent=2)
print(f"{len(publications)} articles found for author medium.com/@{handle}. Saved to medium-logs/medium-{handle}.json")
return publications
def append_meta(publication, text):
meta = {
'url': publication.get('url'),
'tags': publication.get('tags'),
'title': publication.get('title'),
'createdAt': publication.get('published'),
'wordCount': len(text.split(' '))
}
return "Article Metadata:\n"+json.dumps(meta)+"\n\nArticle Content:\n" + text

View File

@ -0,0 +1,78 @@
import os, json
from urllib.parse import urlparse
from .utils import tokenize, ada_v2_cost
from .substack_utils import fetch_all_publications, only_valid_publications, get_content, append_meta
from alive_progress import alive_it
# Example substack URL: https://swyx.substack.com/
def substack():
author_url = input("Enter the substack URL of the author you want to collect: ")
if(author_url == ''):
print("Not a valid author.substack.com URL")
exit(1)
source = urlparse(author_url)
if('substack.com' not in source.netloc or len(source.netloc.split('.')) != 3):
print("This does not appear to be a valid author.substack.com URL")
exit(1)
subdomain = source.netloc.split('.')[0]
publications = fetch_all_publications(subdomain)
valid_publications = only_valid_publications(publications)
if(len(valid_publications)==0):
print("There are no public or free preview newsletters by this creator - nothing to collect.")
exit(1)
print(f"{len(valid_publications)} of {len(publications)} publications are readable publically text posts - collecting those.")
totalTokenCount = 0
transaction_output_dir = f"../server/documents/substack-{subdomain}"
if os.path.isdir(transaction_output_dir) == False:
os.makedirs(transaction_output_dir)
for publication in alive_it(valid_publications):
pub_file_path = transaction_output_dir + f"/publication-{publication.get('id')}.json"
if os.path.exists(pub_file_path) == True: continue
full_text = get_content(publication.get('canonical_url'))
if full_text is None or len(full_text) == 0: continue
full_text = append_meta(publication, full_text)
item = {
'id': publication.get('id'),
'url': publication.get('canonical_url'),
'thumbnail': publication.get('cover_image'),
'title': publication.get('title'),
'subtitle': publication.get('subtitle'),
'description': publication.get('description'),
'published': publication.get('post_date'),
'wordCount': publication.get('wordcount'),
'pageContent': full_text,
}
tokenCount = len(tokenize(full_text))
item['token_count_estimate'] = tokenCount
totalTokenCount += tokenCount
with open(pub_file_path, 'w', encoding='utf-8') as file:
json.dump(item, file, ensure_ascii=True, indent=4)
print(f"[Success]: {len(valid_publications)} scraped and fetched!")
print(f"\n\n////////////////////////////")
print(f"Your estimated cost to embed all of this data using OpenAI's text-embedding-ada-002 model at $0.0004 / 1K tokens will cost {ada_v2_cost(totalTokenCount)} using {totalTokenCount} tokens.")
print(f"////////////////////////////\n\n")
exit(0)

View File

@ -0,0 +1,86 @@
import os, json, requests, tempfile
from requests_html import HTMLSession
from langchain.document_loaders import UnstructuredHTMLLoader
def fetch_all_publications(subdomain):
file_path = f"./outputs/substack-logs/substack-{subdomain}.json"
if os.path.isdir("./outputs/substack-logs") == False:
os.makedirs("./outputs/substack-logs")
if os.path.exists(file_path):
with open(file_path, "r") as file:
print(f"Returning cached data for substack {subdomain}.substack.com. If you do not wish to use stored data then delete the file for this newsletter to allow refetching.")
return json.load(file)
collecting = True
offset = 0
publications = []
while collecting is True:
url = f"https://{subdomain}.substack.com/api/v1/archive?sort=new&offset={offset}"
response = requests.get(url)
if(response.ok == False):
print("Bad response - exiting collection")
collecting = False
continue
data = response.json()
if(len(data) ==0 ):
collecting = False
continue
for publication in data:
publications.append(publication)
offset = len(publications)
with open(file_path, 'w+', encoding='utf-8') as json_file:
json.dump(publications, json_file, ensure_ascii=True, indent=2)
print(f"{len(publications)} publications found for author {subdomain}.substack.com. Saved to substack-logs/channel-{subdomain}.json")
return publications
def only_valid_publications(publications= []):
valid_publications = []
for publication in publications:
is_paid = publication.get('audience') != 'everyone'
if (is_paid and publication.get('should_send_free_preview') != True) or publication.get('type') != 'newsletter': continue
valid_publications.append(publication)
return valid_publications
def get_content(article_link):
print(f"Fetching {article_link}")
if(len(article_link) == 0):
print("Invalid URL!")
return None
session = HTMLSession()
req = session.get(article_link)
if(req.ok == False):
print("Could not reach this url!")
return None
req.html.render()
full_text = None
with tempfile.NamedTemporaryFile(mode = "w") as tmp:
tmp.write(req.html.html)
tmp.seek(0)
loader = UnstructuredHTMLLoader(tmp.name)
data = loader.load()[0]
full_text = data.page_content
tmp.close()
return full_text
def append_meta(publication, text):
meta = {
'url': publication.get('canonical_url'),
'thumbnail': publication.get('cover_image'),
'title': publication.get('title'),
'subtitle': publication.get('subtitle'),
'description': publication.get('description'),
'createdAt': publication.get('post_date'),
'wordCount': publication.get('wordcount')
}
return "Newsletter Metadata:\n"+json.dumps(meta)+"\n\nArticle Content:\n" + text

View File

@ -0,0 +1,10 @@
import tiktoken
encoder = tiktoken.encoding_for_model("text-embedding-ada-002")
def tokenize(fullText):
return encoder.encode(fullText)
def ada_v2_cost(tokenCount):
rate_per = 0.0004 / 1_000 # $0.0004 / 1K tokens
total = tokenCount * rate_per
return '${:,.2f}'.format(total) if total >= 0.01 else '< $0.01'

View File

View File

@ -0,0 +1,58 @@
import os
from langchain.document_loaders import Docx2txtLoader, UnstructuredODTLoader
from slugify import slugify
from ..utils import guid, file_creation_time, write_to_server_documents, move_source
from ...utils import tokenize
# Process all text-related documents.
def as_docx(**kwargs):
parent_dir = kwargs.get('directory', 'hotdir')
filename = kwargs.get('filename')
ext = kwargs.get('ext', '.txt')
fullpath = f"{parent_dir}/{filename}{ext}"
loader = Docx2txtLoader(fullpath)
data = loader.load()[0]
content = data.page_content
print(f"-- Working {fullpath} --")
data = {
'id': guid(),
'url': "file://"+os.path.abspath(f"{parent_dir}/processed/{filename}{ext}"),
'title': f"{filename}{ext}",
'description': "a custom file uploaded by the user.",
'published': file_creation_time(fullpath),
'wordCount': len(content),
'pageContent': content,
'token_count_estimate': len(tokenize(content))
}
write_to_server_documents(data, f"{slugify(filename)}-{data.get('id')}")
move_source(parent_dir, f"{filename}{ext}")
print(f"[SUCCESS]: {filename}{ext} converted & ready for embedding.\n")
def as_odt(**kwargs):
parent_dir = kwargs.get('directory', 'hotdir')
filename = kwargs.get('filename')
ext = kwargs.get('ext', '.txt')
fullpath = f"{parent_dir}/{filename}{ext}"
loader = UnstructuredODTLoader(fullpath)
data = loader.load()[0]
content = data.page_content
print(f"-- Working {fullpath} --")
data = {
'id': guid(),
'url': "file://"+os.path.abspath(f"{parent_dir}/processed/{filename}{ext}"),
'title': f"{filename}{ext}",
'description': "a custom file uploaded by the user.",
'published': file_creation_time(fullpath),
'wordCount': len(content),
'pageContent': content,
'token_count_estimate': len(tokenize(content))
}
write_to_server_documents(data, f"{slugify(filename)}-{data.get('id')}")
move_source(parent_dir, f"{filename}{ext}")
print(f"[SUCCESS]: {filename}{ext} converted & ready for embedding.\n")

View File

@ -0,0 +1,32 @@
import os
from langchain.document_loaders import UnstructuredMarkdownLoader
from slugify import slugify
from ..utils import guid, file_creation_time, write_to_server_documents, move_source
from ...utils import tokenize
# Process all text-related documents.
def as_markdown(**kwargs):
parent_dir = kwargs.get('directory', 'hotdir')
filename = kwargs.get('filename')
ext = kwargs.get('ext', '.txt')
fullpath = f"{parent_dir}/{filename}{ext}"
loader = UnstructuredMarkdownLoader(fullpath)
data = loader.load()[0]
content = data.page_content
print(f"-- Working {fullpath} --")
data = {
'id': guid(),
'url': "file://"+os.path.abspath(f"{parent_dir}/processed/{filename}{ext}"),
'title': f"{filename}{ext}",
'description': "a custom file uploaded by the user.",
'published': file_creation_time(fullpath),
'wordCount': len(content),
'pageContent': content,
'token_count_estimate': len(tokenize(content))
}
write_to_server_documents(data, f"{slugify(filename)}-{data.get('id')}")
move_source(parent_dir, f"{filename}{ext}")
print(f"[SUCCESS]: {filename}{ext} converted & ready for embedding.\n")

View File

@ -0,0 +1,36 @@
import os
from langchain.document_loaders import PyPDFLoader
from slugify import slugify
from ..utils import guid, file_creation_time, write_to_server_documents, move_source
from ...utils import tokenize
# Process all text-related documents.
def as_pdf(**kwargs):
parent_dir = kwargs.get('directory', 'hotdir')
filename = kwargs.get('filename')
ext = kwargs.get('ext', '.txt')
fullpath = f"{parent_dir}/{filename}{ext}"
loader = PyPDFLoader(fullpath)
pages = loader.load_and_split()
print(f"-- Working {fullpath} --")
for page in pages:
pg_num = page.metadata.get('page')
print(f"-- Working page {pg_num} --")
content = page.page_content
data = {
'id': guid(),
'url': "file://"+os.path.abspath(f"{parent_dir}/processed/{filename}{ext}"),
'title': f"{filename}_pg{pg_num}{ext}",
'description': "a custom file uploaded by the user.",
'published': file_creation_time(fullpath),
'wordCount': len(content),
'pageContent': content,
'token_count_estimate': len(tokenize(content))
}
write_to_server_documents(data, f"{slugify(filename)}-pg{pg_num}-{data.get('id')}")
move_source(parent_dir, f"{filename}{ext}")
print(f"[SUCCESS]: {filename}{ext} converted & ready for embedding.\n")

View File

@ -0,0 +1,28 @@
import os
from slugify import slugify
from ..utils import guid, file_creation_time, write_to_server_documents, move_source
from ...utils import tokenize
# Process all text-related documents.
def as_text(**kwargs):
parent_dir = kwargs.get('directory', 'hotdir')
filename = kwargs.get('filename')
ext = kwargs.get('ext', '.txt')
fullpath = f"{parent_dir}/{filename}{ext}"
content = open(fullpath).read()
print(f"-- Working {fullpath} --")
data = {
'id': guid(),
'url': "file://"+os.path.abspath(f"{parent_dir}/processed/{filename}{ext}"),
'title': f"{filename}{ext}",
'description': "a custom file uploaded by the user.",
'published': file_creation_time(fullpath),
'wordCount': len(content),
'pageContent': content,
'token_count_estimate': len(tokenize(content))
}
write_to_server_documents(data, f"{slugify(filename)}-{data.get('id')}")
move_source(parent_dir, f"{filename}{ext}")
print(f"[SUCCESS]: {filename}{ext} converted & ready for embedding.\n")

View File

@ -0,0 +1,12 @@
from .convert.as_text import as_text
from .convert.as_markdown import as_markdown
from .convert.as_pdf import as_pdf
from .convert.as_docx import as_docx, as_odt
FILETYPES = {
'.txt': as_text,
'.md': as_markdown,
'.pdf': as_pdf,
'.docx': as_docx,
'.odt': as_odt,
}

View File

@ -0,0 +1,20 @@
import os
from .filetypes import FILETYPES
RESERVED = ['__HOTDIR__.md']
def watch_for_changes(directory):
for raw_doc in os.listdir(directory):
if os.path.isdir(f"{directory}/{raw_doc}") or raw_doc in RESERVED: continue
filename, fileext = os.path.splitext(raw_doc)
if filename in ['.DS_Store'] or fileext == '': continue
if fileext not in FILETYPES.keys():
print(f"{fileext} not a supported file type for conversion. Please remove from hot directory.")
continue
FILETYPES[fileext](
directory=directory,
filename=filename,
ext=fileext,
)

View File

@ -0,0 +1,30 @@
import os, json
from datetime import datetime
from uuid import uuid4
def guid():
return str(uuid4())
def file_creation_time(path_to_file):
try:
if os.name == 'nt':
return datetime.fromtimestamp(os.path.getctime(path_to_file)).strftime('%Y-%m-%d %H:%M:%S')
else:
stat = os.stat(path_to_file)
return datetime.fromtimestamp(stat.st_birthtime).strftime('%Y-%m-%d %H:%M:%S')
except AttributeError:
return datetime.today().strftime('%Y-%m-%d %H:%M:%S')
def move_source(working_dir='hotdir', new_destination_filename= ''):
destination = f"{working_dir}/processed"
if os.path.exists(destination) == False:
os.mkdir(destination)
os.replace(f"{working_dir}/{new_destination_filename}", f"{destination}/{new_destination_filename}")
return
def write_to_server_documents(data, filename):
destination = f"../server/documents/custom-documents"
if os.path.exists(destination) == False: os.makedirs(destination)
with open(f"{destination}/{filename}.json", 'w', encoding='utf-8') as file:
json.dump(data, file, ensure_ascii=True, indent=4)

View File

@ -0,0 +1,55 @@
import os, json
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter, JSONFormatter
from .utils import tokenize, ada_v2_cost
from .yt_utils import fetch_channel_video_information, get_channel_id, clean_text, append_meta, get_duration
from alive_progress import alive_it
# Example Channel URL https://www.youtube.com/channel/UCmWbhBB96ynOZuWG7LfKong
# Example Channel URL https://www.youtube.com/@mintplex
def youtube():
channel_link = input("Paste in the URL of a YouTube channel: ")
channel_id = get_channel_id(channel_link)
if channel_id == None or len(channel_id) == 0:
print("Invalid input - must be full YouTube channel URL")
exit(1)
channel_data = fetch_channel_video_information(channel_id)
transaction_output_dir = f"../server/documents/youtube-{channel_data.get('channelTitle')}"
if os.path.isdir(transaction_output_dir) == False:
os.makedirs(transaction_output_dir)
print(f"\nFetching transcripts for {len(channel_data.get('items'))} videos - please wait.\nStopping and restarting will not refetch known transcripts in case there is an error.\nSaving results to: {transaction_output_dir}.")
totalTokenCount = 0
for video in alive_it(channel_data.get('items')):
video_file_path = transaction_output_dir + f"/video-{video.get('id')}.json"
if os.path.exists(video_file_path) == True:
continue
formatter = TextFormatter()
json_formatter = JSONFormatter()
try:
transcript = YouTubeTranscriptApi.get_transcript(video.get('id'))
raw_text = clean_text(formatter.format_transcript(transcript))
duration = get_duration(json_formatter.format_transcript(transcript))
if(len(raw_text) > 0):
fullText = append_meta(video, duration, raw_text)
tokenCount = len(tokenize(fullText))
video['pageContent'] = fullText
video['token_count_estimate'] = tokenCount
totalTokenCount += tokenCount
with open(video_file_path, 'w', encoding='utf-8') as file:
json.dump(video, file, ensure_ascii=True, indent=4)
except:
print("There was an issue getting the transcription of a video in the list - likely because captions are disabled. Skipping")
continue
print(f"[Success]: {len(channel_data.get('items'))} video transcripts fetched!")
print(f"\n\n////////////////////////////")
print(f"Your estimated cost to embed all of this data using OpenAI's text-embedding-ada-002 model at $0.0004 / 1K tokens will cost {ada_v2_cost(totalTokenCount)} using {totalTokenCount} tokens.")
print(f"////////////////////////////\n\n")
exit(0)

View File

@ -0,0 +1,120 @@
import json, requests, os, re
from slugify import slugify
from dotenv import load_dotenv
load_dotenv()
def is_yt_short(videoId):
url = 'https://www.youtube.com/shorts/' + videoId
ret = requests.head(url)
return ret.status_code == 200
def get_channel_id(channel_link):
if('@' in channel_link):
pattern = r'https?://www\.youtube\.com/(@\w+)/?'
match = re.match(pattern, channel_link)
if match is False: return None
handle = match.group(1)
print('Need to map username to channelId - this can take a while sometimes.')
response = requests.get(f"https://yt.lemnoslife.com/channels?handle={handle}", timeout=20)
if(response.ok == False):
print("Handle => ChannelId mapping endpoint is too slow - use regular youtube.com/channel URL")
return None
json_data = response.json()
return json_data.get('items')[0].get('id')
else:
pattern = r"youtube\.com/channel/([\w-]+)"
match = re.search(pattern, channel_link)
return match.group(1) if match else None
def clean_text(text):
return re.sub(r"\[.*?\]", "", text)
def append_meta(video, duration, text):
meta = {
'youtubeURL': f"https://youtube.com/watch?v={video.get('id')}",
'thumbnail': video.get('thumbnail'),
'description': video.get('description'),
'createdAt': video.get('published'),
'videoDurationInSeconds': duration,
}
return "Video JSON Metadata:\n"+json.dumps(meta, indent=4)+"\n\n\nAudio Transcript:\n" + text
def get_duration(json_str):
data = json.loads(json_str)
return data[-1].get('start')
def fetch_channel_video_information(channel_id, windowSize = 50):
if channel_id == None or len(channel_id) == 0:
print("No channel id provided!")
exit(1)
if os.path.isdir("./outputs/channel-logs") == False:
os.makedirs("./outputs/channel-logs")
file_path = f"./outputs/channel-logs/channel-{channel_id}.json"
if os.path.exists(file_path):
with open(file_path, "r") as file:
print(f"Returning cached data for channel {channel_id}. If you do not wish to use stored data then delete the file for this channel to allow refetching.")
return json.load(file)
if(os.getenv('GOOGLE_APIS_KEY') == None):
print("GOOGLE_APIS_KEY env variable not set!")
exit(1)
done = False
currentPage = None
pageTokens = []
items = []
data = {
'id': channel_id,
}
print("Fetching first page of results...")
while(done == False):
url = f"https://www.googleapis.com/youtube/v3/search?key={os.getenv('GOOGLE_APIS_KEY')}&channelId={channel_id}&part=snippet,id&order=date&type=video&maxResults={windowSize}"
if(currentPage != None):
print(f"Fetching page ${currentPage}")
url += f"&pageToken={currentPage}"
req = requests.get(url)
if(req.ok == False):
print("Could not fetch channel_id items!")
exit(1)
response = req.json()
currentPage = response.get('nextPageToken')
if currentPage in pageTokens:
print('All pages iterated and logged!')
done = True
break
for item in response.get('items'):
if 'id' in item and 'videoId' in item.get('id'):
if is_yt_short(item.get('id').get('videoId')):
print(f"Filtering out YT Short {item.get('id').get('videoId')}")
continue
if data.get('channelTitle') is None:
data['channelTitle'] = slugify(item.get('snippet').get('channelTitle'))
newItem = {
'id': item.get('id').get('videoId'),
'url': f"https://youtube.com/watch?v={item.get('id').get('videoId')}",
'title': item.get('snippet').get('title'),
'description': item.get('snippet').get('description'),
'thumbnail': item.get('snippet').get('thumbnails').get('high').get('url'),
'published': item.get('snippet').get('publishTime'),
}
items.append(newItem)
pageTokens.append(currentPage)
data['items'] = items
with open(file_path, 'w+', encoding='utf-8') as json_file:
json.dump(data, json_file, ensure_ascii=True, indent=2)
print(f"{len(items)} videos found for channel {data.get('channelTitle')}. Saved to channel-logs/channel-{channel_id}.json")
return data

21
collector/watch.py Normal file
View File

@ -0,0 +1,21 @@
import _thread, time
from scripts.watch.main import watch_for_changes
a_list = []
WATCH_DIRECTORY = "hotdir"
def input_thread(a_list):
input()
a_list.append(True)
def main():
_thread.start_new_thread(input_thread, (a_list,))
print(f"Watching '{WATCH_DIRECTORY}/' for new files.\n\nUpload files into this directory while this script is running to convert them.\nPress enter or crtl+c to exit script.")
while not a_list:
watch_for_changes(WATCH_DIRECTORY)
time.sleep(1)
print("Stopping watching of hot directory.")
exit(1)
if __name__ == "__main__":
main()

1
frontend/.env.production Normal file
View File

@ -0,0 +1 @@
GENERATE_SOURCEMAP=false

15
frontend/.eslintrc.cjs Normal file
View File

@ -0,0 +1,15 @@
module.exports = {
env: { browser: true, es2020: true },
extends: [
'eslint:recommended',
'plugin:react/recommended',
'plugin:react/jsx-runtime',
'plugin:react-hooks/recommended',
],
parserOptions: { ecmaVersion: 'latest', sourceType: 'module' },
settings: { react: { version: '18.2' } },
plugins: ['react-refresh'],
rules: {
'react-refresh/only-export-components': 'warn',
},
}

25
frontend/.gitignore vendored Normal file
View File

@ -0,0 +1,25 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
bundleinspector.html

1
frontend/.nvmrc Normal file
View File

@ -0,0 +1 @@
v18.12.1

36
frontend/index.html Normal file
View File

@ -0,0 +1,36 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>AnythingLLM | Your personal LLM trained on anything</title>
<meta name="title" content="AnythingLLM | Your personal LLM trained on anything">
<meta name="description" content="AnythingLLM | Your personal LLM trained on anything">
<!-- Facebook -->
<meta property="og:type" content="website">
<meta property="og:url" content="https://anything-llm.ai">
<meta property="og:title" content="AnythingLLM | Your personal LLM trained on anything">
<meta property="og:description" content="AnythingLLM | Your personal LLM trained on anything">
<meta property="og:image" content="https://anything-llm.ai/promo.png">
<!-- Twitter -->
<meta property="twitter:card" content="summary_large_image">
<meta property="twitter:url" content="https://anything-llm.ai">
<meta property="twitter:title" content="AnythingLLM | Your personal LLM trained on anything">
<meta property="twitter:description" content="AnythingLLM | Your personal LLM trained on anything">
<meta property="twitter:image" content="https://anything-llm.ai/promo.png">
<link rel="icon" href="/favicon.ico" />
<link rel="apple-touch-icon" href="/favicon.ico" />
</head>
<body>
<div id="root" class="h-screen"></div>
<script type="module" src="/src/main.jsx"></script>
</body>
</html>

7
frontend/jsconfig.json Normal file
View File

@ -0,0 +1,7 @@
{
"compilerOptions": {
"module": "commonjs",
"target": "esnext",
"jsx": "react"
}
}

50
frontend/package.json Normal file
View File

@ -0,0 +1,50 @@
{
"name": "anything-llm-frontend",
"private": false,
"version": "0.1.0",
"type": "module",
"scripts": {
"start": "vite --open",
"build": "vite build",
"lint": "yarn prettier --write ./src",
"preview": "vite preview"
},
"dependencies": {
"@esbuild-plugins/node-globals-polyfill": "^0.1.1",
"@metamask/jazzicon": "^2.0.0",
"@react-oauth/google": "^0.11.0",
"buffer": "^6.0.3",
"email-validator": "^2.0.4",
"he": "^1.2.0",
"js-file-download": "^0.4.12",
"moment-timezone": "^0.5.43",
"pluralize": "^8.0.0",
"react": "^18.2.0",
"react-confetti-explosion": "^2.1.2",
"react-device-detect": "^2.2.2",
"react-dom": "^18.2.0",
"react-drag-drop-files": "^2.3.7",
"react-feather": "^2.0.10",
"react-loading-skeleton": "^3.1.0",
"react-router-dom": "^6.3.0",
"react-type-animation": "^3.0.1",
"text-case": "^1.0.9",
"truncate": "^3.0.0",
"uuid": "^9.0.0"
},
"devDependencies": {
"@types/react": "^18.0.28",
"@types/react-dom": "^18.0.11",
"@vitejs/plugin-react": "^4.0.0-beta.0",
"autoprefixer": "^10.4.14",
"eslint": "^8.38.0",
"eslint-plugin-react": "^7.32.2",
"eslint-plugin-react-hooks": "^4.6.0",
"eslint-plugin-react-refresh": "^0.3.4",
"postcss": "^8.4.23",
"prettier": "^2.4.1",
"rollup-plugin-visualizer": "^5.9.0",
"tailwindcss": "^3.3.1",
"vite": "^4.3.0"
}
}

View File

@ -0,0 +1,7 @@
import tailwind from 'tailwindcss'
import autoprefixer from 'autoprefixer'
import tailwindConfig from './tailwind.config.js'
export default {
plugins: [tailwind(tailwindConfig), autoprefixer],
}

BIN
frontend/public/favicon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.6 KiB

Binary file not shown.

19
frontend/src/App.jsx Normal file
View File

@ -0,0 +1,19 @@
import React, { lazy, Suspense } from "react";
import { Routes, Route } from "react-router-dom";
import { ContextWrapper } from "./AuthContext";
const Main = lazy(() => import("./pages/Main"));
const WorkspaceChat = lazy(() => import("./pages/WorkspaceChat"));
export default function App() {
return (
<Suspense fallback={<div />}>
<ContextWrapper>
<Routes>
<Route path="/" element={<Main />} />
<Route path="/workspace/:slug" element={<WorkspaceChat />} />
</Routes>
</ContextWrapper>
</Suspense>
);
}

View File

@ -0,0 +1,30 @@
import React, { useState, createContext } from "react";
export const AuthContext = createContext(null);
export function ContextWrapper(props) {
const localUser = localStorage.getItem("anythingllm_user");
const localAuthToken = localStorage.getItem("anythingllm_authToken");
const [store, setStore] = useState({
user: localUser ? JSON.parse(localUser) : null,
authToken: localAuthToken ? localAuthToken : null,
});
const [actions] = useState({
updateUser: (user, authToken = "") => {
localStorage.setItem("anythingllm_user", JSON.stringify(user));
localStorage.setItem("anythingllm_authToken", authToken);
setStore({ user, authToken });
},
unsetUser: () => {
localStorage.removeItem("anythingllm_user");
localStorage.removeItem("anythingllm_authToken");
setStore({ user: null, authToken: null });
},
});
return (
<AuthContext.Provider value={{ store, actions }}>
{props.children}
</AuthContext.Provider>
);
}

View File

@ -0,0 +1,254 @@
import React, { useEffect, useState } from "react";
import { GitHub, GitMerge, Mail, Plus } from "react-feather";
import NewWorkspaceModal, {
useNewWorkspaceModal,
} from "../Modals/NewWorkspace";
export default function DefaultChatContainer() {
const [mockMsgs, setMockMessages] = useState([]);
const {
showing: showingNewWsModal,
showModal: showNewWsModal,
hideModal: hideNewWsModal,
} = useNewWorkspaceModal();
const popMsg = !window.localStorage.getItem("anythingllm_intro");
const MESSAGES = [
<React.Fragment>
<div
className={`flex w-full mt-2 justify-start ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-b-2xl rounded-tr-2xl rounded-tl-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
Welcome to AnythingLLM, AnythingLLM is an open-source AI tool by
Mintplex Labs that turns <i>anything</i> into a trained chatbot you
can query and chat with. AnythingLLM is a BYOK (bring-your-own-keys)
software so there is no subscription, fee, or charges for this
software outside of the services you want to use with it.
</p>
</div>
</div>
</React.Fragment>,
<React.Fragment>
<div
className={`flex w-full mt-2 justify-start ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-b-2xl rounded-tr-2xl rounded-tl-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
AnythingLLM is the easiest way to put powerful AI products like
OpenAi, GPT-4, LangChain, PineconeDB, ChromaDB, and other services
together in a neat package with no fuss to increase your
productivity by 100x.
</p>
</div>
</div>
</React.Fragment>,
<React.Fragment>
<div
className={`flex w-full mt-2 justify-start ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-b-2xl rounded-tr-2xl rounded-tl-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
AnythingLLM can run totally locally on your machine with little
overhead you wont even notice it's there! No GPU needed. Cloud and
on-premises installtion is available as well.
<br />
The AI tooling ecosytem gets more powerful everyday. AnythingLLM
makes it easy to use.
</p>
<a
href=""
className="mt-4 w-fit flex flex-grow gap-x-2 py-[5px] px-4 border border-slate-400 rounded-lg text-slate-800 dark:text-slate-200 justify-start items-center hover:bg-slate-100 dark:hover:bg-stone-900 dark:bg-stone-900"
>
<GitMerge className="h-4 w-4" />
<p className="text-slate-800 dark:text-slate-200 text-lg leading-loose">
Create an issue on Github
</p>
</a>
</div>
</div>
</React.Fragment>,
<React.Fragment>
<div
className={`flex w-full mt-2 justify-end ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-slate-200 dark:bg-amber-800 rounded-b-2xl rounded-tl-2xl rounded-tr-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
How do I get started?!
</p>
</div>
</div>
</React.Fragment>,
<React.Fragment>
<div
className={`flex w-full mt-2 justify-start ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-b-2xl rounded-tr-2xl rounded-tl-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
It's simple. All collections are organized into buckets we call{" "}
<b>"Workspaces"</b>. Workspaces are buckets of files, documents,
images, PDFs, and other files which will be transformed into
something LLM's can understand and use in conversation.
<br />
<br />
You can add and remove files at anytime.
</p>
<button
onClick={showNewWsModal}
className="mt-4 w-fit flex flex-grow gap-x-2 py-[5px] px-4 border border-slate-400 rounded-lg text-slate-800 dark:text-slate-200 justify-start items-center hover:bg-slate-100 dark:hover:bg-stone-900 dark:bg-stone-900"
>
<Plus className="h-4 w-4" />
<p className="text-slate-800 dark:text-slate-200 text-lg leading-loose">
Create your first workspace
</p>
</button>
</div>
</div>
</React.Fragment>,
<React.Fragment>
<div
className={`flex w-full mt-2 justify-end ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-slate-200 dark:bg-amber-800 rounded-b-2xl rounded-tl-2xl rounded-tr-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
Is this like an AI dropbox or something? What about chatting? It is
a chatbot isnt it?
</p>
</div>
</div>
</React.Fragment>,
<React.Fragment>
<div
className={`flex w-full mt-2 justify-start ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-b-2xl rounded-tr-2xl rounded-tl-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
AnythingLLM is more than a smarter Dropbox.
<br />
<br />
AnythingLLM offers two ways of talking with your data:
<br />
<br />
<i>Query:</i> Your chats will return data or inferences found with
the documents in your workspace it has access to. Adding more
documents to the Workspace make it smarter!
<br />
<br />
<i>Conversational:</i> Your documents + your on-going chat history
both contribute to the LLM knowledge at the same time. Great for
appending real-time text-based info or corrections and
misunderstandings the LLM might have.
<br />
<br />
You can toggle between either mode <i>in the middle of chatting!</i>
</p>
</div>
</div>
</React.Fragment>,
<React.Fragment>
<div
className={`flex w-full mt-2 justify-end ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-slate-200 dark:bg-amber-800 rounded-b-2xl rounded-tl-2xl rounded-tr-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
Wow, this sounds amazing, let me try it out already!
</p>
</div>
</div>
</React.Fragment>,
<React.Fragment>
<div
className={`flex w-full mt-2 justify-start ${
popMsg ? "chat__message" : ""
}`}
>
<div className="p-4 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-b-2xl rounded-tr-2xl rounded-tl-sm">
<p className="text-slate-800 dark:text-slate-200 font-semibold">
Have Fun!
</p>
<div className="flex items-center gap-x-4">
<a
href=""
className="mt-4 w-fit flex flex-grow gap-x-2 py-[5px] px-4 border border-slate-400 rounded-lg text-slate-800 dark:text-slate-200 justify-start items-center hover:bg-slate-100 dark:hover:bg-stone-900 dark:bg-stone-900"
>
<GitHub className="h-4 w-4" />
<p className="text-slate-800 dark:text-slate-200 text-lg leading-loose">
Star on GitHub
</p>
</a>
<a
href=""
className="mt-4 w-fit flex flex-grow gap-x-2 py-[5px] px-4 border border-slate-400 rounded-lg text-slate-800 dark:text-slate-200 justify-start items-center hover:bg-slate-100 dark:hover:bg-stone-900 dark:bg-stone-900"
>
<Mail className="h-4 w-4" />
<p className="text-slate-800 dark:text-slate-200 text-lg leading-loose">
Contact Mintplex Labs
</p>
</a>
</div>
</div>
</div>
</React.Fragment>,
];
useEffect(() => {
function processMsgs() {
if (!!window.localStorage.getItem("anythingllm_intro")) {
setMockMessages([...MESSAGES]);
return false;
} else {
setMockMessages([MESSAGES[0]]);
}
var timer = 500;
var messages = [];
MESSAGES.map((child) => {
setTimeout(() => {
setMockMessages([...messages, child]);
messages.push(child);
}, timer);
timer += 2_500;
});
window.localStorage.setItem("anythingllm_intro", 1);
}
processMsgs();
}, []);
return (
<div
style={{ height: "calc(100% - 32px)" }}
className="transition-all duration-500 relative ml-[2px] mr-[8px] my-[16px] rounded-[26px] bg-white dark:bg-black-900 min-w-[82%] p-[18px] h-full overflow-y-scroll"
>
{mockMsgs.map((content, i) => {
return <React.Fragment key={i}>{content}</React.Fragment>;
})}
{showingNewWsModal && <NewWorkspaceModal hideModal={hideNewWsModal} />}
</div>
);
}

View File

@ -0,0 +1,163 @@
import React, { useState, useEffect } from "react";
import { X } from "react-feather";
import System from "../../models/system";
const noop = () => false;
export default function KeysModal({ hideModal = noop }) {
const [loading, setLoading] = useState(true);
const [settings, setSettings] = useState({});
useEffect(() => {
async function fetchKeys() {
const settings = await System.keys();
setSettings(settings);
setLoading(false);
}
fetchKeys();
}, []);
const allSettingsValid =
!!settings && Object.values(settings).every((val) => !!val);
return (
<div class="fixed top-0 left-0 right-0 z-50 w-full p-4 overflow-x-hidden overflow-y-auto md:inset-0 h-[calc(100%-1rem)] h-full bg-black bg-opacity-50 flex items-center justify-center">
<div
className="flex fixed top-0 left-0 right-0 w-full h-full"
onClick={hideModal}
/>
<div class="relative w-full max-w-2xl max-h-full">
<div class="relative bg-white rounded-lg shadow dark:bg-stone-700">
<div class="flex items-start justify-between p-4 border-b rounded-t dark:border-gray-600">
<h3 class="text-xl font-semibold text-gray-900 dark:text-white">
Your System Settings
</h3>
<button
onClick={hideModal}
type="button"
class="text-gray-400 bg-transparent hover:bg-gray-200 hover:text-gray-900 rounded-lg text-sm p-1.5 ml-auto inline-flex items-center dark:hover:bg-gray-600 dark:hover:text-white"
data-modal-hide="staticModal"
>
<X className="text-gray-300 text-lg" />
</button>
</div>
<div class="p-6 space-y-6 flex h-full w-full">
{loading ? (
<div className="w-full h-full flex items-center justify-center">
<p className="text-gray-800 dark:text-gray-200 text-base">
loading system settings
</p>
</div>
) : (
<div className="w-full flex flex-col gap-y-4">
{allSettingsValid ? (
<div className="bg-green-300 p-4 rounded-lg border border-green-600 text-green-700 w-full">
<p>All system settings are defined. You are good to go!</p>
</div>
) : (
<div className="bg-red-300 p-4 rounded-lg border border-red-600 text-red-700 w-full text-sm">
<p>
ENV setttings are missing - this software will not
function fully.
<br />
After updating restart the server.
</p>
</div>
)}
<ShowKey
name="OpenAI API Key"
value={settings?.OpenAiKey ? "*".repeat(20) : ""}
valid={settings?.OpenAiKey}
/>
<ShowKey
name="OpenAI Model for chats"
value={settings?.OpenAiModelPref}
valid={!!settings?.OpenAiModelPref}
/>
<div className="h-[2px] w-full bg-gray-200 dark:bg-stone-600" />
<ShowKey
name="Pinecone DB API Key"
value={settings?.PineConeKey ? "*".repeat(20) : ""}
valid={!!settings?.PineConeKey}
/>
<ShowKey
name="Pinecone DB Environment"
value={settings?.PineConeEnvironment}
valid={!!settings?.PineConeEnvironment}
/>
<ShowKey
name="Pinecone DB Index"
value={settings?.PinceConeIndex}
valid={!!settings?.PinceConeIndex}
/>
</div>
)}
</div>
<div class="flex items-center p-6 space-x-2 border-t border-gray-200 rounded-b dark:border-gray-600">
<button
onClick={hideModal}
type="button"
class="text-gray-500 bg-white hover:bg-gray-100 focus:ring-4 focus:outline-none focus:ring-blue-300 rounded-lg border border-gray-200 text-sm font-medium px-5 py-2.5 hover:text-gray-900 focus:z-10 dark:bg-gray-700 dark:text-gray-300 dark:border-gray-500 dark:hover:text-white dark:hover:bg-gray-600 dark:focus:ring-gray-600"
>
Close
</button>
</div>
</div>
</div>
</div>
);
}
function ShowKey({ name, value, valid }) {
if (!valid) {
return (
<div>
<label
for="error"
class="block mb-2 text-sm font-medium text-red-700 dark:text-red-500"
>
{name}
</label>
<input
type="text"
id="error"
disabled={true}
class="bg-red-50 border border-red-500 text-red-900 placeholder-red-700 text-sm rounded-lg focus:ring-red-500 dark:bg-gray-700 focus:border-red-500 block w-full p-2.5 dark:text-red-500 dark:placeholder-red-500 dark:border-red-500"
placeholder={name}
defaultValue={value}
/>
<p class="mt-2 text-sm text-red-600 dark:text-red-500">
Need setup in .env file.
</p>
</div>
);
}
return (
<div class="mb-6">
<label
for="success"
class="block mb-2 text-sm font-medium text-gray-800 dark:text-slate-200"
>
{name}
</label>
<input
type="text"
id="success"
disabled={true}
class="border border-white text-green-900 dark:text-green-400 placeholder-green-700 dark:placeholder-green-500 text-sm rounded-lg focus:ring-green-500 focus:border-green-500 block w-full p-2.5 dark:bg-gray-700 dark:border-green-500"
defaultValue={value}
/>
</div>
);
}
export function useKeysModal() {
const [showing, setShowing] = useState(false);
const showModal = () => {
setShowing(true);
};
const hideModal = () => {
setShowing(false);
};
return { showing, showModal, hideModal };
}

View File

@ -0,0 +1,476 @@
import React, { useState, useEffect } from "react";
import {
FileMinus,
FilePlus,
Folder,
FolderMinus,
FolderPlus,
X,
Zap,
} from "react-feather";
import System from "../../models/system";
import Workspace from "../../models/workspace";
import { nFormatter } from "../../utils/numbers";
import { dollarFormat } from "../../utils/numbers";
import paths from "../../utils/paths";
import { useParams } from "react-router-dom";
const noop = () => false;
export default function ManageWorkspace({ hideModal = noop, workspace }) {
const { slug } = useParams();
const [loading, setLoading] = useState(true);
const [saving, setSaving] = useState(false);
const [showConfirmation, setShowConfirmation] = useState(false);
const [directories, setDirectories] = useState(null);
const [originalDocuments, setOriginalDocuments] = useState([]);
const [selectedFiles, setSelectFiles] = useState([]);
useEffect(() => {
async function fetchKeys() {
const _workspace = await Workspace.bySlug(workspace.slug);
const localFiles = await System.localFiles();
const originalDocs = _workspace.documents.map((doc) => doc.docpath) || [];
setDirectories(localFiles);
setOriginalDocuments([...originalDocs]);
setSelectFiles([...originalDocs]);
setLoading(false);
}
fetchKeys();
}, []);
const deleteWorkspace = async () => {
if (
!window.confirm(
`You are about to delete your entire ${workspace.name} workspace. This will remove all vector embeddings on your vector database.\n\nThe original source files will remiain untouched. This action is irreversible.`
)
)
return false;
await Workspace.delete(workspace.slug);
workspace.slug === slug
? (window.location = paths.home())
: window.location.reload();
};
const docChanges = () => {
const changes = {
adds: [],
deletes: [],
};
selectedFiles.map((doc) => {
const inOriginal = !!originalDocuments.find((oDoc) => oDoc === doc);
if (!inOriginal) {
changes.adds.push(doc);
}
});
originalDocuments.map((doc) => {
const selected = !!selectedFiles.find((oDoc) => oDoc === doc);
if (!selected) {
changes.deletes.push(doc);
}
});
return changes;
};
const confirmChanges = (e) => {
e.preventDefault();
const changes = docChanges();
changes.adds.length > 0 ? setShowConfirmation(true) : updateWorkspace(e);
};
const updateWorkspace = async (e) => {
e.preventDefault();
setSaving(true);
setShowConfirmation(false);
const changes = docChanges();
await Workspace.modifyEmbeddings(workspace.slug, changes);
setSaving(false);
window.location.reload();
};
const isSelected = (filepath) => {
const isFolder = !filepath.includes("/");
return isFolder
? selectedFiles.some((doc) => doc.includes(filepath.split("/")[0]))
: selectedFiles.some((doc) => doc.includes(filepath));
};
const toggleSelection = (filepath) => {
const isFolder = !filepath.includes("/");
const parent = isFolder ? filepath : filepath.split("/")[0];
if (isSelected(filepath)) {
const updatedDocs = isFolder
? selectedFiles.filter((doc) => !doc.includes(parent))
: selectedFiles.filter((doc) => !doc.includes(filepath));
setSelectFiles([...new Set(updatedDocs)]);
} else {
var newDocs = [];
if (isFolder) {
const folderItems = directories.items.find(
(item) => item.name === parent
).items;
newDocs = folderItems.map((item) => parent + "/" + item.name);
} else {
newDocs = [filepath];
}
const combined = [...selectedFiles, ...newDocs];
setSelectFiles([...new Set(combined)]);
}
};
if (loading) {
return (
<div className="fixed top-0 left-0 right-0 z-50 w-full p-4 overflow-x-hidden overflow-y-auto md:inset-0 h-[calc(100%-1rem)] h-full bg-black bg-opacity-50 flex items-center justify-center">
<div
className="flex fixed top-0 left-0 right-0 w-full h-full"
onClick={hideModal}
/>
<div className="relative w-full max-w-2xl max-h-full">
<div className="relative bg-white rounded-lg shadow dark:bg-stone-700">
<div className="flex items-start justify-between p-4 border-b rounded-t dark:border-gray-600">
<h3 className="text-xl font-semibold text-gray-900 dark:text-white">
{workspace.name} Settings
</h3>
<button
onClick={hideModal}
type="button"
className="text-gray-400 bg-transparent hover:bg-gray-200 hover:text-gray-900 rounded-lg text-sm p-1.5 ml-auto inline-flex items-center dark:hover:bg-gray-600 dark:hover:text-white"
data-modal-hide="staticModal"
>
<X className="text-gray-300 text-lg" />
</button>
</div>
<div className="p-6 flex h-full w-full max-h-[80vh] overflow-y-scroll">
<div className="flex flex-col gap-y-1 w-full">
<p className="text-slate-200 dark:text-stone-300 text-center">
loading workspace files
</p>
</div>
</div>
<div className="flex items-center p-6 space-x-2 border-t border-gray-200 rounded-b dark:border-gray-600"></div>
</div>
</div>
</div>
);
}
return (
<>
{showConfirmation && (
<ConfirmationModal
directories={directories}
hideConfirm={() => setShowConfirmation(false)}
additions={docChanges().adds}
updateWorkspace={updateWorkspace}
/>
)}
<div className="fixed top-0 left-0 right-0 z-50 w-full p-4 overflow-x-hidden overflow-y-auto md:inset-0 h-[calc(100%-1rem)] h-full bg-black bg-opacity-50 flex items-center justify-center">
<div
className="flex fixed top-0 left-0 right-0 w-full h-full"
onClick={hideModal}
/>
<div className="relative w-full max-w-2xl max-h-full">
<div className="relative bg-white rounded-lg shadow dark:bg-stone-700">
<div className="flex items-start justify-between p-4 border-b rounded-t dark:border-gray-600">
<h3 className="text-xl font-semibold text-gray-900 dark:text-white">
"{workspace.name}" workspace settings
</h3>
<button
onClick={hideModal}
type="button"
className="text-gray-400 bg-transparent hover:bg-gray-200 hover:text-gray-900 rounded-lg text-sm p-1.5 ml-auto inline-flex items-center dark:hover:bg-gray-600 dark:hover:text-white"
data-modal-hide="staticModal"
>
<X className="text-gray-300 text-lg" />
</button>
</div>
<div className="p-6 flex h-full w-full max-h-[80vh] overflow-y-scroll">
<div className="flex flex-col gap-y-1 w-full">
<div className="flex flex-col mb-2">
<p className="text-gray-800 dark:text-stone-200 text-base ">
Select folders to add or remove from workspace.
</p>
<p className="text-gray-800 dark:text-stone-400 text-xs italic">
{selectedFiles.length} documents in workspace selected.
</p>
</div>
<div className="w-full h-auto border border-slate-200 dark:border-stone-600 rounded-lg px-4 py-2">
{!!directories && (
<Directory
files={directories}
toggleSelection={toggleSelection}
isSelected={isSelected}
/>
)}
</div>
</div>
</div>
<div className="flex items-center justify-between p-6 space-x-2 border-t border-gray-200 rounded-b dark:border-gray-600">
<button
onClick={deleteWorkspace}
type="button"
className="border border-transparent text-gray-500 bg-white hover:bg-red-100 rounded-lg text-sm font-medium px-5 py-2.5 hover:text-red-900 focus:z-10 dark:bg-transparent dark:text-gray-300 dark:hover:text-white dark:hover:bg-red-600"
>
Delete Workspace
</button>
<div className="flex items-center">
<button
disabled={saving}
onClick={confirmChanges}
type="submit"
className="text-slate-200 bg-black-900 px-4 py-2 rounded-lg hover:bg-gray-900"
>
{saving ? "Saving..." : "Confirm Changes"}
</button>
</div>
</div>
</div>
</div>
</div>
</>
);
}
function Directory({
files,
parent = null,
nested = 0,
toggleSelection,
isSelected,
}) {
const [isExpanded, toggleExpanded] = useState(false);
const [showDetails, toggleDetails] = useState(false);
const [showZap, setShowZap] = useState(false);
if (files.type === "folder") {
return (
<div style={{ marginLeft: nested }} className="mb-2">
<div
className={`flex items-center hover:bg-gray-100 gap-x-2 text-gray-800 dark:text-stone-200 dark:hover:bg-stone-800 px-2 rounded-lg`}
>
{files.items.some((files) => files.type === "folder") ? (
<Folder className="w-6 h-6" />
) : (
<button onClick={() => toggleSelection(files.name)}>
{isSelected(files.name) ? (
<FolderMinus className="w-6 h-6 stroke-red-800 hover:fill-red-500" />
) : (
<FolderPlus className="w-6 h-6 hover:stroke-green-800 hover:fill-green-500" />
)}
</button>
)}
<div
className="flex gap-x-2 items-center cursor-pointer w-full"
onClick={() => toggleExpanded(!isExpanded)}
>
<h2 className="text-2xl">{files.name}</h2>
{files.items.some((files) => files.type === "folder") ? (
<p className="text-xs italic">{files.items.length} folders</p>
) : (
<p className="text-xs italic">
{files.items.length} documents |{" "}
{nFormatter(
files.items.reduce((a, b) => a + b.token_count_estimate, 0)
)}{" "}
tokens
</p>
)}
</div>
</div>
{isExpanded &&
files.items.map((item) => (
<Directory
key={item.name}
parent={files.name}
files={item}
nested={nested + 20}
toggleSelection={toggleSelection}
isSelected={isSelected}
/>
))}
</div>
);
}
const { name, type: _type, ...meta } = files;
return (
<div className="ml-[20px] my-2">
<div className="flex items-center">
{meta?.cached && (
<button
type="button"
onClick={() => setShowZap(true)}
className="rounded-full p-1 hover:bg-stone-500 hover:bg-opacity-75"
>
<Zap className="h-4 w-4 stroke-yellow-500 fill-yellow-400" />
</button>
)}
{showZap && (
<dialog
open={true}
style={{ zIndex: 100 }}
className="fixed top-0 flex bg-black bg-opacity-50 w-[100vw] h-full items-center justify-center "
>
<div className="w-fit px-10 py-4 w-[25%] rounded-lg bg-white shadow dark:bg-stone-700 text-black dark:text-slate-200">
<div className="flex flex-col w-full">
<p className="font-semibold text-xl flex items-center gap-x-1 justify-left">
What does{" "}
<Zap className="h-4 w-4 stroke-yellow-500 fill-yellow-400" />{" "}
mean?
</p>
<p className="text-base mt-4">
This symbol indicates that you have embed this document before
and will not have to pay to re-embed this document.
</p>
<div className="flex w-full justify-center items-center mt-4">
<button
onClick={() => setShowZap(false)}
className="border border-gray-800 text-gray-800 hover:bg-gray-100 px-4 py-1 rounded-lg dark:text-slate-200 dark:border-slate-200 dark:hover:bg-stone-900"
>
Close
</button>
</div>
</div>
</div>
</dialog>
)}
<div
className={`flex items-center gap-x-2 text-gray-800 dark:text-stone-200 hover:bg-gray-100 dark:hover:bg-stone-800 px-2 rounded-lg`}
>
<button onClick={() => toggleSelection(`${parent}/${name}`)}>
{isSelected(`${parent}/${name}`) ? (
<FileMinus className="w-6 h-6 stroke-red-800 hover:fill-red-500" />
) : (
<FilePlus className="w-6 h-6 hover:stroke-green-800 hover:fill-green-500" />
)}
</button>
<div
className="w-full items-center flex cursor-pointer"
onClick={() => toggleDetails(!showDetails)}
>
<h3 className="text-sm">{name}</h3>
<br />
</div>
</div>
</div>
{showDetails && (
<div className="ml-[20px] flex flex-col gap-y-1 my-1 p-2 rounded-md bg-slate-200 font-mono text-sm overflow-x-scroll">
{Object.entries(meta).map(([key, value]) => {
if (key === "cached") return null;
return (
<p className="whitespace-pre">
{key}: {value}
</p>
);
})}
</div>
)}
</div>
);
}
function ConfirmationModal({
directories,
hideConfirm,
additions,
updateWorkspace,
}) {
function estimateCosts() {
const cachedTokens = additions.map((filepath) => {
const [parent, filename] = filepath.split("/");
const details = directories.items
.find((folder) => folder.name === parent)
.items.find((file) => file.name === filename);
const { token_count_estimate = 0, cached = false } = details;
return cached ? token_count_estimate : 0;
});
const tokenEstimates = additions.map((filepath) => {
const [parent, filename] = filepath.split("/");
const details = directories.items
.find((folder) => folder.name === parent)
.items.find((file) => file.name === filename);
const { token_count_estimate = 0 } = details;
return token_count_estimate;
});
const totalTokens = tokenEstimates.reduce((a, b) => a + b, 0);
const cachedTotal = cachedTokens.reduce((a, b) => a + b, 0);
const dollarValue = 0.0004 * ((totalTokens - cachedTotal) / 1_000);
return {
dollarValue,
dollarText:
dollarValue < 0.01 ? "< $0.01" : `about ${dollarFormat(dollarValue)}`,
};
}
const { dollarValue, dollarText } = estimateCosts();
return (
<dialog
open={true}
style={{ zIndex: 100 }}
className="fixed top-0 flex bg-black bg-opacity-50 w-[100vw] h-full items-center justify-center "
>
<div className="w-fit px-10 p-4 min-w-1/2 rounded-lg bg-white shadow dark:bg-stone-700 text-black dark:text-slate-200">
<div className="flex flex-col w-full">
<p className="font-semibold">
Are you sure you want to embed these documents?
</p>
<div className="flex flex-col gap-y-1">
{dollarValue <= 0 ? (
<p className="text-base mt-4">
You will be embedding {additions.length} new documents into this
workspace.
<br />
This will not incur any costs for OpenAI credits.
</p>
) : (
<p className="text-base mt-4">
You will be embedding {additions.length} new documents into this
workspace. <br />
This will cost {dollarText} in OpenAI credits.
</p>
)}
</div>
<div className="flex w-full justify-between items-center mt-4">
<button
onClick={hideConfirm}
className="text-gray-800 hover:bg-gray-100 px-4 py-1 rounded-lg dark:text-slate-200 dark:hover:bg-stone-900"
>
Cancel
</button>
<button
onClick={updateWorkspace}
className="border border-gray-800 text-gray-800 hover:bg-gray-100 px-4 py-1 rounded-lg dark:text-slate-200 dark:border-slate-200 dark:hover:bg-stone-900"
>
Continue
</button>
</div>
</div>
</div>
</dialog>
);
}
export function useManageWorkspaceModal() {
const [showing, setShowing] = useState(false);
const showModal = () => {
setShowing(true);
};
const hideModal = () => {
setShowing(false);
};
return { showing, showModal, hideModal };
}

View File

@ -0,0 +1,104 @@
import React, { useRef, useState } from "react";
import { X } from "react-feather";
import Workspace from "../../models/workspace";
const noop = () => false;
export default function NewWorkspaceModal({ hideModal = noop }) {
const formEl = useRef(null);
const [error, setError] = useState(null);
const handleCreate = async (e) => {
setError(null);
e.preventDefault();
const data = {};
const form = new FormData(formEl.current);
for (var [key, value] of form.entries()) data[key] = value;
const { workspace, message } = await Workspace.new(data);
if (!!workspace) window.location.reload();
setError(message);
};
return (
<div class="fixed top-0 left-0 right-0 z-50 w-full p-4 overflow-x-hidden overflow-y-auto md:inset-0 h-[calc(100%-1rem)] h-full bg-black bg-opacity-50 flex items-center justify-center">
<div
className="flex fixed top-0 left-0 right-0 w-full h-full"
onClick={hideModal}
/>
<div class="relative w-full max-w-2xl max-h-full">
<div class="relative bg-white rounded-lg shadow dark:bg-stone-700">
<div class="flex items-start justify-between p-4 border-b rounded-t dark:border-gray-600">
<h3 class="text-xl font-semibold text-gray-900 dark:text-white">
Create a New Workspace
</h3>
<button
onClick={hideModal}
type="button"
class="text-gray-400 bg-transparent hover:bg-gray-200 hover:text-gray-900 rounded-lg text-sm p-1.5 ml-auto inline-flex items-center dark:hover:bg-gray-600 dark:hover:text-white"
data-modal-hide="staticModal"
>
<X className="text-gray-300 text-lg" />
</button>
</div>
<form ref={formEl} onSubmit={handleCreate}>
<div class="p-6 space-y-6 flex h-full w-full">
<div className="w-full flex flex-col gap-y-4">
<div>
<label
htmlFor="name"
class="block mb-2 text-sm font-medium text-gray-900 dark:text-white"
>
Workspace Name
</label>
<input
name="name"
type="text"
id="name"
class="bg-gray-50 border border-gray-300 text-gray-900 text-sm rounded-lg focus:ring-blue-500 focus:border-blue-500 block w-full p-2.5 dark:bg-stone-600 dark:border-stone-600 dark:placeholder-gray-400 dark:text-white dark:focus:ring-blue-500 dark:focus:border-blue-500"
placeholder="My Workspace"
required={true}
autoComplete="off"
/>
</div>
{error && (
<p className="text-red-600 dark:text-red-400 text-sm">
Error: {error}
</p>
)}
<p className="text-gray-800 dark:text-slate-200 text-sm">
After creating a workspace you will be able to add and remove
documents from it.
</p>
</div>
</div>
<div class="flex w-full justify-between items-center p-6 space-x-2 border-t border-gray-200 rounded-b dark:border-gray-600">
<button
onClick={hideModal}
type="button"
className="text-gray-800 hover:bg-gray-100 px-4 py-1 rounded-lg dark:text-slate-200 dark:hover:bg-stone-900"
>
Cancel
</button>
<button
type="submit"
class="text-gray-500 bg-white hover:bg-gray-100 focus:ring-4 focus:outline-none focus:ring-blue-300 rounded-lg border border-gray-200 text-sm font-medium px-5 py-2.5 hover:text-gray-900 focus:z-10 dark:bg-black dark:text-slate-200 dark:border-transparent dark:hover:text-slate-200 dark:hover:bg-gray-900 dark:focus:ring-gray-800"
>
Create Workspace
</button>
</div>
</form>
</div>
</div>
</div>
);
}
export function useNewWorkspaceModal() {
const [showing, setShowing] = useState(false);
const showModal = () => {
setShowing(true);
};
const hideModal = () => {
setShowing(false);
};
return { showing, showModal, hideModal };
}

View File

@ -0,0 +1,82 @@
import React, { useState, useEffect } from "react";
import { Book, Settings } from "react-feather";
import * as Skeleton from "react-loading-skeleton";
import "react-loading-skeleton/dist/skeleton.css";
import Workspace from "../../../models/workspace";
import ManageWorkspace, {
useManageWorkspaceModal,
} from "../../Modals/ManageWorkspace";
import paths from "../../../utils/paths";
import { useParams } from "react-router-dom";
export default function ActiveWorkspaces() {
const { slug } = useParams();
const [loading, setLoading] = useState(true);
const [workspaces, setWorkspaces] = useState([]);
const [selectedWs, setSelectedWs] = useState(null);
const { showing, showModal, hideModal } = useManageWorkspaceModal();
useEffect(() => {
async function getWorkspaces() {
const workspaces = await Workspace.all();
setLoading(false);
setWorkspaces(workspaces);
}
getWorkspaces();
}, []);
if (loading) {
return (
<>
<Skeleton.default
height={36}
width="100%"
count={3}
baseColor="#292524"
highlightColor="#4c4948"
enableAnimation={true}
/>
</>
);
}
return (
<>
{workspaces.map((workspace) => {
const isActive = workspace.slug === slug;
return (
<div
key={workspace.id}
className="flex gap-x-2 items-center justify-between"
>
<a
href={isActive ? null : paths.workspace.chat(workspace.slug)}
className={`flex flex-grow w-[75%] h-[36px] gap-x-2 py-[5px] px-4 border border-slate-400 rounded-lg text-slate-800 dark:text-slate-200 justify-start items-center ${
isActive
? "bg-gray-100 dark:bg-stone-600"
: "hover:bg-slate-100 dark:hover:bg-stone-900 "
}`}
>
<Book className="h-4 w-4" />
<p className="text-slate-800 dark:text-slate-200 text-xs leading-loose font-semibold">
{workspace.name}
</p>
</a>
<button
onClick={() => {
setSelectedWs(workspace);
showModal();
}}
className="rounded-md bg-stone-200 p-2 h-[36px] w-[15%] flex items-center justify-center text-slate-800 hover:bg-stone-300 group dark:bg-stone-800 dark:text-slate-200 dark:hover:bg-stone-900 dark:border dark:border-stone-800"
>
<Settings className="h-3.5 w-3.5 transition-all duration-300 group-hover:rotate-90" />
</button>
</div>
);
})}
{showing && !!selectedWs && (
<ManageWorkspace hideModal={hideModal} workspace={selectedWs} />
)}
</>
);
}

View File

@ -0,0 +1,34 @@
import pluralize from "pluralize";
import React, { useEffect, useState } from "react";
import System from "../../models/system";
import { numberWithCommas } from "../../utils/numbers";
export default function IndexCount() {
const [indexes, setIndexes] = useState(null);
useEffect(() => {
async function indexCount() {
setIndexes(await System.totalIndexes());
}
indexCount();
}, []);
if (indexes === null || indexes === 0) {
return (
<div className="flex w-full items-center justify-end gap-x-2">
<div className="flex items-center gap-x-1 px-2 rounded-full">
<p className="text-slate-400 leading-tight text-sm"></p>
</div>
</div>
);
}
return (
<div className="flex w-full items-center justify-end gap-x-2">
<div className="flex items-center gap-x-1 px-2 rounded-full">
<p className="text-slate-400 leading-tight text-sm">
{numberWithCommas(indexes)} {pluralize("index", indexes)}
</p>
</div>
</div>
);
}

View File

@ -0,0 +1,49 @@
import React, { useEffect, useState } from "react";
import { AlertCircle, Circle } from "react-feather";
import System from "../../models/system";
export default function LLMStatus() {
const [status, setStatus] = useState(null);
useEffect(() => {
async function checkPing() {
setStatus(await System.ping());
}
checkPing();
}, []);
if (status === null) {
return (
<div className="flex w-full items-center justify-start gap-x-2">
<p className="text-slate-400 leading-loose text-sm">LLM</p>
<div className="flex items-center gap-x-1 border border-slate-400 px-2 rounded-full">
<p className="text-slate-400 leading-tight text-sm">unknown</p>
<Circle className="h-3 w-3 stroke-slate-700 fill-slate-400 animate-pulse" />
</div>
</div>
);
}
// TODO: add modal or toast on click to identify why this is broken
// need to likely start server.
if (status === false) {
return (
<div className="flex w-full items-center justify-end gap-x-2">
<p className="text-slate-400 leading-loose text-sm">LLM</p>
<div className="flex items-center gap-x-1 border border-red-400 px-2 bg-red-200 rounded-full">
<p className="text-red-700 leading-tight text-sm">offline</p>
<AlertCircle className="h-3 w-3 stroke-red-100 fill-red-400" />
</div>
</div>
);
}
return (
<div className="flex w-full items-center justify-end gap-x-2">
<p className="text-slate-400 leading-loose text-sm">LLM</p>
<div className="flex items-center gap-x-1 border border-slate-400 px-2 rounded-full">
<p className="text-slate-400 leading-tight text-sm">online</p>
<Circle className="h-3 w-3 stroke-green-100 fill-green-400 animate-pulse" />
</div>
</div>
);
}

View File

@ -0,0 +1,133 @@
import React, { useRef } from "react";
import { BookOpen, Briefcase, Cpu, GitHub, Key, Plus } from "react-feather";
import IndexCount from "./IndexCount";
import LLMStatus from "./LLMStatus";
import KeysModal, { useKeysModal } from "../Modals/Keys";
import NewWorkspaceModal, {
useNewWorkspaceModal,
} from "../Modals/NewWorkspace";
import ActiveWorkspaces from "./ActiveWorkspaces";
import paths from "../../utils/paths";
export default function Sidebar() {
const sidebarRef = useRef(null);
const {
showing: showingKeyModal,
showModal: showKeyModal,
hideModal: hideKeyModal,
} = useKeysModal();
const {
showing: showingNewWsModal,
showModal: showNewWsModal,
hideModal: hideNewWsModal,
} = useNewWorkspaceModal();
// const handleWidthToggle = () => {
// if (!sidebarRef.current) return false;
// sidebarRef.current.classList.add('translate-x-[-100%]')
// }
return (
<>
<div
ref={sidebarRef}
style={{ height: "calc(100% - 32px)" }}
className="transition-all duration-500 relative m-[16px] rounded-[26px] bg-white dark:bg-black-900 min-w-[15.5%] p-[18px] "
>
{/* <button onClick={handleWidthToggle} className='absolute -right-[13px] top-[35%] bg-white w-auto h-auto bg-transparent flex items-center'>
<svg width="16" height="96" viewBox="0 0 16 96" fill="none" xmlns="http://www.w3.org/2000/svg" stroke="#141414"><path d="M2.5 0H3C3 20 15 12 15 32V64C15 84 3 76 3 96H2.5V0Z" fill="black" fill-opacity="0.12" stroke="transparent" stroke-width="0px"></path><path d="M0 0H2.5C2.5 20 14.5 12 14.5 32V64C14.5 84 2.5 76 2.5 96H0V0Z" fill="#141414"></path></svg>
<ChevronLeft className='absolute h-4 w-4 text-white mr-1' />
</button> */}
<div className="w-full h-full flex flex-col overflow-x-hidden items-between">
{/* Header Information */}
<div className="flex w-full items-center justify-between">
<p className="text-xl font-base text-slate-600 dark:text-slate-200">
AnythingLLM
</p>
<div className="flex gap-x-2 items-center text-slate-500">
<button
onClick={showKeyModal}
className="transition-all duration-300 p-2 rounded-full bg-slate-200 text-slate-400 dark:bg-stone-800 hover:bg-slate-800 hover:text-slate-200 dark:hover:text-slate-200"
>
<Key className="h-4 w-4 " />
</button>
</div>
</div>
{/* Primary Body */}
<div className="h-[100%] flex flex-col w-full justify-between pt-4 overflow-y-hidden">
<div className="h-auto sidebar-items dark:sidebar-items">
<div className="flex flex-col gap-y-4 h-[65vh] pb-8 overflow-y-scroll no-scroll">
<div className="flex gap-x-2 items-center justify-between">
<button
onClick={showNewWsModal}
className="flex flex-grow w-[75%] h-[36px] gap-x-2 py-[5px] px-4 border border-slate-400 rounded-lg text-slate-800 dark:text-slate-200 justify-start items-center hover:bg-slate-100 dark:hover:bg-stone-900"
>
<Plus className="h-4 w-4" />
<p className="text-slate-800 dark:text-slate-200 text-xs leading-loose font-semibold">
New workspace
</p>
</button>
</div>
<ActiveWorkspaces />
</div>
</div>
<div>
<div className="flex flex-col gap-y-2">
<div className="w-full flex items-center justify-between">
<LLMStatus />
<IndexCount />
</div>
<a
href=""
className="flex flex-grow w-[100%] h-[36px] gap-x-2 py-[5px] px-4 border border-slate-400 dark:border-transparent rounded-lg text-slate-800 dark:text-slate-200 justify-center items-center hover:bg-slate-100 dark:bg-stone-800 dark:hover:bg-stone-900"
>
<Cpu className="h-4 w-4" />
<p className="text-slate-800 dark:text-slate-200 text-xs leading-loose font-semibold">
Managed cloud hosting
</p>
</a>
<a
href=""
className="flex flex-grow w-[100%] h-[36px] gap-x-2 py-[5px] px-4 border border-slate-400 dark:border-transparent rounded-lg text-slate-800 dark:text-slate-200 justify-center items-center hover:bg-slate-100 dark:bg-stone-800 dark:hover:bg-stone-900"
>
<Briefcase className="h-4 w-4" />
<p className="text-slate-800 dark:text-slate-200 text-xs leading-loose font-semibold">
Enterpise Installation
</p>
</a>
</div>
{/* Footer */}
<div className="flex items-end justify-between mt-2">
<div className="flex gap-x-1 items-center">
<a
href={paths.github()}
className="transition-all duration-300 p-2 rounded-full bg-slate-200 text-slate-400 dark:bg-slate-800 hover:bg-slate-800 hover:text-slate-200 dark:hover:text-slate-200"
>
<GitHub className="h-4 w-4 " />
</a>
<a
href={paths.docs()}
className="transition-all duration-300 p-2 rounded-full bg-slate-200 text-slate-400 dark:bg-slate-800 hover:bg-slate-800 hover:text-slate-200 dark:hover:text-slate-200"
>
<BookOpen className="h-4 w-4 " />
</a>
</div>
<a
href={paths.mailToMintplex()}
className="transition-all duration-300 text-xs text-slate-200 dark:text-slate-600 hover:text-blue-600 dark:hover:text-blue-400"
>
@MintplexLabs
</a>
</div>
</div>
</div>
</div>
</div>
{showingKeyModal && <KeysModal hideModal={hideKeyModal} />}
{showingNewWsModal && <NewWorkspaceModal hideModal={hideNewWsModal} />}
</>
);
}

View File

@ -0,0 +1,27 @@
import React, { useRef, useEffect } from "react";
import JAZZ from "@metamask/jazzicon";
export default function Jazzicon({ size = 10, user }) {
const divRef = useRef(null);
const seed = user?.uid
? toPseudoRandomInteger(user.uid)
: Math.floor(100000 + Math.random() * 900000);
const result = JAZZ(size, seed);
useEffect(() => {
if (!divRef || !divRef.current) return null;
divRef.current.appendChild(result);
}, []); // eslint-disable-line react-hooks/exhaustive-deps
return <div className="flex" ref={divRef} />;
}
function toPseudoRandomInteger(uidString = "") {
var numberArray = [uidString.length];
for (var i = 0; i < uidString.length; i++) {
numberArray[i] = uidString.charCodeAt(i);
}
return numberArray.reduce((a, b) => a + b, 0);
}

View File

@ -0,0 +1,106 @@
import { useEffect, useRef, memo, useState } from "react";
import { AlertTriangle } from "react-feather";
import Jazzicon from "../../../../UserIcon";
import { v4 } from "uuid";
import { decode as HTMLDecode } from "he";
function HistoricalMessage({
message,
role,
workspace,
sources = [],
error = false,
}) {
const replyRef = useRef(null);
useEffect(() => {
if (replyRef.current)
replyRef.current.scrollIntoView({ behavior: "smooth", block: "end" });
}, [replyRef.current]);
if (role === "user") {
return (
<div className="flex justify-end mb-4 items-start">
<div className="mr-2 py-1 px-4 max-w-[75%] bg-slate-200 dark:bg-amber-800 rounded-b-2xl rounded-tl-2xl rounded-tr-sm">
<span
className={`inline-block p-2 rounded-lg whitespace-pre-line text-slate-800 dark:text-slate-200 font-semibold`}
>
{message}
</span>
</div>
<Jazzicon size={30} user={{ uid: "user" }} />
</div>
);
}
if (error) {
return (
<div className="flex justify-start mb-4 items-end">
<Jazzicon size={30} user={{ uid: workspace.slug }} />
<div className="ml-2 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-t-2xl rounded-br-2xl rounded-bl-sm">
<span
className={`inline-block p-2 rounded-lg bg-red-50 text-red-500`}
>
<AlertTriangle className="h-4 w-4 mb-1 inline-block" /> Could not
respond to message.
</span>
</div>
</div>
);
}
return (
<div ref={replyRef} className="flex justify-start items-end mb-4">
<Jazzicon size={30} user={{ uid: workspace.slug }} />
<div className="ml-2 py-3 px-4 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-t-2xl rounded-br-2xl rounded-bl-sm">
<span className="whitespace-pre-line text-slate-800 dark:text-slate-200 font-semibold">
{message}
</span>
<Citations sources={sources} />
</div>
</div>
);
}
const Citations = ({ sources = [] }) => {
const [show, setShow] = useState(false);
if (sources.length === 0) return null;
return (
<div className="flex flex-col mt-4 justify-left">
<button
type="button"
onClick={() => setShow(!show)}
className="w-fit text-gray-700 dark:text-stone-400 italic text-xs"
>
{show ? "hide" : "show"} citations{show && "*"}
</button>
{show && (
<>
<div className="w-full flex flex-wrap items-center gap-4 mt-1 doc__source">
{sources.map((source) => {
const { id = null, title, url } = source;
const handleClick = () => {
if (!url) return false;
window.open(url, "_blank");
};
return (
<button
key={id || v4()}
onClick={handleClick}
className="italic transition-all duration-300 w-fit bg-gray-400 text-gray-900 py-[1px] hover:text-slate-200 hover:bg-gray-500 hover:dark:text-gray-900 dark:bg-stone-400 dark:hover:bg-stone-300 rounded-full px-2 text-xs leading-tight"
>
"{HTMLDecode(title)}"
</button>
);
})}
</div>
<p className="w-fit text-gray-700 dark:text-stone-400 text-xs mt-1">
*citation may not be relevant to end result.
</p>
</>
)}
</div>
);
};
export default memo(HistoricalMessage);

View File

@ -0,0 +1,112 @@
import { memo, useEffect, useRef, useState } from "react";
import { AlertTriangle } from "react-feather";
import Jazzicon from "../../../../UserIcon";
import { decode as HTMLDecode } from "he";
function PromptReply({
uuid,
reply,
pending,
error,
workspace,
sources = [],
closed = true,
}) {
const replyRef = useRef(null);
useEffect(() => {
if (replyRef.current)
replyRef.current.scrollIntoView({ behavior: "smooth", block: "end" });
}, [replyRef.current]);
if (!reply && !sources.length === 0 && !pending && !error) return null;
if (pending) {
return (
<div className="chat__message flex justify-start mb-4 items-end">
<Jazzicon size={30} user={{ uid: workspace.slug }} />
<div className="ml-2 pt-2 px-6 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-t-2xl rounded-br-2xl rounded-bl-sm">
<span className={`inline-block p-2`}>
<div className="dot-falling"></div>
</span>
</div>
</div>
);
}
if (error) {
return (
<div className="chat__message flex justify-start mb-4 items-center">
<Jazzicon size={30} user={{ uid: workspace.slug }} />
<div className="ml-2 py-3 px-4 rounded-br-3xl rounded-tr-3xl rounded-tl-xl text-slate-100 ">
<div className="bg-red-50 text-red-500 rounded-lg w-fit flex flex-col p-2">
<span className={`inline-block`}>
<AlertTriangle className="h-4 w-4 mb-1 inline-block" /> Could not
respond to message.
</span>
<span className="text-xs">Reason: {error || "unknown"}</span>
</div>
</div>
</div>
);
}
return (
<div
key={uuid}
ref={replyRef}
className="chat__message mb-4 flex justify-start items-end"
>
<Jazzicon size={30} user={{ uid: workspace.slug }} />
<div className="ml-2 py-3 px-4 max-w-[75%] bg-orange-100 dark:bg-stone-700 rounded-t-2xl rounded-br-2xl rounded-bl-sm">
<p className="text-[15px] whitespace-pre-line break-words text-slate-800 dark:text-slate-200 font-semibold">
{reply}
{!closed && <i className="not-italic blink">|</i>}
</p>
<Citations sources={sources} />
</div>
</div>
);
}
const Citations = ({ sources = [] }) => {
const [show, setShow] = useState(false);
if (sources.length === 0) return null;
return (
<div className="flex flex-col mt-4 justify-left">
<button
type="button"
onClick={() => setShow(!show)}
className="w-fit text-gray-700 dark:text-stone-400 italic text-xs"
>
{show ? "hide" : "show"} citations{show && "*"}
</button>
{show && (
<>
<div className="w-full flex flex-wrap items-center gap-4 mt-1 doc__source">
{sources.map((source) => {
const { id = null, title, url } = source;
const handleClick = () => {
if (!url) return false;
window.open(url, "_blank");
};
return (
<button
key={id || v4()}
onClick={handleClick}
className="italic transition-all duration-300 w-fit bg-gray-400 text-gray-900 py-[1px] hover:text-slate-200 hover:bg-gray-500 hover:dark:text-gray-900 dark:bg-stone-400 dark:hover:bg-stone-300 rounded-full px-2 text-xs leading-tight"
>
"{HTMLDecode(title)}"
</button>
);
})}
</div>
<p className="w-fit text-gray-700 dark:text-stone-400 text-xs mt-1">
*citation may not be relevant to end result.
</p>
</>
)}
</div>
);
};
export default memo(PromptReply);

View File

@ -0,0 +1,70 @@
import { Frown } from "react-feather";
import HistoricalMessage from "./HistoricalMessage";
import PromptReply from "./PromptReply";
// import paths from '../../../../../utils/paths';
export default function ChatHistory({ history = [], workspace }) {
if (history.length === 0) {
return (
<div className="flex flex-col h-[89%] md:mt-0 pb-5 w-full justify-center items-center">
<div className="w-fit flex items-center gap-x-2">
<Frown className="h-4 w-4 text-slate-400" />
<p className="text-slate-400">No chat history found.</p>
</div>
<p className="text-slate-400 text-xs">
Send your first message to get started.
</p>
</div>
);
}
return (
<div
className="h-[89%] pb-[100px] pt-[50px] md:pt-0 md:pb-5 mx-2 md:mx-0 overflow-y-scroll flex flex-col justify-between md:justify-start"
id="chat-history"
>
{history.map(
(
{
uuid = null,
content,
sources = [],
role,
closed = true,
pending = false,
error = false,
animate = false,
},
index
) => {
const isLastBotReply =
index === history.length - 1 && role === "assistant";
if (isLastBotReply && animate) {
return (
<PromptReply
key={uuid}
uuid={uuid}
reply={content}
pending={pending}
sources={sources}
error={error}
workspace={workspace}
closed={closed}
/>
);
}
return (
<HistoricalMessage
key={index}
message={content}
role={role}
workspace={workspace}
sources={sources}
error={error}
/>
);
}
)}
</div>
);
}

View File

@ -0,0 +1,106 @@
import React, { useState, useRef } from "react";
import { Loader, Menu, Send, X } from "react-feather";
export default function PromptInput({
workspace,
message,
submit,
onChange,
inputDisabled,
buttonDisabled,
}) {
const [showMenu, setShowMenu] = useState(false);
const formRef = useRef(null);
const [_, setFocused] = useState(false);
const handleSubmit = (e) => {
setFocused(false);
submit(e);
};
const captureEnter = (event) => {
if (event.keyCode == 13) {
if (!event.shiftKey) {
submit(event);
}
}
};
const adjustTextArea = (event) => {
const element = event.target;
element.style.height = "1px";
element.style.height =
event.target.value.length !== 0
? 25 + element.scrollHeight + "px"
: "1px";
};
const setTextCommand = (command = "") => {
onChange({ target: { value: `${command} ${message}` } });
};
return (
<div className="w-full fixed md:absolute bottom-0 left-0">
<form
onSubmit={handleSubmit}
className="flex flex-col gap-y-1 bg-transparentrounded-t-lg w-3/4 mx-auto"
>
<div className="flex items-center py-2 px-4 rounded-lg">
{/* Toggle selector? */}
{/* <button
onClick={() => setShowMenu(!showMenu)}
type="button"
className="p-2 text-slate-200 bg-transparent rounded-md hover:bg-gray-50 dark:hover:bg-stone-500">
<Menu className="w-4 h-4 md:h-6 md:w-6" />
</button> */}
<textarea
onKeyUp={adjustTextArea}
onKeyDown={captureEnter}
onChange={onChange}
required={true}
maxLength={240}
disabled={inputDisabled}
onFocus={() => setFocused(true)}
onBlur={(e) => {
setFocused(false);
adjustTextArea(e);
}}
value={message}
className="cursor-text max-h-[100px] md:min-h-[40px] block mx-2 md:mx-4 p-2.5 w-full text-[16px] md:text-sm rounded-lg border bg-gray-50 border-gray-300 placeholder-gray-400 text-white dark:bg-stone-600 dark:border-stone-700 dark:placeholder-stone-400"
placeholder="Shift + Enter for newline. Enter to submit."
/>
<button
ref={formRef}
type="submit"
disabled={buttonDisabled}
className="inline-flex justify-center p-0 md:p-2 rounded-full cursor-pointer text-black-900 dark:text-slate-200 hover:bg-gray-600 dark:hover:bg-stone-500"
>
{buttonDisabled ? (
<Loader className="w-6 h-6 animate-spin" />
) : (
<svg
aria-hidden="true"
className="w-6 h-6 rotate-45"
fill="currentColor"
viewBox="0 0 20 20"
xmlns="http://www.w3.org/2000/svg"
>
<path d="M10.894 2.553a1 1 0 00-1.788 0l-7 14a1 1 0 001.169 1.409l5-1.429A1 1 0 009 15.571V11a1 1 0 112 0v4.571a1 1 0 00.725.962l5 1.428a1 1 0 001.17-1.408l-7-14z"></path>
</svg>
)}
<span className="sr-only">Send message</span>
</button>
</div>
<Tracking />
</form>
</div>
);
}
const Tracking = () => {
return (
<div className="flex flex-col w-full justify-center items-center gap-y-2 mb-2 px-4 mx:px-0">
<p className="text-slate-400 text-xs">
Responses from system may produce inaccurate or invalid responses - use
with caution.
</p>
</div>
);
};

View File

@ -0,0 +1,87 @@
import { useState, useEffect } from "react";
import ChatHistory from "./ChatHistory";
import PromptInput from "./PromptInput";
import Workspace from "../../../models/workspace";
import handleChat from "../../../utils/chat";
export default function ChatContainer({ workspace, knownHistory = [] }) {
const [message, setMessage] = useState("");
const [loadingResponse, setLoadingResponse] = useState(false);
const [chatHistory, setChatHistory] = useState(knownHistory);
const handleMessageChange = (event) => {
setMessage(event.target.value);
};
const handleSubmit = async (event) => {
event.preventDefault();
if (!message || message === "") return false;
const prevChatHistory = [
...chatHistory,
{ content: message, role: "user" },
{
content: "",
role: "assistant",
pending: true,
userMessage: message,
animate: true,
},
];
setChatHistory(prevChatHistory);
setMessage("");
setLoadingResponse(true);
};
useEffect(() => {
async function fetchReply() {
const promptMessage =
chatHistory.length > 0 ? chatHistory[chatHistory.length - 1] : null;
const remHistory = chatHistory.length > 0 ? chatHistory.slice(0, -1) : [];
var _chatHistory = [...remHistory];
if (!promptMessage || !promptMessage?.userMessage) {
setLoadingResponse(false);
return false;
}
const chatResult = await Workspace.sendChat(
workspace,
promptMessage.userMessage
);
if (!chatResult) {
alert("Could not send chat.");
setLoadingResponse(false);
return;
}
handleChat(
chatResult,
setLoadingResponse,
setChatHistory,
remHistory,
_chatHistory
);
}
loadingResponse === true && fetchReply();
}, [loadingResponse, chatHistory, workspace]);
return (
<div
style={{ height: "calc(100% - 32px)" }}
className="transition-all duration-500 relative ml-[2px] mr-[8px] my-[16px] rounded-[26px] bg-white dark:bg-black-900 min-w-[82%] p-[18px] h-full overflow-y-scroll"
>
<div className="flex flex-col h-full w-full flex">
<ChatHistory history={chatHistory} workspace={workspace} />
<PromptInput
workspace={workspace}
message={message}
submit={handleSubmit}
onChange={handleMessageChange}
inputDisabled={loadingResponse}
buttonDisabled={loadingResponse}
/>
</div>
</div>
);
}

View File

@ -0,0 +1,57 @@
import * as Skeleton from "react-loading-skeleton";
import "react-loading-skeleton/dist/skeleton.css";
export default function LoadingChat() {
return (
<div
style={{ height: "calc(100% - 32px)" }}
className="transition-all duration-500 relative ml-[2px] mr-[8px] my-[16px] rounded-[26px] bg-white dark:bg-black-900 min-w-[82%] p-[18px] h-full overflow-y-scroll"
>
<Skeleton.default
height="100px"
width="100%"
baseColor={"#2a3a53"}
highlightColor={"#395073"}
count={1}
className="max-w-[75%] p-4 rounded-b-2xl rounded-tr-2xl rounded-tl-sm mt-6"
containerClassName="flex justify-start"
/>
<Skeleton.default
height="100px"
width="45%"
baseColor={"#2a3a53"}
highlightColor={"#395073"}
count={1}
className="max-w-[75%] p-4 rounded-b-2xl rounded-tr-2xl rounded-tl-sm mt-6"
containerClassName="flex justify-end"
/>
<Skeleton.default
height="100px"
width="30%"
baseColor={"#2a3a53"}
highlightColor={"#395073"}
count={1}
className="max-w-[75%] p-4 rounded-b-2xl rounded-tr-2xl rounded-tl-sm mt-6"
containerClassName="flex justify-start"
/>
<Skeleton.default
height="100px"
width="25%"
baseColor={"#2a3a53"}
highlightColor={"#395073"}
count={1}
className="max-w-[75%] p-4 rounded-b-2xl rounded-tr-2xl rounded-tl-sm mt-6"
containerClassName="flex justify-end"
/>
<Skeleton.default
height="160px"
width="100%"
baseColor={"#2a3a53"}
highlightColor={"#395073"}
count={1}
className="max-w-[75%] p-4 rounded-b-2xl rounded-tr-2xl rounded-tl-sm mt-6"
containerClassName="flex justify-start"
/>
</div>
);
}

View File

@ -0,0 +1,62 @@
import React, { useEffect, useState } from "react";
import Workspace from "../../models/workspace";
import LoadingChat from "./LoadingChat";
import ChatContainer from "./ChatContainer";
import paths from "../../utils/paths";
export default function WorkspaceChat({ loading, workspace }) {
const [history, setHistory] = useState([]);
const [loadingHistory, setLoadingHistory] = useState(true);
useEffect(() => {
async function getHistory() {
if (loading) return;
if (!workspace?.slug) {
setLoadingHistory(false);
return false;
}
const chatHistory = await Workspace.chatHistory(workspace.slug);
setHistory(chatHistory);
setLoadingHistory(false);
}
getHistory();
}, [workspace, loading]);
if (loadingHistory) return <LoadingChat />;
if (!loading && !loadingHistory && !workspace)
return (
<>
{loading === false && !workspace && (
<dialog
open={true}
style={{ zIndex: 100 }}
className="fixed top-0 flex bg-black bg-opacity-50 w-[100vw] h-full items-center justify-center "
>
<div className="w-fit px-10 p-4 w-1/4 rounded-lg bg-white shadow dark:bg-stone-700 text-black dark:text-slate-200">
<div className="flex flex-col w-full">
<p className="font-semibold text-red-500">
We cannot locate this workspace!
</p>
<p className="text-sm mt-4">
It looks like a workspace by this name is not available.
</p>
<div className="flex w-full justify-center items-center mt-4">
<a
href={paths.home()}
className="border border-gray-800 text-gray-800 hover:bg-gray-100 px-4 py-1 rounded-lg dark:text-slate-200 dark:border-slate-200 dark:hover:bg-stone-900"
>
Go back to homepage
</a>
</div>
</div>
</div>
</dialog>
)}
<LoadingChat />
</>
);
return <ChatContainer workspace={workspace} knownHistory={history} />;
}

293
frontend/src/index.css Normal file
View File

@ -0,0 +1,293 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
html,
body {
padding: 0;
margin: 0;
font-family: -apple-system, BlinkMacSystemFont, Segoe UI, Roboto, Oxygen,
Ubuntu, Cantarell, Fira Sans, Droid Sans, Helvetica Neue, sans-serif;
background-color: white;
}
a {
color: inherit;
text-decoration: none;
}
* {
box-sizing: border-box;
}
.g327 {
border-color: #302f30;
}
@font-face {
font-family: "AvenirNextW10-Bold";
src: url("../public/fonts/AvenirNext.ttf");
}
.Avenir {
font-family: AvenirNextW10-Bold;
font-display: swap;
}
.grr {
grid-template-columns: repeat(2, 1fr);
}
.greyC {
filter: gray;
-webkit-filter: grayscale(100%);
transition: 0.4s;
}
.greyC:hover {
filter: none;
-webkit-filter: none;
transition: 0.4s;
}
.chat__message {
transform-origin: 0 100%;
transform: scale(0);
animation: message 0.15s ease-out 0s forwards;
animation-delay: 500ms;
}
@keyframes message {
0% {
max-height: 100%;
}
80% {
transform: scale(1.1);
}
100% {
transform: scale(1);
max-height: 100%;
overflow: visible;
padding-top: 1rem;
}
}
.doc__source {
transform-origin: 0 100%;
transform: scale(0);
animation: message2 0.15s ease-out 0s forwards;
animation-delay: 50ms;
}
@keyframes message2 {
0% {
max-height: 100%;
}
80% {
transform: scale(1.1);
}
100% {
transform: scale(1);
max-height: 100%;
overflow: visible;
}
}
@media (prefers-color-scheme: light) {
.sidebar-items:after {
content: " ";
position: absolute;
left: 0;
right: 0px;
height: 4em;
top: 69vh;
background: linear-gradient(
to bottom,
rgba(173, 3, 3, 0),
rgb(255 255 255) 50%
);
z-index: 1;
pointer-events: none;
}
}
@media (prefers-color-scheme: dark) {
.sidebar-items:after {
content: " ";
position: absolute;
left: 0;
right: 0px;
height: 4em;
top: 69vh;
background: linear-gradient(
to bottom,
rgba(173, 3, 3, 0),
rgb(20 20 20) 50%
);
z-index: 1;
pointer-events: none;
}
}
/**
* ==============================================
* Dot Falling
* ==============================================
*/
.dot-falling {
position: relative;
left: -9999px;
width: 10px;
height: 10px;
border-radius: 5px;
background-color: #5fa4fa;
color: #5fa4fa;
box-shadow: 9999px 0 0 0 #5fa4fa;
animation: dot-falling 1.5s infinite linear;
animation-delay: 0.1s;
}
.dot-falling::before,
.dot-falling::after {
content: "";
display: inline-block;
position: absolute;
top: 0;
}
.dot-falling::before {
width: 10px;
height: 10px;
border-radius: 5px;
background-color: #5fa4fa;
color: #5fa4fa;
animation: dot-falling-before 1.5s infinite linear;
animation-delay: 0s;
}
.dot-falling::after {
width: 10px;
height: 10px;
border-radius: 5px;
background-color: #5fa4fa;
color: #5fa4fa;
animation: dot-falling-after 1.5s infinite linear;
animation-delay: 0.2s;
}
@keyframes dot-falling {
0% {
box-shadow: 9999px -15px 0 0 rgba(152, 128, 255, 0);
}
25%,
50%,
75% {
box-shadow: 9999px 0 0 0 #5fa4fa;
}
100% {
box-shadow: 9999px 15px 0 0 rgba(152, 128, 255, 0);
}
}
@keyframes dot-falling-before {
0% {
box-shadow: 9984px -15px 0 0 rgba(152, 128, 255, 0);
}
25%,
50%,
75% {
box-shadow: 9984px 0 0 0 #5fa4fa;
}
100% {
box-shadow: 9984px 15px 0 0 rgba(152, 128, 255, 0);
}
}
@keyframes dot-falling-after {
0% {
box-shadow: 10014px -15px 0 0 rgba(152, 128, 255, 0);
}
25%,
50%,
75% {
box-shadow: 10014px 0 0 0 #5fa4fa;
}
100% {
box-shadow: 10014px 15px 0 0 rgba(152, 128, 255, 0);
}
}
#chat-history::-webkit-scrollbar,
#chat-container::-webkit-scrollbar,
.no-scroll::-webkit-scrollbar {
display: none !important;
}
/* Hide scrollbar for IE, Edge and Firefox */
#chat-history,
#chat-container,
.no-scroll {
-ms-overflow-style: none !important;
/* IE and Edge */
scrollbar-width: none !important;
/* Firefox */
}
.z-99 {
z-index: 99;
}
.z-98 {
z-index: 98;
}
.file-uploader {
width: 100% !important;
height: 100px !important;
}
.blink {
animation: blink 1.5s steps(1) infinite;
}
@keyframes blink {
0% {
opacity: 0;
}
50% {
opacity: 1;
}
100% {
opacity: 0;
}
}
.background-animate {
background-size: 400%;
-webkit-animation: bgAnimate 10s ease infinite;
-moz-animation: bgAnimate 10s ease infinite;
animation: bgAnimate 10s ease infinite;
}
@keyframes bgAnimate {
0%,
100% {
background-position: 0% 50%;
}
50% {
background-position: 100% 50%;
}
}

15
frontend/src/main.jsx Normal file
View File

@ -0,0 +1,15 @@
import React from "react";
import ReactDOM from "react-dom/client";
import { BrowserRouter as Router } from "react-router-dom";
import App from "./App.jsx";
import "./index.css";
const isDev = process.env.NODE_ENV !== "production";
const REACTWRAP = isDev ? React.Fragment : React.StrictMode;
ReactDOM.createRoot(document.getElementById("root")).render(
<REACTWRAP>
<Router>
<App />
</Router>
</REACTWRAP>
);

View File

@ -0,0 +1,38 @@
import { API_BASE } from "../utils/constants";
const System = {
ping: async function () {
return await fetch(`${API_BASE}/ping`)
.then((res) => res.ok)
.catch(() => false);
},
totalIndexes: async function () {
return await fetch(`${API_BASE}/system-vectors`)
.then((res) => {
if (!res.ok) throw new Error("Could not find indexes.");
return res.json();
})
.then((res) => res.vectorCount)
.catch(() => 0);
},
keys: async function () {
return await fetch(`${API_BASE}/setup-complete`)
.then((res) => {
if (!res.ok) throw new Error("Could not find setup information.");
return res.json();
})
.then((res) => res.results)
.catch(() => null);
},
localFiles: async function () {
return await fetch(`${API_BASE}/local-files`)
.then((res) => {
if (!res.ok) throw new Error("Could not find setup information.");
return res.json();
})
.then((res) => res.localFiles)
.catch(() => null);
},
};
export default System;

View File

@ -0,0 +1,77 @@
import { API_BASE } from "../utils/constants";
const Workspace = {
new: async function (data = {}) {
const { workspace, message } = await fetch(`${API_BASE}/workspace/new`, {
method: "POST",
body: JSON.stringify(data),
})
.then((res) => res.json())
.catch((e) => {
return { workspace: null, message: e.message };
});
return { workspace, message };
},
modifyEmbeddings: async function (slug, changes = {}) {
const { workspace, message } = await fetch(
`${API_BASE}/workspace/${slug}/update-embeddings`,
{
method: "POST",
body: JSON.stringify(changes), // contains 'adds' and 'removes' keys that are arrays of filepaths
}
)
.then((res) => res.json())
.catch((e) => {
return { workspace: null, message: e.message };
});
return { workspace, message };
},
chatHistory: async function (slug) {
const history = await fetch(`${API_BASE}/workspace/${slug}/chats`)
.then((res) => res.json())
.then((res) => res.history || [])
.catch(() => []);
return history;
},
sendChat: async function ({ slug }, message, mode = "query") {
const chatResult = await fetch(`${API_BASE}/workspace/${slug}/chat`, {
method: "POST",
body: JSON.stringify({ message, mode }),
})
.then((res) => res.json())
.catch((e) => {
console.error(e);
return null;
});
return chatResult;
},
all: async function () {
const workspaces = await fetch(`${API_BASE}/workspaces`)
.then((res) => res.json())
.then((res) => res.workspaces || [])
.catch(() => []);
return workspaces;
},
bySlug: async function (slug = "") {
const workspace = await fetch(`${API_BASE}/workspace/${slug}`)
.then((res) => res.json())
.then((res) => res.workspace)
.catch(() => null);
return workspace;
},
delete: async function (slug) {
const result = await fetch(`${API_BASE}/workspace/${slug}`, {
method: "DELETE",
})
.then((res) => res.ok)
.catch(() => false);
return result;
},
};
export default Workspace;

View File

@ -0,0 +1,24 @@
import Header from "../components/Header";
import Footer from "../components/Footer";
export default function Contact() {
return (
<div className="text-black">
<Header />
<div className="flex flex-col justify-center mx-auto mt-52 text-center max-w-2x1">
<h1 className="text-3xl font-bold tracking-tight text-black md:text-5xl">
404 Unavailable
</h1>
<br />
<a
className="w-64 p-1 mx-auto font-bold text-center text-black border border-gray-500 rounded-lg sm:p-4"
href="/"
>
Return Home
</a>
</div>
<div className="mt-64"></div>
<Footer />
</div>
);
}

View File

@ -0,0 +1,12 @@
import React from "react";
import DefaultChatContainer from "../../components/DefaultChat";
import Sidebar from "../../components/Sidebar";
export default function Main() {
return (
<div className="w-screen h-screen overflow-hidden bg-orange-100 dark:bg-stone-700 flex">
<Sidebar />
<DefaultChatContainer />
</div>
);
}

View File

@ -0,0 +1,28 @@
import React, { useEffect, useState } from "react";
import { default as WorkspaceChatContainer } from "../../components/WorkspaceChat";
import Sidebar from "../../components/Sidebar";
import { useParams } from "react-router-dom";
import Workspace from "../../models/workspace";
export default function WorkspaceChat() {
const { slug } = useParams();
const [workspace, setWorkspace] = useState(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
async function getWorkspace() {
if (!slug) return;
const _workspace = await Workspace.bySlug(slug);
setWorkspace(_workspace);
setLoading(false);
}
getWorkspace();
}, []);
return (
<div className="w-screen h-screen overflow-hidden bg-orange-100 dark:bg-stone-700 flex">
<Sidebar />
<WorkspaceChatContainer loading={loading} workspace={workspace} />
</div>
);
}

View File

@ -0,0 +1,59 @@
// For handling of synchronous chats that are not utilizing streaming or chat requests.
export default function handleChat(
chatResult,
setLoadingResponse,
setChatHistory,
remHistory,
_chatHistory
) {
const { uuid, textResponse, type, sources = [], error, close } = chatResult;
if (type === "abort") {
setLoadingResponse(false);
alert(error);
setChatHistory([
...remHistory,
{
uuid,
content: textResponse,
role: "assistant",
sources,
closed: true,
error,
animate: true,
},
]);
_chatHistory.push({
uuid,
content: textResponse,
role: "assistant",
sources,
closed: true,
error,
animate: true,
});
} else if (type === "textResponse") {
setLoadingResponse(false);
setChatHistory([
...remHistory,
{
uuid,
content: textResponse,
role: "assistant",
sources,
closed: close,
error,
animate: true,
},
]);
_chatHistory.push({
uuid,
content: textResponse,
role: "assistant",
sources,
closed: close,
error,
animate: true,
});
}
}

View File

@ -0,0 +1,2 @@
export const API_BASE =
import.meta.env.VITE_ENABLE_GOOGLE_AUTH || "http://localhost:5000";

View File

@ -0,0 +1,16 @@
const Formatter = Intl.NumberFormat("en", { notation: "compact" });
export function numberWithCommas(input) {
return input.toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}
export function nFormatter(input) {
return Formatter.format(input);
}
export function dollarFormat(input) {
return new Intl.NumberFormat("en-us", {
style: "currency",
currency: "USD",
}).format(input);
}

View File

@ -0,0 +1,19 @@
export default {
home: () => {
return "/";
},
github: () => {
return "/";
},
docs: () => {
return "/";
},
mailToMintplex: () => {
return "mailto:team@mintplex.xyz";
},
workspace: {
chat: (slug) => {
return `/workspace/${slug}`;
},
},
};

View File

@ -0,0 +1,13 @@
/** @type {import('tailwindcss').Config} */
export default {
content: ["./src/**/*.{js,jsx}"],
theme: {
extend: {
colors: {
'black-900': '#141414',
}
},
},
plugins: [],
}

59
frontend/vite.config.js Normal file
View File

@ -0,0 +1,59 @@
import { defineConfig } from 'vite'
import postcss from './postcss.config.js'
import react from '@vitejs/plugin-react'
import dns from 'dns'
import { visualizer } from "rollup-plugin-visualizer";
dns.setDefaultResultOrder('verbatim')
// https://vitejs.dev/config/
export default defineConfig({
server: {
port: 3000,
host: 'localhost'
},
define: {
'process.env': process.env
},
css: {
postcss,
},
plugins: [
react(),
visualizer({
template: "treemap", // or sunburst
open: false,
gzipSize: true,
brotliSize: true,
filename: "bundleinspector.html", // will be saved in project's root
}),
],
resolve: {
alias: [
{
process: "process/browser",
stream: "stream-browserify",
zlib: "browserify-zlib",
util: "util",
find: /^~.+/,
replacement: (val) => {
return val.replace(/^~/, "");
},
},
],
},
build: {
commonjsOptions: {
transformMixedEsModules: true,
}
},
optimizeDeps: {
esbuildOptions: {
define: {
global: 'globalThis'
},
plugins: [
]
}
}
})

BIN
images/choices.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 152 KiB

BIN
images/gcp-project-bar.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 19 KiB

View File

@ -0,0 +1,14 @@
# AnythingLLM Screenshots
### Homescreen
![Homescreen](./home.png)
### Document Manager
⚡ means the current version of the document has been embedded before and will not cost money to convert into a vector!
![Document Manager](./document.png)
### Chatting
![Chatting](./chat.png)
### Setup check
![Setup check](./keys.png)

BIN
images/screenshots/chat.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 320 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 766 KiB

BIN
images/screenshots/home.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 575 KiB

BIN
images/screenshots/keys.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 515 KiB

15
package.json Normal file
View File

@ -0,0 +1,15 @@
{
"name": "socials-to-chat",
"version": "1.0.0",
"description": "Turn creator socials into chatbots with long-term-memory though a simple UI",
"main": "index.js",
"author": "Timothy Carambat (Mintplex Labs)",
"license": "MIT",
"scripts": {
"setup": "cd server && yarn && cd .. && yarn setup:envs && echo \"Please run yarn dev:server and yarn dev:frontend in separate terminal tabs.\"",
"setup:envs": "cd server && cp -n .env.example .env.development && cd ../collector && cp -n .env.example .env && cd ..",
"dev:server": "cd server && yarn dev",
"dev:frontend": "cd frontend && yarn start"
},
"private": false
}

8
server/.env.example Normal file
View File

@ -0,0 +1,8 @@
SERVER_PORT=5000
OPEN_AI_KEY=
OPEN_MODEL_PREF='gpt-3.5-turbo'
PINECONE_ENVIRONMENT=
PINECONE_API_KEY=
PINECONE_INDEX=
AUTH_TOKEN="hunter2" # This is the password to your application if remote hosting.
CACHE_VECTORS="true"

7
server/.gitignore vendored Normal file
View File

@ -0,0 +1,7 @@
.env.production
.env.development
documents/*
vector-cache/*.json
!documents/DOCUMENTS.md
logs/server.log
*.db

1
server/.nvmrc Normal file
View File

@ -0,0 +1 @@
v18.12.1

View File

@ -0,0 +1,10 @@
### What is this folder of documents?
This is a temporary cache of the resulting files you have collected from `collector/`. You really should not be adding files manually to this folder. However the general format of this is you should partion data by how it was collected - it will be added to the appropriate namespace when you undergo vectorizing.
You can manage these files from the frontend application.
All files should be JSON files and in general there is only one main required key: `pageContent` all other keys will be inserted as metadata for each document inserted into the vector DB.
There is also a special reserved key called `published` that should be reserved for timestamps.

23
server/endpoints/chat.js Normal file
View File

@ -0,0 +1,23 @@
const { reqBody } = require('../utils/http');
const { Workspace } = require('../models/workspace');
const { chatWithWorkspace } = require('../utils/chats');
function chatEndpoints(app) {
if (!app) return;
app.post('/workspace/:slug/chat', async (request, response) => {
const { slug } = request.params
const { message, mode = 'query' } = reqBody(request)
const workspace = await Workspace.get(`slug = '${slug}'`);
if (!workspace) {
response.sendStatus(400).end();
return;
}
const result = await chatWithWorkspace(workspace, message, mode);
response.status(200).json({ ...result });
})
}
module.exports = { chatEndpoints }

View File

@ -0,0 +1,34 @@
require('dotenv').config({ path: `.env.${process.env.NODE_ENV}` })
const { Pinecone } = require('../utils/pinecone');
const { viewLocalFiles } = require('../utils/files');
function systemEndpoints(app) {
if (!app) return;
app.get('/ping', (_, response) => {
response.sendStatus(200);
})
app.get('/setup-complete', (_, response) => {
const results = {
OpenAiKey: !!process.env.OPEN_AI_KEY,
OpenAiModelPref: process.env.OPEN_MODEL_PREF || 'gpt-3.5-turbo',
PineConeEnvironment: process.env.PINECONE_ENVIRONMENT,
PineConeKey: !!process.env.PINECONE_API_KEY,
PinceConeIndex: process.env.PINECONE_INDEX,
}
response.status(200).json({ results })
})
app.get('/system-vectors', async (_, response) => {
const vectorCount = await Pinecone.totalIndicies();
response.status(200).json({ vectorCount })
})
app.get('/local-files', async (_, response) => {
const localFiles = await viewLocalFiles()
response.status(200).json({ localFiles })
})
}
module.exports = { systemEndpoints }

View File

@ -0,0 +1,75 @@
const { Pinecone } = require('../utils/pinecone');
const { reqBody } = require('../utils/http');
const { Workspace } = require('../models/workspace');
const { Document } = require('../models/documents');
const { DocumentVectors } = require('../models/vectors');
const { WorkspaceChats } = require('../models/workspaceChats');
const { convertToChatHistory } = require('../utils/chats');
function workspaceEndpoints(app) {
if (!app) return;
app.post('/workspace/new', async (request, response) => {
const { name = null } = reqBody(request);
const { workspace, message } = await Workspace.new(name);
response.status(200).json({ workspace, message })
})
app.post('/workspace/:slug/update-embeddings', async (request, response) => {
const { slug = null } = request.params;
const { adds = [], deletes = [] } = reqBody(request);
const currWorkspace = await Workspace.get(`slug = '${slug}'`);
if (!currWorkspace) {
response.sendStatus(400).end();
return;
}
await Document.removeDocuments(currWorkspace, deletes);
await Document.addDocuments(currWorkspace, adds);
const updatedWorkspace = await Workspace.get(`slug = '${slug}'`);
response.status(200).json({ workspace: updatedWorkspace })
})
app.delete('/workspace/:slug', async (request, response) => {
const { slug = '' } = request.params
const workspace = await Workspace.get(`slug = '${slug}'`);
if (!workspace) {
response.sendStatus(400).end();
return;
}
await Workspace.delete(`slug = '${slug.toLowerCase()}'`);
await DocumentVectors.deleteForWorkspace(workspace.id);
await Document.delete(`workspaceId = ${Number(workspace.id)}`)
await WorkspaceChats.delete(`workspaceId = ${Number(workspace.id)}`)
try { await Pinecone['delete-namespace']({ namespace: slug }) } catch (e) { console.error(e.message) }
response.sendStatus(200).end()
})
app.get('/workspaces', async (_, response) => {
const workspaces = await Workspace.where();
response.status(200).json({ workspaces })
})
app.get('/workspace/:slug', async (request, response) => {
const { slug } = request.params
const workspace = await Workspace.get(`slug = '${slug}'`);
response.status(200).json({ workspace })
})
app.get('/workspace/:slug/chats', async (request, response) => {
const { slug } = request.params
const workspace = await Workspace.get(`slug = '${slug}'`);
if (!workspace) {
response.sendStatus(400).end()
return;
}
const history = await WorkspaceChats.forWorkspace(workspace.id)
response.status(200).json({ history: convertToChatHistory(history) })
})
}
module.exports = { workspaceEndpoints }

59
server/index.js Normal file
View File

@ -0,0 +1,59 @@
require('dotenv').config({ path: `.env.${process.env.NODE_ENV}` })
const express = require('express')
const bodyParser = require('body-parser')
const cors = require('cors');
const { validatedRequest } = require('./utils/middleware/validatedRequest');
const { Pinecone } = require('./utils/pinecone');
const { reqBody } = require('./utils/http');
const { systemEndpoints } = require('./endpoints/system');
const { workspaceEndpoints } = require('./endpoints/workspaces');
const { chatEndpoints } = require('./endpoints/chat');
const app = express();
app.use(cors({ origin: true }));
app.use(validatedRequest);
app.use(bodyParser.text());
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({
extended: true
}));
systemEndpoints(app);
workspaceEndpoints(app);
chatEndpoints(app);
app.post('/v/:command', async (request, response) => {
const { command } = request.params
if (!Object.getOwnPropertyNames(Pinecone).includes(command)) {
response.status(500).json({ message: 'invalid interface command', commands: Object.getOwnPropertyNames(Pinecone.prototype) });
return
}
try {
const body = reqBody(request);
const resBody = await Pinecone[command](body)
response.status(200).json({ ...resBody });
} catch (e) {
// console.error(e)
console.error(JSON.stringify(e))
response.status(500).json({ error: e.message });
}
return;
})
app.all('*', function (_, response) {
response.sendStatus(404);
});
app.listen(process.env.SERVER_PORT || 5000, () => {
console.log(`Example app listening on port ${process.env.SERVER_PORT || 5000}`)
})
.on("error", function (err) {
process.once("SIGUSR2", function () {
process.kill(process.pid, "SIGUSR2");
});
process.on("SIGINT", function () {
process.kill(process.pid, "SIGINT");
});
});

View File

@ -0,0 +1,99 @@
const { fileData } = require('../utils/files');
const { v4: uuidv4 } = require('uuid');
const Document = {
tablename: 'workspace_documents',
colsInit: `
id INTEGER PRIMARY KEY AUTOINCREMENT,
docId TEXT NOT NULL UNIQUE,
filename TEXT NOT NULL,
docpath TEXT NOT NULL,
workspaceId INTEGER NOT NULL,
metadata TEXT NULL,
createdAt TEXT DEFAULT CURRENT_TIMESTAMP,
lastUpdatedAt TEXT DEFAULT CURRENT_TIMESTAMP
`,
db: async function () {
const sqlite3 = require('sqlite3').verbose();
const { open } = require('sqlite');
const db = await open({
filename: 'anythingllm.db',
driver: sqlite3.Database
})
await db.exec(`CREATE TABLE IF NOT EXISTS ${this.tablename} (${this.colsInit})`);
db.on('trace', (sql) => console.log(sql))
return db
},
forWorkspace: async function (workspaceId = null) {
if (!workspaceId) return [];
return await this.where(`workspaceId = ${workspaceId}`);
},
delete: async function (clause = '') {
const db = await this.db()
await db.get(`DELETE FROM ${this.tablename} WHERE ${clause}`)
db.close()
return true
},
where: async function (clause = '', limit = null) {
const db = await this.db()
const results = await db.all(`SELECT * FROM ${this.tablename} ${clause ? `WHERE ${clause}` : ''} ${!!limit ? `LIMIT ${limit}` : ''}`)
db.close()
return results
},
firstWhere: async function (clause = '') {
const results = await this.where(clause);
return results.length > 0 ? results[0] : null
},
addDocuments: async function (workspace, additions = []) {
const { Pinecone } = require('../utils/pinecone');
if (additions.length === 0) return;
const db = await this.db()
const stmt = await db.prepare(`INSERT INTO ${this.tablename} (docId, filename, docpath, workspaceId, metadata) VALUES (?,?,?,?,?)`)
for (const path of additions) {
const data = await fileData(path);
if (!data) continue;
const docId = uuidv4();
const { pageContent, ...metadata } = data
const newDoc = {
docId,
filename: path.split('/')[1],
docpath: path,
workspaceId: Number(workspace.id),
metadata: JSON.stringify(metadata)
}
const vectorized = await Pinecone.addDocumentToNamespace(workspace.slug, { ...data, docId }, path);
if (!vectorized) {
console.error('Failed to vectorize', path)
continue;
}
stmt.run([docId, newDoc.filename, newDoc.docpath, newDoc.workspaceId, newDoc.metadata])
}
stmt.finalize();
db.close();
return;
},
removeDocuments: async function (workspace, removals = []) {
const { Pinecone } = require('../utils/pinecone');
if (removals.length === 0) return;
const db = await this.db()
const stmt = await db.prepare(`DELETE FROM ${this.tablename} WHERE docpath = ? AND workspaceId = ?`);
for (const path of removals) {
const document = await this.firstWhere(`docPath = '${path}' AND workspaceId = ${workspace.id}`)
if (!document) continue;
await Pinecone.deleteDocumentFromNamespace(workspace.slug, document.docId);
stmt.run([path, workspace.id])
}
stmt.finalize();
db.close();
return true;
}
}
module.exports = { Document }

63
server/models/vectors.js Normal file
View File

@ -0,0 +1,63 @@
const { Document } = require('./documents');
// TODO: Do we want to store entire vectorized chunks in here
// so that we can easily spin up temp-namespace clones for threading
//
const DocumentVectors = {
tablename: 'document_vectors',
colsInit: `
id INTEGER PRIMARY KEY AUTOINCREMENT,
docId TEXT NOT NULL,
vectorId TEXT NOT NULL,
createdAt TEXT DEFAULT CURRENT_TIMESTAMP,
lastUpdatedAt TEXT DEFAULT CURRENT_TIMESTAMP
`,
db: async function () {
const sqlite3 = require('sqlite3').verbose();
const { open } = require('sqlite');
const db = await open({
filename: 'anythingllm.db',
driver: sqlite3.Database
})
await db.exec(`CREATE TABLE IF NOT EXISTS ${this.tablename} (${this.colsInit})`);
db.on('trace', (sql) => console.log(sql))
return db
},
bulkInsert: async function (vectorRecords = []) {
if (vectorRecords.length === 0) return;
const db = await this.db();
const stmt = await db.prepare(`INSERT INTO ${this.tablename} (docId, vectorId) VALUES (?, ?)`);
for (const record of vectorRecords) {
const { docId, vectorId } = record
stmt.run([docId, vectorId])
}
stmt.finalize()
db.close()
return { documentsInserted: vectorRecords.length };
},
deleteForWorkspace: async function (workspaceId) {
const documents = await Document.forWorkspace(workspaceId);
const docIds = [...(new Set(documents.map((doc) => doc.docId)))];
const ids = (await this.where(`docId IN (${docIds.map((id) => `'${id}'`).join(',')})`)).map((doc) => doc.id)
await this.deleteIds(ids)
return true;
},
where: async function (clause = '', limit = null) {
const db = await this.db()
const results = await db.all(`SELECT * FROM ${this.tablename} ${clause ? `WHERE ${clause}` : ''} ${!!limit ? `LIMIT ${limit}` : ''}`)
db.close()
return results
},
deleteIds: async function (ids = []) {
const db = await this.db()
await db.get(`DELETE FROM ${this.tablename} WHERE id IN (${ids.join(', ')}) `)
db.close()
return true
}
}
module.exports = { DocumentVectors }

View File

@ -0,0 +1,63 @@
const slugify = require('slugify');
const { Document } = require('./documents');
const Workspace = {
tablename: 'workspaces',
colsInit: `
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
slug TEXT NOT NULL UNIQUE,
vectorTag TEXT DEFAULT NULL,
createdAt TEXT DEFAULT CURRENT_TIMESTAMP,
lastUpdatedAt TEXT DEFAULT CURRENT_TIMESTAMP
`,
db: async function () {
const sqlite3 = require('sqlite3').verbose();
const { open } = require('sqlite');
const db = await open({
filename: 'anythingllm.db',
driver: sqlite3.Database
})
await db.exec(`CREATE TABLE IF NOT EXISTS ${this.tablename} (${this.colsInit})`);
db.on('trace', (sql) => console.log(sql))
return db
},
new: async function (name = null) {
if (!name) return { result: null, message: 'name cannot be null' };
const db = await this.db()
const { id, success, message } = await db.run(`INSERT INTO ${this.tablename} (name, slug) VALUES (?, ?)`, [name, slugify(name, { lower: true })])
.then((res) => {
return { id: res.lastID, success: true, message: null }
})
.catch((error) => {
return { id: null, success: false, message: error.message }
})
if (!success) return { workspace: null, message }
const workspace = await db.get(`SELECT * FROM ${this.tablename} WHERE id = ${id}`)
return { workspace, message: null }
},
get: async function (clause = '') {
const db = await this.db()
const result = await db.get(`SELECT * FROM ${this.tablename} WHERE ${clause}`).then((res) => res || null)
if (!result) return null;
const documents = await Document.forWorkspace(result.id);
return { ...result, documents }
},
delete: async function (clause = '') {
const db = await this.db()
await db.get(`DELETE FROM ${this.tablename} WHERE ${clause}`)
return true
},
where: async function (clause = '', limit = null) {
const db = await this.db()
const results = await db.all(`SELECT * FROM ${this.tablename} ${clause ? `WHERE ${clause}` : ''} ${!!limit ? `LIMIT ${limit}` : ''}`)
return results
},
}
module.exports = { Workspace }

View File

@ -0,0 +1,68 @@
const WorkspaceChats = {
tablename: 'workspace_chats',
colsInit: `
id INTEGER PRIMARY KEY AUTOINCREMENT,
workspaceId INTEGER NOT NULL,
prompt TEXT NOT NULL,
response TEXT NOT NULL,
include BOOL DEFAULT true,
createdAt TEXT DEFAULT CURRENT_TIMESTAMP,
lastUpdatedAt TEXT DEFAULT CURRENT_TIMESTAMP
`,
db: async function () {
const sqlite3 = require('sqlite3').verbose();
const { open } = require('sqlite');
const db = await open({
filename: 'anythingllm.db',
driver: sqlite3.Database
})
await db.exec(`CREATE TABLE IF NOT EXISTS ${this.tablename} (${this.colsInit})`);
db.on('trace', (sql) => console.log(sql))
return db
},
new: async function ({ workspaceId, prompt, response = {} }) {
const db = await this.db()
const { id, success, message } = await db.run(`INSERT INTO ${this.tablename} (workspaceId, prompt, response) VALUES (?, ?, ?)`, [workspaceId, prompt, JSON.stringify(response)])
.then((res) => {
return { id: res.lastID, success: true, message: null }
})
.catch((error) => {
return { id: null, success: false, message: error.message }
})
if (!success) return { chat: null, message }
const chat = await db.get(`SELECT * FROM ${this.tablename} WHERE id = ${id}`)
return { chat, message: null }
},
forWorkspace: async function (workspaceId = null) {
if (!workspaceId) return [];
return await this.where(`workspaceId = ${workspaceId} AND include = true`, null, 'ORDER BY id ASC')
},
markHistoryInvalid: async function (workspaceId = null) {
if (!workspaceId) return;
const db = await this.db()
await db.run(`UPDATE ${this.tablename} SET include = false WHERE workspaceId = ?`, [workspaceId]);
return;
},
get: async function (clause = '') {
const db = await this.db()
const result = await db.get(`SELECT * FROM ${this.tablename} WHERE ${clause}`).then((res) => res || null)
if (!result) return null;
return result
},
delete: async function (clause = '') {
const db = await this.db()
await db.get(`DELETE FROM ${this.tablename} WHERE ${clause}`)
return true
},
where: async function (clause = '', limit = null, order = null) {
const db = await this.db()
const results = await db.all(`SELECT * FROM ${this.tablename} ${clause ? `WHERE ${clause}` : ''} ${!!limit ? `LIMIT ${limit}` : ''} ${!!order ? order : ''}`)
return results
},
}
module.exports = { WorkspaceChats }

35
server/package.json Normal file
View File

@ -0,0 +1,35 @@
{
"name": "socials-to-chat-server",
"version": "1.0.0",
"description": "Server endpoints to process or create content for chatting",
"main": "index.js",
"author": "Timothy Carambat (Mintplex Labs)",
"license": "MIT",
"private": false,
"engines": {
"node": ">=18.12.1"
},
"scripts": {
"dev": "NODE_ENV=development nodemon --ignore documents index.js",
"start": "NODE_ENV=production node index.js"
},
"dependencies": {
"@googleapis/youtube": "^9.0.0",
"@pinecone-database/pinecone": "^0.1.6",
"body-parser": "^1.20.2",
"cors": "^2.8.5",
"dotenv": "^16.0.3",
"express": "^4.18.2",
"langchain": "^0.0.81",
"moment": "^2.29.4",
"openai": "^3.2.1",
"pinecone-client": "^1.1.0",
"slugify": "^1.6.6",
"sqlite": "^4.2.1",
"sqlite3": "^5.1.6",
"uuid": "^9.0.0"
},
"devDependencies": {
"nodemon": "^2.0.22"
}
}

View File

@ -0,0 +1,17 @@
const { WorkspaceChats } = require("../../../models/workspaceChats");
async function resetMemory(workspace, _message, msgUUID) {
await WorkspaceChats.markHistoryInvalid(workspace.id);
return {
uuid: msgUUID,
type: 'textResponse',
textResponse: 'Workspace chat memory was reset!',
sources: [],
close: true,
error: false,
};
}
module.exports = {
resetMemory
}

128
server/utils/chats/index.js Normal file
View File

@ -0,0 +1,128 @@
const { v4: uuidv4 } = require('uuid');
const { OpenAi } = require('../openAi');
const { Pinecone } = require('../pinecone');
const { WorkspaceChats } = require('../../models/workspaceChats');
const { resetMemory } = require("./commands/reset");
const moment = require('moment')
function convertToChatHistory(history = []) {
const formattedHistory = []
history.forEach((history) => {
const { prompt, response, createdAt } = history
const data = JSON.parse(response);
formattedHistory.push([
{
role: 'user',
content: prompt,
sentAt: moment(createdAt).unix(),
},
{
role: 'assistant',
content: data.text,
sources: data.sources || [],
sentAt: moment(createdAt).unix(),
},
])
})
return formattedHistory.flat()
}
function convertToPromptHistory(history = []) {
const formattedHistory = []
history.forEach((history) => {
const { prompt, response } = history
const data = JSON.parse(response);
formattedHistory.push([
{ role: 'user', content: prompt },
{ role: 'assistant', content: data.text },
])
})
return formattedHistory.flat()
}
const VALID_COMMANDS = {
'/reset': resetMemory,
}
function grepCommand(message) {
const availableCommands = Object.keys(VALID_COMMANDS);
for (let i = 0; i < availableCommands.length; i++) {
const cmd = availableCommands[i];
const re = new RegExp(`^(${cmd})`, "i");
if (re.test(message)) {
return cmd;
}
}
return null
}
async function chatWithWorkspace(workspace, message, chatMode = 'query') {
const uuid = uuidv4();
const openai = new OpenAi();
const command = grepCommand(message)
if (!!command && Object.keys(VALID_COMMANDS).includes(command)) {
return await VALID_COMMANDS[command](workspace, message, uuid);
}
const { safe, reasons = [] } = await openai.isSafe(message)
if (!safe) {
return {
id: uuid,
type: 'abort',
textResponse: null,
sources: [],
close: true,
error: `This message was moderated and will not be allowed. Violations for ${reasons.join(', ')} found.`
};
}
const hasVectorizedSpace = await Pinecone.hasNamespace(workspace.slug);
if (!hasVectorizedSpace) {
const rawHistory = await WorkspaceChats.forWorkspace(workspace.id)
const chatHistory = convertToPromptHistory(rawHistory);
const response = await openai.sendChat(chatHistory, message);
const data = { text: response, sources: [], type: 'chat' }
await WorkspaceChats.new({ workspaceId: workspace.id, prompt: message, response: data })
return {
id: uuid,
type: 'textResponse',
textResponse: response,
sources: [],
close: true,
error: null,
};
} else {
const { response, sources, message: error } = await Pinecone[chatMode]({ namespace: workspace.slug, input: message });
if (!response) {
return {
id: uuid,
type: 'abort',
textResponse: null,
sources: [],
close: true,
error,
};
}
const data = { text: response, sources, type: chatMode }
await WorkspaceChats.new({ workspaceId: workspace.id, prompt: message, response: data })
return {
id: uuid,
type: 'textResponse',
textResponse: response,
sources,
close: true,
error,
};
}
}
module.exports = {
convertToChatHistory,
chatWithWorkspace
}

120
server/utils/files/index.js Normal file
View File

@ -0,0 +1,120 @@
const fs = require("fs")
const path = require('path');
const { v5: uuidv5 } = require('uuid');
async function collectDocumentData(folderName = null) {
if (!folderName) throw new Error('No docPath provided in request');
const folder = path.resolve(__dirname, `../../documents/${folderName}`)
const dirExists = fs.existsSync(folder);
if (!dirExists) throw new Error(`No documents folder for ${folderName} - did you run collector/main.py for this element?`);
const files = fs.readdirSync(folder);
const fileData = [];
files.forEach(file => {
if (path.extname(file) === '.json') {
const filePath = path.join(folder, file);
const data = fs.readFileSync(filePath, 'utf8');
console.log(`Parsing document: ${file}`);
fileData.push(JSON.parse(data))
}
});
return fileData;
}
// Should take in a folder that is a subfolder of documents
// eg: youtube-subject/video-123.json
async function fileData(filePath = null) {
if (!filePath) throw new Error('No docPath provided in request');
const fullPath = path.resolve(__dirname, `../../documents/${filePath}`)
const fileExists = fs.existsSync(fullPath);
if (!fileExists) return null;
const data = fs.readFileSync(fullPath, 'utf8');
return JSON.parse(data)
}
async function viewLocalFiles() {
const folder = path.resolve(__dirname, `../../documents`)
const dirExists = fs.existsSync(folder);
if (!dirExists) return {}
const directory = {
name: "documents",
type: "folder",
items: [],
}
for (const file of fs.readdirSync(folder)) {
if (path.extname(file) === '.md') continue;
const folderPath = path.resolve(__dirname, `../../documents/${file}`)
const isFolder = fs.lstatSync(folderPath).isDirectory()
if (isFolder) {
const subdocs = {
name: file,
type: "folder",
items: [],
}
const subfiles = fs.readdirSync(folderPath);
for (const subfile of subfiles) {
if (path.extname(subfile) !== '.json') continue;
const filePath = path.join(folderPath, subfile);
const rawData = fs.readFileSync(filePath, 'utf8');
const cachefilename = `${file}/${subfile}`
const { pageContent, ...metadata } = JSON.parse(rawData)
subdocs.items.push({
name: subfile,
type: "file",
...metadata,
cached: await cachedVectorInformation(cachefilename, true)
})
}
directory.items.push(subdocs)
}
};
return directory
}
// Searches the vector-cache folder for existing information so we dont have to re-embed a
// document and can instead push directly to vector db.
async function cachedVectorInformation(filename = null, checkOnly = false) {
if (!process.env.CACHE_VECTORS) return checkOnly ? false : { exists: false, chunks: [] };
if (!filename) return checkOnly ? false : { exists: false, chunks: [] };
const digest = uuidv5(filename, uuidv5.URL);
const file = path.resolve(__dirname, `../../vector-cache/${digest}.json`);
const exists = fs.existsSync(file);
if (checkOnly) return exists
if (!exists) return { exists, chunks: [] }
console.log(`Cached vectorized results of ${filename} found! Using cached data to save on embed costs.`)
const rawData = fs.readFileSync(file, 'utf8');
return { exists: true, chunks: JSON.parse(rawData) }
}
// vectorData: pre-chunked vectorized data for a given file that includes the proper metadata and chunk-size limit so it can be iterated and dumped into Pinecone, etc
// filename is the fullpath to the doc so we can compare by filename to find cached matches.
async function storeVectorResult(vectorData = [], filename = null) {
if (!process.env.CACHE_VECTORS) return;
if (!filename) return;
console.log(`Caching vectorized results of ${filename} to prevent duplicated embedding.`)
const folder = path.resolve(__dirname, `../../vector-cache`);
if (!fs.existsSync(folder)) fs.mkdirSync(folder);
const digest = uuidv5(filename, uuidv5.URL);
const writeTo = path.resolve(folder, `${digest}.json`);
fs.writeFileSync(writeTo, JSON.stringify(vectorData), 'utf8');
return;
}
module.exports = {
cachedVectorInformation,
collectDocumentData,
viewLocalFiles,
storeVectorResult,
fileData
}

View File

@ -0,0 +1,14 @@
function reqBody(request) {
return typeof request.body === 'string'
? JSON.parse(request.body)
: request.body;
}
function queryParams(request) {
return request.query;
}
module.exports = {
reqBody,
queryParams,
};

View File

@ -0,0 +1,37 @@
function validatedRequest(request, response, next) {
// When in development passthrough auth token for ease of development.
if (process.env.NODE_ENV === 'development' || !process.env.AUTH_TOKEN) {
next();
return;
}
if (!process.env.AUTH_TOKEN) {
response.status(403).json({
error: "You need to set an AUTH_TOKEN environment variable."
});
return;
}
const auth = request.header('Authorization');
const token = auth ? auth.split(' ')[1] : null;
if (!token) {
response.status(403).json({
error: "No auth token found."
});
return;
}
if (token !== process.env.AUTH_TOKEN) {
response.status(403).json({
error: "Invalid auth token found."
});
return;
}
next();
}
module.exports = {
validatedRequest,
};

View File

@ -0,0 +1,64 @@
const { Configuration, OpenAIApi } = require('openai')
class OpenAi {
constructor() {
const config = new Configuration({ apiKey: process.env.OPEN_AI_KEY, organization: 'org-amIuvAIIcdUmN5YCiwRayVfb' })
const openai = new OpenAIApi(config);
this.openai = openai
}
isValidChatModel(modelName = '') {
const validModels = ['gpt-4', 'gpt-3.5-turbo']
return validModels.includes(modelName)
}
async isSafe(input = '') {
const { flagged = false, categories = {} } = await this.openai.createModeration({ input })
.then((json) => {
const res = json.data;
if (!res.hasOwnProperty('results')) throw new Error('OpenAI moderation: No results!');
if (res.results.length === 0) throw new Error('OpenAI moderation: No results length!');
return res.results[0]
})
if (!flagged) return { safe: true, reasons: [] };
const reasons = Object.keys(categories).map((category) => {
const value = categories[category]
if (value === true) {
return category.replace('/', ' or ');
} else {
return null;
}
}).filter((reason) => !!reason)
return { safe: false, reasons }
}
async sendChat(chatHistory = [], prompt) {
const model = process.env.OPEN_MODEL_PREF
if (!this.isValidChatModel(model)) throw new Error(`OpenAI chat: ${model} is not valid for chat completion!`);
const textResponse = await this.openai.createChatCompletion({
model,
temperature: 0.7,
n: 1,
messages: [
{ role: 'system', content: '' },
...chatHistory,
{ role: 'user', content: prompt },
]
})
.then((json) => {
const res = json.data
if (!res.hasOwnProperty('choices')) throw new Error('OpenAI chat: No results!');
if (res.choices.length === 0) throw new Error('OpenAI chat: No results length!');
return res.choices[0].message.content
})
return textResponse
}
}
module.exports = {
OpenAi,
};

View File

@ -0,0 +1,279 @@
const { PineconeClient } = require("@pinecone-database/pinecone");
const { PineconeStore } = require("langchain/vectorstores/pinecone");
const { OpenAI } = require("langchain/llms/openai");
const { ChatOpenAI } = require('langchain/chat_models/openai');
const { VectorDBQAChain, LLMChain, RetrievalQAChain, ConversationalRetrievalQAChain } = require("langchain/chains");
const { OpenAIEmbeddings } = require("langchain/embeddings/openai");
const { VectorStoreRetrieverMemory, BufferMemory } = require("langchain/memory");
const { PromptTemplate } = require("langchain/prompts");
const { RecursiveCharacterTextSplitter } = require("langchain/text_splitter");
const { storeVectorResult, cachedVectorInformation } = require('../files');
const { Configuration, OpenAIApi } = require('openai')
const { v4: uuidv4 } = require('uuid');
const toChunks = (arr, size) => {
return Array.from({ length: Math.ceil(arr.length / size) }, (_v, i) =>
arr.slice(i * size, i * size + size)
);
}
function curateSources(sources = []) {
const knownDocs = [];
const documents = []
for (const source of sources) {
const { metadata = {} } = source
if (Object.keys(metadata).length > 0 && !knownDocs.includes(metadata.title)) {
documents.push({ ...metadata })
knownDocs.push(metadata.title)
}
}
return documents;
}
const Pinecone = {
connect: async function () {
const client = new PineconeClient();
await client.init({
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENVIRONMENT,
});
const pineconeIndex = client.Index(process.env.PINECONE_INDEX);
const { status } = await client.describeIndex({ indexName: process.env.PINECONE_INDEX });
if (!status.ready) throw new Error("Pinecode::Index not ready.")
return { client, pineconeIndex, indexName: process.env.PINECONE_INDEX };
},
embedder: function () {
return new OpenAIEmbeddings({ openAIApiKey: process.env.OPEN_AI_KEY });
},
openai: function () {
const config = new Configuration({ apiKey: process.env.OPEN_AI_KEY })
const openai = new OpenAIApi(config);
return openai
},
embedChunk: async function (openai, textChunk) {
const { data: { data } } = await openai.createEmbedding({
model: 'text-embedding-ada-002',
input: textChunk
})
return data.length > 0 && data[0].hasOwnProperty('embedding') ? data[0].embedding : null
},
llm: function () {
const model = process.env.OPEN_MODEL_PREF || 'gpt-3.5-turbo'
return new OpenAI({ openAIApiKey: process.env.OPEN_AI_KEY, temperature: 0.7, modelName: model });
},
chatLLM: function () {
const model = process.env.OPEN_MODEL_PREF || 'gpt-3.5-turbo'
return new ChatOpenAI({ openAIApiKey: process.env.OPEN_AI_KEY, temperature: 0.7, modelName: model });
},
totalIndicies: async function () {
const { pineconeIndex } = await this.connect();
const { namespaces } = await pineconeIndex.describeIndexStats1();
return Object.values(namespaces).reduce((a, b) => a + (b?.vectorCount || 0), 0)
},
namespace: async function (index, namespace = null) {
if (!namespace) throw new Error("No namespace value provided.");
const { namespaces } = await index.describeIndexStats1();
return namespaces.hasOwnProperty(namespace) ? namespaces[namespace] : null
},
hasNamespace: async function (namespace = null) {
if (!namespace) return false;
const { pineconeIndex } = await this.connect();
return await this.namespaceExists(pineconeIndex, namespace)
},
namespaceExists: async function (index, namespace = null) {
if (!namespace) throw new Error("No namespace value provided.");
const { namespaces } = await index.describeIndexStats1();
return namespaces.hasOwnProperty(namespace)
},
deleteVectorsInNamespace: async function (index, namespace = null) {
await index.delete1({ namespace, deleteAll: true })
return true
},
addDocumentToNamespace: async function (namespace, documentData = {}, fullFilePath = null) {
const { DocumentVectors } = require("../../models/vectors");
try {
const { pageContent, docId, ...metadata } = documentData
if (!pageContent || pageContent.length == 0) return false;
console.log("Adding new vectorized document into namespace", namespace);
const cacheResult = await cachedVectorInformation(fullFilePath)
if (cacheResult.exists) {
const { pineconeIndex } = await this.connect();
const { chunks } = cacheResult
const documentVectors = []
for (const chunk of chunks) {
// Before sending to Pinecone and saving the records to our db
// we need to assign the id of each chunk that is stored in the cached file.
const newChunks = chunk.map((chunk) => {
const id = uuidv4()
documentVectors.push({ docId, vectorId: id });
return { ...chunk, id }
})
// Push chunks with new ids to pinecone.
await pineconeIndex.upsert({
upsertRequest: {
vectors: [...newChunks],
namespace,
}
})
}
await DocumentVectors.bulkInsert(documentVectors)
return true
}
// If we are here then we are going to embed and store a novel document.
// We have to do this manually as opposed to using LangChains `PineconeStore.fromDocuments`
// because we then cannot atomically control our namespace to granularly find/remove documents
// from vectordb.
// https://github.com/hwchase17/langchainjs/blob/2def486af734c0ca87285a48f1a04c057ab74bdf/langchain/src/vectorstores/pinecone.ts#L167
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 20 });
const textChunks = await textSplitter.splitText(pageContent)
console.log('Chunks created from document:', textChunks.length)
const documentVectors = []
const vectors = []
const openai = this.openai()
for (const textChunk of textChunks) {
const vectorValues = await this.embedChunk(openai, textChunk);
if (!!vectorValues) {
const vectorRecord = {
id: uuidv4(),
values: vectorValues,
// [DO NOT REMOVE]
// LangChain will be unable to find your text if you embed manually and dont include the `text` key.
// https://github.com/hwchase17/langchainjs/blob/2def486af734c0ca87285a48f1a04c057ab74bdf/langchain/src/vectorstores/pinecone.ts#L64
metadata: { ...metadata, text: textChunk },
}
vectors.push(vectorRecord);
documentVectors.push({ docId, vectorId: vectorRecord.id });
} else {
console.error('Could not use OpenAI to embed document chunk! This document will not be recorded.')
}
}
if (vectors.length > 0) {
const chunks = []
const { pineconeIndex } = await this.connect();
console.log('Inserting vectorized chunks into Pinecone.')
for (const chunk of toChunks(vectors, 100)) {
chunks.push(chunk)
await pineconeIndex.upsert({
upsertRequest: {
vectors: [...chunk],
namespace,
}
})
}
await storeVectorResult(chunks, fullFilePath)
}
await DocumentVectors.bulkInsert(documentVectors)
return true;
} catch (e) {
console.error('addDocumentToNamespace', e.message)
return false;
}
},
deleteDocumentFromNamespace: async function (namespace, docId) {
const { DocumentVectors } = require("../../models/vectors");
const { pineconeIndex } = await this.connect();
if (!await this.namespaceExists(pineconeIndex, namespace)) return;
const knownDocuments = await DocumentVectors.where(`docId = '${docId}'`)
if (knownDocuments.length === 0) return;
const vectorIds = knownDocuments.map((doc) => doc.vectorId);
await pineconeIndex.delete1({
ids: vectorIds,
namespace,
})
const indexes = knownDocuments.map((doc) => doc.id);
await DocumentVectors.deleteIds(indexes)
return true;
},
'namespace-stats': async function (reqBody = {}) {
const { namespace = null } = reqBody
if (!namespace) throw new Error("namespace required");
const { pineconeIndex } = await this.connect();
if (!await this.namespaceExists(pineconeIndex, namespace)) throw new Error('Namespace by that name does not exist.');
const stats = await this.namespace(pineconeIndex, namespace)
return stats ? stats : { message: 'No stats were able to be fetched from DB' }
},
'delete-namespace': async function (reqBody = {}) {
const { namespace = null } = reqBody
const { pineconeIndex } = await this.connect();
if (!await this.namespaceExists(pineconeIndex, namespace)) throw new Error('Namespace by that name does not exist.');
const details = await this.namespace(pineconeIndex, namespace);
await this.deleteVectorsInNamespace(pineconeIndex, namespace);
return { message: `Namespace ${namespace} was deleted along with ${details.vectorCount} vectors.` }
},
query: async function (reqBody = {}) {
const { namespace = null, input } = reqBody;
if (!namespace || !input) throw new Error("Invalid request body");
const { pineconeIndex } = await this.connect();
if (!await this.namespaceExists(pineconeIndex, namespace)) {
return {
response: null, sources: [], message: 'Invalid query - no documents found for workspace!'
}
}
const vectorStore = await PineconeStore.fromExistingIndex(
this.embedder(),
{ pineconeIndex, namespace }
);
const model = this.llm();
const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
k: 5,
returnSourceDocuments: true,
});
const response = await chain.call({ query: input });
return { response: response.text, sources: curateSources(response.sourceDocuments), message: false }
},
// This implementation of chat also expands the memory of the chat itself
// and adds more tokens to the PineconeDB instance namespace
chat: async function (reqBody = {}) {
const { namespace = null, input } = reqBody;
if (!namespace || !input) throw new Error("Invalid request body");
const { pineconeIndex } = await this.connect();
if (!await this.namespaceExists(pineconeIndex, namespace)) throw new Error("Invalid namespace - has it been collected and seeded yet?");
const vectorStore = await PineconeStore.fromExistingIndex(
this.embedder(),
{ pineconeIndex, namespace }
);
const memory = new VectorStoreRetrieverMemory({
vectorStoreRetriever: vectorStore.asRetriever(1),
memoryKey: "history",
});
const model = this.llm();
const prompt =
PromptTemplate.fromTemplate(`The following is a friendly conversation between a human and an AI. The AI is very casual and talkative and responds with a friendly tone. If the AI does not know the answer to a question, it truthfully says it does not know.
Relevant pieces of previous conversation:
{history}
Current conversation:
Human: {input}
AI:`);
const chain = new LLMChain({ llm: model, prompt, memory });
const response = await chain.call({ input });
return { response: response.text, sources: [], message: false }
},
}
module.exports = {
Pinecone
}

View File

@ -0,0 +1,5 @@
### What is this folder?
`vector-cache` is a running storage of JSON documents that you have already run embeddings on. This allows you to use the same large documents for multiple workspaces without having to pay to re-embed them each time you want to reference them across workspaces.
This also allows you to reset entire workspaces back to their original state without having to pay for the embeddings again. Saving you tons of money for large documents that take a while to embed.