* WIP replace langchain pdfloader with pdfjs and add more context to each page
* remove extras from pdfjs and just replace langchain library
* remove unneeded dep
* fix console log in docs
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* Include links in citations
force ChunkSource key to retain this information
old links will be unsupported
* show special icons depending on source
* remove console log
* reset server documents writeTo
* Add support for fetching single document in documents folder
* Add document object to upload + support link scraping via API
* hotfixes for documentation
* update api docs
* wip: init refactor of document processor to JS
* add Node.js PDF support
* wip: parity with python processor
feat: add pptx support
* fix: add forgotten files
* Remove python scripts entirely
* wip: update docker to boot new collector
* add package.json support
* update dockerfile for new build
* update gitignore and linting
* add more protections on file lookup
* update package.json
* test build
* update docker commands to use --cap-add=SYS_ADMIN so web scraper can run
update all scripts to reflect this
remove docker build for branch