* cosmetic changes for compatibility with hadolint
* common configuration for most editors until better plugins come along
* Changes to PDF metadata, using PyMuPDF (faster and more compatible)
* small changes to the other file ingestors to keep the metadata fields consistent
* Lint and review
* fixed unknown chars
* Use PyMuPDF for PDF loading for a 200% speed increase
linting
---------
Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com>
Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>
* Update filetypes.py
Added mbox format
* Created new file
Added support for mbox files as used by many email services, including Google Takeout's Gmail archive.
* Update filetypes.py
* Update as_mbox.py
* implement drag-and-drop uploader
show file upload progress
write files to the hot directory
build a simple Flask API to process files one at a time
* move document processor calls to util
build out Dockerfile to run both processes at the same time
update UI to check that the document processor is available before upload
* disable pragma update on boot
* dockerfile changes
* add file-type restrictions based on the supported types reported by the Python app, and show rejected files in the UI
* cleanup
* stub migrations on boot to prevent exit condition
* update CloudFormation template for AWS deploy
* Adds ability to import a website via its sitemap
* adds example sitemap url
* adds filter to bypass common image formats
* moves filetype ignoring to sitemap script
* Updates for Linux for frontend/server
* frontend/server docker
* updated Dockerfile for deps related to node vectordb
* updates for collector in docker
* docker deps for ODT processing
* ignore another collector dir
* storage mount improvements; run as UID
* fix pypandoc version typo
* permissions fixes
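To illustrate the mbox support added above, the sketch below reads an mbox archive (such as a Gmail export from Google Takeout) using only Python's standard `mailbox` and `email` modules. It decodes encoded headers and keeps the plain-text body of each message. It is an assumed illustration, not the contents of `as_mbox.py`. The function `read_mbox`, its helpers, and the record keys are hypothetical.

```python
import mailbox
from email.header import decode_header, make_header


def _decode(value):
    # Decode RFC 2047 encoded headers such as "=?utf-8?q?Caf=C3=A9?=".
    return str(make_header(decode_header(value))) if value else ""


def _body_text(msg):
    # Keep only text/plain parts that are not attachments.
    if msg.is_multipart():
        parts = []
        for part in msg.walk():
            if (part.get_content_type() == "text/plain"
                    and part.get_content_disposition() != "attachment"):
                payload = part.get_payload(decode=True) or b""
                charset = part.get_content_charset() or "utf-8"
                parts.append(payload.decode(charset, errors="replace"))
        return "\n".join(parts)
    payload = msg.get_payload(decode=True) or b""
    return payload.decode(msg.get_content_charset() or "utf-8", errors="replace")


def read_mbox(path):
    """Return one hypothetical record per message in the mbox file at `path`."""
    records = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            records.append({
                "from": _decode(msg["from"]),
                "subject": _decode(msg["subject"]),
                "date": msg["date"] or "",
                "content": _body_text(msg),
            })
    finally:
        box.close()
    return records
```

Skipping attachments and non-text parts keeps the ingested content to readable mail text. Each record could then be written to the same document format as the other file types.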