5 Commits

Author SHA1 Message Date
Timothy Carambat
5441717294
normalize parser struct for all file types (#321)
2023-11-01 16:44:02 -07:00
Francisco Bischoff
26dba59249
mbox parsing improvements v1 (#308)
* mbox parsing improvements v1

* autobots roll out!
2023-10-30 11:57:33 -07:00
Timothy Carambat
a505928934
Display better error messages from document processor (#243)
pass messages to frontend on success/failure
resolves #242
2023-09-18 16:50:20 -07:00
Timothy Carambat
3e78476739
Franzbischoff document improvements (#241)
* cosmetic changes to be compatible to hadolint

* common configuration for most editors until better plugins comes up

* Changes on PDF metadata, using PyMuPDF (faster and more compatible)

* small changes on other file ingestions in order to try to keep the fields equal

* Lint, review, and review

* fixed unknown chars

* Use PyMuPDF for pdf loading for 200% speed increase
linting

---------

Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com>
Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>
2023-09-18 16:21:37 -07:00
mplawner
3efe55a720
Added mbox support (#106)
* Update filetypes.py

Added mbox format

* Created new file

Added support for mbox files as used by many email services, including Google Takeout's Gmail archive.

* Update filetypes.py

* Update as_mbox.py
2023-06-25 18:11:05 -07:00