6 Commits

Author SHA1 Message Date
Sean Hatfield
0bb47619dc
Allow 127.0.0.1 as valid URL for scraping (#2560)
* allow 127.0.0.1 as valid url for scraping

* update comments and lint

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-10-31 09:57:28 -07:00
timothycarambat
619f6b3884
Ignore SSL errors for web scraper
resolves #2114
2024-08-14 09:11:22 -07:00
timothycarambat
b541623c9e
add SSRF notice
2024-08-13 17:46:07 -07:00
Timothy Carambat
0db6c3b2aa
Prevent private octets from link collection for self-hosted (#626)
2024-01-19 10:49:40 -08:00
Timothy Carambat
1563a1b20f
Strict link protocol validation (#577)
2024-01-11 12:29:00 -08:00
Timothy Carambat
719521c307
Document Processor v2 (#442)
* wip: init refactor of document processor to JS

* add NodeJs PDF support

* wip: parity with python processor
feat: add pptx support

* fix: forgot files

* Remove python scripts totally

* wip: update docker to boot new collector

* add package.json support

* update dockerfile for new build

* update gitignore and linting

* add more protections on file lookup

* update package.json

* test build

* update docker commands to use cap-add=SYS_ADMIN so web scraper can run
update all scripts to reflect this
remove docker build for branch
2023-12-14 15:14:56 -08:00