timothycarambat
b583aa74fd
remove prints
2023-11-16 17:17:52 -08:00
Sean Hatfield
7edfccaf9a
Adding url uploads to document picker ( #375 )
* WIP adding url uploads to document picker
* fix manual script for uploading url to custom-documents
* fix metadata for url scraping
* wip url parsing
* update how async link scraping works
* docker-compose defaults added
no autocomplete on URLs
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-11-16 17:15:01 -08:00
timothycarambat
1e3d82e184
patch collector script
2023-11-16 10:25:23 -08:00
timothycarambat
c5dc68633b
patch link scrape tool schema
2023-11-14 16:41:39 -08:00
Timothy Carambat
d7315b0e53
be able to parse relative and FQDN links from root reliably ( #138 )
2023-07-05 14:40:54 -07:00
AntonioCiolino
a52b0ae655
Updated Link scraper to avoid NoneType error. ( #90 )
* Enable web scraping based on a URL and a simple filter.
* ignore yarn
* Updated Link scraper to avoid NoneType error.
2023-06-19 12:07:26 -07:00
AntonioCiolino
e7ba028497
Enable web scraping based on a URL and a simple filter. ( #73 )
2023-06-16 17:29:11 -07:00
Skid Vis
4118c9dcf3
Blocks images in sitemaps from being parsed. ( #56 )
* Adds ability to import sitemaps to include a website
* adds example sitemap url
* adds filter to bypass common image formats
* moves filetype ignoring to sitemap script
2023-06-14 23:00:03 -07:00
Skid Vis
bd32f97a21
Adds ability to import sitemaps to include a website ( #51 )
* Adds ability to import sitemaps to include a website
* adds example sitemap url
2023-06-14 11:04:17 -07:00
frasergr
9f33b3dfcb
Docker support ( #34 )
* Updates for Linux for frontend/server
* frontend/server docker
* updated Dockerfile for deps related to node vectordb
* updates for collector in docker
* docker deps for ODT processing
* ignore another collector dir
* storage mount improvements; run as UID
* fix pypandoc version typo
* permissions fixes
2023-06-13 11:26:11 -07:00
timothycarambat
27c58541bd
initial commit ⚡
2023-06-03 19:28:07 -07:00