mirror of https://github.com/searxng/searxng.git synced 2024-11-05 04:40:11 +01:00
Commit Graph

7636 Commits

Author SHA1 Message Date
Markus Heiser
e2917e64ff [mod] Upgrade Sphinx from 6.2.1 to 7.0.1
To upgrade Sphinx, MyST-Parser and markdown-it-py must also be updated at the
same time:

Closes: https://github.com/searxng/searxng/pull/2433
Closes: https://github.com/searxng/searxng/pull/2492
Closes: https://github.com/searxng/searxng/pull/2504
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-29 14:33:19 +02:00
dalf
fbb72fc1f4 Update searx.data - update_engine_descriptions.py 2023-06-29 13:59:25 +02:00
Markus Heiser
749b04ac1a
Merge [feat] engine: implementation of Anna's Archive
Anna's Archive [1] is a free non-profit online shadow library metasearch engine
providing access to a variety of book resources (also via IPFS), created by a
team of anonymous archivists [2].

[1] https://annas-archive.org/
[2] https://annas-software.org/AnnaArchivist/annas-archive
2023-06-29 13:56:19 +02:00
Markus Heiser
87e7926ae9 [fix] engine: Anna's Archive - grep results from '.js-scroll-hidden' elements
The rendering of the web page is very strange: except for the first position,
all other positions of Anna's result page are enclosed in SGML comments.  These
comments are *uncommented* by some JS code, see the query of the class
'.js-scroll-hidden' in Anna's HTML template [1].

[1] https://annas-software.org/AnnaArchivist/annas-archive/-/blob/main/allthethings/templates/macros/md5_list.html

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-29 09:32:57 +02:00
Markus Heiser
e2df6b77a3 [mod] engine: Anna's Archive - additional settings (content, sort, ext)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-29 09:32:57 +02:00
Markus Heiser
eafc2906f1 [mod] engine: Anna's Archive - fetch search arguments from search form
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-29 09:32:57 +02:00
Paolo Basso
7adb9090e5 [mod] engine: Anna's Archive - add language support 2023-06-29 09:32:57 +02:00
Paolo Basso
e5637fe7b9 [feat] engine: implementation of Anna's Archive
Anna's Archive [1] is a free non-profit online shadow library metasearch engine
providing access to a variety of book resources (also via IPFS), created by a
team of anonymous archivists [2].

[1] https://annas-archive.org/
[2] https://annas-software.org/AnnaArchivist/annas-archive
2023-06-29 09:32:57 +02:00
Markus Heiser
fd26f37073 [upd] make data.all
- ahmia_blacklist.txt
- currencies.json
- engine_descriptions.json
- engine_traits.json
- osm_keys_tags.json
- useragents.json

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-28 21:21:53 +02:00
Markus Heiser
0ebff871a5 [fix] update_currencies.py - AttributeError: 'str' object has no attribute 'insert'
Replace one-item lists by their single item only after the last currency has
been added.  In this traceback 'MXN' is added to 'pesos' while 'pesos' is no
longer a list, because the optimization was carried out too early.

    $ ./local/py3/bin/python searxng_extra/update/update_currencies.py
    Traceback (most recent call last):
      File "searxng_extra/update/update_currencies.py", line 164, in <module>
        main()
      File "searxng_extra/update/update_currencies.py", line 157, in main
        add_currency_name(db, "pesos", 'MXN')
      File "searxng_extra/update/update_currencies.py", line 89, in add_currency_name
        iso4217_set.insert(0, iso4217)
      AttributeError: 'str' object has no attribute 'insert'

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-28 21:21:53 +02:00
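The ordering problem described in 0ebff871a5 can be sketched as follows.  This
is a minimal illustration, not the code from update_currencies.py: the dict
layout and the helper collapse_single_item_lists() are assumptions, only the
name add_currency_name() and the insert(0, ...) call appear in the traceback.

```python
def add_currency_name(db, name, iso4217):
    """Register an ISO 4217 code under a currency name (newest first)."""
    codes = db.setdefault(name, [])
    if iso4217 not in codes:
        codes.insert(0, iso4217)


def collapse_single_item_lists(db):
    """Hypothetical helper: replace one-item lists by their single item."""
    for name, codes in db.items():
        if isinstance(codes, list) and len(codes) == 1:
            db[name] = codes[0]


db = {}
add_currency_name(db, "pesos", "ARS")
add_currency_name(db, "euro", "EUR")
# Collapsing at this point would turn db["pesos"] into the str "ARS", and the
# next add_currency_name(db, "pesos", ...) call would fail on str.insert.
add_currency_name(db, "pesos", "MXN")
collapse_single_item_lists(db)  # safe: no more names are added after this
```

Running the collapse step once, after the last add_currency_name() call, keeps
every intermediate value a list.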
Markus Heiser
efea962504 [fix] simple template: preferences - add missing icon_small import
Related: https://github.com/searxng/searxng/commit/2149e88bdd64#r119535272
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-28 18:36:52 +02:00
Paolo Basso
401561cb58 [mod] engine torznab - refactor & option to hide links
- torznab engine using types and clearer code
- torznab option to hide torrent and magnet links.
- document the torznab engine
- add myself to authors

Closes: https://github.com/searxng/searxng/issues/1124
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-28 10:03:44 +02:00
Markus Heiser
da7c30291d [fix] Google API changed
It seems that Google is rolling out a modified WEB API [1][2].

In the past the `hl` argument carried only the UI language, but nowadays it
seems to carry a combination of the UI language and the "search region", and the
`gl` argument has been removed.  I'm very surprised that Google is starting to
mix the parameters of the UI with the parameters of the search index.

This patch modifies the get_google_info(..) function.  Besides Google-WEB this
function is also used by other Google services, here are some examples to test
region & language of ..

- Google-WEB:    `!go dragon boat :en-CA`
- Google-News:   `!gon dragon boat :en-CA`
- Google-Videos: `!gov bmw :en-CA`
- Google-Images: `!goi bmw :en-CA`

- [1] https://github.com/searxng/searxng/issues/2515#issuecomment-1606294635
- [2] https://github.com/searxng/searxng/issues/2515#issuecomment-1607150817

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-26 18:28:09 +02:00
Markus Heiser
e8706fb738 [fix] engine & network issues / documentation and type annotations
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotations of the engine
modules have also been completed so that error messages from the type checker
are no longer reported.

Related and (partially) fixed issues:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25 13:58:26 +02:00
searxng-bot
2e4a435134 [translations] update from Weblate
9512b92a - 2023-06-23 - Coccocoas_Helper <coccocoahelper@gmail.com>
ca08c51e - 2023-06-23 - Coccocoas_Helper <coccocoahelper@gmail.com>
56ad4f21 - 2023-06-21 - return42 <markus.heiser@darmarit.de>
3ee419d6 - 2023-06-21 - return42 <markus.heiser@darmarit.de>
2023-06-23 09:34:46 +02:00
Markus Heiser
86db08793b [fix] implement a JSONEncoder for the json format
This patch implements a simple JSONEncoder just to fix #2502.  In the long term
SearXNG needs a data schema for the result items and a JSON generator for the
result list.

Closes: https://github.com/searxng/searxng/issues/2505
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-19 19:49:44 +02:00
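A minimal sketch of the JSONEncoder technique named in 86db08793b.  The class
name ResultJSONEncoder and the set of handled types (datetime, timedelta, set)
are assumptions for illustration; the commit message does not say which types
the actual encoder covers.

```python
import json
from datetime import datetime, timedelta


class ResultJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize a few types json cannot handle natively."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        if isinstance(o, timedelta):
            return o.total_seconds()
        if isinstance(o, set):
            return list(o)
        # Anything else: let the base class raise TypeError.
        return super().default(o)


result = {
    "title": "example",
    "publishedDate": datetime(2023, 6, 19, 19, 49, 44),
    "engines": {"google", "bing"},
}
encoded = json.dumps(result, cls=ResultJSONEncoder)
```

json.dumps() calls default() only for objects it cannot serialize itself, so
plain strings, numbers, lists and dicts are unaffected.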
Markus Heiser
fa1ef9a07b [mod] move some code from webapp module to webutils module (no functional change)
Over the years the webapp module became more and more of a mess.  To improve the
modularization a little, this patch moves some implementations from the webapp
module to the webutils module.

HINT: this patch brings no functional change

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-19 19:49:44 +02:00
searxng-bot
71b6ff07ca [translations] update from Weblate
98f61c70 - 2023-06-15 - alexgabi <alexgabi@disroot.org>
a1679b93 - 2023-06-13 - return42 <markus.heiser@darmarit.de>
ebd1d574 - 2023-06-13 - return42 <markus.heiser@darmarit.de>
b28a1da3 - 2023-06-13 - return42 <markus.heiser@darmarit.de>
56409bf0 - 2023-06-11 - return42 <markus.heiser@darmarit.de>
abc4916c - 2023-06-10 - return42 <markus.heiser@darmarit.de>
b1900abe - 2023-06-10 - return42 <markus.heiser@darmarit.de>
b48e84c4 - 2023-06-10 - return42 <markus.heiser@darmarit.de>
bf395e32 - 2023-06-10 - return42 <markus.heiser@darmarit.de>
c9c0a3c9 - 2023-06-10 - return42 <markus.heiser@darmarit.de>
3f50d31e - 2023-06-10 - return42 <markus.heiser@darmarit.de>
9da1c142 - 2023-06-09 - artnay <jiri.gronroos@iki.fi>
2023-06-16 09:20:43 +02:00
Markus Heiser
825846ed4b [doc] settings.yml: add missing $SEARXNG_REDIS_URL to the docs
Closes: https://github.com/searxng/searxng/issues/2499
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-16 07:49:41 +02:00
Markus Heiser
1f0fb3122d [doc] code and style injection is not supported by the simple theme
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-13 11:57:40 +02:00
Markus Heiser
5f11155ccb
Merge pull request #2494 from searxng/dependabot/pip/master/pallets-sphinx-themes-2.1.1
Bump pallets-sphinx-themes from 2.1.0 to 2.1.1
2023-06-10 08:14:27 +02:00
dependabot[bot]
5f39b7ace0
Bump pallets-sphinx-themes from 2.1.0 to 2.1.1
Bumps [pallets-sphinx-themes](https://github.com/pallets/pallets-sphinx-themes) from 2.1.0 to 2.1.1.
- [Release notes](https://github.com/pallets/pallets-sphinx-themes/releases)
- [Changelog](https://github.com/pallets/pallets-sphinx-themes/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/pallets-sphinx-themes/compare/2.1.0...2.1.1)

---
updated-dependencies:
- dependency-name: pallets-sphinx-themes
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-10 05:20:21 +00:00
Markus Heiser
7db28944f0
Merge pull request #2493 from searxng/dependabot/pip/master/selenium-4.10.0
Bump selenium from 4.9.1 to 4.10.0
2023-06-10 07:19:41 +02:00
dependabot[bot]
f6cbc3630a
Bump selenium from 4.9.1 to 4.10.0
Bumps [selenium](https://github.com/SeleniumHQ/Selenium) from 4.9.1 to 4.10.0.
- [Release notes](https://github.com/SeleniumHQ/Selenium/releases)
- [Commits](https://github.com/SeleniumHQ/Selenium/compare/selenium-4.9.1...selenium-4.10.0)

---
updated-dependencies:
- dependency-name: selenium
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-09 07:56:59 +00:00
Markus Heiser
280a66b24b
Merge pull request #2491 from searxng/translations_update
Update translations
2023-06-09 09:25:24 +02:00
searxng-bot
1be27d5d83 [translations] update from Weblate
b40da1a3 - 2023-06-06 - return42 <markus.heiser@darmarit.de>
666ee7d4 - 2023-06-06 - return42 <markus.heiser@darmarit.de>
1e0e8ead - 2023-06-06 - return42 <markus.heiser@darmarit.de>
404b9937 - 2023-06-07 - Ivan Gabaldon <admin@inetol.net>
a627f9a1 - 2023-06-04 - return42 <markus.heiser@darmarit.de>
a234d2f8 - 2023-06-04 - gallegonovato <fran-carro@hotmail.es>
cc41f9b5 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
24651eac - 2023-06-02 - return42 <markus.heiser@darmarit.de>
c37b0627 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
9a435ea1 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
40e0adad - 2023-06-02 - return42 <markus.heiser@darmarit.de>
6833b142 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
00f397ad - 2023-06-02 - tentsbet <remendne@pentrens.jp>
7d3d4a97 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
f7d713a4 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
b1ec3160 - 2023-06-03 - ghose <correo@xmgz.eu>
04591a3a - 2023-06-02 - return42 <markus.heiser@darmarit.de>
cb3ac67c - 2023-06-02 - return42 <markus.heiser@darmarit.de>
fe81dbc7 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
7882670f - 2023-06-02 - return42 <markus.heiser@darmarit.de>
38882f3b - 2023-06-02 - return42 <markus.heiser@darmarit.de>
c6df5047 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
6ca23c3b - 2023-06-02 - return42 <markus.heiser@darmarit.de>
72f1ee09 - 2023-06-02 - return42 <markus.heiser@darmarit.de>
2023-06-09 07:07:51 +00:00
Markus Heiser
b295b497f7
Merge pull request #2484 from return42/limiter-ip_lists
[mod] limiter: blocklist and passlist (ip_lists)
2023-06-06 09:09:20 +02:00
Markus Heiser
22b13f4fa5 [mod] tools.Config.get(): add missing type annotations
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-05 14:07:19 +02:00
Markus Heiser
f3763d73ad [mod] limiter: blocklist and passlist (ip_lists)
A blocklist and a passlist can be configured in /etc/searxng/limiter.toml::

    [botdetection.ip_lists]
    pass_ip = [
      '51.15.252.168',  # IPv4 of check.searx.space
    ]

    block_ip = [
      '93.184.216.34',  # IPv4 of example.org
    ]

Closes: https://github.com/searxng/searxng/issues/2127
Closes: https://github.com/searxng/searxng/pull/2129
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-05 14:07:19 +02:00
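How such lists could be evaluated is sketched below with the standard ipaddress
module.  This is an illustration under assumptions, not the limiter's code: the
helper classify_ip(), the precedence of the passlist over the blocklist, and the
acceptance of network notation in list entries are not stated in f3763d73ad.

```python
import ipaddress

# Values mirror the limiter.toml example from the commit message.
IP_LISTS = {
    "pass_ip": ["51.15.252.168"],   # IPv4 of check.searx.space
    "block_ip": ["93.184.216.34"],  # IPv4 of example.org
}


def classify_ip(ip, ip_lists=IP_LISTS):
    """Hypothetical helper: return 'pass', 'block' or None for an address."""
    addr = ipaddress.ip_address(ip)
    # Assumption: the passlist is consulted before the blocklist.
    for verdict, key in (("pass", "pass_ip"), ("block", "block_ip")):
        for entry in ip_lists.get(key, []):
            # strict=False lets a plain address act as a single-host network.
            if addr in ipaddress.ip_network(entry, strict=False):
                return verdict
    return None
```

An address that matches neither list yields None, which leaves the decision to
the remaining checks.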
Markus Heiser
de2f396e50
Merge pull request #2489 from return42/remove-marginalia-cfg
[fix] engines: don't spam marginalia.nu with default settings
2023-06-05 09:24:36 +02:00
Markus Heiser
f77807257b [fix] engines: don't spam marginalia.nu with default settings
The engine configuration of marginalia [2][3][4][5] spams marginalia.nu with
requests from SearXNG instances [1].  It is not in the interest of SearXNG to
disturb other FOSS projects, so the engine will be removed::

    - name: marginalia
      engine: json_engine
      shortcut: mar
      categories: general
      paging: false
      # Key and license: https://www.marginalia.nu/marginalia-search/api/
      # index: 0 popular, 1 blogs, 2 big_sites, 3 default, 4 experimental
      search_url: https://api.marginalia.nu/<insert your key here>/search/{query}?index=4&count=20
      results_query: results
      url_query: url
      title_query: title
      content_query: description
      timeout: 1.5
      disabled: true
      about:
        website: https://www.marginalia.nu/
        official_api_documentation: https://api.marginalia.nu/
        use_official_api: true
        require_api_key: true
        results: JSON

[1] https://github.com/searxng/searxng/issues/1673
[2] https://github.com/searxng/searxng/pull/1627
[3] https://github.com/searxng/searxng/issues/1620
[4] https://news.ycombinator.com/item?id=35874640
[5] d82a858491/code/services-satellite/api-service/src/main/java/nu/marginalia/api/svc/ResponseCache.java (L12-L20)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-05 08:23:17 +02:00
Markus Heiser
80aaef6c95
Merge pull request #2357 / limiter -> botdetection
The monolithic implementation of the limiter was divided into methods and
implemented in the Python package searx.botdetection.  Detailed documentation on
the methods has been added.

The methods are divided into two groups:

1. Probe HTTP headers

- Method http_accept
- Method http_accept_encoding
- Method http_accept_language
- Method http_connection
- Method http_user_agent

2. Rate limit:

- Method ip_limit
- Method link_token (new)

The (reduced) implementation of the limiter is now in the module
searx.botdetection.limiter.  The first group was transferred unchanged to this
module.  The ip_limit contains the sliding windows implemented by the limiter so
far.

This merge also fixes some long outstanding issues:

- limiter does not evaluate the Accept-Language correctly [1]
- limiter needs an IPv6 prefix to block networks instead of IPs [2]

Without additional configuration the limiter works as before (apart from the
bugfixes).  For the commissioning of additional methods (link_token), a
configuration must be made in an additional configuration file.  Without this
configuration, the limiter runs as before (zero configuration).

The ip_limit method implements the sliding windows of the vanilla limiter;
additionally, the link_token method can be used within this method.  The
link_token method can be used to investigate whether a request is suspicious.
To activate the link_token method in the ip_limit method, add the following to
your /etc/searxng/limiter.toml::

    [botdetection.ip_limit]
    link_token = true


[1] https://github.com/searxng/searxng/issues/2455
[2] https://github.com/searxng/searxng/issues/2477
2023-06-03 06:00:15 +02:00
Markus Heiser
1a1ab34d9d [fix] URL percent-encoding in translations fail in babel
Closes: https://github.com/searxng/searxng/issues/2482
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-02 20:30:41 +02:00
Markus Heiser
1541f8660e
Merge pull request #2481 / [mod] template preferences: split into elements
HINT: this patch has no functional change / it is the preparation for following changes and bugfixes

Over the years, the preferences template became an unmanageable beast. To make the source code more readable the monolith is split into elements. The splitting into elements also has the advantage that a new template can make use of them.

The reversed checkbox is a quirk that is only used in the preferences and must be eliminated in the long term. For this the macro 'checkbox_onoff_reversed' was added to the preferences.html template. The 'checkbox' macro is also a quirk of the preferences.html we don't want to use in other templates (it is an input-checkbox in an HTML form that was misused for status display).
2023-06-02 19:55:43 +02:00
Markus Heiser
b867c39ce0 [build] /static 2023-06-02 19:05:43 +02:00
Markus Heiser
2149e88bdd [mod] template preferences: split into elements (no functional change)
HINT: this patch has no functional change / it is the preparation for following
      changes and bugfixes

Over the years, the preferences template became an unmanageable beast.  To make
the source code more readable the monolith is split into elements.  The
splitting into elements also has the advantage that a new template can make use
of them.

The reversed checkbox is a quirk that is only used in the preferences and must
be eliminated in the long term.  For this the macro 'checkbox_onoff_reversed'
was added to the preferences.html template.  The 'checkbox' macro is also a
quirk of the preferences.html we don't want to use in other templates (it is an
input-checkbox in an HTML form that was misused for status display).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-02 19:05:43 +02:00
dependabot[bot]
d289a8b225 Bump typing-extensions from 4.6.2 to 4.6.3
Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.6.2 to 4.6.3.
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.6.2...4.6.3)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-02 10:04:46 +02:00
searxng-bot
789b43ab60 [translations] update from Weblate
5344314f - 2023-05-30 - return42 <markus.heiser@darmarit.de>
ee8fd955 - 2023-06-01 - BBTranslate <357835338@qq.com>
1ce31caf - 2023-05-29 - return42 <markus.heiser@darmarit.de>
fe75c53d - 2023-05-29 - return42 <markus.heiser@darmarit.de>
ca60af52 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
f34b88f3 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
22d76a26 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
43d8c982 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
43a92e85 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
2bfc12dd - 2023-05-29 - return42 <markus.heiser@darmarit.de>
e2b5fb5f - 2023-05-29 - return42 <markus.heiser@darmarit.de>
9f088420 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
bdf81b4c - 2023-05-29 - return42 <markus.heiser@darmarit.de>
f6a24c5d - 2023-05-30 - return42 <markus.heiser@darmarit.de>
01bcea56 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
8c0209f8 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
c629c610 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
a4e4945d - 2023-05-29 - return42 <markus.heiser@darmarit.de>
96bad166 - 2023-06-01 - mradalbert <mister.adalbert@gmail.com>
b0032d90 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
366adaef - 2023-05-29 - return42 <markus.heiser@darmarit.de>
2e4271bf - 2023-05-29 - return42 <markus.heiser@darmarit.de>
c5856fd6 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
790b5a6f - 2023-05-29 - return42 <markus.heiser@darmarit.de>
6c9f92a9 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
f5a6a35d - 2023-05-29 - return42 <markus.heiser@darmarit.de>
4c8eeb32 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
7b8c0618 - 2023-05-30 - nicfab <nicfab@icloud.com>
4e851dd4 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
0fa6006e - 2023-05-29 - return42 <markus.heiser@darmarit.de>
877f4396 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
c3bb1da7 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
e66e6fae - 2023-05-30 - return42 <markus.heiser@darmarit.de>
1cac4771 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
949e994f - 2023-05-28 - ghose <correo@xmgz.eu>
8b181582 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
65f8fb93 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
e5088e1c - 2023-05-29 - return42 <markus.heiser@darmarit.de>
f151100c - 2023-05-29 - return42 <markus.heiser@darmarit.de>
51d169fa - 2023-05-29 - return42 <markus.heiser@darmarit.de>
e68ac961 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
c336c5a1 - 2023-05-31 - dom1torii <djmdmitri.a@gmail.com>
88bda0d0 - 2023-05-30 - Fijxu <fijxu@zzls.xyz>
6a57c29a - 2023-05-29 - return42 <markus.heiser@darmarit.de>
0c585b4d - 2023-05-30 - return42 <markus.heiser@darmarit.de>
e8ca9891 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
817b2da4 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
6b2508aa - 2023-05-29 - return42 <markus.heiser@darmarit.de>
3a5b1842 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
fd826ab8 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
a3938c43 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
30cad6b2 - 2023-05-30 - Ivan Gabaldon <admin@inetol.net>
e997055f - 2023-05-30 - return42 <markus.heiser@darmarit.de>
de6bd3d8 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
ba5e0129 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
e48fd248 - 2023-05-29 - return42 <markus.heiser@darmarit.de>
b0e7d3f1 - 2023-05-30 - return42 <markus.heiser@darmarit.de>
2023-06-02 09:34:36 +02:00
Markus Heiser
80af38d37b [mod] increase SUSPICIOUS_IP_WINDOW from one day to 30 days
In my tests I see bots rotating IPs (with endless IP lists).  If such a bot has
100 IPs and has three attempts (SUSPICIOUS_IP_MAX = 3) then it can successfully
send up to 300 requests in one day while rotating the IP.  To block the bots for
a longer period of time the SUSPICIOUS_IP_WINDOW, as the time period in which an
IP is observed, must be increased.

For normal WEB-browsers this is no problem, because the SUSPICIOUS_IP_WINDOW is
deleted as soon as the CSS with the token is loaded.

SUSPICIOUS_IP_WINDOW = 3600 * 24 * 30
  Time (sec) before sliding window for one suspicious IP expires.

SUSPICIOUS_IP_MAX = 3
  Maximum requests from one suspicious IP in the :py:obj:`SUSPICIOUS_IP_WINDOW`.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 16:00:49 +02:00
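The arithmetic from 80af38d37b, together with a toy sliding window, can be
sketched as follows.  The class SuspiciousIPs is a hypothetical in-memory
stand-in written for this illustration; how the limiter actually stores its
windows is not described in the commit message.

```python
SUSPICIOUS_IP_WINDOW = 3600 * 24 * 30  # seconds, value from the commit message
SUSPICIOUS_IP_MAX = 3


def max_requests_per_window(rotating_ips, per_ip_max=SUSPICIOUS_IP_MAX):
    """Upper bound of requests a bot can send within one window when it
    rotates over `rotating_ips` addresses, before every address is blocked."""
    return rotating_ips * per_ip_max


class SuspiciousIPs:
    """Hypothetical in-memory sliding window per IP."""

    def __init__(self, window=SUSPICIOUS_IP_WINDOW, limit=SUSPICIOUS_IP_MAX):
        self.window = window
        self.limit = limit
        self.hits = {}  # ip -> timestamps of suspicious requests

    def is_blocked(self, ip, now):
        """Record a suspicious request at `now`; True once the limit is exceeded."""
        recent = [t for t in self.hits.get(ip, []) if now - t < self.window]
        recent.append(now)
        self.hits[ip] = recent
        return len(recent) > self.limit

    def forget(self, ip):
        """Drop the window, e.g. after a request that is not suspicious."""
        self.hits.pop(ip, None)
```

With 100 rotating IPs the bound stays at 300 requests, but with the longer
window those 300 requests are spread over 30 days instead of one.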
Markus Heiser
281e36f4b7 [fix] limiter: replace real_ip by IPv4/v6 network
Closes: https://github.com/searxng/searxng/issues/2477
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 15:51:14 +02:00
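Mapping a client address to its network, as 281e36f4b7 describes, might look
like the sketch below.  The helper network_of() and both prefix lengths are
assumptions chosen for illustration; the commit message does not state them.

```python
import ipaddress

IPV4_PREFIX = 32  # assumed prefix length: one IPv4 address per client
IPV6_PREFIX = 48  # assumed prefix length: one IPv6 site per client


def network_of(real_ip):
    """Hypothetical helper: return the network used as the rate-limit key."""
    addr = ipaddress.ip_address(real_ip)
    prefix = IPV6_PREFIX if addr.version == 6 else IPV4_PREFIX
    # strict=False masks the host bits instead of raising ValueError.
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
```

Two IPv6 addresses from the same prefix then share one counter, so a client
cannot evade the limiter by cycling through the addresses of its own prefix.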
Markus Heiser
38431d2e14 [fix] correct determination of the IP for the request
For correct determination of the IP of the request the function
botdetection.get_real_ip() is implemented.  This function is used in the
ip_limit and link_token methods of the botdetection and it is used in the
self_info plugin.

A documentation about the X-Forwarded-For header has been added.

[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566211059

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 14:38:53 +02:00
Markus Heiser
b8c7c2c9aa [mod] botdetection - improve ip_limit and link_token methods
- counting requests in LONG_WINDOW and BURST_WINDOW is not needed when the
  request is validated by the link_token method [1]

- renew a ping-key on validation [2], this is needed for infinite scrolling,
  where no new token (CSS) is loaded. / this does not fix the BURST_MAX issue in
  the vanilla limiter

- normalize the counter names of the ip_limit method to 'ip_limit.*'

- integrate the ip_limit method straightforwardly in the limiter plugin, without
  intermediate code --> ip_limit now returns None or a werkzeug.Response object
  that can be passed by the plugin to the flask application, instead of
  intermediate code that returns a tuple

[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566113277
[2] https://github.com/searxng/searxng/pull/2357#discussion_r1208542206
[3] https://github.com/searxng/searxng/pull/2357#issuecomment-1566125979

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 14:38:53 +02:00
Markus Heiser
52f1452c09 [mod] limiter: ip_limit - monitor suspicious IPs
To intercept bots that get their IPs from a range of IPs, there is a
``SUSPICIOUS_IP_WINDOW``.  In this window the suspicious IPs are stored for a
longer time.  IPs stored in this sliding window have a maximum of
``SUSPICIOUS_IP_MAX`` accesses before they are blocked.  As soon as the IP makes
a request that is not suspicious, the sliding window for this IP is dropped.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 14:38:53 +02:00
Markus Heiser
9d7456fd6c [fix] limiter.toml: botdetection.ip_limit turn off link_token by default
To activate the ``link_token`` method in the ``ip_limit`` method add the
following to your ``/etc/searxng/limiter.toml``::

   [botdetection.ip_limit]
   link_token = true

Related: https://github.com/searxng/searxng/pull/2357#issuecomment-1554116941
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 14:38:53 +02:00
Markus Heiser
66fdec0eb9 [mod] limiter: add config file /etc/searxng/limiter.toml
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 14:38:53 +02:00
Markus Heiser
1ec325adcc [mod] limiter -> botdetection: modularization and documentation
In order to be able to meet the outstanding requirements, the implementation is
modularized and supplemented with documentation.

This patch does not contain functional change, except it fixes issue #2455

----

Activate the limiter in the settings.yml and simulate a bot request by::

    curl -H 'Accept-Language: de-DE,en-US;q=0.7,en;q=0.3' \
         -H 'Accept: text/html' \
         -H 'User-Agent: xyz' \
         -H 'Accept-Encoding: gzip' \
         'http://127.0.0.1:8888/search?q=foo'

In the LOG:

    DEBUG   searx.botdetection.link_token : missing ping for this request: .....

Since ``BURST_MAX_SUSPICIOUS = 2`` you can repeat the query above two times
before you get a "Too Many Requests" response.

Closes: https://github.com/searxng/searxng/issues/2455
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
Markus Heiser
5226044c13 [mod] limiter: add random token to the limiter URL
By adding a random component to the limiter URL, a bot can no longer send a ping
by requesting a static URL.

Related: https://github.com/searxng/searxng/pull/2357#issuecomment-1518525094
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
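The idea of 5226044c13 can be sketched with the standard secrets module.  The
URL pattern and the helpers new_token(), ping_url() and is_valid_ping() are
hypothetical and chosen only for this illustration; the commit message does not
show the real URL or how the token is stored.

```python
import hmac
import secrets


def new_token():
    """Create an unpredictable, URL-safe token."""
    return secrets.token_urlsafe(16)


def ping_url(token):
    """Hypothetical URL pattern: the token is part of the path."""
    return f"/client{token}.css"


def is_valid_ping(path, token):
    """Accept a ping only if the requested path carries the current token."""
    # Compare as bytes so that non-ASCII paths cannot raise TypeError.
    return hmac.compare_digest(path.encode(), ping_url(token).encode())
```

A bot that requests a fixed path no longer produces a valid ping, because it
has to read the current token from the rendered page first.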
Markus Heiser
dba569462d [mod] limiter: reduce request rates for requests without a ping
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
dalf
c1b5ff7e1c Update searx.data - update_engine_descriptions.py 2023-05-29 07:28:50 +02:00
dalf
2ba50d392e Update searx.data - update_currencies.py 2023-05-29 07:28:18 +02:00