Remove the usage of searx.network.multi_requests
The results from Bing contains the target URL encoded in base64
See the u parameter, remove the first two character "a1", and done.
Also add a comment the check of the result_len / pageno
( from https://github.com/searx/searx/pull/1387 )
It seems there is an API change:
extratags can be either a dictionnary or None.
This commit avoid crash when extratags is None
Test query "!osm gare du nord"
The method EngineTraits.get_region(..) returns engine's region string
that **best fits** to SearXNG's locale. This means it returns a
region (country) if only a language is set in the locale. By example the method
returns for a locale tag `es` a region `ES`.
Google's search parameter `cr` restricts search results to documents originating
in a particular country / in case of a locale tag (language) as described above,
this argument should be unset in the query send to Google.
Closes: https://github.com/searxng/searxng/issues/2672
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The search engines deliver hits for many search terms [1], but these are usually
not the focus of the user. In order to arrange these hits further down in the
list, their weighting is reduced.
[1] https://github.com/searxng/searxng/pull/2589#issuecomment-1670915089
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Show URL of the ddg-search page, not the URL of a (generic) Javascript. The
latter one is not usefull for the user.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Tis patch adds some more fields to the result items and changed paging to the
``nextResultSet`` given in seekr's JSON response.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Sadly archive.is is blocked by a CAPTCHA that can't be avoid (at least in a
XPath engine).
[1] https://github.com/searxng/searxng/issues/2643
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* this is a small fix to increase the colspan of the category in engine preferences from 7 to 8, since there was a column added
=> fixing a small fallout from 4731290317
incident:
flask_babel.gettext() does not work in the engine modules.
cause:
the request() and response() functions of the engine modules run in the
processor, whose search() method runs in a thread and in the threads the
context of the Flask app does not exist. The context of the Flask app is
needed by the gettext() function for the L10n.
Solution:
copy context of the Flask app into the threads. [1]
special case:
We cannot equip the search() method of the processors with the decorator [1],
because the decorator requires a context (Flask app) that does not yet exist
at the time of the initialization of the processors (the initialization of the
processors is part of the initialization of the Flask app).
[1] https://flask.palletsprojects.com/en/2.3.x/api/#flask.copy_current_request_context
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Disable btdigg because on most SearXNG instances, SearXNG is blocked by btdigg
due to cloudflare too many requests.
This impementation did not parse the HTML page because there is an API in
XML (RSS). The RSS feed provides fewer data like amount of seeders/leechers and
the files in the torrent file. It's a tradeoff for a "stable" engine as the XML
from RSS content will change way less than the HTML page.
Closes: https://github.com/searxng/searxng/issues/2553
The Wikimedia wikis [1] engines provide good answers and have short response
times --> no reason to disable these enhgines by default. BTW: this patch adds
a (sub-) category ``wikimedia`` for the engines [1].
[1] https://meta.wikimedia.org/wiki/Wikimedia_wikis
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
SearXNG does not allow a None value in the content field of a result item.
If the key (shortDescription, uploaderName) in the JSON response from piped
exists but is set to None, SearXNG ignores this result item::
DEBUG searx : result: invalid content: { .., 'content': None, ..}
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
`pointer-events` never gets set to "none" when the button is hidden,
allowing you to click the button. And your mouse further changes it's
cursor to the pointer style.
- re-enables z-library as the new domain zlibrary-global.se is now available
from the open web. The announcement of the domain:
https://www.reddit.com/r/zlibrary/comments/13whe08/mod_note_zlibraryglobalse_domain_is_officially/
It is an official domain, it requires to log in to the "personal" subdomain
only to download files, but the search works.
- changes the result template of zlibrary to paper.html, filling the appropriate fields
- implements language filtering for zlibrary
- implement zlibrary custom filters (engine traits)
- refactor and document the zlibrary engine
We have built up detailed documentation of the *settings* and the *engines* over
the past few years. However, this documentation was still spread over various
chapters and was difficult to navigate in its entirety.
This patch rearranges the Settings & Engines documentation for better
readability.
To review new ordered docs::
make docs.clean docs.live
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The renderuing of the WEB page is very strange; except the firts position all
other positions of Anna's result page are enclosed in SGML comments. These
cooments are *uncommented* by some JS code, see query of the class
'.js-scroll-hidden' in Anna's HTML template [1].
[1] https://annas-software.org/AnnaArchivist/annas-archive/-/blob/main/allthethings/templates/macros/md5_list.html
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- torznab engine using types and clearer code
- torznab option to hide torrent and magnet links.
- document the torznab engine
- add myself to authors
Closes: https://github.com/searxng/searxng/issues/1124
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
It seems that Google is rolling out a modified WEB API [1][2].
In the past there was only the UI language in the `hl` argument but nowadays it
seems a combination of the UI language and the "search region" is mixed in this
argument and the `gl` argument has been removed. I'm very surprised that google
is starting to mix the parameters of the UI with the parameters of the search
index.
This patch modifies the get_google_info(..) function. Beside Google-WEB this
function is also used by other Google services, here are some examples to test
region & language of ..
- Google-WEB: `!go dragon boat :en-CA`
- Google-News: `!gon dragon boat :en-CA`
- Google-Videos: `!gov bmw :en-CA`
- Goolge-Images `!goi bmw :en-CA`
- [1] https://github.com/searxng/searxng/issues/2515#issuecomment-1606294635
- [2] https://github.com/searxng/searxng/issues/2515#issuecomment-1607150817
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:
- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia
Since the files have been touched anyway, the type annotaions of the engine
modules has also been completed so that error messages from the type checker are
no longer reported.
Related and (partial) fixed issue:
- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch implements a simple JSONEncoder just to fix#2502 / on the long term
SearXNG needs a data schema for the result items and a json generator for the
result list.
Closes: https://github.com/searxng/searxng/issues/2505
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Over the years the webapp module became more and more a mess. To improve the
modulaization a little this patch moves some implementations from the webapp
module to webutils module.
HINT: this patch brings non functional change
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
A blocklist and a passlist can be configured in /etc/searxng/limiter.toml::
[botdetection.ip_lists]
pass_ip = [
'51.15.252.168', # IPv4 of check.searx.space
]
block_ip = [
'93.184.216.34', # IPv4 of example.org
]
Closes: https://github.com/searxng/searxng/issues/2127
Closes: https://github.com/searxng/searxng/pull/2129
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The monolithic implementation of the limiter was divided into methods and
implemented in the Python package searx.botdetection. Detailed documentation on
the methods has been added.
The methods are divided into two groups:
1. Probe HTTP headers
- Method http_accept
- Method http_accept_encoding
- Method http_accept_language
- Method http_connection
- Method http_user_agent
2. Rate limit:
- Method ip_limit
- Method link_token (new)
The (reduced) implementation of the limiter is now in the module
searx.botdetection.limiter. The first group was transferred unchanged to this
module. The ip_limit contains the sliding windows implemented by the limiter so
far.
This merge also fixes some long outstandig issue:
- limiter does not evaluate the Accept-Language correct [1]
- limiter needs a IPv6 prefix to block networks instead of IPs [2]
Without additional configuration the limiter works as before (apart from the
bugfixes). For the commissioning of additional methods (link_toke), a
configuration must be made in an additional configuration file. Without this
configuration, the limiter runs as before (zero configuration).
The ip_limit Method implements the sliding windows of the vanilla limiter,
additionally the link_token method can be used in this method. The link_token
method can be used to investigate whether a request is suspicious. To activate
the link_token method in the ip_limit method add the following to your
/etc/searxng/limiter.toml::
[botdetection.ip_limit]
link_token = true
[1] https://github.com/searxng/searxng/issues/2455
[2] https://github.com/searxng/searxng/issues/2477
HINT: this patch has no functional change / it is the preparation for following
changes and bugfixes
Over the years, the preferences template became an unmanageable beast. To make
the source code more readable the monolith is splitted into elements. The
splitting into elements also has the advantage that a new template can make use
of them.
The reversed checkbox is a quirk that is only used in the prefereces and must be
eliminated in the long term. For this the macro 'checkbox_onoff_reversed' was
added to the preferences.html template. The 'checkbox' macro is also a quirk of
the preferences.html we don't want to use in other templates (it is an
input-checkbox in a HTML form that was misused for status display).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In my tests I see bots rotating IPs (with endless IP lists). If such a bot has
100 IPs and has three attempts (SUSPICIOUS_IP_MAX = 3) then it can successfully
send up to 300 requests in one day while rotating the IP. To block the bots for
a longer period of time the SUSPICIOUS_IP_WINDOW, as the time period in which an
IP is observed, must be increased.
For normal WEB-browsers this is no problem, because the SUSPICIOUS_IP_WINDOW is
deleted as soon as the CSS with the token is loaded.
SUSPICIOUS_IP_WINDOW = 3600 * 24 * 30
Time (sec) before sliding window for one suspicious IP expires.
SUSPICIOUS_IP_MAX = 3
Maximum requests from one suspicious IP in the :py:obj:`SUSPICIOUS_IP_WINDOW`."""
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
For correct determination of the IP to the request the function
botdetection.get_real_ip() is implemented. This fonction is used in the
ip_limit and link_token method of the botdetection and it is used in the
self_info plugin.
A documentation about the X-Forwarded-For header has been added.
[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566211059
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- counting requests in LONG_WINDOW and BURST_WINDOW is not needed when the
request is validated by the link_token method [1]
- renew a ping-key on validation [2], this is needed for infinite scrolling,
where no new token (CSS) is loaded. / this does not fix the BURST_MAX issue in
the vanilla limiter
- normalize the counter names of the ip_limit method to 'ip_limit.*'
- just integrate the ip_limit method straight forward in the limiter plugin /
non intermediate code --> ip_limit now returns None or a werkzeug.Response
object that can be passed by the plugin to the flask application / non
intermediate code that returns a tuple
[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566113277
[2] https://github.com/searxng/searxng/pull/2357#discussion_r1208542206
[3] https://github.com/searxng/searxng/pull/2357#issuecomment-1566125979
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To intercept bots that get their IPs from a range of IPs, there is a
``SUSPICIOUS_IP_WINDOW``. In this window the suspicious IPs are stored for a
longer time. IPs stored in this sliding window have a maximum of
``SUSPICIOUS_IP_MAX`` accesses before they are blocked. As soon as the IP makes
a request that is not suspicious, the sliding window for this IP is droped.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To activate the ``link_token`` method in the ``ip_limit`` method add the
following to your ``/etc/searxng/limiter.toml``::
[botdetection.ip_limit]
link_token = true
Related: https://github.com/searxng/searxng/pull/2357#issuecomment-1554116941
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In order to be able to meet the outstanding requirements, the implementation is
modularized and supplemented with documentation.
This patch does not contain functional change, except it fixes issue #2455
----
Aktivate limiter in the settings.yml and simulate a bot request by::
curl -H 'Accept-Language: de-DE,en-US;q=0.7,en;q=0.3' \
-H 'Accept: text/html'
-H 'User-Agent: xyz' \
-H 'Accept-Encoding: gzip' \
'http://127.0.0.1:8888/search?q=foo'
In the LOG:
DEBUG searx.botdetection.link_token : missing ping for this request: .....
Since ``BURST_MAX_SUSPICIOUS = 2`` you can repeat the query above two time
before you get a "Too Many Requests" response.
Closes: https://github.com/searxng/searxng/issues/2455
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>