Implement a scrapper for DuckDuckGo-Lite [1]. The existing DuckDuckGo [2]
engine does not support paging. DuckDuckgo-Lite is much faster, less verbose
and does have a paging option (reversed engineered from the input form of [1]).
[1] https://lite.duckduckgo.com/lite
[2] https://duckduckgo.com/
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* download images using the "image_proxy" network (HTTP/1 instead of HTTP/2)
* don't cache data: URL (reduce memory usage)
* after each test: purge image URL cache then call garbage collector
* download only the first 64kb of images
The default *sans-serif* font from the browsers most often renders much better
compared to Arial font.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- using more rem in style and definitions
- mobile width in preferences.less fix max-width: 75em to 80em (normalized with
style.less and other)
- do not display #backToTop position on tablet (when max-width: 80em)
- fix answer box on mobile (when max-width: 50em)
If there is no write access, there is no need for global. Remove global
statement if there is no assignment.
global-variable-not-assigned:
Using global for names but no assignment is done Used when a variable is
defined through the "global" statement but no assignment to this variable is
done.
In Pylint 2.11 the global-variable-not-assigned checker now catches global
variables that are never reassigned in a local scope and catches (reassigned)
functions [1][2]
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
[2] https://github.com/PyCQA/pylint/issues/1375
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
httpx.RequestError (subclass of httpx.HTTPError) has a property request.
This property raises a RuntimeError if the attributes _request is None.
To avoid a cascade of errors, this commit reads directly the _request attribute.
searx.client.new_client: the proxies parameter is a dictonnary,
and the protocol (key of the dictionnary) is already normalized
(see usage of searx.network.network.PROXY_PATTERN_MAPPING)
the openstreetmap engine imports code from the wikidata engine.
before this commit, specific code make sure to copy the logger variable to the wikidata engine.
with this commit searx.engines.load_engine makes sure the .logger is initialized.
The implementation scans sys.modules for module name starting with searx.engines.
Currently, searx.search.Search calls on_result once the engine results have been merged
(ResultContainer.order_results).
on_result plugins can rewrite the results: once the URL(s) are modified, even they can be merged,
it won't be the case since ResultContainer.order_results has already be called.
This commit call on_result inside for each result of each engines.
In addition the on_result function can return False to remove the result.
Note: the on_result function now run on the engine thread instead of the Flask thread.
close#298
This is a workaround: inside engine code, any call to function in another engine can crash
since the logger won't be initialized except if it is done explicitly.
Instead of raising an exception and therefore hiding all results of the engine.
It make sense to remove that requirement in order to allow the implementation of
search engines that do not always have a description. In fact some search
engines that in 99% of the case have a description like Brave Search or Mojeek
crash completely if they for some reason included a result with no description.
To test this patch try Mojeek:
!mjk xyz
before and after the patch.
Suggested-by: 0xhtml in https://github.com/searx/searx/discussions/2933
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Suggestions should be added too.
suggestion_xpath: //div[@class="text-gray h6"]/a
You can try it with:
!brave recurzuoin
Suggested-by: @allendema in https://github.com/searx/searx/issues/2857#issuecomment-904837023
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Pylint 2.10 fixed [1]:
Fixed bug with cell-var-from-loop checker: it no longer has false negatives
when both unused-variable and used-before-assignment are disabled.
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.10.html
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Pylint 2.10 added new default checks [1]:
use-list-literal
Emitted when list() is called with no arguments instead of using []
use-dict-literal
Emitted when dict() is called with no arguments instead of using {}
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.10.html
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- Add ``# lint: pylint`` header to pylint this python file.
- Fix issues reported by pylint.
- Add source code documentation of modul searx.locales
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In commit 4d3f2f48d common used languages has been droped. By reducing the
number of `min_engines_per_lang` from 15 to 13 we get theses languages back.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This commit remove the need to update the brand for GIT_URL and GIT_BRANCH:
there are read from the git repository.
It is possible to call python -m searx.version freeze to freeze the current version.
Useful when the code is installed outside git (distro package, docker, etc...)
Deny formats has been implemented in 6ed4616d.
To harden SearXNG instances by default, other formats than HTML should be
denied. Most of JSON, RSS and CSV requests are bots [1]::
Bots are the only users of this feature on a public instance, and they abuse
it too much that the engines rate limit pretty quickly the IP address of the
instance.
[1] https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Do not merge this patch in master branch of SearXNG! This branch exists only
for testing the feature branch fix-searx.sh @return42.
This patch changes the buildenv to::
GIT_URL='https://github.com/return42/searxng'
GIT_BRANCH='fix-searx.sh'
SEARX_PORT='7777'
SEARX_BIND_ADDRESS='127.0.0.12'
To test installation procedure, clone feature branch (fix-searx.sh)::
$ cd ~/Downloads
$ git clone --branch fix-searx.sh https://github.com/return42/searxng searxng
$ cd searxng
$ ./utils/searx.sh install all
...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Not all settings from the 'brand:' section of the YAML files are needed in the
shell scripts. This patch reduce the variables in ./utils/brand.env to what is
needed. The following ('brand:' settings) can be removed from this file:
- ISSUE_URL
- DOCS_URL
- PUBLIC_INSTANCES
- WIKI_URL
Tasks running outside of an *installed instance*, need the following settings
from the YAML configuration:
- GIT_URL <--> brand.git_url
- GIT_BRANCH <--> brand.git_branch
- SEARX_URL <--> server.base_url (aka PUBLIC_URL)
- SEARX_PORT <--> server.port
- SEARX_BIND_ADDRESS <--> server.bind_address
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
modified docs/admin/engines/settings.rst
- Fix documentation and add section 'brand'.
- Add remarks about **buildenv** variables.
- Add remarks about settings from environment variables $SEARX_DEBUG,
$SEARX_PORT, $SEARX_BIND_ADDRESS and $SEARX_SECRET
modified docs/admin/installation-searx.rst & docs/build-templates/searx.rst
Fix template location /templates/etc/searx/settings.yml
modified docs/dev/makefile.rst
Add description of the 'make buildenv' target and describe
- we have all SearXNG setups are centralized in the settings.yml file
- why some tasks need a utils/brand.env (aka instance's buildenv)
modified manage
Settings file from repository's working tree are used by default and
ask user if a /etc/searx/settings.yml file exists.
modified searx/settings.yml
Add comments about when it is needed to run 'make buildenv'
modified searx/settings_defaults.py
Default for server:port is taken from enviroment variable SEARX_PORT.
modified utils/build_env.py
- Some defaults in the settings.yml are taken from the environment,
e.g. SEARX_BIND_ADDRESS (searx.settings_defaults.SHEMA). When the
'brand.env' file is created these enviroment variables should be
unset first.
- The CONTACT_URL enviroment is not needed in the utils/brand.env
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The usages of the searx.brand namespace has been removed, the searx.brand
namespace is now longer needed.
The searx.brand namespace was an interim solution which has been added in commit
9e53470b4, see commit message there ...
This patch is a first 'proof of concept'. Later we can decide to remove the
brand namespace entirely or not.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the templates and the /config (JSON) the usage of the 'brand.*' name
space is replaced by 'searx.get_setting' function.
- new_issue_url --> get_setting('brand.new_issue_url')
- brand.GIT_URL --> get_setting('brand.git_url')
- brand.PUBLIC_INSTANCES --> get_setting('brand.public_instances')
- brand.DOCS_URL --> get_setting('brand.docs_url')
- brand.ISSUE_URL --> get_setting('brand.issue_url')
- brand.CONTACT_URL --> get_setting('general.contact_url', '')
The macro 'new_issue' from searx/templates/*/messages/no_results.html
is now imported with context::
{% from '__common__/new_issue.html' import new_issue with context %}
To get *public instances URL* from context's 'get_setting()' function::
get_setting('brand.public_instances','')
Macro's prototype does no longer need the 'new_issue_url' argument and has been
changed to::
macro new_issue(engine_name, engine_reliability)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Added function searx.get_setting(name, default=_unset):
Returns the value to which ``name`` point. If there is no such name in the
settings and the ``default`` is unset, a KeyError exception is raised.
In all the python processes ..
- make docs
- make buildenv
- make install (setup.py)
the usage of the 'brand.*' name space is replaced by 'searx.get_setting'
function.
- brand.SEARX_URL --> get_setting('server.base_url')
- brand.GIT_URL --> get_setting('brand.git_url')
- brand.GIT_BRANCH' --> get_setting('server.base_url')
- brand.ISSUE_URL --> get_setting('brand.issue_url')
- brand.DOCS_URL --> get_setting('brand.docs_url')
- brand.PUBLIC_INSTANCES --> get_setting('brand.public_instances')
- brand.CONTACT_URL --> get_setting('general.contact_url', '')
- brand.WIKI_URL --> get_setting('brand.wiki_url')
- brand.TWITTER_URL --> get_setting('brand.twitter_url', '')
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Qwant is a fast and reliable search engine and AFAIK there is no CAPTCHA. Let
us enable Qwant engines by default.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The implementation uses the Qwant API (https://api.qwant.com/v3). The API is
undocumented but can be reverse engineered by reading the network log of
https://www.qwant.com/ queries.
This implementation is used by different qwant engines in the settings.yml::
- name: qwant
categories: general
...
- name: qwant news
categories: news
...
- name: qwant images
categories: images
...
- name: qwant videos
categories: videos
...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The engine was added in commit a4b07460 but now it shows new issues [1].
In the 90'th of the last century, dogpile had its own WEB index, but nowadays it
is a meta-search engine [2]
Powered by technology, Dogpile returns all the best results from leading
search engines including Google and Yahoo!
Using dogpile as an engine in SearXNG needs more investigation, a XPath solution
like we have is not enough. It is questionable whether it still makes sense to
investigate more into a meta-search engine with a ReCAPTCHA in front.
With this patch the dogpile engine is removed
[1] https://github.com/searxng/searxng/issues/202
[2] https://www.dogpile.com/support/aboutus
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Engine just for Podcasts. An API which returns Podcasts and their Info like:
website, author etc.
Upstream query example: https://gpodder.net/search.json?q=linux
Added synonyme.woxikon.de using the xpath engine. Adds a site which returns
word synonyms although just in German.
Depending on the query not all synonyms are shown because of not the best xpath
selection. But should do the job just fine.
Upstream example query: https://synonyme.woxikon.de/synonyme/test.php
BTW add about section to the YAML configuration
It now shows descriptions with their correct URLs when there are videos in the
search results, pulling content_xpath from snippet-description instead of
snippet-content.
Suggested-by: @eagle-dogtooth https://github.com/searx/searx/issues/2857#issuecomment-869119968
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Upgrade from pylint v2.8.3 to 2.9.3 raise some new issues::
searx/search/checker/__main__.py:37:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
searx/search/checker/__main__.py:38:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
searx/search/processors/__init__.py:20:0: R0402: Use 'from searx import engines' instead (consider-using-from-import)
searx/preferences.py:182:19: C0207: Use data.split('-', maxsplit=1)[0] instead (use-maxsplit-arg)
searx/preferences.py:506:15: R1733: Unnecessary dictionary index lookup, use 'user_setting' instead (unnecessary-dict-index-lookup)
searx/webapp.py:436:0: C0206: Consider iterating with .items() (consider-using-dict-items)
searx/webapp.py:950:4: C0206: Consider iterating with .items() (consider-using-dict-items)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To compress saved preferences in the URL was introduced in 5f758b2d3 and
slightly fixed in 8f4401462. But the main fail was not fixed; The decompress
function returns a binary string and this binary should first be decoded to a
string before it is passed to urllib.parse_qs.
BTW: revert the hot-fix from 5973491
Related-to: https://github.com/searxng/searxng/issues/166
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- move jshint option from gruntfile to .jshintrc
- remove trailing-whitespace from gruntfile and
- add jshint esversion: 6
- .dir-locals.el add locals for js-mode to use JSHint from the simple theme
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch disables role 'no-descending-specificity'. IMO it is better to have
this rule active (see below [1]), but it is hard to rewrite the less files to
pass this rule, so for the first I chose to disable this rule.
---
Source order is important in CSS, and when two selectors have the same
specificity, the one that occurs last will take priority. However, the situation
is different when one of the selectors has a higher specificity. In that case,
source order does not matter: the selector with higher specificity will win out
even if it comes first.
The clashes of these two mechanisms for prioritization, source order and
specificity, can cause some confusion when reading stylesheets. If a selector
with higher specificity comes before the selector it overrides, we have to think
harder to understand it, because it violates the source order
expectation. Stylesheets are most legible when overriding selectors always come
after the selectors they override. That way both mechanisms, source order and
specificity, work together nicely.
This rule enforces that practice as best it can, reporting fewer errors than it
should. It cannot catch every actual overriding selector, but it can catch
certain common mistakes.
[1] https://stylelint.io/user-guide/rules/list/no-descending-specificity/
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This fix was autogenerated by::
npx stylelint -f unix --fix 'searx/static/themes/simple/src/less/**/*.less'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, google-news requests are redirected to confirm and get a CONSENT
Cookie from https://consent.google.de/s?continue=...
This patch adds a CONSENT Cookie to the google-news request to avoid
redirection.
The behavior of the CONTENTS cookies over all google engines seems similar but
the pattern is not yet fully clear to me, here are some random samples from my
analysis ..
Using common google search from different domains::
google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
google.fr: CONSENT=YES+srp.gws-{{date}}-0-RC2.fr+FX+826
When searching about videos (google-videos)::
google.es: CONSENT=YES+srp.gws-{{date}}-0-RC2.es+FX+076
google.de: CONSENT=YES+srp.gws-{{date}}-0-RC2.de+FX+171
Google news has only one domain for all languages::
news.google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
Using google-scholar search from different domains::
scholar.google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
scholar.google.fr: does not use such a cookie / did not ask the user
scholar.google.es: does not use such a cookie / did not ask the user
Interim summary:
Pattern is unclear and I won't apply the CONSENT cookie to all google engines.
More experience is need before we generalize the CONSENT cookies over all
google engines.
Related:
- e9a6ab401 [fix] youtube - send CONSENT Cookie to not be redirected
- https://github.com/benbusby/whoogle-search/issues/311
- https://github.com/benbusby/whoogle-search/issues/243
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since we added
- 1c67b6aec [enh] google engine: supports "default language"
there is a KeyError: 'hl in request,error pattern::
ERROR:searx.searx.search.processor.online:engine google news : exception : 'hl'
Traceback (most recent call last):
File "searx/search/processors/online.py", line 144, in search
search_results = self._search_basic(query, params)
File "searx/search/processors/online.py", line 118, in _search_basic
self.engine.request(query, params)
File "searx/engines/google_news.py", line 97, in request
if lang_info['hl'] == 'en':
KeyError: 'hl'
Closes: https://github.com/searxng/searxng/issues/154
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Before this commit, there are 3 node_modules directory:
* one in .
* two others in ./searx/statics/themes/*
This is no desirable:
* it declares the npm depdenencies in the shell script.
* dependabot can't updates theses dependencies.
* this is a not standard way to build a package (two different locations for the dependencies).
With this commit and the PR #150 there is one unique node_modules directory per theme.
This file is generated by webfont.
* It is now generated as searx/static/themes/simple/ion.less
* It is generated before the .less compilation.
* .gitignore includes this file
Add two new package depedencies: fontforge ttfautohint
See utils/searx.sh
the build of the themes updates:
* js/leaflet.js ( was leaflet/leaflet.js )
* css/leaflet.css ( was leaflet/leaflet.css )
* css/images ( was leaflet/images )
Same behaviour behaviour than Whoogle [1]. Only the google engine with the
"Default language" choice "(all)"" is changed by this patch.
When searching for a locate place, the result are in the expect language,
without missing results [2]:
> When a language is not specified, the language interpretation is left up to
> Google to decide how the search results should be delivered.
The query parameters are copied from Whoogle. With the ``all`` language:
- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add a ``Accept-Language`` HTTP header.
The new signature of function ``get_lang_info()`` is:
lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)
Argument ``supported_any_language`` is True for google.py and False for the other
google engines. With this patch the function now returns:
- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
- ``lang_info['subdomain']``
- ``lang_info['country']``
- ``lang_info['language']``
[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
render automatically adds these variables to the template context:
* advanced_search
* all_categories
* categories
before render was checking if the variable was already set
but it is actually never set by the callers