Currently, searx.search.Search calls on_result once the engine results have been merged
(ResultContainer.order_results).
on_result plugins can rewrite the results: once the URL(s) are modified, even if
the results could be merged, they won't be, since ResultContainer.order_results
has already been called.
This commit calls on_result for each result of each engine.
In addition, the on_result function can return False to remove the result.
Note: the on_result function now runs on the engine thread instead of the Flask thread.
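A minimal sketch of such a plugin (hypothetical example; the hostname and the
URL rewrite are illustrative only)::

    def on_result(request, search, result):
        # Runs per result of each engine, before the merge in
        # ResultContainer.order_results, so a rewritten URL still
        # takes part in merging.
        result['url'] = result['url'].replace('http://', 'https://')
        # Returning False removes the result.
        if 'spam.example.org' in result['url']:
            return False
        return True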
Closes: #298
This is a workaround: inside engine code, any call to a function in another engine
can crash, since the logger won't be initialized unless it is done explicitly.
Instead of raising an exception and thereby hiding all results of the engine,
it makes sense to remove that requirement in order to allow the implementation of
search engines that do not always have a description. In fact, some search
engines that have a description in 99% of cases, like Brave Search or Mojeek,
crash completely if for some reason they include a result with no description.
To test this patch try Mojeek:
!mjk xyz
before and after the patch.
Suggested-by: 0xhtml in https://github.com/searx/searx/discussions/2933
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Suggestions should be added too::

    suggestion_xpath: //div[@class="text-gray h6"]/a
You can try it with:
!brave recurzuoin
Suggested-by: @allendema in https://github.com/searx/searx/issues/2857#issuecomment-904837023
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Pylint 2.10 fixed [1]:
Fixed bug with cell-var-from-loop checker: it no longer has false negatives
when both unused-variable and used-before-assignment are disabled.
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.10.html
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Pylint 2.10 added new default checks [1]:
use-list-literal
Emitted when list() is called with no arguments instead of using []
use-dict-literal
Emitted when dict() is called with no arguments instead of using {}
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.10.html
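For illustration, the patterns flagged by the new checks::

    bad_list = list()   # use-list-literal: prefer []
    bad_dict = dict()   # use-dict-literal: prefer {}

    good_list = []
    good_dict = {}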
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- Add ``# lint: pylint`` header to pylint this python file.
- Fix issues reported by pylint.
- Add source code documentation of module searx.locales
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In commit 4d3f2f48d commonly used languages have been dropped. By reducing the
number of `min_engines_per_lang` from 15 to 13, we get these languages back.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This commit removes the need to update the brand for GIT_URL and GIT_BRANCH:
they are read from the git repository.
It is possible to call `python -m searx.version freeze` to freeze the current version.
This is useful when the code is installed outside git (distro package, docker, etc.).
Denying formats was implemented in 6ed4616d.
To harden SearXNG instances by default, formats other than HTML should be
denied. Most JSON, RSS and CSV requests come from bots [1]::
Bots are the only users of this feature on a public instance, and they abuse
it so much that the engines quickly rate limit the IP address of the
instance.
[1] https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Do not merge this patch in master branch of SearXNG! This branch exists only
for testing the feature branch fix-searx.sh @return42.
This patch changes the buildenv to::
GIT_URL='https://github.com/return42/searxng'
GIT_BRANCH='fix-searx.sh'
SEARX_PORT='7777'
SEARX_BIND_ADDRESS='127.0.0.12'
To test installation procedure, clone feature branch (fix-searx.sh)::
$ cd ~/Downloads
$ git clone --branch fix-searx.sh https://github.com/return42/searxng searxng
$ cd searxng
$ ./utils/searx.sh install all
...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Not all settings from the 'brand:' section of the YAML files are needed in the
shell scripts. This patch reduces the variables in ./utils/brand.env to what is
needed. The following ('brand:' settings) can be removed from this file:
- ISSUE_URL
- DOCS_URL
- PUBLIC_INSTANCES
- WIKI_URL
Tasks running outside of an *installed instance* need the following settings
from the YAML configuration:
- GIT_URL <--> brand.git_url
- GIT_BRANCH <--> brand.git_branch
- SEARX_URL <--> server.base_url (aka PUBLIC_URL)
- SEARX_PORT <--> server.port
- SEARX_BIND_ADDRESS <--> server.bind_address
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
modified docs/admin/engines/settings.rst
- Fix documentation and add section 'brand'.
- Add remarks about **buildenv** variables.
- Add remarks about settings from environment variables $SEARX_DEBUG,
$SEARX_PORT, $SEARX_BIND_ADDRESS and $SEARX_SECRET
modified docs/admin/installation-searx.rst & docs/build-templates/searx.rst
Fix template location /templates/etc/searx/settings.yml
modified docs/dev/makefile.rst
Add description of the 'make buildenv' target and describe
- all SearXNG setups are centralized in the settings.yml file
- why some tasks need a utils/brand.env (aka instance's buildenv)
modified manage
The settings file from the repository's working tree is used by default;
the user is asked if a /etc/searx/settings.yml file exists.
modified searx/settings.yml
Add comments about when it is needed to run 'make buildenv'
modified searx/settings_defaults.py
Default for server:port is taken from environment variable SEARX_PORT.
modified utils/build_env.py
- Some defaults in the settings.yml are taken from the environment,
  e.g. SEARX_BIND_ADDRESS (searx.settings_defaults.SHEMA). When the
  'brand.env' file is created, these environment variables should be
  unset first.
- The CONTACT_URL environment variable is not needed in the utils/brand.env
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
All usages of the searx.brand namespace have been removed; the searx.brand
namespace is no longer needed.
The searx.brand namespace was an interim solution which has been added in commit
9e53470b4, see commit message there ...
This patch is a first 'proof of concept'. Later we can decide to remove the
brand namespace entirely or not.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the templates and the /config (JSON) the usage of the 'brand.*' name
space is replaced by 'searx.get_setting' function.
- new_issue_url --> get_setting('brand.new_issue_url')
- brand.GIT_URL --> get_setting('brand.git_url')
- brand.PUBLIC_INSTANCES --> get_setting('brand.public_instances')
- brand.DOCS_URL --> get_setting('brand.docs_url')
- brand.ISSUE_URL --> get_setting('brand.issue_url')
- brand.CONTACT_URL --> get_setting('general.contact_url', '')
The macro 'new_issue' from searx/templates/*/messages/no_results.html
is now imported with context::
{% from '__common__/new_issue.html' import new_issue with context %}
To get *public instances URL* from context's 'get_setting()' function::
get_setting('brand.public_instances','')
Macro's prototype does no longer need the 'new_issue_url' argument and has been
changed to::
macro new_issue(engine_name, engine_reliability)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Added function searx.get_setting(name, default=_unset):
Returns the value to which ``name`` points. If there is no such name in the
settings and the ``default`` is unset, a KeyError exception is raised.
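A rough sketch of the described semantics (the real implementation may differ;
the stand-in settings dict is illustrative)::

    _unset = object()

    # stand-in for the YAML settings loaded by searx
    settings = {'brand': {'git_url': 'https://example.org/searx.git'}}

    def get_setting(name, default=_unset):
        value = settings
        for key in name.split('.'):
            if not isinstance(value, dict) or key not in value:
                if default is _unset:
                    raise KeyError(name)
                return default
            value = value[key]
        return value

    get_setting('brand.git_url')            # -> value from the settings
    get_setting('general.contact_url', '')  # -> '' (default, no KeyError)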
In all the python processes ..
- make docs
- make buildenv
- make install (setup.py)
the usage of the 'brand.*' name space is replaced by 'searx.get_setting'
function.
- brand.SEARX_URL --> get_setting('server.base_url')
- brand.GIT_URL --> get_setting('brand.git_url')
- brand.GIT_BRANCH --> get_setting('brand.git_branch')
- brand.ISSUE_URL --> get_setting('brand.issue_url')
- brand.DOCS_URL --> get_setting('brand.docs_url')
- brand.PUBLIC_INSTANCES --> get_setting('brand.public_instances')
- brand.CONTACT_URL --> get_setting('general.contact_url', '')
- brand.WIKI_URL --> get_setting('brand.wiki_url')
- brand.TWITTER_URL --> get_setting('brand.twitter_url', '')
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Qwant is a fast and reliable search engine and AFAIK there is no CAPTCHA. Let
us enable Qwant engines by default.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The implementation uses the Qwant API (https://api.qwant.com/v3). The API is
undocumented but can be reverse engineered by reading the network log of
https://www.qwant.com/ queries.
This implementation is used by different qwant engines in the settings.yml::
- name: qwant
categories: general
...
- name: qwant news
categories: news
...
- name: qwant images
categories: images
...
- name: qwant videos
categories: videos
...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The engine was added in commit a4b07460 but now it shows new issues [1].
In the '90s of the last century, dogpile had its own web index, but nowadays it
is a meta-search engine [2]:
Powered by technology, Dogpile returns all the best results from leading
search engines including Google and Yahoo!
Using dogpile as an engine in SearXNG needs more investigation; an XPath solution
like the one we have is not enough. It is questionable whether it still makes sense
to investigate more into a meta-search engine with a ReCAPTCHA in front.
With this patch the dogpile engine is removed.
[1] https://github.com/searxng/searxng/issues/202
[2] https://www.dogpile.com/support/aboutus
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
An engine just for podcasts: an API which returns podcasts and their info like
website, author, etc.
Upstream query example: https://gpodder.net/search.json?q=linux
Added synonyme.woxikon.de using the XPath engine. This adds a site which returns
word synonyms, though only in German.
Depending on the query, not all synonyms are shown because the XPath selection
is not optimal, but it should do the job just fine.
Upstream example query: https://synonyme.woxikon.de/synonyme/test.php
BTW: add an about section to the YAML configuration
It now shows descriptions with their correct URLs when there are videos in the
search results, pulling content_xpath from snippet-description instead of
snippet-content.
Suggested-by: @eagle-dogtooth https://github.com/searx/searx/issues/2857#issuecomment-869119968
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Upgrading from pylint v2.8.3 to 2.9.3 raises some new issues::
searx/search/checker/__main__.py:37:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
searx/search/checker/__main__.py:38:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
searx/search/processors/__init__.py:20:0: R0402: Use 'from searx import engines' instead (consider-using-from-import)
searx/preferences.py:182:19: C0207: Use data.split('-', maxsplit=1)[0] instead (use-maxsplit-arg)
searx/preferences.py:506:15: R1733: Unnecessary dictionary index lookup, use 'user_setting' instead (unnecessary-dict-index-lookup)
searx/webapp.py:436:0: C0206: Consider iterating with .items() (consider-using-dict-items)
searx/webapp.py:950:4: C0206: Consider iterating with .items() (consider-using-dict-items)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Compressing saved preferences in the URL was introduced in 5f758b2d3 and
slightly fixed in 8f4401462. But the main failure was not fixed: the decompress
function returns a binary string, and this binary string should first be decoded
to a string before it is passed to urllib.parse.parse_qs.
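A small, self-contained illustration of the decode step (zlib stands in for the
actual compression scheme used for the preferences URL)::

    import zlib
    from urllib.parse import parse_qs, urlencode

    blob = zlib.compress(urlencode({'theme': 'simple'}).encode())

    # decompress() returns bytes: decode to str before parse_qs()
    preferences = parse_qs(zlib.decompress(blob).decode())
    # -> {'theme': ['simple']}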
BTW: revert the hot-fix from 5973491
Related-to: https://github.com/searxng/searxng/issues/166
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- move jshint option from gruntfile to .jshintrc
- remove trailing-whitespace from gruntfile and
- add jshint esversion: 6
- .dir-locals.el add locals for js-mode to use JSHint from the simple theme
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch disables the rule 'no-descending-specificity'. IMO it is better to have
this rule active (see below [1]), but it is hard to rewrite the less files to
pass this rule, so for now I chose to disable it.
---
Source order is important in CSS, and when two selectors have the same
specificity, the one that occurs last will take priority. However, the situation
is different when one of the selectors has a higher specificity. In that case,
source order does not matter: the selector with higher specificity will win out
even if it comes first.
The clashes of these two mechanisms for prioritization, source order and
specificity, can cause some confusion when reading stylesheets. If a selector
with higher specificity comes before the selector it overrides, we have to think
harder to understand it, because it violates the source order
expectation. Stylesheets are most legible when overriding selectors always come
after the selectors they override. That way both mechanisms, source order and
specificity, work together nicely.
This rule enforces that practice as best it can, reporting fewer errors than it
should. It cannot catch every actual overriding selector, but it can catch
certain common mistakes.
[1] https://stylelint.io/user-guide/rules/list/no-descending-specificity/
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This fix was autogenerated by::
npx stylelint -f unix --fix 'searx/static/themes/simple/src/less/**/*.less'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, google-news requests are redirected to confirm and get a CONSENT
Cookie from https://consent.google.de/s?continue=...
This patch adds a CONSENT Cookie to the google-news request to avoid
redirection.
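A sketch of the request hook (the cookie value follows the news.google.com
sample listed below; the exact engine code may differ)::

    from datetime import datetime
    from urllib.parse import urlencode

    def request(query, params):
        params['url'] = 'https://news.google.com/search?' + urlencode({'q': query})
        # avoid the redirect to consent.google.*
        params['cookies']['CONSENT'] = 'YES+cb.%s-14-p0.de+FX+816' % (
            datetime.now().strftime('%Y%m%d'))
        return params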
The behavior of the CONSENT cookies across all google engines seems similar, but
the pattern is not yet fully clear to me; here are some random samples from my
analysis ..
Using common google search from different domains::
google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
google.fr: CONSENT=YES+srp.gws-{{date}}-0-RC2.fr+FX+826
When searching about videos (google-videos)::
google.es: CONSENT=YES+srp.gws-{{date}}-0-RC2.es+FX+076
google.de: CONSENT=YES+srp.gws-{{date}}-0-RC2.de+FX+171
Google news has only one domain for all languages::
news.google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
Using google-scholar search from different domains::
scholar.google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
scholar.google.fr: does not use such a cookie / did not ask the user
scholar.google.es: does not use such a cookie / did not ask the user
Interim summary:
Pattern is unclear and I won't apply the CONSENT cookie to all google engines.
More experience is needed before we generalize the CONSENT cookies over all
google engines.
Related:
- e9a6ab401 [fix] youtube - send CONSENT Cookie to not be redirected
- https://github.com/benbusby/whoogle-search/issues/311
- https://github.com/benbusby/whoogle-search/issues/243
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since we added
- 1c67b6aec [enh] google engine: supports "default language"
there is a KeyError 'hl' in request; error pattern::
ERROR:searx.searx.search.processor.online:engine google news : exception : 'hl'
Traceback (most recent call last):
File "searx/search/processors/online.py", line 144, in search
search_results = self._search_basic(query, params)
File "searx/search/processors/online.py", line 118, in _search_basic
self.engine.request(query, params)
File "searx/engines/google_news.py", line 97, in request
if lang_info['hl'] == 'en':
KeyError: 'hl'
Closes: https://github.com/searxng/searxng/issues/154
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Before this commit, there are 3 node_modules directories:
* one in .
* two others in ./searx/static/themes/*
This is not desirable:
* it declares the npm dependencies in the shell script.
* dependabot can't update these dependencies.
* this is not a standard way to build a package (two different locations for the dependencies).
With this commit and the PR #150 there is one unique node_modules directory per theme.
This file is generated by webfont.
* It is now generated as searx/static/themes/simple/ion.less
* It is generated before the .less compilation.
* .gitignore includes this file
Add two new package dependencies: fontforge, ttfautohint
See utils/searx.sh
the build of the themes updates:
* js/leaflet.js ( was leaflet/leaflet.js )
* css/leaflet.css ( was leaflet/leaflet.css )
* css/images ( was leaflet/images )
Same behaviour as Whoogle [1]. Only the google engine with the
"Default language" choice "(all)" is changed by this patch.
When searching for a local place, the results are in the expected language,
without missing results [2]:
> When a language is not specified, the language interpretation is left up to
> Google to decide how the search results should be delivered.
The query parameters are copied from Whoogle. With the ``all`` language:
- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add a ``Accept-Language`` HTTP header.
The new signature of function ``get_lang_info()`` is:
lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)
Argument ``supported_any_language`` is True for google.py and False for the other
google engines. With this patch the function now returns:
- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
- ``lang_info['subdomain']``
- ``lang_info['country']``
- ``lang_info['language']``
[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
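A hedged sketch of how an engine's request() consumes the new return values
(``lang_list`` and ``custom_aliases`` are the engine module's globals from the
signature above)::

    from urllib.parse import urlencode

    def request(query, params):
        lang_info = get_lang_info(params, lang_list, custom_aliases, True)
        params['url'] = 'https://' + lang_info['subdomain'] + '/search?' \
            + urlencode({'q': query, **lang_info['params']})
        params['headers'].update(lang_info['headers'])
        return params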
render automatically adds these variables to the template context:
* advanced_search
* all_categories
* categories
Previously, render checked whether the variable was already set,
but it is actually never set by the callers.
Based on commit:
- a89b823f [mod] remove overpass API call
this patch is generated by::
make themes.all
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Loading an engine should not exit the application (*). Instead
of exit, return None.
(*) A RuntimeError still exits the application: syntax error, etc...
BTW: add documentation and normalize indentation (no functional change)
Suggested-by: @dalf https://github.com/searxng/searxng/pull/116#issuecomment-851865627
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Slightly modified merge of commit [1cb1d3ac] from searx [PR 2543]:
This adds Docker Hub .. as a search engine .. the engine's favicon was
downloaded from the Docker Hub website with wget and converted to a PNG
with ImageMagick .. It supports the parsing of URLs, titles, content,
published dates, and thumbnails of Docker images.
[1cb1d3ac] https://github.com/searx/searx/pull/2543/commits/1cb1d3ac
[PR 2543] https://github.com/searx/searx/pull/2543
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Access to formats can be denied by settings configuration::
search:
formats: [html, csv, json, rss]
Closes: https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To test & demonstrate this implementation download:
https://liste.mediathekview.de/filmliste-v2.db.bz2
and unpack into searx/data/filmliste-v2.db, in your settings.yml define a sqlite
engine named "demo"::
- name : demo
engine : sqlite
shortcut: demo
categories: general
result_template: default.html
database : searx/data/filmliste-v2.db
query_str : >-
SELECT title || ' (' || time(duration, 'unixepoch') || ')' AS title,
COALESCE( NULLIF(url_video_hd,''), NULLIF(url_video_sd,''), url_video) AS url,
description AS content
FROM film
WHERE title LIKE :wildcard OR description LIKE :wildcard
ORDER BY duration DESC
disabled : False
Query to test: "!demo concert"
This is a rewrite of the implementation from commit [1]
[1] searx/searx@8e90a21
Suggested-by: @virtadpt searx/searx#2808
No functional change, just some linting.
- fix messages from pylint (see below)
- log where general Exceptions are caught (broad-except)
- normalized various indentation
- To avoid clashes with common names, add prefix 'route_' to all @app.route
decorated functions.
Fixed messages::
searx/webapp.py:744:0: C0301: Line too long (146/120) (line-too-long)
searx/webapp.py:756:0: C0301: Line too long (132/120) (line-too-long)
searx/webapp.py:730:9: W0511: TODO, check if timezone is calculated right (fixme)
searx/webapp.py:1:0: C0114: Missing module docstring (missing-module-docstring)
searx/webapp.py:126:8: I1101: Module 'setproctitle' has no 'setthreadtitle' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
searx/webapp.py:126:36: W0212: Access to a protected member _name of a client class (protected-access)
searx/webapp.py:131:4: R1722: Consider using sys.exit() (consider-using-sys-exit)
searx/webapp.py:141:4: R1722: Consider using sys.exit() (consider-using-sys-exit)
searx/webapp.py:255:38: W0621: Redefining name 'request' from outer scope (line 32) (redefined-outer-name)
searx/webapp.py:307:4: W0702: No exception type(s) specified (bare-except)
searx/webapp.py:374:24: W0621: Redefining name 'theme' from outer scope (line 155) (redefined-outer-name)
searx/webapp.py:420:8: R1705: Unnecessary "else" after "return" (no-else-return)
searx/webapp.py:544:4: W0621: Redefining name 'preferences' from outer scope (line 917) (redefined-outer-name)
searx/webapp.py:551:4: W0702: No exception type(s) specified (bare-except)
searx/webapp.py:566:15: W0703: Catching too general exception Exception (broad-except)
searx/webapp.py:613:4: R1705: Unnecessary "elif" after "return" (no-else-return)
searx/webapp.py:690:8: W0621: Redefining name 'search' from outer scope (line 661) (redefined-outer-name)
searx/webapp.py:661:0: R0914: Too many local variables (22/20) (too-many-locals)
searx/webapp.py:674:8: R1705: Unnecessary "else" after "return" (no-else-return)
searx/webapp.py:697:11: W0703: Catching too general exception Exception (broad-except)
searx/webapp.py:748:4: R1705: Unnecessary "elif" after "return" (no-else-return)
searx/webapp.py:661:0: R0911: Too many return statements (9/6) (too-many-return-statements)
searx/webapp.py:661:0: R0912: Too many branches (29/12) (too-many-branches)
searx/webapp.py:661:0: R0915: Too many statements (74/50) (too-many-statements)
searx/webapp.py:931:4: W0621: Redefining name 'image_proxy' from outer scope (line 1072) (redefined-outer-name)
searx/webapp.py:946:4: W0621: Redefining name 'stats' from outer scope (line 1132) (redefined-outer-name)
searx/webapp.py:917:0: R0914: Too many local variables (34/20) (too-many-locals)
searx/webapp.py:917:0: R0912: Too many branches (19/12) (too-many-branches)
searx/webapp.py:917:0: R0915: Too many statements (65/50) (too-many-statements)
searx/webapp.py:1063:44: W0621: Redefining name 'preferences' from outer scope (line 917) (redefined-outer-name)
searx/webapp.py:1072:0: R0911: Too many return statements (9/6) (too-many-return-statements)
searx/webapp.py:1151:4: C0103: Variable name "SORT_PARAMETERS" doesn't conform to '(([a-z][a-zA-Z0-9_]{2,30})|(_[a-z0-9_]*)|([a-z]))$' pattern (invalid-name)
searx/webapp.py:1297:0: R1721: Unnecessary use of a comprehension (unnecessary-comprehension)
searx/webapp.py:1303:0: C0103: Argument name "e" doesn't conform to '(([a-z][a-zA-Z0-9_]{2,30})|(_[a-z0-9_]*))$' pattern (invalid-name)
searx/webapp.py:1303:19: W0613: Unused argument 'e' (unused-argument)
searx/webapp.py:1338:23: W0621: Redefining name 'app' from outer scope (line 162) (redefined-outer-name)
searx/webapp.py:1318:0: R0903: Too few public methods (1/2) (too-few-public-methods)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
pylint message: wrong-import-order
Respect PEP8 import order (standard imports first, then third-party libraries,
then local imports).
pylint message: wrong-import-position
Do not mix code & imports
BTW:
- only one import per line
- replace licence text by SPDX tag
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Remove the extension of sys.path (aka PYTHONPATH). Running the instance directly
from the repository's folder is a relic from the early beginning in
2014 (fd651083f) and is no longer supported.
Since commit dd46629 was merged, the command line 'searx-run' exists and should
be used.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- Use result 'alt_description' as title, if not given use
default title 'unknown'.
- Use result 'description' from unsplash as 'content'
Fix error::
DEBUG:searx:result: invalid title: {..., 'title': None, 'content': '', 'engine': 'unsplash'}
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
No functional change!
- fix messages from pylint
- add ``global THREADLOCAL``
- normalized various indentation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Remove 'template' from result. Engine genius should
not use the video template. BTW: fix indentations
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- pylint searx/engines/xpath.py
- fix indentation of some long lines
- add logging
- add doc-strings
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- init stat values with None
- drop round_or_none
- don't try to get percentage if base is 'None'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Add a new environment variable SEARX_DISABLE_ETC_SETTINGS
to disable loading of /etc/searx/settings.yml
unit tests:
* set SEARX_DISABLE_ETC_SETTINGS to 1
* remove SEARX_SETTINGS_PATH if it exists
Based on commits
- 0507e185 [fix] bar graph and rename CSS class engine-scores -> engine-score
- 3e9ad7ae [fix] make /stats more CSP compliant - github issue form
- 34859d0e [fix] make /stats more CSP compliant - oscar theme
- 0a6c4884 [fix] make /stats more CSP compliant - simple theme
- cdfb4b7f [fix] make /stats more CSP compliant - bar graph
- 965817f2 [fix] simple theme - generate missing sourceMap file
this patch is generated by::
make themes.all
Reported-by: https://github.com/searxng/searxng/issues/57
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- drop #main_stats selector in stats.less
- 'engine-score' exists before this PR.
- untabify searx/static/themes/__common__/less/stats.less
for details see comment at: d93bec7638..1204e4f07e (r633571496)
Suggested-by: @dalf in commit 1204e4f0
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Upgraded [v3.3.0], otherwise::
    width: calc(100% - 5rem);
becomes ``width: 95%`` once compiled by less version 1.4.1.
[v3.3.0] https://github.com/gruntjs/grunt-contrib-less/releases/tag/v3.0.0
Suggested-by: @dalf in commit 1204e4f0
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Make 'soft_max_redirects' configurable per Xpath engine::
- name : <engine-name>
engine : xpath
soft_max_redirects: 1
...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
It prepares the new architecture change:
everything about multithreading is moved into the searx.search.* packages.
Previously the call to the "init" function of the engines was done in searx.engines:
* the network was not set (requests were not sent using the defined proxy)
* it required monkey patching the code to avoid HTTP requests during the tests
* [mod] option to enable or disable "proxy" button next to each result
Closes: https://github.com/searxng/searxng/issues/51
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: Alexandre Flament <alex@al-f.net>
When there is at least one error or one failed checker test:
* the warning icon is displayed in the reliability column
* the link "View error logs and submit a bug report" is displayed on engine name tooltip.
Before:
* the warning icon was displayed only when one or more checker test(s) failed.
* the link "View error logs and submit a bug report" was not shown when a checker test failed but there were no error.
In the preference page, in the 'about' toolbox of an engine, add a link to the
stats page of the engine, if the engine had one or more errors.
Condition is::
reliabilities[<engine.name>].errors
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Inline styles are blocked by default with Content Security Policy (CSP). Move
the inline styles from 'new_issue.html' to::
searx/static/themes/__common__/less/new_issue.less
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
File searx/static/themes/simple/less/stats.less is not used (imported) in any
other less file. I can't say when its usage was dropped or if it has ever been
used. ATM this file is without any usage.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* searx.network.client.LOOP is initialized in a thread
* searx.network.__init__ imports LOOP which may happen
before the thread has initialized LOOP
This commit adds a new function "searx.network.client.get_loop()"
to fix this issue.
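The pattern, roughly (a sketch, not the exact searx.network.client code)::

    import asyncio
    import threading

    LOOP = None
    _loop_ready = threading.Event()

    def _run_loop():
        global LOOP
        LOOP = asyncio.new_event_loop()
        _loop_ready.set()          # LOOP is now initialized
        LOOP.run_forever()

    threading.Thread(target=_run_loop, daemon=True).start()

    def get_loop():
        # block until the thread has initialized LOOP
        _loop_ready.wait()
        return LOOP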
The issue exists only in the debug log::
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.9/logging/__init__.py", line 1086, in emit
stream.write(msg + self.terminator)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 79-89: ordinal not in range(128)
Call stack:
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 2464, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/searx/searx-src/searx/webapp.py", line 1316, in __call__
return self.app(environ, start_response)
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/werkzeug/middleware/proxy_fix.py", line 169, in __call__
return self.app(environ, start_response)
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/searx/searx-src/searx/webapp.py", line 766, in search
number_of_results=format_decimal(number_of_results),
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask_babel/__init__.py", line 458, in format_decimal
locale = get_locale()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask_babel/__init__.py", line 226, in get_locale
rv = babel.locale_selector_func()
File "/usr/local/searx/searx-src/searx/webapp.py", line 249, in get_locale
logger.debug("%s uses locale `%s` from %s", request.url, locale, locale_source)
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- add to list of pylint scripts
- add debug log messages
- move API key into `settings.yml`
- improved readability
- add some metadata to results
Signed-off-by: Markus Heiser <markus@darmarit.de>
Springer Nature is a global publisher dedicated to providing services to the
research community [1], with an official API [2].
To test this PR, first get your API key following this page:
https://dev.springernature.com/signup
In searx/engines/springer.py at line 24, add this API key. I left my own key,
commented out in the line above. Feel free to use it, if needed.
[1] https://www.springernature.com/
[2] https://dev.springernature.com/
The new sci-hub URLs are coming from @aurora-vasiliev [1].
[1] https://github.com/searx/searx/pull/2706
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com
This patch adds a CONSENT Cookie to the youtube request to avoid redirection.
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
* display the median time instead of the average.
* add a "Reliability" column (sum up the metrics and the checker results).
* the "selected language", "SafeSearch", "Time range" values are displayed as "broken" when the checker tests fail.
Some errors won't stop the engine:
* additional HTTP redirects for example
* some invalid results
secondary=True allows flagging these errors as not important.
Report to the user suspended engines.
searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
Some engines set result.img_src, others return a result.thumbnail. If
result.img_src is unset and a result.thumbnail is given, show it in the UI.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.
BTW: added bandcamp's favicon
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
settings.yml:
* outgoing.networks:
  * can contain network definitions
  * properties: enable_http, verify, http2, max_connections, max_keepalive_connections,
    keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
  * retries: 0 by default, the number of times searx retries to send the HTTP request (using a different IP & proxy each time)
  * local_addresses can be "192.168.0.1/24" (it supports IPv6)
  * support_ipv4 & support_ipv6: both True by default
see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
  * either a full network description
  * or a reference to an existing network
* all HTTP requests of an engine use the same HTTP configuration (this was not the case before, see the proxy configuration in master)
This patch is an addition to PR #2656 which removed all usage of `base_url` from
the templates, except one was forgotten in the cookie URL of the preferences.
Closes: #2740
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
I can't set `default_doi_resolver` in `settings.yml` if I'm using
`use_default_settings`. Searx seems to try to interpret all settings at root
level in `settings.yml` as dict, which is correct except for
`default_doi_resolver` which is at root level and a string::
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 125, in load_settings
update_settings(default_settings, user_settings)
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 61, in update_settings
update_dict(default_settings[k], v)
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 48, in update_dict
for k, v in user_dict.items():
AttributeError: 'str' object has no attribute 'items'
Signed-off-by: Markus Heiser <markus@darmarit.de>
Suggested-by: @0xhtml https://github.com/searx/searx/issues/2722#issuecomment-813391659
The `url_for` function in the template context is not the one from Flask, it is
the one from `webapp`. The `webapp.url_for_theme` is different from its
namesake in Flask and has its quirks when called with the argument `_external=True`.
The `webapp.url_for_theme` can't handle absolute URLs since it strips a leading
'/'; here is the snippet of the old code::
url = url_for(endpoint, **values)
if settings['server']['base_url']:
if url.startswith('/'):
url = url[1:]
url = urljoin(settings['server']['base_url'], url)
The next drawback of (Flask's) `_external=True` is that it will not return the right
scheme when searx (the Flask app) listens on http and is proxied by an https
server.
To get the right scheme `HTTP_X_SCHEME` is needed by Flask (werkzeug). Since
this is not provided in every environment (e.g. behind Apache mod_wsgi or the
HTTP header is not fully set for some other reasons) it is recommended to
get *script_name*, *server* and *scheme* from the configured `base_url`. If
`base_url` is specified, then these values are given preference over
Flask's generics.
BTW this patch normalize to use `url_for` in the `opensearch.xml` and drop the
need of `host` and `urljoin` in template's context.
Signed-off-by: Markus Heiser <markus@darmarit.de>
Instead of a hard-coded `oadoi.org` default, use the default value from
`settings.yml`.
Fix an issue in the themes: The replacement 'current_doi_resolver' contains the
doi_resolver_url, not the name of the DOI resolver. Compare return value of::
searx.plugins.oa_doi_rewrite.get_doi_resolver(...)
Fix a typo in `get_doi_resolver(..)`: suggested by @kvch:
*L32 should set doi_resolver not doi_resolvers*
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
fr.wikipedia.org (and, it seems, no other wikipedia websites)
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example.)
The commit uses api_result['title'] instead.
The JSON response has been changed and contains HTML chunks, which is
not compatible with our JSON engine, so we have to switch to HTML/XPath
parsing.
The get_client_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.
This commit fetches the javascript URLs in reverse order: the client id is in the last javascript URL.
Added a line to the yacy entry to enable HTTP if the local yacy instance isn't using HTTPS. Otherwise, an error will be thrown in the logs: "No connection adapters were found for 'http://localhost:8090/yacysearch.json...'". This is likely related to ticket #2641 that forces HTTPS by default.
See https://github.com/requirejs/requirejs/issues/1816
requirejs loads one file: leaflet.
This commit:
* removes requirejs
* load leaflet using <script src...> HTML tag in searx/templates/oscar/base.html
Many things have changed since the last review of this engine. This patch fixes
XPath selectors, implements suggestions, and is a complete review / rewrite of the
engine.
Signed-off-by: Markus Heiser <markus@darmarit.de>
When initializing engines, a "SearxEngineResponseException" is logged very verbosely,
including full traceback information:
ERROR:searx.engines:yggtorrent engine: Fail to initialize
Traceback (most recent call last):
File "share/searx/searx/engines/__init__.py", line 293, in engine_init
init_fn(get_engine_from_settings(engine_name))
File "share/searx/searx/engines/yggtorrent.py", line 42, in init
resp = http_get(url, allow_redirects=False)
File "share/searx/searx/poolrequests.py", line 197, in get
return request('get', url, **kwargs)
File "share/searx/searx/poolrequests.py", line 190, in request
raise_for_httperror(response)
File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
raise_for_captcha(resp)
File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
raise_for_cloudflare_captcha(resp)
File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000
For SearxEngineResponseException this is not needed. Those types of exceptions
can be a normal use case. E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:
WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000
closes: #2612
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- unittest2 is a backport of the new features added to the unittest testing
framework in Python 2.7
- unittest2 was only needed in py2 and can be dropped now
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Bing has a list of regions that it supports and some of these regions
may have more than one possible language.
In some cases, like Switzerland, these languages are always shown as
options, so there is no issue. But in other cases, like Andorra, Bing
will only show one language at the time, either the region's default or
the request's language if the latter is supported by that region.
For example, if the HTTP request is in French, Andorra will appear as
fr-AD but if the same page is requested in any other language Andorra
will appear as ca-AD.
This is especially a problem when Bing assumes that the request is in
English, because it overrides enough language codes to make several major
languages like Arabic disappear from the languages.py file.
To avoid that issue, I set the Accept-Language header to a language
that's only supported in one region to hopefully avoid these overrides.
Use a SPARQL request on wikidata to get the list of currencies.
currencies.json contains the translation for all supported searx languages.
Supersedes #993
At the moment videos without a description are not shown - setting
default content to "" fixes this.
Another current bug is that thumbnails are not displayed. This is caused
by a double slash in the url. For this every trailing slash is now
stripped (for backwards compatibility) and the API response is correctly
parsed.
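A sketch of both fixes (field names are assumed from the API description, not
verified against the actual engine code)::

    def build_result(base_url, item):
        base_url = base_url.rstrip('/')  # avoid '//' in thumbnail URLs
        return {
            'url': base_url + item['path'],
            'title': item['title'],
            # videos without a description get empty content
            # instead of being dropped
            'content': item.get('description') or '',
            'thumbnail': base_url + item['thumbnailPath'],
        }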
* searx understands "!ddg !g time" as: send "!g time" to DDG
* !g is a DDG bang for Google: DDG returns an HTTP redirect to Google
This commit adds the allows_redirect param to not follow HTTP redirects.
The DDG engine returns an empty result as before, without the HTTP redirect.
On some queries (like an IT error message), wikipedia returns an HTTP error 400.
This commit returns an empty result instead of showing an error to the user.
Some JSON APIs return HTML in either the title or the content.
This commit adds two new parameters to the json_engine:
content_html_to_text and title_html_to_text, False by default.
If True, then searx.utils.html_to_text removes the HTML tags.
Update crossref, openairedatasets and openairepublications engines
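The helper already exists in searx.utils; for example::

    from searx.utils import html_to_text

    html_to_text('<p>Some <b>API</b> content &amp; markup</p>')
    # -> 'Some API content & markup'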
The duckduckgo engine requires an additional request after the results have been sent.
This commit makes sure that the second request uses the same HTTPAdapter
= the same IP address, and the same proxy.
The new version of MetaGer needs to reload the results (into an iframe) with a
unique tag (see HTML response below).
Implementing a dedicated metager-engine for searx makes no sense to me. The
great days of MetaGer seem to be over. I remember the good old days when this
project started in the '90s of the last century, but in the last few years it
has become more and more crap. As the name suggests, MetaGer was made for
Germans in the first place. They have added English and Spanish translations,
but the i18n is very poor compared to what searx offers.
It's a pity; let's drop MetaGer.
This is the first response, the id (b82679980656899ba5a17ffd02a56846) is unique
for each query:
$ curl "https://metager.org/meta/meta.ger3?eingabe=foo&submit-query=&focus=web"
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="/index.css?id=b82679980656899ba5a17ffd02a56846">
<script src="/index.js?id=b82679980656899ba5a17ffd02a56846"></script>
<title>foo - MetaGer</title>
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
</head>
<body>
<iframe id="mg-framed" src="https://metager.org/meta/meta.ger3?eingabe=foo&submit-query=&focus=web&mgv=b82679980656899ba5a17ffd02a56846" autofocus="true" onload="this.contentWindow.focus();"></iframe>
</body>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Some of our interface locales include uppercase country codes,
which are separated by `_` instead of the more common `-`.
Also, a browser's `Accept-Language` header could be in lowercase.
This commit attempts to normalize those cases so a browser's
language+country codes can better match with our locales.
This solution assumes that our UI locales have nothing more than
language and optionally country. If we ever add a script specific
locale like `zh-Hant-TW`, this would have to change to accommodate
that, but the idea would be pretty much the same as this fix.
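A sketch of the normalization under exactly that assumption (language plus
optional country, nothing more)::

    def normalize_locale(code):
        parts = code.replace('_', '-').split('-')
        lang = parts[0].lower()
        if len(parts) == 1:
            return lang
        return lang + '-' + parts[1].upper()

    normalize_locale('zh_TW')  # -> 'zh-TW'
    normalize_locale('pt-br')  # -> 'pt-BR'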
The language_support variable is set to True by default,
and set to False in only 5 engines.
Except for the documentation and the /config URL, this variable is not used.
This commit removes the variable definition in the engines and
sets the value according to the supported_languages length: False when the length is 0,
True otherwise.
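In other words, roughly (a sketch, not the exact code)::

    engine.language_support = len(getattr(engine, 'supported_languages', [])) > 0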
Closes: #2485
Avoid SearxEngineXPathException errors when parsing invalid results::
.//div[@class="yuRUbf"]//a/@href index 0 not found
Traceback (most recent call last):
File "./searx/engines/google.py", line 274, in response
url = eval_xpath_getindex(result, href_xpath, 0)
File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
BTW: fix indentation by 2 spaces
The additional tests have been commented out in the google engines to avoid
triggering any CAPTCHA issues.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
BTW: make the engines ready for search.checker:
- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The 'video.html' template from the 'oscar' design supports replacements
for *author* and *length*. Google-videos does not have an author; the
publisher info is used for the *author* instead.
Hint: these replacements are not supported by the 'simple' design.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
the query "time" is convinient because most of the search engine will return some results,
but some engines in the general category will return documentation about the HTML tags <time> or <input type="time">
Removes module searx/brand.py and creates a namespace at searx.brand.
This patch is a first 'proof of concept'. Later we can decide to remove the
brand namespace entirely or not.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Without this commit the module searx checks the secret_key value.
With this commit, make docs, utils/standalone_searx.py and
utils/fetch_firefox_version.py work without SEARX_DEBUG=1.
For reference see https://github.com/searx/searx/pull/2386
from_bang is True when the user query contains a bang.
In this case the category is also set to 'none'.
from_bang's only usage was in searx.webadapter.parse_specific:
if from_bang is True, then the EngineRef category is ignored and forced to 'none'.
This commit also removes the searx.webadapter.parse_specific function.
See searx.search.processors.abstract.EngineProcessor:
first searx calls the get_params method.
If the return value is not None, then searx calls the search method.
Check the HTTP response:
* detect some common CAPTCHA challenges (no solving). In this case the engine is suspended for a long time.
* otherwise raise HTTPError as before
The check is done in poolrequests.py (it was in search.py before).
Update qwant, wikipedia and wikidata to use raise_for_httperror instead of raise_for_status.
According to
820b468bfe/searx/engines/__init__.py (L87-L88)
an engine can have no category at all.
Without this commit, searx raises an exception in searx/results.py.
Note: in this case, the engine is not shown in the preferences.
Before commit 58d72f2, category was not set in xpath.py,
so searx/engines/__init__.py was setting the category to ['general'].
Commit 58d72f2 sets the category to [], which is not replaced by searx/engines/__init__.py.
Consequence: the mojeek engine is hidden in the preferences.
This commit reverts the xpath.py change.
Closes: #2368
Add a new parameter "raise_for_status", set by default to True.
When True, any HTTP status code >= 300 raises an exception (#2332).
When False, the engine can manage the HTTP status code by itself.
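A sketch of how an engine could use the new parameter (``search_url`` and
``parse_response`` are hypothetical names, not from the actual code)::

    def request(query, params):
        params['url'] = search_url.format(query=query)
        params['raise_for_status'] = False  # handle status codes ourselves
        return params

    def response(resp):
        if resp.status_code >= 300:
            return []  # e.g. return empty results instead of erroring
        return parse_response(resp)  # hypothetical parser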
- strip html tags and superfluous quotation marks from content
- remove not needed cookie from request
- remove superfluous imports
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Error pattern::
Engines cannot retrieve results:
digg (unexpected crash time data '2020-10-16T14:09:55Z' does not match format '%Y-%m-%d %H:%M:%S')
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
recoll is a local search engine based on Xapian:
http://www.lesbonscomptes.com/recoll/
By itself recoll does not offer web or API access;
this can be achieved using recoll-webui:
https://framagit.org/medoc92/recollwebui.git
This engine uses a custom 'files' result template
set `base_url` to the location where recoll-webui can be reached
set `dl_prefix` to a location where the file hierarchy as indexed by recoll can be reached
set `search_dir` to the part of the indexed file hierarchy to be searched, use an empty string to search the entire search domain
This change is backward compatible with existing configurations.
If a settings.yml is loaded from a user-defined location (SEARX_SETTINGS_PATH or /etc/searx/settings.yml),
then these settings can rely on the default settings.yml with this option::
use_default_settings: True
DeviantArt's request and response forms have been changed.
- fixed title
- fixed time_range_dict to 'popular-*-***'
- use image from <noscript> if exists
- drop obsolete "http to https, remove domain sharding"
- use query URL https://www.deviantart.com/search/deviations?page=5&q=foo
- add searx/engines/deviantart.py to pylint check (test.pylint)
Error pattern::
DEBUG:searx:result: invalid title: {'url': 'https://www.deviantart.com/ ...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
* URL / : the index page displayed the selected or the default category.
* URL / : when the q parameter is set using the URL, the redirect includes the URL query.
* URL /search : an empty query doesn't raise an exception.
This makes it easier to separately handle search and index requests
from a web server or from a reverse proxy.
If a request to index contains a query, a permanent redirect HTTP response
is returned. This should give some level of backwards compatibility
for users that have set a searx instance in their browser's search bar.
Xpath engine and results template changed to account for the fact that
archive.org doesn't cache .onions, though some onion engines might have
their own cache.
Disabled by default. Can be enabled by setting the SOCKS proxies to
wherever Tor is listening and setting using_tor_proxy as True.
Requires Tor and updating packages.
To avoid manually adding the timeout on each engine, you can set
extra_proxy_timeout to account for Tor's (or whatever proxy used) extra
time.
- remove paging support: a "vqd" parameter is required between each request; this parameter is unique for each request
- update the URL (no redirect), use the POST method
- language support: works if there is no more than one request per minute, otherwise it is ignored!
* Fix "?q=test&engines=wikipedia": fix exception
* Fix "?q=test&engines=wikipedia&categories=images": now the engines from images category are included.
* Fix parse_timeout: make sure a value is always returned
* Various typing fixes (searx.webadapter, searx.search.SearchQuery)
When the user add searx as a search engine, the browser loads the /opensearch.xml URL without the cookies.
Without the query parameters, the user preferences are ignored (method and autocomplete).
In addition, opensearch.xml is modified to support automatic updates,
see https://developer.mozilla.org/en-US/docs/Web/OpenSearch
Always initialize engines, except on the first run of werkzeug with the reload feature.
the reload feature is activated when:
* searx_debug is True (SEARX_DEBUG environment variable or settings.yml)
* FLASK_APP=searx/webapp.py FLASK_ENV=development flask run (see https://flask.palletsprojects.com/en/1.1.x/cli/ )
Fix SEARX_DEBUG=0 make docs
docs/admin/engines.rst : engines are initialized
See https://github.com/searx/searx/issues/2204#issuecomment-701373438
Since October 1, 2020, google has changed the 'class' attribute of the HTML
result page.
Fix the xpath expressions and ignore <div class="g" ../> sections which do not
match to title's xpath expression.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
requests 2.24.0 uses the ssl module, except if it doesn't support SNI; in this case searx falls back to pyopenssl.
searx logs a critical message and exits if the ssl module doesn't support SNI and pyOpenSSL is not installed.
searx logs a critical message and exits if the ssl version is older than 1.0.2.
in requirements.txt, pyopenssl is still required to install searx as a fallback.
was previously a Dict with two or three keys: name, category, from_bang
make clear that this is an engine reference (see tests/unit/test_search.py for example)
all variables using this class are renamed accordingly.
* Log each call to get_locale: display the URL, the locale and the source (browser, preferences, form).
* Rename _get_browser_language to _get_browser_or_settings_language to match the actual code.
AJAX requests send the X-Requested-With HTTP header,
so searx.webapp.autocompleter returns the results with the expected data format.
Related to #2127. Closes: #2203
A new "base" engine called command is introduced. It is the foundation for all command line engines for now.
You can use this engine to create your own command line engine.
Add some engines (commented out to make sure no one enables anything accidentally):
* git grep: This engine lets you grep in the searx repo.
* locate: If locate is installed and initialized, you can search on the FS.
* find: You can find files with a specific name from where you started searx.
* pattern search in files: This engine utilizes the command fgrep.
* regex search in files: This engine runs `grep` to find a file based on its contents.
and some other exceptions:
* KeyboardInterrupt
* SystemExit
* RuntimeError
* SystemError
* ImportError: an engine with an unmet dependency will stop everything.
Sending queries through POST, while better for privacy, breaks functionality
with certain extensions (e.g. Firefox containers). Since Firefox does
not send cookies when requesting `/opensearch.xml`, users cannot easily
switch to GET on the client side unless they make a custom search
engine. This commit allows admins to modify the default method on their
side so they can set it to GET if needed.
Sending query params over GET seems to be the only way to be able to
enable autocomplete in the browser. This commit adds the necessary URL
formatting to opensearch.xml. In order to identify queries coming from
the URL bar (rather than an AJAX request), which requires a different
JSON format and MIME type, the request headers are checked for
"X-Requested-With: XMLHttpRequest" which is added by jQuery request.
- enabling HTTPS for sci-hub.tw by default
- making sci-hub the default DOI resolver as it has the largest collection of scientific articles.
- replaced doai.io with dissem.in, as it redirects to this new domain.
Co-authored-by: Aurora of Earth <auroraofearth@ya.ru>
* Made first attempt at the bangs redirects plugin.
* It redirects. But in a messy way via javascript.
* First version with custom plugin
* Added a help page and an operator to see all the bangs available.
* Changed to .format because of support
* Changed to .format because of support
* Removed : in params
* Fixed path to json file and changed bang operator
* Changed bang operator back to &
* Refactored getting search query. Also changed bang operator to ! and is now working.
* Removed prints
* Removed temporary bangs_redirect.js file. Updated plugin documentation
* Added unit test for the bangs plugin
* Fixed a unit test and added 2 more for bangs plugin
* Changed back to default settings.yml
* Added myself to AUTHORS.rst
* Refactored working of the custom plugin.
* Refactored _get_bangs_data from list to dict to improve search speed.
* Decoupled bangs plugin from webserver with redirect_url (see the sketch after this list)
* Refactored bangs unit tests
* Fixed unit test bangs. Removed double parsing in bangs.py
* Removed a dumb print statement
* Refactored bangs plugin to core engine.
* Removed bangs plugin.
* Refactored external bangs unit tests from plugin to core.
* Removed custom_results/bangs documentation from plugins.rst
* Added newline in settings.yml so the PR stays clean.
* Changed searx/plugins/__init__.py back to the old file
* Removed newline search.py
* Refactored get_external_bang_operator from utils to external_bang.py
* Removed unnecessary import from test_plugins.py
* Removed _parseExternalBang and _isExternalBang from query.py
* Removed get_external_bang_operator since it was not necessary
* Simplified external_bang.py
* Simplified external_bang.py
* Moved external_bangs unit tests to test_webapp.py. Fixed return in search with external_bang
* Refactored query parsing to unicode to support python2
* Refactored query parsing to unicode to support python2
* Refactored bangs plugin to core engine.
* Refactored search parameter to search_query in external_bang.py
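A rough idea of the redirect_url resolution mentioned in the list above (the bang table, URL templates and function name are illustrative assumptions; the real data is loaded from the json file)::

    from urllib.parse import quote_plus

    # illustrative bang table; the real data comes from a json file
    BANGS_DATA = {
        'wp': 'https://en.wikipedia.org/wiki/{query}',
        'g': 'https://www.google.com/search?q={query}',
    }

    def get_bang_url(bang, query):
        """Return the redirect URL for an external bang, or None if unknown."""
        template = BANGS_DATA.get(bang)
        if template is None:
            return None
        return template.replace('{query}', quote_plus(query))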
Previously, only the image/jpeg MIME type was exempt from proxying.
With this commit, no MIME type starting with "image/" is proxified.
This is a quick fix for PR #1985: the google_image engine can return some data URLs.
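A sketch of the resulting check, assuming a simplified image_proxify (proxify_url is a hypothetical stand-in for the real proxy URL builder)::

    from urllib.parse import quote_plus

    def proxify_url(url):
        # hypothetical stand-in for the real proxy URL builder
        return '/image_proxy?url=' + quote_plus(url)

    def image_proxify(url):
        # a data: URL of any image/* MIME type already embeds the image
        # bytes, so it is returned unchanged instead of being proxied
        if url.startswith('data:image/'):
            return url
        return proxify_url(url)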
A new option is added to engines to hide error messages from users. It
is called `display_error_messages` and by default it is set to `True`.
If it is set to `False` error messages do not show up on the UI.
Keep in mind that engines are still suspended if needed regardless of
this setting.
Closes #1828
The gigablast API has changed and seems to have some quirks; this is a first
revision. More work (hacks) is needed.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since there are zero results, we can remove it:
$ make engines.languages
fetch languages ..
...
fetched 0 languages from engine gigablast
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Inline styles are blocked by default with Content Security Policy (CSP). Move
the rest of inline styles to CSS and correct the HTML template of the oscar
preference page.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
A *brand* of searx is a fork which might have its own design and some special
functions which might be reasonable in a special context.
In this sense, the fork might have its own documentation but not its own issue
tracker. The *upstream* of a brand is always https://github.com/asciimoo from
where the brand-fork pulls the master branch regularly. A fork which has its
own issue tracker is a spin-off and out of the scope of the searx project
itself. The conclusion is:
- hard code ISSUE_URL (in the Makefile)
- always refer to DOCS_URL
- links in the about page refer to the *upstream* (searx project)
except DOCS_URL
- "fork me on github" ribbons refer to the *upstream*
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
We have some variables in the build environment which are also needed in the
grunt process when building themes. These variables are relevant if one
creates a fork with its own branding. We treat these variables under the term
'brands'.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
We have some variables in the build environment which are also needed in the
templating process. These variables are relevant if one creates a fork with
its own branding. We treat these variables under the term 'brands'.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
dateutil.parser.parse() does not know the Spanish date format, which
leads to a ValueError. Fixes #1870
Traceback (most recent call last):
File "/usr/local/searx/searx/search.py", line 160, in search_one_http_request_safe
search_results = search_one_http_request(engine, query, request_params)
File "/usr/local/searx/searx/search.py", line 97, in search_one_http_request
return engine.response(response)
File "/usr/local/searx/searx/engines/startpage.py", line 102, in response
published_date = parser.parse(date_string, dayfirst=True)
File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 1358, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 649, in parse
raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', '24 Ene 2013')
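One possible workaround, sketched with an illustrative (incomplete) month table: translate the Spanish month abbreviations before handing the string to dateutil::

    from dateutil import parser

    # illustrative subset of a Spanish month table, not the complete mapping
    MONTHS_ES_TO_EN = {'Ene': 'Jan', 'Abr': 'Apr', 'Ago': 'Aug', 'Dic': 'Dec'}

    def parse_startpage_date(date_string):
        # dateutil only knows the English month names
        for es, en in MONTHS_ES_TO_EN.items():
            date_string = date_string.replace(es, en)
        return parser.parse(date_string, dayfirst=True)

    # parse_startpage_date('24 Ene 2013') -> datetime.datetime(2013, 1, 24, 0, 0)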
When selecting other languages than 'en', bing-video did not handle the language
correctly and gave very bad results. Since the User-Agent is normally rotated in
searx, the behavior of a !biv search was unpredictable and paging was broken.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The bing_news bug (discussed in #1838) was caused by wrong language tags, which
were fixed in e0c99d9d; there is no need to change the bing_news search string.
closes: https://github.com/asciimoo/searx/issues/1838
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To get meaningful diffs, the json file has to be sorted. Before applying any
further content patch, the json file needs an initial sort (without changing
any content).
Sorted by::
import sys, json
with open('engines_languages.json') as f:
j = json.load(f)
with open('engines_languages.json', 'w') as f:
json.dump(j, f, indent=2, sort_keys=True)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
When results are fetched from any programming-related documentation site
(like git-scm.com, docs.python.org etc), content in the Info box is shown as
raw HTML code.
This change addresses the issue by using the "safe" template filter (the
linked Django documentation describes it; Jinja2, which searx uses, provides
the same filter). See,
- https://docs.djangoproject.com/en/3.0/ref/templates/builtins/#safe
- Searx issue tracker (issue #1649), for more information.
Resolves: #1649
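For reference, a minimal Jinja2 sketch of the filter's effect (the variable name is made up)::

    from jinja2 import Environment

    # with autoescaping on, `safe` marks already-sanitized HTML as trusted,
    # so it is rendered instead of being escaped
    env = Environment(autoescape=True)
    template = env.from_string('{{ infobox_content | safe }}')
    print(template.render(infobox_content='<b>git-scm</b>'))  # prints <b>git-scm</b>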
On low-width devices like mobiles, tablets etc, the info box is shown at the
bottom of the page.
This change addresses the issue by rearranging the column grids for low-width
devices and moving the sidebar to the top of the page. See
- https://getbootstrap.com/docs/3.3/css/#grid-column-ordering,
- and Searx issue tracker (issue #1777), for more information.
Effect: along with the Info box, the Suggestion and Link boxes also move to
the top of the page.
Resolves: #1777
Infinite scroll adds a `hr` tag to split up the sections loaded by it.
The vim bindings `j` and `k`, which jump to the next and previous result
respectively, search for a **direct** sibling with the class `result`.
With the `hr` between results a direct sibling cannot be found. To fix
this we remove the restriction of it having to be a direct sibling.
Adding a CR in some files and not in others is a good starting point for the
DOS+Unix line-ending mess we have all already seen in many projects.
Patch fixes all files matching (even those coming from grunt's build)::
find ./searx -exec file {} \; | grep CR
BTW: the same goes for mixing TAB and SPACE indent styles in one and the same
file, so where sources are touched in this patch, that is fixed too.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Fix this error in the travis build::
/home/travis/build/asciimoo/searx/searx/engines/duckduckgo_definitions.py:21:44: E225 missing whitespace around operator
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Add image format and source information to the display - engines need changes
to actually display something.
Displays result.source (the website from which the image was taken) and
result.img_format (image type and size).
The result is styled with the result-format and result-source classes. See PR
#1566 for an example of an engine with the necessary changes.
Strip <span class="highlight">...</span> in the oscar image template.
This PR fixes the result count from bing, which was throwing a (hidden)
error, and adds a validation to avoid reading more results than available.
For example:
if there are 100 results for some search and we try to get results 120 to
130, Bing will send back results 0 to 10 and no error. If we compare the
result count with the first parameter of the request, we can avoid these
"invalid" results.
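A sketch of that validation under assumed names (first_result_index and result_count are illustrative, not the engine's actual variables)::

    def drop_out_of_range_results(results, result_count, first_result_index):
        # Bing answers an out-of-range offset with the first page and no
        # error; comparing the reported total with the requested offset
        # filters those pages out
        if first_result_index > result_count:
            return []
        return results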
The new url parameter "timeout_limit" sets the timeout limit in seconds.
Example: "timeout_limit=1.5" means the timeout limit is 1.5 seconds.
In addition, the query can start with <[number] to set the timeout limit.
For numbers between 0 and 99, the unit is seconds:
Example: "<30 searx" means the timeout limit is 30 seconds.
For numbers of 100 and above, the unit is milliseconds:
Example: "<850 searx" means the timeout limit is 850 milliseconds.
In addition, there is a new optional setting: outgoing.max_request_timeout.
If it is not set, the user timeout can't go above the searx configuration (as
before: the max timeout of the selected engines for a query).
If the value is set, the user can choose a timeout between 0 and
max_request_timeout using <[number] or the timeout_limit query parameter.
Related to #1077
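A stand-alone sketch of the <[number] prefix parsing described above (simplified; the real parser also honours outgoing.max_request_timeout)::

    def parse_timeout_prefix(query):
        """Return (timeout in seconds or None, query without the prefix)."""
        if not query.startswith('<'):
            return None, query
        prefix, _, rest = query[1:].partition(' ')
        if not prefix.isdigit():
            return None, query
        value = int(prefix)
        # numbers below 100 are read as seconds, larger values as milliseconds
        timeout = float(value) if value < 100 else value / 1000.0
        return timeout, rest

    # parse_timeout_prefix('<850 searx') -> (0.85, 'searx')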
Updated version of PR #1413 from @isj-privacore.
Non-ASCII characters were incorrectly decoded.
Add a helper function, searx.utils.ecma_unescape: a Python implementation of
the JavaScript unescape function.
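The helper might look roughly like this (a sketch covering the %uXXXX and %XX forms handled by ECMA-262 unescape)::

    import re

    ECMA_UNESCAPE4_RE = re.compile(r'%u([0-9a-fA-F]{4})')
    ECMA_UNESCAPE2_RE = re.compile(r'%([0-9a-fA-F]{2})')

    def ecma_unescape(string):
        # the %uXXXX form has to be handled before the shorter %XX form
        string = ECMA_UNESCAPE4_RE.sub(lambda m: chr(int(m.group(1), 16)), string)
        # '%20' becomes ' ', '%F3' becomes 'ó'
        return ECMA_UNESCAPE2_RE.sub(lambda m: chr(int(m.group(1), 16)), string)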
* Search URL is https://www.wikidata.org/w/index.php?{query}&ns0=1 (with ns0=1 at the end to avoid an HTTP redirection)
* url_detail: remove the deprecated disabletidy=1 parameter
* Add an eval_xpath function: compile each XPath expression once for all calls (see the sketch after this list).
* Add get_id_cache: retrieve all HTML elements with an id at once, to avoid the slow-to-process dynamic XPath '//div[@id="{propertyid}"]'.replace('{propertyid}')
* Create an etree.HTMLParser() instead of using the global one (see #1575)
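A minimal sketch of such a compile-once helper, assuming a cache keyed on the raw expression string (the real helper may differ)::

    from lxml import etree

    _XPATH_CACHE = {}

    def eval_xpath(element, xpath_str):
        # compile each expression once and reuse it on every later call
        xpath = _XPATH_CACHE.get(xpath_str)
        if xpath is None:
            xpath = etree.XPath(xpath_str)
            _XPATH_CACHE[xpath_str] = xpath
        return xpath(element)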
Fetch the complete JSON data block, use the legend to extract images.
Unquote urlencoded strings.
Add the image description as 'content'.
Add 'img_format' and 'source' data (needs PR #1567 to enable this data to be
displayed).
Show images which lack an ownerid instead of discarding them.
Use data from the embedded JSON to improve results (e.g. the real page
title), add image format and source info (see PR #1567), and improve the
paging logic (it now works).
Server Timing specification: https://www.w3.org/TR/server-timing/
In the browser Dev Tools, focus on the main request: the per-engine response
times appear in the Timing tab.
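For illustration, a minimal Flask sketch (engine names and numbers are made up) of how per-engine timings can be exposed through the Server-Timing header::

    from flask import Flask, Response

    app = Flask(__name__)

    @app.route('/search')
    def search():
        response = Response('...')
        # one metric per engine; `dur` is in milliseconds
        response.headers['Server-Timing'] = (
            'total;dur=1240.5, google;dur=560.2, wikipedia;dur=480.9')
        return response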
doi_resolvers / default_doi_resolver were missing in the settings_robots.yml
file, so the test server was not able to start (it crashed). Since its output
wasn't displayed, it was not obvious why Selenium couldn't connect to searx.
The locale and search language were always defined with the English value.
This patch inits the locale on `pre_request` in order to define the default
values of the locale and language preferences.
In addition, the `best_match` function provided by the Flask-Babel library
did not work as expected, so the `match_language` function provided by searx
is used to detect whether the language from the Accept-Language header can be
used in searx.
This improves the user experience by loading the next entries shortly before
the user reaches the bottom. It makes the scrolling smoother, without a break
in between.
It also fixes an error in my browser where scrolling never hit the defined
number: when I debugged it, `.scrollTop` was 1092.5 while `doc.height -
win.height` was 1093, so the condition was never true.
- Because there is no full image URL in the DOM, we set "image_url" to the
same URL as "url" (the URL of the source).
See example HTML https://gist.github.com/Nachtalb/2dea8a4d2c723c49226ad9645838121f
- Remove unused import
- Fix google image search title
- Keep the google image safe value up to date