In the EU there is the "General Data Protection Regulation" [1], aka GDPR (BTW:
very user friendly!), which requires consent to tracking. To obtain the user's
consent, google-news requests are redirected to a confirmation page that sets a
CONSENT cookie: https://consent.google.de/s?continue=...
This patch adds a CONSENT cookie to the google-news request to avoid the
redirection.
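A minimal sketch of the idea, assuming a searx-style ``request(query, params)`` hook whose ``params`` dict carries outgoing cookies under ``'cookies'``. The cookie value copies the shape of the news.google.com sample listed further down; treating ``{{date}}`` as today's date in YYYYMMDD form, and the ``de``/``816`` defaults, are assumptions, and ``consent_cookie_value`` is a hypothetical helper.

```python
from datetime import date


def consent_cookie_value(day, lang="de", suffix="816"):
    """Build a value shaped like the observed 'cb.' samples.

    Assumption: {{date}} is YYYYMMDD; lang and suffix defaults are copied
    from one sample and are not a documented Google format.
    """
    return f"YES+cb.{day:%Y%m%d}-14-p0.{lang}+FX+{suffix}"


def request(query, params):
    """Hypothetical engine hook: attach a CONSENT cookie to avoid the redirect."""
    # keep any cookies already present, create the dict if missing
    params.setdefault("cookies", {})
    params["cookies"]["CONSENT"] = consent_cookie_value(date.today())
    return params
```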
The behavior of the CONSENT cookie seems similar across all google engines, but
the pattern is not yet fully clear to me. Here are some random samples from my
analysis:
Using common google search from different domains::
    google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
    google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
    google.fr: CONSENT=YES+srp.gws-{{date}}-0-RC2.fr+FX+826
When searching for videos (google-videos)::
    google.es: CONSENT=YES+srp.gws-{{date}}-0-RC2.es+FX+076
    google.de: CONSENT=YES+srp.gws-{{date}}-0-RC2.de+FX+171
Google news has only one domain for all languages::
    news.google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
Using google-scholar search from different domains::
    scholar.google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
    scholar.google.fr: does not use such a cookie / did not ask the user
    scholar.google.es: does not use such a cookie / did not ask the user
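To compare the samples above, this sketch splits a CONSENT value into its visible fields. ``parse_consent`` is a hypothetical helper, not part of searx, and it assumes ``{{date}}`` expands to eight digits (YYYYMMDD), which the samples do not confirm.

```python
import re

# Two value families seen in the samples:
#   YES+cb.<date>-14-p0.<cc>+FX+<n>       (google.com/.de, news, scholar.de)
#   YES+srp.gws-<date>-0-RC2.<cc>+FX+<n>  (google.fr, google-videos)
CONSENT_RE = re.compile(
    r"^YES\+(?P<family>cb|srp\.gws)[.-]"
    r"(?P<date>\d{8})-\d+-(?:p0|RC2)\."
    r"(?P<cc>[a-z]{2})\+FX\+(?P<num>\d+)$"
)


def parse_consent(value):
    """Return the fields of a CONSENT value as a dict, or None if unknown."""
    match = CONSENT_RE.match(value)
    return match.groupdict() if match else None
```

Applied to the samples, the ``cc`` field is ``de`` for google.com as well, so the country part does not simply follow the domain.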
Interim summary:
The pattern is unclear, so I won't apply the CONSENT cookie to all google
engines. More experience is needed before we generalize the CONSENT cookie over
all google engines.
Related:
- e9a6ab401 [fix] youtube - send CONSENT Cookie to not be redirected
- https://github.com/benbusby/whoogle-search/issues/311
- https://github.com/benbusby/whoogle-search/issues/243
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since we added
- 1c67b6aec [enh] google engine: supports "default language"
there is a KeyError: 'hl' in the request; error pattern::
    ERROR:searx.searx.search.processor.online:engine google news : exception : 'hl'
    Traceback (most recent call last):
      File "searx/search/processors/online.py", line 144, in search
        search_results = self._search_basic(query, params)
      File "searx/search/processors/online.py", line 118, in _search_basic
        self.engine.request(query, params)
      File "searx/engines/google_news.py", line 97, in request
        if lang_info['hl'] == 'en':
    KeyError: 'hl'
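One way to avoid the ``KeyError`` is a defensive lookup. The sketch is hypothetical (``is_english_interface`` is not a searx function) and assumes, as the ``get_lang_info()`` description later in this log states, that query parameters now live in ``lang_info['params']``; it also accepts the older top-level layout.

```python
def is_english_interface(lang_info):
    """Return True if the Google UI language ('hl') is English.

    Looks in lang_info['params'] first (assumed newer layout), then at the
    top level (older layout); a missing 'hl' yields False, not a KeyError.
    """
    hl = lang_info.get("params", {}).get("hl", lang_info.get("hl"))
    return hl == "en"
```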
Closes: https://github.com/searxng/searxng/issues/154
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Before this commit, there were three node_modules directories:
* one in .
* two others in ./searx/static/themes/*
This is not desirable:
* it declares the npm dependencies in the shell script.
* Dependabot can't update these dependencies.
* it is not a standard way to build a package (two different locations for the dependencies).
With this commit and the PR #150 there is one unique node_modules directory per theme.
This file is generated by webfont.
* It is now generated as searx/static/themes/simple/ion.less
* It is generated before the .less compilation.
* .gitignore includes this file
Add two new package dependencies: fontforge and ttfautohint
See utils/searx.sh
the build of the themes updates:
* js/leaflet.js ( was leaflet/leaflet.js )
* css/leaflet.css ( was leaflet/leaflet.css )
* css/images ( was leaflet/images )
Same behaviour as Whoogle [1]. Only the google engine with the
"Default language" choice "(all)" is changed by this patch.
When searching for a local place, the results are in the expected language,
without missing results [2]:
> When a language is not specified, the language interpretation is left up to
> Google to decide how the search results should be delivered.
The query parameters are copied from Whoogle. With the ``all`` language:
- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add an ``Accept-Language`` HTTP header.
The new signature of function ``get_lang_info()`` is::
    lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)
Argument ``supported_any_language`` is True for google.py and False for the other
google engines. With this patch the function now returns:
- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
  - ``lang_info['subdomain']``
  - ``lang_info['country']``
  - ``lang_info['language']``
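The behaviour described above can be sketched as a heavily simplified, hypothetical stand-in for ``get_lang_info()``. Only the ``params``/``headers`` split and the ``all`` special case (``source=lnt``, no ``lr``, no ``Accept-Language``) follow the text; the fixed subdomain, the ``en`` fallback, the header value and the omission of ``hl`` in the ``all`` case are illustrative assumptions, not the real function.

```python
def get_lang_info_sketch(language, country="US", supported_any_language=True):
    """Hypothetical, simplified stand-in for get_lang_info()."""
    use_any_language = language == "all" and supported_any_language
    if language == "all" and not supported_any_language:
        language = "en"  # assumption: fallback for engines without "any language"

    lang_info = {
        "subdomain": "www.google.com",  # assumption: fixed for the sketch
        "country": country,
        "language": language,
        "params": {},   # query parameters
        "headers": {},  # HTTP headers
    }
    if use_any_language:
        # Whoogle-style: let Google decide; no ``lr``, no Accept-Language
        lang_info["params"]["source"] = "lnt"
        return lang_info

    lang_info["params"]["hl"] = language
    lang_info["params"]["lr"] = "lang_" + language
    lang_info["headers"]["Accept-Language"] = f"{language}-{country},{language};q=0.8"
    return lang_info
```

An engine would then merge ``lang_info['params']`` into its query string and ``lang_info['headers']`` into its request headers, so the ``all`` case needs no special handling at the call site.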
[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
render automatically adds these variables to the template context:
* advanced_search
* all_categories
* categories
Before this change, render checked whether each variable was already set,
but the callers never actually set them.
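A minimal sketch of the simplified logic: the shared variables are written unconditionally instead of behind an "only if not set" check. ``render_context`` and its parameter names are hypothetical; the real ``render`` also calls the template engine.

```python
def render_context(selected_categories, all_categories, advanced_search, **kwargs):
    """Hypothetical helper: build the context passed to a template.

    No "already set?" check is needed because callers never pass these keys.
    """
    context = dict(kwargs)
    context["advanced_search"] = advanced_search
    context["all_categories"] = all_categories
    context["categories"] = selected_categories
    return context
```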
Based on commit:
- a89b823f [mod] remove overpass API call
This patch was generated by::
    make themes.all
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Loading an engine should not exit the application (*). Instead
of exiting, return None.
(*) A RuntimeError still exits the application: syntax errors, etc.
BTW: add documentation and normalize indentation (no functional change)
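A minimal sketch of the "return None instead of exit" pattern. ``load_engine`` and ``load_engines`` here are hypothetical simplifications, not the searx loader: only a missing name/module or an unimportable module is treated as "skip this engine", while other exceptions (e.g. a ``SyntaxError`` in the engine module) still propagate and stop the application.

```python
import importlib
import logging

logger = logging.getLogger("engines")


def load_engine(engine_data):
    """Return an engine dict, or None if the engine cannot be loaded."""
    name = engine_data.get("name")
    module_name = engine_data.get("engine")
    if not name or not module_name:
        logger.error("engine is missing 'name' or 'engine': %r", engine_data)
        return None
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        logger.exception("engine %s: cannot import module %r", name, module_name)
        return None
    return {"name": name, "module": module}


def load_engines(engine_list):
    """Load all engines, skipping those for which load_engine() returns None."""
    engines = {}
    for engine_data in engine_list:
        engine = load_engine(engine_data)
        if engine is None:
            continue  # skip the broken engine instead of exiting the application
        engines[engine["name"]] = engine
    return engines
```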
Suggested-by: @dalf https://github.com/searxng/searxng/pull/116#issuecomment-851865627
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Slightly modified merge of commit [1cb1d3ac] from searx [PR 2543]:
This adds Docker Hub .. as a search engine .. the engine's favicon was
downloaded from the Docker Hub website with wget and converted to a PNG
with ImageMagick .. It supports the parsing of URLs, titles, content,
published dates, and thumbnails of Docker images.
[1cb1d3ac] https://github.com/searx/searx/pull/2543/commits/1cb1d3ac
[PR 2543] https://github.com/searx/searx/pull/2543
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Access to formats can be denied by the settings configuration::
    search:
      formats: [html, csv, json, rss]
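A minimal sketch of how a request handler could enforce ``search.formats``. The helper names are hypothetical, and both the ``html``-only default when the key is absent and the fallback of an empty request to ``html`` are assumptions; a web handler would map the ``None`` result to an HTTP 403.

```python
DEFAULT_FORMATS = ["html"]  # assumption: only HTML when nothing is configured


def allowed_formats(settings):
    """Formats listed under search.formats in settings.yml, or the default."""
    return settings.get("search", {}).get("formats", DEFAULT_FORMATS)


def check_output_format(requested, settings):
    """Return the output format to render, or None to deny access."""
    output_format = requested or "html"  # assumption: empty request means HTML
    if output_format not in allowed_formats(settings):
        return None
    return output_format
```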
Closes: https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To test & demonstrate this implementation, download:
https://liste.mediathekview.de/filmliste-v2.db.bz2
and unpack it into searx/data/filmliste-v2.db. In your settings.yml, define a
sqlite engine named "demo"::
    - name: demo
      engine: sqlite
      shortcut: demo
      categories: general
      result_template: default.html
      database: searx/data/filmliste-v2.db
      query_str: >-
        SELECT title || ' (' || time(duration, 'unixepoch') || ')' AS title,
        COALESCE( NULLIF(url_video_hd,''), NULLIF(url_video_sd,''), url_video) AS url,
        description AS content
        FROM film
        WHERE title LIKE :wildcard OR description LIKE :wildcard
        ORDER BY duration DESC
      disabled: False
Query to test: "!demo concert"
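To try the ``query_str`` without downloading the database, this sketch runs the same SQL with Python's ``sqlite3`` against an in-memory table. The ``film`` schema and rows are made up from the columns the query references, and binding ``:wildcard`` to ``%query%`` is an assumption about how the engine fills that parameter.

```python
import sqlite3

# Same SQL as query_str in the settings.yml example.
QUERY_STR = """
SELECT title || ' (' || time(duration, 'unixepoch') || ')' AS title,
       COALESCE(NULLIF(url_video_hd, ''), NULLIF(url_video_sd, ''), url_video) AS url,
       description AS content
  FROM film
 WHERE title LIKE :wildcard OR description LIKE :wildcard
 ORDER BY duration DESC
"""


def search(conn, query):
    """Run QUERY_STR with :wildcard bound to %query% (assumed binding)."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(QUERY_STR, {"wildcard": f"%{query}%"})
    return [dict(row) for row in rows]


def demo_connection():
    """In-memory stand-in for filmliste-v2.db with a made-up film table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE film (title TEXT, description TEXT, duration INTEGER,"
        " url_video TEXT, url_video_sd TEXT, url_video_hd TEXT)"
    )
    conn.executemany(
        "INSERT INTO film VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("Rock concert", "live recording", 3725, "http://x/low", "http://x/sd", ""),
            ("News", "concert review", 60, "http://y/low", "", ""),
            ("Weather", "forecast", 30, "http://z/low", "", ""),
        ],
    )
    return conn


if __name__ == "__main__":
    for result in search(demo_connection(), "concert"):
        print(result)
```

The ``NULLIF``/``COALESCE`` chain turns empty strings into ``NULL`` so the best non-empty URL (HD, then SD, then the default) wins.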
This is a rewrite of the implementation from commit [1]
[1] searx/searx@8e90a21
Suggested-by: @virtadpt searx/searx#2808
No functional change, just some linting.
- fix messages from pylint (see below)
- log where general exceptions are caught (broad-except)
- normalize various indentation
- to avoid clashes with common names, add the prefix 'route_' to all @app.route
  decorated functions.
Fixed messages::
    searx/webapp.py:744:0: C0301: Line too long (146/120) (line-too-long)
    searx/webapp.py:756:0: C0301: Line too long (132/120) (line-too-long)
    searx/webapp.py:730:9: W0511: TODO, check if timezone is calculated right (fixme)
    searx/webapp.py:1:0: C0114: Missing module docstring (missing-module-docstring)
    searx/webapp.py:126:8: I1101: Module 'setproctitle' has no 'setthreadtitle' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
    searx/webapp.py:126:36: W0212: Access to a protected member _name of a client class (protected-access)
    searx/webapp.py:131:4: R1722: Consider using sys.exit() (consider-using-sys-exit)
    searx/webapp.py:141:4: R1722: Consider using sys.exit() (consider-using-sys-exit)
    searx/webapp.py:255:38: W0621: Redefining name 'request' from outer scope (line 32) (redefined-outer-name)
    searx/webapp.py:307:4: W0702: No exception type(s) specified (bare-except)
    searx/webapp.py:374:24: W0621: Redefining name 'theme' from outer scope (line 155) (redefined-outer-name)
    searx/webapp.py:420:8: R1705: Unnecessary "else" after "return" (no-else-return)
    searx/webapp.py:544:4: W0621: Redefining name 'preferences' from outer scope (line 917) (redefined-outer-name)
    searx/webapp.py:551:4: W0702: No exception type(s) specified (bare-except)
    searx/webapp.py:566:15: W0703: Catching too general exception Exception (broad-except)
    searx/webapp.py:613:4: R1705: Unnecessary "elif" after "return" (no-else-return)
    searx/webapp.py:690:8: W0621: Redefining name 'search' from outer scope (line 661) (redefined-outer-name)
    searx/webapp.py:661:0: R0914: Too many local variables (22/20) (too-many-locals)
    searx/webapp.py:674:8: R1705: Unnecessary "else" after "return" (no-else-return)
    searx/webapp.py:697:11: W0703: Catching too general exception Exception (broad-except)
    searx/webapp.py:748:4: R1705: Unnecessary "elif" after "return" (no-else-return)
    searx/webapp.py:661:0: R0911: Too many return statements (9/6) (too-many-return-statements)
    searx/webapp.py:661:0: R0912: Too many branches (29/12) (too-many-branches)
    searx/webapp.py:661:0: R0915: Too many statements (74/50) (too-many-statements)
    searx/webapp.py:931:4: W0621: Redefining name 'image_proxy' from outer scope (line 1072) (redefined-outer-name)
    searx/webapp.py:946:4: W0621: Redefining name 'stats' from outer scope (line 1132) (redefined-outer-name)
    searx/webapp.py:917:0: R0914: Too many local variables (34/20) (too-many-locals)
    searx/webapp.py:917:0: R0912: Too many branches (19/12) (too-many-branches)
    searx/webapp.py:917:0: R0915: Too many statements (65/50) (too-many-statements)
    searx/webapp.py:1063:44: W0621: Redefining name 'preferences' from outer scope (line 917) (redefined-outer-name)
    searx/webapp.py:1072:0: R0911: Too many return statements (9/6) (too-many-return-statements)
    searx/webapp.py:1151:4: C0103: Variable name "SORT_PARAMETERS" doesn't conform to '(([a-z][a-zA-Z0-9_]{2,30})|(_[a-z0-9_]*)|([a-z]))$' pattern (invalid-name)
    searx/webapp.py:1297:0: R1721: Unnecessary use of a comprehension (unnecessary-comprehension)
    searx/webapp.py:1303:0: C0103: Argument name "e" doesn't conform to '(([a-z][a-zA-Z0-9_]{2,30})|(_[a-z0-9_]*))$' pattern (invalid-name)
    searx/webapp.py:1303:19: W0613: Unused argument 'e' (unused-argument)
    searx/webapp.py:1338:23: W0621: Redefining name 'app' from outer scope (line 162) (redefined-outer-name)
    searx/webapp.py:1318:0: R0903: Too few public methods (1/2) (too-few-public-methods)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
pylint message: wrong-import-order
Respect PEP8 import order (standard imports first, then third-party libraries,
then local imports).
pylint message: wrong-import-position
Do not mix code & imports
BTW:
- only one import per line
- replace the licence text with an SPDX tag
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Remove the extension of sys.path (aka PYTHONPATH). Running an instance directly
from the repository's folder is a relic from the early beginnings in
2014 (fd651083f) and is no longer supported.
Since commit dd46629 was merged, the command line tool 'searx-run' exists and
should be used instead.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>