This patch fixes issue reported by ``make test.unit``::
searx/search/checker/impl.py:39: SyntaxWarning: invalid escape sequence '\>'
rep = ['<' + tag + '[^\>]*>' for tag in HTML_TAGS]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
incident:
flask_babel.gettext() does not work in the engine modules.
cause:
the request() and response() functions of the engine modules run in the
processor, whose search() method runs in a thread and in the threads the
context of the Flask app does not exist. The context of the Flask app is
needed by the gettext() function for the L10n.
Solution:
copy context of the Flask app into the threads. [1]
special case:
We cannot equip the search() method of the processors with the decorator [1],
because the decorator requires a context (Flask app) that does not yet exist
at the time of the initialization of the processors (the initialization of the
processors is part of the initialization of the Flask app).
[1] https://flask.palletsprojects.com/en/2.3.x/api/#flask.copy_current_request_context
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To set the language from language recognition and hold the value selected by the
client, the previous implementation creates a copy of the SearchQuery object and
manipulates the SearchQuery object by calling function replace_auto_language().
This patch tries to implement a similar functionality in a more central place,
in function get_search_query_from_webapp() when the SearchQuery object is build
up.
Additional this patch uses the language preferred by the client, if language
recognition does not have a match / the existing implementation does not care
about client preferences and uses 'all' in case of no match.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Follow up of #2269
The script to update the descriptions of the engines does no longer work since
PR #2269 has been merged.
searx/engines/wikipedia.py
==========================
1. There was a misusage of zh-classical.wikipedia.org:
- `zh-classical` is dedicate to classical Chinese [1] which is not
traditional Chinese [2].
- zh.wikipedia.org has LanguageConverter enabled [3] and is going to
dynamically show simplified or traditional Chinese according to the
HTTP Accept-Language header.
2. The update_engine_descriptions.py needs a list of all wikipedias. The
implementation from #2269 included only a reduced list:
- https://meta.wikimedia.org/wiki/Wikipedia_article_depth
- https://meta.wikimedia.org/wiki/List_of_Wikipedias
searxng_extra/update/update_engine_descriptions.py
==================================================
Before PR #2269 there was a match_language() function that did an approximation
using various methods. With PR #2269 there are only the types in the data model
of the languages, which can be recognized by babel. The approximation methods,
which are needed (only here) in the determination of the descriptions, must be
replaced by other methods.
[1] https://en.wikipedia.org/wiki/Classical_Chinese
[2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
[3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter
Closes: https://github.com/searxng/searxng/issues/2330
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
All engines has been migrated from ``supported_languages`` to the
``fetch_traits`` concept. There is no longer a need for the obsolete code that
implements the ``supported_languages`` concept.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Partial reverse engineering of the Google engines including a improved language
and region handling based on the engine.traits_v1 data.
When ever possible the implementations of the Google engines try to make use of
the async REST APIs. The get_lang_info() has been generalized to a
get_google_info() function / especially the region handling has been improved by
adding the cr parameter.
searx/data/engine_traits.json
Add data type "traits_v1" generated by the fetch_traits() functions from:
- Google (WEB),
- Google images,
- Google news,
- Google scholar and
- Google videos
and remove data from obsolete data type "supported_languages".
A traits.custom type that maps region codes to *supported_domains* is fetched
from https://www.google.com/supported_domains
searx/autocomplete.py:
Reversed engineered autocomplete from Google WEB. Supports Google's languages and
subdomains. The old API suggestqueries.google.com/complete has been replaced
by the async REST API: https://{subdomain}/complete/search?{args}
searx/engines/google.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
- always use the async REST API (formally known as 'use_mobile_ui')
- use *supported_domains* from traits
- improved the result list by fetching './/div[@data-content-feature]'
and parsing the type of the various *content features* --> thumbnails are
added
searx/engines/google_images.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- if exists, freshness_date is added to the result
- issue 1864: result list has been improved a lot (due to the new cr parameter)
searx/engines/google_news.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
*supported_domains* is not needed but a ceid list has been added.
- different region handling compared to Google WEB
- fixed for various languages & regions (due to the new ceid parameter) /
avoid CONSENT page
- Google News do no longer support time range
- result list has been fixed: XPath of pub_date and pub_origin
searx/engines/google_videos.py
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- add paging support
- implement a async request ('asearch': 'arc' & 'async':
'use_ac:true,_fmt:html')
- simplified code (thanks to '_fmt:html' request)
- issue 1359: fixed xpath of video length data
searx/engines/google_scholar.py
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- request(): include patents & citations
- response(): fixed CAPTCHA detection (Scholar has its own CATCHA manager)
- hardening XPath to iterate over results
- fixed XPath of pub_type (has been change from gs_ct1 to gs_cgt2 class)
- issue 1769 fixed: new request implementation is no longer incompatible
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Implementations of the *traits* of the engines.
Engine's traits are fetched from the origin engine and stored in a JSON file in
the *data folder*. Most often traits are languages and region codes and their
mapping from SearXNG's representation to the representation in the origin search
engine.
To load traits from the persistence::
searx.enginelib.traits.EngineTraitsMap.from_data()
For new traits new properties can be added to the class::
searx.enginelib.traits.EngineTraits
.. hint::
Implementation is downward compatible to the deprecated *supported_languages
method* from the vintage implementation.
The vintage code is tagged as *deprecated* an can be removed when all engines
has been ported to the *traits method*.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
When the user choose "Auto-detected", the choice remains on the following queries.
The detected language is displayed.
For example "Auto-detected (en)":
* the next query language is going to be auto detected
* for the current query, the detected language is English.
This replace the autodetect_search_language plugin.
settings.yml:
* The default URL was unix:///usr/local/searxng-redis/run/redis.sock?db=0
* The default URL is now "false"
The default URL makes the log difficult to deal with:
if the admin didn't install a Redis instance, the logs record a false error.
It worked before because SearXNG initialized the Redis connection when the limiter started.
In this commit, SearXNG initializes Redis in searx/webapp.py
so various components can use Redis without taking care of the initialization step.
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).
- add new engine option: send_accept_language_header
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The errors make pyright usage useless since a new error won't be seen [1].
[1] https://github.com/searxng/searxng/pull/1569
```
searx/compat.py:11:27 - error: Expression of type "Type[cached_property[_T@cached_property]]" cannot be assigned to declared type "Type[cached_property]"
"Type[cached_property[_T@cached_property]]" is incompatible with "Type[cached_property]"
Type "Type[cached_property[_T@cached_property]]" cannot be assigned to type "Type[cached_property]" (reportGeneralTypeIssues)
searx/utils.py:69:36 - error: Expression of type "None" cannot be assigned to parameter of type "str"
Type "None" cannot be assigned to type "str" (reportGeneralTypeIssues)
searx/utils.py:573:85 - error: Expression of type "None" cannot be assigned to parameter of type "int"
Type "None" cannot be assigned to type "int" (reportGeneralTypeIssues)
searx/webapp.py:1306:22 - error: Argument of type "str" cannot be assigned to parameter "__a" of type "BytesPath" in function "join"
Type "str" cannot be assigned to type "BytesPath"
"str" is incompatible with "bytes"
"str" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:68 - error: Argument of type "Literal['themes']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "Literal['themes']" cannot be assigned to type "BytesPath"
"Literal['themes']" is incompatible with "bytes"
"Literal['themes']" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:78 - error: Argument of type "str | Any | None" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "str | Any | None" cannot be assigned to type "BytesPath"
Type "str" cannot be assigned to type "BytesPath"
"str" is incompatible with "bytes"
"str" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:85 - error: Argument of type "Literal['img']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "Literal['img']" cannot be assigned to type "BytesPath"
"Literal['img']" is incompatible with "bytes"
"Literal['img']" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/engines/mongodb.py:8:6 - warning: Import "pymongo" could not be resolved (reportMissingImports)
searx/engines/mysql_server.py:9:8 - warning: Import "mysql.connector" could not be resolved (reportMissingImports)
searx/engines/postgresql.py:9:8 - warning: Import "psycopg2" could not be resolved from source (reportMissingModuleSource)
searx/engines/xpath.py:187:28 - warning: "categories" is not defined (reportUndefinedVariable)
searx/search/__init__.py:184:82 - warning: "flask" is not defined (reportUndefinedVariable)
searx/search/checker/background.py:19:26 - error: Type of "schedule" is partially unknown
Type of "schedule" is "(delay: Any, func: Any, *args: Any) -> Literal[True]" (reportUnknownVariableType)
searx/shared/__init__.py:8:12 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
searx/shared/shared_uwsgi.py:5:8 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
```
Since https://github.com/searxng/searxng/pull/354
the searx.network.stream(...) returns a tuple
This commits update the checker code according to
this function signature change.
* allow not to record metrics (response time, etc...)
* this commit doesn't change the UI. If the metrics are disabled
/stats and /stats/errors will return empty response.
in /preferences, the columns response time and reliability will be empty.
Disable the python code formatting from python-black, where the readability of
code suffers by formatting.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* download images using the "image_proxy" network (HTTP/1 instead of HTTP/2)
* don't cache data: URL (reduce memory usage)
* after each test: purge image URL cache then call garbage collector
* download only the first 64kb of images
If there is no write access, there is no need for global. Remove global
statement if there is no assignment.
global-variable-not-assigned:
Using global for names but no assignment is done Used when a variable is
defined through the "global" statement but no assignment to this variable is
done.
In Pylint 2.11 the global-variable-not-assigned checker now catches global
variables that are never reassigned in a local scope and catches (reassigned)
functions [1][2]
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
[2] https://github.com/PyCQA/pylint/issues/1375
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Currently, searx.search.Search calls on_result once the engine results have been merged
(ResultContainer.order_results).
on_result plugins can rewrite the results: once the URL(s) are modified, even they can be merged,
it won't be the case since ResultContainer.order_results has already be called.
This commit call on_result inside for each result of each engines.
In addition the on_result function can return False to remove the result.
Note: the on_result function now run on the engine thread instead of the Flask thread.