The match_language function sometimes returns incorrect results which is why a
new function get_engine_locale is required.
A bugfix of the match_language is not easily possible, because there is almost
no documentation for it and already the call parameters are undefined. E.g. the
function processes values like the ones from yahoo::
"yahoo": [
"ar",
...
"zh_chs",
"zh_cht"
]
The get_engine_locale has been documented in detail, there is a clear
description of the assumptions as well as the requirements and approximation
rules (read doc-string for more details)::
Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to
corresponding *engine locales*:
<engine>: {
# SearXNG string : engine-string
'ca-ES' : 'ca_ES',
'fr-BE' : 'fr_BE',
'fr-CA' : 'fr_CA',
'fr-CH' : 'fr_CH',
'fr' : 'fr_FR',
...
'pl-PL' : 'pl_PL',
'pt-PT' : 'pt_PT'
}
.. hint::
The *SearXNG locale* string has to be known by babel!
In the following you will find a comparison:
>>> import babel.languages
>>> from searx.utils import match_language
>>> from searx.locales import get_engine_locale
Assume we have an engine that supports the follwoing locales:
>>> lang_list = {
... "zh-CN": "zh_CN",
... "zh-HK": "zh_HK",
... "nl-BE": "nl_BE",
... "fr-CA": "fr_CA",
... }
Assumption:
A. When a user selects a language the results should be optimized according to
the selected language.
B. When user selects a language and a territory the results should be
optimized with first priority on territory and second on language.
----
Example: (Assumption A.)
A user selects region 'zh-TW' which should end in zh_HK
hint:
CN is 'Hans' and HK ('Hant') fits better to TW ('Hant')
>>> get_engine_locale('zh-TW', lang_list)
'zh_HK'
>>> lang_list[match_language('zh-TW', lang_list)]
'zh_CN'
----
Example: (Assumption A.)
A user selects only the language 'zh' which should end in CN
>>> get_engine_locale('zh', lang_list)
'zh_CN'
>>> lang_list[match_language('zh', lang_list)]
'zh_CN'
----
Example: (Assumption B.)
A user selects region 'fr-BE' which should end in nl-BE
hint:
priority should be on the territory the user selected. If the user
prefers 'fr' he will select 'fr' without a region tag.
>>> get_engine_locale('fr-BE', lang_list, default='unknown')
'nl_BE'
>>> match_language('fr-BE', lang_list, fallback='unknown')
'fr-CA'
----
Example: (Assumption A.)
A user selects only the language 'fr' which should end in fr_CA
>>> get_engine_locale('fr', lang_list)
'fr_CA'
>>> lang_list[match_language('fr', lang_list)]
'fr_CA'
----
The difference in priority on the territory is best shown with a engine that
supports the following locales:
>>> lang_list = {
... "fr-FR": "fr_FR",
... "fr-CA": "fr_CA",
... "en-GB": "en_GB",
... "nl-BE": "nl_BE",
... }
----
Example: (Assumption A.)
A user selects only a language
>>> get_engine_locale('en', lang_list)
'en_GB'
>>> match_language('en', lang_list)
'en-GB'
hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR
takes priority ..
>>> get_engine_locale('fr', lang_list)
'fr_FR'
>>> lang_list[match_language('fr', lang_list)]
'fr_FR'
----
Example: (Assumption B.)
A user selects region 'fr-BE' which should end in nl-BE
>>> get_engine_locale('fr-BE', lang_list)
'nl_BE'
>>> lang_list[match_language('fr-BE', lang_list)]
'fr_FR'
----
If the user selects a language and there are two locales like the following:
>>> lang_list = {
... "fr-BE": "fr_BE",
... "fr-CH": "fr_CH",
... }
>>>
>>> get_engine_locale('fr', lang_list)
'fr_BE'
>>> lang_list[match_language('fr', lang_list)]
'fr_BE'
Looks like both functions return the same value, but match_language depends on the
order of the dictionary (which is not predictable):
>>> lang_list = {
... "fr-CH": "fr_CH",
... "fr-BE": "fr_BE",
... }
>>> get_engine_locale('fr', lang_list)
'fr_BE'
>>> lang_list[match_language('fr', lang_list)]
'fr_CH'
>>>
The get_engine_locale selects the locale by looking at the "population percent"
and this percentage has an higher amount in BE (68.%) compared to CH (21%)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
By using new property `qwant_categ:` the category of qwant is no longer bound to
the category of SearXNG.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Neeva is "the world's first ad-free, private search engine" and uses data from Apple, Bing, Yelp and "others".
They claim to crawl "hundreds of millions" of URLs a day (https://twitter.com/Neeva/status/1536447373903335426).
Some HTTP-Clients do have issues with the ``opensearch.xml`` from SearXNG
(related [1][2]) while other OpenSearch descriptions[3] (e.g. from qwant) work
flawles.
Inspired by the OpenSearch description from qwant and with informations from the
specification[4] the ``opensearch.xml`` has been *improved*.
- convert `<Url>` methods from lower case to upper case (`POST`|`GET`)
- add `<moz:SearchForm>` and `xmlns:moz="http://www.mozilla.org/2006/browser/search/"`
- add `<Query role="example" searchTerms="SearXNG" />` [4]
OpenSearch description documents should include at least one Query element of
`role="example"` that is expected to return search results. Search clients may
use this example query to validate that the search engine is working properly.
- modified `<LongName>` to SearXNG
- modified `<Description>` the word 'hackable' scares uninitiated users and was removed
- add the `type="image/png"` to `<Image>`
Test can be done by::
make run
Visit http://127.0.0.1:8888/ and add the search engine to your WEB-Browser /
test with different WEB-Browser from desktop and Smartphones (are there any iOS
user here, please test on Safari and Chrome).
[1] https://app.element.io/#/room/#searxng:matrix.org/$xN_abdKhNqUlgXRBrb_9F3pqOxnSzGQ1TG0s0G9hQVw
[2] https://github.com/searxng/searxng/issues/431
[3] https://developer.mozilla.org/en-US/docs/Web/OpenSearch
[4] https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md#the-query-element
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
yep.com is still in beta, the api.yep.com does not have paging support. There
is only a 'limit' argument with a maximum of 100 results.
yep.com seems fast; there is nor need for a timeout of 12 sec.
The API returns JSON nevertheless what the HTTP header is, the "show more"
button on yep.com's web site does not set a special HTTP Accept header.
FYI: The index does not support languages, the WEB UI does not offer a language
selection of the results and the entire index seems in English.
Closes: https://github.com/searxng/searxng/issues/1619
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).
- add new engine option: send_accept_language_header
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The errors make pyright usage useless since a new error won't be seen [1].
[1] https://github.com/searxng/searxng/pull/1569
```
searx/compat.py:11:27 - error: Expression of type "Type[cached_property[_T@cached_property]]" cannot be assigned to declared type "Type[cached_property]"
"Type[cached_property[_T@cached_property]]" is incompatible with "Type[cached_property]"
Type "Type[cached_property[_T@cached_property]]" cannot be assigned to type "Type[cached_property]" (reportGeneralTypeIssues)
searx/utils.py:69:36 - error: Expression of type "None" cannot be assigned to parameter of type "str"
Type "None" cannot be assigned to type "str" (reportGeneralTypeIssues)
searx/utils.py:573:85 - error: Expression of type "None" cannot be assigned to parameter of type "int"
Type "None" cannot be assigned to type "int" (reportGeneralTypeIssues)
searx/webapp.py:1306:22 - error: Argument of type "str" cannot be assigned to parameter "__a" of type "BytesPath" in function "join"
Type "str" cannot be assigned to type "BytesPath"
"str" is incompatible with "bytes"
"str" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:68 - error: Argument of type "Literal['themes']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "Literal['themes']" cannot be assigned to type "BytesPath"
"Literal['themes']" is incompatible with "bytes"
"Literal['themes']" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:78 - error: Argument of type "str | Any | None" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "str | Any | None" cannot be assigned to type "BytesPath"
Type "str" cannot be assigned to type "BytesPath"
"str" is incompatible with "bytes"
"str" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:85 - error: Argument of type "Literal['img']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "Literal['img']" cannot be assigned to type "BytesPath"
"Literal['img']" is incompatible with "bytes"
"Literal['img']" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/engines/mongodb.py:8:6 - warning: Import "pymongo" could not be resolved (reportMissingImports)
searx/engines/mysql_server.py:9:8 - warning: Import "mysql.connector" could not be resolved (reportMissingImports)
searx/engines/postgresql.py:9:8 - warning: Import "psycopg2" could not be resolved from source (reportMissingModuleSource)
searx/engines/xpath.py:187:28 - warning: "categories" is not defined (reportUndefinedVariable)
searx/search/__init__.py:184:82 - warning: "flask" is not defined (reportUndefinedVariable)
searx/search/checker/background.py:19:26 - error: Type of "schedule" is partially unknown
Type of "schedule" is "(delay: Any, func: Any, *args: Any) -> Literal[True]" (reportUnknownVariableType)
searx/shared/__init__.py:8:12 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
searx/shared/shared_uwsgi.py:5:8 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
```
The engine name is not only a *name* its also a identifier that is used in
logs, HTTP headers and more. Unicode characters in the name of an engine could
cause various issues.
Closes: https://github.com/searxng/searxng/issues/1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>