1
0
mirror of https://github.com/searxng/searxng.git synced 2024-11-09 06:30:10 +01:00
Commit Graph

1409 Commits

Author SHA1 Message Date
Markus Heiser
f63ffbb22b [fix] engine - yahoo: rewrite and fix issues
Languages are supported by mapping the language to a domain.  If domain is not
found in :py:obj:`lang2domain` URL ``<lang>.search.yahoo.com`` is used.

BTW: fix issue reported at https://github.com/searx/searx/issues/3020

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-16 20:05:26 +00:00
Markus Heiser
38a157b56f [pylint] engines: yahoo fix several issues reported from pylint
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-16 20:05:26 +00:00
MrPaulBlack
00b0394e19 [fix] language param for qwant 2021-10-14 16:11:44 +00:00
Noémi Ványi
4cc1ee8565 [fix] qwant engine - only get results from categories
Reported-by: https://github.com/searx/searx/issues/3014
Cherry-picked: https://github.com/searx/searx/commit/3bcca43
2021-10-12 18:42:50 +00:00
Paolo Basso
64df011e2f [mod] engines - add zlibrary engine 2021-10-11 14:58:44 +00:00
Markus Heiser
3abbe6d25b [fix] engine torznab - categories, before join convert int to str
BTW add init() function and replace SearxEngineAPIException by ValueError.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-07 15:27:55 +00:00
Markus Heiser
9fb77065bd [fix] engine torznab - marginal issues reported from linters
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-07 15:27:55 +00:00
Paolo Basso
d803df8d89 [mod] engines - add torznab WebAPI 2021-10-07 15:27:55 +00:00
Markus Heiser
19e41c137e [mod] set 'engine.supported_languages' from the origin python module
The key of the dictionary 'searx.data.ENGINES_LANGUAGES' is the *engine name*
configured in settings.xml.  When multiple engines are configured to use the
same origin engine (e.g. `engine: google`)::

    - name: google
      engine: google
      use_mobile_ui: false
      ...

    - name: google italian
      engine: google
      use_mobile_ui: false
      language: it
      ...

    - name: google mobile ui
      engine: google
      shortcut: gomui
      use_mobile_ui: true

There exists no entry for ENGINES_LANGUAGES[engine.name] (e.g. `name: google
mobile ui` or `name: google italian`).  This issue can be solved by recreate the
ENGINES_LANGUAGES::

    make data.languages

But this is nothing an SearXNG admin would like to do when just configuring
additional engines, since this just doubles entries in ENGINES_LANGUAGES and
BTW: `make data.languages` has various external requirements which might be not
installed or not available, on a production host.

With this patch, if engine.name fails, ENGINES_LANGUAGES[engine.engine] is used
to get the engine.supported_languages (e.g. `google` for the engine named
`google mobile`).

For an engine, when there is `language: ...` in the YAML settings, the engine
supports only one language, in this case engine.supported_languages should
contains this value defined in settings.yml (e.g. `it` for the engine named
`google italian`).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Closes: https://github.com/searxng/searxng/issues/384
2021-10-07 08:45:02 +02:00
Alexandre Flament
8a897b86f1 [mod] engines - IMDB: add thumbnails 2021-10-05 09:10:02 +02:00
Paul Alcock
823d44ed0a [mod] engines - add IMDB / Internet Movie Database
Merged from @Guilvareux's commit [1] and slightly modfied / see [2].

[1] https://github.com/searx/searx/pull/2980/commits/f2f90071
[2] https://github.com/searx/searx/pull/2980
2021-10-03 11:44:25 +02:00
Markus Heiser
a5b7ed9550 [mod] engine duckduckgo - update supported_languages_url
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-01 20:01:41 +02:00
Markus Heiser
4c9b8b29ee [mod] engine duckduckgo - use DuckDuckGo-Lite
Implement a scrapper for DuckDuckGo-Lite [1].  The existing DuckDuckGo [2]
engine does not support paging.  DuckDuckgo-Lite is much faster, less verbose
and does have a paging option (reversed engineered from the input form of [1]).

[1] https://lite.duckduckgo.com/lite
[2] https://duckduckgo.com/

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-01 20:01:41 +02:00
Markus Heiser
ecb3912bd0 [fix] engine stackexchange - decode HTML entities in title & content
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-29 08:08:18 +02:00
Markus Heiser
b62851559b [mod] replace old stackoverflow engine by Stack Exchange API v2.3
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-28 19:12:37 +02:00
Markus Heiser
55fee1e45d [mod] engines - add Stack Exchange API v2.3
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-28 19:01:04 +02:00
Alexandre Flament
b046322c7b
Merge pull request #333 from dalf/enh-engine-descriptions
RFC: /preferences: display engine descriptions
2021-09-25 11:29:25 +02:00
Alexandre Flament
ab569c1e12 [fix] openstreetmap engine: optmizer SPARQL query
add
hint:Query hint:optimizer "None".
to the SPARQL query to keep the response time small.

It tells the optimizer to follow the path from ?item to the different property values
instead of the other way around.
See https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization#Property_paths
2021-09-25 11:16:22 +02:00
Alexandre Flament
8961131497 [fix] fix the about section of some engines 2021-09-24 20:20:30 +02:00
Alexandre Flament
6f11b61cd5 [fix] openstreetmap engine: map "all" language to English 2021-09-24 20:12:18 +02:00
Markus Heiser
443bf35e09 [pylint] fix global-variable-not-assigned issues
If there is no write access, there is no need for global.  Remove global
statement if there is no assignment.

global-variable-not-assigned:
  Using global for names but no assignment is done Used when a variable is
  defined through the "global" statement but no assignment to this variable is
  done.

In Pylint 2.11 the global-variable-not-assigned checker now catches global
variables that are never reassigned in a local scope and catches (reassigned)
functions [1][2]

[1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
[2] https://github.com/PyCQA/pylint/issues/1375

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-17 10:14:27 +02:00
Alexandre Flament
602cbc2c99
Merge pull request #297 from dalf/engine-logger-enh
debug mode: more readable logging
2021-09-14 07:06:28 +02:00
Alexandre Flament
f8793fbda0 [fix] logger per engine: make .logger is always initialized
the openstreetmap engine imports code from the wikidata engine.
before this commit, specific code make sure to copy the logger variable to the wikidata engine.

with this commit searx.engines.load_engine makes sure the .logger is initialized.
The implementation scans sys.modules for module name starting with searx.engines.
2021-09-13 08:47:59 +02:00
Alexandre Flament
0e42db9da1 [mod] xpath engine: remove logging of the requested URL 2021-09-11 10:13:16 +02:00
Markus Heiser
f0059b80ed [pylint] engines: drop no longer needed 'missing-function-docstring'
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-07 13:26:59 +02:00
Markus Heiser
82847df300 [fix] add 'categories' to PYLINT_ADDITIONAL_BUILTINS_FOR_ENGINES
androp no longer needed (see line 591 in 7b235a1)::

    # pylint: disable=undefined-variable

Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914068609
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-07 10:29:38 +02:00
Markus Heiser
cd033b5416 [fix] drop useless pylint: disable=undefined-variable
Since 7b235a1 (see line 591) it is no longer needed to disable
'undefined-variable' for names defined in::

   PYLINT_ADDITIONAL_BUILTINS_FOR_ENGINES

Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914068609
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-07 10:26:15 +02:00
Alexandre Flament
ea60c03827 [fix] fix openstreetmap engine
close #298

This is a workaround: inside engine code, any call to function in another engine can crash
since the logger won't be initialized except if it is done explicitly.
2021-09-06 22:44:22 +02:00
Markus Heiser
aecfb2300d [mod] one logger per engine - drop obsolete logger.getChild
Remove the no longer needed `logger = logger.getChild(...)` from engines.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-06 18:05:46 +02:00
Markus Heiser
7b235a1c36 [mod] one logger per engine
Suggested-by: @dalf in https://github.com/searxng/searxng/issues/98#issuecomment-849013518
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-06 17:47:28 +02:00
Markus Heiser
9ff881f937 [fix] remove minimum length of content for XPath engine
Instead of raising an exception and therefore hiding all results of the engine.

It make sense to remove that requirement in order to allow the implementation of
search engines that do not always have a description.  In fact some search
engines that in 99% of the case have a description like Brave Search or Mojeek
crash completely if they for some reason included a result with no description.

To test this patch try Mojeek:

    !mjk xyz

before and after the patch.

Suggested-by: 0xhtml in https://github.com/searx/searx/discussions/2933
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-04 12:41:23 +02:00
Allen
a5a0a4e106 [fix] Correct engine name in for Rumble 2021-09-04 10:22:26 +02:00
Allen
49bbd250d9 [fix] Update about section of Invidious
Another website and new documentation
2021-09-04 10:22:07 +02:00
Markus Heiser
b83c14cf6b [pylint] Pylint 2.10 - fix use-list-literal & use-dict-literal
Pylint 2.10 added new default checks [1]:

use-list-literal
  Emitted when list() is called with no arguments instead of using []

use-dict-literal
  Emitted when dict() is called with no arguments instead of using {}

[1] https://pylint.pycqa.org/en/latest/whatsnew/2.10.html

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-08-31 10:40:29 +02:00
Noémi Ványi
3d5e6e0abb [enh] google: add filter=0 to Google engine for more results
backport from searx ( 23b3b56a06ef831af0a1b30a12c26ebd50e329bb )
2021-08-21 17:46:16 +02:00
Samuel Dudik
7a7ef9cea6 [fix] Seznam engine - some XPath selectors has been changed
Merged from https://github.com/dudik/searx/commit/5a4207759

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-27 07:13:41 +02:00
Alexandre Flament
48fe83b901
Merge pull request #221 from dalf/fix-peertube_fetch_supported_languages
[fix] peertube: update _fetch_supported_languages
2021-07-25 10:30:53 +02:00
Markus Heiser
fe67f1478f [fix] qwant engine - prevent API locale exception on lang 'all'
Has been reported in [1], error message::

    Error
        Error: searx.exceptions.SearxEngineAPIException
        Percentage: 0
        Parameters: ('API error::locale must be a string,locale must be one of
        the following values: en_gb, en_ie, en_us, en_ca, en_in, en_my, en_au,
        en_nz, cy_gb, gd_gb, de_de, de_ch, de_at, fr_fr, br_fr, fr_be, fr_ch,
        fr_ca, fr_ad, fc_ca, ec_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx,
        es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_br,
        pt_pt, pt_ad, nl_be, nl_nl, pl_pl, zh_hk, zh_cn, fi_fi, bg_bg, et_ee,
        hu_hu, da_dk, nb_no, sv_se, ko_kr, th_th, cs_cz, ro_ro, el_gr',)
        File name: searx/engines/qwant.py:114
        Function: response
        Code: raise SearxEngineAPIException('API error::' + msg)

[1] https://github.com/searxng/searxng/issues/222

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-24 14:48:27 +02:00
Markus Heiser
ca57c7421b [fix] qwant engine - prevent exception on date/time value is None
Has been reported in [1], error messages::

  Error
       Error: ValueError
       Percentage: 0
       Parameters: ()
       File name: searx/engines/qwant.py:159
       Function: response
       Code: pub_date = datetime.fromtimestamp(item['date'], None)

    Error
        Error: TypeError
        Percentage: 0
        Parameters: ('an integer is required (got type NoneType)',)
        File name: searx/engines/qwant.py:196
        Function: response
       Code: pub_date = datetime.fromtimestamp(item['date'])

Fix timedelta from seconds to milliseconds [1], error message::

    Error
        Error: TypeError
        Percentage: 0
        Parameters: ('unsupported type for timedelta seconds component: NoneType',)
        File name: searx/engines/qwant.py:195
        Function: response
        Code: length = timedelta(seconds=item['duration'])

[1] https://github.com/searxng/searxng/issues/222

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-24 14:48:14 +02:00
Alexandre Flament
b0a12924a0 [fix] peertube: update _fetch_supported_languages
update the regex to match the changes in peertube source code
fix "make data.languages"
2021-07-23 12:03:16 +02:00
Alexandre Flament
f523fd3ea7
Merge pull request #211 from MarcAbonce/onions_v3_fix_searxng
Update onion engines to v3
2021-07-16 17:25:37 +02:00
Alexandre Flament
d47b8e36cf
Merge pull request #207 from return42/mongodb
[enh] add mongodb offline engine
2021-07-16 16:15:01 +02:00
Alexandre Flament
0d65a81b1c [mod] qwant engine: fix typos / minor change
minor modification of commit 628b5703f3
(no functionnal change)
2021-07-16 15:32:12 +02:00
Marc Abonce Seguin
1b05ea6a6b update onion engines to v3
remove not_evil which has been down for a while now:
https://old.reddit.com/r/onions/search/?q=not+evil&restrict_sr=on&t=year
2021-07-16 01:36:34 -07:00
Markus Heiser
0a9cd08bf1 [enh] add mongodb offline engine
Cherry-Pick: https://github.com/searx/searx/commit/198aad43
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-15 21:35:33 +02:00
Markus Heiser
628b5703f3 [mod] improve video results of the qwant engine
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-15 20:10:37 +02:00
Alexandre Flament
f376b4ed3e
Merge pull request #205 from unixfox/patch-2
Add missing parameter for mobile UI search
2021-07-15 17:19:12 +02:00
Émilien Devos
6c9f276571
Add missing parameter for mobile UI search 2021-07-15 13:00:32 +00:00
Markus Heiser
ef6e1bd6b9 [fix] Qwant engines - implement API v3 and add 'quant videos'
The implementation uses the Qwant API (https://api.qwant.com/v3). The API is
undocumented but can be reverse engineered by reading the network log of
https://www.qwant.com/ queries.

This implementation is used by different qwant engines in the settings.yml::

  - name: qwant
    categories: general
    ...
  - name: qwant news
    categories: news
    ...
  - name: qwant images
    categories: images
    ...
  - name: qwant videos
    categories: videos
    ...

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-14 09:47:32 +02:00
Markus Heiser
513c73a309 [drop] engine torrentz: torrentz2.eu and torrentz2.is are offline
[1] https://torrentfreak.com/torrentz2-eu-domain-suspended-by-registry-on-public-prosecutors-order-200628/

Suggested-by: @rasos https://github.com/searx/searx/issues/1875#issuecomment-877755872
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-11 13:24:33 +02:00
Émilien Devos
d9d9bd720d
Fix google images
Proposed fix in https://github.com/searx/searx/pull/2115#issuecomment-876716010
2021-07-10 14:09:29 +00:00
Markus Heiser
0ef6aa5126 [docs] add documentation from the sources of the google engines
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-21 18:25:52 +02:00
Markus Heiser
05e90f2e57 [fix] google answers: normalize space of the answers.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-21 16:50:25 +02:00
Markus Heiser
f096d68ec6 [mod] google engine: reduce mobile UI parameters to what is needed
Reverse engineering shows that not all of the parameters used by google's mobile
UI (aka "more results" button) are needed [1].

[1] https://github.com/searxng/searxng/pull/160#issuecomment-865013625

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-21 16:50:23 +02:00
Alexandre Flament
7a5c36408a [mod] google: add "use_mobile_ui" parameter to use mobile endpoint.
disable by default, it has to be enabled in settings.yml

related to  #159
2021-06-21 14:52:04 +02:00
Markus Heiser
9328c66e93 [fix] google news - send CONSENT Cookie to not be redirected
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking.  To get the consent
from the user, google-news requests are redirected to confirm and get a CONSENT
Cookie from https://consent.google.de/s?continue=...

This patch adds a CONSENT Cookie to the google-news request to avoid
redirection.

The behavior of the CONTENTS cookies over all google engines seems similar but
the pattern is not yet fully clear to me, here are some random samples from my
analysis ..

Using common google search from different domains::

    google.com:        CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
    google.de:         CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
    google.fr:         CONSENT=YES+srp.gws-{{date}}-0-RC2.fr+FX+826

When searching about videos (google-videos)::

    google.es:         CONSENT=YES+srp.gws-{{date}}-0-RC2.es+FX+076
    google.de:         CONSENT=YES+srp.gws-{{date}}-0-RC2.de+FX+171

Google news has only one domain for all languages::

    news.google.com:   CONSENT=YES+cb.{{date}}-14-p0.de+FX+816

Using google-scholar search from different domains::

    scholar.google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
    scholar.google.fr: does not use such a cookie / did not ask the user
    scholar.google.es: does not use such a cookie / did not ask the user

Interim summary:

  Pattern is unclear and I won't apply the CONSENT cookie to all google engines.
  More experience is need before we generalize the CONSENT cookies over all
  google engines.

Related:

- e9a6ab401 [fix] youtube - send CONSENT Cookie to not be redirected
- https://github.com/benbusby/whoogle-search/issues/311
- https://github.com/benbusby/whoogle-search/issues/243

[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-18 13:21:20 +02:00
Markus Heiser
dd7b53d369 [fix] google-news engine - KeyError: 'hl in request
Since we added

- 1c67b6aec [enh] google engine: supports "default language"

there is a KeyError: 'hl in request,error pattern::

    ERROR:searx.searx.search.processor.online:engine google news : exception : 'hl'
    Traceback (most recent call last):
      File "searx/search/processors/online.py", line 144, in search
        search_results = self._search_basic(query, params)
      File "searx/search/processors/online.py", line 118, in _search_basic
        self.engine.request(query, params)
      File "searx/engines/google_news.py", line 97, in request
        if lang_info['hl'] == 'en':
      KeyError: 'hl'

Closes: https://github.com/searxng/searxng/issues/154
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-18 11:34:11 +02:00
Markus Heiser
343570f7fb [pylint] searx/engines/duckduckgo_definitions.py
BTW: normalize indentations

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-14 09:22:29 +02:00
Markus Heiser
2ac3e5b20b [fix] log messages from: google- images, news, scholar, videos
- HTTP header Accept-Language --> lang_info['headers']['Accept-Language']
- remove obsolete query_url log messages which is already logged by
  httpx._client:HTTP request

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-11 16:31:50 +02:00
Markus Heiser
1ac3961336 [mod] google - get_lang_info add documentataion & comments
BTW: remove obsolete log messages from google engine

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-11 16:06:36 +02:00
Alexandre Flament
1c67b6aece [enh] google engine: supports "default language"
Same behaviour behaviour than Whoogle [1].  Only the google engine with the
"Default language" choice "(all)"" is changed by this patch.

When searching for a locate place, the result are in the expect language,
without missing results [2]:

  > When a language is not specified, the language interpretation is left up to
  > Google to decide how the search results should be delivered.

The query parameters are copied from Whoogle.  With the ``all`` language:

- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add a ``Accept-Language`` HTTP header.

The new signature of function ``get_lang_info()`` is:

    lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)

Argument ``supported_any_language`` is True for google.py and False for the other
google engines.  With this patch the function now returns:

- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
  - ``lang_info['subdomain']``
  - ``lang_info['country']``
  - ``lang_info['language']``

[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
2021-06-10 10:22:01 +02:00
Markus Heiser
bf10b4a857 [fix] openstreetmap - fix some minor whitespace & indentation issues
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-09 18:08:23 +02:00
Alexandre Flament
c75425655f [enh] openstreetmap / map template: improve results
implements ideas described in #69

* update the engine
* use wikidata
* update map.html template
2021-06-09 18:08:23 +02:00
Markus Heiser
5c5db719d2
Merge pull request #97 from return42/drop-searx-admin
[docs] reorder blog articles
2021-06-08 10:56:18 +00:00
Alexandre Flament
8194db4e21 [fix] peertube fetch supported languages
close #127
2021-06-04 16:17:20 +02:00
Markus Heiser
f122cb0e27 [fix] typo: online_dictionnary --> online_dictionary
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-04 15:05:58 +02:00
Markus Heiser
79cc82a4db [docs] add engine "Demo Online Engine"
This engine just exists for documentation purpose.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-04 15:05:58 +02:00
Markus Heiser
1c8cf1d3a8 [docs] add engine "Demo Offline Engine"
This engine just exists for documentation purpose.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-04 15:04:38 +02:00
Alexandre Flament
7457f3fe40
Merge pull request #124 from return42/searx-merge
merge redis offline engine from searx
2021-06-02 12:35:33 +02:00
Markus Heiser
39c18274c6 [fix] enigine redis - avoid error when the engine is loaded
Should be _redis_client to avoid an error when the engine is loaded.

Suggested-by: @dalf https://github.com/searxng/searxng/pull/124#pullrequestreview-673885664
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-02 09:54:58 +02:00
Alexandre Flament
8375974dff [fix] sys.exit(1) when there is duplicate engine name 2021-06-01 16:37:20 +02:00
Markus Heiser
8908937046 [mod] searx.engines.load_engine return None instead of sys.exit(1)
Loading an engine should not exit the application (*). Instead
of exit, return None.

(*) RuntimeError still exit the application: syntax error, etc...

BTW: add documentation and normalize indentation (no functional change)

Suggested-by: @dalf https://github.com/searxng/searxng/pull/116#issuecomment-851865627
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-01 16:35:17 +02:00
Alexandre Flament
70a9208972 [mod] searx.engines.__init__: refactoring 2021-06-01 16:32:40 +02:00
Adam Tauber
e4b6558339 [enh] add redis offline engine / https://redis.io/
Slightly modified merge of commit [97269be6], [01a8a5814a] and [c8d2b5eb] from
searx.

[97269be6] https://github.com/searx/searx/commit/97269be6
[01a8a581] https://github.com/searx/searx/commit/01a8a581
[c8d2b5eb] https://github.com/searx/searx/commit/c8d2b5eb

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-01 11:51:25 +02:00
Alexandre Flament
4b07df62e5 [mod] move all default settings into searx.settings_defaults 2021-06-01 08:10:15 +02:00
Kyle Anthony Williams
d6a2d4f969 [enh] add engine - Docker Hub
Slightly modified merge of commit [1cb1d3ac] from searx [PR 2543]:

      This adds Docker Hub .. as a search engine .. the engine's favicon was
      downloaded from the Docker Hub website with wget and converted to a PNG
      with ImageMagick .. It supports the parsing of URLs, titles, content,
      published dates, and thumbnails of Docker images.

[1cb1d3ac] https://github.com/searx/searx/pull/2543/commits/1cb1d3ac
[PR 2543] https://github.com/searx/searx/pull/2543

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-30 15:18:36 +02:00
Alexandre Flament
1113f7e616 [mod] the bittorent search engines are available only in the files category
related to #101
2021-05-29 16:14:19 +02:00
Noémi Ványi
87a01a1736 [enh] add MySQL engine
Slightly modified merge of [c00a33fe] from searx.

[c00a33fe] c00a33feee

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-28 17:36:46 +02:00
Noémi Ványi
324aa96062 [enh] add PostgreSQL engine
Slightly modified merge of [22079ff] from searx.

[22079ff] 22079ffdef

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-28 17:34:44 +02:00
Markus Heiser
32b5a0ef7b
Merge pull request #93 from return42/genius-misc
Some minor Genius improvements
2021-05-27 14:23:22 +00:00
Markus Heiser
25b5797a0c
Merge pull request #103 from searxng/add-sqlite-engine2
[enh] add offline engine for sqlite database
2021-05-27 14:06:42 +00:00
Alexandre Flament
2ea34a3c36 [enh] add offline engine for sqlite database
To test & demonstrate this implementation download:

  https://liste.mediathekview.de/filmliste-v2.db.bz2

and unpack into searx/data/filmliste-v2.db, in your settings.yml define a sqlite
engine named "demo"::

    - name : demo
      engine : sqlite
      shortcut: demo
      categories: general
      result_template: default.html
      database : searx/data/filmliste-v2.db
      query_str :  >-
        SELECT title || ' (' || time(duration, 'unixepoch') || ')' AS title,
               COALESCE( NULLIF(url_video_hd,''), NULLIF(url_video_sd,''), url_video) AS url,
               description AS content
          FROM film
         WHERE title LIKE :wildcard OR description LIKE :wildcard
         ORDER BY duration DESC
      disabled : False

Query to test: "!demo concert"

This is a rewrite of the implementation from commit [1]

[1] searx/searx@8e90a21

Suggested-by: @virtadpt searx/searx#2808
2021-05-27 14:27:11 +02:00
Markus Heiser
dc21cb5d4b [fix] unsplash engine - 'searx:result: invalid title:'
- Use result 'alt_description' as title, if not given use
  default title 'unknown'.
- Use result 'description' from unsplash as 'content'

Fix error::

    DEBUG:searx:result: invalid title: {..., 'title': None, 'content': '', 'engine': 'unsplash'}

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-25 17:26:58 +02:00
Markus Heiser
a88e3e4fea [pylint] searx/engines/unsplash.py, add logger & norm indentation
- fix messages from pylint
- add logger and log request URL
- normalized various indentation

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-25 16:45:32 +02:00
Markus Heiser
f963759ccc [fix] engine genius should not use the video template
Remove 'template' from result.  Engine genius should
not use the video template.  BTW: fix indentations

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-24 16:31:14 +02:00
Markus Heiser
3a71d4b175 [pylint] searx/engines/genius.py, add logger & normalized indentation
- pylint searx/engines/genius.py
- add logger and log ignored exceptions
- normalized various indentation

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-24 16:19:06 +02:00
Markus Heiser
84a943f867 [enh] XPath engine - add time safe-search support
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 22:26:18 +02:00
Markus Heiser
6bfe3fd033 [enh] XPath engine - add time range support
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 16:49:30 +02:00
Markus Heiser
1933577c8e [enh] XPath engine - add ISO 639-1 {lang} replacement to search-URL
BTW: remove obsolte params['query'] and not needed paging condition.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 15:05:36 +02:00
Markus Heiser
8cd544b2a6 [doc] add documentation about the XPath engine
- pylint searx/engines/xpath.py
- fix indentation of some long lines
- add logging
- add doc-strings

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 11:48:21 +02:00
Markus Heiser
ffcebf5e12 [enh] xpath engine - add request parameter 'soft_max_redirects'
Make 'soft_max_redirects' configurable per Xpath engine::

    - name : <engine-name>
      engine : xpath
      soft_max_redirects: 1
      ...

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-17 15:04:55 +02:00
Alexandre Flament
8c1a65d32f [mod] multithreading only in searx.search.* packages
it prepares the new architecture change,
everything about multithreading in moved in the searx.search.* packages

previously the call to the "init" function of the engines was done in searx.engines:
* the network was not set (request not sent using the defined proxy)
* it requires to monkey patch the code to avoid HTTP requests during the tests
2021-05-05 13:12:42 +02:00
Marc Abonce Seguin
448bfe6005 fix Qwant's fetch_languages function 2021-05-02 17:46:40 -07:00
Michael Ilsaas
0c43cf89ca [fix] URL to solidtorrent result page
Reported-by: https://github.com/searx/searx/pull/2786
2021-04-29 10:40:47 +02:00
Markus Heiser
dc29f1d826 [pylint] tag PYLINT_FILES by comment # lint: pylint
These py files are linted by `test.pylint`, all other files are linted by
`test.pep8`.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-26 20:18:20 +02:00
Markus Heiser
28b25185c5 [brand] searxng -- fix links to issue tracker & WEB-GUI
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-25 14:25:08 +02:00
Markus Heiser
8efabd3ab7 [mod] core.ac.uk engine
- add to list of pylint scripts
- add debug log messages
- move API key int `settings.yml`
- improved readability
- add some metadata to results

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-04-24 09:00:53 +02:00
spongebob33
7528e38c8a add core.ac.uk engine 2021-04-24 08:55:45 +02:00
Alexandre Flament
d01741c9a2
Merge pull request #15 from return42/add-springer
Add a search engine for Springer Nature
2021-04-22 13:23:31 +02:00
Pierre Chevalier
a80bf1ba97 [enh] Add Springer Nature engine
Springer Nature is a global publisher dedicated to providing service to research
community [1] with official API [2].

To test this PR, first get your API key following this page:

   https://dev.springernature.com/signup

In searx/engines/springer.py at line 24, add this API key.  I left my own key,
commented out in the line aboce.  Feel free to use it, if needed.

[1] https://www.springernature.com/
[2] https://dev.springernature.com/
2021-04-22 12:35:25 +02:00
habsinn
41a2e3785e [enh] add engine using API from "The Art Institute of Chicago" 2021-04-22 12:25:43 +02:00
Markus Heiser
e9a6ab4015 [fix] youtube - send CONSENT Cookie to not be redirected
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking.  To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com

This patch adds a CONSENT Cookie to the youtube request to avoid redirection.

[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
2021-04-22 12:09:09 +02:00
Alexandre Flament
c6d5605d27
Merge pull request #7 from searxng/metrics
Metrics
2021-04-22 08:34:17 +02:00
Alexandre Flament
b7848e3422 [fix] searxng fix: sjp engine 2021-04-21 16:31:29 +02:00
Alexandre Flament
7acd7ffc02 [enh] rewrite and enhance metrics 2021-04-21 16:24:46 +02:00
Alexandre Flament
aae7830d14 [mod] refactoring: processors
Report to the user suspended engines.

searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
2021-04-21 16:24:46 +02:00
Alexandre Flament
48720e20a8 Merge remote-tracking branch 'searx/master' 2021-04-19 09:35:12 +02:00
Noémi Ványi
8362257b9a
Merge pull request #2736 from plague-doctor/sjp
Add new engine: SJP - Słownik języka polskiego
2021-04-16 17:30:14 +02:00
Noémi Ványi
e56323d3c8
Merge pull request #2759 from ypid/fix/typo
Fix grammar mistake in debug log output
2021-04-16 17:26:45 +02:00
Plague Doctor
d275d7a35e Code refactoring. 2021-04-16 12:23:27 +10:00
Markus Heiser
062d589f86 [fix] xpath expressions to grap all items from bandcamp's response
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.

BTW: added bandcamp's favicon

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-15 08:52:11 +02:00
Kyle Anthony Williams
4d3c399ee9 [feat] add bandcamp engine 2021-04-15 08:52:11 +02:00
Alexandre Flament
d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
2021-04-12 17:25:56 +02:00
Robin Schneider
dfc66ff0f0
Fix grammar mistake in debug log output 2021-04-11 22:12:53 +02:00
Alexandre Flament
eaa694fb7d [enh] replace requests by httpx 2021-04-10 15:38:33 +02:00
Plague Doctor
599ff39ddf Fix conflicts 2021-04-09 06:54:03 +10:00
Plague Doctor
6631f11305 Add new engine: SJP 2021-04-08 10:21:54 +10:00
Plague Doctor
7035bed4ee Add new engine: Wordnik.com 2021-04-08 09:58:00 +10:00
Noémi Ványi
07f5edce3d Add Meilisearch engine
Website: https://www.meilisearch.com/
2021-04-06 21:57:05 +02:00
Alexandre Flament
725a69616b
Merge pull request #2681 from dalf/fix-wikipedia-title
[fix] wikipedia: remove HTML from the title
2021-03-27 17:43:36 +01:00
Noémi Ványi
9bb312c505 Remove duplicated key from dict in Semantic Scholar 2021-03-27 16:58:32 +01:00
Noémi Ványi
f596f5767b fix Semantic Scholar engine 2021-03-27 16:54:01 +01:00
Adam Tauber
28286cf3f2 [fix] update seznam engine to be compatible with the new website 2021-03-27 15:29:04 +01:00
Alexandre Flament
fcfcf662ff [fix] wikipedia: remove HTML from the title
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)

The commit uses api_result['title']
2021-03-25 08:31:39 +01:00
Adam Tauber
0ba71c3644 [fix] make ina engine compatible with the new response json 2021-03-25 01:20:41 +01:00
Adam Tauber
5f450fda74 [enh] add year filter to duckduckgo 2021-03-25 00:25:36 +01:00
Adam Tauber
fd737dc9d8 [fix] remove debug code 2021-03-24 23:54:39 +01:00
Alexandre Flament
38c210d746
[mod] soundcloud: faster initialization
The get_cliend_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.

This commit fetches the javascript URLs in the reverse order: the client id is in the last javascript URL.
2021-03-21 09:29:53 +01:00
Adam Tauber
4c631ac6d0 [fix] remove debug code 2021-03-15 21:47:27 +01:00
Noémi Ványi
8158d8654a fix Microsoft Academic engine 2021-03-15 20:21:28 +01:00
Adam Tauber
f97b4ff7b6 [fix] update youtube_noapi paging 2021-03-15 17:22:31 +01:00
Adam Tauber
dd34ac396c
Merge pull request #2652 from kvch/solr-engine
Add Apache Solr engine
2021-03-15 15:39:39 +01:00
Alexandre Flament
1664258061
Merge pull request #2655 from return42/fix-imports
[fix] remove unused import from yahoo-news engine
2021-03-15 08:38:34 +01:00
Markus Heiser
6e1f1085ef [fix] remove unused import from yahoo-news engine
Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-14 15:13:57 +01:00
Markus Heiser
3703ebb22a [drop] Acgsou engine - www.acgsou.com no longer exists
- https://www.acgsou.com/ acgsou.com is redirected to 36dm.club
- @rinpatch do not plan on maintaining the engine [1]

[1] https://github.com/searx/searx/pull/1283#issuecomment-798783585

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-14 11:49:18 +01:00
Noémi Ványi
ff527e2681 Add Solr engine 2021-03-13 21:18:09 +01:00
Alexandre Flament
92dd5e245e
Merge pull request #2626 from mikeri/solidtorrents
Add Solid Torrents engine
2021-03-12 19:45:22 +01:00
Alexandre Flament
a1a492baed
Merge pull request #2641 from dalf/disable_http_by_default
[mod] by default allow only HTTPS, not HTTP
2021-03-12 19:21:46 +01:00
Markus Heiser
96422e5c9f [fix] APKMirror engine - update xpath selectors and fix img_src
BTW: make the code slightly more readable

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-09 08:34:57 +01:00
Markus Heiser
d2faea423a [fix] rewrite Yahoo-News engine
Many things have been changed since last review of this engine.  This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-08 11:43:34 +01:00
Alexandre Flament
99e0651cea [mod] by default allow only HTTPS, not HTTP
Related to https://github.com/searx/searx/pull/2373
2021-03-08 11:35:08 +01:00
Michael Ilsaas
5549d58de3 Add Solid Torrents engine 2021-03-07 18:14:30 +01:00
Adam Tauber
44f4a9d49a [enh] add ability to send engine data to subsequent requests 2021-03-06 12:12:35 +01:00
Markus Heiser
4845183128 [mod] don't dump traceback of SearxEngineResponseException on init
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:

    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000

For SearxEngineResponseException this is not needed.  Those types of exceptions
can be a normal use case.  E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:

    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000

closes: #2612

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-05 17:26:22 +01:00
Markus Heiser
d48e2e7b0b [enh] google scholar - python implementation of the engine
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-01 15:16:37 +01:00
Alexandre Flament
f77983e174
Merge pull request #2602 from MarcAbonce/fix-bing-fetch-languages
Fix fetch_languages for Bing
2021-03-01 09:06:37 +01:00
GazoilKerozen
5f6ac3afa2
Add Freesound engine (#2596)
Add freesound engine with player.

Co-authored-by: Gazoil <maildeguzel@gmail.com>
2021-03-01 08:52:36 +01:00
Marc Abonce Seguin
d6681fd33b remove articles number from engines_languages.json 2021-02-25 23:54:21 -07:00
Marc Abonce Seguin
9b6ffed061 fix fetch_languages for bing
Bing has a list of regions that it supports and some of these regions
may have more than one possible language.

In some cases, like Switzerland, these languages are always shown as
options, so there is no issue. But in other cases, like Andorra, Bing
will only show one language at the time, either the region's default or
the request's language if the latter is supported by that region.

For example, if the HTTP request is in French, Andorra will appear as
fr-AD but if the same page is requested in any other language Andorra
will appear as ca-AD.

This is specially a problem when Bing assumes that the request is in
English because it overrides enough language codes to make several major
languages like Arabic dissappear from the languages.py file.

To avoid that issue, I set the Accept-Language header to a language
that's only supported in one region to hopefully avoid these overrides.
2021-02-25 23:51:49 -07:00
Noémi Ványi
1be6ab2a91 Fix paging of Bing Images 2021-02-22 21:19:34 +01:00
datagram1
1d0a32a2c5 Added rumble.com video search engine. TODO video embedding.
Update rumble.py

some lines too long.

Disable Rumble engine

disabled : True

PEP8 fix

change line spacing
2021-02-20 12:48:56 +00:00
Alexandre Flament
44a6593c13
Merge pull request #2573 from unixfox/yggtorrent
update yggtorrent url + add it back
2021-02-16 08:22:07 +01:00
Emilien Devos
4b37e10dd9 fix yggtorrent url + add it back 2021-02-15 13:38:34 +01:00
Thorben Günther
fbbd4cc21f
Improve peertube searching
At the moment videos without a description are not shown - setting
default content to "" fixes this.
Another current bug is that thumbnails are not displayed. This is caused
by a double slash in the url. For this every trailing slash is now
stripped (for backwards compatibility) and the API response is correctly
parsed.
2021-02-13 19:47:33 +01:00
Alexandre Flament
45027765e3
Merge pull request #2566 from dalf/remove-yandex
[remove] yandex engine
2021-02-12 17:12:07 +01:00
Alexandre Flament
c22d4c764c [fix] duckduckgo engine: "!ddg !g" do not redirect to google
* searx understand "!ddg !g time" as : send "!g time" to DDG
* !g a DDG bang for Google: DDG return a HTTP redirect to Google

This commit adds a the allows_redirect param not to follow HTTP redirect.

The DDG engine returns a empty result as before without HTTP redirect.
2021-02-12 11:10:08 +01:00
Alexandre Flament
d76660463b
Merge pull request #2562 from dalf/mod-json-engine
[mod] json_engine: add content_html_to_text and title_html_to_text
2021-02-12 10:58:28 +01:00
Alexandre Flament
7dcf67a47a
Merge pull request #2565 from dalf/upd-wikipedia
[upd] wikipedia engine: return an empty result on query with illegal characters
2021-02-12 10:57:05 +01:00
Alexandre Flament
2b60d0d243
Merge pull request #2564 from dalf/fix-seznam
[fix] fix seznam engine
2021-02-12 10:56:53 +01:00
Alexandre Flament
7e83818879
Merge pull request #2560 from dalf/fix-duckduckgo
Fix duckduckgo
2021-02-12 10:56:40 +01:00
Alexandre Flament
74c8b5606f
Merge pull request #2541 from return42/mediathekviewweb
[enh] add engine MediathekViewWeb (API)
2021-02-11 15:11:26 +01:00
Alexandre Flament
5d9db6c2f7 [remove] yandex engine 2021-02-11 14:28:06 +01:00
Alexandre Flament
35dd069402 [fix] fix seznam engine
no paging support
2021-02-11 12:53:19 +01:00
Alexandre Flament
7d6e69e2f9 [upd] wikipedia engine: return an empty result on query with illegal characters
on some queries (like an IT error message), wikipedia returns an HTTP error 400.
this commit returns an empty result instead of showing an error to the user.
2021-02-11 12:29:21 +01:00
Alexandre Flament
ff84a1af35 [mod] json_engine: add content_html_to_text and title_html_to_text
Some JSON API returns HTML in either in the HTML or the content.
This commit adds two new parameters to the json_engine:
content_html_to_text and title_html_to_text, False by default.

If True, then the searx.utils.html_to_text removes the HTML tags.

Update crossref, openairedatasets and openairepublications engines
2021-02-10 16:42:11 +01:00
Alexandre Flament
436d366448
Merge pull request #2544 from mrwormo/congresslibrary
[Engine] Add Library of Congress engine
2021-02-10 10:13:46 +01:00
Alexandre Flament
d2dac11392 [mod] duckduckgo engine: better support of the language preference
After the main request, send a second to https://duckduckgo.com/t/sl_h

See https://github.com/searx/searx/issues/2259
2021-02-09 14:36:43 +01:00
Markus Heiser
bc1be3f0e9 [enh] add engine MediathekViewWeb (API)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-09 13:08:01 +01:00
mrwormo
051da88328 Add Library of Congress engine 2021-02-09 12:45:39 +01:00
Alexandre Flament
5e055b069b [fix) fix apk_mirror engine 2021-02-09 11:02:12 +01:00
Marc Abonce Seguin
64e81794fe add support for Chinese variants in Wikipedia 2021-02-08 21:56:45 -07:00
Hermógenes Oliveira
514faa9162 [feat] recoll: paged json support 2021-02-07 10:05:35 -03:00
mrwormo
c4c1636b18 Add Creative Commons search engine 2021-02-04 11:31:35 +01:00
Alexandre Flament
ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
Markus Heiser
7f505bdc6f [fix] google: avoid unnecessary SearxEngineXPathException errors
Avoid SearxEngineXPathException errors when parsing non valid results::

    .//div[@class="yuRUbf"]//a/@href index 0 not found
    Traceback (most recent call last):
      File "./searx/engines/google.py", line 274, in response
        url = eval_xpath_getindex(result, href_xpath, 0)
      File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
        raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
    searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser
b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Markus Heiser
8cdad5d85d [fix] google-videos: parse values for 'length' & 'author'
The 'video.html' template from the 'oscar' design supports replacement
for *author* and *length*.  Google-videos does not have an author, alternatively
the publisher info from is used for the *author*.

Hint: these replacements are not supported by the 'simple' design.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:51:24 +01:00
Markus Heiser
89b3050b5c [fix] revise of the google-Video engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:39:30 +01:00
Alexandre Flament
8c46b767d0 [fix] google_news: avoid one HTTP redirect except for the English results
also add
params['soft_max_redirects'] = 1
to avoid false error reporting in /stats/errors
2021-01-24 08:53:35 +01:00
Markus Heiser
5f92dfcdbe [fix] google-news: query uses locale without country tag
Wthout country-region tag google will redirect to correct the contry tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en...      HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-23 11:37:14 +01:00
Markus Heiser
baec54c492 [fix] revise of the google-news engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Alexandre Flament
b405646749
Merge pull request #2451 from mrwormo/invidious-engine
[Fix] Invidious Engine
2021-01-16 19:25:45 +01:00
Alexandre Flament
a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
mrwormo
2dff3887f0 [fix] Invidious engine by enabling requests by randomly picking amongst working instances 2021-01-14 12:12:56 +01:00
Alexandre Flament
3f8ebf70b1 [fix] pylint: use "raise ... from ..." 2020-12-20 09:46:53 +01:00
Alexandre Flament
eb33ae6893 [fix] Python 3.9: use html.unescape instead of HTMLParser.unescape 2020-12-20 09:46:53 +01:00
Alexandre Flament
02fc4147ce [mod] dictzone, translated, currency_convert: use engine_type online_curency and online_dictionnary 2020-12-17 11:39:36 +01:00
Alexandre Flament
7ec8bc3ea7 [mod] split searx.search into different processors
see searx.search.processors.abstract.EngineProcessor

First the method searx call the get_params method.

If the return value is not None, then the searx call the method search.
2020-12-17 11:39:36 +01:00
lucky13820
fea8958e99
Fix the StartPage result title is showing the url
Fix the issue 2395 where StartPage result title is showing the url. https://github.com/searx/searx/issues/2395
2020-12-16 13:54:14 -08:00
Alexandre Flament
292b73a3fc
Merge pull request #2385 from joshu9h/patch-1
[Fix] Startpage
2020-12-14 17:56:48 +01:00
Alexandre Flament
36600118fb
Merge pull request #2372 from dalf/remove-broken-engines
[remove] remove searchcode_doc and twitter
2020-12-13 17:11:05 +01:00
joshu9h
8260435c8b
[Fix] Startpage 2020-12-13 15:43:50 +01:00
Alexandre Flament
3c4a9c1188
Merge pull request #2358 from dalf/fix-command
[fix] command engine: SearchQuery.query is str not bytes
2020-12-11 14:53:24 +01:00
Alexandre Flament
d703119d3a [enh] add raise_for_httperror
check HTTP response:
* detect some comme CAPTCHA challenge (no solving). In this case the engine is suspended for long a time.
* otherwise raise HTTPError as before

the check is done in poolrequests.py (was before in search.py).

update qwant, wikipedia, wikidata to use raise_for_httperror instead of raise_for_status
2020-12-11 14:37:08 +01:00
Alexandre Flament
033f39bff7
Merge pull request #2376 from dalf/fix-mojeek
Fix mojeek
2020-12-11 13:14:54 +01:00
Alexandre Flament
6bc6d5e9fd
Merge pull request #2371 from dalf/mod-genius
[mod) genious: return valid results even if contents are empty
2020-12-11 13:14:03 +01:00
Alexandre Flament
d41cafd5f3 [fix] xpath, mojeek: fix commit 58d72f2692
before commit 58d72f2, category was not set in xpath.py,
so searx/engines/__init__py was setting the category to ['general']

the commit 58d72f2 set the category to [] which is not replaced by searx/engines/__init__.py
consequence: the mojeek engine is hidden in the preferences.

this commit revert the xpath.py change.

close #2368
2020-12-10 10:52:06 +01:00
Noémi Ványi
3a63dfbdd7 display if an engine does not support https
Closes #302
2020-12-09 20:49:54 +01:00
Alexandre Flament
1c9e7cef50 [remove] remove searchcode_doc and twitter
* twitter: the API has changed. the engine needs to rewritten.
* searchcode_doc: the API about documentation doesn't exist anymore.
2020-12-09 13:14:31 +01:00
Alexandre Flament
fa73f10f11 [mod) genious: return valid results even if contents are empty 2020-12-09 13:01:34 +01:00
Alexandre Flament
a77d8c8227
Merge pull request #2359 from dalf/update-duden
[mod] duden engine
2020-12-08 20:33:38 +01:00
Alexandre Flament
bd4869ecd0
Merge pull request #2366 from dalf/remove-seedpeer
[remove] seedpeer engine
2020-12-08 20:33:23 +01:00
Alexandre Flament
56c64d6b64 [remove] seedpeer engine
the website is offline.
2020-12-07 21:02:29 +01:00
Alexandre Flament
c1a9732268
Merge pull request #2364 from dalf/fix-youtube-noapi
[fix] youtube_noapi engine
2020-12-07 20:26:00 +01:00
Alexandre Flament
13d3004703
Merge pull request #2365 from dalf/fix-soundcloud
[fix] soundclound: accept result without content
2020-12-07 20:25:17 +01:00
Alexandre Flament
62073c0e1d
Merge pull request #2361 from dalf/fix-1x
[fix] 1x engine
2020-12-07 20:24:47 +01:00
Alexandre Flament
923bc02c17
Merge pull request #2363 from dalf/fix-wikipedia-minor
[fix] wikipedia: minor fix: return no result instead of crash in some very few cases.
2020-12-07 18:33:37 +01:00
Alexandre Flament
deb1bde20d [fix] soundclound: accept result without content 2020-12-07 17:45:36 +01:00
Alexandre Flament
34df0f7910 [fix] youtube_noapi engine 2020-12-07 17:44:31 +01:00
Alexandre Flament
58d51e082d [fix] wikipedia: minor fix: return no result instead of crash in some very few cases.
In few cases, the JSON results doesn't contains the key 'type'.
2020-12-07 17:42:05 +01:00
Alexandre Flament
4ec810749b [fix] 1x engine 2020-12-07 15:46:00 +01:00
Alexandre Flament
1e781863fa [fix] command engine: SearchQuery.query is str not bytes
see c225db45c8
2020-12-07 10:43:42 +01:00
Alexandre Flament
9bf594cbcf [mod] duden engine
* add params['soft_max_redirects'] = 1  (when there is spelling suggestion)
* avoid try..except
* use eval_xpath_* functions
2020-12-07 10:31:11 +01:00
Alexandre Flament
a458451d20
Merge pull request #2356 from dalf/fix-ddd
[fix] duckduckgo_definitions: fix relative image URL
2020-12-07 10:16:53 +01:00
Alexandre Flament
925bb561a2
Merge pull request #2352 from dalf/no_http
Remove HTTP connections as much as possible
2020-12-06 10:18:49 +01:00
Alexandre Flament
28cc644f0a [fix] duckduckgo_definitions: fix relative image URL
ddg returns relative URL to https://duckduckgo.com/
2020-12-06 10:14:09 +01:00
Alexandre Flament
cdceec1cbb
Merge pull request #2354 from dalf/fix-wikipedia
[fix] wikipedia engine: don't raise an error when the query is not found
2020-12-04 20:42:45 +01:00
Alexandre Flament
f0054d67f1 [fix] wikipedia engine: don't raise an error when the query is not found
Add a new parameter "raise_for_status", set by default to True.
When True, any HTTP status code >= 300 raise an exception ( #2332 )
When False, the engine can manage the HTTP status code by itself.
2020-12-04 20:04:39 +01:00
Alexandre Flament
bef2f2efa8 [fix] wikidata: fix crash when the item has no description at all and at least one URL. 2020-12-04 17:17:20 +01:00
Alexandre Flament
244e812f37 [fix] remove searx/engines/filecrop.py (dead code) 2020-12-04 16:48:15 +01:00
Alexandre Flament
fa909c7c02 [mod] stackoverflow & yandex: detect CAPTCHA response 2020-12-03 13:23:19 +01:00
Alexandre Flament
64cccae99e [mod] various engines: use eval_xpath* functions and searx.exceptions.*
Engine list: ahmia, duckduckgo_images, elasticsearch, google, google_images, google_videos, youtube_api
2020-12-03 10:22:48 +01:00
Alexandre Flament
ad72803ed9 [mod] xpath, 1337x, acgsou, apkmirror, archlinux, arxiv: use eval_xpath_* functions 2020-12-03 10:22:48 +01:00
Alexandre Flament
de887c6347 [mod] bing_news: use eval_xpath_getindex
remove unused function searx.utils.list_get
2020-12-03 10:22:48 +01:00
Alexandre Flament
1d0c368746 [enh] record details exception per engine
add an new API /stats/errors
2020-12-03 10:22:48 +01:00
Markus Heiser
bef185723a [refactor] digg - improve results and clean up source code
- strip html tags and superfluous quotation marks from content
- remove not needed cookie from request
- remove superfluous imports

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-12-02 21:54:27 +01:00
Markus Heiser
6b0a896f01 [mod] digg - pylint searx/engines/digg.py
Eliminate redundant file names which are tested by test.pylint and ignored by
test.pep8

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-12-02 20:59:30 +01:00
Markus Heiser
173b744ef0 [fix] digg - the ISO time stamp of published date has been changed
Error pattern::

    Engines cannot retrieve results:
    digg (unexpected crash time data '2020-10-16T14:09:55Z' does not match format '%Y-%m-%d %H:%M:%S')

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-12-02 20:40:12 +01:00
Alexandre Flament
b00d108673 [mod] pylint: numerous minor code fixes 2020-12-01 15:21:19 +01:00
Alexandre Flament
9ed3ee2beb [mod] wikidata: WDGeoAttribute class: doesn't change the method signature of get_str 2020-12-01 15:21:17 +01:00
Alexandre Flament
3cfef61123 [fix] /stats: report error percentage instead of error count
This bug exists since the PR https://github.com/searx/searx/pull/751
2020-12-01 15:07:09 +01:00
Noémi Ványi
4a36a3044d
Add recoll engine (#2325)
recoll is a local search engine based on Xapian:
http://www.lesbonscomptes.com/recoll/

By itself recoll does not offer web or API access,
this can be achieved using recoll-webui:
https://framagit.org/medoc92/recollwebui.git

This engine uses a custom 'files' result template

set `base_url` to the location where recoll-webui can be reached
set `dl_prefix` to a location where the file hierarchy as indexed by recoll can be reached
set `search_dir` to the part of the indexed file hierarchy to be searched, use an empty string to search the entire search domain
2020-11-30 08:35:15 +01:00
M. Efe Çetin
d1f527c3af
Photon API Link Update
Via https://photon.komoot.io/
2020-11-27 10:22:28 +03:00
Alexandre Flament
3786920df9 [enh] Add multiple outgoing proxies
credits go to @bauruine see https://github.com/searx/searx/pull/1958
2020-11-20 15:29:21 +01:00
Markus Heiser
c71d214b0c [refactor] deviantart - improve results and clean up source code
Devian's request and response forms has been changed.

- fixed title
- fixed time_range_dict to 'popular-*-***'
- use image from <noscript> if exists
- drop obsolete "http to https, remove domain sharding"
- use query URL https://www.deviantart.com/search/deviations?page=5&q=foo
- add searx/engines/deviantart.py to pylint check (test.pylint)

Error pattern::

    There DEBUG:searx:result: invalid title: {'url': 'https://www.deviantart.com/  ...

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-11-14 17:09:56 +01:00
Alexandre Flament
3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Alexandre Flament
c3d9b17c2a
Merge pull request #2292 from kvch/elasticsearch-engine
New engine: Elasticsearch
2020-11-14 13:25:08 +01:00
Alexandre Flament
102c08838b
Merge pull request #2289 from dalf/pylint
[mod] pylint: add extension-pkg-whitelist=lxml.etree
2020-11-14 13:24:31 +01:00
Noémi Ványi
43e697681e New engine: Elasticsearch 2020-11-10 19:53:38 +01:00
Alexandre Flament
58d72f2692 [mod] pylint: minor code change to allow pylint globally
This commit is only a step, it doesn't fix all the issues reported by pylint
2020-11-03 11:35:53 +01:00
Alexandre Flament
eed43783f9 [fix] comamnd engine: fix import 2020-11-03 10:55:08 +01:00
Alexandre Flament
a08df82574 [fix] scanr_structure engine: fix import 2020-11-03 10:54:02 +01:00
Alexandre Flament
95bd6033fa [mod] wikidata engine: use one SPARQL request instead of 2 HTTP requests. 2020-10-28 08:09:25 +01:00
Alexandre Flament
ca593728af [mod] duckduckgo_definitions: display only user friendly attributes / URL
various bug fixes
2020-10-28 08:09:25 +01:00
a01200356
c3daa08537 [enh] Add onions category with Ahmia, Not Evil and Torch
Xpath engine and results template changed to account for the fact that
archive.org doesn't cache .onions, though some onion engines migth have
their own cache.

Disabled by default. Can be enabled by setting the SOCKS proxies to
wherever Tor is listening and setting using_tor_proxy as True.

Requires Tor and updating packages.

To avoid manually adding the timeout on each engine, you can set
extra_proxy_timeout to account for Tor's (or whatever proxy used) extra
time.
2020-10-25 17:59:05 -07:00
Nicholas Kegler
8e15d3e4c1 Open Semantic Search Engine 2020-10-25 17:50:00 +01:00
Noémi Ványi
e158eeee4b Propagate error messages from YouTube API 2020-10-09 17:34:26 +02:00
Adam Tauber
835d16cbb1
Merge pull request #2255 from kvch/yacy-improvements
Add yacy improvements: HTTP digest auth, category checking
2020-10-09 16:34:42 +02:00
Alexandre Flament
cfd21bc475 [fix] fix duckduckgo engine
- remove paging support: a "vqd" parameter is required between each request. This parameter is uniq for each request
- update the URL (no redirect), use the POST method
- language support: works if there is no more than request per minute, otherwise it is ignored !
2020-10-09 16:00:42 +02:00
Noémi Ványi
72c7fd25fe Add yacy improvements: HTTP digest auth, category checking 2020-10-09 15:06:05 +02:00