1
0
mirror of https://github.com/searxng/searxng.git synced 2024-11-22 04:01:40 +01:00
Commit Graph

91 Commits

Author SHA1 Message Date
Markus Heiser
10d3af84b8 [fix] engine: duckduckgo - don't quote query string
The query string send to DDG must not be qouted.

The query string was URL-qouted in #4011, but the URL-qouted query string result
in unexpected *URL decoded* and other garbish results as reported in #4019
and #4020.  To test compare the results of a query like::

    !ddg Häuser und Straßen :de
    !ddg Häuser und Straßen :all
    !ddg 房屋和街道 :all
    !ddg 房屋和街道 :zh

Closed:

- [#4019] https://github.com/searxng/searxng/issues/4019
- [#4020] https://github.com/searxng/searxng/issues/4020

Related:

- [#4011] https://github.com/searxng/searxng/pull/4011

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-17 18:14:22 +01:00
Nicolas Dato
abd9b271bc [fix] engine: duckduckgo - only uses first word of the search terms
during the revision in PR #3955 the query string was accidentally converted into
a list of words, further the query must be quoted before POSTed in the ``data``
field, see ``urllib.parse.quote_plus`` [1]

[1] https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus

Closed: #4009
Co-Authored-by: @return42
2024-11-14 09:33:54 +01:00
Markus Heiser
b183e620d8 [refactor] engine: duckduckgo - https://html.duckduckgo.com/html
The entire source code of the duckduckgo engine has been reengineered and
purified.

1. DDG used the URL https://html.duckduckgo.com/html for no-JS requests whose
   response is also easier to parse than the previous
   https://lite.duckduckgo.com/lite/ URL

2. the bot detection of DDG has so far caused problems and often led to a
   CAPTCHA, this can be circumvented using `'Sec-Fetch-Mode'] = “navigate”`

Closes: https://github.com/searxng/searxng/issues/3927
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-10-29 14:56:27 +01:00
Markus Heiser
050451347b [fix] engine: duckduckgo - CAPTCHA detection
The previous implementation could not distinguish a CAPTCHA response from an
ordinary result list.  In the previous implementation a CAPTCHA was taken as a
result list where no items are in.

DDG does not block IPs.  Instead, a CAPTCHA wall is placed in front of request
on a dubious request.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-10-19 14:55:44 +02:00
Alexander Sulfrian
6a7b1a1a57 [fix] Do not show DDG user-agent from zero click
We do not want to show the user-agent information from the duckduckgo
zero click info. This is the user-agent used by searxng and not the
user-agent used by the user.

This was already done for the IP address in:
0fb3f0e4ae
2024-08-30 09:02:37 +02:00
Bnyro
1fe13d0ba4 [refactor] duckduckgo: use extr helper function in get_vqd 2024-06-15 11:24:05 +02:00
Allen
0fa81fc782 [enh] add re-usable func to filter text 2024-05-29 17:56:17 +02:00
Jeff Alyanak
0fb3f0e4ae [fix] do not show DDG IP from zero click
The zero click result from DuckDuckGo for IP should not be displayed.  It will
return the IP of the searxng server, not the user's IP, and looks a bit strange
when the `self_info` plugin is enabled as two different IPs get returned.
2024-05-29 11:23:26 +02:00
allendema_searxng_pi
68365c8c1d [enh] add instant answers from ddg 2024-05-24 10:44:17 +02:00
Markus Heiser
f1a148f53e [fix] ddg engine: if no vqd value can be determined, don't save None
Closes: https://github.com/searxng/searxng/issues/3370
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-04-08 10:56:11 +02:00
Markus Heiser
8205f170ff [mod] pylint all engines without PYLINT_SEARXNG_DISABLE_OPTION
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-11 14:55:38 +01:00
Markus Heiser
e97e1f9110 [fix] duckduckgo.fetch_traist - URL of region definitions has changed
- https://duckduckgo.com/dist/util/u.7669f071a13a7daa57cb.js

updated from u661.js to u.7669f071a13a7daa57cb / should be updated
automatically?  The last change was on March 23rd in dba8977b09 [1]

- [1] https://github.com/searxng/searxng/pull/2269

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-10 10:32:54 +01:00
Markus Heiser
d97b84bea2 [fix] ddg engines (get_vqd) - the vqd value is no longer in the form
Closes: https://github.com/searxng/searxng/issues/3276
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-05 16:27:04 +01:00
Markus Heiser
14323d683f [fix] ddg-lite & ddg-extra: don't send empty vqd value
DDG's bot detection is sensitive to the vqd value.  For some search terms (such
as extremely long search terms that are often sent by bots), no vqd value can be
determined.

If SearXNG cannot determine a vqd value, then no request should go out to
DDG (WEB): a request with a wrong vqd value leads to DDG temporarily putting
SearXNG's IP on a block list.

Requests from IPs in this block list run into timeouts.

Not sure, but it seems the block list is a sliding window: to get my IP rid from
the bot list I had to cool down my IP for 1h (send no requests from that IP to
DDG).

Since such issues can't reproduce in a local instance I tested this patch 24h on
my public SearXNG instance: There are still errors (rare), but the reliability
is still 100%.

Related:

- https://github.com/searxng/searxng/pull/2922
- https://github.com/searxng/searxng/pull/2923

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-12 08:52:28 +02:00
Markus Heiser
3388441917 [fix] ddg-lite vqd value: some search terms do not have a vqd value
Some search terms do not have results and therefore no vqd value

BTW: remove a leftover from 9197efa

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-10 09:12:30 +02:00
Markus Heiser
9197efa2a7 [fix] duckduckgo lite engine: set HTTP header 'Referer'
We have had problems with this before, the bot protection from ddg-lite seems to
have included this referer in the rating [1][2].

From reverse engineering:

- The Referer ``https://google.com/`` was set in commt 257dc7d6c4 --> DDG lite
  does not like this referer anymore!

- The 'Referer' header is only set on second and follow up pages but not on the
  first page

- The vqd value is not needed on the first page, the ddg-lite client sets this
  value only on follow up pages / this can help to reduce the vqd requests from
  SearXNG.

Related to 'Referer' header & ddg requests:

[1] https://github.com/searxng/searxng/pull/2161
[2] https://github.com/searxng/searxng/pull/2081

Closes: https://github.com/searxng/searxng/issues/2796
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-10 08:40:53 +02:00
Bnyro
48cb58bd2e [feat] duckduckgo: support for videos and news 2023-10-09 06:53:43 +02:00
Markus Heiser
71358e9c67 Revert "[fix] engine - duckduckgo vqd edge-case"
This reverts commit 102502a4f0.
2023-09-22 09:31:25 +02:00
jazzzooo
102502a4f0 [fix] engine - duckduckgo vqd edge-case 2023-09-20 20:05:06 +02:00
jazzzooo
223b3487c3 [fix] spelling 2023-09-18 16:20:27 +02:00
Markus Heiser
696c35d2c3 [fix] engine - duckduckgo_images / determination of vqd value incorrect
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-09-05 22:24:51 +02:00
Markus Heiser
e8706fb738 [fix] engine & network issues / documentation and type annotations
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotaions of the engine
modules has also been completed so that error messages from the type checker are
no longer reported.

Related and (partial) fixed issue:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25 13:58:26 +02:00
Markus Heiser
caebd297e9 [fix] engine ddg: minor change in the API of ddg
Closes: https://github.com/searxng/searxng/issues/2419
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-12 18:58:49 +02:00
Markus Heiser
a762172bf7 [fix] engine ddg: quote !bangs in a request send to ddg
Closes: https://github.com/searxng/searxng/issues/392
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-03 09:52:16 +02:00
Markus Heiser
c80e82a855 [mod] DuckDuckGo: reversed engineered & upgrade to data_type: traits_v1
Partial reverse engineering of the DuckDuckGo (DDG) engines including a
improved language and region handling based on the enigne.traits_v1 data.

- DDG Lite
- DDG Instant Answer API
- DDG Images
- DDG Weather

docs/src/searx.engine.duckduckgo.rst:
  Online documentation of the DDG engines (make docs.live)

searx/data/engine_traits.json
  Add data type "traits_v1" generated by the fetch_traits() functions from:

  - "duckduckgo" (WEB),
  - "duckduckgo images" and
  - "duckduckgo weather"

  and remove data from obsolete data type "supported_languages".

searx/autocomplete.py:
  Reversed engineered Autocomplete from DDG.  Supports DDG's languages.

searx/engines/duckduckgo.py:
  - fetch_traits():  Fetch languages & regions from DDG.

  - get_ddg_lang(): Get DDG's language identifier from SearXNG's locale.  DDG
    defines its languages by region codes.  DDG-Lite does not offer a language
    selection to the user, only a region can be selected by the user.

  - Cache ``vqd`` value: The vqd value depends on the query string and is needed
    for the follow up pages or the images loaded by a XMLHttpRequest (DDG
    images).  The ``vqd`` value of a search term is stored for 10min in the
    redis DB.

  - DDG Lite engine: reversed engineered request method with improved Language
    and region support and better ``vqd`` handling.

searx/engines/duckduckgo_definitions.py: DDG Instant Answer API
  The *instant answers* API does not support languages, or at least we could not
  find out how language support should work.  It seems that most of the features
  are based on English terms.

searx/engines/duckduckgo_images.py: DDG Images
  Reversed engineered request method.  Improved language and region handling
  based on cookies and the enigne.traits_v1 data.  Response: add image format to
  the result list

searx/engines/duckduckgo_weather.py: DDG Weather
  Improved language and region handling based on cookies and the
  enigne.traits_v1 data.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
dba8977b09 [mod] DuckDuckGo: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the DuckDuckGo engines.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
257dc7d6c4 [fix-2146] set different HTTP Referer header to DuckDuckGo requests
For what ever reasons, ddg-lite don't like the Referer

  https://lite.duckduckgo.com/

In an interactive session in the WEB browser the the Reverer has exactly this
value, but ddg-lite don't like this value when the request is build up by
SearXNG.  The new value is:

  https://google.com/

What fakes a user comes from a google link.

Related: https://github.com/searxng/searxng/pull/2081
Closes: https://github.com/searxng/searxng/issues/2146

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-03 08:45:51 +01:00
Rudis Muiznieks
128b8c7f0a
Add HTTP Referer header to DuckDuckGo requests
closes #2080
2023-01-06 16:07:37 -06:00
Rudis Muiznieks
6804ff048d
Fix: add trailing slash to duckduckgo url
Close #1854
2022-12-22 07:49:58 -06:00
Markus Heiser
8df1f0c47e [mod] add 'Accept-Language' HTTP header to online processores
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).

- add new engine option: send_accept_language_header

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-01 17:01:59 +02:00
Martin Fischer
b02f762687 [enh] add more categories 2022-01-05 11:00:11 +01:00
Markus Heiser
3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Markus Heiser
a5b7ed9550 [mod] engine duckduckgo - update supported_languages_url
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-01 20:01:41 +02:00
Markus Heiser
4c9b8b29ee [mod] engine duckduckgo - use DuckDuckGo-Lite
Implement a scrapper for DuckDuckGo-Lite [1].  The existing DuckDuckGo [2]
engine does not support paging.  DuckDuckgo-Lite is much faster, less verbose
and does have a paging option (reversed engineered from the input form of [1]).

[1] https://lite.duckduckgo.com/lite
[2] https://duckduckgo.com/

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-01 20:01:41 +02:00
Alexandre Flament
d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
2021-04-12 17:25:56 +02:00
Adam Tauber
5f450fda74 [enh] add year filter to duckduckgo 2021-03-25 00:25:36 +01:00
Alexandre Flament
c22d4c764c [fix] duckduckgo engine: "!ddg !g" do not redirect to google
* searx understand "!ddg !g time" as : send "!g time" to DDG
* !g a DDG bang for Google: DDG return a HTTP redirect to Google

This commit adds a the allows_redirect param not to follow HTTP redirect.

The DDG engine returns a empty result as before without HTTP redirect.
2021-02-12 11:10:08 +01:00
Alexandre Flament
d2dac11392 [mod] duckduckgo engine: better support of the language preference
After the main request, send a second to https://duckduckgo.com/t/sl_h

See https://github.com/searx/searx/issues/2259
2021-02-09 14:36:43 +01:00
Alexandre Flament
ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
Alexandre Flament
a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
Alexandre Flament
b00d108673 [mod] pylint: numerous minor code fixes 2020-12-01 15:21:19 +01:00
Alexandre Flament
3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Alexandre Flament
cfd21bc475 [fix] fix duckduckgo engine
- remove paging support: a "vqd" parameter is required between each request. This parameter is uniq for each request
- update the URL (no redirect), use the POST method
- language support: works if there is no more than request per minute, otherwise it is ignored !
2020-10-09 16:00:42 +02:00
Alexandre Flament
2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Dalf
1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Gordon Quad
385e9b5c9e add correction support for duckduckgo 2020-06-13 22:43:10 +01:00
Dalf
85b3723345 [mod] speed optimization
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
2019-11-15 09:33:15 +01:00
Adam Tauber
ed1c1bdb04 [fix] pep8 2019-10-14 15:09:39 +02:00
Adam Tauber
94ea9d6622 [fix] duckduckgo paging - closes #1677 2019-10-14 13:52:15 +02:00
Noémi Ványi
b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00