Startpage has changed its HTML layout: classes like ``w-gl__result__main`` no
longer exist, and the structure of the result items has changed slightly.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
CCC media serves several recording formats, to name a few:
- application/x-subrip
- video/mp4
- video/webm
- audio/mpeg
- audio/opus
Not all of them are suitable for a video frame. If available, we should prefer
video/mp4 due to its minimal data rates.
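
The preference described above could be sketched roughly as follows. This is
only an illustration, not the engine's code: the helper name
``pick_recording``, the ``mime_type`` / ``recording_url`` keys and the fallback
to video/webm are assumptions.

```python
# Hypothetical helper: choose the recording to embed in a video frame.
# Only the preference for video/mp4 comes from the commit message; the
# fallback to video/webm is an assumption made for this sketch.
PREFERRED_MIME_TYPES = ("video/mp4", "video/webm")


def pick_recording(recordings):
    """Return the first recording matching the preferred MIME types, else None."""
    for mime_type in PREFERRED_MIME_TYPES:
        for rec in recordings:
            if rec.get("mime_type") == mime_type:
                return rec
    return None


# Made-up sample data, shaped like the formats listed above.
recordings = [
    {"mime_type": "application/x-subrip", "recording_url": "https://example.org/talk.srt"},
    {"mime_type": "audio/opus", "recording_url": "https://example.org/talk.opus"},
    {"mime_type": "video/webm", "recording_url": "https://example.org/talk.webm"},
    {"mime_type": "video/mp4", "recording_url": "https://example.org/talk.mp4"},
]
best = pick_recording(recordings)
```

Audio and subtitle formats are never selected, so a talk without any video
recording yields ``None`` and no frame would be shown.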
Closes: https://github.com/searxng/searxng/issues/3431
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To test this patch I ran the following commands and checked the diff of the
`messages.pot` file::

    $ ./manage pyenv.cmd pybabel extract -F babel.cfg \
        -o ./searx/translations/messages.pot searx/
    $ git diff ./searx/translations/messages.pot

----
Hint from @dalf: f-strings are not supported [1], but no error is raised [2].
[1] python-babel/babel#594
[2] python-babel/babel#715
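
Why f-strings cannot work with message lookup can be illustrated without Babel.
In this sketch a plain dict stands in for the translation catalog and
``fake_gettext`` is a hypothetical stand-in for ``gettext``; the point is only
that an f-string is interpolated before the lookup, so the msgid never matches.

```python
# Hypothetical stand-in for a translation catalog (msgid -> translation).
CATALOG = {"Hello {name}": "Hallo {name}"}


def fake_gettext(msgid):
    """Look up msgid in the catalog; fall back to the msgid itself."""
    return CATALOG.get(msgid, msgid)


name = "Alice"

# f-string: interpolated first, so the lookup key is "Hello Alice" (no match).
with_fstring = fake_gettext(f"Hello {name}")

# str.format: the constant msgid is looked up, then the value is filled in.
with_format = fake_gettext("Hello {name}").format(name=name)
```

The same reasoning applies to extraction: a tool that collects constant msgids
from the source has no constant string to collect from an f-string.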
Closes: https://github.com/searxng/searxng/issues/3412
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
`youtube_api.py` throws an exception if the search results contain a channel, as
channels have no videoId. This PR adds a key check when parsing the JSON
response.
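
A minimal sketch of such a key check, assuming a response in which each item
carries an ``id`` object that has a ``videoId`` only for videos. The function
name ``parse_items`` and the sample data are made up for illustration and are
not the code of `youtube_api.py`.

```python
def parse_items(search_results):
    """Collect video results, skipping items (e.g. channels) without a videoId."""
    results = []
    for item in search_results.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if video_id is None:
            # Channels carry no videoId, so there is nothing to link to.
            continue
        results.append(
            {
                "url": "https://www.youtube.com/watch?v=" + video_id,
                "title": item.get("snippet", {}).get("title", ""),
            }
        )
    return results


# Made-up sample response: one channel and one video.
sample = {
    "items": [
        {"id": {"kind": "youtube#channel", "channelId": "UC123"}, "snippet": {"title": "A channel"}},
        {"id": {"kind": "youtube#video", "videoId": "abc123"}, "snippet": {"title": "A video"}},
    ]
}
parsed = parse_items(sample)
```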
In the past, some files were tested with the standard profile, others with a
profile in which most of the messages were switched off, and some files were
not checked at all.
- ``PYLINT_SEARXNG_DISABLE_OPTION`` has been abolished
- the distinction ``# lint: pylint`` is no longer necessary
- the pylint tasks have been reduced from three to two:
  1. ./searx/engines -> lint engines with additional builtins
  2. ./searx ./searxng_extra ./tests -> lint all other python files
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In commit 8af181533 in PR:
- https://github.com/searxng/searxng/pull/3321
the category `journal_article` was removed; `book_any` was removed a long
time ago.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Paging is broken in searchcode.com's API, and it is not clear whether it will
ever be fixed. This commit disables paging in the engine and, BTW, lints
`searchcode_code.py` with pylint.
Closes: https://github.com/searxng/searxng/issues/3287
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Parse the result list from ask.com given in the variable named
window.MESON.initialState::

    <script nonce="..">
    window.MESON = window.MESON || {};
    window.MESON.initialState = {"siteConfig": ...
    ...}};
    window.MESON.loadedLang = "en";
    </script>

The result list is in field::

    json_resp['search']['webResults']['results']

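
The extraction could look roughly like the sketch below. It assumes that the
value assigned to ``window.MESON.initialState`` is strict JSON and that it is
always followed by the ``window.MESON.loadedLang`` assignment; the helper name
``extract_initial_state`` and the sample script are made up for illustration.

```python
import json

START = "window.MESON.initialState = "
END = "window.MESON.loadedLang"


def extract_initial_state(script_text):
    """Cut the initialState object out of the script text and decode it."""
    start = script_text.index(START) + len(START)
    end = script_text.index(END, start)
    # Drop surrounding whitespace and the trailing ';' of the JS statement.
    payload = script_text[start:end].strip().rstrip(";")
    return json.loads(payload)


# Made-up sample, shaped like the script shown above.
sample = """
window.MESON = window.MESON || {};
window.MESON.initialState = {"search": {"webResults": {"results": [{"title": "t", "url": "https://example.org"}]}}};
window.MESON.loadedLang = "en";
"""
json_resp = extract_initial_state(sample)
results = json_resp["search"]["webResults"]["results"]
```

If the payload turns out not to be strict JSON (for example, unquoted keys),
``json.loads`` will raise and a more tolerant JavaScript-to-Python conversion
would be needed.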
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In Presearch there are languages for the UI and regions for narrowing down the
search. With this change the SearXNG engine supports a search by region. The
details can be found in the documentation of the source code.
To test, you can search terms like::

    !presearch bmw :zh-TW
    !presearch bmw :en-CA

1. You should get results corresponding to the region (Taiwan, Canada)
2. and in the language (Chinese, English).
3. The content of the info box is in the same language.

Exceptions:
1. Region or language is not supported by Presearch or
2. SearXNG user did not select a region tag, example::

    !presearch bmw :en

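
The difference between a tag with and without a region can be sketched with a
small hypothetical helper. ``split_locale_tag`` is not part of the engine; how
the engine maps these values to Presearch parameters is described in the source
code documentation and is not reproduced here.

```python
def split_locale_tag(tag):
    """Split a tag such as 'zh-TW' into (language, region); region is None if absent."""
    language, _, region = tag.partition("-")
    return language.lower(), (region.upper() or None)


with_region = split_locale_tag("zh-TW")
without_region = split_locale_tag("en")
```

For ``:en`` there is no region part, which corresponds to the second exception
above: without a region tag, a search by region is not possible.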
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
DDG's bot detection is sensitive to the vqd value. For some search terms (such
as extremely long search terms that are often sent by bots), no vqd value can be
determined.
If SearXNG cannot determine a vqd value, then no request should go out to
DDG (WEB): a request with a wrong vqd value leads to DDG temporarily putting
SearXNG's IP on a block list.
Requests from IPs in this block list run into timeouts.
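
The rule "no vqd, no request" could be sketched like this. ``get_vqd`` is a
hypothetical stand-in with made-up values, the URL is a placeholder, and the
convention that a ``url`` of ``None`` suppresses the outgoing request is an
assumption of this sketch.

```python
def get_vqd(query):
    """Hypothetical stand-in: return a vqd value, or None if none can be determined."""
    known = {"searxng": "4-123456789"}  # made-up value for illustration
    return known.get(query)


def build_request(query, params):
    """Fill params for a request; leave the URL unset when no vqd is available."""
    vqd = get_vqd(query)
    if not vqd:
        # Assumption: a missing URL means that no request is sent at all.
        params["url"] = None
        return params
    params["url"] = "https://example.org/search"  # placeholder URL
    params["data"] = {"q": query, "vqd": vqd}
    return params


ok = build_request("searxng", {})
skipped = build_request("x" * 5000, {})
```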
I am not sure, but the block list seems to be a sliding window: to get my IP
off the block list I had to let it cool down for 1h (send no requests from that
IP to DDG).
Since such issues cannot be reproduced in a local instance, I tested this patch
for 24h on my public SearXNG instance: there are still (rare) errors, but the
reliability is still 100%.
Related:
- https://github.com/searxng/searxng/pull/2922
- https://github.com/searxng/searxng/pull/2923
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Some search terms do not have results and therefore no vqd value
BTW: remove a leftover from 9197efa
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
We have had problems with this before: the bot protection of ddg-lite seems to
include this referer in its rating [1][2].
From reverse engineering:
- The Referer ``https://google.com/`` was set in commit 257dc7d6c4 --> DDG lite
  does not like this referer anymore!
- The 'Referer' header is only set on the second and follow-up pages, but not
  on the first page.
- The vqd value is not needed on the first page; the ddg-lite client sets this
  value only on follow-up pages. This can help to reduce the vqd requests from
  SearXNG.
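
The observations above can be sketched as a request builder that mimics the
ddg-lite client. The function name ``build_lite_request``, the endpoint URL and
the Referer value used on follow-up pages are assumptions for illustration
only.

```python
LITE_URL = "https://lite.duckduckgo.com/lite/"  # assumed endpoint


def build_lite_request(query, pageno, vqd=None):
    """Build headers and form data; Referer and vqd only on follow-up pages."""
    headers = {}
    data = {"q": query}
    if pageno > 1:
        headers["Referer"] = LITE_URL  # assumed Referer value
        data["vqd"] = vqd
    return {"url": LITE_URL, "headers": headers, "data": data}


first = build_lite_request("searxng", 1)
second = build_lite_request("searxng", 2, vqd="4-123456789")  # made-up vqd
```

Because the first page needs neither header nor vqd, the vqd value only has to
be determined once a user actually asks for a follow-up page.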
Related to 'Referer' header & ddg requests:
[1] https://github.com/searxng/searxng/pull/2161
[2] https://github.com/searxng/searxng/pull/2081
Closes: https://github.com/searxng/searxng/issues/2796
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Instead of ``thumbnail`` use ``img_src`` in the result item, otherwise the
"movies" category looks clunky.
Related:
- b4e0d2eedc (r128785388)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Anna’s Archive has cleaned up its languages and available file extensions and
has changed the HTML form.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>