This patch is to hardening the parsing of the bing response:
1. To fix [2087] check if the selected result item contains a link, otherwise
skip result item and continue in the result loop. Increment the result
pointer when a result has been added / the enumerate that counts for skipped
items is no longer valid when result items are skipped.
To test the bugfix use: ``!bi :all cerbot``
2. Limit the XPath selection of result items to direct children nodes (list
items ``li``) of the ordered list (``ol``).
To test the selector use: ``!bi :en pontiac aztek wiki``
.. in the result list you should find the wikipedia entry on top,
compare [2068]
[2087] https://github.com/searxng/searxng/issues/2087
[2068] https://github.com/searxng/searxng/issues/2068
Modify the XPath selector to get the wikipedia result plus small fixes.
About result content: especially with the Wikipedia result, we'd get several
paragraph elements, only the first paragraph would be taken and displayed on the
search result
- fix issue reported #1809
- filter out `None` value from issn and isbn list
- add comments (from publicationName)
- add publisher
Closes: https://github.com/searxng/searxng/issues/1809
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Some result items from core.ac.uk do not have an URL::
Traceback (most recent call last):
File "searx/search/processors/online.py", line 154, in search
search_results = self._search_basic(query, params)
File "searx/search/processors/online.py", line 142, in _search_basic
return self.engine.response(response)
File "SearXNG/searx/engines/core.py", line 73, in response
'url': source['urls'][0].replace('http://', 'https://', 1),
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The google news are in a rework, the content area of a news item has been
removed.
Closes: https://github.com/searxng/searxng/issues/1790
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Fix::
searx/locales.py:docstring of searx.locales.get_engine_locale:17: \
WARNING: Definition list ends without a blank line; unexpected unindent.
Improvement: don't show default values in the generated documentation whe it is
more a mess than a usefull information (`:meta hide-value:`).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
no_result_for_http_status contains a list of HTTP status.
These HTTP status are seen an empty result list.
In other cases an exception is thrown as usual.
Previously raise_for_httperror were ignoring all HTTP error,
which make defective engines invisible in the stats.