searxng/searx/engines/digg.py

# SPDX-License-Identifier: AGPL-3.0-or-later
# lint: pylint
"""
 Digg (News, Social media)
"""

from json import loads
from urllib.parse import urlencode
from datetime import datetime

from lxml import html

# about
about = {
    "website": 'https://digg.com',
    "wikidata_id": 'Q270478',
    "official_api_documentation": None,
    "use_official_api": False,
    "require_api_key": False,
    "results": 'HTML',
}

# engine dependent config
categories = ['news', 'social media']
paging = True
base_url = 'https://digg.com'

# search-url
search_url = base_url + (
    '/api/search/'
    '?{query}'
    '&from={position}'
    '&size=20'
    '&format=html'
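)


def request(query, params):
    """Build the Digg search URL for the requested result page."""
    # the API pages in steps of 20 results; ``from`` is the zero-based offset
    offset = (params['pageno'] - 1) * 20
    params['url'] = search_url.format(
        query=urlencode({'q': query}),
        position=offset,
    )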
    return params
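

# A minimal, hypothetical sketch (not used by searx) of the URL that
# ``request`` produces, assuming the 20-results-per-page scheme above:
# page N starts at the zero-based offset (N - 1) * 20.  The helper name
# ``example_search_url`` is illustrative only; it reuses this module's
# ``urlencode`` import.
#
# Usage:
#   example_search_url('open source', 2)
#   -> 'https://digg.com/api/search/?q=open+source&from=20&size=20&format=html'
def example_search_url(query, pageno, page_size=20):
    """Return the Digg search URL for ``query`` on the 1-based page ``pageno``."""
    # zero-based index of the first result on the requested page
    offset = (pageno - 1) * page_size
    return (
        'https://digg.com/api/search/?'
        + urlencode({'q': query})
        + f'&from={offset}&size={page_size}&format=html'
    )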

def response(resp):
    """Parse the JSON returned by Digg's search API into a list of result dicts."""
    results = []

    # parse results
    for result in loads(resp.text)['mapped']:

        # strip html tags and superfluous quotation marks from content
        content = html.document_fromstring(
            result['excerpt']
        ).text_content()

        # 'created': {'ISO': '2020-10-16T14:09:55Z', ...}
        published = datetime.strptime(
            result['created']['ISO'], '%Y-%m-%dT%H:%M:%SZ'
        )
        results.append({
            'url': result['url'],
            'title': result['title'],
            'content': content,
            'template': 'videos.html',
            'publishedDate': published,
            'thumbnail': result['images']['thumbImage'],
        })

    return results
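

# A hypothetical, more tolerant variant of the ``created.ISO`` parsing in
# ``response`` (not used by the engine).  ``response`` raises ValueError
# on any other layout; an earlier version of this engine expected
# '%Y-%m-%d %H:%M:%S' and crashed once Digg switched to the ISO form
# '2020-10-16T14:09:55Z'.  This sketch tries both layouts and returns
# None instead of raising; that Digg only ever emits these two layouts
# is an assumption.  It reuses this module's ``datetime`` import.
def example_parse_created(value):
    """Parse a Digg ``created.ISO`` string, or return None if unrecognised."""
    for fmt in ('%Y-%m-%dT%H:%M:%SZ', '%Y-%m-%d %H:%M:%S'):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # layout did not match, try the next one
    return None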