searxng/searx/engines/digg.py

"""
 Digg (News, Social media)

 @website     https://digg.com/
 @provide-api no

 @using-api   no
 @results     JSON (search endpoint response)
 @stable      no (response format can change)
 @parse       url, title, content, publishedDate, thumbnail
"""

import random
import string
from datetime import datetime
from json import loads
from urllib.parse import urlencode

# engine dependent config
categories = ['news', 'social media']
paging = True

# search-url
base_url = 'https://digg.com/'
search_url = base_url + 'api/search/?{query}&from={position}&size=20&format=html'

# specific xpath variables (not referenced by the JSON-based response() below)
results_xpath = '//article'
link_xpath = './/small[@class="time"]//a'
title_xpath = './/h2//a//text()'
content_xpath = './/p//text()'
pubdate_xpath = './/time'

# characters used to build the random 'frontend.auid' cookie value
digg_cookie_chars = (string.ascii_uppercase + string.ascii_lowercase
                     + string.digits + "+_")


# do search-request
def request(query, params):
    # 20 results are requested per page; 'pageno' is 1-based
    offset = (params['pageno'] - 1) * 20
    params['url'] = search_url.format(position=offset,
                                      query=urlencode({'q': query}))
    # digg requires a cookie for search: send a random 22-character value
    params['cookies']['frontend.auid'] = ''.join(
        random.choice(digg_cookie_chars) for _ in range(22))
    return params
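

# Illustrative sketch only, not used by the engine: how the page offset and
# the random cookie value in request() are derived. The names example_offset,
# example_auid and _EXAMPLE_CHARS are hypothetical; the page size of 20 and
# the length of 22 are the values hard-coded in request() above.
import random
import string

_EXAMPLE_CHARS = string.ascii_letters + string.digits + "+_"


def example_offset(pageno, page_size=20):
    # pageno is 1-based, so page 1 starts at offset 0
    return (pageno - 1) * page_size


def example_auid(length=22):
    # one independent random pick per character
    return ''.join(random.choice(_EXAMPLE_CHARS) for _ in range(length))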


# get response from search-request
def response(resp):
    results = []

    search_result = loads(resp.text)

    # parse results; hits are listed under the 'mapped' key
    for result in search_result['mapped']:

        # 'created' -> 'ISO' is expected as "YYYY-MM-DD HH:MM:SS"
        published = datetime.strptime(result['created']['ISO'], "%Y-%m-%d %H:%M:%S")
        # append result
        results.append({'url': result['url'],
                        'title': result['title'],
                        'content': result['excerpt'],
                        'template': 'videos.html',
                        'publishedDate': published,
                        'thumbnail': result['images']['thumbImage']})

    # return results
    return results
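

# Illustrative sketch only, not part of the engine: a tolerant variant of the
# timestamp parsing in response(). It assumes result['created']['ISO'] keeps
# the "%Y-%m-%d %H:%M:%S" layout used above; the helper name
# example_parse_created is hypothetical.
from datetime import datetime


def example_parse_created(result):
    # return a datetime, or None when the field is missing or malformed
    try:
        return datetime.strptime(result['created']['ISO'], "%Y-%m-%d %H:%M:%S")
    except (KeyError, TypeError, ValueError):
        return None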