searxng/searx/engines/deviantart.py

"""
 Deviantart (Images)

 @website     https://www.deviantart.com/
 @provide-api yes (https://www.deviantart.com/developers/) (RSS)

 @using-api   no (TODO, rewrite to api)
 @results     HTML
 @stable      no (HTML can change)
 @parse       url, title, thumbnail_src, img_src

 @todo        rewrite to api
"""

import re
from urllib.parse import urlencode

from lxml import html


# engine dependent config
categories = ['images']
paging = True
time_range_support = True

# search-url
base_url = 'https://www.deviantart.com/'
search_url = base_url + 'search?page={page}&{query}'
time_range_url = '&order={range}'

time_range_dict = {'day': 11,
                   'week': 14,
                   'month': 15}


# do search-request
def request(query, params):
    if params['time_range'] and params['time_range'] not in time_range_dict:
        return params

    params['url'] = search_url.format(page=params['pageno'],
                                      query=urlencode({'q': query}))
    if params['time_range'] in time_range_dict:
        params['url'] += time_range_url.format(range=time_range_dict[params['time_range']])

    return params


# get response from search-request
def response(resp):
    results = []

    # return an empty list if a redirect (302) is returned
    if resp.status_code == 302:
        return []

    dom = html.fromstring(resp.text)

    # parse results
    for row in dom.xpath('//div[contains(@data-hook, "content_row")]'):
        for result in row.xpath('./div'):
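            links = result.xpath('.//a[@data-hook="deviation_link"]')
            images = result.xpath('.//img')
            # skip entries without a deviation link or an image
            if not links or not images:
                continue
            url = links[0].attrib.get('href')
            title = links[0].attrib.get('title')
            thumbnail_src = images[0].attrib.get('src')
            img_src = thumbnail_src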

            # http to https, remove domain sharding
            thumbnail_src = re.sub(r"https?://(th|fc)\d+\.", "https://th01.", thumbnail_src)
            thumbnail_src = re.sub(r"http://", "https://", thumbnail_src)

            url = re.sub(r"http://(.*)\.deviantart\.com/", "https://\\1.deviantart.com/", url)

            # append result
            results.append({'url': url,
                            'title': title,
                            'img_src': img_src,
                            'thumbnail_src': thumbnail_src,
                            'template': 'images.html'})

    # return results
    return results
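
The URL construction in `request()` can be tried outside of searx with a minimal standalone sketch. The helper name `build_search_url` and the upper-case constants are hypothetical and only mirror the module-level values above; returning `None` for an unsupported time range is an assumption standing in for the engine returning `params` without a `url`.

```python
from urllib.parse import urlencode

# Copies of the engine's module-level settings (renamed for this sketch).
BASE_URL = 'https://www.deviantart.com/'
SEARCH_URL = BASE_URL + 'search?page={page}&{query}'
TIME_RANGE_URL = '&order={range}'
TIME_RANGE_DICT = {'day': 11, 'week': 14, 'month': 15}


def build_search_url(query, pageno=1, time_range=None):
    """Hypothetical helper mirroring the URL logic of request()."""
    # An unsupported time range (for example 'year') yields no URL at all.
    if time_range and time_range not in TIME_RANGE_DICT:
        return None

    url = SEARCH_URL.format(page=pageno, query=urlencode({'q': query}))

    # Append the order parameter only for a supported time range.
    if time_range in TIME_RANGE_DICT:
        url += TIME_RANGE_URL.format(range=TIME_RANGE_DICT[time_range])
    return url


print(build_search_url('blue cat', pageno=2, time_range='week'))
```

Note that `urlencode` escapes the space in the query as `+`, so the query string stays valid without manual quoting.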