searxng/searx/engines/deviantart.py

"""
 Deviantart (Images)

 @website     https://www.deviantart.com/
 @provide-api yes (https://www.deviantart.com/developers/) (RSS)

 @using-api   no (TODO, rewrite to api)
 @results     HTML
 @stable      no (HTML can change)
 @parse       url, title, thumbnail_src, img_src

 @todo        rewrite to api
"""

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2
from lxml import html
import re
from searx.engines.xpath import extract_text

# engine dependent config
categories = ['images']
paging = True
time_range_support = True

# search-url
base_url = 'https://www.deviantart.com/'
search_url = base_url + 'browse/all/?offset={offset}&{query}'
time_range_url = '&order={range}'

time_range_dict = {'day': 11,
                   'week': 14,
                   'month': 15}


# do search-request
def request(query, params):
    offset = (params['pageno'] - 1) * 24

    params['url'] = search_url.format(offset=offset,
                                      query=urlencode({'q': query}))
    if params['time_range'] in time_range_dict:
        params['url'] += time_range_url.format(range=time_range_dict[params['time_range']])

    return params


# get response from search-request
def response(resp):
    results = []

    # return an empty list if the request was redirected (HTTP 302)
    if resp.status_code == 302:
        return []

    dom = html.fromstring(resp.text)

    # thumbnail URLs contain a '/200H/' path segment; dropping it gives the img_src URL
    regex = re.compile(r'/200H/')

    # parse results
    for result in dom.xpath('.//span[@class="thumb wide"]'):
        link = result.xpath('.//a[@class="torpedo-thumb-link"]')[0]
        url = link.attrib.get('href')
        title = extract_text(result.xpath('.//span[@class="title"]'))
        thumbnail_src = link.xpath('.//img')[0].attrib.get('src')
        img_src = regex.sub('/', thumbnail_src)

        # force https and collapse the sharded thumbnail hosts (thNN / fcNN) to th01
        thumbnail_src = re.sub(r"https?://(th|fc)\d+\.", "https://th01.", thumbnail_src)
        thumbnail_src = re.sub(r"http://", "https://", thumbnail_src)

        # force https on the result page URL
        url = re.sub(r"http://(.*)\.deviantart\.com/", "https://\\1.deviantart.com/", url)

        # append result
        results.append({'url': url,
                        'title': title,
                        'img_src': img_src,
                        'thumbnail_src': thumbnail_src,
                        'template': 'images.html'})

    # return results
    return results
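
The URL construction in request() and the thumbnail rewriting in response() can be tried in isolation. The following is a minimal sketch, not part of the engine: the helper names `build_search_url` and `normalize_thumbnail` are hypothetical, the constants are copied from the engine above, and the sample thumbnail URL is made up for illustration. It assumes Python 3.

```python
import re
from urllib.parse import urlencode

# Values copied from the engine above.
BASE_URL = 'https://www.deviantart.com/'
SEARCH_URL = BASE_URL + 'browse/all/?offset={offset}&{query}'
TIME_RANGE_URL = '&order={range}'
TIME_RANGE_DICT = {'day': 11, 'week': 14, 'month': 15}
RESULTS_PER_PAGE = 24  # step used by the offset arithmetic in request()


def build_search_url(query, pageno=1, time_range=None):
    """Hypothetical helper mirroring request(): build the search URL."""
    offset = (pageno - 1) * RESULTS_PER_PAGE
    url = SEARCH_URL.format(offset=offset, query=urlencode({'q': query}))
    # Only known time ranges add an '&order=' parameter.
    if time_range in TIME_RANGE_DICT:
        url += TIME_RANGE_URL.format(range=TIME_RANGE_DICT[time_range])
    return url


def normalize_thumbnail(src):
    """Hypothetical helper mirroring response(): return (img_src, thumbnail_src)."""
    # img_src is derived from the raw thumbnail URL, before any https rewriting.
    img_src = re.sub(r'/200H/', '/', src)
    # Collapse sharded hosts (thNN / fcNN) to th01 and force https.
    thumb = re.sub(r'https?://(th|fc)\d+\.', 'https://th01.', src)
    thumb = re.sub(r'http://', 'https://', thumb)
    return img_src, thumb


if __name__ == '__main__':
    print(build_search_url('blue sky', pageno=2, time_range='week'))
    # Made-up URL, used only to exercise the regular expressions.
    print(normalize_thumbnail('http://th05.deviantart.net/fs70/200H/f/2014/example.jpg'))
```

Page 2 maps to `offset=24`, and an unknown or missing time range leaves the URL without an `order` parameter, matching the `in time_range_dict` check in request().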