searxng/searx/engines/google_images.py

"""
 Google (Images)

 @website     https://www.google.com
 @provide-api yes (https://developers.google.com/web-search/docs/),
              deprecated!

 @using-api   yes
 @results     JSON
 @stable      yes (but deprecated)
 @parse       url, title, img_src
"""

try:
    # Python 3
    from urllib.parse import urlencode, unquote
except ImportError:
    # Python 2
    from urllib import urlencode, unquote
from json import loads

# engine dependent config
categories = ['images']
paging = True
safesearch = True

# search-url
url = 'https://ajax.googleapis.com/'
search_url = url + 'ajax/services/search/images?v=1.0&start={offset}&rsz=large&safe={safesearch}&filter=off&{query}'


# do search-request
def request(query, params):
    offset = (params['pageno'] - 1) * 8

    if params['safesearch'] == 0:
        safesearch = 'off'
    else:
        safesearch = 'on'

    params['url'] = search_url.format(query=urlencode({'q': query}),
                                      offset=offset,
                                      safesearch=safesearch)

    return params


# get response from search-request
def response(resp):
    results = []

    search_res = loads(resp.text)

    # return empty array if there are no results
    if not search_res.get('responseData', {}).get('results'):
        return []

    # parse results
    for result in search_res['responseData']['results']:
        # skip results that carry no image url
        if 'url' not in result:
            continue

        href = result['originalContextUrl']
        title = result['title']
        thumbnail_src = result['tbUrl']

        # http to https
        thumbnail_src = thumbnail_src.replace("http://", "https://")

        # append result
        results.append({'url': href,
                        'title': title,
                        'content': result['content'],
                        'thumbnail_src': thumbnail_src,
                        'img_src': unquote(result['url']),
                        'template': 'images.html'})

    # return results
    return results
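
The following is a minimal, offline sketch of how the two hooks above fit together, written for Python 3. The names `build_url`, `parse_results` and `sample_payload` are hypothetical helpers introduced only for illustration; they are not part of the engine. The payload is made-up sample data shaped like the fields the engine reads (`originalContextUrl`, `title`, `content`, `tbUrl`, `url`), and no request is sent to the deprecated endpoint.

```python
import json
from urllib.parse import urlencode, unquote

# Same URL template as the engine's search_url.
search_url = ('https://ajax.googleapis.com/ajax/services/search/images'
              '?v=1.0&start={offset}&rsz=large&safe={safesearch}&filter=off&{query}')


def build_url(query, pageno=1, safesearch=0):
    """Hypothetical helper mirroring request(): 8 results per page."""
    offset = (pageno - 1) * 8
    safe = 'off' if safesearch == 0 else 'on'
    return search_url.format(query=urlencode({'q': query}),
                             offset=offset,
                             safesearch=safe)


def parse_results(text):
    """Hypothetical helper mirroring response(), taking raw JSON text."""
    results = []
    data = json.loads(text)
    for result in (data.get('responseData') or {}).get('results') or []:
        # Entries without an image url are skipped, as in the engine.
        if 'url' not in result:
            continue
        results.append({
            'url': result['originalContextUrl'],
            'title': result['title'],
            'content': result['content'],
            # Thumbnails are upgraded from http to https.
            'thumbnail_src': result['tbUrl'].replace('http://', 'https://'),
            # The image url arrives double-encoded, so unquote it once.
            'img_src': unquote(result['url']),
            'template': 'images.html',
        })
    return results


# Made-up sample data; the second entry has no 'url' and is dropped.
sample_payload = json.dumps({'responseData': {'results': [
    {'originalContextUrl': 'https://example.org/page',
     'title': 'Example',
     'content': 'An example image',
     'tbUrl': 'http://t0.example.org/thumb.jpg',
     'url': 'https://example.org/img%2520one.jpg'},
    {'originalContextUrl': 'https://example.org/other',
     'title': 'No image url',
     'content': '',
     'tbUrl': 'http://t0.example.org/other.jpg'},
]}})

example_url = build_url('red panda', pageno=3, safesearch=1)
parsed = parse_results(sample_payload)
```

With these inputs, page 3 maps to `start=16`, and the double-encoded `%2520` in the image URL becomes `%20` after a single `unquote`, which is the form a browser can resolve.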