searxng/searx/engines/vimeo.py

# Vimeo (Videos)
#
# @website     https://vimeo.com/
# @provide-api yes (http://developer.vimeo.com/api),
#              they have a maximum count of queries/hour
#
# @using-api   no (TODO, rewrite to api)
# @results     HTML (using search portal)
# @stable      no (HTML can change)
# @parse       url, title, publishedDate, thumbnail, embedded
#
# @todo        rewrite to api
# @todo        set content-parameter with correct data

from urllib import urlencode
from lxml import html
from HTMLParser import HTMLParser
from searx.engines.xpath import extract_text
from dateutil import parser

# engine dependent config
categories = ['videos']
paging = True

# search-url
base_url = 'https://vimeo.com'
search_url = base_url + '/search/page:{pageno}?{query}'

# specific xpath variables
results_xpath = '//div[contains(@class,"results_grid")]/ul/li'
url_xpath = './/a/@href'
title_xpath = './/span[@class="title"]'
thumbnail_xpath = './/img[@class="js-clip_thumbnail_image"]/@src'
publishedDate_xpath = './/time/attribute::datetime'

embedded_url = '<iframe data-src="//player.vimeo.com/video{videoid}" ' +\
    'width="540" height="304" frameborder="0" ' +\
    'webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>'


# do search-request
def request(query, params):
    params['url'] = search_url.format(pageno=params['pageno'],
                                      query=urlencode({'q': query}))

    return params


# get response from search-request
def response(resp):
    results = []

    dom = html.fromstring(resp.text)
    p = HTMLParser()

    # parse results
    for result in dom.xpath(results_xpath):
        videoid = result.xpath(url_xpath)[0]
        url = base_url + videoid
        title = p.unescape(extract_text(result.xpath(title_xpath)))
        thumbnail = extract_text(result.xpath(thumbnail_xpath)[0])
        publishedDate = parser.parse(extract_text(result.xpath(publishedDate_xpath)[0]))
        embedded = embedded_url.format(videoid=videoid)

        # append result
        results.append({'url': url,
                        'title': title,
                        'content': '',
                        'template': 'videos.html',
                        'publishedDate': publishedDate,
                        'embedded': embedded,
                        'thumbnail': thumbnail})

    # return results
    return results