From 11d6d80d8831459161ac08ad0587ab8775ffd3e3 Mon Sep 17 00:00:00 2001 From: Adam Tauber Date: Sat, 28 Nov 2015 19:26:45 +0100 Subject: [PATCH] [doc] engines overview added --- docs/dev/engine_overview.rst | 314 +++++++++++++++++++++++++++++++++++ docs/index.rst | 1 + 2 files changed, 315 insertions(+) create mode 100644 docs/dev/engine_overview.rst diff --git a/docs/dev/engine_overview.rst b/docs/dev/engine_overview.rst new file mode 100644 index 000000000..7f170e9d6 --- /dev/null +++ b/docs/dev/engine_overview.rst @@ -0,0 +1,314 @@ +Engine overview +=============== + + +searx is a `metasearch-engine `__, +so it is using different search engines to provide better results. + +Because there is no general search-api which can be used for every +search-engine, there must be build an adapter between searx and the +external search-engine. This adapters are stored in the folder +`*searx/engines* `__, +and this site is build to make an general documentation about this +engines + + +.. contents:: + :depth: 3 + +general engine configuration +---------------------------- + +It is required to tell searx what results can the engine provide. The +arguments can be inserted in the engine file, or in the settings file +(normally ``settings.yml``). The arguments in the settings file override +the one in the engine file. + +Really, it is for most options no difference if there are contained in +the engine-file or in the settings. But there is a standard where to +place specific arguments by default. + + +engine-file +~~~~~~~~~~~ + ++---------------------+-----------+-----------------------------------------+ +| argument | type | information | ++=====================+===========+=========================================+ +| categories | list | pages, in which the engine is working | ++---------------------+-----------+-----------------------------------------+ +| paging | boolean | support multible pages | ++---------------------+-----------+-----------------------------------------+ +| language\_support | boolean | support language choosing | ++---------------------+-----------+-----------------------------------------+ + +settings.yml +~~~~~~~~~~~~ + ++------------+----------+-----------------------------------------------+ +| argument | type | information | ++============+==========+===============================================+ +| name | string | name of search-engine | ++------------+----------+-----------------------------------------------+ +| engine | string | name of searx-engine (filename without .py) | ++------------+----------+-----------------------------------------------+ +| shortcut | string | shortcut of search-engine | ++------------+----------+-----------------------------------------------+ +| timeout | string | specific timeout for search-engine | ++------------+----------+-----------------------------------------------+ + +overrides +~~~~~~~~~ + +There are some options, with have default values in the engine, but are +often overwritten by the settings. If the option is assigned in the +engine-file with ``None`` it has to be redefined in the settings, +otherwise searx is not starting with that engine. + +The naming of that overrides can be wathever you want. But we recommend +the using of already used overrides if possible: + ++-----------------------+----------+--------------------------------------------------------------+ +| argument | type | information | ++=======================+==========+==============================================================+ +| base\_url | string | base-url, can be overwrite to use same engine on other url | ++-----------------------+----------+--------------------------------------------------------------+ +| number\_of\_results | int | maximum number of results per request | ++-----------------------+----------+--------------------------------------------------------------+ +| language | string | ISO code of language and country like en\_US | ++-----------------------+----------+--------------------------------------------------------------+ +| api\_key | string | api-key if required by engine | ++-----------------------+----------+--------------------------------------------------------------+ + +example-code +~~~~~~~~~~~~ + +.. code:: python + + # engine dependent config + categories = ['general'] + paging = True + language_support = True + +doing request +------------- + +To perform a search you have to specific at least a url on which the +request is performing + +passed arguments +~~~~~~~~~~~~~~~~ + +This arguments can be used to calculate the search-query. Furthermore, +some of that parameters are filled with default values which can be +changed for special purpose. + ++----------------------+------------+------------------------------------------------------------------------+ +| argument | type | default-value, informations | ++======================+============+========================================================================+ +| url | string | ``''`` | ++----------------------+------------+------------------------------------------------------------------------+ +| method | string | ``'GET'`` | ++----------------------+------------+------------------------------------------------------------------------+ +| headers | set | ``{}`` | ++----------------------+------------+------------------------------------------------------------------------+ +| data | set | ``{}`` | ++----------------------+------------+------------------------------------------------------------------------+ +| cookies | set | ``{}`` | ++----------------------+------------+------------------------------------------------------------------------+ +| verify | boolean | ``True`` | ++----------------------+------------+------------------------------------------------------------------------+ +| headers.User-Agent | string | a random User-Agent | ++----------------------+------------+------------------------------------------------------------------------+ +| category | string | current category, like ``'general'`` | ++----------------------+------------+------------------------------------------------------------------------+ +| started | datetime | current date-time | ++----------------------+------------+------------------------------------------------------------------------+ +| pageno | int | current pagenumber | ++----------------------+------------+------------------------------------------------------------------------+ +| language | string | specific language code like ``'en_US'``, or ``'all'`` if unspecified | ++----------------------+------------+------------------------------------------------------------------------+ + +parsed arguments +~~~~~~~~~~~~~~~~ + +The function ``def request(query, params):`` is always returning the +``params`` variable back. Inside searx, the following paramters can be +used to specific a search-request: + ++------------+-----------+----------------------------------------------------------+ +| argument | type | information | ++============+===========+==========================================================+ +| url | string | requested url | ++------------+-----------+----------------------------------------------------------+ +| method | string | HTTP request methode | ++------------+-----------+----------------------------------------------------------+ +| headers | set | HTTP header informations | ++------------+-----------+----------------------------------------------------------+ +| data | set | HTTP data informations (parsed if ``method != 'GET'``) | ++------------+-----------+----------------------------------------------------------+ +| cookies | set | HTTP cookies | ++------------+-----------+----------------------------------------------------------+ +| verify | boolean | Performing SSL-Validity check | ++------------+-----------+----------------------------------------------------------+ + +example-code +~~~~~~~~~~~~ + +.. code:: python + + # search-url + base_url = 'https://example.com/' + search_string = 'search?{query}&page={page}' + + # do search-request + def request(query, params): + search_path = search_string.format( + query=urlencode({'q': query}), + page=params['pageno']) + + params['url'] = base_url + search_path + + return params + +returning results +----------------- + +Searx has the possiblity to return results in different media-types. +Currently the following media-types are supported: + +- default +- images +- videos +- torrent +- map + +to set another media-type as default, you must set the parameter +``template`` to the required type. + +default +~~~~~~~ + ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++====================+====================================================================================================================================+ +| url | string, which is representing the url of the result | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| title | string, which is representing the title of the result | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| content | string, which is giving a general result-text | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime `__, represent when the result is published | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ + +images +~~~~~~ + +to use this template, the parameter + ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++====================+===========================================================================================================================================================+ +| template | is set to ``images.html`` | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| url | string, which is representing the url to the result site | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| title | string, which is representing the title of the result *(partly implemented)* | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| content | *(partly implemented)* | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime `__, represent when the result is published *(partly implemented)* | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| img\_src | string, which is representing the url to the result image | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| thumbnail\_src | string, which is representing the url to a small-preview image | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + +videos +~~~~~~ + ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++====================+====================================================================================================================================+ +| template | is set to ``videos.html`` | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| url | string, which is representing the url of the result | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| title | string, which is representing the title of the result | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| content | *(not implemented yet)* | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime `__, represent when the result is published | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| thumbnail | string, which is representing the url to a small-preview image | ++--------------------+------------------------------------------------------------------------------------------------------------------------------------+ + +torrent +~~~~~~~ + ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| template | is set to ```torrent.html``` | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| url | string, which is representing the url of the result | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| title | string, which is representing the title of the result | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| content | string, which is giving a general result-text | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| publishedDate | [datetime.datetime](https://docs.python.org/2/library/datetime.html#datetime-objects), represent when the result is published _(not implemented yet)_ | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| seed | int, number of seeder | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| leech | int, number of leecher | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| filesize | int, size of file in bytes | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| files | int, number of files | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| magnetlink | string, which is the [magnetlink](https://en.wikipedia.org/wiki/Magnet_URI_scheme) of the result | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ +| torrentfile | string, which is the torrentfile of the result | ++------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + + +map +~~~ + ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++=========================+====================================================================================================================================+ +| url | string, which is representing the url of the result | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| title | string, which is representing the title of the result | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| content | string, which is giving a general result-text | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime `__, represent when the result is published | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| latitude | latitude of result (in decimal format) | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| longitude | longitude of result (in decimal format) | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| boundingbox | boundingbox of result (array of 4. values ``[lat-min, lat-max, lon-min, lon-max]``) | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| geojson | geojson of result (http://geojson.org) | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| osm.type | type of osm-object (if OSM-Result) | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| osm.id | id of osm-object (if OSM-Result) | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| address.name | name of object | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| address.road | street adress of object | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| address.house\_number | house number of object | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| address.locality | city, place of object | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| address.postcode | postcode of object | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ +| address.country | country of object | ++-------------------------+------------------------------------------------------------------------------------------------------------------------------------+ + diff --git a/docs/index.rst b/docs/index.rst index abb620627..09dd1798d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -34,6 +34,7 @@ Developer documentation dev/contribution_guide dev/install/installation + dev/engine_overview dev/search_api dev/plugins dev/translation