For a big repo like f-droid.org, it makes sense to standardize on a single
format for metadata files. This adds support for enforcing a single data
format, or a reduced set of data formats. During a migration to YAML,
f-droid.org would use this setting:
accepted_formats = ['txt', 'yaml']
Then once everything was converted to YAML, it could look like this:
accepted_formats = ['yaml']
In order to prevent confusion caused by multiple metadata files for a given
app, fdroid will exit with an error if it finds any app metadata file with
the same package ID as one that has already been parsed.
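The format restriction and the duplicate check described above could be sketched roughly as follows. This is a minimal illustration, not the fdroidserver implementation: the function name `check_metadata_files` is hypothetical, the package ID is assumed to be the file name without its extension, and a `ValueError` stands in for exiting with an error.

```python
import os


def check_metadata_files(paths, accepted_formats):
    """Map each package ID to its metadata file (hypothetical helper).

    Raises ValueError for a format that is not accepted, or when a
    second metadata file turns up for an already seen package ID.
    """
    seen = {}
    for path in paths:
        # 'metadata/org.adaway.yaml' -> ('org.adaway', '.yaml')
        package_id, ext = os.path.splitext(os.path.basename(path))
        fmt = ext.lstrip('.')
        if fmt not in accepted_formats:
            raise ValueError('"%s" is not an accepted format: %s' % (fmt, path))
        if package_id in seen:
            raise ValueError('Found multiple metadata files for %s: %s and %s'
                             % (package_id, seen[package_id], path))
        seen[package_id] = path
    return seen
```

With `accepted_formats = ['yaml']`, any remaining .txt file is rejected, and with `['txt', 'yaml']` a package that has both a .txt and a .yaml file is reported as a duplicate.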
This puts the process of setting up the defaults for the internal dict
that represents a parsed app into a single method that is reused for all
metadata formats.
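A shared defaults method of this kind might look like the sketch below. The name `new_app_info` and the exact set of default fields are assumptions made for illustration; only the field names themselves (AntiFeatures, Categories, Requires Root, comments) come from these notes.

```python
import copy

# Illustrative defaults only; the real dict has many more fields.
APP_DEFAULTS = {
    'AntiFeatures': [],
    'Categories': [],
    'Requires Root': False,
    'comments': [],
}


def new_app_info(package_id):
    """Return a fresh app dict with defaults (hypothetical helper)."""
    # deepcopy so that two apps never share the same list objects
    app = copy.deepcopy(APP_DEFAULTS)
    app['id'] = package_id
    return app
```

Each parser would start from `new_app_info(package_id)` and only overwrite the fields it actually finds in the file.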
YAML is a format that is quite similar to the .txt format, but is a
widespread standard that has editing modes in popular editors. It is also
easy to parse in Python.
The .pickle for testing is a lightly edited version of the real metadata
for org.videolan.vlc:
* comments were removed
This puts all of the needed post-parsing checks on the metadata into a
single method that is used by all parsing methods (.txt, JSON, XML, YAML).
This provides a single place to normalize the internal representation of
the metadata.
It would be good to also change the internal representation to use more Python
bool/int types so that less post-parsing is needed for JSON, XML, and YAML.
The SMSSecure test .pickle was changed to account for the use of lstrip()
and rstrip() on all 'script' types.
This also changes the example JSON to use ints for versionCodes.
No need for a specific metadata type for 'Requires Root'. Just use 'bool',
then convert on output. This keeps the internal representation consistent
since all boolean-ish values will be Python bools. This in turn makes
parsing metadata in various formats easier. This also makes setting
booleans more tolerant since all these values are valid:
Yes yes No no True true False false
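As a rough sketch of this tolerant handling, the conversion could be written as below. The helper names `parse_bool` and `format_bool` are hypothetical, and writing booleans back out as 'Yes'/'No' is an assumption about the output format.

```python
TRUE_VALUES = ('Yes', 'yes', 'True', 'true')
FALSE_VALUES = ('No', 'no', 'False', 'false')


def parse_bool(value):
    """Convert a boolean-ish metadata value to a Python bool."""
    if isinstance(value, bool):
        # JSON and YAML parsers may already hand over real bools
        return value
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError('Not a valid boolean value: %r' % (value,))


def format_bool(value):
    """Convert back on output (assumed 'Yes'/'No' spelling)."""
    return 'Yes' if value else 'No'
```

Keeping the conversion at the edges (parse on input, format on output) means the rest of the code only ever sees Python bools.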
While the current text metadata format is good for human readability and
editability, it is difficult to produce and parse using code. XML is a
widespread standard format for easy automatic parsing and creating, while
having decent human readability.
The .pickle for testing is a lightly edited version of the real metadata
for net.osmand.plus:
* comments were removed
* "NonFreeNet" was added as an AntiFeature
The AntiFeatures metadata is a comma-separated list of tags, like
Categories, so it should also be stored internally as a list. This makes
parsing XML and JSON easier.
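A minimal sketch of that conversion, assuming a hypothetical helper named `split_list_values` that accepts either the comma-separated string from the .txt format or a list that a JSON/XML parser has already produced:

```python
def split_list_values(value):
    """Return a list of tags from a comma-separated string or a list."""
    if value is None:
        # matches the new default: an empty list rather than None
        return []
    if isinstance(value, list):
        return value
    return [tag.strip() for tag in value.split(',') if tag.strip()]
```

For example, `split_list_values('NonFreeNet, Tracking')` gives `['NonFreeNet', 'Tracking']` (the input string is only illustrative), while a value that is already a list passes through unchanged.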
The test cases' .pickle files look like they change a lot, but they really
don't; the only change is the default AntiFeatures value, from None to [].
The only data produced by parsing metadata that was stored as a tuple was
the comments' key/values. Everywhere else, lists are used. This changes
the comments to also only use lists to keep the data consistent internally.
That makes it vastly easier to produce the exact same metadata dict when
parsing .txt, JSON, XML, etc.
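To illustrate why tuples get in the way: Python's `json` module always produces lists, and a list never compares equal to a tuple, so a dict containing tuples cannot match one parsed from JSON. The sketch below uses a hypothetical `tuples_to_lists` helper and made-up comment data to show the difference.

```python
import json


def tuples_to_lists(value):
    """Recursively replace tuples with lists (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        return [tuples_to_lists(item) for item in value]
    if isinstance(value, dict):
        return {key: tuples_to_lists(item) for key, item in value.items()}
    return value


# Made-up comment data in the old tuple form
from_txt = {'comments': [('Summary', 'a comment')]}
from_json = json.loads('{"comments": [["Summary", "a comment"]]}')

# Tuples and lists do not compare equal, so the dicts differ at first
dicts_equal_before = (from_txt == from_json)
dicts_equal_after = (tuples_to_lists(from_txt) == from_json)
```

Here `dicts_equal_before` is false and `dicts_equal_after` is true, which is the consistency that storing comments as lists provides directly.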
The .pickle was created by dumping the output from parsing the current .txt
metadata for org.adaway. The JSON started from that pickle dump, but was
then hand edited to be more proper JSON, e.g. using boolean values.
fdroidserver currently uses plain Python 2.x strings rather than unicode
strings, so the parsed JSON should produce the same format as the parsed
.txt metadata.
This is a test to cover future modifications of the .txt metadata parsing.
The pickle file was generated by just dumping the current parsed metadata,
so this test will always succeed if the parsing is not changed.
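The idea behind such a regression test can be sketched as follows. Everything here is a stand-in: `parse_toy_metadata` is a toy "Key:Value" parser rather than the real .txt parser, and the reference is pickled in memory instead of being loaded from a .pickle file.

```python
import pickle


def parse_toy_metadata(text):
    """Toy stand-in parser: one 'Key:Value' pair per line."""
    app = {}
    for line in text.splitlines():
        key, _, value = line.partition(':')
        app[key] = value
    return app


def matches_reference(parse, source_text, reference_bytes):
    """Compare freshly parsed metadata with a pickled reference dump."""
    expected = pickle.loads(reference_bytes)
    return parse(source_text) == expected


SOURCE = 'Categories:System\nLicense:GPLv3'

# The reference is just a dump of the current parser output, so the
# check passes until the parser's behavior changes.
REFERENCE = pickle.dumps(parse_toy_metadata(SOURCE))
still_matches = matches_reference(parse_toy_metadata, SOURCE, REFERENCE)
```

Because the reference is generated from the parser itself, this kind of test says nothing about whether the output is correct, only that it has not changed.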
While the current text metadata format is good for human readability and
editability, it is difficult to produce and parse using code. JSON is a
widespread standard format for easy automatic parsing and creating, while
having decent human readability.