hc – holiday converter¶
hc introduction¶
hc – holiday converter
Supports the following inputs and outputs:
- Input: German school holidays (HTML tables) from http://www.schulferien.org/
- Input and output: (School) holidays format used by opening_hours.js
Supports the following data serialization languages:
- YAML
- JSON
Documentation¶
Authors¶
Installation¶
Latest release¶
You can install hc by invoking the following commands:
gpg --recv-keys 'C505 B5C9 3B0D B3D3 38A1 B600 5FE9 2C12 EE88 E1F0'
mkdir --parent /tmp/hc && cd /tmp/hc
wget -r -nd -l 1 https://pypi.python.org/pypi/hc --accept-regex '^https://(test)?pypi\.python\.org/packages/.*\.whl.*'
current_release="$(find . -type f -name '*.whl' | sort | tail -n 1)"
gpg --verify "${current_release}.asc" "${current_release}" && pip3 install --upgrade "${current_release}"
Refer to Verifying PyPI and Conda Packages for more details. Note that this might pull down dependencies in an unauthenticated way! You might want to install the dependencies yourself beforehand.
Or, if you feel lazy and agree that pip/issues/1035 should be fixed, you can also install hc like this:
pip3 install hc
Development version¶
If you want to be more on the bleeding edge of hc development, consider cloning the git repository and installing from it:
gpg --recv-keys 'EF96 BC32 AC57 CFC7 2DF0 1D8C 489A 4D5E C353 C98A'
git clone --recursive https://gitlab.com/ypid/hc.git
cd hc && git verify-commit HEAD
echo 'Check if the HEAD commit has a good signature and only proceed in that case!' && read -r fnord
echo 'Then choose one of the commands below to install hc and its dependencies:'
pip3 install --upgrade .
./setup.py develop --user
./setup.py install --user
./setup.py install
This also gets you the cache, which is tracked in git as well so that integration tests can run over the whole dataset. Be sure to use the cache by symlinking it to your user cache directory. The following should do the trick:
hc_cache="$(python3 -c 'from appdirs import user_cache_dir; print(user_cache_dir("hc"))')"
ln -sT "$PWD/tests/cache/" "$hc_cache"
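The same symlink step can be sketched in Python for platforms where `ln -sT` is not available. This is only an illustration: `link_cache` is a hypothetical helper and not part of hc, and it takes both paths as arguments so that it does not depend on appdirs being installed.

```python
from pathlib import Path


def link_cache(repo_cache: Path, user_cache: Path) -> Path:
    """Symlink user_cache -> repo_cache, roughly like `ln -sT repo_cache user_cache`.

    Hypothetical helper for illustration; hc itself does not ship this function.
    """
    repo_cache = repo_cache.resolve()
    # A second run should not fail if the link is already in place.
    if user_cache.is_symlink() and user_cache.resolve() == repo_cache:
        return user_cache
    # Refuse to overwrite anything else that already exists at the link path.
    if user_cache.exists() or user_cache.is_symlink():
        raise FileExistsError(f"{user_cache} exists and is not the expected symlink")
    # Unlike plain `ln`, create the parent directory of the link if it is missing.
    user_cache.parent.mkdir(parents=True, exist_ok=True)
    user_cache.symlink_to(repo_cache, target_is_directory=True)
    return user_cache
```

With appdirs installed, it could be called from the repository root as `link_cache(Path("tests/cache"), Path(user_cache_dir("hc")))`, using the same `user_cache_dir` function as the shell snippet above.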
CLI interface¶
Holiday converter tool
usage: hc [-h] [-V] [-d] [-v] [-q] [-n] [-c CACHE_DIR] [-i INPUT_FILE]
[-f {schulferien_html}] [-F FROM_DATE] [-T TO_DATE] [-u]
[-t {yaml,json}] [-s {opening_hours.js}] [-D]
output-file
Positional Arguments¶
output-file | Where to write the output file. ‘-’ will write to STDOUT. |
Named Arguments¶
-V, --version | Show program’s version number and exit. |
-d, --debug | Write debugging and higher to STDOUT|STDERR. |
-v, --verbose | Write information and higher to STDOUT|STDERR. |
-q, --quiet, --silent | Only write errors and higher to STDOUT|STDERR. |
-n, --no-cache | Do not cache intermediary files. Default: True |
-c, --cache-dir | Cache directory, defaults to the default cache directory of your operating system. |
-i, --input-file | File path to the input file to process. ‘-’ will read from STDIN. |
-f, --input-format, --from | Possible choices: schulferien_html. Format of the input file. Default: “schulferien_html” |
-F, --from-date | Process date range starting at given RFC 3339 date. Default: current year and month (shown here as “2020-04”) |
-T, --to-date | Process date range ending at given RFC 3339 date. Default: one year in the future (shown here as “2021-04”) |
-u, --update-output | Update the output file instead of constructing it from scratch. Implementation incomplete. Default: False |
-t, --output-format, --to | Possible choices: yaml, json. Format of the output file. Default: “yaml” |
-s, --output-structure | Possible choices: opening_hours.js. Structure of the output file. Default: “opening_hours.js” |
-D, --dry-run | Don’t write output. Default: False |
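As a rough illustration of how the options above fit together, the following Python sketch assembles an hc command line from the documented flags and mirrors the documented date defaults. The helper names `default_date_range` and `build_hc_command` are hypothetical, and how hc computes its defaults internally is an assumption based only on the “2020-04” / “2021-04” pair shown in the table.

```python
import datetime
from typing import List, Optional, Tuple


def default_date_range(today: datetime.date) -> Tuple[str, str]:
    """Current year and month, and the same month one year later (assumed behaviour)."""
    start = f"{today.year:04d}-{today.month:02d}"
    end = f"{today.year + 1:04d}-{today.month:02d}"
    return start, end


def build_hc_command(
    output_file: str,
    input_file: Optional[str] = None,
    output_format: str = "yaml",
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Build an argument list using only flags listed in the table above."""
    if output_format not in ("yaml", "json"):
        raise ValueError("output_format must be 'yaml' or 'json'")
    cmd = ["hc", "--output-format", output_format]
    if input_file is not None:
        cmd += ["--input-file", input_file]
    if from_date is not None:
        cmd += ["--from-date", from_date]
    if to_date is not None:
        cmd += ["--to-date", to_date]
    if dry_run:
        cmd.append("--dry-run")
    # The positional output file goes last; "-" means STDOUT.
    cmd.append(output_file)
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)` once hc is installed; passing a list rather than a shell string avoids quoting problems with file names.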
History¶
This tool was created because Germany, as of 2017, seems to be unable or unwilling to provide school holidays, or holidays in general, in a machine-readable format. Sites like http://www.schulferien.org/ do a really good job of collecting the data anyway from various sources and “providing them”. Back in 2013, everything was great: schulferien.org simply provided iCal files for the school holidays of the current year and the following years, as far as they had been defined by the German Kultusministerkonferenz. A Perl script was used to parse all the iCal files and convert them (ref: convert_ical_to_json).
Unfortunately, those days are over. After checking all the available sources, the least bad option was to parse the HTML tables of schulferien.org, because the HTML version still provides all the data. schulferien.org was contacted beforehand to find a better solution, but none was found. One concern of schulferien.org is the use of (faulty) scripts which put load on their servers. A key design goal of this tool is therefore to make as few requests to external resources as possible and to use extensive caching. This has been implemented; see Design principles.
Refer to this issue for more details.
Design principles¶
Generic
When you look around on the Internet, you find hundreds of public and/or school holiday APIs, libraries, and websites providing HTML calendars, iCal files, and PDFs, at least for Germany. Most of these have some kind of artificial limitation or restriction. This is an attempt to harvest holidays (which are generally not copyright protected), convert them, and provide them without any limitations.
The available tools were found unsuitable for the use case of bundling holiday definitions for opening_hours.js, which is why this tool was written.
Free Software
All sources are provided under the GNU Affero General Public License v3 (AGPL-3.0). Resources such as holiday data are released under Creative Commons Zero v1.0 Universal. Enjoy.
Idempotent
The program can be run against its own output and should not make any changes to it. This property is checked by integration testing.
Caching
Make as few requests to external resources as possible and use extensive caching. The cache is provided as a separate git repository (hc-tests-cache) so that it can also be used during CI testing, which runs against a support matrix of Python versions and environments and therefore runs several times in parallel for each commit.
Extensible
Convert from anything to anything using a common internal data structure.
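The caching and idempotence principles above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not hc's implementation: `fetch_cached` and `convert` are hypothetical names, the `fetch` callable stands in for whatever performs the real HTTP request, and the list of `name`/`from` records is a made-up internal structure rather than the opening_hours.js format.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def fetch_cached(url: str, cache_dir: Path, fetch: Callable[[str], str]) -> str:
    """Return the body for url, calling fetch only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    cache_file = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    body = fetch(url)
    cache_file.write_text(body, encoding="utf-8")
    return body


def convert(serialized: str) -> str:
    """Parse JSON into a plain internal structure and write it back canonically.

    Sorting records and keys makes the output independent of input ordering,
    so running convert on its own output changes nothing.
    """
    holidays = json.loads(serialized)
    holidays.sort(key=lambda h: (h["from"], h["name"]))
    return json.dumps(holidays, indent=2, sort_keys=True, ensure_ascii=False)
```

An integration test for idempotence then reduces to checking that `convert(convert(data)) == convert(data)`, and a test for caching to checking that a second `fetch_cached` call for the same URL does not invoke `fetch` again.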
hc package¶
Submodules¶
hc.datatypes module¶
Data types definition
- class hc.datatypes.MonthDayList¶
  Bases: list
- class hc.datatypes.PhData¶
  Bases: list
- hc.datatypes.fix_data_types(dataset)¶
- hc.datatypes.fix_ph_data(dataset)¶
hc.defaults module¶
hc defaults
hc.helpers module¶
hc helpers
- hc.helpers.get_date_from_relative_month(relative_month)¶
- hc.helpers.get_month_number(month_name)¶
- hc.helpers.get_relative_month(date)¶
hc.opening_hours_js module¶
OpenStreetMap opening_hours.js format. Refer to https://github.com/opening-hours/opening_hours.js/blob/master/holidays/README.md for the “spec”.
- class hc.opening_hours_js.OpeningHoursJS(defs=None)¶
  Bases: object
  - FIRST_LEVEL_SORTING = {'PH': '20', 'SH': '30', '_nominatim_url': '10'}¶
  - SH_DATA_SORTING = {'name': '0'}¶
  - get_school_holidays(out=None)¶
  - read(in_defs)¶
  - static update_sh_format(sh_data)¶
- hc.opening_hours_js.find_ind(lst, key, value)¶
hc.schulferien_org module¶
schulferien.org interface
hc.yaml module¶
YAML representation
- class hc.yaml.PrettyHolidayYAMLDumper(stream, default_style=None, default_flow_style=None, canonical=None, indent=None, width=None, allow_unicode=None, line_break=None, encoding=None, explicit_start=None, explicit_end=None, version=None, tags=None, block_seq_indent=None, top_level_colon_align=None, prefix_colon=None)¶
  Bases: ruamel.yaml.dumper.RoundTripDumper
  YAML dumper optimized for human readability of the holiday format.
  - represent_dict(data)¶
    Write out tag if saved on loading.
  - represent_list(data)¶
  - yaml_representers
= {None: <unbound method SafeRepresenter.represent_undefined>, <type 'float'>: <unbound method SafeRepresenter.represent_float>, <type 'int'>: <unbound method SafeRepresenter.represent_int>, <type 'list'>: <unbound method SafeRepresenter.represent_list>, <type 'long'>: <unbound method SafeRepresenter.represent_long>, <type 'dict'>: <unbound method PrettyHolidayYAMLDumper.represent_dict>, <type 'NoneType'>: <unbound method RoundTripRepresenter.represent_none>, <type 'set'>: <unbound method SafeRepresenter.represent_set>, <type 'str'>: <unbound method SafeRepresenter.represent_str>, <type 'tuple'>: <unbound method SafeRepresenter.represent_list>, <type 'unicode'>: <unbound method SafeRepresenter.represent_unicode>, <type 'bool'>: <unbound method SafeRepresenter.represent_bool>, <class 'collections.OrderedDict'>: <unbound method PrettyHolidayYAMLDumper.represent_dict>, <class 'ruamel.yaml.comments.CommentedSet'>: <unbound method RoundTripRepresenter.represent_set>, <class 'ruamel.yaml.comments.CommentedSeq'>: <unbound method RoundTripRepresenter.represent_list>, <class 'ruamel.yaml.comments.CommentedOrderedMap'>: <unbound method RoundTripRepresenter.represent_ordereddict>, <class 'ruamel.yaml.comments.CommentedMap'>: <unbound method RoundTripRepresenter.represent_dict>, <class 'ruamel.yaml.scalarstring.PreservedScalarString'>: <unbound method RoundTripRepresenter.represent_preserved_scalarstring>, <class 'ruamel.yaml.scalarstring.SingleQuotedScalarString'>: <unbound method RoundTripRepresenter.represent_single_quoted_scalarstring>, <class 'ruamel.yaml.scalarstring.DoubleQuotedScalarString'>: <unbound method RoundTripRepresenter.represent_double_quoted_scalarstring>, <class 'ruamel.yaml.scalarint.ScalarInt'>: <unbound method RoundTripRepresenter.represent_scalar_int>, <class 'ruamel.yaml.scalarint.BinaryInt'>: <unbound method RoundTripRepresenter.represent_binary_int>, <class 'ruamel.yaml.scalarint.OctalInt'>: <unbound method RoundTripRepresenter.represent_octal_int>, 
<class 'ruamel.yaml.scalarint.HexInt'>: <unbound method RoundTripRepresenter.represent_hex_int>, <class 'ruamel.yaml.scalarint.HexCapsInt'>: <unbound method RoundTripRepresenter.represent_hex_caps_int>, <class 'ruamel.yaml.timestamp.TimeStamp'>: <unbound method RoundTripRepresenter.represent_datetime>, <class 'hc.datatypes.PhData'>: <unbound method PrettyHolidayYAMLDumper.represent_list>, <class 'hc.datatypes.MonthDayList'>: <unbound method PrettyHolidayYAMLDumper.represent_list>, <type '_ordereddict.ordereddict'>: <unbound method SafeRepresenter.represent_ordereddict>, <type 'datetime.datetime'>: <unbound method SafeRepresenter.represent_datetime>, <type 'datetime.date'>: <unbound method SafeRepresenter.represent_date>}¶
- hc.yaml.dump_holidays_as_yaml(unserialized_data, add_vspacing=True)¶
- hc.yaml.get_clean_yaml(serialized_data, add_vspacing=False)¶
Module contents¶
Holiday converter tool
Contributing and issue reporting¶
You can contribute and report issues in the usual way as documented by GitHub. Unit and integration tests can be run locally and are automatically run in CI. Acceptable contributions need to pass all of them.
If you found a security vulnerability that might put users at risk please send your report/patch to ypid@riseup.net. Please consider using OpenPGP to encrypt your email.
FAQ¶
Am I allowed to use the data the script gathers?¶
The author hopes so, but keep in mind that he is not a lawyer. As this is about German law, § 5 UrhG should apply, according to which content such as school holidays in Germany is not copyright protected. The official source is https://www.kmk.org/service/ferien.html.