bods-dev-handbook

Testing the schema

Changes to the schema should not be merged until all of the changes are covered by tests, and the tests pass.

Exceptions may be made for rapid iteration on pre-release changes, provided that any missing coverage or failing tests are documented and are expected to be resolved before a new version is released.

In summary, we test:

Metaschema

BODS v0.4 uses JSON Schema 2020-12 as its base metaschema, extended with some custom properties which are used to further constrain the BODS schema. These are:

Properties from the extended metaschema should not be present in any BODS data, only in the schema. Therefore they don’t need to be documented for data publishers, only for schema architects and developers.

The metaschema file is found in data-standard/tests/schema/meta-schema.json. As part of the data standard repository tests, the BODS schema files are validated against the metaschema.

Python tests

Test files are found in the data-standard/tests directory.

Tests for the BODS schema are organised into:

The tests are written using pytest. Fixtures for fetching files, loading the schema, creating a validator, and other helper functions can be found in conftest.py.

The tests and a flake8 code quality check are run automatically when a branch is pushed to the data-standard repository.

Code quality

We use Black, isort and flake8 for code linting. Pull requests are automatically checked and must pass these checks before they can be merged.

Running tests locally

Tests can be run in your local development environment (eg. in a virtualenv, a Docker container or similar) from inside the data-standard repository.

Make sure the test requirements are installed:

pip install -r requirements_test.txt

To run all the tests:

pytest tests/

To run one set of tests, eg.:

pytest tests/test_schema.py

To run code linting:

flake8 tests/

(There is no output if all the code is conformant.)

and:

black tests/ --line-length=119

Adding tests

The tests in the data standard repository exist to check that the JSON Schema works as expected. They do not validate data, and they do not test any requirements that the data standard imposes on data but that the JSON Schema does not enforce. Those requirements should be covered by a validation tool.

Schema structure

Schema function

If constraints are added to or removed from the JSON schema (eg. a string field which previously had no maximum length now has a maximum length), valid and invalid test data should be added to the appropriate subdirectory in tests/data/.

Use one file per requirement, with the minimum contents possible to test only the requirement in question. This is so that if any requirements change in future, we have the minimum amount to update in the test files.

Name the test files to make it clear which requirement is being tested.

After adding new files make sure to run the tests (pytest tests/test_data.py) to check they pass.

Testing against valid data

A minimum valid BODS entity statement looks like this:

[
    {
        "statementId": "2f7bf9370f1254068e5e946df067d07d",
        "declarationSubject": "xyz",
        "statementDate": "2017-11-18",
        "recordId": "123",
        "recordType": "entity",
        "recordDetails": {
            "entityType": {
                "type": "unknownEntity"
            },
            "isComponent": false
        }
    }
]

Start from a minimal statement like this, and add only the field you are testing. If you are testing a field in a nested object (eg. publicationDetails/publisher/name) you may need to add more data to cover additional required fields (eg. publicationDetails/publicationDate). Check the schema itself to find out which fields are required for the various objects.
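As a rough sketch of that workflow, the snippet below copies the minimal statement and adds only the nested field under test. The helper name build_valid_case and the field values are hypothetical. It assumes publicationDate is the only other required field inside publicationDetails, so check the schema for any further required fields.

```python
import copy
import json

# The minimal entity statement shown above.
MINIMAL_ENTITY_STATEMENT = {
    "statementId": "2f7bf9370f1254068e5e946df067d07d",
    "declarationSubject": "xyz",
    "statementDate": "2017-11-18",
    "recordId": "123",
    "recordType": "entity",
    "recordDetails": {
        "entityType": {"type": "unknownEntity"},
        "isComponent": False,
    },
}


def build_valid_case(extra_fields):
    """Return a one-statement array: the minimal statement plus the fields under test."""
    statement = copy.deepcopy(MINIMAL_ENTITY_STATEMENT)
    statement.update(extra_fields)
    return [statement]


# Testing publicationDetails/publisher/name. publicationDate is included on the
# assumption that it is required inside publicationDetails (values are made up).
case = build_valid_case(
    {
        "publicationDetails": {
            "publicationDate": "2017-11-18",
            "publisher": {"name": "Example Publisher"},
        }
    }
)

# Serialise with the same 4-space indentation as the example above.
case_json = json.dumps(case, indent=4)
```

The resulting text can then be saved as a new file in the appropriate subdirectory of tests/data/, named after the requirement it covers.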

Testing against invalid data

As with valid data, start from a minimal valid statement and add an invalid value for only the field you are testing (or, for a required field, remove it). Each file should produce exactly one validation error. The test will fail if there is more than one.
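One way to keep an invalid file down to a single change is to derive it from the minimal statement in code and then confirm that only one field differs. The sketch below does this for a missing statementDate. Both helper names are hypothetical, and the comparison only looks at top-level fields, so it is a quick sanity check rather than a replacement for running the tests.

```python
import copy

# The minimal entity statement from "Testing against valid data".
MINIMAL_ENTITY_STATEMENT = {
    "statementId": "2f7bf9370f1254068e5e946df067d07d",
    "declarationSubject": "xyz",
    "statementDate": "2017-11-18",
    "recordId": "123",
    "recordType": "entity",
    "recordDetails": {
        "entityType": {"type": "unknownEntity"},
        "isComponent": False,
    },
}

_MISSING = object()  # marks a field that is absent, as opposed to set to None


def remove_required_field(statement, field):
    """Return a one-statement array with a single top-level field removed."""
    broken = copy.deepcopy(statement)
    del broken[field]
    return [broken]


def changed_top_level_fields(original, candidate):
    """List the top-level fields whose presence or value differs."""
    keys = set(original) | set(candidate)
    return sorted(
        key
        for key in keys
        if original.get(key, _MISSING) != candidate.get(key, _MISSING)
    )


invalid_case = remove_required_field(MINIMAL_ENTITY_STATEMENT, "statementDate")

# Only the field under test should differ from the minimal statement.
changed = changed_top_level_fields(MINIMAL_ENTITY_STATEMENT, invalid_case[0])
```

Here changed contains only statementDate, which suggests the file exercises a single requirement.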

We also have to test that the validation error is the one we expect, so we need to map the data files to the type and location of the error we’re looking for. Do this by updating expected_errors.csv (in the same directory as the invalid data). The structure of this file is:

Validation keywords are from the JSON Schema standard, and are one of:

If a value is constrained by multiple possible subschemas, the validation keyword may sometimes need to be set to oneOf, anyOf or allOf, rather than to the keyword for the actual validation that is taking place. Making this more precise is a todo.

The JSON path always begins with $[0] because each test file is an array of one statement. Path segments are separated by a full stop (.). Elements in arrays are represented by [0], [1], etc., giving the position of the error in the array. When the error relates to a missing required field, the JSON path ends at the parent. For example, to test a missing statementDate, the JSON path is $[0] (because that is the location of the required field error), not $[0].statementDate.

The property is the name of the specific field you’re testing. To test a missing statementDate, set this value to statementDate. To test an incorrect address type, set this to type.
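The path format described above can be illustrated with a small helper that renders a sequence of keys and array positions as a JSON path. This is a sketch only: to_json_path is a hypothetical name, not a function from the repository, and the addresses path is an illustrative location for an address type error.

```python
def to_json_path(parts):
    """Render keys (str) and array positions (int) as a JSON path string."""
    path = "$"
    for part in parts:
        if isinstance(part, int):
            path += f"[{part}]"  # array element, eg. [0]
        else:
            path += f".{part}"  # object property, separated by a full stop
    return path


# Missing required field: the path stops at the parent object, and the
# property column carries the field name (statementDate).
missing_statement_date = to_json_path([0])

# Invalid value in a nested field: the path points at the field itself.
bad_entity_type = to_json_path([0, "recordDetails", "entityType", "type"])

# Error inside an array: the position appears in brackets (illustrative path).
bad_address_type = to_json_path([0, "recordDetails", "addresses", 1, "type"])
```

For the missing statementDate case, the entry in expected_errors.csv would therefore pair the required keyword with the path $[0] and the property statementDate.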

Docs and examples