Testing

Tests for ETL pipelines are, by definition, integration tests. The library provides utilities to simplify setting up a connection to Neo4j.

The utilities integrate with pytest. See the provided examples for possible setup and usage.

There are two ways to provide Neo4j to the tests: start it with the excellent TestContainers, or connect to an existing Neo4j installation.

The module utils contains mock implementations of core components, which are used to set up an ETLContext and its dependencies. These utilities read their configuration from environment variables.

Fixtures

In addition to the utils module, we recommend placing the following code in conftest.py at the root of the testing package. The pytest fixtures defined in it will then be available to all tests.

import os
import uuid

import pytest
from neo4j import GraphDatabase, WRITE_ACCESS

# ETLContext, MockETLContext and get_database_name must be imported as well,
# from the library and its utils module.


@pytest.fixture(scope="session")
def neo4j_driver():
    """
    Creates a Neo4j driver instance.
    If the environment variable NEO4J_TEST_CONTAINER is set, it will be used as the image name to start Neo4j in a
    TestContainer.
    If the variable is not set, the following environment variables will be used to connect to a running instance:
    `NEO4J_URI`
    `NEO4J_USERNAME`
    `NEO4J_PASSWORD`
    `NEO4J_TEST_DATABASE`
    The latter can be used to direct tests away from the default DB.
    :return: the Neo4j driver
    """
    neo4j_container = os.getenv("NEO4J_TEST_CONTAINER")
    if neo4j_container is not None:
        print(f"found NEO4J_TEST_CONTAINER with {neo4j_container}, using test containers")
        # imported here so that testcontainers is only needed in container mode
        from testcontainers.neo4j import Neo4jContainer
        container = (Neo4jContainer(image=neo4j_container,
                                    username="neo4j",
                                    password=str(uuid.uuid4()))
                     # the plugin list is a JSON array
                     .with_env("NEO4J_PLUGINS", '["apoc", "graph-data-science"]')
                     .with_env("NEO4J_ACCEPT_LICENSE_AGREEMENT", "yes"))
        with container as neo4j:
            driver = neo4j.get_driver()
            yield driver
            driver.close()
    else:
        print(f"NEO4J_TEST_CONTAINER not set, using instance at {os.getenv('NEO4J_URI')} "
              f"and database={os.getenv('NEO4J_TEST_DATABASE')}")
        # do not use test containers, but a running remote db
        with GraphDatabase.driver(os.getenv('NEO4J_URI'),
                                  auth=(os.getenv('NEO4J_USERNAME'),
                                        os.getenv('NEO4J_PASSWORD')),
                                  notifications_min_severity="OFF") as driver:
            # leaving the with block closes the driver
            yield driver


@pytest.fixture
def neo4j_driver_with_empty_db(neo4j_driver):
    with neo4j_driver.session(database=get_database_name(), default_access_mode=WRITE_ACCESS) as session:
        # consume() waits until the delete has finished
        session.run("MATCH (n) DETACH DELETE n").consume()
    return neo4j_driver


@pytest.fixture
def etl_context(neo4j_driver_with_empty_db, tmp_path) -> ETLContext:
    return MockETLContext(neo4j_driver_with_empty_db, tmp_path)


The following fixtures are provided:

  • neo4j_driver_with_empty_db : Provides a Neo4j driver object connected to an empty database. Typically used in data setup steps.

  • neo4j_driver : Provides a Neo4j driver object connected to a running Neo4j installation.

  • etl_context : Provides an ETLContext for testing purposes. The simple reporter is used, and a pytest temporary directory is set up for error files (see Validation).

Selecting a Neo4j Installation

Testcontainers

If the NEO4J_TEST_CONTAINER environment variable is set to the name of a Neo4j Docker image, TestContainers will be used to run Neo4j. This requires Docker to be available on the host running the tests.

The container is started with the APOC Core and Graph Data Science (GDS) plugins installed.

External Installation

If the NEO4J_TEST_CONTAINER environment variable is not set, the NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD variables must be set. These will be used to connect to an existing Neo4j instance.

The functions in utils respect the NEO4J_TEST_DATABASE variable when running queries, allowing for separate databases per user.
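The selection rules above can be sketched as a small check that runs before any test connects to Neo4j. select_neo4j_target is a hypothetical helper, not part of the library. It only mirrors the described behaviour, so that a misconfigured environment fails early with a readable message.

```python
import os
from typing import Mapping, Optional

REQUIRED_EXTERNAL_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def select_neo4j_target(env: Optional[Mapping[str, str]] = None) -> dict:
    """Describe which Neo4j installation a test run would use.

    Hypothetical helper for illustration only; the library does not provide it.
    """
    env = os.environ if env is None else env

    image = env.get("NEO4J_TEST_CONTAINER")
    if image is not None:
        # Container mode: the variable holds the Docker image name.
        return {"mode": "container", "image": image}

    # External mode: all three connection variables must be set.
    missing = [name for name in REQUIRED_EXTERNAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {
        "mode": "external",
        "uri": env["NEO4J_URI"],
        # Optional; None simply means NEO4J_TEST_DATABASE is not set.
        "database": env.get("NEO4J_TEST_DATABASE"),
    }
```

Called without arguments, the helper inspects the real process environment. Passing a plain dict instead makes it easy to try out different configurations.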