CLI
The etl_lib.cli.run_tools module defines functions to query details of past ETL runs. It uses the click package to define a click.group() named cli, which you can extend with your own commands to build a command-line utility.
See the GTFS example for guidance on building a command-line tool.
import click
from etl_lib.cli.run_tools import cli

# Register your own command on the shared cli group.
@cli.command("<your own command>")
@click.argument(<your own arguments>)
@click.pass_context
def main(ctx, input_directory):
    pass

if __name__ == '__main__':
    cli()
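To make the pattern concrete, the following is a minimal sketch with the placeholders filled in. It assumes a stand-in cli group defined locally so that it runs without etl_lib or a Neo4j instance; the command name "import", the input_directory argument, and the database_name context key are hypothetical and are not taken from the library. In a real tool you would import cli from etl_lib.cli.run_tools instead.

```python
import click
from click.testing import CliRunner


# Stand-in for etl_lib.cli.run_tools.cli (assumption: the real group may
# store different data on the context).
@click.group()
@click.option("--database-name", default="neo4j", help="Neo4j database name")
@click.pass_context
def cli(ctx, database_name):
    ctx.ensure_object(dict)
    ctx.obj["database_name"] = database_name  # hypothetical context key


# Hypothetical command registered on the group, mirroring the template above.
@cli.command("import")
@click.argument("input_directory", type=click.Path())
@click.pass_context
def main(ctx, input_directory):
    """Pretend to start an ETL run for the given directory."""
    click.echo(f"Importing {input_directory} into '{ctx.obj['database_name']}'")


if __name__ == "__main__":
    # CliRunner invokes the group in-process, which is handy for a quick check.
    result = CliRunner().invoke(cli, ["import", "data/gtfs"])
    print(result.output)
```

Because the new command is attached to the same group, it appears next to the built-in commands in the --help listing.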
Commands
The cli group provides the following commands:
$ python <your-cli>.py --help
Usage: <your-cli>.py [OPTIONS] COMMAND [ARGS]...
Environment variables can be configured via a .env file or overridden via
CLI options:
- NEO4J_URI: Neo4j database URI
- NEO4J_USERNAME: Neo4j username
- NEO4J_PASSWORD: Neo4j password
- LOG_FILE: Path to the log file
- DATABASE_NAME: Neo4j database name (default: neo4j)
Options:
--neo4j-uri TEXT Neo4j database URI
--neo4j-user TEXT Neo4j username
--neo4j-password TEXT Neo4j password
--log-file TEXT Path to the log file
--database-name TEXT Neo4j database name (default: neo4j)
--help Show this message and exit.
Commands:
delete Delete runs based on run ID, date, or age.
detail Show a breakdown of the task for the specified run, including...
query Retrieve the list of the last x ETL runs from the database and...
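The help text above implies a precedence order: a value given as a CLI option wins over the environment variable, and a documented default applies when neither is set. A minimal sketch of that order follows. The resolve_setting helper and the example URIs are hypothetical, and loading the .env file itself is not shown; this is not the library's implementation.

```python
import os


def resolve_setting(cli_value, env_name, default=None, environ=None):
    """Return the CLI value if given, else the environment value, else the default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    return environ.get(env_name, default)


# Example environment, as it might look after a .env file has been loaded.
env = {"NEO4J_URI": "neo4j://localhost:7687"}

# No CLI option given: the environment variable is used.
print(resolve_setting(None, "NEO4J_URI", environ=env))
# CLI option given: it overrides the environment variable.
print(resolve_setting("neo4j://other-host:7687", "NEO4J_URI", environ=env))
# Neither given: the documented default for DATABASE_NAME applies.
print(resolve_setting(None, "DATABASE_NAME", default="neo4j", environ=env))
```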
Query
$ python <your-cli>.py query --help
Usage: <your-cli>.py query [OPTIONS]
Retrieve the list of the last x ETL runs from the database and display them.
Options:
--number-runs INTEGER Number of rows to process, defaults to 10
--help Show this message and exit.
Example output:
$ python <your-cli>.py query
Listing runs in database 'neo4j'
+--------+--------------------------------------+------------------+------------------+-----------+
| name | ID | startTime | endTime | changes |
|--------+--------------------------------------+------------------+------------------+-----------|
| main | 69260954-0b94-4043-be1b-f99ce5a64d3a | 2025-02-09 17:19 | 2025-02-09 17:20 | 4566469 |
+--------+--------------------------------------+------------------+------------------+-----------+
The changes column represents the sum of all modifications in that run, including CSV rows read, constraints added, properties set, etc.
Detail
$ python <your-cli>.py detail --help
Usage: <your-cli>.py detail [OPTIONS] RUN_ID
Show a breakdown of the task for the specified run, including statistics.
Options:
--details Show stats for each task
--help Show this message and exit.
Example output:
$ python <your-cli>.py detail 69260954-0b94-4043-be1b-f99ce5a64d3a
Showing details for run ID: 69260954-0b94-4043-be1b-f99ce5a64d3a
+-------------------------------------------------------------------------------+----------+-----------+------------+-----------+
| task | status | batches | duration | changes |
|-------------------------------------------------------------------------------+----------+-----------+------------+-----------|
| TaskGroup(schema-init) | success | | 0:00:00 | 0 |
| Task(SchemaTask) | success | | 0:00:00 | 0 |
| TaskGroup(csv-loading) | success | | 0:00:57 | 4566469 |
| LoadAgenciesTask(/Users/bert/Downloads/mdb-2333-202412230030/agency.txt) | success | 1 / - | 0:00:00 | 6 |
| LoadRoutesTask(/Users/bert/Downloads/mdb-2333-202412230030/routes.txt) | success | 1 / - | 0:00:00 | 1495 |
| LoadStopsTask(/Users/bert/Downloads/mdb-2333-202412230030/stops.txt) | success | 1 / - | 0:00:00 | 33360 |
| LoadTripsTask(/Users/bert/Downloads/mdb-2333-202412230030/trips.txt) | success | 19 / - | 0:00:03 | 733552 |
| LoadCalendarTask(/Users/bert/Downloads/mdb-2333-202412230030/calendar.txt) | success | 1 / - | 0:00:00 | 424 |
| LoadStopTimesTask(/Users/bert/Downloads/mdb-2333-202412230030/stop_times.txt) | success | 380 / - | 0:00:54 | 3797632 |
| TaskGroup(post-processing) | success | | 0:00:07 | 0 |
| Task(CreateSequenceTask) | success | | 0:00:07 | 0 |
+-------------------------------------------------------------------------------+----------+-----------+------------+-----------+
When the expected number of batches is not known in advance, the batches column shows a dash in place of the total, as in 380 / -.
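The batches notation can be sketched as a small formatting helper. This assumes the first number is the count of batches processed and the second the expected total; format_batches is a hypothetical name used only for illustration.

```python
from typing import Optional


def format_batches(done: int, expected: Optional[int]) -> str:
    """Render 'done / expected', using '-' when the expected total is unknown."""
    expected_text = "-" if expected is None else str(expected)
    return f"{done} / {expected_text}"


print(format_batches(380, None))  # expected total unknown
print(format_batches(19, 20))     # expected total known
```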
Adding the --details flag provides additional task-specific statistics:
Example output:
$ python <your-cli>.py detail 69260954-0b94-4043-be1b-f99ce5a64d3a --details
Showing statistics for Task 'TaskGroup(csv-loading)' with status 'success'
+----------------+---------+
| Name | Value |
|----------------+---------|
| csv_lines_read | 1995192 |
| properties_set | 576085 |
| valid_rows | 1995192 |
+----------------+---------+
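The example outputs are consistent with changes being a plain sum. The three statistics shown for TaskGroup(csv-loading) add up to its changes value, and so do the changes values of its six child tasks. The short check below uses only the numbers from the tables above; the dictionary layout is just for illustration.

```python
# Statistics reported by --details for TaskGroup(csv-loading).
stats = {
    "csv_lines_read": 1995192,
    "properties_set": 576085,
    "valid_rows": 1995192,
}

# Changes reported for each child task of TaskGroup(csv-loading).
child_changes = {
    "LoadAgenciesTask": 6,
    "LoadRoutesTask": 1495,
    "LoadStopsTask": 33360,
    "LoadTripsTask": 733552,
    "LoadCalendarTask": 424,
    "LoadStopTimesTask": 3797632,
}

group_changes = 4566469  # changes column for TaskGroup(csv-loading)

print(sum(stats.values()) == group_changes)
print(sum(child_changes.values()) == group_changes)
```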
Delete
$ python <your-cli>.py delete --help
Usage: <your-cli>.py delete [OPTIONS]
Delete runs based on run ID, date, or age. One and only one of --run-id,
--since, or --older must be provided.
Options:
--run-id TEXT Run IDs to delete, works with comma separated list
--before [%Y-%m-%d] Delete runs before a specific date in format YYYY-MM-DD
--older INTEGER Delete runs older than x days
--help Show this message and exit.
Note: the description mentions --since, but the option the command lists is --before.
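The constraints in the help text (exactly one selector, a comma-separated list of run IDs, and a YYYY-MM-DD date) can be sketched as follows. The helper names are hypothetical, the option names follow the Options list, and this is an illustration of the rules rather than the library's own validation code.

```python
from datetime import date, datetime
from typing import List


def parse_run_ids(value: str) -> List[str]:
    """Split a comma-separated --run-id value into individual run IDs."""
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_before(value: str) -> date:
    """Parse a --before value in the %Y-%m-%d format shown in the help text."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def check_exactly_one(**options) -> str:
    """Return the name of the single option that was given, else raise ValueError."""
    given = [name for name, value in options.items() if value is not None]
    if len(given) != 1:
        raise ValueError(
            f"exactly one of {', '.join(options)} must be provided, got {len(given)}"
        )
    return given[0]


# Valid: only --older is set.
print(check_exactly_one(run_id=None, before=None, older=30))
# The run ID from the example output plus a second, made-up ID.
print(parse_run_ids("69260954-0b94-4043-be1b-f99ce5a64d3a, another-id"))
print(parse_before("2025-02-09"))
```

Passing two selectors, or none, raises ValueError, which mirrors the "one and only one" rule.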