etl_lib.core.ProgressReporter module

class Neo4jProgressReporter(context, database)[source]

Bases: ProgressReporter

Extends the ProgressReporter to additionally write the status updates from the tasks to a Neo4j database.

Parameters:

database (str)

__init__(context, database)[source]

Creates a new Neo4j progress reporter.

Parameters:
  • context – etl_lib.core.ETLContext containing a Neo4jConnection instance.

  • database (str) – Name of the database to write the status updates to.

finished_task(task, result)[source]

Marks the task as finished.

Stops the time recording for the task and performs logging. The log output includes details from the provided summary.

Parameters:
  • task (Task) – Task to be marked as finished.

  • result (TaskReturn) – Result of the task execution, such as status and summary information.

Return type:

Task

Returns:

The task that was provided.

register_tasks(root, **kwargs)[source]

Registers an etl_lib.core.Task with this reporter.

Needs to be called once with the root task. The function will walk the tree of tasks and register them in turn.

Parameters:
  • root (Task) – Root of the task tree.

report_progress(task, batches, expected_batches, stats)[source]

Optionally provide updates during execution of a task, such as batches processed so far.

This call is optional, as not every etl_lib.core.Task needs batching.

Parameters:
  • task (Task) – Task reporting updates.

  • batches (int) – Number of batches processed so far.

  • expected_batches (int) – Number of expected batches. Can be None if the overall number of batches is not known before execution.

  • stats (dict) – dict of statistics so far (such as nodes_created).

Return type:

None

started_task(task)[source]

Marks the task as started.

Starts the timekeeping for this task and performs logging.

Parameters:

task (Task) – Task to be marked as started.

Return type:

Task

Returns:

The task that was provided.

class ProgressReporter(context)[source]

Bases: object

Responsible for reporting the progress of etl_lib.core.Task instances.

This specific implementation uses the Python logging module to log progress. Non-error messages are logged at the INFO level.

__init__(context)[source]

end_time: datetime

finished_task(task, result)[source]

Marks the task as finished.

Stops the time recording for the task and performs logging. The log output includes details from the provided summary.

Parameters:
  • task (Task) – Task to be marked as finished.

  • result (TaskReturn) – Result of the task execution, such as status and summary information.

Return type:

Task

Returns:

The task that was provided.

register_tasks(main)[source]

Registers an etl_lib.core.Task with this reporter.

Needs to be called once with the root task. The function will walk the tree of tasks and register them in turn.

Parameters:

main (Task) – Root of the task tree.
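The tree walk described above can be illustrated with a minimal sketch. It does not use etl_lib: SketchTask and its children attribute are hypothetical stand-ins for etl_lib.core.Task, and the pre-order traversal is an assumption about the visiting order, not the library's documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchTask:
    """Hypothetical stand-in for etl_lib.core.Task; `children` is an assumed attribute."""
    name: str
    children: list[SketchTask] = field(default_factory=list)


def walk_tasks(root: SketchTask, depth: int = 0):
    """Yield (depth, task) pairs in pre-order, starting at the root task."""
    yield depth, root
    for child in root.children:
        yield from walk_tasks(child, depth + 1)


root = SketchTask("load-all", [
    SketchTask("load-nodes", [SketchTask("load-persons")]),
    SketchTask("load-relationships"),
])

# "Registering" here only means collecting each task together with its depth.
registered = [(depth, task.name) for depth, task in walk_tasks(root)]
for depth, name in registered:
    print("  " * depth + name)
```

Calling the walk once with the root is enough to reach every task, which is why a reporter of this kind only needs a single registration call.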

report_progress(task, batches, expected_batches, stats)[source]

Optionally provide updates during execution of a task, such as batches processed so far.

This call is optional, as not every etl_lib.core.Task needs batching.

Parameters:
  • task (Task) – Task reporting updates.

  • batches (int) – Number of batches processed so far.

  • expected_batches (int) – Number of expected batches. Can be None if the overall number of batches is not known before execution.

  • stats (dict) – dict of statistics so far (such as nodes_created).

Return type:

None
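As an illustration of how the batches, expected_batches, and stats arguments might be turned into a log message, here is a small sketch. The format_progress helper and the message layout are hypothetical; only the handling of expected_batches being None follows the parameter description above.

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("progress-sketch")


def format_progress(task_name: str, batches: int,
                    expected_batches: Optional[int], stats: dict) -> str:
    """Build one progress line; the total is shown as '?' when it is unknown."""
    total = "?" if expected_batches is None else str(expected_batches)
    stats_text = ", ".join(f"{key}={value}" for key, value in sorted(stats.items()))
    return f"{task_name}: batch {batches}/{total} ({stats_text})"


known = format_progress("load-persons", 3, 10, {"nodes_created": 1500})
unknown = format_progress("load-persons", 3, None, {"nodes_created": 1500})

# Non-error progress is logged at INFO level, as described for ProgressReporter.
logger.info(known)
logger.info(unknown)
```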

start_time: datetime

started_task(task)[source]

Marks the task as started.

Starts the timekeeping for this task and performs logging.

Parameters:

task (Task) – Task to be marked as started.

Return type:

Task

Returns:

The task that was provided.
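The started_task and finished_task pair can be sketched as follows. This is not the library's implementation: TimedTask is a hypothetical stand-in, storing the timestamps on the task is an assumption, and the success flag and summary dict only approximate what a TaskReturn might carry.

```python
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("timing-sketch")


@dataclass
class TimedTask:
    """Hypothetical stand-in for a task; where timestamps live is an assumption."""
    name: str
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


def started_task(task: TimedTask) -> TimedTask:
    """Record the start time, log it, and hand the same task back."""
    task.start_time = datetime.now()
    logger.info("started %s", task.name)
    return task


def finished_task(task: TimedTask, success: bool, summary: dict) -> TimedTask:
    """Record the end time and log the duration together with the summary."""
    task.end_time = datetime.now()
    elapsed = task.end_time - task.start_time
    level = logging.INFO if success else logging.ERROR
    logger.log(level, "finished %s in %s, success=%s, summary=%s",
               task.name, elapsed, success, summary)
    return task


original = TimedTask("load-persons")
returned = finished_task(started_task(original), True, {"nodes_created": 1500})
```

Returning the task that was passed in lets the two calls be chained or wrapped around the task's own execution.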

get_reporter(context)[source]

Returns a ProgressReporter instance.

If the ETLContext env holds the key REPORTER_DATABASE, a Neo4jProgressReporter instance is created with the given database name.

Otherwise, a ProgressReporter instance (no logging to a database) is created.

Return type:

ProgressReporter
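The selection rule can be sketched with stand-in classes. Everything below is hypothetical except the REPORTER_DATABASE key: the environment is modeled as a plain dict, and SketchReporter and SketchNeo4jReporter merely mirror the shape of the two reporter classes without touching a database.

```python
class SketchReporter:
    """Stand-in for ProgressReporter: would only log."""

    def __init__(self, env: dict):
        self.env = env


class SketchNeo4jReporter(SketchReporter):
    """Stand-in for Neo4jProgressReporter: would also write to `database`."""

    def __init__(self, env: dict, database: str):
        super().__init__(env)
        self.database = database


def get_reporter_sketch(env: dict) -> SketchReporter:
    """Pick the database-backed reporter only when the key is present."""
    if "REPORTER_DATABASE" in env:
        return SketchNeo4jReporter(env, env["REPORTER_DATABASE"])
    return SketchReporter(env)


with_db = get_reporter_sketch({"REPORTER_DATABASE": "etl-log"})
without_db = get_reporter_sketch({})
print(type(with_db).__name__, type(without_db).__name__)
```

Because both variants share the base type, calling code can use the returned reporter without knowing which one it received.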