etl_lib.core.Task module

class ParallelTaskGroup(context, tasks, name)[source]

Bases: TaskGroup

Task group for parallel execution of jobs.

This class uses a ThreadPoolExecutor to run the run_internal() functions of the provided tasks in parallel. Take care that the tasks can operate without blocking or locking each other.
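The parallel execution described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: CountTask and run_parallel are hypothetical names, the pool size is an arbitrary choice, and the stand-in returns a plain dict instead of a TaskReturn.

```python
from concurrent.futures import ThreadPoolExecutor


class CountTask:
    """Hypothetical stand-in for a Task; not part of etl_lib."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def run_internal(self, **kwargs):
        # Touches only its own state, so no locking is required.
        return {"task": self.name, "rows": self.rows}


def run_parallel(tasks, **kwargs):
    """Submit every task's run_internal() to a thread pool and collect results."""
    # Assumption: four workers; the real pool size is not documented here.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(task.run_internal, **kwargs) for task in tasks]
        # result() re-raises any exception raised inside the worker thread.
        return [future.result() for future in futures]


results = run_parallel([CountTask("load_a", 10), CountTask("load_b", 5)])
```

Results are collected in submission order, so the output order does not depend on which thread finishes first. Tasks that share mutable state would need their own synchronization.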

__init__(context, tasks, name)[source]

Construct a TaskGroup object.

run_internal(**kwargs)[source]

Place to provide the logic to be performed.

This base class provides all the housekeeping and reporting, so implementations do not need to care about them. Implementations should not catch exceptions; they are handled by this base class.

Parameters:

kwargs – will be passed to run_internal

Return type:

TaskReturn

Returns:

An instance of TaskReturn.

class Task(context)[source]

Bases: object

ETL job that can be executed.

Provides reporting, time tracking and error handling. Implementations must provide the run_internal() function.

__init__(context)[source]

Construct a Task object.

Parameters:

context – ETLContext instance. Will be available to subclasses.

abort_on_fail()[source]

Whether the pipeline should abort when this job fails.

Return type:

bool

Returns:

True indicates that no other Tasks should be executed if run_internal() fails.
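How a pipeline might honour this flag can be sketched as follows. This is a minimal illustration under stated assumptions: Step and run_sequence are hypothetical stand-ins, not etl_lib classes, and execute() here returns a plain bool rather than a TaskReturn.

```python
class Step:
    """Hypothetical stand-in for a Task; not part of etl_lib."""

    def __init__(self, name, succeeds=True, abort=True):
        self.name = name
        self._succeeds = succeeds
        self._abort = abort

    def abort_on_fail(self):
        return self._abort

    def execute(self):
        # Simplification: report success as a bool.
        return self._succeeds


def run_sequence(steps):
    """Run steps in order; stop after a failed step that requests an abort."""
    executed = []
    for step in steps:
        succeeded = step.execute()
        executed.append(step.name)
        if not succeeded and step.abort_on_fail():
            break
    return executed


executed = run_sequence([
    Step("extract"),
    Step("optional_cleanup", succeeds=False, abort=False),
    Step("transform", succeeds=False),
    Step("load"),
])
```

In this sketch the failed optional_cleanup step does not stop the run because its abort_on_fail() returns False, while the failed transform step does, so load is never executed.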

context

ETLContext giving access to various resources.

depth: int

Level or depth of the task in the hierarchy. The root task has depth 0. Updated by the Reporter.

end_time: datetime

Time when execute() finished. None before that.

execute(**kwargs)[source]

Executes the task.

Subclasses should not override this method. They should provide the Task functionality inside run_internal(), which is called from here. Uses the ProgressReporter from the context to report status updates.

Parameters:

kwargs – will be passed to run_internal

Return type:

TaskReturn
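The split between execute() and run_internal() follows the template-method pattern. The sketch below is an illustration only: SketchTask and SketchTaskReturn are hypothetical stand-ins, reporting through the context is omitted, and turning an exception into a failed return object is an assumption about how errors are handled.

```python
from datetime import datetime


class SketchTaskReturn:
    """Hypothetical stand-in mirroring the documented TaskReturn fields."""

    def __init__(self, success=True, summery=None, error=None):
        self.success = success
        self.summery = summery or {}
        self.error = error


class SketchTask:
    """Hypothetical stand-in for Task; reporting via the context is omitted."""

    def __init__(self):
        self.start_time = None
        self.end_time = None
        self.success = None

    def execute(self, **kwargs):
        self.start_time = datetime.now()
        try:
            result = self.run_internal(**kwargs)
        except Exception as exc:
            # Assumption: failures are converted into a failed return object.
            result = SketchTaskReturn(success=False, error=str(exc))
        self.end_time = datetime.now()
        self.success = result.success
        return result

    def run_internal(self, **kwargs):
        raise NotImplementedError


class CopyRows(SketchTask):
    def run_internal(self, **kwargs):
        return SketchTaskReturn(summery={"rows_inserted": kwargs["rows"]})


class Broken(SketchTask):
    def run_internal(self, **kwargs):
        # No try/except here: the base class deals with the error.
        raise ValueError("source file missing")


ok = CopyRows().execute(rows=3)
failed = Broken().execute()
```

Subclasses only implement run_internal(); timing, the success flag and error handling stay in one place.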

abstractmethod run_internal(**kwargs)[source]

Place to provide the logic to be performed.

This base class provides all the housekeeping and reporting, so implementations do not need to care about them. Implementations should not catch exceptions; they are handled by this base class.

Parameters:

kwargs – will be passed to run_internal

Return type:

TaskReturn

Returns:

An instance of TaskReturn.

start_time: datetime

Time when execute() was called. None before that.

success: bool

True if the task has finished successfully, False otherwise. None before the task has finished.

task_name()[source]

Override this to change the name of this Task.

Name is used in reporting only.

Return type:

str

Returns:

String describing the task. Defaults to the class name.

uuid

Uniquely identifies a Task.

class TaskGroup(context, tasks, name)[source]

Bases: Task

Base class that wraps Tasks or TaskGroups to form a hierarchy of jobs.

Implementations only need to provide the Tasks to execute as a list. The summary statistics object returned by the group's execute() method is a merged (aggregated) one.
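The exact merge rule is not spelled out here. One plausible reading, sketched below, is that numeric counters from the summery dicts of the sub-tasks are summed key by key. The helper merge_summeries is a hypothetical name, and the summing behaviour is an assumption.

```python
from collections import Counter


def merge_summeries(summeries):
    """Sum counters key by key across the sub-task summery dicts."""
    merged = Counter()
    for summery in summeries:
        # Counter.update adds the values instead of replacing them.
        merged.update(summery)
    return dict(merged)


merged = merge_summeries([
    {"rows_inserted": 10, "rows_updated": 2},
    {"rows_inserted": 5},
])
```

Keys that appear in only one sub-task are carried over unchanged, so the group summary contains every statistic reported by any of its tasks.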

__init__(context, tasks, name)[source]

Construct a TaskGroup object.

abort_on_fail()[source]

Whether the pipeline should abort when this job fails.

Returns:

True indicates that no other Tasks should be executed if run_internal() fails.

run_internal(**kwargs)[source]

Place to provide the logic to be performed.

This base class provides all the housekeeping and reporting, so implementations do not need to care about them. Implementations should not catch exceptions; they are handled by this base class.

Parameters:

kwargs – will be passed to run_internal

Return type:

TaskReturn

Returns:

An instance of TaskReturn.

sub_tasks()[source]
Return type:

list[etl_lib.core.Task.Task]

task_name()[source]

Override this to change the name of this Task.

Name is used in reporting only.

Return type:

str

Returns:

String describing the task. Defaults to the class name.

class TaskReturn(success=True, summery=None, error=None)[source]

Bases: object

Return object for the execute() function, transporting result information.

__init__(success=True, summery=None, error=None)[source]
error: str

Error message.

success: bool

Success or failure of the task.

summery: dict

Dict holding statistics about the task performed, such as rows inserted or updated.
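A caller might build and inspect such a return object as sketched below. TaskReturnSketch is a hypothetical stand-in that copies the documented fields and defaults (including the summery spelling of the parameter); describe is an illustrative helper, not part of etl_lib.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskReturnSketch:
    """Hypothetical stand-in with the documented fields and defaults."""

    success: bool = True
    summery: Optional[dict] = None
    error: Optional[str] = None


def describe(result):
    """Render a one-line status for a return object."""
    if result.success:
        return f"ok: {result.summery}"
    return f"failed: {result.error}"


good = TaskReturnSketch(summery={"rows_inserted": 3})
bad = TaskReturnSketch(success=False, error="connection refused")
```

On success the statistics travel in summery; on failure the message travels in error, so callers can branch on success alone.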