etl_lib.core.Task module
- class ParallelTaskGroup(context, tasks, name)[source]
Bases:
TaskGroupTask group for parallel execution of jobs.
This class uses a ThreadPoolExecutor to run the provided tasks
run_internal()functions in parallel. Care should be taken that the Tasks can operate without blocking.locking each other.- __init__(context, tasks, name)[source]
Construct a TaskGroup object.
- Parameters:
context –
etl_lib.core.ETLContext.ETLContextinstance.tasks (
list[Task]) – an array of Task instances. These will be executed in parallel whenrun_internal()is called. The Tasks in the array could itself be other TaskGroups.name (
str) – short name of the TaskGroup.
- run_internal(**kwargs)[source]
Place to provide the logic to be performed.
This base class provides all the housekeeping and reporting, so that implementation must/should not need to care about them. Exceptions should not be captured by implementations. They are handled by this base class.
- Parameters:
kwargs – will be passed to run_internal
- Return type:
- Returns:
An instance of
TaskReturn.
- class Task(context)[source]
Bases:
objectETL job that can be executed.
Provides reporting, time tracking and error handling. Implementations must provide the
run_internal()function.- __init__(context)[source]
Construct a Task object.
- Parameters:
context –
ETLContextinstance. Will be available to subclasses.
- abort_on_fail()[source]
Should the pipeline abort when this job fails.
- Return type:
- Returns:
True indicates that no other Tasks should be executed if
run_internal()fails.
- context
ETLContextgiving access to various resources.
-
depth:
int Level or depth of the task in the hierarchy. The root task is depth 0. Updated by the Reporter
- execute(**kwargs)[source]
Executes the task.
Implementations of this Interface should not overwrite this method, but provide the Task functionality inside
run_internal()which will be called from here. Will use theProgressReporterfromcontextto report status updates.- Parameters:
kwargs – will be passed to run_internal
- Return type:
- abstractmethod run_internal(**kwargs)[source]
Place to provide the logic to be performed.
This base class provides all the housekeeping and reporting, so that implementation must/should not need to care about them. Exceptions should not be captured by implementations. They are handled by this base class.
- Parameters:
kwargs – will be passed to run_internal
- Return type:
- Returns:
An instance of
TaskReturn.
-
success:
bool True if the task has finished successful. False otherwise, None before the task has finished.
- task_name()[source]
Option to overwrite the name of this Task.
Name is used in reporting only.
- Return type:
- Returns:
Sting describing the task. Defaults to the class name..
- uuid
Uniquely identifies a Task.
- class TaskGroup(context, tasks, name)[source]
Bases:
TaskBase class to allow wrapping of Task or TaskGroups to form a hierarchy of jobs.
Implementations only need to provide the Tasks to execute as an array. The summery statistic object returned from the group execute method will be a merged/aggregated one.
- __init__(context, tasks, name)[source]
Construct a TaskGroup object.
- Parameters:
context –
etl_lib.core.ETLContext.ETLContextinstance.tasks (
list[Task]) – a list of :py:class:`etl_lib.core.Task.Rask instances. These will be executed in the order provided whenrun_internal()is called.name (
str) – short name of the TaskGroup for reporting.
- abort_on_fail()[source]
Should the pipeline abort when this job fails.
- Returns:
True indicates that no other Tasks should be executed if
run_internal()fails.
- run_internal(**kwargs)[source]
Place to provide the logic to be performed.
This base class provides all the housekeeping and reporting, so that implementation must/should not need to care about them. Exceptions should not be captured by implementations. They are handled by this base class.
- Parameters:
kwargs – will be passed to run_internal
- Return type:
- Returns:
An instance of
TaskReturn.