etl_lib.core.BatchProcessor module
- class BatchProcessor(context, task=None, predecessor=None)[source]
Bases:
ABCAllows assembly of
etl_lib.core.Task.Taskout of smaller building blocks.This way, functionally, such as reading from a CSV file, writing to a database or validation can be implemented and tested independently and re-used.
BatchProcessors form, a linked list, where each processor only knows about its predecessor.
BatchProcessors process data in batches. A batch of data is requested from the provided predecessors
get_batch()and returned in batches to the caller. Usage of Generators ensure that not all data must be loaded at once.- Parameters:
task (Task)
- __init__(context, task=None, predecessor=None)[source]
Constructs a new
etl_lib.core.BatchProcessorinstance.- Parameters:
context –
etl_lib.core.ETLContext.ETLContextinstance. It Will be available to subclasses.task (
Task) –etl_lib.core.Task.Taskthis processor is part of. Needed for status reporting only.predecessor – Source of batches for this processor. Can be None if no predecessor is needed (such as when this processor is the start of the queue).
- context
etl_lib.core.ETLContext.ETLContextinstance. Providing access to general facilities.
- abstractmethod get_batch(max_batch__size)[source]
Provides a batch of data to the caller.
The batch itself could be called and processed from the provided predecessor or generated from other sources.
- Parameters:
max_batch__size (
int) – The max size of the batch the caller expects to receive.- Return type:
- Returns
A generator that yields batches.
- predecessor
Predecessor, used as a source of batches.
- task
The
etl_lib.core.Task.Taskowning instance.
- class BatchResults(chunk, statistics=<factory>, batch_size=9223372036854775807)[source]
Bases:
objectReturn object of the
get_batch()method, wrapping a batched data together with meta information.- __init__(chunk, statistics=<factory>, batch_size=9223372036854775807)
- append_result(org, stats)[source]
Appends the stats dict to the provided org.
- Parameters:
org (
BatchResults) – The original BatchResults object.stats (
dict) – dict containing statistics to be added to the org object.
- Return type:
- Returns:
- New BatchResults object, where the
statisticsattribute is the merged result of the provided parameters. Values in the dicts with the same key are added.
- New BatchResults object, where the