etl_lib.data_source.CSVBatchSource module
- class CSVBatchSource(csv_file, context, task=None, **kwargs)
Bases: BatchProcessor
BatchProcessor that reads a CSV file using the csv package.
The file can optionally be gzipped. Each row in a returned batch has an additional _row column containing the row's position in the source data, starting at 0.
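The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the etl_lib implementation: read_rows is a hypothetical helper, detecting gzip from the ".gz" suffix is an assumption, and so is counting _row from the first data row rather than from the header line.

```python
import csv
import gzip
from pathlib import Path
from typing import Any, Dict, Iterator


def read_rows(csv_file: Path) -> Iterator[Dict[str, Any]]:
    """Yield each CSV row as a dict with an extra `_row` index (hypothetical helper)."""
    # Assumption: a ".gz" suffix means the file is gzipped; the library may decide differently.
    opener = gzip.open if csv_file.suffix == ".gz" else open
    with opener(csv_file, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        for index, row in enumerate(reader):
            # Assumption: `_row` counts data rows from 0 and does not count the header line.
            yield {**row, "_row": index}
```

Because the file is opened in text mode in both branches, csv.DictReader sees the same stream of lines whether or not the file is compressed.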
- __init__(csv_file, context, task=None, **kwargs)
Constructs a new CSVBatchSource.
- Parameters:
csv_file (Path) – Path to the CSV file.
context – etl_lib.core.ETLContext.ETLContext instance.
kwargs – Will be passed on to the csv.DictReader, providing a way to customise the reading to different CSV formats.
task (Task)
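To illustrate the kind of options that kwargs can carry, the sketch below passes two standard csv.DictReader formatting options directly to the reader. It uses only the standard library; the sample data and the reader_options name are made up for illustration.

```python
import csv
import io

# Semicolon-delimited data with single-quoted fields, standing in for a non-default CSV format.
raw = "id;label\n1;'a;b'\n2;'c'\n"

# Ordinary csv formatting options; the documentation above says such keyword
# arguments are passed on to csv.DictReader.
reader_options = {"delimiter": ";", "quotechar": "'"}

rows = list(csv.DictReader(io.StringIO(raw), **reader_options))

for row in rows:
    print(row)
```

Going by the signature above, the same options would presumably be given as keyword arguments to the constructor, for example CSVBatchSource(csv_file, context, delimiter=";", quotechar="'").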
- get_batch(max_batch__size)
Provides a batch of data to the caller.
The batch may be requested from the provided predecessor and processed further, or generated from other sources.
- Parameters:
max_batch__size (int) – The max size of the batch the caller expects to receive.
- Return type:
- Returns:
A generator that yields batches.
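The general pattern of a generator that yields size-limited batches can be sketched as follows. This is an illustration only: iter_batches is a hypothetical helper, and yielding plain lists of dicts is an assumption, since the return type is not stated above and the library may wrap its batches in a result object.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def iter_batches(
    rows: Iterable[Dict[str, Any]], max_batch_size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `max_batch_size` rows (hypothetical helper, not etl_lib code)."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # Take up to max_batch_size rows; the final batch may be shorter.
        batch = list(islice(iterator, max_batch_size))
        if not batch:
            return
        yield batch
```

Since rows are pulled lazily from the iterator, only one batch is held in memory at a time, which is the usual reason for exposing batches through a generator.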