etl_lib.data_source.CSVBatchSource module
- class CSVBatchSource(context, task=None, csv_file=None, **kwargs)[source]
Bases: BatchProcessor
BatchProcessor that reads a CSV file using the csv package.
The file can optionally be gzipped. Each row in a returned batch has an additional _row column containing the number of that row in the source data, starting with 0.
- __init__(context, task=None, csv_file=None, **kwargs)[source]
Constructs a new CSVBatchSource.
- Parameters:
context – etl_lib.core.ETLContext.ETLContext instance.
task (Task | None) – etl_lib.core.Task.Task instance owning this processor.
csv_file (Path) – Path to the CSV file.
kwargs – Will be passed on to the csv.DictReader, providing a way to customise the reading to different CSV formats.
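The reading behaviour described above (optional gzip, keyword arguments forwarded to csv.DictReader, and an added _row column) can be illustrated with a minimal standard-library sketch. This is not the etl_lib implementation: the helper name iter_rows is hypothetical, and detecting gzip by the ".gz" suffix is an assumption made for the example.

```python
import csv
import gzip
from pathlib import Path


def iter_rows(csv_file: Path, **kwargs):
    """Yield each CSV row as a dict with an extra ``_row`` entry (hypothetical helper)."""
    # Assumption: gzip is detected by the ".gz" suffix; the library may decide differently.
    opener = gzip.open if csv_file.suffix == ".gz" else open
    with opener(csv_file, "rt", encoding="utf-8", newline="") as handle:
        # Keyword arguments such as delimiter or quotechar go straight to csv.DictReader.
        reader = csv.DictReader(handle, **kwargs)
        for index, row in enumerate(reader):
            row["_row"] = index  # number of the data row in the source, starting with 0
            yield row
```

For a semicolon-separated file, a call such as iter_rows(Path("people.csv"), delimiter=";") would forward the delimiter to csv.DictReader in the same way the kwargs parameter is described as doing.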
- get_batch(max_batch_size)[source]
Provides a batch of data to the caller.
The batch can be requested from the provided predecessor and processed, or it can be generated from other sources.
- Parameters:
max_batch_size (int) – The maximum size of the batch the caller expects to receive.
- Return type:
- Returns:
A generator that yields batches.
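The batching contract of get_batch (a generator that yields batches of at most max_batch_size rows) can be sketched as follows. The function name batched_rows is hypothetical, and the sketch yields plain lists of dicts; the batch type the library actually returns is not shown here, so treat that as an assumption.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batched_rows(rows: Iterable[Dict], max_batch_size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most ``max_batch_size`` rows (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        # Take up to max_batch_size rows; the final batch may be smaller.
        batch = list(islice(iterator, max_batch_size))
        if not batch:
            return  # source exhausted, stop the generator
        yield batch
```

Because the rows are pulled lazily from the iterator, only one batch is held in memory at a time, which is the usual reason for exposing batches through a generator.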