etl_lib.data_source.CSVBatchSource module

class CSVBatchSource(csv_file, context, task=None, **kwargs)[source]

Bases: BatchProcessor

A BatchProcessor that reads a CSV file using Python's csv module.

The file can optionally be gzipped. Each row in a returned batch has an additional _row column containing the row's number in the source data, starting at 0.
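The behaviour described above can be approximated with the standard library alone. This is a minimal sketch, not the library's implementation: the helper names read_rows_with_index and demo_rows are hypothetical, and detecting gzip by the ".gz" suffix is an assumption.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_rows_with_index(csv_file: Path, **kwargs):
    """Hypothetical helper: yield each CSV row as a dict with a 0-based _row."""
    # Assumption: a ".gz" suffix marks a gzipped file; the library may
    # detect compression differently.
    opener = gzip.open if csv_file.suffix == ".gz" else open
    with opener(csv_file, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, **kwargs)
        for index, row in enumerate(reader):
            row["_row"] = index  # position of the data row, header excluded
            yield row


def demo_rows():
    """Write a small gzipped CSV to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "people.csv.gz"
        with gzip.open(path, mode="wt", newline="", encoding="utf-8") as handle:
            handle.write("name,age\nAda,36\nGrace,45\n")
        return list(read_rows_with_index(path))
```

In this sketch the header line is not counted, so the first data row gets _row 0. Whether the library counts the header is not stated here.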

__init__(csv_file, context, task=None, **kwargs)[source]

Constructs a new CSVBatchSource.

Parameters:
  • csv_file (Path) – Path to the CSV file.

  • context – An etl_lib.core.ETLContext.ETLContext instance.

  • kwargs – Passed on to csv.DictReader, providing a way to customise reading for different CSV formats.

  • task (Task)
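To illustrate what forwarding kwargs to csv.DictReader makes possible, the sketch below calls csv.DictReader directly rather than going through CSVBatchSource. The helper name parse_with_options is hypothetical; delimiter, quotechar and fieldnames are standard csv.DictReader options.

```python
import csv
import io


def parse_with_options(text: str, **kwargs):
    """Hypothetical helper: forward keyword arguments straight to csv.DictReader."""
    return list(csv.DictReader(io.StringIO(text), **kwargs))


# Semicolon-separated data with single-quote quoting.
semicolon_rows = parse_with_options(
    "id;label\n1;'a;b'\n", delimiter=";", quotechar="'"
)

# A file without a header line: supply the column names explicitly.
headerless_rows = parse_with_options("1,x\n2,y\n", fieldnames=["id", "label"])
```

Under the documented behaviour, the same keyword arguments would be given to the CSVBatchSource constructor after csv_file and context.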

get_batch(max_batch__size)[source]

Provides a batch of data to the caller.

The batch can either be requested from the provided predecessor and then processed, or be generated from another source.

Parameters:

max_batch__size (int) – The maximum batch size the caller expects to receive.

Return type:

Generator[BatchResults, None, None]

Returns:

A generator that yields batches.
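The generator contract can be sketched with plain lists standing in for BatchResults. This is an illustration under stated assumptions, not the library's code: iter_batches is a hypothetical name, and the real method yields BatchResults objects whose fields are not shown here.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def iter_batches(
    rows: Iterable[Dict[str, Any]], max_batch_size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical helper: yield lists of at most max_batch_size rows."""
    iterator = iter(rows)
    while True:
        # Take up to max_batch_size rows without loading the whole input.
        batch = list(islice(iterator, max_batch_size))
        if not batch:
            return  # input exhausted
        yield batch


# Five rows split into batches of at most two rows each.
batches = list(iter_batches(({"_row": i} for i in range(5)), 2))
batch_sizes = [len(batch) for batch in batches]
```

Because the batches are produced lazily, a caller can stop iterating early without the remaining rows ever being read.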