etl_lib.core.ValidationBatchProcessor module

class ValidationBatchProcessor(context, task, predecessor, model, error_file)

Bases: BatchProcessor

Batch processor for validation, using Pydantic.

__init__(context, task, predecessor, model, error_file)

Constructs a new ValidationBatchProcessor.

Each etl_lib.core.BatchProcessor.BatchResults yielded by this implementation's get_batch() contains the following additional entries:

  • valid_rows: Number of valid rows.

  • invalid_rows: Number of invalid rows.
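The two counters can be illustrated with a minimal sketch. It is not the library's implementation: a plain function (validate_person, hypothetical) stands in for the Pydantic model so the example runs with the standard library alone, and split_rows is an assumed helper name. Only the keys valid_rows and invalid_rows come from the description above.

```python
from typing import Any, Callable

Row = dict[str, Any]


def validate_person(row: Row) -> Row:
    """Hypothetical stand-in for a Pydantic model: raise on a bad row."""
    if not isinstance(row.get("name"), str) or not row["name"]:
        raise ValueError("name must be a non-empty string")
    # int() raises ValueError for non-numeric text such as "abc".
    return {"name": row["name"], "age": int(row["age"])}


def split_rows(
    rows: list[Row], validate: Callable[[Row], Row]
) -> tuple[list[Row], list[dict[str, Any]], dict[str, int]]:
    """Validate each row and count the outcomes (assumed helper, not etl_lib API)."""
    valid: list[Row] = []
    invalid: list[dict[str, Any]] = []
    for row in rows:
        try:
            valid.append(validate(row))
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the rejected row with its error, e.g. for an error file.
            invalid.append({"row": row, "error": str(exc)})
    stats = {"valid_rows": len(valid), "invalid_rows": len(invalid)}
    return valid, invalid, stats


rows = [
    {"name": "Ada", "age": "36"},
    {"name": "", "age": "41"},      # rejected: empty name
    {"name": "Bob", "age": "abc"},  # rejected: age is not a number
]
valid, invalid, stats = split_rows(rows, validate_person)
print(stats)
```

In the real processor the validation step would be driven by the model argument, and rejected rows would presumably be written to error_file; both points are assumptions based on the parameter names.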

get_batch(max_batch_size)

Provides a batch of data to the caller.

The batch can be pulled from the provided predecessor and processed, or it can be generated from another source.

Parameters:

max_batch_size (int) – The max size of the batch the caller expects to receive.

Return type:

Generator[BatchResults, None, None]

Returns:

A generator that yields batches.
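The generator contract can be sketched as follows. This is an illustration under stated assumptions, not etl_lib code: Batch is a hypothetical stand-in for BatchResults (its fields chunk and statistics are assumed), and ListSource and PositiveOnly are invented classes that only mimic the get_batch(max_batch_size) signature documented above.

```python
from dataclasses import dataclass, field
from typing import Any, Generator


@dataclass
class Batch:
    """Hypothetical stand-in for BatchResults; the real class may differ."""
    chunk: list[Any]
    statistics: dict[str, int] = field(default_factory=dict)


class ListSource:
    """Invented source that generates batches from an in-memory list."""

    def __init__(self, rows: list[Any]) -> None:
        self.rows = list(rows)

    def get_batch(self, max_batch_size: int) -> Generator[Batch, None, None]:
        # Slice the data so no batch exceeds max_batch_size.
        for start in range(0, len(self.rows), max_batch_size):
            yield Batch(chunk=self.rows[start:start + max_batch_size])


class PositiveOnly:
    """Invented processor that pulls batches from its predecessor and filters them."""

    def __init__(self, predecessor: ListSource) -> None:
        self.predecessor = predecessor

    def get_batch(self, max_batch_size: int) -> Generator[Batch, None, None]:
        for batch in self.predecessor.get_batch(max_batch_size):
            kept = [row for row in batch.chunk if row > 0]
            yield Batch(
                chunk=kept,
                statistics={
                    "valid_rows": len(kept),
                    "invalid_rows": len(batch.chunk) - len(kept),
                },
            )


processor = PositiveOnly(ListSource([3, -1, 4, -1, 5]))
batches = list(processor.get_batch(max_batch_size=2))
for batch in batches:
    print(batch.chunk, batch.statistics)
```

Because get_batch() is a generator, nothing is read from the predecessor until the caller iterates, so chained processors handle one batch at a time instead of loading the whole input.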