etl_lib.core.ValidationBatchProcessor module

class ValidationBatchProcessor(context, task, predecessor, model, error_file)[source]

Bases: BatchProcessor

Batch processor for validation, using Pydantic.

__init__(context, task, predecessor, model, error_file)[source]

Constructs a new ValidationBatchProcessor.

The etl_lib.core.BatchProcessor.BatchResults returned by this implementation's get_batch() will contain the following additional entries:

  • valid_rows: Number of valid rows.

  • invalid_rows: Number of invalid rows.

Parameters:
  • context (ETLContext) – etl_lib.core.ETLContext.ETLContext instance.

  • task (Task) – etl_lib.core.Task.Task instance owning this BatchProcessor.

  • predecessor – BatchProcessor whose get_batch() function will be called to receive batches to process.

  • model (Type[BaseModel]) – Pydantic model class used to validate each row in the batch.

  • error_file (Path) – Path to the file that will receive each row that did not pass validation. Each row in this file will contain the original data together with all validation errors for this row.
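
The following is a minimal construction sketch. It assumes the class can be imported from the module path documented above, and that `context`, `task`, and `csv_reader` (the ETLContext, owning Task, and upstream BatchProcessor) are created elsewhere in the pipeline; the `CustomerRow` model and the error-file name are purely illustrative.

    from pathlib import Path

    from pydantic import BaseModel

    # Import path assumed from the module name documented above.
    from etl_lib.core.ValidationBatchProcessor import ValidationBatchProcessor


    class CustomerRow(BaseModel):
        """Hypothetical row model; fields depend on the source data."""
        id: int
        email: str


    def build_validator(context, task, csv_reader):
        """context, task, and csv_reader are assumed to be the ETLContext,
        Task, and upstream BatchProcessor created elsewhere in the pipeline."""
        return ValidationBatchProcessor(
            context=context,
            task=task,
            predecessor=csv_reader,
            model=CustomerRow,
            error_file=Path("rejected_rows.jsonl"),
        )

Rows that fail validation against CustomerRow are written to the error file together with their validation errors, while valid rows continue downstream.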

get_batch(max_batch__size)[source]

Provides a batch of data to the caller.

The batch itself may be obtained and processed from the provided predecessor, or generated from other sources.

Parameters:

max_batch__size (int) – The max size of the batch the caller expects to receive.

Return type:

Generator[BatchResults, None, None]

Returns:

A generator that yields batches.
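
A brief consumption sketch, building on the validator constructed above. The documentation states that each BatchResults carries `valid_rows` and `invalid_rows` counts; the attribute assumed to hold them here (`statistics`) and the batch size of 1000 are assumptions, not confirmed API details.

    def run_validation(validator):
        """validator is a ValidationBatchProcessor as sketched above."""
        for batch in validator.get_batch(1000):
            # `statistics` is an assumed attribute name; the documented keys
            # are `valid_rows` and `invalid_rows`.
            stats = getattr(batch, "statistics", {}) or {}
            print(f"valid: {stats.get('valid_rows')}, "
                  f"invalid: {stats.get('invalid_rows')}")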