etl_lib.core.ValidationBatchProcessor module
- class ValidationBatchProcessor(context, task, predecessor, model, error_file)[source]
Bases: BatchProcessor
Batch processor for validation, using Pydantic.
- Parameters:
context (ETLContext)
task (Task)
error_file (Path)
- __init__(context, task, predecessor, model, error_file)[source]
Constructs a new ValidationBatchProcessor.
The etl_lib.core.BatchProcessor.BatchResults returned from this implementation's get_batch() will contain the following additional entries:
valid_rows: Number of valid rows.
invalid_rows: Number of invalid rows.
- Parameters:
context (ETLContext) – etl_lib.core.ETLContext.ETLContext instance.
task (Task) – etl_lib.core.Task.Task instance owning this BatchProcessor.
predecessor – BatchProcessor whose get_batch() function will be called to receive batches to process.
model (Type[BaseModel]) – Pydantic model class used to validate each row in the batch.
error_file (Path) – Path to the file that will receive each row that did not pass validation. Each row in this file will contain the original data together with all validation errors for the row.
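For orientation, a minimal construction sketch in Python follows. The Customer model, the csv_reader predecessor, and the already-built context and task objects are assumptions made for this example; only ValidationBatchProcessor, the module paths, and the parameter names above come from this page.

from pathlib import Path

from pydantic import BaseModel

from etl_lib.core.ValidationBatchProcessor import ValidationBatchProcessor


class Customer(BaseModel):
    # Pydantic model describing one valid row; rows that fail validation
    # are written to error_file together with their validation errors.
    id: int
    name: str
    email: str


# context, task and csv_reader are assumed to exist already:
# csv_reader stands in for any upstream BatchProcessor whose get_batch()
# yields the raw rows to be validated.
validator = ValidationBatchProcessor(
    context=context,          # ETLContext instance
    task=task,                # Task owning this BatchProcessor
    predecessor=csv_reader,   # upstream BatchProcessor providing raw rows
    model=Customer,           # Pydantic model used to validate each row
    error_file=Path("invalid_customers.jsonl"),  # hypothetical output path
)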
- get_batch(max_batch__size)[source]
Provides a batch of data to the caller.
The batch itself may be pulled from the provided predecessor and processed, or generated from other sources.
- Parameters:
max_batch__size (int) – The maximum size of the batch the caller expects to receive.
- Return type:
- Returns:
A generator that yields batches.
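A consumption sketch, assuming validator was constructed as in the sketch above and that each yielded BatchResults exposes the valid_rows/invalid_rows entries through a dict-like statistics attribute (that attribute name is an assumption, not stated on this page):

# Pull batches of up to 1000 rows through the validator; note the double
# underscore in max_batch__size, matching the signature documented above.
for batch in validator.get_batch(max_batch__size=1000):
    stats = batch.statistics  # assumed dict-like holder of the extra entries
    print(f"valid: {stats.get('valid_rows', 0)}, invalid: {stats.get('invalid_rows', 0)}")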