etl_lib.core.ValidationBatchProcessor module
- class ValidationBatchProcessor(context, task, predecessor, model, error_file)[source]
Bases: BatchProcessor
Batch processor for validation, using Pydantic.
- Parameters:
context (ETLContext)
task (Task)
error_file (Path | None)
- __init__(context, task, predecessor, model, error_file)[source]
Constructs a new ValidationBatchProcessor.
The etl_lib.core.BatchProcessor.BatchResults returned from the get_batch() of this implementation will contain the following additional entries:
valid_rows: Number of valid rows.
invalid_rows: Number of invalid rows.
- Parameters:
context (ETLContext) – etl_lib.core.ETLContext.ETLContext instance.
task (Task) – etl_lib.core.Task.Task instance owning this BatchProcessor.
predecessor – BatchProcessor whose get_batch() function will be called to receive batches to process.
model (Optional[Type[BaseModel]]) – Pydantic model class used to validate each row in the batch. Optional.
error_file (Path | None) – Path to the file that will receive each row that did not pass validation. Required if model is provided.
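The validation step described above can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: split_rows and check_person are hypothetical names, a plain callable stands in for the Pydantic model, and the JSON-lines layout of the error file is an assumption made for the example.

```python
import json
from pathlib import Path
from typing import Any, Callable


def split_rows(
    rows: list[dict[str, Any]],
    validate: Callable[[dict[str, Any]], dict[str, Any]],
    error_file: Path,
) -> tuple[list[dict[str, Any]], dict[str, int]]:
    """Validate each row; keep the valid ones and log the rest to error_file."""
    valid: list[dict[str, Any]] = []
    invalid = 0
    # Append mode, so rejected rows from successive batches end up in one file.
    with error_file.open("a", encoding="utf-8") as fh:
        for row in rows:
            try:
                valid.append(validate(row))
            except ValueError as exc:
                invalid += 1
                # Assumed format: one JSON object per rejected row.
                fh.write(json.dumps({"row": row, "error": str(exc)}) + "\n")
    return valid, {"valid_rows": len(valid), "invalid_rows": invalid}


def check_person(row: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical validator: needs a non-empty name and an integer-like age."""
    if not row.get("name"):
        raise ValueError("name is required")
    return {"name": row["name"], "age": int(row["age"])}
```

The returned dictionary mirrors the two additional entries, valid_rows and invalid_rows, that the documentation says are attached to each batch result.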
- get_batch(max_batch_size)[source]
Provides a batch of data to the caller.
The batch can be pulled from the provided predecessor and processed, or generated from another source.
- Parameters:
max_batch_size (int) – The maximum size of the batch the caller expects to receive.
- Return type:
- Returns:
A generator that yields batches.
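The predecessor chaining that get_batch() relies on can be sketched as follows. This is a simplified stand-in, not etl_lib code: ListSource and NonEmptyFilter are hypothetical classes, and the batches here are plain lists of dictionaries rather than BatchResults objects.

```python
from typing import Any, Iterator

Row = dict[str, Any]


class ListSource:
    """Hypothetical predecessor that serves rows from an in-memory list."""

    def __init__(self, rows: list[Row]) -> None:
        self.rows = rows

    def get_batch(self, max_batch_size: int) -> Iterator[list[Row]]:
        # Slice the list into chunks of at most max_batch_size rows.
        for start in range(0, len(self.rows), max_batch_size):
            yield self.rows[start:start + max_batch_size]


class NonEmptyFilter:
    """Hypothetical processor that drops rows whose values are all empty."""

    def __init__(self, predecessor: ListSource) -> None:
        self.predecessor = predecessor

    def get_batch(self, max_batch_size: int) -> Iterator[list[Row]]:
        # Pull each batch from the predecessor, process it, and pass it on.
        for batch in self.predecessor.get_batch(max_batch_size):
            yield [row for row in batch if any(row.values())]


source = ListSource([{"id": 1}, {"id": None}, {"id": 3}, {"id": 4}, {"id": 5}])
batches = list(NonEmptyFilter(source).get_batch(max_batch_size=2))
```

Because both classes are generators, no batch is read from the source until the caller iterates, so only one batch needs to be held in memory at a time.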