etl_lib.task.data_loading.ParallelSQLLoad2Neo4jTask module
- class ParallelSQLLoad2Neo4jTask(context, batch_size=5000, table_size=10, max_workers=None, prefetch=4)[source]
-
Parallelized version of SQLLoad2Neo4jTask: reads via SQLBatchSource, splits into non-overlapping partitions (grid), processes each partition in parallel through a CypherBatchSink, and closes the loop.
- Subclasses must implement:
_sql_query()
_cypher_query()
optionally override _count_query() and _id_extractor().
- Control parameters:
batch_size: max items per partition batch table_size: dimension of the splitting grid max_workers: parallel threads per partition group (defaults to table_size) prefetch: number of partition-groups to prefetch
- Parameters:
context (ETLContext)
batch_size (int)
table_size (int)
max_workers (int)
prefetch (int)
- __init__(context, batch_size=5000, table_size=10, max_workers=None, prefetch=4)[source]
Construct a Task object.
- Parameters:
context (
ETLContext) –ETLContextinstance. Will be available to subclasses.batch_size (int)
table_size (int)
max_workers (int)
prefetch (int)
- run_internal()[source]
Place to provide the logic to be performed.
This base class provides all the housekeeping and reporting, so that implementation must/should not need to care about them. Exceptions should not be captured by implementations. They are handled by this base class.
- Parameters:
kwargs – will be passed to run_internal
- Return type:
- Returns:
An instance of
TaskReturn.