etl_lib.task.data_loading.ParallelSQLLoad2Neo4jTask module

class ParallelSQLLoad2Neo4jTask(context, batch_size=5000, table_size=10, max_workers=None, prefetch=4)[source]

Bases: Task, ABC

Parallelized version of SQLLoad2Neo4jTask: reads via SQLBatchSource, splits into non-overlapping partitions (grid), processes each partition in parallel through a CypherBatchSink, and closes the loop.

Subclasses must implement:
  • _sql_query()

  • _cypher_query()

  • optionally override _count_query() and _id_extractor().

Control parameters:

batch_size: max items per partition batch table_size: dimension of the splitting grid max_workers: parallel threads per partition group (defaults to table_size) prefetch: number of partition-groups to prefetch

Parameters:
__init__(context, batch_size=5000, table_size=10, max_workers=None, prefetch=4)[source]

Construct a Task object.

Parameters:
run_internal()[source]

Place to provide the logic to be performed.

This base class provides all the housekeeping and reporting, so that implementation must/should not need to care about them. Exceptions should not be captured by implementations. They are handled by this base class.

Parameters:

kwargs – will be passed to run_internal

Return type:

TaskReturn

Returns:

An instance of TaskReturn.