Configuration
All parts of an ETL pipeline have access to the ETLContext class and can retrieve configuration parameters via env().
This configuration is backed by a dictionary passed to the context’s constructor. The following code demonstrates how to use environment variables to populate this dictionary:
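```python
import os

# Assumes ETLContext has already been imported from the ETL library.
# Copy the process environment into the context's configuration dictionary.
context = ETLContext(env_vars=dict(os.environ))
```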
Using environment variables makes it easy to configure the ETL pipeline externally.
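The context only sees a plain dictionary, so the environment does not have to be its only source. The following minimal sketch layers defaults underneath the environment, so that any variable set externally wins. `build_env_vars` and `DEFAULTS` are hypothetical names used for illustration and are not part of the library. The variable names follow the `NEO4J_DRIVER_` prefix convention described in the driver configuration section of this page.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical defaults; any variable present in the real environment overrides them.
DEFAULTS: Dict[str, str] = {
    "NEO4J_DRIVER_MAX_CONNECTION_POOL_SIZE": "50",
    "NEO4J_DRIVER_CONNECTION_TIMEOUT": "30.0",
}


def build_env_vars(
    environ: Optional[Mapping[str, str]] = None,
    defaults: Mapping[str, str] = DEFAULTS,
) -> Dict[str, str]:
    """Merge defaults with environment variables; the environment takes precedence."""
    if environ is None:
        environ = os.environ
    # In a dict merge, later entries win, so environment values override defaults.
    return {**defaults, **dict(environ)}


env_vars = build_env_vars({"NEO4J_DRIVER_CONNECTION_TIMEOUT": "5"})
print(env_vars["NEO4J_DRIVER_CONNECTION_TIMEOUT"])        # 5
print(env_vars["NEO4J_DRIVER_MAX_CONNECTION_POOL_SIZE"])  # 50
```

You could then pass the merged dictionary to the context as `ETLContext(env_vars=build_env_vars())`.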
The following parameters are currently recognized:
| Name | Domain | Description |
|---|---|---|
| | Neo4j Connection | Connection URL, such as |
| | Neo4j Connection | Database user that the ETL pipeline will use |
| | Neo4j Connection | Password for the specified database user |
| | Neo4j Connection | Name of the database to use while the ETL pipeline runs |
| | Reporting | Name of the database in which to store ETL metadata. See Neo4j Reporter for more details. If not provided, reporting is done only to the console or a log file. |
| | Validation | Directory where error files should be created. See Validation for more details. If not provided, error files are placed in the same directory as the input files. |
| `NEO4J_TEST_CONTAINER` | Testing | Docker image name to use for testing, for example `neo4j:5.26.1-enterprise`. See Testing for more details. If provided, Testcontainers is used with the given image. |
| | Testing | Name of the Neo4j database to use during integration testing. Only considered if `NEO4J_TEST_CONTAINER` is not set. This allows integration tests to run against an external Neo4j installation without affecting other databases. |
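The fallback rules in the table above can be expressed as two small helpers. This is a hypothetical sketch, not the library's implementation. The names `choose_test_target` and `error_dir_for` are illustrative, as are the keys `HYPOTHETICAL_TEST_DB_VAR` and `HYPOTHETICAL_ERROR_DIR_VAR`, which stand in for the real parameter names. Only `NEO4J_TEST_CONTAINER` comes from the table.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple


def choose_test_target(
    env: Mapping[str, str],
    test_db_key: str = "HYPOTHETICAL_TEST_DB_VAR",  # placeholder for the real name
) -> Tuple[str, Optional[str]]:
    """Decide where integration tests run, mirroring the precedence in the table."""
    image = env.get("NEO4J_TEST_CONTAINER")
    if image:
        # A container image is configured: Testcontainers is used and the
        # test-database setting is ignored.
        return ("container", image)
    # Otherwise fall back to an external installation, optionally using a
    # dedicated database so other databases are not affected.
    return ("external", env.get(test_db_key))


def error_dir_for(
    input_file: str,
    env: Mapping[str, str],
    error_dir_key: str = "HYPOTHETICAL_ERROR_DIR_VAR",  # placeholder for the real name
) -> Path:
    """Return the directory for error files, defaulting to the input file's directory."""
    configured = env.get(error_dir_key)
    if configured:
        return Path(configured)
    # No directory configured: place error files next to the input file.
    return Path(input_file).parent


print(choose_test_target({"NEO4J_TEST_CONTAINER": "neo4j:5.26.1-enterprise"}))
# ('container', 'neo4j:5.26.1-enterprise')
```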
Neo4j Driver Configuration
You can pass configuration options directly to the Neo4j Python Driver by using environment variables prefixed with NEO4J_DRIVER_.
The prefix is stripped, the name is lowercased, and the value is parsed to the appropriate type before being passed to the driver constructor.
See the Neo4j Python Driver API documentation for a complete list of valid options and their meanings.
| Name | Type | Description |
|---|---|---|
| | int | The maximum total number of connections allowed, per host, to be managed by the connection pool. |
| | float | The maximum amount of time in seconds to wait for a TCP connection to be established. |
| | float | The maximum amount of time in seconds to wait for a connection to become available from the pool. |
| | float | The maximum amount of time in seconds to wait for a TCP write operation to complete. |
| | float | The maximum time in seconds a pooled connection can remain open before being closed. |
| | float | The maximum amount of time in seconds that a managed transaction will retry before failing. |
| | float | The maximum amount of time in seconds to wait for a connection liveness check. |
| | bool | Specify whether TCP keep-alive should be enabled. |
| | bool | Specify whether to use an encrypted connection between the driver and server. |
| | str | Specify the client agent name. |
| | str | Set the minimum severity for notifications the server should send to the client (e.g., |
| | list | A comma-separated list of notification categories to disable (e.g., |
| | list | A comma-separated list of notification classifications to disable. |
| | str | The severity level at which notifications should be logged as warnings by the driver. |
| | bool | Specify whether to disable sending anonymous telemetry data to the server. |