dplutils.pipeline.PipelineExecutor

dplutils.pipeline.PipelineExecutor#

class dplutils.pipeline.PipelineExecutor(graph: PipelineGraph)[source]#

Base class for pipeline executors.

This class defines the interface for the execution of a graph of PipelineTask objects.

Subclasses must override the execute method, which is called by run to execute the pipeline and return and generator of dataframes of the final tasks in the graph.

__init__(graph: PipelineGraph)[source]#

Methods

__init__(graph)

execute()

Execute the task graph in batches.

from_graph(graph)

run()

Validate and run the pipeline.

set_config([coord, value, from_yaml])

Set task configuration options for this instance.

set_config_from_dict(config)

set_context(key, value)

validate()

writeto(outdir[, partition_by_task, ...])

Run pipeline, writing results to parquet table.

Attributes

run_id

tasks_idx