dplutils.pipeline.stream.StreamingGraphExecutor.task_submit

abstractmethod StreamingGraphExecutor.task_submit(task: PipelineTask, df_list: list[DataFrame]) → Any

Run task, or arrange for it to be run.

Implementations must override this method and arrange for the function of task to be called on a single dataframe made by concatenating df_list. The return value is kept in a pending queue and later passed as input to both is_task_ready and task_resolve_output; the driver does not otherwise inspect it. Typically the return value is a future, a handle to a remote result, or an equivalent object.

Note

PipelineTask expects a single DataFrame as input, while this function receives a list of them. The implementation MUST concatenate these into a single DataFrame before execution (e.g. with pd.concat(df_list)). The driver code does not do this because the dataframes in df_list may not be local.
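
As an illustration of the contract described above, the following is a minimal sketch that uses a thread pool and returns a concurrent.futures.Future as the pending handle. It does not subclass StreamingGraphExecutor: StandInTask and ThreadPoolSketch are hypothetical stand-ins, the func attribute on the task is an assumption, and the single-argument signatures shown for is_task_ready and task_resolve_output are assumed rather than taken from this page.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class StandInTask:
    """Hypothetical stand-in for PipelineTask; only `func` is assumed here."""

    name: str
    func: Callable[[pd.DataFrame], pd.DataFrame]


def _concat_and_run(task: StandInTask, df_list: list) -> pd.DataFrame:
    # Concatenate inside the worker, not in the driver: in a real executor
    # the entries of df_list may be remote references rather than local data.
    return task.func(pd.concat(df_list))


class ThreadPoolSketch:
    """Illustrative only; a real implementation would subclass
    StreamingGraphExecutor and override its abstract methods."""

    def __init__(self, max_workers: int = 2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def task_submit(self, task: StandInTask, df_list: list) -> Future:
        # The returned Future is the opaque handle kept in the pending queue.
        return self._pool.submit(_concat_and_run, task, df_list)

    def is_task_ready(self, pending: Future) -> bool:
        # Assumed signature: receives the handle returned by task_submit.
        return pending.done()

    def task_resolve_output(self, pending: Future) -> pd.DataFrame:
        # Assumed signature: blocks until the result is available.
        return pending.result()


executor = ThreadPoolSketch()
task = StandInTask(name="double", func=lambda df: df.assign(x=df["x"] * 2))
batches = [pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3]})]

pending = executor.task_submit(task, batches)
result = executor.task_resolve_output(pending)
```

Here the task function sees one three-row DataFrame rather than two separate batches. With a distributed backend the same shape would apply, with the Future replaced by that backend's own result handle.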