A special DAG that can insert connector nodes into a normal DAG based on some rules.
A connector node is a boundary at which the DAG can be split into smaller DAGs.
In the final workflow, a connector translates to a local dataset written between MapReduce jobs.
Insert connector nodes into the DAG.
A connector node is a boundary at which the pipeline can be split into sub-DAGs.
It is treated as a sink within one sub-DAG and as a source in another.
A connector is inserted in front of a reduce node (a node of the aggregator plugin type, etc.)
when there is a path from some source to one or more reduce nodes or sinks.
This is required because a single mapper cannot both write to a sink and perform a reduce,
and a single MapReduce job cannot contain two reducers.
A connector is also inserted in front of any node whose inputs come from multiple sources,
and in front of a reduce node that has another reduce node as its input.
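Two of the rules above can be sketched on a plain adjacency-list DAG. This is a minimal illustration, not the actual planner code; the class, method, and node names (ConnectorSketch, insertConnectors, "agg1", the ".connector" suffix) are all made up for the example. It inserts a connector in front of any node with multiple input nodes, and in front of any reduce node fed directly by another reduce node.

```java
import java.util.*;

// Illustrative sketch of two connector-insertion rules; not the real API.
public class ConnectorSketch {

    // edges: node -> set of downstream nodes; reduceNodes: aggregator-type nodes
    static Map<String, Set<String>> insertConnectors(
            Map<String, Set<String>> edges, Set<String> reduceNodes) {
        // build the reverse index: node -> set of input nodes
        Map<String, Set<String>> incoming = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : edges.entrySet()) {
            for (String to : e.getValue()) {
                incoming.computeIfAbsent(to, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // copy the graph so we can rewrite edges
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : edges.entrySet()) {
            result.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        for (Map.Entry<String, Set<String>> e : incoming.entrySet()) {
            String node = e.getKey();
            Set<String> ins = e.getValue();
            boolean multipleInputs = ins.size() > 1;
            boolean reduceAfterReduce = reduceNodes.contains(node)
                && ins.stream().anyMatch(reduceNodes::contains);
            if (multipleInputs || reduceAfterReduce) {
                // redirect every input edge through a new connector node
                String connector = node + ".connector";
                for (String in : ins) {
                    result.get(in).remove(node);
                    result.get(in).add(connector);
                }
                result.put(connector, new HashSet<>(Set.of(node)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // source -> agg1 -> agg2 -> sink, where agg2 is a reduce fed by a reduce
        Map<String, Set<String>> edges = new HashMap<>();
        edges.put("source", new HashSet<>(Set.of("agg1")));
        edges.put("agg1", new HashSet<>(Set.of("agg2")));
        edges.put("agg2", new HashSet<>(Set.of("sink")));

        Map<String, Set<String>> out = insertConnectors(edges, Set.of("agg1", "agg2"));
        // a connector now sits between agg1 and agg2
        System.out.println(out.get("agg1"));           // [agg2.connector]
        System.out.println(out.get("agg2.connector")); // [agg2]
    }
}
```

Redirecting all input edges through the connector keeps the boundary clean: everything upstream of the connector can become one job, with the connector dataset as its sink.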
After splitting, the result will be a collection of sub-DAGs, with each sub-DAG representing a single
MapReduce job (or possibly a map-only job). In Spark, each sub-DAG would be a series of operations from
one RDD to another RDD.
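The split itself can be sketched as a traversal that starts from each real source and from each connector, and stops whenever it reaches a connector, so the connector ends one sub-DAG and begins the next. Again, this is an assumed simplification for illustration (SplitSketch, split, and the node names are invented), not the actual implementation.

```java
import java.util.*;

// Illustrative sketch: split a DAG at connector nodes into sub-DAGs.
// A connector is the sink of one sub-DAG and the source of the next.
public class SplitSketch {

    static List<Map<String, Set<String>>> split(
            Map<String, Set<String>> edges, Set<String> connectors) {
        // nodes with no incoming edges are the real sources
        Set<String> hasIncoming = new HashSet<>();
        edges.values().forEach(hasIncoming::addAll);
        // each sub-DAG is rooted at a real source or at a connector
        List<String> roots = new ArrayList<>();
        for (String n : edges.keySet()) {
            if (!hasIncoming.contains(n)) roots.add(n);
        }
        roots.addAll(connectors);

        List<Map<String, Set<String>>> subdags = new ArrayList<>();
        for (String root : roots) {
            Map<String, Set<String>> sub = new HashMap<>();
            Deque<String> stack = new ArrayDeque<>(List.of(root));
            while (!stack.isEmpty()) {
                String n = stack.pop();
                for (String to : edges.getOrDefault(n, Set.of())) {
                    sub.computeIfAbsent(n, k -> new HashSet<>()).add(to);
                    // stop at connectors; they start the next sub-DAG
                    if (!connectors.contains(to)) stack.push(to);
                }
            }
            if (!sub.isEmpty()) subdags.add(sub);
        }
        return subdags;
    }

    public static void main(String[] args) {
        // source -> agg1 -> agg2.connector -> agg2 -> sink
        Map<String, Set<String>> edges = new HashMap<>();
        edges.put("source", Set.of("agg1"));
        edges.put("agg1", Set.of("agg2.connector"));
        edges.put("agg2.connector", Set.of("agg2"));
        edges.put("agg2", Set.of("sink"));

        List<Map<String, Set<String>>> subdags = split(edges, Set.of("agg2.connector"));
        // two sub-DAGs: {source -> agg1 -> agg2.connector} and {agg2.connector -> agg2 -> sink}
        System.out.println(subdags.size()); // 2
    }
}
```

Each resulting sub-DAG maps to one MapReduce job (or one chain of RDD operations in Spark), with the connector dataset carrying data across the boundary.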
Returns the nodes that had connectors inserted in front of them.