A DAG (directed acyclic graph) where edges represent a happens-before relationship.
In these types of scenarios, certain edges may be redundant and can be removed.
This simplifies the dag into something that is much easier to work with if it needs
to be used as a fork-join type of dag for workflow execution.
Fields inherited from class co.cask.cdap.etl.planner.Dag
Flattens the control dag to remove connections between branches of different forks, which would
make the dag unusable in pure fork-join workflows.
For example the following dag is not a fork-join dag:
|--> n2 -------|
| |--> n5
|--> n3 -------|
|--> n4 --> n6
There are many ways to turn this into a fork-join while still respecting all happens-before relationships,
but for simplicity we'll use an algorithm that doesn't have any nested forks and will turn the above into:
|--> n2 --|
| | |--> n5 --|
n1 --|--> n3 --|--> n2.n3.n4 --| |--> n5.n6
| | |--> n6 --|
|--> n4 --|
The algorithm is to insert a join node whenever it sees a fork. Every time there is a fork, we will follow
each branch to its endpoint (a node that forks, merges, or is a sink), then insert a join node that each
branch endpoint connects to.
public int trim()
Trims any redundant control connections.
n1 ------> n2
has a redundant edge n1 -> n3, because the edge from n2 -> n3 already enforces n1 -> n3.
The approach is look at each node (call it nodeB). For each input into nodeB (call it nodeA),
if there is another path from nodeA to nodeB besides the direct edge, we can remove the edge nodeA -> nodeB.