Technical Details about Parallel Mapping:
-----------------------------------------

First, we define some terminology conventions to aid in thinking about this topic. At each stage of a parallel flow-graph there are a couple of points of view, which could add confusion unless we have good definitions.

To distribute any node, such as Stg1, multiply the produce amount on the input arcs of that stage by the number of parallel branches.

Example: Input arc(s) of Stg1 Node:

    P = k * Nstg1
    T = k
    C = k

This gives rise to Nstg1 parallel copies of Stg1. We'll call this "fan-out" from the prior stage. It is easily accomplished with basic capabilities.

The next challenge is gathering the input from all "copies" of Stg1. Each node of Stg2 must get a distinct arc from each node of Stg1. We'll call this "fan-in". It does not matter how many Stg2 branches there are (that is the "fan-out" aspect). So even if there is only one Stg2 branch, it must get one arc from each "copy" of Stg1. That is, a Stg2 node cannot fire until it gets input from all Stg1 nodes, and it won't misfire by getting more from any one Stg1 branch (e.g. from another wave of data).

Therefore, the output arc(s) of Stg1 must be replicated, so there are Nstg1 of them, for separate tracking of their outputs. The output arc (copies) of the Stg1 Node are balanced:

    P = m * Nstg2
    T = m
    C = m

That is, a given arc copy gets the full amount from Stg1. This basically says that when any Stg1 node produces, it frees all Stg2 nodes from waiting for input from it, but they may still need input from the other Stg1 branch arc copies. In operation, each firing of a Stg1 node produces on only one of the output arc copies (i.e. a serial production rule). Which arc copy receives the output becomes a state variable of the node, and is conveniently accessed and controlled in the Scheduler code.
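The fan-in firing rule above can be illustrated with a small token-counting sketch. This is not the Scheduler code. It assumes that P, T and C stand for the produce, threshold and consume amounts of an arc, and that the serial production rule advances round-robin over the arc copies. The names FanIn, fire_stg1, fire_stg2 and next_copy are hypothetical, chosen only for this illustration.

```python
def fan_out_input_arc(k, n_stg1):
    """Input arc of a stage distributed into n_stg1 copies: (P, T, C)."""
    return (k * n_stg1, k, k)


def fan_in_output_arc(m, n_stg2):
    """One output arc copy of Stg1 feeding n_stg2 Stg2 branches: (P, T, C)."""
    return (m * n_stg2, m, m)


class FanIn:
    """Token counts on the Nstg1 replicated output arcs of Stg1 (hypothetical model)."""

    def __init__(self, n_stg1, n_stg2, m):
        self.produce, self.threshold, self.consume = fan_in_output_arc(m, n_stg2)
        self.tokens = [0] * n_stg1   # one counter per arc copy
        self.next_copy = 0           # state variable: arc copy the next firing fills

    def fire_stg1(self):
        # Serial production rule: each firing produces on exactly one arc copy.
        # Round-robin order is an assumption of this sketch.
        self.tokens[self.next_copy] += self.produce
        self.next_copy = (self.next_copy + 1) % len(self.tokens)

    def stg2_ready(self):
        # A Stg2 node needs at least T tokens on EVERY arc copy.
        return all(t >= self.threshold for t in self.tokens)

    def fire_stg2(self):
        # Consume C tokens from every arc copy, or refuse to fire.
        if not self.stg2_ready():
            return False
        self.tokens = [t - self.consume for t in self.tokens]
        return True


if __name__ == "__main__":
    fan_in = FanIn(n_stg1=3, n_stg2=2, m=4)
    fan_in.fire_stg1()
    fan_in.fire_stg1()
    print(fan_in.stg2_ready())   # two of three Stg1 copies have produced
    fan_in.fire_stg1()
    print(fan_in.stg2_ready())   # all three Stg1 copies have produced
```

Keeping a separate counter per arc copy is what prevents a second wave from one Stg1 branch from standing in for a missing first wave from another branch.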

This convention enables us to talk cleanly about the distribution of exactly one stage, independent of the distribution of any other stage. Ultimately, it allows users to control the distribution of a stage from a single point: the node being distributed (plus the mapping of that node too, of course). This should also work through module boundaries, by virtue of the fact that all modules (and bundles) become flattened prior to execution, so all of this should hold true there as well.

This probably restricts the distributed node to having only replicated output arcs (no mixture with single arcs). But what would the alternative be? Only some nodes conditionally producing output? So that's how it works.
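
As a rough illustration of the single point of control, the sketch below derives every arc's parameters from a per-stage copy count alone. It is a hypothetical model, not the actual flattening or mapping code. It assumes that P, T and C are the produce, threshold and consume amounts, and that both conventions above combine into one rule: an arc from stage A to stage B is replicated once per copy of A, and each copy has P equal to the base amount times the number of copies of B. The names expand_arc and expand_pipeline, and the stage names Stg0 and Stg3, are made up for this example.

```python
def expand_arc(base, n_producer, n_consumer):
    """Return one (P, T, C) tuple per producer copy for a single logical arc."""
    return [(base * n_consumer, base, base) for _ in range(n_producer)]


def expand_pipeline(copies, base_amounts):
    """Expand every logical arc of a flattened graph.

    copies:       stage name -> number of parallel copies (stages not listed get 1)
    base_amounts: (producer, consumer) -> base amount on that arc (k, m, ...)
    """
    return {
        (src, dst): expand_arc(base, copies.get(src, 1), copies.get(dst, 1))
        for (src, dst), base in base_amounts.items()
    }


if __name__ == "__main__":
    base = {("Stg0", "Stg1"): 5, ("Stg1", "Stg2"): 4, ("Stg2", "Stg3"): 2}
    before = expand_pipeline({"Stg1": 1, "Stg2": 2}, base)
    after = expand_pipeline({"Stg1": 3, "Stg2": 2}, base)
    # Only arcs that touch Stg1 differ between the two expansions.
    for arc in base:
        print(arc, before[arc] != after[arc])
```

In this model, changing the copy count of Stg1 alters only its input arc (the produce amount) and its output arc (the number of copies), while the arc between Stg2 and Stg3 is left untouched.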