Contents:
Introduction:A Data-Flow-Graph (DFG) describes the tasks and inherent data-dependencies of applications or processes; Examples include: manufactoring processes, job-scheduling, and parallel software applications. The SCHEDULER utility accepts DFG files and after partitioning, allocating, and scheduling the flow-graph nodes, produces corresponding job-sequences for each resource. In computer systems, these can be used as software-programs for each of the targeted processor elements (PEs).The SCHEDULER tool can perform the partitioning/mapping/scheduling operation automatically, or you can assist by specifying (all or some of) the mappings and the tool will do the tedious work of generating the individual job-sequences (or PE-programs) and calculating the indices to move job-tokens (or for sending+receiving data items), etc.. As initially configured, the SCHEDULER utility produces time-line plots and pseudo-code that is directly interpretable by CSIM PE models. Once a satisfactory mapping has been determined, the pseudo-code can be conveniently expanded to compilable C-source code for specific processor families. The input to the SCHEDULER is an XML DFG text file, which is often drawn with CSIM's GUI. The format of the DFG input file is published in Appendix-A that is generated by CSIM's GUI, or other tools. Graph Syntax:A DFG consists of nodes and arcs. The nodes represent computational tasks, while the arcs represent data dependencies, buffering, and the direction of data transfer between task-nodes. A task-node cannot execute until sufficient data exists on its input arcs. When the data-amount threshold has been reached on each of its input arcs, a node can fire. When a node fires, it consumes data from its input arcs, executes the task, and it produces data onto its output arcs. Therefore, each arc is characterized by its produce, threshold, and consume amounts. Figure 1 shows an example of a very simple DFG.
Figure 1 - Example of a simple Data Flow Graph (DFG). This graph syntax, and this over-all method for developing parallel algorithm software and systems, is based on well developed and documented principles formally called computational flow graphs that are closely related to Petri net theory. See [1] for the formal background. DFGs can be hierarchical. A hierarchical DFG contains nodes that are subgraphs. The subgraphs are drawn on separate diagrams. The top-level graph must have one START node, and one EXIT node. Nodes with instance names beginning as GEN are special in that they produce periodic data at a specific interval; also called a monotonic interval. You may have multiple GEN nodes in a DFG, but each name must be unique. Any node in which the first three letters are GEN will be considered a GEN node. For instance, GEN1, GENa, GEN2, etc. are all GEN nodes. Because the START, GEN, and EXIT nodes are often considered special control nodes, they are often not mapped to any physical processor element. You can do this by specifying their type as NO_PE. Actually, any node can be set as a NO_PE type node. Such nodes do not have to wait for any PEs to become available, and their input and output arcs do not imply nor generate data-transfers, since their existence is only logical (conceptual) there is no place to transfer data to-or-from. However, their effect relative to the DFG is taken into account as far as triggering their dependent nodes accordingly.
Node Attributes:Nodes are characterized by five attributes:
Three special type options provide additional control as to how a task is mapped to processor elements. These options are:
By convention, the Compute Time is often specified in micro-seconds (uSec). It indicates how long a processor is busy computing the task. It also determines the time between when the output data is produced onto the node's output arcs from when the task began to execute. Time is a floating- point quantity. Some tasks do not produce their output data in one batch at the completion of a firing. Rather, they produce data in several batches during their execution. To describe this behavior, and many others as well, nodes have an iteration parameter. The Iterations parameter determines how many times a node will execute and produce data for one firing. Here a firing means the satisfaction of the node's input thresholds and consumption of one batch of input data. Normally, the iteration parameter is one (1). The iteration value is an integer. The time between successive output iterations is the compute time of the node. For example, if a task executes for 100-mS and produces 10 batches of data in this time at equally spaced intervals, then the task would be described as a DFG node having 10-iterations, and a compute-time of 10-mS. Once a node fires, it is assigned to a specific processor element and all iterations of the task occur on that processor. The final optional parameter of a DFG node is the Mapping Assignment. This is an optional parameter. The mapping parameter designates the specific processor element a task will execute on. A list of processor assignments can be specified. A list is equivalent to specifying that each instance of the task is to execute on the respective sequence of processor elements. For example, the first instance of the task is to execute on the first listed processor, the second on the second, etc.. If the list is exhausted, the next instance is assigned to the first processor on the list in a round-robin fashion.
Additional Optional Node Attributes:Aside from the traditional required parameters described above, an additional optional attribute is available:
Arc Attributes:An arc is a directional connection between two nodes. It represents a dependency or flow of data between tasks. The end of the arc connected to the task-node which supplies the data is called the source. The end to which data flows is called the destination. The nodes connected by an arc are referred to as the source and destination nodes of the arc, respectively.An arc implies a potential buffer, queue, or temporary storage of data between task-operations. We speak of the amount of data held by an arc as the data in the arc's associated buffer. An arc is characterized by its produce, threshold, and consume amounts. These are parameters that influence how a graph executes. The amount of data held by the arc's buffer varies dynamically as the graph executes. Arcs are characterized by 5 attributes.
An arc's initial-amount is normally zero. This suffices for feed-through DFG graphs, but causes graphs containing feed-back to dead-lock because the node receiving a feed-back-arc can never fire the first time. The initial-amount can be set to a non-zero integer quantity. This is sometimes called, priming-the-pump. Non-zero initial-amounts are rarely be used. Most users can ignore it because it automatically defaults to zero. However, it is available when needed. There can be several arcs connected to a given node. To distinguish the data carried by each arc coming into -and going out of- a node, a name can be given to the destination and source of each arc, respectively. The source and destination names are not relevant on a flat graph for performance modeling because no data actually flows. However they become relevant for functional flow-graph modeling, target code-generation, and for connection of internal-to-external arcs on any kind of hierarchical graphs. For target code-generation, the source and destination names become the subroutine parameter names used in the generated code.
Additional Optional Arc Attributes:Aside from the traditional required parameters described above, some additional optional attributes are available, as follows:
Macros and Graph-Variables:You can define frequently referenced values, expressions, or changeable parameters as macros that are defined in a common place Macros can be defined and modified under the EDIT / MACRO menu of the GUI editor. To define a macro, use the macro name you wish to define, followed by an equals sign, and then the value or expression you wish to define it as. Macro definitions can contain other macros and will be evaluated accordingly. Expressions can contain parenthesis, addition, subtraction, multiplication, and division on integers or reals. Macros can also be character strings. For example:
mSec = * 1.0e3 Sec = * 1.0e6 Minutes = * 60.0 Sec FFT_Delay = 0.68 mSec FFT_Size = 4096 FIR_Node = FFT_Delay MATMUL = FFT_DELAY * 40.7 + 2.3 PE2 = Board1Mcm0Pe0These macros can then be used anywhere in the DFG. Their values will be replaced prior to evaluating the DFG. Arbitrary arithmetic expressions will be evaluated for macros. Several basic functions are available for use within expressions.
Note that macros are not data-variables in the normal sense of variables in program code. Although in many respects they are similar, except that the macros are expanded and evaluated in each instance. (Deferred evaluation.) For example: macro a = 5 macro b = 2 * a print a, b a = 5, b = 2*a = 10 macro a = 7 print a, b a = 7, b = 2*a = 14This could result in infinite recursion for self-inclusive macros. A related class, called variable, assigns the current evaluated value to the object name. (Immediate evaluation.) For example: macro a = 5 variable b = 2 * a print a, b a = 5, b = 10 macro a = 7 print a, b a = 7, b = 10The variable type is especially valuable when using random numbers. For example, often several values depend on one random quantity. This would not be possible to express with macros, because each reference would be a new call to the random generator. Instead, one random number can be generated and saved as a variable and the same value can be referenced multiple times. For example: Using a variable to store the RANDOM value: variable message_size = ROUND( 1000 * RANDOM ) macro Bandwidth = 100.0 macro xfer_delay = message_size / Bandwidth print message_size, xfer_delay message_size = 375 xfer_delay = 3.75 print message_size, xfer_delay message_size = 375 xfer_delay = 3.75 But with only macros: macro message_size = ROUND( 1000 * RANDOM ) macro Bandwidth = 100.0 macro xfer_delay = message_size / Bandwidth print message_size, xfer_delay message_size = 375 xfer_delay = 5.21 print message_size, xfer_delay message_size = 863 xfer_delay = 1.09 (RANDOM was re-evaluated on each reference of the macro.) Usage:The SCHEDULER utility expects the DFG source file to be the first command-line argument. If no arguments are specified, it will prompt you.The SCHEDULER accepts, as the optional second command-line argument, the netinfo file produced by the net_extract command of the CSIM simulator. This ensures consistency with the name- to -logical-ID numbers used by the ROUTER and the simulation models themselves. Example: scheduler radar.dfg netinfoIf no netinfo file is on the command-line, then the SCHEDULER knows nothing about the architecture it is mapping to, nor what processors are available. In this situation, the SCHEDULER will ask for the number of processors the DFG should be mapped to. In either case, the SCHEDULER will then perform the mapping by either using the user-supplied assignments or a set of heuristics that attempt to balance the processing load equally among processors, to minimize the processing latency, and to minimize inter-PE communications traffic by making maximum use of high data locality (I.E. automatic mapping). In general, the automatic mode is less optimal than manually optimized mappings. However, the automatic mode is useful for quick first-cut attempts and it will be improved over time. During scheduling, you will see the list of assignments being made as the DFG(s) executes.
Rate Monotonic Scheduling (RMS):To use Rate Monotonic Scheduling (RMS) with the CSIM Scheduler, the -stim scheduler option is to be used with properly prepared DFG files. The files may be generated automatically with the dfg_gen utility from *.txt files generated by spread-sheets. Or, it may be generated manually with the CSIM DFG GUI. The scheduler generates *.prog files that are intended to be used with the multi_priority_pe. Each task block that runs at an independent monotonic rate is to be graphed as a separate data flow graph. A 'stim_file' lists all the graphs to be used.
Each graph that represents a separate task block must have START and EXIT
nodes specified. If the graph is to be scheduled at a monotonic rate, a graph
attribute or macro is to define it as RMS = 1 or true. The rate at which
the graph is to be repeated is defined by a GEN node which follows the START
node. The loop may be terminated by either a time limit or a loop count. The scheduler generates the *.prog files. If a looping attribute is used, the task blocks are unwrapped in the *.prog files until the looptime or loopcount is reached. Unique message ID's are used for each iteration. If no loop attribute is used, the graph is executed only once.
To schedule the files for RMS scheduling, invoke: Note the format of the *.prog files generated. Each task begins with TASK and ends with END_OF_TASK. The first statement after TASK is a monotonic statement which specifies the loop time. The last statement in each *.prog file is a 'progmdone' line. Global Attributes for RMS:
Outputs:The SCHEDULER produces three (3) types of output files:
The time-line plots are considered ideal because the static SCHEDULER bases its estimates of task completion and communication times on a simplistic ideal model. The ideal time-lines are useful to compare with the actual time-lines produced by detailed architecture simulations. The comparison highlights the effects of architectural issues on the given mapping.
The ideal time-line can be viewed with XGRAPH
by selecting Tools / Plot Ideal Timeline from the GUI menu. xgraph IdealTline.datThe resulting graph shows the projected activities of the processors versus time. It is usually helpful to overlay the inter-processor communication dependencies underneath the activity time-line. This produces a very complete picture of the data-flow-graph's execution history. You can do this by placing both plot-files on the XGRAPH command-line, as in: xgraph IdealSpiderweb.dat IdealTline.dat(Placing the activity time-line second, ensures that the horizontal activity-bars are drawn over-top the diagonal communication traces. This gives the activity-bars priority which prevents their obscuration.) Each horizontal processor activity bar-segment on the time-line plot corresponds to the execution of a DFG-node-task on the given processor. The SCHEDULER uses a colorization function to assist in distinguishing the task-segments. Tasks are color-coded by the last digit of the function-names: (or, last letter) 0 = red (b,l,v,D,N,X) 1 = blue, (c,m,w,E,O,Y) 2 = green, (d,n,x,F,P,Z) 3 = violet, (e,o,y,G,Q) 4 = orange, (f,p,z,H,R) 5 = yellow, (g,q,I,S) 6 = pink, (h,r,J,T) 7 = cyan, (i,s,A,K,U) 8 = light-gray, (j,t,B,L,V) 9 = dark-gray. (a,k,u,C,M,W)Being somewhat arbitrary, the color-table usually produces a rather random colorization which helps distinguish tasks. Knowing this color-map also helps to select or modify your task-names to control your desired color visualizations. Basing it on the last character of task-names is often helpful, since many mappings place multiple iterations of a similar task on a given node. This enables distinguishing otherwise similarly-named tasks, such as with: xyz_1, xyz_2, xyz_3, etc..
Command-Line Arguments
Appendix A - DFG File Format Appendix C - Dynamic Scheduler Appendix D - Special Attribute Options - Reset_After_Firing, Multicast Node, Multiple Timers, Hard EXIT, etc.. Appendix E - Mapping Hints - Useful things to know about mapping tasks to processors. Appendix F - DFG Animation - How to view DFG execution.
References:[1] Peterson, J., Petri Net Theory and the Modeling of Systems, Prentice-Hall, Englewood Cliffs, NJ, pp. 214-221, 1981.[2] Siework, Bell, Newel, Computer Structures Principles and Examples, McGraw Hill Computer Science Series, New York, 1982. (Questions, Comments, & Suggestions: admin@csim.com) |