DYNAMIC SCHEDULER-2.0 Documentation

Feb 16, 2004

Contents

  1. Introduction/Overview
  2. New DFG Node Types
  3. New Macro and Expression Capabilities
  4. Creating New Task-Node Type Definitions
  5. File locations
  6. Pathological DFG Cases


1. Introduction/Overview

The SCHEDULER_2.0 is a refinement to the original static SCHEDULER_1. It processes Data-Flow-Graphs (DFGs) with enhanced capabilities and enables dynamic operation. A DFG describes the inherent data dependencies and task-relationships of an application. Typically a DFG describes a software application. It may contain mapping information to assign task-nodes to specific hardware resources. A DFG is distinct from a hardware architecture graph.

While the SCHEDULER_2.0 introduces several new capabilities, it still complies to the standard interfaces and Computational Flow-Graph (CFG) paradigm based on Petri-networks. The DFG-GUI is still used to capture and edit DFG graphs.

The SCHEDULER_2.0 now implements a task-node model that is user-definable and is analogous to the device models for the hardware domain. Many of the same capabilities and conventions for describing hardware-devices are now available to describe DFG task-nodes. Some features for describing task-nodes are necessarily different, reflecting the distinct nature of DFG-task-nodes from that of hardware devices.

With this new capability, several new node-types have already been added. For instance: the All-Or-Nothing, and Get-Any-In-Group node-types help to implement new and different mapping/allocation policies.

This document describes the new features. The standard features are covered in the basic scheduler document.



2. New DFG Node Types

To new node-types have been defined:
  1. All-Or-Nothing (AON)
  2. Get-Any-In-Group (GetAny)
They are described in the sections below.


2.1 All-Or-Nothing (AON) Node-Type

The AON node-type is a special kind of task-node. Any node having the letters AON in its type-name is considered an AON type of node.

The basic purpose of an AON node is to fire a group of nodes (the AON-node's descendents) if-and-only-if all nodes in the group are ready to-, and can-, execute. If any node in the descendent group cannot execute because a resource is un-available at the time the AON-node completes its task, then the AON will not produce, and none of the descendent nodes will be fired.

Specifically: When an AON node fires, it will produce data on its output arcs only if:

	1. Its descendent nodes have not reached their input-arc thresholds
	   and therefore are not ready to fire.
Or,
	2. Descendent nodes are ready to fire, and resources (PEs) are
	   available for them.
In the case of (2) above, the resources are assigned immediately.

If the descendent nodes would be ready to fire once the AON produced, but resources for the descendent nodes are not available (at the production-time), then the AON node will produce no data on its output arcs. In such a case, the firing of the AON node will, in effect, be forgotten, as if it never happened.

Limitations: The output-arcs of an AON node should all have the same produce/threshold/consume ratios. Operation with mixed-ratios is undefined.



2.2 Get-Any-In-Group (GetAny) Node-Type

The GetAny node-type specifies a different resource allocation style for mapping-lists than the default explicit sequence. With the normal explicit sequence mapping, the task-node must execute on the devices listed in the node's mapping-list, in the order they are specified. This means that the task may wait for the next device in the list to be available, while other devices in the list are free.

The GetAny node-type works differently, in that instead of enforcing a mapping sequence, it will map to any device in its mapping-list that is free. If none are free, then it will map to the first device in its list that becomes free.

Any node having the letters GetAny in its type-name is considered a GetAny type of node.






3. New Macro and Expression Capabilities

SCHEDULER_2.0 has an expanded macro and expression evaluation capability over previous versions of the scheduler. The following new functions have been added: More functions can now be easily added.

A new operator has been added to the set of previously available operators (+,-,*,/). You can now use the % percent sign to indicate the modulus operation. Modulus, (x % y), is the integer remainder if x is divided by y.

Note that macros are not data-variables in the normal sense of variables in program code. Although in many respects they are similar, except that the macros are expanded and evaluated in each instance. (Deferred evaluation.)

For example:

	macro a = 5
	macro b = 2 * a
					print a, b
						a = 5, 
						b = 2*a = 10

	macro a = 7
					print a, b
						a = 7,
						b = 2*a = 14
This could result in infinite recursion for self-inclusive macros.

A new class called variable has been added that assigns the current evaluated value to the object name. (Immediate evaluation.)

For example:

        macro a = 5
        variable b = 2 * a
					print a, b
						a = 5, 
						b = 10

        macro a = 7
					print a, b
						a = 7,
						b = 10

The variable type is especially valuable when using random numbers. For example, often several values depend on one random quantity. This would not be possible to express with macros, because each reference would be a new call to the random generator. Instead, one random number can be generated and saved as a variable and the same value can be referenced multiple times.

For example:

Using a variable to store the RANDOM value:

	variable message_size = ROUND( 1000 * RANDOM )
	macro Bandwidth = 100.0
	macro xfer_delay = message_size / Bandwidth

					print message_size, xfer_delay
						message_size = 375
						xfer_delay = 3.75

					print message_size, xfer_delay
						message_size = 375
						xfer_delay = 3.75

But with only macros:

	macro message_size = ROUND( 1000 * RANDOM )
	macro Bandwidth = 100.0
	macro xfer_delay = message_size / Bandwidth

					print message_size, xfer_delay
						message_size = 375
						xfer_delay = 5.21

					print message_size, xfer_delay
						message_size = 863
						xfer_delay = 1.09

       (RANDOM was re-evaluated on each reference of the macro.) 







4. Creating New Task-Node Type Definitions

DFG task nodes are now described in user-editable model-code that is very similar to the way the hardware models are described. The sections below describe the standard node-definitions, from which new variant node-types can be created. Subsequent sections show how to create the custom node-types.


4.1 Standard Task-Node Definition (Base-Class)

A standard set of task-node types are provided. These provide the classic node-types, and they form the base-class from which new types of DFG nodes can be evolved. This section describes the standard base-class node model.

The basic node model contains two parts, which are implemented as threads:

  1. TaskNode - For each task-node in a graph, there is exactly one TaskNode thread. It stays in existence throughout the life of the graph.

  2. FireNode - For each instance of a given task-node that is executing on a PE in the system, there is exactly one FireNode thread. It comes into existence when the task is assigned and begins to execute on the PE. It goes out of existence when the task completes and after the output data is produced. At any instant, there may be none, or several, FireNode threads belonging to a task-node of the DFG.

Figure 1 shows the organization of the dynamic SCHEDULER and its interaction with the hardware models. The SCHEDULER is implemented as a single entity that contains many threads. The threads are of the TaskNode and FireNode variety.


Figure 1 - Dynamic SCHEDULER Model

Most of the usual language-features are available as they are in hardware models, such as DELAY(), printf, or anything that can be done in ANSI-C.

Additionally, several new functions specific to DFGs are available:

The RECEIVE / SEND functions are NOT available for DFGs, because these are specific to transferring data through hardware ports. Instead, the analogous Block_for and Produce are provided for DFGs.

The previous figure showed the feed-forward portion of the control process with colored arrows. Figure 2, below, shows the complete process by adding the feed-back paths.


Figure 2 - Complete Dynamic SCHEDULER Model


Actual Base-Class Node Model:

The standard DFG task-node is modeled as two threads:
  1. TaskNode Thread Definition

    /*-------------------------------------------------------------*/
     /* TaskNode thread waits for node-firing condition, and        */
     /* dispenses a fire_node thread for each firing.               */
     /* Specifically:                                               */
     /*     1. Wait for sufficient data on my input arcs, and       */
     /*             consume it,                                     */
     /*     2. Wait for and/or get resource (PE) to execute on.     */
     /*     3. Trigger fire_node, while going back to step 1.       */
     /*                                                             */
     /* (TaskNode is always running:  Except while transitioning    */
     /*  between steps, it is waiting either on step 1 or 2.)       */
     /*-------------------------------------------------------------*/
     DEFINE_THREAD: TaskNode
      {
       int runflag=1;
       struct DFG_NODE *thisnode;
       struct fire_mapping *firemapped;
    
       thisnode = (struct DFG_NODE *)THREAD_VAR;
    
       while (runflag)
        {
         /* Wait until all my input-arcs have sufficient data. */
         Block_for( thisnode->input_arcs );
    
         /* Get a resource and pass to fire_node thread. */
         if (index(thisnode->type_name,"GetAny")==0)
          firemapped = Get_Any_Resource( thisnode );
         else
          firemapped = Get_Next_Resource( thisnode );
    
         if (index(thisnode->task,"GEN")==0)
          TRIGGER_THREAD( fire_gen_node, 0.0, (int)firemapped );   /* GEN-type node. */
         else
          TRIGGER_THREAD( fire_node, 0.0, (int)firemapped );   /* Normal-type node. */
    
         /* Make sure "START" node fires only once. */
         if (thisnode->input_arcs==0) runflag = 0;
        }
      }
     END_DEFINE_THREAD.
    
    
    
  2. FireNode Thread Definition

     /*-------------------------------------------------------------*/
     /* Fire_node thread takes care of:                             */
     /*     1. Delaying for the "compute-time",                     */
     /*     2. Producing data to the node's output arcs,            */
     /*     3. Iterating the firings "iterations" times,            */
     /*     4. Releasing the resource on completion,                */
     /*     5. Printing activity to screen for user viewing, and    */
     /*     6. Saving program instructions for recv, send, compute. */
     /*-------------------------------------------------------------*/
     DEFINE_THREAD: fire_node
      {
       /* Set-up local variables. */
       int iter;
       struct DFG_NODE *thisnode;
       struct fire_mapping *firemapped;
    
       firemapped = (struct fire_mapping *)THREAD_VAR;
       thisnode = firemapped->task_node;
    
       for (iter=0; iter<thisnode->iterations; iter++)
        { /*iter*/
          /* Call any optional user function here. */
    
          /* Print-out notable acitivity and record Time-Line data. */
          Report( firemapped, iter );
    
          /* Issue the corresponding Program-Commands. */
          Issue_Compute_Command( thisnode, firemapped->pe_assignment );
    
          /* Delay for the compute duration. */
    	/* This Delay is not used in the dynamic version, */
    	/* in lieu of PE-handshake in Issue_Compute_Command. */
          /* DELAY( thisnode->compute_time ); */
    
          /* Produce data onto this node's output arcs. */
          Produce( thisnode, firemapped->pe_assignment );
          if (verbose>0) printf("   Producing for '%s'\n", thisnode->task );
        } /*iter*/
    
       /* Release the Processor Resource. */
       free_pe( firemapped->pe_assignment );
       free( firemapped );
      }
     END_DEFINE_THREAD.
    
    





4.2 Custom (User-Definable) Task-Nodes

Because the task-node model is open user-editable code, new node-types can be defined. This is particularly valuable for creating nodes with execution rules that differ from the classic set. For example, a node which can fire when any one, or a subset, of its input arcs have reached their thresholds, instead of waiting for all. Or, a node-type can be defined which produces data on only one, or a subset, of its output arcs when it fires, in response to the evaluation of data or a random function. The latter case is useful for creating branch, valve, or control nodes.

How do you make your new Task-Node types known to the GUI? So that user's can quickly know the types of new DFG node types you provide, the correct spelling of their type-names, and be able to quickly select them, you can add stub-definitions in the following form to a DFG file.

	DEFINE_NODE_TYPE: MyNewCustomNode_1
	 PORT_LIST( out );
	 DEFAULT_ATTRIBUTES( firing_rate = 1.0 );
	END_DEFINE_NODE_TYPE.
Notice that you can provide the default attributes of your custom node types this way too. The format is similar to the definition of a hardware device, except for the keyword "NODE".

This section describes how such custom nodes are created

You can define a custom task-node by modifying the TaskNode or FireNode threads of the standard SCHEDULER model. Place if-branches, around lines of your custom code, that are sensitive to the type-name of your custom node type.

For example:

	if ( strcmp( thisnode->type_name, "YourCustomNodeName" )==0 )
	 {
		/*Your custom code. */
	 }

Where YourCustomNodeName is your custom node-type name.

Then set the type-names of any of your DFG task-nodes to the type-names that you have provided for. You can do this conveniently using the GUI.

Next, build and run the simulation. You may see warning messages about any syntactical errors in your custom code during compilation, or logical errors during the simulation. You would then iteratively fix any such errors.





4.3 Conditional Branch Node Example

The following is an actual example of a custom branch node. In this example, we want to make a task-node with two output arcs. But unlike the conventional node that would produce data onto both output arcs after it fires, this node will randomly select and produce onto only one of them.

This forms a kind of branch in the flow-graph, because only one of two descendent branches will receive the stimulus-token. It changes the outcome of the successive execution stages of the graph.

We will call this new node-type: RandomBranch
We would set the type, of any nodes in our DFG that we wish to be of this type, to this name in the DFG-GUI.

To define this custom node-type, a few lines of code within the SCHEDULER's fire_node thread are modified. Specifically, the line:

	/* Produce data onto this node's output arcs. */
	Produce( thisnode, firemapped->pe_assignment, comp_no );
... is modified to:
	if (strcmp(thisnode->type_name,"RandomBranch")==0)
	 {
	  double ftmp;
	  char expression[] = "TRUNC( 1.9999 * RANDOM)";
	  char pname[3];

	  eval( expression, &ftmp );
	  sprintf(pname,"p%d", (int)ftmp );
	  Produce_on_port( thisnode, firemapped->pe_assignment, comp_no, pname );
	 }
	else
	  /* For regular nodes: */
	  /* Produce data onto this node's output arcs. */
	  Produce( thisnode, firemapped->pe_assignment, comp_no );
First the if-statement must branch regular nodes around the custom code. So we see the test on the node-type name and the standard Produce() call after the else (at the bottom).

The "core" of the custom code block consists of a few temporary variable declarations, and the use of the eval call to evaluate the random expression. (This could have been done by calling the C-subroutine random directly, but was done this way for illustrative purposes. It shows how DFG-macros can be evaluated too.) The result of the evaluation should be an integer 0 or 1. This is value passed into the standard sprintf function to make the port-name: p0 or p1 accordingly.

The randomly selected port-name is then passed to the Produce_on_port() function, which causes data to be produced only on the named port.

Many other custom nodes can be developed by writing custom code that calls the built-in routines at appropriate times.







4.4 Support Routines for Customizing Nodes

This section summarizes the available subroutine interface for building custom task-nodes. The available functions are: The functions in the list above are grouped according to their logical operation. They are ordered according to the sequence of their normal usage.

The Block_for routine is used to wait for sufficient data to accumulate on all the input arcs to a task-node. Upon satisfaction of all threshold amounts, it decrements the input arcs by the consume-amount and returns.

The Block_for_port routine is like the Block_for routine, except that it operates on the single input arc specified by the port-name. This routine is intended for forming graph-nodes that operate on a variant rule-set as compared to the standard computational-flow-graph paradigm.

The Get_Next_Resource routine checks to see if a PE resource is available to execute the task. It consults the mapping-list of the task node to see what PE, or group of PEs, the task is mapped to. If a PE is not available, this routine will sleep until the next appropriate PE is available. When the PE is available, it is assigned to this task, and the routine returns.

The Get_Any_Resource routine is like the Get_Next_Resource, except that it will allocate any available PE from the group specified in the task's mapping-list, irregardless of sequence. "First-available".

The Get_Descendant_Resources routine is a special variant of the Get_Next_Resource function for implementing AON nodes. It is called from an AON task-node when it completes its computation delay, just prior to producing data on its output arcs. This routine checks to see that all the descendent task nodes can be immediately assigned to available PEs. If so, then it assigns them and returns with +1, indicating successful descendant allocation and indicating that data should be produced on the output arcs. If not, then it returns with 0, indicating that no output should be produced on the output arcs of this node. (The descendant nodes are the task-nodes attached to the other-ends of the output arcs of the current task-node.)

The Issue_Xfer_Commands routine causes the sendmessg / recvmessg commands to be issued to the respective PEs from-which, and to-which, the transfers must originate, and be sent, respectively on all the input-arcs for a firing of the task-node.

The Issue_Xfer_Command_on_port routine is like the Issue_Xfer_Commands routine, except that it issues transfer commands for only the input-arc specified the port-name of the task-node. This routine is intended for forming graph-nodes that operate on a variant rule-set as compared to the standard computational-flow-graph paradigm.

The Issue_Compute_Command routine causes the compute-command to be issued to the assigned PE for the task-node at-hand.

The pull_data routine is used for passing actual data values across flow-graph arcs. This routine removes data from an input arc, placing it in a local variable. It is normally called after returning from a Block_for to ensure there are sufficient for data on the input-arcs. Additional discussion is provided here.

The push_data routine is used for passing actual data values across flow-graph arcs. This routine places data from a local variable onto the arc attached to the specified output port of a task-node. It is normally called after issuing the compute operation, but just prior to incrementing the data amounts (producing) on the output arcs, so that descendant task-nodes can pull the data from the arcs when they fire. Additional discussion is provided here.

The push_data_outall routine is like the push_data function, but it places the data item on all of a node's output arcs, instead of a specific arc. It is used for passing actual data values across flow-graph arcs. This function does not require a port-name. Additional discussion is provided here.

The free_pe routine de-allocates the PE assigned to the current task-node after completing the computation. The PE is then ready for re-assignment.

The Produce routine increments the data-amounts on all the output arcs of the task-node. The increment amounts are as-specified by the arc's produce-amount. If any descendent tasks that were waiting for sufficient data are now ready-to-run, they are returned from their Block_for calls.

The Produce_on_port routine is like the Produce routine except that it increments the current data-amount of only the arc attached to the specified port of the task-node. This routine is intended for forming graph-nodes that operate on a variant rule-set as compared to the standard computational-flow-graph paradigm. In particular, this function is useful for creating branch-node type of task-nodes. It enables only one (or a subset) of several output arcs to receive data.

Below is the formal definition of each of the functions.


 void Block_for( struct ARC *inlst_arc )
 void Block_for_port( struct ARC *inlst_arc, char *pname )

 int Get_Next_Resource( struct DFG_NODE *thisnode )
 int Get_Any_Resource( struct DFG_NODE *thisnode )
 int Get_Descendant_Resources( struct DFG_NODE *thisnode )

 void Issue_Xfer_Commands( struct DFG_NODE *tasknode, int map_pe, int comp_no )
 void Issue_Xfer_Command_on_port( struct DFG_NODE *tasknode, int map_pe, int comp_no, char *pname )

 void Issue_Compute_Command( struct DFG_NODE *tasknode, int map_pe,  int comp_no )

 void *pull_data( struct DFG_NODE *nodeptr, char *portname )
 void push_data( struct DFG_NODE *nodeptr, char *portname, void *data )
 void push_data_outall( struct DFG_NODE *nodeptr, void *data )
 void add_macro( char *macro_name, char *macro_stringvalue )
 int  eval( char *macro_string, double *return_data )

 void free_pe( int pe_id )

 void Produce( struct DFG_NODE *tasknode, int map_pe, int comp_no )
 void Produce_on_port( struct DFG_NODE *tasknode, int map_pe, int comp_no, char *pname )



4.6 Passing Live Parameters

Two functions enable parameters to be passed between user-customized flow-graph nodes via the arcs. These functions implement a data-queue on each arc. Normally, the pull_data function would be called from the front of a customized fire_node thread to pull (or consume) data from specific input arcs. The user may place arbitrary C-code operations on the data in the intervening portion of the fire_node thread. The push_data function would be called after the firing-delay of the customized fire_node thread to put data on specific output arcs.

The functions are described in detail below.


4.6.1 Pull_data Function

The pull_data function gets data from an input arc of a node. The function takes two arguments: It returns with a pointer to the data pulled from the input arc, if any. If no data was on the arc, then it returns zero. The formal definition of the pull_data functions is:

void *pull_data( struct DFG_NODE *thisnode, char *port_name )

Each call to pull_data pulls the next data item from the given arc-queue. It returns with a pointer to that item. Many data-items may be pulled from the queue during a given firing. If pull_data is called while the port-queue is empty, the returned data-pointer will be zero. It is good practice to test for the zero-pointer condition.


4.6.2 Push_data Function

The push_data function puts data onto an output arc of a node. The function takes three arguments: The formal definition of the push_data functions is:

push_data( struct DFG_NODE *thisnode, char *port_name, int *data_pointer )

Each call to push_data pushes another data item onto the given arc-queue. Many data-items may be pushed onto a queue during a given firing. All items are queued, so that if the descendent node does not execute while the producer fires several times, no data will be lost. The descendent will pull the data in the sequence it was pushed on the queue.


4.6.3 Push_data_outall Function

The push_data_outall function puts data onto all the output arcs of a node. This function is similar to the push_data function except that it puts the data item on all of a node's output arcs. Consequently, it does not require a port-name in the argument list. The function takes two arguments: The formal definition of the push_data_outall functions is:

push_data_outall( struct DFG_NODE *thisnode, int *data_pointer )

Each call to push_data_outall pushes another data item onto each output-arc-queue. Many data-items may be pushed onto the queues during a given firing. All items are queued, so that if descendent nodes do not execute while the producer fires several times, no data will be lost. The descendents will pull the data in the sequence it was pushed on the queue.


4.6.4 Accessing Macro Values/Expressions

You can access DFG-macro and variable-values by using the eval() function. The eval function accepts a macro or variable (or expression) as a character string as its first argument. It performs any macro expansions to fully expand the string. (The string is modified by this call, so it must be large enough to hold the expanded string.) Eval then performs any arithmetic evaluations that can be done, and sets the second argument to be the numeric result. (It will evaluate arbitrary expressions.)

Eval returns a value indicating the type of data that input string expression evaluated to.

Where: -1 = error,
	0 = alpha-numeric string returned in result parameter,
	1 = integer, and,
	2 = double-float.
Numerically resolved results (cases 1 and 2) are returned in parameter "value". They are always returned as type double. Integers should be deconverted. (ie. ival = (int)dval; ).

The formal defintion of the eval function is:

	int eval( char *line, double *result )


Setting Macro Values

New DFG-macros can be set, or old ones can be set to new values at anytime, by the add_macro function. The add_macro function accepts a macro as its first argument, and its value as its second argument. Both arguments must be character-strings. (The character-strings are copied, so you can re-use your character-strings after calling add_macro.)

The formal defintion of the add_macro function is:

	
	void add_macro( char *macro, char *value )





5. File locations

The dynamic scheduler model file is in directory:
$CSIM_ROOT/model_libs/core_models/dynam_sched.dir
In file:
dynamic_sched.sim
You can include this file in your model if you are not using any custom DFG-nodes. (Otherwise copy it, and modify the copy.) Instantiate one instance of the scheduler in a box that is not connected to anything (similar to the monitor.sim).

Several complete examples are contained the examples directories below the dynam_sched.dir directory.

Usage:
To use the Dynamic Scheduler, include:
          monitor.sim (as in any core_models simulation),
and
          dynamic_sched.sim.
The order you include them should not matter. The dynamic_sched.sim file will include other files, such as dynsched.h, SchedRoutines.c, parse.c, and parse3.c. (You may need to copy these to your local directory.)






6. Pathological Cases

The following link discusses unusual DFG configurations that may be good to be aware of:

Pathological Cases.