Core Performance Models

User Documentation

March 15, 2005

Simulation Environment

The user may organize the simulation environment by specifying the directory where the core models reside, the directory where the models have been compiled, the simulation directory, the program files directory and the simulation results directory. All of these directories have default locations, so the user need not specify them. The directories are specified by environment variables. The following environment variables may be used:

During simulation, the user may view an animation of either the hardware/architecture diagram or the software/DFG diagram. The setting may be switched as follows:
  setenv SIM_GRAPH filename
(where filename contains the architecture or software graph)
It defaults to the compiled architecture graph.

Simulation Command Line Options

(sim.exe options)

Processor Model XGRAPH Attributes

The processor models use two attributes, Y_name and Y_coord, to define the device label and the Y-coordinate at which the device is placed along the Y-axis of the graph. These attributes are identified in the model by a statement:

There are two ways to set these attributes: via the GUI or with a text file. If the attributes are not set, the variables MY_NAME and my_id are used instead.

The attributes may be set by the architecture GUI as part of the device properties pop-up menu, e.g.,

        Y_coord = 7 
        Y_name = PE7

Reading Attributes from File

The attributes may also be read from a file during simulation time. Attributes read from a file override any other setting. The file is read via a command line option:
        sim.exe -a 'filename'
The attributes file consists of three entries per line: the device name, which begins with a /, the attribute name and the attribute value. Lines having a device name that does not begin with a / are skipped and can be used as headers, trailers or other comments. Valid delimiters between entries are " ,:;=\t". An example of an attributes file's contents follows:
  Object  Macro          Value 

  /pe0       Y_name      pe0 
  /pe1       Y_name      pe1 
  /pe2       Y_name      pe2 
  /pe3       Y_name      pe3 

  /pe0       Y_coord      0 
  /pe1       Y_coord      1 
  /pe2       Y_coord      2 
  /pe3       Y_coord      3 
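
The parsing rules above can be illustrated with a minimal C sketch. It is not the code used by the core models; the AttrEntry type, the parse_attr_line function and the field sizes are hypothetical names chosen for this example. It splits one line on the listed delimiters and skips any line whose first entry does not begin with a /.

```c
#include <stdio.h>
#include <string.h>

/* Delimiters between entries, as listed in the text above. */
#define ATTR_DELIMS " ,:;=\t"

/* Hypothetical record for one parsed line; not part of the core models. */
typedef struct {
    char device[64];
    char name[64];
    char value[64];
} AttrEntry;

/* Parse one line of an attributes file.
 * Returns 1 and fills *out when the line holds a device entry.
 * Returns 0 when the line is skipped: the device name does not begin
 * with '/', or the line has fewer than three entries. */
int parse_attr_line(const char *line, AttrEntry *out)
{
    char buf[256];
    char *dev, *name, *val;

    /* Work on a copy because strtok modifies its input. */
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    buf[strcspn(buf, "\r\n")] = '\0';   /* drop the line ending */

    dev = strtok(buf, ATTR_DELIMS);
    if (dev == NULL || dev[0] != '/')
        return 0;                       /* header, trailer or comment */

    name = strtok(NULL, ATTR_DELIMS);
    val  = strtok(NULL, ATTR_DELIMS);
    if (name == NULL || val == NULL)
        return 0;

    snprintf(out->device, sizeof out->device, "%s", dev);
    snprintf(out->name,   sizeof out->name,   "%s", name);
    snprintf(out->value,  sizeof out->value,  "%s", val);
    return 1;
}
```

With this sketch, the header line of the example file is skipped, while a line such as "/pe1 Y_coord 1" yields the device /pe1, the attribute Y_coord and the value 1.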

Utilization Time Window Setting

Processor and link utilization is measured across a specified time window. The window may be defined by parameters or by attributes, or it may be turned on and off by DFG nodes.

The default window parameters are defined in parameters.sim as:

        %define DFG_TRACE  0 
        %define TIME1  0.0 
        %define TIME2  1.0e20
The corresponding attributes which override the default parameters are:
The default window effectively spans the entire simulation. To select a different window, set attribute values for summary_start_time and summary_end_time in the GUI as global attributes or as Monitor device attributes.

To use a DFG driven window setting, set:

        DFG_trace_enable = 1; 
        DFG node task named TRACE_ON starts the window trace. 
        DFG node task named TRACE_OFF ends the window trace.
If the simulation ends before the TIME2 setting, statistics will be based on the simulation end time.
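
The window rules can be sketched in C as follows. This is an illustration only: the function names are hypothetical, and the utilization formula (busy time inside the window divided by the window length) is an assumption, since the text does not state how the core models compute it.

```c
/* Effective end of the utilization window: TIME2, or the simulation
 * end time when the simulation finishes first. */
double window_end(double time2, double sim_end)
{
    return (sim_end < time2) ? sim_end : time2;
}

/* Portion of a busy interval [busy_start, busy_end] that falls inside
 * the window [t1, t2]. Returns 0 when the two do not overlap. */
double busy_in_window(double busy_start, double busy_end,
                      double t1, double t2)
{
    double lo = (busy_start > t1) ? busy_start : t1;
    double hi = (busy_end < t2) ? busy_end : t2;
    return (hi > lo) ? hi - lo : 0.0;
}

/* Assumed definition of utilization: busy time inside the window
 * divided by the window length. */
double utilization(double busy_time, double t1, double t2)
{
    return (t2 > t1) ? busy_time / (t2 - t1) : 0.0;
}
```

With the default TIME2 of 1.0e20 and a simulation that ends at time 600, window_end returns 600, so the statistics cover the simulated interval only.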

Routing Table

The routing paths between devices are defined in the netinfo.rte file. This file contains lines in the following format:
        alt src - dst: x x x ; 
        alt - is the alternate path sequence number between the devices 
        src - is the source device 
        dst - is the destination device 
        x x x - is the route with x being the port number for multiport devices. 
        as in: 
        1   1 - 2: 3 4 5 ;
The netinfo.rte file may optionally contain information about the maximum number of alternate paths requested and found. That number will be contained in the first line of the file in the following format:
        0  0 - 0: r ;
where r is the maximum number of alternate paths.
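
As an illustration of the line format, the following C sketch parses one netinfo.rte line. It is not the reader used by Monitor.sim; the function name parse_route_line and its signature are assumptions made for this example. The optional first line "0  0 - 0: r ;" has the same shape, so it parses as a route of length one whose single entry is r.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse one line of the form "alt src - dst: x x x ;".
 * On success, stores alt, src and dst, writes the port numbers into
 * ports[] and returns how many were read.
 * Returns -1 when the line does not match the format, when the
 * terminating ';' is missing, or when more than max_ports ports appear. */
int parse_route_line(const char *line, int *alt, int *src, int *dst,
                     int *ports, int max_ports)
{
    int consumed = 0;
    int n = 0;
    const char *p;

    /* %n records how far sscanf got; it stays 0 if the ':' is missing. */
    if (sscanf(line, " %d %d - %d :%n", alt, src, dst, &consumed) != 3
        || consumed == 0)
        return -1;

    p = line + consumed;
    for (;;) {
        char *end;
        long v;

        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == ';')
            return n;               /* end of the route */

        v = strtol(p, &end, 10);
        if (end == p || n >= max_ports)
            return -1;              /* not a number, or too many ports */
        ports[n++] = (int)v;
        p = end;
    }
}
```

For the example line "1   1 - 2: 3 4 5 ;" this returns 3 with alt = 1, src = 1, dst = 2 and the ports 3, 4 and 5.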

The Monitor.sim model reads this file into an internally kept array called route_table defined as:

        int **route_table; 
        route_table = (int **)malloc( Num_devices * Num_devices * Max_alt_paths * sizeof(int *) ); 
        route_table[ (src*Num_devices+dst) * Max_alt_paths + alt ] = (int *)malloc( (i+1) * sizeof(int) ); 
        /* where i is the path length from src to dst */ 
        route_table[ (src*Num_devices+dst) * Max_alt_paths + alt ][k] = x; /* for k = 0 to i-1 */ 
        route_table[ (src*Num_devices+dst) * Max_alt_paths + alt ][i] = -1;
When building a message to be sent across the network, the route_list entry of the message should be built as follows:
        message->route_list = route_table[ (message->src * Num_devices + message->dst) * Max_alt_paths + alt]; 

        Num_devices = Number_of_devices + 1; /* where Number_of_devices is the maximum device id */ 
        Max_alt_paths = r;   /* when defined in netinfo.rte, else */ 
        Max_alt_paths = MAX_ALT_PATHS;  /* where MAX_ALT_PATHS is defined in parameters.sim */
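
The indexing scheme above can be exercised with a small self-contained C sketch. The helper names (route_index, route_table_create, route_table_set, route_length) are hypothetical and do not appear in Monitor.sim; they only restate the layout described above, in which each entry is an array of port numbers terminated by -1.

```c
#include <stdlib.h>

/* Flat index of the (src, dst, alt) entry, as in the expressions above. */
size_t route_index(int src, int dst, int alt,
                   int num_devices, int max_alt_paths)
{
    return ((size_t)src * num_devices + dst) * max_alt_paths + alt;
}

/* Allocate an empty table; every entry starts as NULL (no route). */
int **route_table_create(int num_devices, int max_alt_paths)
{
    size_t n = (size_t)num_devices * num_devices * max_alt_paths;
    int **table = malloc(n * sizeof *table);
    size_t i;

    if (table == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        table[i] = NULL;
    return table;
}

/* Store a copy of ports[0..len-1] at idx, terminated by -1.
 * Returns 0 on success, -1 when memory cannot be allocated. */
int route_table_set(int **table, size_t idx, const int *ports, int len)
{
    int *path = malloc(((size_t)len + 1) * sizeof *path);
    int k;

    if (path == NULL)
        return -1;
    for (k = 0; k < len; k++)
        path[k] = ports[k];
    path[len] = -1;                 /* end-of-route marker */

    free(table[idx]);               /* replace any earlier route */
    table[idx] = path;
    return 0;
}

/* Number of hops in a route: the entries before the -1 marker. */
int route_length(const int *route)
{
    int n = 0;

    if (route == NULL)
        return 0;
    while (route[n] != -1)
        n++;
    return n;
}
```

A message's route_list would then be taken from table[route_index(src, dst, alt, Num_devices, Max_alt_paths)], and a receiver can walk the list until it reaches -1.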

Description of Event Note Handling

Event note attributes may be added to DFG nodes via the GUI. These notes are stored in the EventHist.dat file by the processor executing the node task. Event note attributes are of two types: pre-event and post-event notes.

Pre-event attributes are of the form:

        PreEventNoteXXX = String
Post-event attributes are of the form:
        PostEventNoteXXX = String
The attribute name must begin with the keyword PreEventNote or PostEventNote, which may be followed by any other characters. The String is the note, which may consist of any number of words or characters. Multiple pre-event and post-event attributes may be defined for a single node.

The pre-event note is written into the EventHist.dat file by the generic_pe model right before executing the Compute operation of the node. The post-event note is written into the EventHist.dat file right after executing the Compute operation.

Here is an example for DFG task node T11 attributes:

        PreEventNote_1 = Start T11 Memory Cycles 
        PreEventNote_2 = Start T11 Compute Cycles 
        PostEventNote_1 = End T11 Memory Cycles 
        PostEventNote_2 = End T11 Compute Cycles
In the EventHist file we find the following:
        /pe1 @ 494.850000 Start T11 Compute Cycles 
        /pe1 @ 494.850000 Start T11 Memory Cycles 
        /pe1 @ 494.850000 : begin T11 
        /pe1 @ 594.850000 : end T11 
        /pe1 @ 594.850000 End T11 Compute Cycles 
        /pe1 @ 594.850000 End T11 Memory Cycles 
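
The keyword rule and the note line layout seen in the sample above can be sketched in C. This is not the generic_pe code: the names NoteKind, classify_note_attr and format_event_note are hypothetical, and the output layout is inferred from the sample EventHist lines rather than taken from a specification.

```c
#include <stdio.h>
#include <string.h>

typedef enum { NOTE_NONE, NOTE_PRE, NOTE_POST } NoteKind;

/* Classify an attribute name by its leading keyword. Any characters
 * may follow the keyword, e.g. PreEventNote_1 or PostEventNoteXXX. */
NoteKind classify_note_attr(const char *attr_name)
{
    static const char pre[]  = "PreEventNote";
    static const char post[] = "PostEventNote";

    if (strncmp(attr_name, pre, sizeof pre - 1) == 0)
        return NOTE_PRE;
    if (strncmp(attr_name, post, sizeof post - 1) == 0)
        return NOTE_POST;
    return NOTE_NONE;
}

/* Format one note line as "<device> @ <time> <note>", the layout
 * inferred from the sample above. Returns the snprintf result. */
int format_event_note(char *out, size_t out_size, const char *device,
                      double sim_time, const char *note)
{
    return snprintf(out, out_size, "%s @ %f %s", device, sim_time, note);
}
```

A pre-event note would be formatted and written just before the Compute operation of the node, and a post-event note just after it.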

Processor Memory Tracing and Management

The purpose of modeling the processor's on-chip memory was twofold:
  1. To estimate the memory requirements for the processors in different applications.
  2. To account for system/communication degradation due to finite memory buffering limits at the I/O.
The processor model accounts for three types of on-chip memory buffers:
  1. Input Buffer
  2. Output Buffer
  3. Total Processor Memory which also includes the I/O Buffers.

Input Buffer Operations

The input buffer is incremented by the amount of data received at the input port.

The input buffer is decremented by the amount of data read by the processor's 'op_recvmessg' instruction.

If the input data received exceeds the buffer limit, the input communication agent does not receive any more data at the input port and waits until the buffer is reduced by an 'op_recvmessg' instruction.

If upon executing an 'op_recvmessg', the processor does not find enough data for the message-id (mid), it waits and holds up further instruction execution.

The input buffer limit must be set large enough to avoid a deadlock where both instruction processing flow and the input communication agent are waiting on each other. If a deadlock occurs, the simulation halts and notifies the user with a pop-up message.
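
The input buffer rules can be summarized in a short C sketch. It is a simplified illustration, not the processor model itself: the InputBuffer type and the function names are hypothetical, and the data available for a given message-id is passed in by the caller instead of being tracked per mid.

```c
/* Hypothetical state of one processor's input buffer. */
typedef struct {
    long used;    /* amount of data currently held */
    long limit;   /* input buffer limit */
} InputBuffer;

/* Input communication agent: accept data arriving at the input port.
 * Returns 1 when the data is taken, or 0 when the buffer already
 * exceeds its limit and the agent must wait. */
int input_receive(InputBuffer *b, long amount)
{
    if (b->used > b->limit)
        return 0;
    b->used += amount;
    return 1;
}

/* Processor side of 'op_recvmessg': read `amount` of data for one
 * message-id. `available_for_mid` is the data buffered for that mid.
 * Returns 1 and decrements the buffer, or 0 when the processor must
 * wait because not enough data has arrived. */
int input_read(InputBuffer *b, long available_for_mid, long amount)
{
    if (available_for_mid < amount)
        return 0;
    b->used -= amount;
    return 1;
}

/* Deadlock: the processor waits for data while the agent cannot
 * receive any more because the buffer exceeds its limit. */
int input_deadlocked(const InputBuffer *b, int processor_waiting)
{
    return processor_waiting && b->used > b->limit;
}
```

In this sketch a deadlock arises when the buffer is filled past its limit with data for other message-ids while the processor waits on a message-id whose data has not yet arrived.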

Output Buffer Operations

The output buffer is incremented by the amount of data placed by an 'op_sendmessg' or 'op_postmessg' instruction.

The output buffer is decremented by the output communication agent whenever it sends the message out the output port. It is decremented by the data amount in the message.

If the amount of data placed by an 'op_sendmessg' or 'op_postmessg' instruction would cause the buffer to exceed its limit, the processor waits and holds up the instruction processing flow until the output communication agent reduces the amount of data in the buffer.

The size of the output buffer must be at least the size of the largest single message sent. The simulation will halt and notify the user with a pop-up message when a message being sent is larger than the size of the output buffer.
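
A minimal C sketch of the output buffer rules follows. The OutputBuffer type, the OutStatus values and the function names are hypothetical and serve only to restate the behavior described above.

```c
typedef enum { OUT_PLACED, OUT_WAIT, OUT_TOO_LARGE } OutStatus;

/* Hypothetical state of one processor's output buffer. */
typedef struct {
    long used;    /* amount of data currently held */
    long limit;   /* output buffer size */
} OutputBuffer;

/* 'op_sendmessg' or 'op_postmessg': try to place a message.
 * OUT_TOO_LARGE: the message can never fit (the simulation halts).
 * OUT_WAIT:      placing it now would exceed the limit, so the
 *                instruction flow is held up.
 * OUT_PLACED:    the buffer was incremented by the message size. */
OutStatus output_place(OutputBuffer *b, long amount)
{
    if (amount > b->limit)
        return OUT_TOO_LARGE;
    if (b->used + amount > b->limit)
        return OUT_WAIT;
    b->used += amount;
    return OUT_PLACED;
}

/* Output communication agent: a message of `amount` leaves the port. */
void output_send(OutputBuffer *b, long amount)
{
    b->used -= amount;
}
```

A caller that receives OUT_WAIT would retry the placement after the output communication agent has sent a message out of the port.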

Total Memory Operations

The processor's total memory is incremented by the input communication agent whenever data is received at the input port, by the amount of data in the message (packet_length).

The processor's total memory is decremented by the output communication agent whenever a message is sent out the output port, by the amount of data in the message (packet_length=message_length).

The total memory used at any one point in time is thus the total data received at the input port minus the total data sent at the output port, up to that point in time.

The model contains commented-out code that suggests how an 'op_cecompute' instruction could consume or generate data in the processor. This feature was never implemented because it requires expanding the fields in the 'op_cecompute' instruction.

Memory Tracing Files

There are three files called IQtrace.dat, OQtrace.dat and Mtrace.dat that are generated by the processor models for the three buffers, respectively. They trace the amount of data used by each buffer along the simulation timeline. These files can be used directly by XGRAPH. Generation of these files is controlled by bits 6, 7 and 8 of the trace_level vector in the subroutines.sim file.

The X-axis is the simulation timeline. The Y-axis shows the amount of data used by each processor at every point in time. The processors are separated on the Y-axis by a global parameter named M_range whose value should be at least as large as the maximum amount of data held in memory. The processors are offset on the Y-axis by a value equal to my_id * M_range.
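
The arithmetic behind a trace point can be sketched in C. The function names are hypothetical, and adding the data amount to the offset my_id * M_range is an assumption about how the plotted Y value is formed; the text states only the offset itself.

```c
/* Total memory in use: data received at the input port minus data
 * sent at the output port, up to the current time. */
long total_memory_used(long total_received, long total_sent)
{
    return total_received - total_sent;
}

/* Assumed Y value of a trace point: the processor's offset on the
 * Y-axis (my_id * M_range) plus the amount of data in use. */
double trace_y(int my_id, double m_range, double data_used)
{
    return my_id * m_range + data_used;
}

/* A processor's trace stays inside its own band only when the data
 * in use does not exceed M_range. */
int fits_in_band(double data_used, double m_range)
{
    return data_used <= m_range;
}
```

For example, with M_range set to 1000, a processor with my_id 2 holding 250 units of data would be plotted at 2250 under this assumption.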

Buffer Size Specification

The buffer size is controlled by global parameters set in the parameters.sim file or by device attributes.

When no memory/buffer attributes are used, the generic_pe model uses the global parameters set in parameters.sim:

If you want to override the settings of these parameters, use either global variables or instance attributes via the GUI with the following respective names:

Infinite_Mem is a switch that disables the limits on the buffer size so that the performance of simulations with finite buffer sizes may be compared to simulations with infinite buffer sizes. It also bypasses potential deadlocks in the simulation. It defaults to 0, which leaves the buffer limits in effect. When it is set to 1, infinite buffer sizes are assumed.

Live XGRAPH Display

To let the user dynamically view the XGRAPH equivalent of the ProcTline and Spider plots while the simulation is running, the core models use system sockets. For a reference on the use of sockets with XGRAPH see: Live XGRAPH Display.

There are two ways to activate the live XGRAPH plot. One is by using the simulation command line with the -S option:

        > sim.exe -S socket_number
When no socket_number is specified, a default value of 13330 is used. Valid socket numbers range from 1000 to 16383. For successive simulations, use different socket numbers because the system may not yet have released the socket used by the previous simulation.

The second way to activate the live XGRAPH plot is to set the attribute:

        plot_socket_number = socket_number

The attribute may be set in the GUI globally or on the Monitor device instance. When relying on the attribute, do not use the -S option on the sim.exe command line, because the -S option overrides the socket number set by the attribute.
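
The precedence between the -S option and the attribute can be sketched in C. The function resolve_plot_socket and its argument convention (0 meaning "not given") are hypothetical; only the default value, the valid range and the override rule come from the text.

```c
#define DEFAULT_PLOT_SOCKET 13330
#define MIN_PLOT_SOCKET     1000
#define MAX_PLOT_SOCKET     16383

/* Decide which socket number the live XGRAPH plot uses.
 * has_S:      nonzero when -S appears on the sim.exe command line
 * s_value:    number given after -S, or 0 when none was given
 * attr_value: plot_socket_number attribute, or 0 when not set
 * Returns the socket number, 0 when the live plot is not requested,
 * or -1 when the chosen number is outside the valid range. */
int resolve_plot_socket(int has_S, int s_value, int attr_value)
{
    int n;

    if (has_S)
        n = (s_value != 0) ? s_value : DEFAULT_PLOT_SOCKET;
    else if (attr_value != 0)
        n = attr_value;
    else
        return 0;

    if (n < MIN_PLOT_SOCKET || n > MAX_PLOT_SOCKET)
        return -1;
    return n;
}
```

In this sketch, a -S option with a number always wins over the attribute, and a bare -S falls back to the default of 13330.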

When the simulation is invoked, the simulation window showing the architecture opens first. After the user clicks on the Run/Cont button, an empty XGRAPH plot opens and the simulation halts. This allows the user to readjust the positions of the overlapping simulation and XGRAPH windows. The user then clicks on the Run/Cont button again to run the simulation while displaying live XGRAPH data.

The range of the XGRAPH plot is set by default attributes in the Monitor:

        plot_xrange = 10
        plot_yrange = Number_of_devices

The Number_of_devices is the number of devices in the architecture. These attributes may be changed by the user in the same way all other attributes are set.

Displaying Selected Routing Paths

You can display routing paths interactively by clicking on selected source and destination devices. For further information, follow the instructions under Displaying Selected Routing Paths.