CSIM as a Performance Modeling Tool: An Overview

Dr. K. Burgess, Dr. R. Artz

This document briefly covers our use of CSIM to construct performance models. This introduction covers the installation of CSIM and, from a user's viewpoint, the construction and interpretation of a performance model. Details of the CSIM modeling language are left to the full CSIM documentation at csim.com. However, this document includes a summary of, and supplemental information about, both the CSIM distribution and some of the user community's performance modeling additions.

1. Choosing CSIM for Performance Modeling
2. CSIM Performance Model Distribution: Setup & Primer
- 2.1 Installation
- 2.2 Building Hardware and Software Models
- 2.3 Running Simulations
- 2.4 Analyzing Results
3. An Overview of the CSIM Tools
4. A Performance Model Demonstration
Appendices

1. Choosing CSIM for Performance Modeling

Modeling is meant to answer questions about an underlying system and to make predictions about its behavior. In many cases, those questions involve components whose existence is speculative and for which few specifications exist. Often the software driving the system is not only unwritten but partly unspecified. Further, the modeled system may be the cumulative effort of geographically separate teams from different companies. With so many details missing, the modeling task requires a relatively coarse modeling resolution, a modeling system that is robust to changing specifications, and a model description that can be decomposed into separately alterable sections, each representing a piece of the final system that will be developed by one of the various teams.

Why Choose CSIM?

As members of the user community, we have considered several modeling languages and tools for these sorts of tasks. CSIM meets all of the above requirements and, with support from CSIM.com, has met all of our needs. First, CSIM's design easily allows hardware and software models to be constructed separately. Software models can be 'hosted' on a changing hardware model through a mapping file that associates software processes with hardware devices (details below). This means that software can easily be reallocated to different hardware. It also means that software designers at one location can 'test' (coarse-level) software performance on earlier hardware models even as those hardware models are being modified at another location. Likewise, hardware modelers can test new hardware models using existing, though not necessarily up-to-date, software models.

Second, in contrast with the other modeling languages we considered, CSIM models can be constructed more naturally top-down than bottom-up. Hardware models can be specified first at the system level, then for each system module, and finally at the level of each device within the modules. Similarly, software models can be specified in terms of a computer software configuration item (CSCI), followed by its constituent software components (CSCs), and finally in terms of software units (CSUs). The top-down approach mimics the way large systems are typically designed; above all, it means that bottom-level details can be changed as new information becomes available without major revisions to the model.

Third, the CSIM source code is designed so that one or two individuals can maintain it.

In general, one of CSIM's greatest strengths lies in its role as a Systems Engineering tool. Constructing hardware and software models requires communication between the various software and hardware teams, and the model itself becomes a way to accumulate and disseminate information about hardware and software structure and interaction. Design and performance weaknesses can be exposed at the earliest stages of development, allowing corrections when they have the lowest impact on cost and schedule.

Why Not Choose an Alternative?

We are concerned with situations where both the architecture and the software design may change in significant ways before the program ends. Without design and device details, lower-level models tend to provide needlessly precise information about models that are themselves incorrect. The alternative modeling languages we considered also tightly mix software with hardware design. This creates configuration management nightmares when teams across the country have to incorporate changes to their modeled components simultaneously. Coupling software and hardware also makes it difficult to swap alternative scenarios in and out, such as modeling a hardware failure or modeling situations that include or exclude various software functions. The alternative tools do allow the development of software schedulers and message routers, but constructing them would be awkward and very time-consuming.

Efficiency. By way of example, one system model we developed, together with the entire CSIM tool package, fits onto a 1.4 MB floppy disk and can be run on just about any standard computer. Simulating one-tenth of a second of this model's operation takes approximately three minutes.

In contrast, a model of only a portion of the same system was created in another proprietary tool. That tool requires a product installation from CD, a 100 MB Zip disk to hold the compressed models, and a 200+ MHz Pentium with 250 MB of RAM to run. Typical run times are measured in days, even on a PC much more powerful than the one described above.

Modeling Goals

The primary goal considered here is to provide a performance modeling tool that helps our Systems Integration team verify that all system components are likely to work together. This includes looking at metrics such as signal contention, message transfer delays, and component utilization under various scenarios, and in particular identifying any system bottlenecks. Performance models can also be used to analyze system responses to component failures and certain high-level software failures, to track changes in system design, and to test hypothetical alternative architectures and devices.

2. CSIM Performance Model Distribution: Setup & Primer

CSIM is distributed by CSIM.com. Two other organizations redistribute CSIM with customized versions of performance models. This section lists and explains the steps required to unpack and run CSIM. It also describes the run-time and post-processing analysis tools and gives points of contact for getting help. The descriptions in this section require minimal knowledge of UNIX and no knowledge of CSIM. An overview of CSIM's tools and of the structure of hardware and software models is given in the following sections.

The standard CSIM distribution consists of a single file named csim_install_xx.tar, where xx is the version number. The contents of this self-contained, compressed archive file include:

- The most recent version of the CSIM tool distribution, supporting Sun Solaris, PC-Linux, SGI-Irix, HP-UX, and Mac-OSX;
- Several libraries of reusable, general-purpose models; and
- A set of simple CSIM demonstration models and examples.

The instructions in this section cover unpacking the CSIM distribution, building and running a model, and analyzing the simulation output. The unpacking instructions below can be followed whether or not an earlier distribution is already installed. The unpacking procedure will not overwrite any user-created files; however, it will overwrite any standard distribution files that have been modified.

2.1 Installing or Updating CSIM, and Setting Environment

Note: Sites with a shared file system need to install only one copy of CSIM, which all users can share. In other words, only one person at each site, not each user, needs to perform this installation. We recommend appointing this person as custodian of the installation. The custodian(s) can efficiently handle tool updates and configure individual users' environments. (There is no harm in installing multiple copies; they are simply less efficient to maintain.)

(1) Recommendation: Use a C shell or compatible shell; tcsh is preferred. If it isn't your default shell, simply type tcsh or csh. (Although any shell can be used with CSIM, the instructions here are written for, and have been tested in, tcsh and csh.)

(2) Go to the directory where you wish to install CSIM.
Example:
cd /proj/xx

(3) Type:
tar xvf install_v*.tar
then type:
gunzip -r csim

This will unpack and uncompress the CSIM files into a directory called csim under your current directory (i.e., /proj/xx/csim). This directory is the CSIM root location. Its contents will have a directory structure similar to the one shown in the appendix CSIM Distribution Directory Tree.

Now you have installed the CSIM files. Next, you need to adjust the setup according to the environment at your site.

(4) Edit the csim/tools/setup file. Look for the keyword "CSIM_ROOT" and change its directory path to reflect your installation location. Also look for your machine's "CSIM_C_COMPILER" variable and, if necessary, change any "gcc" entries to the command for your local C compiler (for example, "cc").

(5) If your local text editor is not the stated default ("textedit" or "nedit", depending on the platform), you may want to edit the csim/tools/platform/gui_setups file, where 'platform' corresponds to your system type. A common alternative, "xterm -e vi", works on most systems; it launches the vi editor within an xterm window.

(6) Source the csim/tools/setup file.
Example: source /proj/xx/csim/tools/setup
"source" is a csh/tcsh built-in command that executes a script file in the current shell. In this case, it tells your shell where the CSIM tool executables for your platform are located and defines convenient aliases for them.

This final step (6) is the only one that must be repeated for each user and each session. For convenience, we recommend placing the source command in the .cshrc or .tcshrc file in your home directory so that it runs automatically whenever a new shell starts.

2.2 Building Hardware and Software Models

The CSIM distribution includes a number of demonstrations in the directory csim/demo_examples. This section focuses on the performance modeling demonstration discussed further in More on the Performance Model Demonstration. This demonstration includes two 'hardware' models and two 'software' models and resides in the directory 'csim/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling'. Below are instructions for building the simpler of the two software models on the simpler of the two hardware descriptions.

Basic Setup

(1) Make sure you have sourced the csim/tools/setup file, as described in step (6) of the installation above.
(2) Create a directory where you have write access. Then copy the example performance model directory contents to that directory:
Example:
mkdir ~/myperfmodel
cp -r $CSIM_ROOT/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling/* ~/myperfmodel

(3) Move into your directory where the models are:
Example:
cd ~/myperfmodel

Build the Hardware Model

(4) Open the GUI on the arch1.sim file.
Type:
gui arch1.sim
This will open the simpler of two 'hardware' architecture models in the CSIM graphical tool and display a diagram similar to that in Figure 1.

Figure 1 - arch1.sim. A Simple Hardware Model.
(5) Choose the menu item "Tools->Build Simulation" to compile the HW architecture model.
(6) Choose the menu item "Tools->Build Routing Table" to construct the network routing information tables.
Alternative Build Note: Steps four through six can be replaced by the following three non-graphical commands, executed from the C shell:
csim arch1.sim
sim.exe -netinfo
router netinfo
Nothing about CSIM requires use of the GUI. Non-graphical commands are often quicker.

Build the Software Model

(7) Choose "File->Open->Open a new file" and choose the file "flow1.dfg". This provides a graphical view of the simpler of two SW models and yields a diagram similar to that in Figure 2.

Figure 2 - flow1.dfg. A Simple DFG.
(8) Choose the menu item "Tools->Build DFG SW" to build the simple 'software' model.
(9) Choose the menu item "Tools->Plot Ideal TimeLine" to see the ideal timeline.

2.3 Running Simulations

(1) The performance modeling simulation can be viewed from the perspective of either the hardware or software. By default, the simulation is viewed from the hardware perspective and the default graphical window depicts the hardware architecture: a graphical description similar to Figure 1.

This can be changed to view the simulation from the data flow graph (DFG) representation (a view similar to that in Figure 2). To accomplish this, type setenv SIM_GRAPH flow1.dfg before starting up the GUI.

The simulation view can be changed back to the hardware perspective (before starting the GUI) by typing unsetenv SIM_GRAPH or equivalently setenv SIM_GRAPH arch1.sim.

(2) If not in the GUI, type gui arch1.sim or gui flow1.dfg. Choose the menu item Tools->Run Simulation. This step could equally have been executed from the command line via sim.exe.

(3) Optionally choose Animation->Animation Types->Nodes: Concurrent Activities and/or choose Animation->Animation Types->Links: User/Model Defined. These commands override default device and node coloring discussed in Run-time analysis below and are less useful for models (such as these) that use "core_models" components.

(4) Click on "Run/Continue".

(5) When asked (at the UNIX shell window), choose a verbosity level. This verbosity level controls the detail of command-line feedback the simulation will provide about the state of messages in the simulation. Zero is typically chosen to minimize output and speed up the simulation.

2.4 Analyzing Results

The simulation shows the flow of messages from creation to destination by coloring the various device and DFG objects. The simulation can be slowed down by adjusting the "Speed Slowdown" slider, or stopped, stepped, or crawled through using the simulation control panel buttons. These and other GUI-accessible controls are described in CSIM's documentation (see The CSIM Graphical Simulator).

When viewing a general simulation or plotted output, colors have the following meanings:

Network Switch Links are:

Purple when unused. They are thin before their first use or when a window is refreshed. They become thick after any use and remain that way until the window is refreshed (usually by changing views).
Yellow on each segment where a packet request has been made and where each wormhole has been established. The yellow backtracks, leaving purple behind as it retreats, all the way to its destination if the wormhole is denied.¹
Orange is shown for control messages (such as token transfers) in network models.
Blue when the first packet of a given message block passes across a link.
Green if data flows down a segment unencumbered.
Red to show contention if data is already flowing down a segment when a second data stream attempts to create a wormhole down that same segment.
HW Generic XBar Links are:

Purple when unused.
Green when data is flowing through the device. (There is no contention on this XBar model.)
Network XBars and NIC/NIU devices are:
Black (uncolored) when unused.
Blue when the device is processing its first packet of a given message block.
Green if handling a control message (not implemented consistently).
Red when the device is being used by further packets.
HW Devices, HW Modules, SW Nodes, and SW Supernodes are:

Light cyan with a thin border for devices and nodes and a thick border for modules and supernodes.
Blue when they initiate the first packet of a given message block.
Other devices listed in the timeline plots, including the generic_pe and multi_priority_pe, are:

Color-coded by a mapping from the LAST letter or digit of the data flow diagram node name that is providing the device's instructions. The mapping is made according to Table 1 (copied from subroutines.sim).
Timeline plot devices supporting an unnamed monotone (an alarm that goes off periodically, say every 33.3 ms, to restart a data flow) are pink.

Table 1 - DFG Task Node Name Color Code Mapping

SIM-Display Panel     Last Character     XGRAPH Color
colorize() value      of Task Name       colormap() value
0   Black             --None--           0   Black
1   Fuchsia           8,F,T,b,p          12  Fuchsia
2   Blue              9,G,U,c,q          3   Blue
3   Cyan              H,V,d,r            9   Cyan
4   Navy              I,W,e,s            14  Navy
5   Yellow            J,X,f,t            7   Yellow
6   Dark-Gray         K,Y,g,u            11  Dark-Gray
7   Gray              0,L,Z,h,v          10  Light-Gray
8   Red               1,M,i,w            2   Red
9   Green             2,N,j,x            4   Green
10  Violet            3,A,O,k,y          5   Violet
11  Orange            4,B,P,l,z          6   Orange
12  Gold              5,C,Q,m            15  Gold
13  Pink              6,D,R,n            8   Pink
14  Dark Cyan         7,E,S,a,o          13  Aqua
15  White             --None--           1   White

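Table 1 follows a regular pattern: for every character it lists, the character's ASCII code modulo 14, plus 1, gives the colorize() value (for example, 'A' is ASCII 65, and 65 mod 14 = 9, giving 10, Violet). The Python sketch below reproduces the table on that basis. The formula is our own inference from the table, not code taken from subroutines.sim, and the function names are hypothetical; characters other than ASCII letters and digits are rejected because Table 1 assigns them no color.

```python
# Sketch: reproduce Table 1 (DFG task-name color coding).
# ASSUMPTION: the rule (ASCII code % 14) + 1 is inferred from Table 1; it matches
# every character listed there but is not copied from subroutines.sim.

# colorize() value -> (SIM-Display panel color, XGRAPH colormap() value, XGRAPH color)
COLOR_TABLE = {
    0: ("Black", 0, "Black"),
    1: ("Fuchsia", 12, "Fuchsia"),
    2: ("Blue", 3, "Blue"),
    3: ("Cyan", 9, "Cyan"),
    4: ("Navy", 14, "Navy"),
    5: ("Yellow", 7, "Yellow"),
    6: ("Dark-Gray", 11, "Dark-Gray"),
    7: ("Gray", 10, "Light-Gray"),
    8: ("Red", 2, "Red"),
    9: ("Green", 4, "Green"),
    10: ("Violet", 5, "Violet"),
    11: ("Orange", 6, "Orange"),
    12: ("Gold", 15, "Gold"),
    13: ("Pink", 8, "Pink"),
    14: ("Dark Cyan", 13, "Aqua"),
    15: ("White", 1, "White"),
}


def colorize_value(task_name: str) -> int:
    """Return the colorize() value (1-14) implied by the last character of a task name."""
    if not task_name:
        raise ValueError("empty task name")
    last = task_name[-1]
    if not (last.isascii() and last.isalnum()):
        raise ValueError(f"Table 1 assigns no color to {last!r}")
    return ord(last) % 14 + 1


def task_colors(task_name: str):
    """Return (panel color, XGRAPH colormap() value, XGRAPH color) for a task name."""
    return COLOR_TABLE[colorize_value(task_name)]


if __name__ == "__main__":
    for name in ("FFT_8", "filter0", "taskA", "node7"):
        print(name, colorize_value(name), task_colors(name))
```

For example, a task named taskA would be drawn in Violet on the SIM-Display panel and with XGRAPH colormap() value 5.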

In addition to an object's color, a link's simulated values can be obtained during simulation runtime by choosing a link and then choosing "Options->Examine Link". Similarly, a list of all active links can be obtained by choosing "Options->List Active Links".

Post-processing Analysis

There are a number of ways to analyze simulation output. For instance, any number of specialized 'hooks' can be placed into the 'C' code of hardware devices to write information to an external file or to provide graphical information at run time (e.g., a 'meter'). Below are six methods that provide post-processed information from existing simulation output. Most of the post-processing analysis tools use a plotting package called "xgraph", which plots data files created by the simulation.

(1) To view the run-time-generated process timeline, choose "Tools->Plot Proc Timeline" from the GUI. Equivalently, type "xgraph ProcTline.dat &" from the command line.

(2) To view the run-time-generated timeline as well as the network utilization paths, choose "Tools->Plot Comm+Proc Tline" from the GUI. Equivalently, type "xgraph Spider.dat ProcTline.dat &" from the command line.

(3) To view system-wide contention levels, type "view_contention LinkTline.dat" from the command line. (This step cannot easily be performed from the GUI.) The view_contention executable creates a file 'LinkTline.hst' that can be viewed graphically with the command "xgraph LinkTline.hst". Other contention analysis options are described at www.csim.com/view_contention.html.

(4) To create a specialized plot of simulation events, use the event tool. Like the contention analysis tool, this involves processing a text file: type "timeline EventHist.dat" and respond to a series of data-generated questions. The results are then viewed by entering "xgraph EventHist.tln". This is a powerful tool but it is also fairly complicated. Instructions for its use are available at www.csim.com/timeline/timeline.html.

(5) View the ASCII file 'summaries.dat' for processor, link, and port utilizations. Note that these statistics are collected between the default times "Time1" = 0 msec and "Time2" = 1,000,000 msec. This time window can be changed by editing the file "csim/model_libs/core_models/parameters.sim".

(6) To create a supplemental event history file, run the simulation at high verbosity and redirect the output to a separate file, as follows:

From the GUI, choose "Tools->Modify Commands->Run Simulation" and alter the command line to be "./sim.exe -V 10 > simoutput.txt &". Alternatively, type "sim.exe -V 10 > simoutput.txt &" directly from the command line. Choose a number more or less than ten to increase or diminish the quantity of information output.
View this (usually enormous) file directly, or use the grep tool to filter the information. For instance, to obtain a file containing all activity on a device 'source', type "grep source simoutput.txt > anotherfilename.txt". To view all of the activity involving a message with ID number 2, type "grep mid=2 simoutput.txt | more". Note that grep matches substrings, so this pattern also returns lines for mid=20, mid=21, and so on.
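The grep commands above match plain substrings. As an alternative, the Python sketch below filters a verbose run log by device name (still a substring match) and by message ID as a whole mid=N token, so message 2 is not confused with message 20 or 21. It assumes only that message IDs appear in the log as mid=N, as the grep example implies; the function name and the sample lines are hypothetical, not real CSIM output.

```python
# Sketch: filter a verbose simulation log (e.g. simoutput.txt) more precisely than grep.
# ASSUMPTION: message IDs appear as "mid=<number>" tokens; the rest of the line
# format is unknown here, and filter_log is a hypothetical helper.
import re
from typing import Iterable, List, Optional


def filter_log(lines: Iterable[str], device: Optional[str] = None,
               message_id: Optional[int] = None) -> List[str]:
    """Keep lines that contain `device` (plain substring, like grep) and/or the
    whole token mid=<message_id> (so mid=2 does not also match mid=20)."""
    mid_re = re.compile(rf"\bmid={message_id}\b") if message_id is not None else None
    kept = []
    for line in lines:
        if device is not None and device not in line:
            continue
        if mid_re is not None and not mid_re.search(line):
            continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "12.5: source sends mid=2 to sink",
        "13.0: source sends mid=20 to sink",
        "14.1: sink receives mid=2",
    ]
    print(filter_log(sample, message_id=2))
```

Because the function accepts any iterable of lines, it can be handed an open simoutput.txt file object directly, which avoids loading the whole (usually enormous) file into memory.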

A. Getting Help

A well-written manual is available online at www.csim.com/. Answers to frequently asked CSIM questions are available at www.csim.com/faq.html and www.csim.com/faq2.html.

CSIM has an "Issues Database" for users. A number of issues and resolutions have been posted there.

For general CSIM modeling questions contact admin@csim.com.

3. An Overview of the CSIM Tools

CSIM is a discrete event simulator that includes three primary model construction tools, a number of simulation analysis tools, and a set of component and system models. A thorough description of the primary and analysis tools can be found at http://www.csim.com/. The following sections provide only an overview of these tools, along with a description of CSIM's device models.

3.1 The CSIM PreProcessor

The backbone of CSIM is the CSIM preprocessor ("csim"), described thoroughly in www.csim.com/simulator/csim_doc.html. The preprocessor is a primary CSIM tool that converts CSIM source code into 'C' source code (producing an ASCII file called "out.c"). Any standard 'C' compiler, e.g. "gcc", can compile this source code, which makes CSIM fairly independent of machine and operating system. In addition, all CSIM modeled components and support files are stored in ASCII text files. These flat files are easily read and processed with standard editors and other operating system text-processing tools (notably "grep" and "perl"). The CSIM code handled by the CSIM preprocessor consists of behavioral descriptions (how components behave) and topological descriptions (how components interact).

A. Behavioral Description

The two most basic components of a CSIM model are "devices" and "links". Devices model behavior in terms of messages. Devices either create messages (a message source), process messages (a message sink), or modify messages. In the latter case, messages may be conditionally passed from one link to another through a device (a 'switch'), delayed, altered in size, or tagged with some specific information before being sent on their way, or they may be altered by some combination of these actions. Links describe how that information is passed between devices. Links can be thought of as data pipes where the flow of information is specified by rate, latency, and direction parameters. Further parameters specify a data pipe's queue length and cost (a variable used by routing algorithms; see Router).
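CSIM generates C from its own modeling language, so the following is only a conceptual analogue, not CSIM code. The minimal Python discrete-event loop below shows the device roles just described: a source that creates messages, a device that tags, resizes, and delays them, and a sink that consumes them. The EventQueue class, the 1024-byte message size, the 16-byte tag, and the 2-time-unit processing delay are hypothetical choices for illustration.

```python
# Conceptual analogue (not CSIM code): a tiny discrete-event simulation with
# a message source, a modifying device, and a message sink.
import heapq
import itertools


class EventQueue:
    """Minimal event scheduler: events run in time order; ties run FIFO."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker for equal event times
        self.now = 0.0

    def schedule(self, delay, action, *args):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), action, args))

    def run(self):
        while self._heap:
            self.now, _, action, args = heapq.heappop(self._heap)
            action(*args)


def demo(send_times=(0.0, 5.0), process_delay=2.0, tag_bytes=16):
    q = EventQueue()
    received = []                       # (arrival time, message id, size in bytes)

    def sink(msg):                      # message sink: consume and record
        received.append((q.now, msg["id"], msg["size"]))

    def modifier(msg):                  # modifying device: tag, resize, then delay
        msg["tag"] = "stamped"
        msg["size"] += tag_bytes
        q.schedule(process_delay, sink, msg)

    def source(msg_id):                 # message source: create a 1024-byte message
        q.schedule(0.0, modifier, {"id": msg_id, "size": 1024})

    for msg_id, t in enumerate(send_times):
        q.schedule(t, source, msg_id)
    q.run()
    return received


if __name__ == "__main__":
    print(demo())   # [(2.0, 0, 1040), (7.0, 1, 1040)]
```

The sequence counter in each heap entry keeps events with equal timestamps in the order they were scheduled, so the run is deterministic.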

Many CSIM users do not need to know how to create modeled devices. Instead, users typically need to know about the existing device models and how to combine these devices with links to build a module or system architecture; that is, users typically provide only a model's topology. A summary of the CSIM preprocessor functions that give a modeled device its behavior is provided in the appendix CSIM Device-Level Behavioral Description.

B. Topological Description

A topology is a description of how devices are connected together. CSIM's topology descriptions include inheritance of attributes on objects. The simple topology of the performance demonstration's "arch1.sim" was shown in diagram form in Figure 1. In this figure, one device of type "generic_pe" is given the name "source". This device produces information that is sent out of a port created by the device (labeled "io_port") and across a link to a similar device named "sink". The link limits data flow to a rate of 100 MB/sec and imposes a fixed latency of 1.5 microseconds. The behavior of these two devices is not apparent from the topological description; it is typically described in comments within the file and occasionally in supporting documents such as that provided in Device Models.
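For a rough feel for the arch1.sim link parameters, the Python sketch below estimates when messages sent from source arrive at sink. It rests on assumptions we have not checked against CSIM's link model: the link carries one message at a time in arrival order, each message occupies it for size/rate seconds, the last byte arrives one latency later, and 1 MB = 10^6 bytes. Under these assumptions a message that reaches a busy link must wait, which is a simple form of link contention.

```python
# Sketch of a first-order link model. ASSUMPTIONS: FIFO, one message at a time,
# occupancy = size / rate, plus a fixed latency; 1 MB = 10**6 bytes.
from typing import List, Tuple

RATE_BYTES_PER_S = 100e6    # 100 MB/sec, the arch1.sim link rate
LATENCY_S = 1.5e-6          # 1.5 microseconds, the arch1.sim link latency


def link_arrivals(messages: List[Tuple[float, int]],
                  rate=RATE_BYTES_PER_S, latency=LATENCY_S) -> List[float]:
    """messages: (time ready to send in seconds, size in bytes), in the order
    they reach the link. Returns each message's arrival time in seconds."""
    link_free_at = 0.0
    arrivals = []
    for ready, size in messages:
        start = max(ready, link_free_at)    # wait while the link is busy (contention)
        done = start + size / rate          # link occupied for size / rate seconds
        link_free_at = done
        arrivals.append(done + latency)     # last byte lands one latency later
    return arrivals


if __name__ == "__main__":
    # Two 4000-byte messages offered at t = 0; the second waits 40 us for the link.
    for t in link_arrivals([(0.0, 4000), (0.0, 4000)]):
        print(f"{t * 1e6:.1f} us")          # prints 41.5 us, then 81.5 us
```

Note that under this model the fixed 1.5-microsecond latency is paid once per message, while a second message offered at the same instant waits the full 40-microsecond transfer time of the first.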

Devices and links can be combined into modules that cumulatively are used to describe a system architecture. An example of a module is provided in A More Complicated Architecture File: arch2.sim. This example module exists in the file "arch2.sim" (see Figure 3) and is depicted in Figure 4. As portrayed, this module contains links connecting two external interfaces and three additional devices. A wider border seen graphically in Figure 3 indicates that the "dual_processor" box represents a module and not a device.

The topological descriptions of a module or an architecture (really just a top-level module) are saved as standard ASCII files in Extensible Markup Language (XML) format. For instance, the XML description corresponding to Figure 1 is given in Figure 7 in the appendix. The topological XML information is easily reverse engineered, and there are many times when it is more convenient to alter the XML source file directly with an ASCII editor. However, CSIM provides a GUI tool (discussed below) that automates XML construction; in fact, many CSIM developers never look at a raw XML file.

The second part of CSIM's model topology is "Instance Attributes". These are variable assignments that are inherited by a device (or data flow graph node discussed later) when assigned to a parent module. These attributes and how they are invoked are discussed in www.csim.com/simulator/instance_attributes.html.

3.2 CSIM GUI: Describing 'Hardware' Models and Data Flow Graphs

The CSIM GUI is well described in www.csim.com/gui/gui_doc.html. The GUI provides not only a graphical way of entering the underlying XML topological descriptions for 'hardware', but also a way to graphically describe a primitive software model called a data flow graph (DFG).

By convention, files that describe devices, modules, and architectures are given the extension ".sim". DFG files are typically given the extension ".dfg". Both 'hardware' (.sim) and 'software' (.dfg) files are constructed from the GUI in the same way: as boxes connected by lines. In the 'hardware' case, the boxes represent devices or modules and the lines represent links. In the 'software' case, the boxes represent tasks, also called "nodes" or "supernodes", and the connecting lines represent data, also called "arcs". Figure 1 and Figure 2 show the CSIM GUI's graphical descriptions of a sample hardware topology and a sample software topology.

The CSIM GUI keeps track of included ('hardware') devices by their name, type, and topological location. A device's name, along with the name of any parent module, uniquely identifies it in the simulation. A device's behavior is associated with its box through the device's "type". The relative connection of boxes by lines, the absolute location of devices and modules, and the name and type of each device are stored in XML format. This XML information is then passed to the simulation by way of the CSIM preprocessor.

The GUI employs a simple mouse-based point, click, and drag methodology for creating boxes and links. A device's name and type are provided via the GUI's "Open Properties" button. As mentioned in the prior section, the topology of a 'hardware' architecture is primarily provided by the links attaching the various devices. A link acts upon messages passed between devices (or between devices in modules) by controlling the data rate, the latency, the direction(s) in which messages may pass, the queue length, and the cost per segment. This information is accessed for each link via the link's "Open Properties" button.

Data flow graphs, like a system's 'hardware' description, are constructed from boxes and lines by the GUI. However, only a default node type is typically used. The GUI provides five specific node attributes to control the behavior of this default node. Those attributes are:

Instance Name: A node's (unique) name.
Type_Name: A node's type (the default type if the field is empty), or "NO_PE" (described in www.csim.com/sched/scheduler.html#anch2).
Map PE: A description associating the software action with one 'hardware' device, or a group of devices, capable of processing 'software' commands.
Compute Time: A delay representing the time from when all input arcs 'fire' until all output arcs are 'triggered' (discussed below).
Iterations: A counter that repeats the compute time delay and triggering specifications.

Every DFG must have a unique node named "START" and should have at least one node named "EXIT". The "START" node starts up the 'software'. An "EXIT" node stops execution of the data flow graph.
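These naming rules lend themselves to a quick sanity check before a DFG is built. The Python sketch below is a hypothetical helper of our own, not a CSIM tool; it checks a flat list of node instance names, treating a missing or duplicated START node as an error and a missing EXIT node only as a warning, since the text says a DFG should (rather than must) have one.

```python
# Hypothetical pre-check (not part of CSIM): START/EXIT naming rules for a DFG.
from typing import Iterable, List


def check_start_exit(node_names: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the rules are satisfied."""
    names = list(node_names)
    problems = []
    n_start = names.count("START")
    if n_start != 1:
        problems.append(f"error: expected exactly one START node, found {n_start}")
    if "EXIT" not in names:
        problems.append("warning: no EXIT node (at least one is recommended)")
    return problems


if __name__ == "__main__":
    print(check_start_exit(["START", "filter", "EXIT"]))   # []
    print(check_start_exit(["filter", "EXIT"]))            # one error
```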

From a performance modeling perspective, nodes (depicted by boxes in the GUI) can be thought of as representing event-driven software processes. Upon occurrence of an event, such a software process comes into existence and gains some control of a processor's resources (a processor listed in the "Map PE" field). The activation event can be the passage of some amount of scheduled time (a CSIM "monotone"), or the existence of a trigger such as the arrival of some data. Once such a software process gains control, it consumes an amount of system resources (specified by the node's "Compute Time"). After the process has completed, it passes control to one or more other nodes representing other processes. In CSIM, the passing of control between processes is modeled through use of arcs (depicted by lines connecting boxes in the GUI).

Arcs connect two nodes acting in the role of source and sink. An arc's actions are controlled by three parameters:

Produce Amount (P): The number of bytes placed into an arc's buffer when its source node has 'triggered', i.e., when the source node has finished its "Compute Time" delay. If a node is the source of more than one arc, each arc's produce amount is placed into that arc's buffer.
Threshold Amount (T): The number of bytes required in an arc's buffer before the sink node can begin to consume its processor's resources. When a sink node depends on more than one arc for control, the threshold of each such arc must be met or exceeded.
Consume Amount (C): The number of bytes removed from the arc's buffer by the sink node once the sink node has begun to consume its processor's resources.

Arcs dictate the passing of data or control between nodes by specifying the amount of data that passes between nodes and by specifying which nodes become active. Nodes are mapped to 'hardware' devices and an active node causes its associated device to consume resources (CPU). Consequently, arcs dictate which processors consume resources and dictate the size of messages passing over links between these associated devices. One interpretation of DFG components is that nodes act to consume CPU, and arcs act to spawn processes and move data between nodes. Thus much of a DFG's richness lies in its arcs.
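The arc rules can be exercised with a small, untimed sketch. The Python below tracks each arc's buffer: a finishing node adds P bytes to each of its output arcs, a node may start only when every input arc holds at least T bytes, and starting removes C bytes from each input arc. Compute Time, Iterations, and PE mapping are deliberately ignored, and the class and method names are hypothetical. In the example, node A passes only 50 bytes per firing toward EXIT, whose threshold is 100, so EXIT fires only after START has been triggered twice (as a repeating monotone might do).

```python
# Untimed sketch of the DFG arc rules (P = produce, T = threshold, C = consume).
# Names are illustrative; compute time, iterations, and PE mapping are ignored.
from dataclasses import dataclass
from typing import List


@dataclass
class Arc:
    src: str
    dst: str
    produce: int        # P: bytes added when the source node finishes
    threshold: int      # T: bytes needed before the sink node may start
    consume: int        # C: bytes removed when the sink node starts
    buffer: int = 0


class DFG:
    def __init__(self, nodes: List[str], arcs: List[Arc]):
        self.nodes = nodes
        self.arcs = arcs

    def _inputs(self, node):
        return [a for a in self.arcs if a.dst == node]

    def _ready(self, node):
        ins = self._inputs(node)
        # A node with no input arcs (e.g. START) is only started externally.
        return bool(ins) and all(a.buffer >= a.threshold for a in ins)

    def _finish(self, node):
        for a in self.arcs:
            if a.src == node:
                a.buffer += a.produce

    def trigger(self, node, max_firings=1000):
        """Externally complete `node` (e.g. START), then fire every node that
        becomes ready, in list order, until none is. Returns the nodes fired."""
        self._finish(node)
        fired = []
        progress = True
        while progress:
            progress = False
            for n in self.nodes:
                if self._ready(n):
                    for a in self._inputs(n):
                        a.buffer -= a.consume     # start: consume from each input arc
                    self._finish(n)               # (compute time would elapse here)
                    fired.append(n)
                    progress = True
                    if len(fired) > max_firings:
                        raise RuntimeError("firing limit exceeded; check C vs T values")
        return fired


if __name__ == "__main__":
    g = DFG(["START", "A", "EXIT"],
            [Arc("START", "A", produce=100, threshold=100, consume=100),
             Arc("A", "EXIT", produce=50, threshold=100, consume=100)])
    print(g.trigger("START"))   # ['A']          EXIT arc holds 50 of 100 bytes
    print(g.trigger("START"))   # ['A', 'EXIT']  EXIT arc reaches its threshold
```

The firing limit guards against arcs whose consume amount is zero while the threshold is already met, which would otherwise let a node fire forever in this untimed model.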

Analogous to hardware modules, DFGs can have supernodes. The box "SuperNode" depicted in Figure 6 from the file "flow2.dfg" represents one such supernode. Like modules, supernodes are indicated graphically by a heavy border and contain an external interface.

The GUI primarily simplifies the creation of DFGs, architecture and modules. However, the GUI also has a "Tools" menu that

Executes the CSIM preprocessor ("Build Simulation"),
Builds a routing table,
Runs the static Scheduler ("Build DFG SW"), and
Runs some xgraph commands (discussed below) to perform pre-process and post-process simulation analysis.

The hooks for these commands are in an ASCII file saved into a machine-dependent tools subdirectory; e.g., csim/tools/sun_solaris/gui_setups.

3.3 SCHEDULER - Tool for Software Data Flow Graphs (DFGs)

The Scheduler is documented in www.csim.com/sched/scheduler.html. Analogous to the CSIM preprocessor, the Scheduler processes a DFG's XML description. However, instead of producing 'C' source code, the Scheduler produces lists of 'software' programming commands, called pseudo-code, stored in ASCII files. These command files are tailored for devices that internally indicate a need for such instructions. The command files are called ".prog files" and are processed by these specified 'programmable' devices at simulation run time. CSIM describes the DFG and use of the Scheduler as follows:

A DFG describes the tasks and inherent data dependencies of an application; in particular, software applications. The SCHEDULER utility accepts DFG files and after partitioning, allocating, and scheduling the flow-graph nodes, produces corresponding software-programs for each of the targeted processor elements (PEs).

When triggered, a DFG arc indicates that a "Consume Amount" number of bytes should be transferred between two nodes. The Scheduler takes this event and instructs the first node's associated device to send a message. The message created includes the number of 'consumed' bytes and the address of the device associated with the destination node. However, if two software nodes are mapped to the same device, then the entire message process is skipped. In effect, messages passed between software nodes mapped to the same device act as if they arrive instantaneously.

The Scheduler can be invoked in two different ways: statically and dynamically. In its static form, the Scheduler typically produces an ASCII file for every programmable device in an associated architecture. The commands in these files indicate the receiving and sending of blocks of data (and for sending, the identification of the receiving device), device delays (corresponding to the consumption of CPU time), and other related commands itemized below. These ASCII files are automatically placed into the (current) model directory but can easily be redirected to another directory. By convention, we place these instruction files into a "programs" subdirectory. This can be accomplished by redefining CSIM's default "sched" and "stim" aliases and then pointing the simulation at this directory via the following three commands:

alias sched "`alias sched` -o ./programs"
alias stim "`alias stim` -d -o ./programs"
setenv PMOD_PROG ./programs

The Scheduler's dynamic form produces the same commands but feeds them into a runtime buffer during the simulation instead of into pre-processed files. The dynamic form, described at www.csim.com/sched/dyn/index.html, is more flexible and supports a wider array of 'software' models.

3.4 ROUTER

A compiled CSIM simulation necessarily has the ability to interpret its own topology description. CSIM exploits this ability by using the simulation itself to produce a "netinfo" file, which assigns a logical ID (an integer) to every device, link, and device type used in the simulation. CSIM's router writes into the file "netinfo.net" a list of all connections (links) between neighboring devices, together with the user-entered 'cost' of traversing each link. It then uses this information to determine the best pathways from each device to every other device in the architecture and places that information into a "netinfo.rte" file. This routing information is read into the simulation at initialization time (by subroutines contained in the core_models library file "subroutines.sim"), and routes are chosen at run-time to direct messages sent between devices. The router tool is based on "a breadth-first version of the Dijkstra shortest distance search algorithm." This tool is described at www.csim.com/router/router.html.

3.5 Analysis Tools

As discussed in Post-processing analysis above, CSIM provides a number of analysis tools. Most of these tools operate on output information provided by the devices themselves. Process and data flows, as well as message contention, are viewed through a graphical interpreter called "xgraph". These views are described separately in the CSIM documentation under the Time Line Viewer and Contention Viewer. Further, as discussed in Post-processing analysis, standard UNIX tools can be used to filter and analyze high-verbosity simulation output.

3.6 Device Models

The following models are described here:

**List 1** - Predefined Models
c40.sim, cascade_bus.sim, delay_box.sim, generic_pe.sim, generic_xbar.sim, lanai.sim, latency.sim, lbus.sim, monitor.sim, multi_priority_pe.sim, myrinet_xbar.sim, parameters.sim, race_nic.sim, race_xbar.sim, racepp_nic.sim, racepp_nic_fd.sim, racepp_xbar.sim, racepp_xbar_fd.sim, subroutines.sim, switcher.sim
dynam_sched/dynamic_pe.sim, dynam_sched/dynamic_sched.sim, dynam_sched/SchedRoutines3b.sim
multi_models/monitor.sim, multi_models/multi_task_pe.sim, multi_models/parameters.sim, multi_models/subroutines.sim

4. More on the Performance Model Demonstration

The simple demonstration performance model used above was constructed to highlight CSIM's performance modeling ability. This description assumes a basic understanding of CSIM. An overview of CSIM and links to CSIM documentation were provided in An Overview of CSIM above.

Any CSIM user can build useful performance models using only the CSIM GUI and the device models provided with the CSIM distribution. Such a system typically consists of two user-generated files. The first contains a topological description of the modeled system's hardware. The second contains a data flow graph (DFG) representing the software that runs on the hardware model. By convention, the 'hardware' file is given the extension ".sim" and the 'software' file the extension ".dfg".

One such simple representative system resides in the directory "csim/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling". This directory contains five files and one directory:

arch1.sim: A simple architecture description. See Figure 1.
arch2.sim: A slightly more complex architecture description. See Figure 3 and Figure 4.
flow1.dfg: A simple data flow graph. See Figure 2.
flow2.dfg: A slightly more complex data flow graph. See Figure 5 and Figure 6.
flow2map.csim: A hardware mapping file used by flow2.dfg. See Listing 1.
programs: A directory that can hold simulation instruction files (".prog files") generated by the scheduler. (Refer to Scheduler.)

The remainder of this section describes each of these files and how they can be executed. As will be shown, part of CSIM's power for system performance modeling lies in the ability to apply either DFG (either .dfg file) to either system architecture description (either .sim file).

4.1 A Simple Architecture File: arch1.sim

A diagram of the "arch1.sim" file is shown in Figure 1. In these diagrams, boxes represent devices or modules, and connecting lines represent communications links. Devices are behavior description files written in standard 'C' that contain any number of CSIM constructs (discussed in The CSIM PreProcessor). Modules (shown in Figure 3) are topological descriptions that group devices, (other) modules, and links, and that include one or more ports representing an external interface.

The CSIM event simulator passes messages through links into and out of devices. Links constrain the timing of these messages through user-specified parameters: transfer rate 'R', link latency 'L', link flow characteristics 'D' (full-duplex vs. half-duplex vs. simplex), and message queue size 'Q'. Devices may create, modify, or destroy messages.

Devices can be written by the user or obtained from a library. All devices in Figure 1 through Figure 4 are part of the CSIM distribution and are located in the core_models subdirectory. Both library and user-generated devices can be imported into the model through the GUI by selecting 'File->Import->by Reference to File' and choosing the desired device file. (For a description of the GUI, see The CSIM GUI: Describing 'Hardware' Models and Data Flow Graphs.) Behind the scenes, this GUI import action places reference lines of the following form into the arch1.sim file:

%include ../../core_models/generic_pe.sim

The remainder of the arch1.sim file contains Extensible Markup Language (XML) instructions describing the architecture's topology. These XML instructions are fairly easy to reverse engineer; they record where in the diagram each device, module, and link is located, which behavior description file is associated with each device block, and how the components are connected (via links).

In Figure 1, the device titled "Monitor" reads software instructions generated by a DFG and sets up an environment to track simulation results useful for post-simulation analysis. The "generic_pe" devices named "source" and "sink" process this DFG 'software' to determine when, how large, and how many messages they will create and send out of, or receive from, their port ("io_port"). The generic_pe device, described in Device Models, is one of several devices that can process 'software' files, that is, alter their simulation behavior via runtime instructions. Further, the DFG is only one of several ways that these 'programmable' devices can be controlled. In general, devices containing the CSIM preprocessor instruction "DEVICE_CLASS=(programmable);" are in some way 'programmable'. Such devices include "generic_pe.sim", "multi_priority_pe.sim", "c40.sim", "sharc.sim", "multi_task_pe.sim" and "dynamic_pe.sim".

Concerning Figure 1, a DFG will drive the timing and movement of data between the two specified "generic_pe.sim" devices. All data passed between these devices moves at a transfer rate of 100 bytes/msec, after incurring a latency of 1.5 msec. Because the link is half-duplex ("hdplx"), data can flow in only one direction at a time. The queue length value of one implies that the link will not place unread messages into a buffer. This means that the "sink" must read a message sent by "source" before a second message can be sent.

4.2 A More Complicated Architecture File: arch2.sim

The architecture in Figure 3 includes a module, named "dual_processor", depicted in Figure 4. Modules are graphically differentiated from devices by the width of their (light blue) border and the presence of external ports (depicted as small orange squares). Each external port is named by its attached link, and the connecting link at the higher level uses the same name, e.g., "port1". Modules can be used to apply "instantiation variables" or "instance attributes" to a part of the architecture. These are variables whose values are applied locally to the devices contained within a module. In all other ways, however, devices and links nested in modules act as though the entire architecture were flattened into a single layer. This flattened view can be imposed on an architecture at simulation run-time by clicking once on a module and choosing 'View->Flatten Selected Nodes'.

Figure 3 - arch2.sim. A Slightly More Complex Hardware Model.

Figure 4 - The arch2.sim dual processor module.

The architecture in arch2.sim (Figure 3) is similar to that in arch1.sim (Figure 1). The similarities ensure that DFGs designed for arch1.sim will also operate on arch2.sim with no alterations. The reverse is true as well with a qualification: all DFGs designed for arch2.sim can be re-mapped onto arch1.sim through use of a mapping file. This mapping, in fact, is the purpose of the included file "flow2map.csim" discussed later. As mentioned, the difference between the Figure 1 and Figure 3 architectures is the module named 'dual_processor'. This module contains a point-to-point switch allowing simultaneous information flow over independent pairs of attached links. Thus information can flow from the top-level device "source" to the module device "processor2" at the same time that information is flowing from the module device "processor1" to the top-level device "sink".

4.3 A Simple Data Flow Graph: flow1.dfg

The (software) DFG depicted in Figure 2 can be associated with either of the above (hardware) architecture files (arch1.sim or arch2.sim). CSIM requires a unique START node in every DFG to mark the beginning of a flow of data. The "START" node in this graph is assigned to a top-level 'hardware' device named "source". Conveniently, a 'programmable' (generic_pe) device named "source" exists in both Figure 1 and Figure 3. DFG-generated instructions contained in the connecting arc require the device associated with this START node to move a single byte of data to the hardware device assigned to the DFG process named "Proc1". In this case, the "Proc1" node is assigned to the same hardware device as "START" (i.e., "source"). When two consecutive nodes in a DFG are mapped to the same device, CSIM knows the data stays local to the processor and does not move the data indicated by the intervening arc. (Think about it: if it did, over what path in the associated architecture would a device send data back to itself?) Instead, CSIM ignores the send and receive process specified by the arc and imposes only the delays indicated by the nodes. In this case, the "START" node imposes a zero msec delay, i.e., no delay. The node "Proc1" instructs the hardware device "source" to delay 11 msec and place a single byte of data into a queue (P=1). The delay and deposit of a byte into the queue is repeated 10 times. Whenever the data queue accumulates two bytes (T=2), those two bytes are sent by the hardware device "source" to the hardware device "sink" assigned to the node "Proc2". The device associated with this latter node 'consumes' those two bytes (C=2) and then follows the instructions provided by the "Proc2" node; in this case, it begins by delaying 7 msec. Because 10 bytes are placed sequentially into a queue that is 'triggered' every other byte, the node "Proc2" 'fires' five times.
The simulation controlled by flow1.dfg will end whenever the "EXIT" node is reached.

4.4 A More Complicated Data Flow Graph: flow2.dfg

The DFG file "flow2.dfg" depicted in Figure 5 differs from the simpler file "flow1.dfg" in three primary ways. First, it contains a 'supernode', the DFG analogue of a module. This supernode, called "SuperNode", is depicted in Figure 6. Second, some nodes send messages to and receive messages from more than one node. Third, the nodes are all assigned to indirect variable names such as "Asource". A mapping file, flow2map.csim, handles these assignments and is discussed in the next subsection.

Figure 5 - flow2.dfg: A More Complicated DFG.

Figure 6 - A supernode named "SuperNode" and assigned the type "Butterfly".

The DFG supernode is entirely analogous to the 'hardware' module. The CSIM GUI draws a supernode with a thicker border (see Figure 5). A supernode's interface is drawn by the CSIM GUI as small orange squares located at one end of an arc. The connecting arc names these 'ports', and each port must be attached to an arc in the higher-level DFG. Like the top-level flow graph, supernodes can contain any combination of nodes, arcs, and other supernodes (although recursion is not allowed).

A node receiving data from more than one connecting arc fires only after the queues of all of these arcs have reached their threshold sizes. Thus, in Figure 6, the hardware associated with both "Proc1" and "Proc3" must send two bytes to the hardware associated with "Proc2" before this latter node is triggered. Once triggered, "Proc2" cycles three times between a delay of seven msec and placing a byte into a queue designated for the node named "Proc4". More generally, once a node has completed any specified delay cycle, it places its output bytes into the queues of all attached outgoing arcs. So, for instance, when the node "Proc3" in Figure 6 has delayed five msec, it places a byte into queues designated for both "Proc2" and "Proc4".

4.5 A Node to Device Mapping File: flow2map.csim

As noted in the prior subsection, nodes in the file flow2.dfg are all assigned to indirect variable names: "Asource", "Asink", "Bsource", and "Bsink". The single supernode is given the mapping name "AsuperNode". The CSIM Scheduler (see Scheduler) processes DFGs and associates these mapping names with hardware. (The Scheduler is able to make this association by using a file named "netinfo" produced in processing the associated hardware file.) Nodes that are nested in a supernode are given a mapping name that is the concatenation of the supernode mapping name, the delimiter "/", and the nested node mapping name. Thus the DFG described by flow2.dfg will yield four node-to-device mappings: "Asource", "Asink", "AsuperNode/Bsource", and "AsuperNode/Bsink".

The hardware devices to which these nodes can be mapped are:

From arch1.sim (Figure 1): "/source" and "/sink". (Top-level names are preceded by the "/" delimiter.) The device "/Monitor" of type "Monitor" will not handle 'software' instructions and so would never be assigned a node.
From arch2.sim (Figure 3 and Figure 4): "/source", "/sink", "/dual_processor/processor1" and "/dual_processor/processor2".

The missing link is an association between each DFG mapping name and a device name. This association is provided by the file flow2map.csim, given in Listing 1. The file must be edited when moving flow2.dfg's node assignments from arch1.sim to arch2.sim (or back). In particular, when the variable "usearch1" is defined (as it is in Listing 1), the software node "AsuperNode/Bsource" is assigned to the device "/source" found in arch1.sim. Otherwise, it is assigned to the device "/dual_processor/processor1" found in arch2.sim.


             Listing 1 - flow2map.csim

	<xml version="1.0" standalone="yes">
	<csim_sw_file>
	CGUIformatVersion 1.850000
	%define usearch1
	%ifdef usearch1
	  macro Asource             = /source
	  macro Asink               = /sink
	  macro AsuperNode/Bsource  = /source
	  macro AsuperNode/Bsink    = /sink
	%endif
	%ifndef usearch1
	  macro Asource             = /source
	  macro Asink               = /sink
	  macro AsuperNode/Bsource  = /dual_processor/processor1
	  macro AsuperNode/Bsink    = /dual_processor/processor2
	%endif
	</csim_sw_file>

4.6 Running the Demonstration Performance Models

All four combinations of running either DFG model ("flow1.dfg" or "flow2.dfg") on either architecture file ("arch1.sim" or "arch2.sim") will work. The following steps extend the build-and-run descriptions given in Instructions to build a Performance Modeling Demo and Instructions to run the simulation.

Type "source ../../models/csimsetup".
Type "gui arch1.sim" or "gui arch2.sim".
Choose the menu item "Tools->Build Simulation" to compile the HW architecture model.
Choose the menu item "Tools->Build Routing Table" to construct the network routing information tables.
If you plan to use the DFG "flow2.dfg", then edit the mapping file "flow2map.csim". If you previously chose "arch1.sim", make sure the file contains the entry "%define usearch1"; this enables the first set of mappings in Listing 1. Otherwise (if you previously chose "arch2.sim"), remove this line, replace it with the 'C' comment "/* %define usearch1 */", or replace it with the entry "%undef usearch1" to obtain the second set of mappings in Listing 1.
Choose the menu item "File->Open->Open a new file" and then choose the file "flow1.dfg" or "flow2.dfg". This step is equivalent to exiting the GUI (choosing "File->Exit") and typing "gui flow1.dfg" or "gui flow2.dfg".
Choose the menu item "Tools->Build DFG SW" to build the SW model and generate program instruction files (".prog files").
Choose the menu item "Tools->Run Simulation" to run the resulting simulation.

These four combinations of 'hardware' and 'software' models demonstrate some of CSIM's ability to efficiently handle performance modeling. Both 'hardware' and 'software' designs can be constructed from the top down where a coarse model is refined, and modules are added or replaced, as more information becomes available. It is easy to see that one design group may be using a particular 'hardware' and 'software' model set while a second group alters the 'hardware' model and a third group modifies the 'software' model. At any time, and with an appropriate mapping file, updated 'software' can be mapped to either an original or updated 'hardware'. In this way, design tradeoffs can be addressed in both a system's architecture and software design while the system is being developed.

Appendices

A. CSIM Distribution Directory Tree

1. $CSIM_ROOT Directory Tree

Directory tree starting from: csim (Top of CSIM distribution package.)

csim   .......................................................        6.317-KB
   |-- tools   ...............................................     7360.697-KB
   |-- model_libs   ..........................................     2432.954-KB
   |-- demo_examples   .......................................      409.479-KB

2. Tools Directory Tree

Directory tree starting from: csim/tools

csim/tools   .................................................       12.373-KB
   |-- sun_solaris   .........................................     1605.874-KB
   |   |-- general_utilities   ...............................      256.060-KB
   |
   |-- sgi_irix   ............................................     1763.669-KB
   |   |-- general_utilities   ...............................      498.940-KB
   |
   |-- mac_osx   .............................................     1764.114-KB
   |   |-- general_utilities   ...............................      276.842-KB
   |
   |-- i86_linux2.2   ........................................     1298.092-KB
   |   |-- general_utilities   ...............................      295.574-KB
   |
   |-- hp_ux   ...............................................     1816.703-KB
   |   |-- general_utilities   ...............................      507.316-KB
   |
   |-- bin   .................................................        1.159-KB

12 Directories.
Total Space Used = 10.096716-MB       (3.559003-MB Compressed)

3. Model_Libs Directory Tree

Directory tree starting from: csim/model_libs

csim/model_libs   ............................................        5.632-KB
   |-- perfmod2   ............................................      302.197-KB
   |-- icons   ...............................................      591.073-KB
   |-- core_models   .........................................      263.188-KB
   |   |-- sharc_test.dir   ..................................       33.729-KB
   |   |-- raceway_test.dir   ................................       29.004-KB
   |   |-- pe_xbar_test.dir   ................................        7.398-KB
   |   |-- pe_bus_test.dir   .................................        7.357-KB
   |   |-- myrinet_test.dir   ................................       19.015-KB
   |   |-- multi_models   ....................................       82.637-KB
   |   |   |-- raceway_test.dir   ............................       20.273-KB
   |   |   |   |-- programs   ................................        7.800-KB
   |   |   |
   |   |   |-- pe_bus_test.dir   .............................        8.598-KB
   |   |   |   |-- programs   ................................        2.988-KB
   |   |   |
   |   |   |-- myrinet_test.dir   ............................       20.644-KB
   |   |       |-- programs   ................................        7.800-KB
   |   |
   |   |-- dynam_sched   .....................................      459.569-KB
   |   |   |-- raceway_test.dir   ............................       18.836-KB
   |   |
   |   |-- c40_test.dir   ....................................       19.642-KB
   |
   |-- arith_lib   ...........................................      120.637-KB
   |
   |-- COTS_boards   .........................................        2.560-KB
   |   |-- MERC   ............................................       42.901-KB
   |   |   |-- programs   ....................................        1.024-KB
   |   |
   |   |-- CSPI   ............................................       12.648-KB
   |   |   |-- programs   ....................................        2.088-KB
   |   |
   |   |-- ALEX   ............................................       11.165-KB
   |       |-- test   ........................................       35.757-KB
   |           |-- programs   ................................        1.024-KB
   |
   |-- general_blocks   ......................................       89.242-KB
       |-- Arithmetic   ......................................       19.730-KB
       |-- Comparison   ......................................        6.437-KB
       |-- Conversions   .....................................        2.778-KB
       |-- Counters   ........................................        6.797-KB
       |-- DS_Type_Operations   ..............................        3.860-KB
       |-- Data_Structure_Access   ...........................        8.174-KB
       |-- Delays   ..........................................        3.294-KB
       |-- Examples   ........................................       66.421-KB
       |-- Execution_Control   ...............................        5.777-KB
       |-- File_Access   .....................................        9.899-KB
       |-- Generators   ......................................       24.289-KB
       |-- Logical   .........................................        4.412-KB
       |-- Loops   ...........................................        6.436-KB
       |-- Memory   ..........................................        9.201-KB
       |-- Miscellaneous   ...................................        7.322-KB
       |-- Plot_Generation   .................................        3.049-KB
       |-- Probes   ..........................................        2.432-KB
       |-- Queues   ..........................................        4.507-KB
       |-- Servers   .........................................        5.350-KB
       |-- Statistics   ......................................        7.102-KB
       |-- Switches   ........................................        2.135-KB
       |-- Timers   ..........................................        1.982-KB
       |-- Traffic_Generators   ..............................        6.117-KB
       |-- Vector_Operations   ...............................       11.636-KB

52 Directories.
Total Space Used = 2.432954-MB

4. Demonstrations / Examples Directory

Directory tree starting from: csim/demo_examples

csim/demo_examples   .........................................       12.887-KB
   |-- demo0   ...............................................        1.985-KB
   |-- demo1   ...............................................       39.416-KB
   |-- demo2   ...............................................       46.575-KB
   |-- demo2_slider   ........................................       55.994-KB
   |-- demo3   ...............................................       43.326-KB
   |-- demo4   ...............................................      144.143-KB
   |-- demo6   ...............................................       23.447-KB
   |-- demo7   ...............................................       41.706-KB

9 Directories.
Total Space Used = 0.409479-MB

B. CSIM Device-Level Behavioral Descriptions

The behavioral descriptions of devices consist of standard C-language with a small set of extension functions developed by CSIM. A simple CSIM device, "HW_dummy.sim", is given in Figure 7. This device is a data sink that accepts messages but does nothing with them. The device begins and ends with the keywords "DEFINE_DEVICE_TYPE:" and "END_DEFINE_DEVICE_TYPE." The function "PORT_LIST" provides the simulator with this device's topological constraints-e.g. the number of allowable connections. A command "DEVICE_CLASS" provides the simulator with class information (at the moment, "programmable" is the only one being used).

	DEFINE_DEVICE_TYPE: HW_dummy
	PORT_LIST( inp );    /* Contains only an input port (inp). */

	/* Local Variables */
	int length_in;
	struct message_struct *message_in;
	int my_id;

	DEFINE_THREAD: start_up    /* Start blocking delay device */
	{
	  DELAY(0.001);
	  /* Launch the input-port-handling thread. */
	  TRIGGER_THREAD( process_inp, 0.1, 0 );
	  my_id = myid( MY_NAME );
	  fprintf(LinkTline,"replace_y_axis %d %s\n", my_id, MY_NAME);
	}
	END_DEFINE_THREAD.

	DEFINE_THREAD: process_inp    /* Process handles input port messages. */
	{
	  while (1)    /* Wait for data arrival */
	    RECEIVE( "inp", &message_in, &length_in );
	}
	END_DEFINE_THREAD.

	END_DEFINE_DEVICE_TYPE

Figure 7: A simple CSIM device "HW_dummy".

A device's behavior is specified in software processes called "threads" that begin and end with the keywords "DEFINE_THREAD:" and "END_DEFINE_THREAD." Each device must have a startup thread. The startup thread in Figure 7 delays the remainder of the device's actions by 0.001 msec¹, then spawns a second thread named "process_inp", writes information into a post-processing summary file, and ends. The spawned thread, "process_inp", contains an infinite loop around the function "RECEIVE()", which blocks until a message is received. Received messages are dequeued and recorded automatically by the "RECEIVE()" function.

As mentioned, threads can use any standard 'C' functions along with CSIM-specific functions to build up a device's behavior. Other CSIM-specific functions include:

CSIM_TIME: the current simulation time.
MY_NAME: the device's topological name.
MY_ID: the device's logical ID number.
THREAD_VAR: handle to arguments passed into a thread.
DELAY( delay_amt ): block the calling thread for delay_amt units of simulated time.
TRIGGER_THREAD( thread, delay_amt, thread_var ): spawn a thread.
SEND( port, message_ptr, length ): send a message.
RECEIVE( port, &message, &length ): block until a message is received and then dequeue message.
RECEIVE_IR( port, &message, &length ): same as above except cease blocking as soon as the message begins to arrive (as opposed to when it has completely arrived), and don't dequeue the message.
CHECK( port, &status ): check for received message without blocking.
CSIM_ANNOUNCE(): print the current simulation time and the device's topological name.
csim_printf(""): a macro for {CSIM_ANNOUNCE(); printf("");}.
PREEMPT_INCOMING( ), PREEMPT_OUTGOING( ): preempt messages.
highlight_box( int color), highlight_link(char *port_name, int color): animate displays.
Annotate( char *strng, int color, float xoffset, float yoffset): display textual information at simulation time.
WAIT( SYNCHRON **x, int flavor ): block a thread.
RESUME( SYNCHRON **x, int flavor ): unblock a thread.
list_in_ports( int *nports ), list_out_ports( int *nports ): list the ports connected to other devices.
get_attribute(char *attribute_name, char *value): obtain values assigned to variables whose scope is defined by the calling device's topological location.
halt(): stop the simulation and return control to the user.

It is often useful to pass some 'C' language code, including functions and global variable declarations, directly through to the 'C' compiler by enclosing it in a DEFINE_GLOBAL: and END_DEFINE_GLOBAL. pair. This is called a global block.

Finally, CSIM provides the following CSIM preprocessor directives that mimic their 'C' equivalents:

%include, %define, %ifdef, %ifndef, %endif: CSIM preprocessor directives.
macro: identical to %define except that it can only exist outside of devices and global blocks.
variable: identical to a macro except that its expression is evaluated immediately.

C. Software Commands Processed by the Programmable Class

The Scheduler produces commands for devices containing the CSIM preprocessor instruction "DEVICE_CLASS=(programmable);" and for the STIM_SCHEDULER.1 Such devices include "generic_pe.sim", "multi_priority_pe.sim", "c40.sim", "sharc.sim", "multi_task_pe.sim" and "dynamic_pe.sim". The commands or 'software' instructions ordinarily produced by the static Scheduler are a subset of those handled by these programmable devices. All such commands are initially read from a file by the "read_program" function in the file "subroutines.sim". Below is a list of all such known commands. Required arguments are enclosed in a "<>" pair. Optional arguments are enclosed in square brackets "[]".

cecompute [label]: Forces the utilization of a fixed amount of 'CPU' by blocking a particular device thread for a fixed amount of simulated time.
recvmessg [comment]: Blocks a device's computation thread until a particular message is received.
sendmessg [priority] [comment]: Sends a message of 'length' bytes to the device associated with the integer 'dst_pe'.
monotonic [label]: Imposes a variable delay, pausing until a specified simulation time; in a multi-tasking PE model it typically does not block.
looptop [comment]: A file marker indicating the top of a potential loop.
loopuntiltime [comment]: Instructs the programmable device to repeat commands after the last observed "looptop" file marker until the specified simulation time has been reached or passed.
loop [counter] [comment]: Instructs the programmable device to repeat the commands after the last observed "looptop" file marker until a counter has been decremented to zero.
sendSTIM [arg4] [comment]: Supports an old version of the STIM scheduler that is no longer actively supported.
SUBGRAPH [label]: Begins a subgraph specification, used by the multi_task_pe.sim device to model a simple multi-tasking compute element.
eo_subgraph [comment]: Supports an archaic multi-tasking structure.
END_OF_GRAPHS [label]: Marks the end of a 'SUBGRAPH' section.
TASK [priority] [label]: Marks the beginning of a frame-rate task in the multi_priority_pe.sim device.
END_OF_TASK [label]: Marks the end of a 'TASK' section.
check_exec [comment]: A multi_priority_pe.sim device command releasing control back to an executive process, allowing control to pass to another task.
progmdone [comment]: Marks the end of a task or file.
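The looping commands are easiest to follow by stepping through a short program. The following C sketch interprets a small subset of the commands (looptop, cecompute, loop, loopuntiltime, progmdone) against a simulated clock. It is not the read_program function from "subroutines.sim"; the function run_program, the argument layouts (a single numeric time or count after the command name), and the choice that "loop 3" executes the loop body three times are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Interpret a subset of programmable-device commands, one per string.
   Assumed (hypothetical) layouts: "cecompute <time>", "loop <count>",
   "loopuntiltime <time>". Returns the final simulated time and stores the
   number of cecompute commands executed in *computes. */
double run_program(const char *const prog[], int n, int *computes)
{
    double now = 0.0;    /* simulated clock */
    int top = 0;         /* index of the first command after "looptop" */
    int remaining = -1;  /* loop counter; -1 means not yet loaded */

    *computes = 0;
    for (int pc = 0; pc < n; pc++) {
        char cmd[32];
        double arg = 0.0;
        if (sscanf(prog[pc], "%31s %lf", cmd, &arg) < 1)
            continue;                       /* skip blank lines */

        if (strcmp(cmd, "looptop") == 0) {
            top = pc + 1;                   /* remember the loop entry point */
        } else if (strcmp(cmd, "cecompute") == 0) {
            now += arg;                     /* occupy the 'CPU' for arg units */
            (*computes)++;
        } else if (strcmp(cmd, "loop") == 0) {
            if (remaining < 0)
                remaining = (int)arg;       /* load the counter on first visit */
            if (--remaining > 0)
                pc = top - 1;               /* pc++ then resumes at top */
            else
                remaining = -1;             /* counter exhausted: fall through */
        } else if (strcmp(cmd, "loopuntiltime") == 0) {
            if (now < arg)
                pc = top - 1;               /* repeat until time reached or passed */
        } else if (strcmp(cmd, "progmdone") == 0) {
            break;                          /* end of program */
        }
        /* Other commands (sendmessg, recvmessg, ...) are ignored here. */
    }
    return now;
}
```

With the commands "looptop", "cecompute 2.0", "loop 3" and "progmdone", this sketch executes three computations and ends at simulated time 6.0.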

D. Graph Files

The following are listings of the structure (graph topology) file arch1.sim (Figure 8) and the Data Flow Graph file flow1.dfg (Figure 9).

<xml version="1.0" standalone="yes">
<csim_hw_file>
CGUIformatVersion 1.850000

%include ../../core_models/monitor.sim
%include ../../core_models/subroutines.sim
%include ../../core_models/parameters.sim
%include ../../core_models/generic_pe.sim

<DEFINE_MODULE> top_level <top_diagram>
<DEFINE_NODE_INSTANCES>
	<ins 1> source = generic_pe
		<vrt> 1.800000 1.000000 3.600000 1.800000 </ins>
	<ins 1> sink = generic_pe
		<vrt> 6.200000 1.000000 8.000000 1.800000 </ins>
	<ins 1> Monitor = Monitor
		<vrt> 4.000000 2.000000 5.800000 2.800000 </ins>
  </DEFINE_NODE_INSTANCES>

  <DEFINE_TOPOLOGY>
	<lnk> source io_port <to> sink io_port
		<a_dr> hdplx <a_ql> 1 <a_tr> 100 <a_lt> 1.5 <a_cs> 1 1
		<vrt> 6.200000 1.400000 3.600000 1.400000 </lnk>
  </DEFINE_TOPOLOGY>

  <ANNO> 1.800000 0.000000 A Simple Architecture File.</ANNO>
</DEFINE_MODULE>

</csim_hw_file>


Figure 8: XML for the Simple Architecture "arch1.sim".


<xml version="1.0" standalone="yes">
<csim_sw_file>
CGUIformatVersion 1.850000

<DEFINE_GRAPH>  top_level <top_diagram>
<DEFINE_NODE_INSTANCES>
	<ins 1> START	=	unnamed	
		<a_ct> 0 <a_it> 1 <a_mp> /source  
		<vrt> -1.600000 2.400000  -0.400000 3.000000 </ins>
	<ins 1> EXIT	=	unnamed
		<a_ct> 0 <a_it> 1 <a_mp> /sink  
		<vrt> 2.800000 3.600000  4.000000 4.200000 </ins>
	<ins 1> Proc1	=	unnamed
		<a_ct> 11 <a_it> 10 <a_mp> /source  
		<vrt> 0.600000 2.400000  1.800000 3.000000 </ins>
	<ins 1> Proc2	=	unnamed
		<a_ct> 7 <a_it> 2 <a_mp> /sink  
		<vrt> 0.600000 3.600000  1.800000 4.200000 </ins>
  </DEFINE_NODE_INSTANCES>

  <DEFINE_TOPOLOGY>
	<lnk> START begin	<to>  Proc1	in 
		<a_pd> 1 <a_th> 1 <a_cn> 1 <a_in> 0  
		<vrt> 0.600000 2.600000  -0.400000 2.600000 </lnk>
	<lnk> Proc1 out	<to>  Proc2	in
		<a_pd> 1 <a_th> 2 <a_cn> 2 <a_in> 0  
		<vrt> 1.200000 3.600000  1.200000 3.000000 </lnk>
	<lnk> Proc2 out	<to>  EXIT	end 
		<a_pd> 1 <a_th> 10 <a_cn> 10 <a_in> 0  
		<vrt> 2.800000 3.800000  1.800000 3.800000 </lnk>
  </DEFINE_TOPOLOGY>

  <ANNO> -2.000000 0.000000 A Simple Data Flow Graph</ANNO>
</DEFINE_GRAPH>

</csim_sw_file>


Figure 9: XML for the Simple DFG "flow1.dfg".
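Figure 9 can also be read as a small data flow problem. Assuming that <a_pd>, <a_th> and <a_cn> on each arc are the produce, threshold and consume amounts per firing (with <a_in> as the initial amount), the following C sketch counts how often each node of flow1 fires when START fires a chosen number of times. The function count_firings and the START firing count are hypothetical; compute times (<a_ct>), the <a_it> values and the processor mappings (<a_mp>) are ignored.

```c
/* Assumed arc semantics: a node fires when its input arc holds at least
   'threshold' tokens; it then removes 'consume' tokens from that arc and
   adds 'produce' tokens to its output arc. Values are copied from Figure 9. */
struct arc { int produce, threshold, consume, tokens; };

/* fires[0..3] receive the firing counts of START, Proc1, Proc2 and EXIT. */
void count_firings(int start_firings, int fires[4])
{
    struct arc arcs[3] = {
        { 1, 1, 1, 0 },     /* START begin -> Proc1 in  (<a_in> 0) */
        { 1, 2, 2, 0 },     /* Proc1 out   -> Proc2 in  (<a_in> 0) */
        { 1, 10, 10, 0 },   /* Proc2 out   -> EXIT end  (<a_in> 0) */
    };
    for (int i = 0; i < 4; i++)
        fires[i] = 0;

    for (int s = 0; s < start_firings; s++) {
        fires[0]++;
        arcs[0].tokens += arcs[0].produce;

        /* Let downstream nodes fire until none of them is enabled. */
        int progress = 1;
        while (progress) {
            progress = 0;
            for (int node = 1; node <= 3; node++) {
                struct arc *in = &arcs[node - 1];
                if (in->tokens >= in->threshold) {
                    in->tokens -= in->consume;
                    fires[node]++;
                    if (node < 3)   /* EXIT has no output arc */
                        arcs[node].tokens += arcs[node].produce;
                    progress = 1;
                }
            }
        }
    }
}
```

With 20 START firings, Proc1 fires 20 times, Proc2 fires 10 times (it needs two tokens per firing), and EXIT fires once, after ten Proc2 results have accumulated.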

E. Change Record

Previously, the (static) scheduler was directed to place the .prog files somewhere other than the current directory with the command-line option "-o", as in "sched test.dfg netinfo -o ./programs", whereas the PE models were told where to find the .prog files through the environment variable PMOD_PROG, as in "setenv PMOD_PROG ./programs".
It was recommended that both use the same mechanism. For instance, the static scheduler would look for the same environment variable, PMOD_PROG, and, if it exists, also check for the command-line argument, provided that mechanism is retained. A command-line argument that differs from PMOD_PROG will either preempt the environment variable or cause the scheduler to abort.
This change was instituted 4-5-02.
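The recommended lookup order can be sketched in C as follows. The function name resolve_prog_dir and its conflict flag are hypothetical; the sketch lets "-o" take precedence over a differing PMOD_PROG and reports the conflict, so that a caller could instead choose to abort, the other outcome the change record allows. Falling back to the current directory when neither is set is also an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Resolve the .prog directory: "-o <dir>" on the command line, else the
   PMOD_PROG environment variable, else the current directory ".".
   *conflict is set when both are given and differ. */
const char *resolve_prog_dir(int argc, char *argv[], int *conflict)
{
    const char *env = getenv("PMOD_PROG");
    const char *opt = NULL;

    /* Scan for "-o" followed by a directory; the last occurrence wins. */
    for (int i = 1; i + 1 < argc; i++)
        if (strcmp(argv[i], "-o") == 0)
            opt = argv[i + 1];

    *conflict = (opt != NULL && env != NULL && strcmp(opt, env) != 0);
    if (opt != NULL)
        return opt;     /* command line preempts the environment */
    if (env != NULL)
        return env;
    return ".";
}
```

For "sched test.dfg netinfo -o ./programs" the sketch returns "./programs" regardless of PMOD_PROG, flagging a conflict only when PMOD_PROG names a different directory.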

Dr. K. Burgess
Dr. R. Artz
