1. Choosing CSIM for Performance Modeling
Modeling is meant to make predictions about an underlying
system. In many cases, those predictions are constrained by
components whose existence is speculated and for which there exist
few specifications. In many of these cases, software driving the
system is not only unwritten, but in parts is unspecified.
Further, modeled systems may be the cumulative effort of
geographically separate teams working from different companies.
With many details missing, the modeling task requires a
relatively coarse modeling resolution, a modeling system that is
robust to changing specifications, and a modeling description that
can be decomposed into separately alterable sections
representative of pieces of the final system that will be
developed by various disparate teams.
Why Choose CSIM?
As members of the user community, we have considered several modeling languages and tools for these sorts of tasks. CSIM meets all of the above requirements and, with support from CSIM.com, has met all of our needs. First of all, CSIM's design easily allows for separate construction of hardware and software models. Software models can be 'hosted' on a changing hardware model through use of a mapping file that associates software processes with hardware devices (details below). This means that software can easily be reallocated to different hardware. It also means that software designers at one location can 'test' (coarse-level) software performance on earlier hardware models even as those hardware models are being modified at another location. Likewise, hardware modelers can test new hardware models using existing, but not necessarily up-to-date, software models.
Secondly, and in contrast with other considered modeling languages, all CSIM models can be constructed more naturally from a top-down rather than bottom-up approach. Hardware models can be specified at a system level first, then for each system module, and finally at the level of each device within the modules. Similarly, software models can be specified in terms of a computer software configuration item (CSCI) followed by constituent software components (CSC) and finally in terms of software units (CSU). The top-down approach mimics the way large systems are typically designed. Its main benefit is that the bottom-level details can be changed as new information becomes available without major revisions of the model.
Thirdly, the CSIM source code is designed in such a way that one or two individuals can maintain it.
In general, one of CSIM's greatest strengths is in its role as a Systems Engineering tool. The process of constructing hardware and software models requires a level of communication between the various software and hardware teams. The model itself represents a way to accumulate and disseminate hardware and software structure and interaction information. Design and performance weaknesses can be exposed at the earliest stages of development allowing for corrections that have the lowest impact on cost and schedule.
Why Not Choose an Alternative?
We are concerned with situations where both architecture and software designs may change in significant ways before the program ends. Without design and device details, lower-level models tend to provide needlessly precise information about incorrect models. The alternative modeling languages we considered also strongly mix software with hardware design. This creates configuration management nightmares when teams across the country have to incorporate changes to their modeled components simultaneously. Software and hardware coupling also makes it difficult to swap alternative scenarios in and out, such as modeling a hardware failure or modeling situations that include or exclude various software functions. The alternative tools do allow for the development of software schedulers and message routers, but constructing these would be awkward and very time consuming.
Efficiency. By way of example, one specific system model that we developed, together with the entire CSIM tool package, fits onto a 1.4MB floppy disk and can be run on just about any standard computer. A one-tenth second CSIM simulation of this model takes approximately three minutes.
In contrast, a model of only a portion of the same system was created in another proprietary tool. That tool requires a CD for product installation, a 100MB Zip disk to hold the compressed models, and a 200+MHz Pentium with 250MB of RAM to run. Typical run times consume days, even on a much more powerful PC than the one above!
Modeling Goals
The primary goal considered here is to provide a performance modeling tool that helps our Systems Integration team verify that all system components are likely to work together. This includes looking at metrics such as signal contentions, message transfer delays, component utilization under various scenarios, and in particular identifying any system bottlenecks. Performance models also can be used to analyze system responses to component and certain high-level software failures, to track changes in system design, and test hypothetical alternative architectures and devices.
2. CSIM Performance Model Distribution: Setup & Primer
CSIM is distributed by CSIM.com. Two other organizations re-distribute
CSIM with customized versions of performance models. This section lists
and explains the steps required to unpack and run
CSIM. This section also describes the run-time and
post-processing analysis tools, and points of contact for getting
help. The descriptions of this section require a minimal knowledge
of UNIX and no knowledge of CSIM. An overview of CSIM's tools and
the structure of hardware and software models will be given in the
following sections.
The standard CSIM distribution consists of a single file named csim_install_xx.tar, where the xx is the version number. The contents of this self-contained, compressed archive file include:
2.1 Installing or Updating CSIM, and Setting Environment
Note: Sites with a shared file system need install only one copy of CSIM, which can be shared by all users. In other words, only one person at each site needs to perform this install; not each user. We recommend appointing this person as custodian of the installation. The custodian(s) can efficiently handle tool updates and configure the individual users' environments. (Of course, there is no harm in installing multiple copies; it is just not as efficient to maintain.)
(1) Recommendation: Be in a C-shell or compatible shell. The tcsh shell is preferred. If this isn't your default shell, simply type tcsh or csh. (Although any shell can be used with CSIM, the instructions here apply to tcsh or csh. These instructions have been tested in these shells.)
(2) Go to the directory where you wish to install CSIM.
Example:
cd /proj/xx
(3) Type:
tar xvf install_v*.tar
and,
gunzip -r csim
This will unpack and uncompress the CSIM files into a directory called csim under your current directory (i.e., /proj/xx/csim). This forms the CSIM root location. The files in the csim directory will have a directory structure similar to the one in Appendix CSIM Distribution Directory Tree.
Now you have installed the CSIM files. Next, you need to adjust the setup according to the environment at your site.
(4) Edit the csim/tools/setup file. Look for the keyword "CSIM_ROOT" and change this directory path to reflect your installation location. Also look for your machine's "CSIM_C_COMPILER" variable, and, if necessary, change any "gcc" entries to the "cc" command of your local compiler.
(5) You may want to edit csim/tools/platform/gui_setups file, where 'platform' corresponds to the type of your system(s), if your local text editor is not the stated default ("textedit" or "nedit" respectively). A common alternative "xterm -e vi" will work on most systems. (This launches the vi editor within an xterm-window.)
(6) Source the csim/tools/setup file.
This final step (6) is the only one that must be repeated for each session and
each user of CSIM. For convenience, we recommend placing the source command in the .cshrc or
.tcshrc file in your home directory, so that it runs automatically at every login.
2.2 Building Hardware and Software Models
The CSIM distribution includes a number of demonstrations in the
directory csim/demo_examples. This section focuses on the
performance modeling demonstration discussed further in More on
the Performance Model Demonstration. This model includes two
'hardware' models and two 'software' models and resides in the
demo directory 'csim/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling'. Below are
instructions for building the simpler of two software models on
the simpler of two hardware descriptions.
Basic Setup
(2) Create a directory where you have write access.
Then copy the example performance model directory contents to that directory:
(3) Move into your directory where the models are:
(4) Open the GUI on the arch1.sim file.
(5) Choose the menu item "Tools->Build Simulation" to compile the
HW architecture model.
(6) Choose the menu item "Tools->Build Routing Table" to construct
the network routing information tables.
Alternative Build Note: Steps four through six can be replaced
by the following three non-graphical commands executed from the C-shell:
(7) Choose "File->Open->Open a new file" and choose the file "flow1.dfg".
This provides a graphical view of the simpler of two SW models and yields a diagram similar
to that in Figure 2.
(8) Choose the menu item "Tools->Build DFG SW" to build the simple 'software' model.
(9) Choose the menu item "Tools->Plot Ideal TimeLine" to see the ideal timeline.
(1) The performance modeling simulation can be viewed from the
perspective of either the hardware or software. By default, the
simulation is viewed from the hardware perspective and the default
graphical window depicts the hardware architecture: a graphical
description similar to Figure 1.
This can be changed to view the simulation from the data flow
graph (DFG) representation (a view similar to that in Figure 2).
To accomplish this, type setenv SIM_GRAPH flow1.dfg before
starting up the GUI.
The simulation view can be changed back to the hardware
perspective (before starting the GUI) by typing unsetenv
SIM_GRAPH or equivalently setenv SIM_GRAPH arch1.sim.
(2) If not in the GUI, type gui arch1.sim or gui flow1.dfg.
Choose the menu item Tools->Run Simulation. This step could
equally have been executed from the command line via sim.exe.
(3) Optionally choose Animation->Animation Types->Nodes:
Concurrent Activities and/or choose Animation->Animation
Types->Links: User/Model Defined. These commands override
default device and node coloring discussed in Run-time analysis
below and are less useful for models (such as these) that use
"core_models" components.
(4) Click on "Run/Continue".
(5) When asked (at the UNIX shell window), choose a verbosity
level. This verbosity level controls the detail of command-line
feedback the simulation will provide about the state of messages
in the simulation. Zero is typically chosen to minimize output
and speed up the simulation.
The simulation shows the flow of messages from creation to
destination by coloring the various device and DFG objects. The
simulation can be slowed down by adjusting the "Speed Slowdown"
slider, stopping, stepping through the simulation, or crawling
through the simulation per simulation control panel buttons.
These and other GUI-accessible controls are described in CSIM's
documentation (see The CSIM Graphical Simulator).
When viewing a general simulation or plotted output, colors have
the following meanings:
In addition to an object's color, a link's simulated values can
be obtained during simulation runtime by choosing a link and then
choosing "Options->Examine Link". Similarly, a list of all active
links can be obtained by choosing "Options->List Active Links".
There are a number of ways to analyze simulation output. For
instance, any number of specialized 'hooks' can be placed into the
'C' code of hardware devices that might output information into an
external file or provide graphical information at runtime (e.g., a
'meter'). Below are six methods that provide post-process
information using existing simulation output. Most of the
post-processing analysis tools use a plotting
package called "xgraph". This package processes data files
created by the simulation.
(1) To view the run-time-generated process timeline, choose
"Tools->Plot Proc Timeline" from the GUI. Equivalently, type
"xgraph ProcTline.dat &" from the command line.
(2) To view the run-time-generated timeline as well as the network
utilization paths, choose "Tools->Plot Comm+Proc Tline" from the
GUI. Equivalently, type "xgraph Spider.dat ProcTline.dat &" from
the command line.
(3) To view system-wide contention levels, type "view_contention
LinkTline.dat" from the command line. (This step cannot easily
be processed directly from the GUI.) The view_contention
executable creates a file 'LinkTline.hst' that can be viewed
graphically with the command "xgraph LinkTline.hst". Other
contention analysis options are described at
www.csim.com/view_contention.html.
(4) To create a specialized plot of simulation events, use the
event tool. Like the contention analysis tool, this involves
processing a text file: type "timeline EventHist.dat" and respond
to a series of data-generated questions. The results are then
viewed by entering "xgraph EventHist.tln". This is a powerful
tool but it is also fairly complicated. Instructions for its use
are available at
www.csim.com/timeline/timeline.html.
(5) View the ASCII file 'summaries.dat' for processor, link, and
port utilizations. Note that these statistics are collected
between the default times "Time1" = 0 msec and "Time2" = 1,000,000
msec. This time window can be changed by editing the file
"csim/model_libs/core_models/parameters.sim".
(6) To create a supplemental event history file, run the
simulation at high verbosity and pipe the output to a separate
file. This is accomplished as follows:
A well-written manual can be seen on-line at
www.csim.com/.
Frequently asked CSIM questions are available at
www.csim.com/faq.html and
www.csim.com/faq2.html.
CSIM has an "Issues Database" for users.
A number of issues and resolutions have been posted there.
For general CSIM modeling questions contact admin@csim.com.
The backbone of CSIM is the CSIM preprocessor ("csim") described
thoroughly in
www.csim.com/simulator/csim_doc.html.
The preprocessor is a primary CSIM tool that converts CSIM source code
into 'C' source code (producing an ASCII file called "out.c").
Any number of 'C' compilers (e.g., "gcc") can compile
this source code. This makes CSIM fairly machine and
operating system independent. In particular, all CSIM modeled
components and support files are stored in ASCII text files.
These flat files are easily read and processed with standard
editors and other operating system text-processing tools
(significantly including "grep" and "perl"). The CSIM code
handled by the CSIM preprocessor consists of behavioral
descriptions (how components behave) and topological descriptions
(how components interact).
The two most basic components of a CSIM model are "devices" and
"links". Devices model behavior in terms of messages. Devices
either create messages (a message source), process messages (a
message sink), or modify messages. In the latter case, a message
may be conditionally passed from one link to another through a
device (a 'switch'), delayed, altered in size, tagged with
some specific information before being sent on its way, or
altered by some combination of these actions. The links describe
how that information is passed between devices. Links can be
thought of as data pipes where the flow of information is
specified by rate, latency, and direction parameters. Further
parameters exist to specify a data pipe's queue length and cost (a
variable used by routing algorithms; see Router).
Many CSIM users do not need to know how to create modeled
devices. Instead, users typically need to know about the existing
device models, and how to combine these devices with
links to build a module or system architecture; that is, users
typically provide only a model's topology. A summary of CSIM
preprocessor functions that provide a modeled device with its
behavior is given in appendix CSIM Device-Level Behavioral
Description.
A topology is a description of how devices
are connected together. CSIM's topology descriptions include inheritance of
attributes on objects. The simple topological description of the performance
demonstration's "arch1.sim" was given in diagram form in Figure 1.
In this figure, one device of type "generic_pe" is given the name
"source". This device produces information that is sent out of a
port created by the device (labeled "io_port") and across a link
to a similar device named "sink". The link restricts data flow to
occur only at a rate of 100 MB/sec. The link also imposes a fixed
latency of 1.5 microseconds. The behavior of these two devices is not apparent
from the topological description. The behavior is typically
described in comments within the file and occasionally in
supporting documents such as that provided in Device Models.
Devices and links can be combined into modules that cumulatively
are used to describe a system architecture. An example of a
module is provided in A More Complicated Architecture File:
arch2.sim. This example module exists in the file "arch2.sim"
(see Figure 3) and is depicted in Figure 4. As portrayed,
this module contains links connecting two external interfaces and
three additional devices. A wider border seen graphically in
Figure 3 indicates that the "dual_processor" box represents a
module and not a device.
The topological descriptions of a module or an architecture
(really just a top-level module) are saved into a standard
ASCII file in the standard Extensible Markup Language (XML) format.
For instance, the XML description corresponding to
Figure 1 is given in Figure 7 (appendix p.22). The topological
XML information is easily reverse engineered and there are many
times when it is more convenient to directly alter the XML source
file using an ASCII editor. However, CSIM has provided a GUI tool
(discussed below) to automate the XML construction. In fact, many
CSIM developers will never look at a raw XML file.
The second part of CSIM's model topology is "Instance Attributes".
These are variable assignments that are inherited by a device (or
data flow graph node discussed later) when assigned to a parent
module. These attributes and how they are invoked are discussed
in
www.csim.com/simulator/instance_attributes.html.
The CSIM GUI is well described in
www.csim.com/gui/gui_doc.html.
The GUI provides not only a graphical way of entering the underlying XML
topological descriptions for 'hardware', but also a way to
graphically describe a primitive software model called a data flow
graph (DFG).
By convention, files that describe devices, modules, and
architectures are given the extension ".sim". DFG files are
typically given the extension ".dfg". Both 'hardware' (.sim) and
'software' (.dfg) files are constructed from the GUI in the same
way: as boxes connected by lines. In the 'hardware' case, the
boxes represent devices or modules and the lines represent links.
In the 'software' case, the boxes represent tasks, also called "nodes" or
"supernodes", and the connecting lines represent data, also called "arcs". Figure 1
and Figure 2 represent the CSIM GUI's graphical description of a
sample hardware and software topology.
The CSIM GUI keeps track of included ('hardware') devices by their
name, type, and topological location. A device's name, along with
any parent module's name, uniquely identifies the device in the
simulation. The behavior of devices is associated with a box
through its "type". The relative connection of boxes by lines,
the absolute location of devices and modules, and the name and
type of each device is information stored in XML format. This XML
information is then passed to the simulation by way of the CSIM
preprocessor.
The GUI employs a simple mouse-based point, click, and drag
methodology for creating boxes and links. A device's name and
type are provided via the GUI's "Open Properties" button. As was
mentioned in the prior section, the topology of a 'hardware'
architecture is primarily provided by the links attaching the
various devices. A link acts upon messages being passed between
devices (or between devices in modules) by controlling the data
rate, latency, allowable direction in which messages may pass,
queue length, and the cost per segment. This information is
accessed for each link via the link's "Open Properties" button.
Data flow graphs, like a system's 'hardware' description, are
constructed from boxes and lines by the GUI. However, only a
default node type is typically used. The GUI provides five
specific node attributes to control the behavior of this default
node. Those attributes are:
From a performance modeling perspective, nodes (depicted by boxes
in the GUI) can be thought of as representing event-driven
software processes. Upon occurrence of an event, such a software
process comes into existence and gains some control of a
processor's resources (a processor listed in the "Map PE" field).
The activation event can be the passage of some amount of
scheduled time (a CSIM "monotone"), or the existence of a trigger
such as the arrival of some data. Once such a software process
gains control, it consumes an amount of system resources
(specified by the node's "Compute Time"). After the process has
completed, it passes control to one or more other nodes
representing other processes. In CSIM, the passing of control
between processes is modeled through use of arcs (depicted by
lines connecting boxes in the GUI).
Arcs connect two nodes acting in the role of source and sink. An
arc's actions are controlled by three parameters:
Analogous to hardware modules, DFGs can have supernodes. The box
"SuperNode" depicted in Figure 6 from the file "flow2.dfg"
represents one such supernode. Like modules, supernodes are
indicated graphically by a heavy border and contain an external
interface.
The GUI primarily simplifies the creation of DFGs, architecture
and modules. However, the GUI also has a "Tools" menu that
The Scheduler is documented in
www.csim.com/sched/scheduler.html. Analogous
to the CSIM preprocessor, the Scheduler processes a DFG's XML
description. However, instead of producing 'C' source code, the
Scheduler produces lists of 'software' programming commands, called pseudo-code,
stored in ASCII files. These command files are tailored
for devices that internally indicate a need for such
instructions. The command files are called ".prog files" and are
processed by these specified 'programmable' devices at simulation
run time. CSIM describes the DFG and use of the Scheduler as
follows:
The Scheduler can be invoked in two different ways: statically
and dynamically. In its static form, the Scheduler typically
produces an ASCII file for every programmable device in an
associated architecture. The commands in these files indicate the
receiving and sending of blocks of data (and for sending, the
identification of the receive device), device delays
(corresponding to the consumption of CPU time) and other related
commands itemized below. These ASCII files are automatically
placed into the (current) model directory but can easily be
redirected to another directory. By convention, we place these
instruction files into a "programs" subdirectory. This can be
accomplished by recasting CSIM's default "sched" and "stim"
aliases, and then redirecting the simulation to this directory via
the following three commands:
The Scheduler's dynamic form produces the same commands but
provides those commands interactively into a runtime buffer
instead of into pre-processed files. The dynamic form, described
at the address
www.csim.com/sched/dyn/index.html,
is more flexible and allows for a wider array of 'software' models.
A compiled CSIM simulation necessarily has the ability to
interpret its own topology descriptions. This ability has been
exploited to use the simulation itself to produce a "netinfo"
file. The "netinfo" file assigns a logical ID (an integer) to every device,
link, and device type used in the simulation. CSIM's router writes
to the file "netinfo.net" a list of all connections (links)
between neighboring devices, along with the user-entered 'cost' of
traversing each of those links. It then uses this information to
determine the best pathways from each device to every other device
in an architecture and places that information into a
"netinfo.rte" file. This routing information is read into the
simulation at initialization time (by subroutines contained in the
core_models library file "subroutines.sim"). Routes are chosen at
run-time to direct messages sent between devices. The router tool
is based on "a breadth-first version of the Dijkstra shortest
distance search algorithm." This tool is described at
www.csim.com/router/router.html.
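The route selection described above can be illustrated with a minimal sketch. It uses a textbook Dijkstra search with a priority queue, not CSIM's own breadth-first variant, and it does not reproduce the "netinfo.net" or "netinfo.rte" file formats. The link list, the device name "switch", and all cost values are hypothetical, chosen only to show how a per-link 'cost' determines the preferred pathway.

```python
import heapq


def shortest_routes(links, source):
    """Return {device: (total_cost, path)} for the cheapest route from source.

    links is a list of (device_a, device_b, cost) triples; every link is
    treated as usable in both directions (an assumption of this sketch).
    """
    neighbors = {}
    for a, b, cost in links:
        neighbors.setdefault(a, []).append((b, cost))
        neighbors.setdefault(b, []).append((a, cost))

    best = {source: (0, [source])}
    frontier = [(0, source, [source])]
    while frontier:
        cost, device, path = heapq.heappop(frontier)
        if cost > best[device][0]:
            continue  # stale entry: a cheaper route was already recorded
        for nxt, link_cost in neighbors.get(device, []):
            new_cost = cost + link_cost
            if nxt not in best or new_cost < best[nxt][0]:
                best[nxt] = (new_cost, path + [nxt])
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return best


# Hypothetical topology and costs, for illustration only.
links = [
    ("source", "switch", 1),
    ("switch", "processor1", 1),
    ("switch", "processor2", 1),
    ("switch", "sink", 1),
    ("source", "sink", 5),
]
routes = shortest_routes(links, "source")
print(routes["sink"])
```

In this invented example the two-hop route through "switch" (total cost 2) is preferred over the direct link of cost 5, which is the kind of trade-off the user-entered link cost is meant to control.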
As discussed in Post-processing analysis above, CSIM provides a
number of analysis tools. Most of these tools operate on output
information provided by the devices themselves. Process and data
flows, and message contention are viewed through a graphical
interpreter called "xgraph". These are separately described in
CSIM documentation under the
Time Line Viewer and
Contention Viewer.
Further, as discussed in Post-processing analysis, standard
UNIX tools can be used to filter and analyze high-verbosity
simulation output.
The following models are described here:
Any CSIM developer can develop useful performance models using
only the CSIM GUI and the device models provided with the CSIM
distribution. Such a developed system typically consists of two
user-generated files. The first contains a topological
description of the modeled system's hardware. The second contains
a data flow graph (DFG) representative of software instructions
for the hardware model. By convention the 'hardware' file is
given an extension ".sim" and the 'software' file is given an
extension of ".dfg".
One such simple representative system resides in the directory
"csim/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling". This directory contains five
files and one directory:
A diagram of the "arch1.sim" file is included in Figure 1. In
these diagrams, boxes represent devices or modules and connecting
lines represent communications links. Devices are behavior
description files written in standard 'C' that contain any number
of CSIM constructs (discussed in The CSIM PreProcessor).
Modules (shown in Figure 3) are topological descriptions
representing a grouping of devices, (other) modules, and links;
they include one or more ports that represent an external interface.
The CSIM event simulator will pass messages through links into and
out of devices. Links restrict the flow of these messages in time
by user-modified parameters. These parameters include transfer
rate 'R', link latency 'L', link flow characteristics 'D' (full-duplex
vs. half-duplex vs. simplex), and message queue size 'Q'.
The devices may create, modify, or destroy messages.
Devices can be written by the user or obtained from a library.
All devices in Figure 1 through Figure 4 are part of the CSIM
distribution and are located in the core_models subdirectory.
Both library and user-generated devices can be imported into the
model using the GUI by selecting "File->Import->by Reference to
File" and choosing a desired device file. (For a description of
the GUI, see The CSIM GUI: Describing 'Hardware' Models and
Data Flow Graphs.) Behind the scenes, this GUI import action
places reference lines into the arch1.sim file of the form:
The remainder of the arch1.sim file contains Extensible Markup
Language (XML) instructions describing the architecture's
topology. These XML instructions are fairly easily reverse
engineered and include information about where in a diagram the
device, module, and link components are located, what behavior
description files are associated with each device block, and how
the components are connected (via links).
In Figure 1, the device titled "Monitor" reads software
instructions generated by a DFG and sets up an environment to
track simulation results useful for post-simulation analysis. The
"generic_pe" devices named "source" and "sink" process this DFG
'software' to determine when, how large, and how many messages
they will create and send out or receive from their port
("io_port"). The generic_pe device, described in Device Models,
is one of several devices that can process 'software' files; that
is, they alter their simulation behavior via runtime instructions.
Further, the DFG is only one of several ways that these
'programmable' devices can be controlled. In general, devices
containing the CSIM preprocessor instruction
"DEVICE_CLASS=(programmable);" are in some way 'programmable'.
Such devices include "generic_pe.sim", "multi_priority_pe.sim",
"c40.sim", "sharc.sim", "multi_task_pe.sim" and "dynamic_pe.sim".
Concerning Figure 1, a DFG will stimulate the timing and movement
of data between the two specified "generic_pe.sim" devices. All
data that is passed between these devices is limited to moving at
a rate of 100 MB/second (100 bytes/microsecond) after incurring a
latency of 1.5 microseconds. Because the link is half-duplex ("hdplx"), data can
only flow in one direction at a time. The queue length value of
one implies that the link will not place unread messages into a
buffer. This means that the "sink" must read a message sent by
"source" before a second message can be sent.
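As a worked example of these link parameters, the sketch below estimates how long a message occupies the link. It assumes that transfer time is simply the fixed latency plus message size divided by rate, that 1 MB means 10^6 bytes, and that the latency is 1.5 microseconds, the value stated with the topology description of arch1.sim. The function name and the message sizes are hypothetical; CSIM's own link model may account for further effects.

```python
def transfer_time_us(size_bytes, rate_mb_per_s, latency_us):
    """Estimated time, in microseconds, to move one message across a link.

    Assumes time = latency + size / rate, with 1 MB taken as 10**6 bytes.
    """
    # 1 MB/s = 10**6 bytes per 10**6 microseconds = 1 byte per microsecond.
    bytes_per_us = rate_mb_per_s
    return latency_us + size_bytes / bytes_per_us


# Link parameters of arch1.sim: 100 MB/sec and 1.5 microseconds of latency.
two_bytes = transfer_time_us(2, 100.0, 1.5)
one_kilobyte = transfer_time_us(1000, 100.0, 1.5)
print(two_bytes, one_kilobyte)
```

Under these assumptions a two-byte message is dominated by latency (about 1.52 microseconds), while a 1000-byte message is dominated by the transfer rate (about 11.5 microseconds).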
The architecture in Figure 3 includes a module, named
"dual_processor", depicted in Figure 4. Modules are graphically
differentiated from devices by the width of their (light blue)
border and the existence of external ports (depicted as small
orange squares). The external ports are named by the attached
link. Connections with a higher level arc use the same name;
e.g., "port1". Modules can be used to apply "instantiation
variables" or "instance attributes" to a part of the architecture.
These are variables whose values are locally applied to devices
contained within a module. However, in all other ways devices
and arcs nested in modules act as though the entire architecture
were flattened into a single layer. This flattened view can be
imposed on an architecture at simulation run-time by clicking once
on a module and choosing "View->Flatten Selected Nodes".
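The flattened view can be pictured as a walk over the module hierarchy that lists every device under a name qualified by its parent modules. The sketch below is only an illustration of that idea: the nested-dictionary representation, the "/" separator, the device name "switch", and the device types assigned inside "dual_processor" are all assumptions and do not reflect CSIM's XML format or its internal naming.

```python
def flatten(module, prefix=""):
    """Return a flat list of (qualified_name, device_type) pairs.

    module maps a name either to a device type (a string) or to a nested
    dictionary that represents a sub-module.
    """
    devices = []
    for name, child in module.items():
        qualified = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            devices.extend(flatten(child, qualified))  # descend into module
        else:
            devices.append((qualified, child))
    return devices


# Hypothetical rendering of arch2.sim; the types inside the module are assumed.
arch2 = {
    "source": "generic_pe",
    "sink": "generic_pe",
    "dual_processor": {
        "processor1": "generic_pe",
        "processor2": "generic_pe",
        "switch": "hypothetical_switch",
    },
}
flat = flatten(arch2)
for qualified_name, device_type in flat:
    print(qualified_name, device_type)
```

Because each device keeps its parent module's name as a prefix, two modules can contain identically named devices without ambiguity.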
The architecture in arch2.sim (Figure 3) is similar to that in
arch1.sim (Figure 1). The similarities ensure that DFGs designed
for arch1.sim will also operate on arch2.sim with no alterations.
The reverse is true as well with a qualification: all DFGs
designed for arch2.sim can be re-mapped onto arch1.sim through use
of a mapping file. This mapping, in fact, is the purpose of the
included file "flow2map.csim" discussed later. As mentioned, the
difference between the Figure 1 and Figure 3 architectures is the
module named 'dual_processor'. This module contains a
point-to-point switch allowing simultaneous information flow over
independent pairs of attached links. Thus information can flow
from the top-level device "source" to the module device
"processor2" at the same time that information is flowing from the
module device "processor1" to the top-level device "sink".
The (software) DFG depicted in Figure 2 can be associated with
either of the above (hardware) architecture files (arch1.sim or
arch2.sim). CSIM requires a unique START node in all DFGs to mark
the beginning of a flow of data. The "START" node in this graph
is assigned to a top-level 'hardware' device named "source". Of
course it is convenient for a 'programmable' (generic_pe) device
named "source" to exist as it does in both Figure 1 and Figure 3.
DFG-generated instructions contained in the connecting arc require
the device associated with this START node to move a single byte
of data to the hardware device assigned to the DFG process named
"Proc1". In this case, the "Proc1" node is assigned to the same
hardware device as "START" (i.e., "source"). When two consecutive
nodes in a DFG are mapped to the same device, CSIM knows the data
stays local to that device and does not attempt to move the data
indicated by the intervening arc. (If it did, over what path in
the associated architecture would a device send data back to
itself?) Instead, CSIM ignores the send and receive process
specified by the arc and imposes only the delays indicated by the
nodes. In this case, the "START" node imposes a zero msec delay
(i.e., no delay). The node "Proc1" instructs the hardware
device "source" to delay 11 msec and place a single byte of data
into a queue (P=1). The delay and depositing of a byte into a
queue is repeated 10 times. Whenever the data queue accumulates
two bytes (T=2), those two bytes are sent by the hardware device
"source" to the hardware device "sink" assigned to the node
"Proc2". The device associated with this latter node 'consumes'
those two bytes (C=2). The device associated with "Proc2" then
follows the instructions provided by the "Proc2" node; in this
case, it begins by delaying 7 msec. Because 10 bytes are placed
sequentially into a queue that is 'triggered' every other byte,
the node "Proc2" 'fires' five times. The simulation controlled by
flow1.dfg will end whenever the "EXIT" node is reached.
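The firing count above can be checked with a short sketch. The Python below is not part of CSIM. It only mimics the produce/threshold/consume bookkeeping described in this subsection, and the function name count_firings is hypothetical.

```python
def count_firings(bytes_produced, threshold, consume):
    """Count how often a downstream node fires as bytes arrive one at a time.

    Illustrative only: assumes one byte is deposited per upstream iteration
    (P=1) and that 0 < consume <= threshold.
    """
    if not 0 < consume <= threshold:
        raise ValueError("sketch assumes 0 < consume <= threshold")
    queued = 0
    firings = 0
    for _ in range(bytes_produced):
        queued += 1                  # upstream node deposits one byte (P=1)
        while queued >= threshold:   # arc triggers once the threshold is met (T)
            queued -= consume        # downstream node consumes bytes (C)
            firings += 1
    return firings, queued


# "Proc1" deposits 10 bytes; the arc to "Proc2" has T=2 and C=2.
firings, leftover = count_firings(bytes_produced=10, threshold=2, consume=2)
print(firings, leftover)  # prints: 5 0
```

With ten bytes and a threshold of two, the sketch reports five firings and an empty queue, matching the description of flow1.dfg.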
The DFG file "flow2.dfg" depicted in Figure 5 differs from the
simpler file "flow1.dfg" in three primary ways. First, it
contains a 'supernode', the DFG analog of a module. This
supernode, called "SuperNode", is depicted in Figure 6. Second,
some nodes send messages to and receive messages from more than
one node. Third, the nodes are all assigned to indirect variable
names such as "Asource". A mapping file, flow2map.csim, handles
these assignments and is discussed in the next section.
The DFG supernode is entirely analogous to the 'hardware'
module. The CSIM GUI presents the supernode as having a thicker
border (see Figure 5). Supernodes have an interface drawn by the
CSIM GUI as small orange squares located at one end of an arc.
The connecting arc names these 'ports' and each port must be
attached to an arc in the higher-level DFG. Like the top-level
flow graph, supernodes can contain any combination of nodes, arcs,
and other supernodes (however recursion isn't allowed).
Nodes receiving signals from more than one connecting arc require
that all of these arcs' queues reach their threshold size before
the node 'fires'. Thus, in Figure 6, the hardware associated with
both "Proc1" and "Proc3" must send two bytes to the hardware
associated with "Proc2" before this latter node is triggered.
Once triggered, it will thrice cycle between a delay of seven msec
followed by placing a byte into a queue designated for the node
named "Proc4". Likewise, once a node has gone through any
specified delay cycle, it places any bytes of data into the queues
of all attached arcs. So, for instance,
when the node "Proc3" in Figure 6 has delayed five msec, it
places a byte into queues designated for both "Proc2" and for "Proc4".
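The multi-input firing rule can be illustrated with a small sketch. It is plain Python, not CSIM code. The function ready_to_fire, the arc labels and the order in which bytes arrive are assumptions made for illustration. Only the threshold of two bytes per arc comes from the text.

```python
def ready_to_fire(queues, thresholds):
    """A node fires only when every incoming arc's queue has met its threshold."""
    return all(queues[arc] >= thresholds[arc] for arc in thresholds)


# Both arcs into "Proc2" need two bytes before the node is triggered.
thresholds = {"Proc1->Proc2": 2, "Proc3->Proc2": 2}
queues = {"Proc1->Proc2": 0, "Proc3->Proc2": 0}

# Hypothetical arrival order: two bytes from Proc1, then two from Proc3.
arrivals = ["Proc1->Proc2", "Proc1->Proc2", "Proc3->Proc2", "Proc3->Proc2"]
history = []
for arc in arrivals:
    queues[arc] += 1
    history.append(ready_to_fire(queues, thresholds))

print(history)  # prints: [False, False, False, True]
```

"Proc2" becomes ready only after the last arrival, even though the queue from "Proc1" met its threshold two steps earlier.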
As noted in the prior subsection, nodes in the file flow2.dfg are
all assigned to indirect variable names: "Asource", "Asink",
"Bsource", and "Bsink". The single supernode is given the mapping
name "AsuperNode". The CSIM Scheduler (see Scheduler) processes DFGs
and associates these mapping names to hardware. (The Scheduler is
able to make this association by using a file named "netinfo"
produced in processing the associated hardware file.) Nodes that
are nested in a supernode are given a mapping name that is the
concatenation of the supernode mapping name, the delimiter "/",
and the nested node mapping name. Thus the DFG described by
flow2.dfg will yield four node-to-device mappings: "Asource",
"Asink", "AsuperNode/Bsource", and "AsuperNode/Bsink".
The hardware devices to which these nodes can be mapped are:
All four combinations of running either DFG model "flow1.dfg" or
"flow2.dfg" on either architecture file "arch1.sim" or "arch2.sim"
will work. The steps to build and run each combination are
described in Instructions to build a Performance Modeling Demo
and Instructions to run the simulation.
Directory tree starting from: csim (Top of CSIM distribution package.)
Directory tree starting from: csim/tools
3. Model_Libs Directory Tree
Directory tree starting from: csim/model_libs
4. Demonstrations / Examples Directory
Directory tree starting from: csim/demo_examples
As mentioned, threads can use any standard 'C' functions along
with CSIM-specific functions to build up a device's behavior.
Other CSIM-specific functions include:
Finally, CSIM provides the following CSIM preprocessor directives
that mimic their 'C' equivalents:
It was recommended that the same mechanism be used for both systems.
For instance, have the static scheduler look for the same
environment variable, PMOD_PROG. If it exists, then test for the
command-line argument (if that mechanism remains). The
command-line argument will preempt any differing environment
variable or yield an abort of the scheduler.
This change was instituted 4-5-02.
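The precedence described above can be sketched as follows. This is not the scheduler's actual code. The function resolve_program_dir and the abort_on_conflict switch are hypothetical, and the text leaves open whether a conflict is resolved by preemption or by an abort, so the sketch models both.

```python
def resolve_program_dir(environ, cli_dir=None, abort_on_conflict=False):
    """Pick the programs directory from PMOD_PROG and an optional argument.

    Illustrative only: 'environ' is any mapping of environment variables and
    'cli_dir' stands in for the command-line argument.
    """
    env_dir = environ.get("PMOD_PROG")
    if cli_dir is None:
        return env_dir                      # fall back to the environment
    if abort_on_conflict and env_dir is not None and env_dir != cli_dir:
        raise RuntimeError("PMOD_PROG and the command-line argument differ")
    return cli_dir                          # command line preempts PMOD_PROG


print(resolve_program_dir({"PMOD_PROG": "./programs"}))
print(resolve_program_dir({"PMOD_PROG": "./programs"}, cli_dir="./other"))
```

The first call returns the environment value and the second returns the command-line value.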
Example:
"Source" is a csh/tcsh command that executes a script file.
In this case, it tells your shell where the appropriate CSIM tool executables are
for your platform, and makes quick aliases for them.
(1) Make sure you have sourced the csim/tools/setup file, as described in step (6) of the installation above.
Build the Hardware Model
Example:
mkdir ~/myperfmodel
cp -r $CSIM_ROOT/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling ~/myperfmodel
Example:
cd ~/myperfmodel
Build the Software Model
Type:
gui arch1.sim
This will open the simpler of two
'hardware' architecture models in the CSIM graphical tool and display
a diagram similar to that in Figure 1.
Figure 1 - arch1.sim. A Simple Hardware Model.
csim arch.sim
sim.exe -netinfo
router netinfo
Nothing about CSIM requires use of the GUI. Non-graphical commands are often quicker.
Figure 2 - flow1.dfg. A Simple DFG.
Network Switch Links are:
HW Generic XBar Links are:
HW Devices, HW Modules, SW Nodes, and SW Supernodes are:
Other devices listed in the timeline plots including the
generic_pe and multi_priority_pe are:
SIM-Display Panel     Last Character     XGRAPH Color
colorize() value      of Task Name       (colormap() value)
-----------------     --------------     ------------------
 0  Black             --None--            0  Black
 1  Fuchsia           8,F,T,b,p          12  Fuchsia
 2  Blue              9,G,U,c,q           3  Blue
 3  Cyan              H,V,d,r             9  Cyan
 4  Navy              I,W,e,s            14  Navy
 5  Yellow            J,X,f,t             7  Yellow
 6  Dark-Gray         K,Y,g,u            11  Dark-Gray
 7  Gray              O,L,Z,h,v          10  Light-Gray
 8  Red               1,M,i,w             2  Red
 9  Green             2,N,j,x             4  Green
10  Violet            3,A,O,k,y           5  Violet
11  Orange            4,B,P,l,z           6  Orange
12  Gold              5,C,Q,m            15  Gold
13  Pink              6,D,R,n             8  Pink
14  Dark Cyan         7,E,S,a,o          13  Aqua
15  White             --None--            1  White
Post-processing Analysis
A. Getting Help
3. An Overview of the CSIM Tools
CSIM is a discrete event simulator that includes three primary
model construction tools, a number of simulation analysis tools,
and a set of component and system models. A thorough description
of the primary and analysis tools can be found at
http://www.csim.com/. The following sections will
only provide an overview of these tools.
This overview also includes a description of CSIM's device models.
3.2 CSIM GUI: Describing 'Hardware' Models and Data Flow Graphs
Every DFG must have a unique node named "START" and should have at
least one node named "EXIT". The "START" node starts up the
'software'. An "EXIT" node stops execution of the data flow
graph.
Arcs dictate the passing of data or control between nodes by specifying
the amount of data that passes between nodes and by specifying
which nodes become active. Nodes are mapped to 'hardware' devices
and an active node causes its associated device to consume
resources (CPU). Consequently, arcs dictate which processors
consume resources and dictate the size of messages passing over
links between these associated devices. One interpretation of DFG
components is that nodes act to consume CPU, and arcs act to spawn
processes and move data between nodes. Thus much of a DFG's
richness lies in its arcs.
The hooks for these commands are in an ASCII file saved into a
machine-dependent tools subdirectory; e.g.,
csim/tools/sun_solaris/gui_setups.
3.3 SCHEDULER - Tool for Software Data Flow Graphs (DFGs)
A DFG describes the tasks and inherent data dependencies of an
application; in particular, software applications. The SCHEDULER
utility accepts DFG files and after partitioning, allocating, and
scheduling the flow-graph nodes, produces corresponding
software-programs for each of the targeted processor elements
(PEs).
When triggered, a DFG arc indicates that a "Consume Amount" number
of bytes should be transferred between two nodes. The Scheduler
takes this event and instructs the first node's associated device
to send a message. The message created includes the number of
'consumed' bytes and the address of the device associated with the
destination node. However, if two software nodes are mapped to
the same device, then the entire message process is skipped. In
effect, messages passed between software nodes mapped to the same
device act as if they arrive instantaneously.
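The message-skipping rule can be sketched in a few lines. This Python is illustrative and is not Scheduler code. The function schedule_transfer and the dictionary layout of a message are assumptions. The node-to-device assignments are those used by flow1.dfg.

```python
def schedule_transfer(src_node, dst_node, consume_amount, mapping):
    """Return a message description, or None when no message is needed."""
    src_dev = mapping[src_node]
    dst_dev = mapping[dst_node]
    if src_dev == dst_dev:
        return None  # same device: the entire message process is skipped
    return {"from": src_dev, "to": dst_dev, "bytes": consume_amount}


# Node-to-device assignments as in flow1.dfg.
mapping = {"START": "/source", "Proc1": "/source", "Proc2": "/sink"}

print(schedule_transfer("START", "Proc1", 1, mapping))
print(schedule_transfer("Proc1", "Proc2", 2, mapping))
```

The first call yields no message because both nodes share the device "/source". The second yields a two-byte message from "/source" to "/sink".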
alias sched "`alias sched` -o ./programs"
alias stim "`alias stim` -d -o ./programs"
setenv PMOD_PROG ./programs
3.4 ROUTER
3.5 Analysis Tools
3.6 Device Models
delay_box.sim
dynam_sched/SchedRoutines3b.sim
dynam_sched/dynamic_pe.sim
dynam_sched/dynamic_sched.sim
generic_pe.sim
generic_xbar.sim
latency.sim
lbus.sim
monitor.sim
multi_models/monitor.sim
multi_models/multi_task_pe.sim
multi_models/parameters.sim
multi_models/subroutines.sim
multi_priority_pe.sim
parameters.sim
race_nic.sim
race_xbar.sim
racepp_nic.sim
racepp_nic_fd.sim
racepp_xbar.sim
racepp_xbar_fd.sim
subroutines.sim
switcher.sim
c40.sim
cascade_bus.sim
lanai.sim
myrinet_xbar.sim
4. More on the Performance Model Demonstration
The simple demonstration performance model used above was
constructed to highlight CSIM's performance modeling ability.
This description does assume a basic understanding of CSIM. An
overview of CSIM and links to CSIM documentation were
provided in An Overview of CSIM above.
The remainder of this subsection describes each of these files
and how they can be executed. As will be shown, some of the power
of CSIM for system performance modeling is demonstrated by the
application of either DFG (either .dfg file) to either system
architecture description (either .sim file).
4.1 A Simple Architecture File: arch1.sim
%include ../../core_models/generic_pe.sim
4.2 A More Complicated Architecture File: arch2.sim
Figure 3 - arch2.sim. A Slightly More Complex Hardware Model.
Figure 4 - The arch2.sim dual processor module.
4.3 A Simple Data Flow Graph: flow1.dfg
4.4 A More Complicated Data Flow Graph: flow2.dfg
Figure 5 - flow2.dfg: A More Complicated DFG.
Figure 6 - A supernode named "SuperNode" and assigned the type "Butterfly".
4.5 A Node to Device Mapping File: flow2map.csim
The missing link is an association between each DFG node mapping
name and a device name. This is accomplished by the file
flow2map.csim given in Listing 1. It is necessary to edit this file
when changing assignment of flow2.dfg nodes from arch1.sim to
arch2.sim. In particular, when the variable "usearch1" is defined
(as it is in Listing 1), the software node "AsuperNode/Bsource" is
assigned to the device "/source" found in arch1.sim. Otherwise,
it is assigned to the device "/dual_processor/processor1" found in
arch2.sim.
Listing 1 - flow2map.csim
<xml version="1.0" standalone="yes">
<csim_sw_file>
CGUIformatVersion 1.850000
%define usearch1
%ifdef usearch1
macro Asource = /source
macro Asink = /sink
macro AsuperNode/Bsource = /source
macro AsuperNode/Bsink = /sink
%endif
%ifndef usearch1
macro Asource = /source
macro Asink = /sink
macro AsuperNode/Bsource = /dual_processor/processor1
macro AsuperNode/Bsink = /dual_processor/processor2
%endif
</csim_sw_file>
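The effect of defining or removing "usearch1" can be traced with a small sketch. The Python below is not the CSIM preprocessor. It is a hypothetical resolve_macros function that handles only the directives appearing in Listing 1 and assumes they are not nested.

```python
LISTING = """\
%define usearch1
%ifdef usearch1
macro Asource = /source
macro Asink = /sink
macro AsuperNode/Bsource = /source
macro AsuperNode/Bsink = /sink
%endif
%ifndef usearch1
macro Asource = /source
macro Asink = /sink
macro AsuperNode/Bsource = /dual_processor/processor1
macro AsuperNode/Bsink = /dual_processor/processor2
%endif
"""


def resolve_macros(text):
    """Return {mapping name: device} for the active branch (no nesting)."""
    defined = set()
    active = True
    macros = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("%define"):
            if active:
                defined.add(line.split()[1])
        elif line.startswith("%ifdef"):
            active = line.split()[1] in defined
        elif line.startswith("%ifndef"):
            active = line.split()[1] not in defined
        elif line.startswith("%endif"):
            active = True
        elif line.startswith("macro") and active:
            name, device = line[len("macro"):].split("=", 1)
            macros[name.strip()] = device.strip()
    return macros


arch1_map = resolve_macros(LISTING)
arch2_map = resolve_macros(LISTING.replace("%define usearch1\n", ""))
print(arch1_map["AsuperNode/Bsource"])  # prints: /source
print(arch2_map["AsuperNode/Bsource"])  # prints: /dual_processor/processor1
```

Removing the "%define usearch1" line is the only change needed to switch the supernode's nodes from the arch1.sim devices to the arch2.sim module devices.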
4.6 Running the Demonstration Performance Models
These four combinations of 'hardware' and 'software' models
demonstrate some of CSIM's ability to efficiently handle
performance modeling. Both 'hardware' and 'software' designs can
be constructed from the top down where a coarse model is refined,
and modules are added or replaced, as more information becomes
available. It is easy to see that one design group may be using a
particular 'hardware' and 'software' model set while a second
group alters the 'hardware' model and a third group modifies the
'software' model. At any time, and with an appropriate mapping
file, updated 'software' can be mapped to either an original or
updated 'hardware'. In this way, design tradeoffs can be
addressed in both a system's architecture and software design
while the system is being developed.
A. CSIM Distribution Directory Tree
1. $CSIM_ROOT Directory Tree
csim ....................................................... 6.317-KB
|-- tools ............................................... 7360.697-KB
|-- model_libs .......................................... 2432.954-KB
|-- demo_examples ....................................... 409.479-KB
2. Tools Directory Tree
csim/tools ................................................. 12.373-KB
|-- sun_solaris ......................................... 1605.874-KB
| |-- general_utilities ............................... 256.060-KB
|
|-- sgi_irix ............................................ 1763.669-KB
| |-- general_utilities ............................... 498.940-KB
|
|-- mac_osx ............................................. 1764.114-KB
| |-- general_utilities ............................... 276.842-KB
|
|-- i86_linux2.2 ........................................ 1298.092-KB
| |-- general_utilities ............................... 295.574-KB
|
|-- hp_ux ............................................... 1816.703-KB
| |-- general_utilities ............................... 507.316-KB
|
|-- bin ................................................. 1.159-KB
12 Directories.
Total Space Used = 10.096716-MB (3.559003-MB Compressed)
csim/model_libs ............................................ 5.632-KB
|-- perfmod2 ............................................ 302.197-KB
|-- icons ............................................... 591.073-KB
|-- core_models ......................................... 263.188-KB
| |-- sharc_test.dir .................................. 33.729-KB
| |-- raceway_test.dir ................................ 29.004-KB
| |-- pe_xbar_test.dir ................................ 7.398-KB
| |-- pe_bus_test.dir ................................. 7.357-KB
| |-- myrinet_test.dir ................................ 19.015-KB
| |-- multi_models .................................... 82.637-KB
| | |-- raceway_test.dir ............................ 20.273-KB
| | | |-- programs ................................ 7.800-KB
| | |
| | |-- pe_bus_test.dir ............................. 8.598-KB
| | | |-- programs ................................ 2.988-KB
| | |
| | |-- myrinet_test.dir ............................ 20.644-KB
| | |-- programs ................................ 7.800-KB
| |
| |-- dynam_sched ..................................... 459.569-KB
| | |-- raceway_test.dir ............................ 18.836-KB
| |
| |-- c40_test.dir .................................... 19.642-KB
|
|-- arith_lib ........................................... 120.637-KB
|
|-- COTS_boards ......................................... 2.560-KB
| |-- MERC ............................................ 42.901-KB
| | |-- programs .................................... 1.024-KB
| |
| |-- CSPI ............................................ 12.648-KB
| | |-- programs .................................... 2.088-KB
| |
| |-- ALEX ............................................ 11.165-KB
| |-- test ........................................ 35.757-KB
| |-- programs ................................ 1.024-KB
|
|-- general_blocks ...................................... 89.242-KB
|-- Arithmetic ...................................... 19.730-KB
|-- Comparison ...................................... 6.437-KB
|-- Conversions ..................................... 2.778-KB
|-- Counters ........................................ 6.797-KB
|-- DS_Type_Operations .............................. 3.860-KB
|-- Data_Structure_Access ........................... 8.174-KB
|-- Delays .......................................... 3.294-KB
|-- Examples ........................................ 66.421-KB
|-- Execution_Control ............................... 5.777-KB
|-- File_Access ..................................... 9.899-KB
|-- Generators ...................................... 24.289-KB
|-- Logical ......................................... 4.412-KB
|-- Loops ........................................... 6.436-KB
|-- Memory .......................................... 9.201-KB
|-- Miscellaneous ................................... 7.322-KB
|-- Plot_Generation ................................. 3.049-KB
|-- Probes .......................................... 2.432-KB
|-- Queues .......................................... 4.507-KB
|-- Servers ......................................... 5.350-KB
|-- Statistics ...................................... 7.102-KB
|-- Switches ........................................ 2.135-KB
|-- Timers .......................................... 1.982-KB
|-- Traffic_Generators .............................. 6.117-KB
|-- Vector_Operations ............................... 11.636-KB
52 Directories.
Total Space Used = 2.432954-MB
csim/demo_examples ......................................... 12.887-KB
|-- demo0 ............................................... 1.985-KB
|-- demo1 ............................................... 39.416-KB
|-- demo2 ............................................... 46.575-KB
|-- demo2_slider ........................................ 55.994-KB
|-- demo3 ............................................... 43.326-KB
|-- demo4 ............................................... 144.143-KB
|-- demo6 ............................................... 23.447-KB
|-- demo7 ............................................... 41.706-KB
9 Directories.
Total Space Used = 0.409479-MB
B. CSIM Device-Level Behavioral Descriptions
The behavioral descriptions of devices consist of standard C-language
code with a small set of extension functions developed by CSIM.
A simple CSIM device, "HW_dummy.sim", is given in Figure 7. This
device is a data sink that accepts messages but does nothing with
them. The device begins and ends with the keywords
"DEFINE_DEVICE_TYPE:" and "END_DEFINE_DEVICE_TYPE." The function
"PORT_LIST" provides the simulator with this device's topological
constraints (e.g., the number of allowable connections). A command
"DEVICE_CLASS" provides the simulator with class information (at
the moment, "programmable" is the only one being used).
DEFINE_DEVICE_TYPE: HW_dummy
  PORT_LIST( inp );  /* Contains only an input port (inp). */

  /* Local Variables */
  int length_in;
  struct message_struct *message_in;
  int my_id;

  DEFINE_THREAD: start_up  /* Start blocking delay device */
  {
    DELAY(0.001);
    /* Launch the port-handling thread. */
    TRIGGER_THREAD( process_inp, 0.1, 0 );
    my_id = myid( MY_NAME );
    fprintf(LinkTline,"replace_y_axis %d %s\n", my_id, MY_NAME);
  }
  END_DEFINE_THREAD.

  DEFINE_THREAD: process_inp  /* Process handles input port messages. */
  {
    while (1)  /* Wait for data arrival */
      RECEIVE( "inp", &message_in, &length_in );
  }
  END_DEFINE_THREAD.
END_DEFINE_DEVICE_TYPE.
Figure 7: A simple CSIM device "HW_dummy".
A device's behavior is specified in software processes called
"threads" that begin and end with the keywords "DEFINE_THREAD:"
and "END_DEFINE_THREAD." Each device must have a startup thread.
The startup thread in Figure 7 delays the remainder of the
device's actions 0.001 msec, then spawns a second thread named
"process_inp", writes information into a post-processing summary
file, and ends. The second spawned thread, "process_inp",
contains an infinite loop with a function "RECEIVE()" that blocks
until a message is received. Received messages are processed and
recorded automatically by the "RECEIVE()" function.
It is often useful to pass some 'C' language code, including functions and global
variable declarations, directly through to the 'C' compiler by
enclosing it in a DEFINE_GLOBAL: and END_DEFINE_GLOBAL. pair.
This is called a global block.
C. Software Commands Processed by the Programmable Class
The Scheduler produces commands for devices containing the CSIM
preprocessor instruction "DEVICE_CLASS=(programmable);" and for
the STIM_SCHEDULER. Such devices include "generic_pe.sim",
"multi_priority_pe.sim", "c40.sim", "sharc.sim",
"multi_task_pe.sim" and "dynamic_pe.sim". The commands or
'software' instructions ordinarily produced by the static
Scheduler are a subset of those handled by these programmable
devices. All such commands are initially read from a file by the
"read_program" function in the file "subroutines.sim". Below is a
list of all such known commands. Required arguments are enclosed
in a "<>" pair. Optional arguments are enclosed in square
brackets "[]".
D. Graph Files
The following are the listings of the structure (graph topology)
file arch1.sim in Figure 8, and the Data Flow Graph flow1.dfg in Figure 9.
<xml version="1.0" standalone="yes">
<csim_hw_file>
CGUIformatVersion 1.850000
%include ../../core_models/monitor.sim
%include ../../core_models/subroutines.sim
%include ../../core_models/parameters.sim
%include ../../core_models/generic_pe.sim
<DEFINE_MODULE> top_level <top_diagram>
<DEFINE_NODE_INSTANCES>
<ins 1> source = generic_pe <vrt> 1.800000 1.000000 3.600000 1.800000 </ins>
<ins 1> sink = generic_pe <vrt> 6.200000 1.000000 8.000000 1.800000 </ins>
<ins 1> Monitor = Monitor <vrt> 4.000000 2.000000 5.800000 2.800000 </ins>
</DEFINE_NODE_INSTANCES>
<DEFINE_TOPOLOGY>
<lnk> source io_port <to> sink io_port <a_dr> hdplx <a_ql> 1 <a_tr> 100
<a_lt> 1.5 <a_cs> 1 1 <vrt> 6.200000 1.400000 3.600000 1.400000 </lnk>
</DEFINE_TOPOLOGY>
<ANNO> 1.800000 0.000000 A Simple Architecture File.</ANNO>
</DEFINE_MODULE>
</csim_hw_file>
Figure 8: XML for the Simple Architecture "arch1.sim".
<xml version="1.0" standalone="yes">
<csim_sw_file>
CGUIformatVersion 1.850000
<DEFINE_GRAPH> top_level <top_diagram>
<DEFINE_NODE_INSTANCES>
<ins 1> START = unnamed
<a_ct> 0 <a_it> 1 <a_mp> /source
<vrt> -1.600000 2.400000 -0.400000 3.000000 </ins>
<ins 1> EXIT = unnamed
<a_ct> 0 <a_it> 1 <a_mp> /sink
<vrt> 2.800000 3.600000 4.000000 4.200000 </ins>
<ins 1> Proc1 = unnamed
<a_ct> 11 <a_it> 10 <a_mp> /source
<vrt> 0.600000 2.400000 1.800000 3.000000 </ins>
<ins 1> Proc2 = unnamed
<a_ct> 7 <a_it> 2 <a_mp> /sink
<vrt> 0.600000 3.600000 1.800000 4.200000 </ins>
</DEFINE_NODE_INSTANCES>
<DEFINE_TOPOLOGY>
<lnk> START begin <to> Proc1 in
<a_pd> 1 <a_th> 1 <a_cn> 1 <a_in> 0
<vrt> 0.600000 2.600000 -0.400000 2.600000 </lnk>
<lnk> Proc1 out <to> Proc2 in
<a_pd> 1 <a_th> 2 <a_cn> 2 <a_in> 0
<vrt> 1.200000 3.600000 1.200000 3.000000 </lnk>
<lnk> Proc2 out <to> EXIT end
<a_pd> 1 <a_th> 10 <a_cn> 10 <a_in> 0
<vrt> 2.800000 3.800000 1.800000 3.800000 </lnk>
</DEFINE_TOPOLOGY>
<ANNO> -2.000000 0.000000 A Simple Data Flow Graph</ANNO>
</DEFINE_GRAPH>
</csim_sw_file>
Figure 9: XML for the Simple DFG "flow1.dfg".
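The node attributes in Figure 9 can be pulled out with a short script for quick inspection. The Python below is a sketch, not a CSIM tool. The function parse_nodes is hypothetical. Reading <a_ct> as the node delay, <a_it> as the iteration count and <a_mp> as the mapped device is an assumption, based on how "Proc1" is described in the walk-through of flow1.dfg (an 11 msec delay, repeated 10 times, on "source").

```python
import re

# Two node instances copied from the flow1.dfg listing.
SAMPLE = """\
<ins 1> Proc1 = unnamed
<a_ct> 11 <a_it> 10 <a_mp> /source
<vrt> 0.600000 2.400000 1.800000 3.000000 </ins>
<ins 1> Proc2 = unnamed
<a_ct> 7 <a_it> 2 <a_mp> /sink
<vrt> 0.600000 3.600000 1.800000 4.200000 </ins>
"""

# Assumed attribute order: <a_ct>, then <a_it>, then <a_mp>.
NODE_RE = re.compile(
    r"<ins \d+>\s*(\w+)\s*=\s*\w+\s*"
    r"<a_ct>\s*(\S+)\s*<a_it>\s*(\d+)\s*<a_mp>\s*(\S+)"
)


def parse_nodes(text):
    """Return {node name: {'delay', 'iterations', 'device'}} for each <ins>."""
    nodes = {}
    for match in NODE_RE.finditer(text):
        name, delay, iterations, device = match.groups()
        nodes[name] = {
            "delay": float(delay),
            "iterations": int(iterations),
            "device": device,
        }
    return nodes


nodes = parse_nodes(SAMPLE)
for name, attrs in nodes.items():
    print(name, attrs)
```

A script along these lines can help confirm that a DFG file matches its prose description before a simulation is built.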