This example is actually chapters 2 and 4 of a more complete document, CSIM as a Performance Modeling Tool: An Overview. The example includes the construction and interpretation of a performance model. Details of the CSIM modeling language are left to the documentation at: CSIM Docs.
1.0 Building Hardware and Software Models
The CSIM distribution includes a number of demonstrations in the directory csim/demo_examples. This section focuses on the performance modeling demonstration of Multi-Computer Architecture Performance Modeling. The example includes two 'hardware' models and two 'software' models and resides in the demo directory 'csim/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling'. Below are instructions for building the simpler of two software models on the simpler of two hardware descriptions.
(1) Make sure you have sourced the csim/tools/setup file, as described by your custodian.
Build the Hardware Model
(2) Create a directory where you have write access. Then copy the example performance model directory contents to that directory:
cp -r $CSIM_ROOT/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling ~/myperfmodel
(3) Move into the directory containing the models:
cd ~/myperfmodel
(4) Open the GUI on the arch1.sim file:
gui arch1.sim
This will open the simpler of the two 'hardware' architecture models in the CSIM graphical tool and display a diagram similar to that in Figure 1.
Figure 1 - arch1.sim. A Simple Hardware Model.
(5) Choose the menu item "Tools->Build Simulation" to compile the HW architecture model.
(6) Choose the menu item "Tools->Build Routing Table" to construct the network routing information tables.
Alternative Build Note: Steps four through six can be replaced by the following three non-graphical commands executed from the C-shell:
Nothing about CSIM requires use of the GUI. Non-graphical commands are often quicker.
Build the Software Model
(7) Choose "File->Open->Open a new file" and choose the file "flow1.dfg". This provides a graphical view of the simpler of the two SW models and yields a diagram similar to that in Figure 2.
Figure 2 - flow1.dfg. A Simple DFG.
(8) Choose the menu item "Tools->Build DFG SW" to build the simple 'software' model.
(9) Choose the menu item "Tools->Plot Ideal TimeLine" to see the ideal timeline.
2.0 Running Simulations
(1) The performance modeling simulation can be viewed from the perspective of either the hardware or software. By default, the simulation is viewed from the hardware perspective and the default graphical window depicts the hardware architecture: a graphical description similar to Figure 1.
This can be changed to view the simulation from the data flow graph (DFG) representation (a view similar to that in Figure 2). To accomplish this, type setenv SIM_GRAPH flow1.dfg before starting up the GUI.
The simulation view can be changed back to the hardware perspective (before starting the GUI) by typing unsetenv SIM_GRAPH or equivalently setenv SIM_GRAPH arch1.sim.
(2) If not in the GUI, type gui arch1.sim or gui flow1.dfg. Choose the menu item Tools->Run Simulation. This step could equally have been executed from the command line via sim.exe.
(3) Optionally choose Animation->Animation Types->Nodes: Concurrent Activities and/or choose Animation->Animation Types->Links: User/Model Defined. These commands override the default device and node coloring discussed in 3.0 Analyzing Results below and are less useful for models (such as these) that use "core_models" components.
(4) Click on "Run/Continue".
(5) When asked (at the UNIX shell window), choose a verbosity level. This verbosity level controls the detail of command-line feedback the simulation will provide about the state of messages in the simulation. Zero is typically chosen to minimize output and speed up the simulation.
3.0 Analyzing Results
The simulation shows the flow of messages from creation to destination by coloring the various device and DFG objects. The simulation can be slowed down by adjusting the "Speed Slowdown" slider, or by stopping, stepping through, or crawling through the simulation with the simulation control panel buttons. These and other GUI-accessible controls are described in the CSIM documentation (see The CSIM Graphical Simulator).
When viewing a general simulation or plotted output, colors have the following meanings:
Network Switching Links are:
HW Generic XBar Links are:
- Purple when unused. They are thin before their first use or when a window is refreshed. They become thick after any use and remain that way until the window is refreshed (usually by changing views).
- Yellow on each segment where a packet request has been made and where each wormhole has been established. If the wormhole is denied, the yellow backtracks, leaving purple behind as it retreats, all the way to its destination.
- Orange is shown for control messages (such as token transfers) in network models.
- Blue when the first packet of a given message block passes across a link.
- Green if data flows down a segment unencumbered.
- Red to show contention if data is already flowing down a segment when a second data stream attempts to create a wormhole down that same segment.
HW Devices, HW Modules, SW Nodes, and SW Supernodes are:
- Purple when unused.
- Green when data is flowing through the device. (There is no contention on this XBar model.)
Network XBars and NIC/NIU devices are:
- Black (uncolored) when unused.
- Blue when the device is processing its first packet of a given message block.
- Green if handling a control message (not implemented consistently).
- Red when the device is being used by further packets.
Other devices listed in the timeline plots, including the generic_pe and multi_priority_pe, are:
- Light cyan with a thin border for devices and nodes and a thick border for modules and supernodes.
- Blue when they initiate the first packet of a given message block.
- Color-coded by a mapping from the LAST letter or digit of the data flow diagram node name that is providing the device's instructions.
- Pink in timeline plots when supporting an unnamed monotonic (an alarm that periodically restarts a data flow, for example every 33.3 msec).
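The last-letter-or-digit rule for timeline colors can be illustrated with a short C sketch. The palette size and the modulo mapping below are assumptions made for illustration; the colors CSIM actually assigns are not reproduced here. What the sketch does show is the consequence of the rule: node names that end in the same letter or digit always receive the same color.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical palette size; CSIM's real color table is not reproduced. */
#define PALETTE_SIZE 8

/* Return the last letter or digit in a DFG node name, or '\0' if none. */
char last_alnum(const char *node_name)
{
    assert(node_name != NULL);
    for (size_t i = strlen(node_name); i > 0; i--) {
        unsigned char c = (unsigned char)node_name[i - 1];
        if (isalnum(c))
            return (char)c;
    }
    return '\0';
}

/* Map a node name to a palette slot using only its last letter or digit,
 * so names ending in the same character always share a slot. */
int color_index(const char *node_name)
{
    char c = last_alnum(node_name);
    if (c == '\0')
        return -1;                          /* no letter or digit: no color */
    return (unsigned char)c % PALETTE_SIZE; /* hypothetical modulo mapping */
}
```

For example, "Proc1" and "Task1" land in the same slot, while "Proc1" and "Proc2" do not under this particular palette size.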
In addition to an object's color, a link's simulated values can be obtained during simulation runtime by choosing a link and then choosing "Options->Examine Link". Similarly, a list of all active links can be obtained by choosing "Options->List Active Links".
4.0 Post-processing Analysis
There are a number of ways to analyze simulation output. For instance, any number of specialized 'hooks' can be placed into the 'C' code of hardware devices that might output information into an external file or provide graphical information at runtime (e.g., a 'meter'). Below are six methods that provide post-process information using existing simulation output. Most of the post-processing analysis tools use the xgraph plotting package. This package processes data files created by the simulation.
(1) To view the run-time-generated process timeline, choose "Tools->Plot Proc Timeline" from the GUI. Equivalently, type "xgraph ProcTline.dat &" from the command line.
(2) To view the run-time-generated timeline as well as the network utilization paths, choose "Tools->Plot Comm+Proc Tline" from the GUI. Equivalently, type "xgraph Spider.dat ProcTline.dat &" from the command line.
(3) To view system-wide contention levels, type "view_contention LinkTline.dat" from the command line. (This step cannot easily be performed directly from the GUI.) The view_contention executable creates a file 'LinkTline.hst' that can be viewed graphically with the command "xgraph LinkTline.hst". Other contention analysis options are described at Contention Viewer.
(4) To create a specialized plot of simulation events, use the event tool. Like the contention analysis tool, this involves processing a text file: type "timeline EventHist.dat" and respond to a series of data-generated questions. The results are then viewed by entering "xgraph EventHist.tln". This is a powerful tool but it is also fairly complicated. Instructions for its use are available at Time-Line.
(5) View the ASCII file 'summaries.dat' for processor, link, and port utilizations. Note that these statistics are collected between the default times "Time1" = 0 msec and "Time2" = 1,000,000 msec. This time window can be changed by editing the file "csim/model_libs/core_models/parameters.sim".
(6) To create a supplemental event history file, run the simulation at high verbosity and pipe the output to a separate file. This is accomplished as follows:
5.0 More on the Performance Model Demonstration
The simple demonstration performance model used above was constructed by users to exploit CSIM's performance modeling ability. This description assumes a basic understanding of CSIM. An overview of CSIM and links to CSIM documentation were provided in An Overview of CSIM above.
Any CSIM developer can develop useful performance models using only the CSIM GUI and the device models provided with advanced CSIM distributions. Such a developed system typically consists of two user-generated files. The first contains a topological description of the modeled system's hardware. The second contains a data flow graph (DFG) representative of software instructions for the hardware model. By convention the 'hardware' file is given an extension ".sim" and the 'software' file is given an extension of ".dfg".
One such simple representative system resides in the directory "csim/demo_examples/Lesson_Models_for_HwSw_Performance_Modeling". This directory contains five files and one directory:
5.1 A Simple Architecture File: arch1.sim
A diagram of the "arch1.sim" file is included in Figure 1. In these diagrams, boxes represent devices or modules and connecting lines represent communications links. Devices are behavior description files written in standard 'C' that contain any number of CSIM constructs (discussed in The CSIM PreProcessor). Modules (shown in Figure 3) are topological descriptions that group devices, (other) modules, and links, and that include one or more ports representing an external interface.
The CSIM event simulator passes messages through links into and out of devices. Links constrain the timing of these messages according to user-set parameters. These parameters include transfer rate 'R', link latency 'L', link flow characteristics 'D' (full-duplex vs. half-duplex vs. simplex), and message queue size 'Q'. The devices may create, modify, or destroy messages.
Devices can be written by the user or obtained from a library. All devices in Figure 1 through Figure 4 are part of advanced CSIM distribution packages and are located in the core_models subdirectory. Both library and user-generated devices can be imported into the model using the GUI by selecting "File->Import->by Reference to File" and choosing a desired device file. (For a description of the GUI, see The CSIM GUI: Describing 'Hardware' Models and Data Flow Graphs.) Behind the scenes, this GUI import action places reference lines into the arch1.sim file of the form:
The remainder of the arch1.sim file contains Extensible Markup Language (XML) instructions describing the architecture's topology. These XML instructions are fairly easy to reverse engineer and include information about where in the diagram the device, module, and link components are located, which behavior description files are associated with each device block, and how the components are connected (via links).
In Figure 1, the device titled "Monitor" reads software instructions generated by a DFG and sets up an environment to track simulation results useful for post-simulation analysis. The "generic_pe" devices named "source" and "sink" process this DFG 'software' to determine when to create messages, how large those messages are, and how many to send out through or receive from their port ("io_port"). The generic_pe device, described in Device Models, is one of several devices that can process 'software' files, that is, alter their simulation behavior via runtime instructions. Further, the DFG is only one of several ways that these 'programmable' devices can be controlled. In general, devices containing the CSIM preprocessor instruction "DEVICE_CLASS=(programmable);" are in some way 'programmable'. Such devices include "generic_pe.sim", "multi_priority_pe.sim", "c40.sim", "sharc.sim", "multi_task_pe.sim" and "dynamic_pe.sim".
Concerning Figure 1, a DFG will stimulate the timing and movement of data between the two specified "generic_pe.sim" devices. All data that is passed between these devices is limited to moving at a rate of 100 MB/second (100 bytes/msec after incurring a latency of 1.5 msec). Because the link is half-duplex ("hdplx"), data can only flow in one direction at a time. The queue length value of one implies that the link will not place unread messages into a buffer. This means that the "sink" must read a message sent by "source" before a second message can be sent.
5.2 A More Complicated Architecture File: arch2.sim
The architecture in Figure 3 includes a module, named "dual_processor", depicted in Figure 4. Modules are graphically differentiated from devices by the width of their (light blue) border and the existence of external ports (depicted as small orange squares). Each external port is named by its attached link, and the connecting arc at the next higher level uses the same name, e.g., "port1". Modules can be used to apply "instantiation variables" or "instance attributes" to a part of the architecture. These are variables whose values are applied locally to the devices contained within a module. In all other ways, however, devices and arcs nested in modules act as though the entire architecture were flattened into a single layer. This flattened view can be imposed on an architecture at simulation run-time by clicking once on a module and choosing "View->Flatten Selected Nodes".
The architecture in arch2.sim (Figure 3) is similar to that in arch1.sim (Figure 1). The similarities ensure that DFGs designed for arch1.sim will also operate on arch2.sim with no alterations. The reverse is true as well with a qualification: all DFGs designed for arch2.sim can be re-mapped onto arch1.sim through use of a mapping file. This mapping, in fact, is the purpose of the included file "flow2map.csim" discussed later. As mentioned, the difference between the Figure 1 and Figure 3 architectures is the module named 'dual_processor'. This module contains a point-to-point switch allowing simultaneous information flow over independent pairs of attached links. Thus information can flow from the top-level device "source" to the module device "processor2" at the same time that information is flowing from the module device "processor1" to the top-level device "sink".
5.3 A Simple Data Flow Graph: flow1.dfg
The (software) DFG depicted in Figure 2 can be associated with either of the above (hardware) architecture files (arch1.sim or arch2.sim). CSIM requires a unique START node in all DFGs to mark the beginning of a flow of data. The "START" node in this graph is assigned to a top-level 'hardware' device named "source". Of course, it is convenient for a 'programmable' (generic_pe) device named "source" to exist, as it does in both Figure 1 and Figure 3. DFG-generated instructions contained in the connecting arc require the device associated with this START node to move a single byte of data to the hardware device assigned to the DFG process named "Proc1". In this case, the "Proc1" node is assigned to the same hardware device as "START" (i.e., "source"). When two consecutive nodes in a DFG are mapped to the same device, CSIM knows the data stays local to the processor and does not consider moving the data indicated by the intervening arc. (Think about it: if it did, over what path in the associated architecture would a device send data back to itself?) Instead, CSIM ignores the send and receive process specified by the arc and only imposes any delays specified by the nodes. In this case, the "START" node imposes a zero msec delay, i.e., no delay. The node "Proc1" instructs the hardware device "source" to delay 11 msec and place a single byte of data into a queue (P=1). The delay and the depositing of a byte into the queue are repeated 10 times. Whenever the data queue accumulates two bytes (T=2), those two bytes are sent by the hardware device "source" to the hardware device "sink" assigned to the node "Proc2". The device associated with this latter node 'consumes' those two bytes (C=2). The device associated with "Proc2" then follows the instructions provided by the "Proc2" node; in this case, it begins by delaying 7 msec. Because 10 bytes are placed sequentially into a queue that is 'triggered' every other byte, the node "Proc2" 'fires' five times.
The simulation controlled by flow1.dfg will end whenever the "EXIT" node is reached.
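The arithmetic behind the five firings can be checked with a small C sketch of the P/T/C rule for a single arc. It is a hypothetical model written for illustration, not the CSIM Scheduler: it records the Proc1 time at which the arc reaches its threshold and ignores both the link transfer time and the 7 msec delay in Proc2.

```c
#include <assert.h>

#define MAX_FIRINGS 64

/* Proc1 runs `cycles` times, each time delaying `delay` msec and then
 * depositing `produce` bytes (P) on its output arc. Whenever the arc holds
 * at least `threshold` bytes (T), the downstream node fires and consumes
 * `consume` bytes (C). Returns the number of firings and stores the Proc1
 * time of each firing in fire_times. */
int simulate_arc(int cycles, double delay, int produce, int threshold,
                 int consume, double fire_times[MAX_FIRINGS])
{
    assert(threshold > 0 && consume > 0 && consume <= threshold);
    int queued = 0, firings = 0;
    double now = 0.0;                   /* START imposes no delay */
    for (int i = 0; i < cycles; i++) {
        now += delay;                   /* Proc1 delay, e.g. 11 msec */
        queued += produce;              /* P bytes placed on the arc */
        while (queued >= threshold && firings < MAX_FIRINGS) {
            queued -= consume;          /* downstream node consumes C bytes */
            fire_times[firings++] = now;
        }
    }
    return firings;
}
```

Called with 10 cycles, an 11 msec delay, P=1, T=2 and C=2, it reports five firings, at 22, 44, 66, 88 and 110 msec of Proc1 time.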
5.4 A More Complicated Data Flow Graph: flow2.dfg
The DFG file "flow2.dfg" depicted in Figure 5 differs from the simpler file "flow1.dfg" in three primary ways. First, it contains a 'supernode', the DFG analogue of a module. This supernode, called "SuperNode", is depicted in Figure 6. Second, some nodes send messages to and receive messages from more than one node. Third, the nodes are all assigned to indirect variable names such as "Asource". A mapping file, flow2map.csim, handles these assignments and is discussed in the next section.
The DFG supernode is entirely analogous to the 'hardware' module. The CSIM GUI presents the supernode as having a thicker border (see Figure 5). Supernodes have an interface drawn by the CSIM GUI as small orange squares located at one end of an arc. The connecting arc names these 'ports', and each port must be attached to an arc in the higher-level DFG. Like the top-level flow graph, supernodes can contain any combination of nodes, arcs, and other supernodes (although recursion is not allowed).
Nodes receiving signals from more than one connecting arc require that all of these arcs' queues reach their thresholds before the node 'fires'. Thus, in Figure 6, the hardware associated with both "Proc1" and "Proc3" must send two bytes to the hardware associated with "Proc2" before this latter node is triggered. Once triggered, it will cycle three times through a seven msec delay followed by placing a byte into a queue designated for the node named "Proc4". Likewise, once a node has gone through any specified delay cycle, it places any bytes of data into the queues of all attached arcs. So, for instance, when the node "Proc3" in Figure 6 has delayed five msec, it places a byte into queues designated for both "Proc2" and "Proc4".
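The multi-input firing rule amounts to a check over every input arc before anything is consumed. In the C sketch below, the InputArc structure and the try_fire function are hypothetical; for the Figure 6 case, each arc into "Proc2" is given T=2 as described above, and C=2 is an additional assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One input arc of a DFG node (hypothetical structure for illustration). */
typedef struct {
    int queued;     /* bytes waiting on the arc */
    int threshold;  /* T: bytes required before the node may fire */
    int consume;    /* C: bytes removed when the node fires */
} InputArc;

/* The node fires only when every input arc has reached its threshold;
 * firing then consumes C bytes from each arc. Returns true if it fired. */
bool try_fire(InputArc *arcs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (arcs[i].queued < arcs[i].threshold)
            return false;               /* some producer has not delivered yet */
    for (size_t i = 0; i < n; i++) {
        assert(arcs[i].consume <= arcs[i].queued);
        arcs[i].queued -= arcs[i].consume;
    }
    return true;
}
```

If "Proc1" has delivered two bytes but "Proc3" only one, try_fire returns false and nothing is consumed; once the second arc also holds two bytes, the node fires and both queues are drained.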
5.5 A Node to Device Mapping File: flow2map.csim
As noted in the prior subsection, nodes in the file flow2.dfg are all assigned to indirect variable names: "Asource", "Asink", "Bsource", and "Bsink". The single supernode is given the mapping name "AsuperNode". The CSIM Scheduler (see Scheduler) processes DFGs and associates these mapping names with hardware. (The Scheduler is able to make this association by using a file named "netinfo" produced in processing the associated hardware file.) Nodes that are nested in a supernode are given a mapping name that is the concatenation of the supernode mapping name, the delimiter "/", and the nested node mapping name. Thus the DFG described by flow2.dfg will yield four node to device mappings: "Asource", "Asink", "AsuperNode/Bsource", and "AsuperNode/Bsink".
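The naming rule for nested nodes is a plain string concatenation. A minimal C sketch follows; the function name is hypothetical and the code only illustrates the rule, not the Scheduler's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join a supernode mapping name, the delimiter "/", and a nested node
 * mapping name, e.g. "AsuperNode" + "Bsource" -> "AsuperNode/Bsource".
 * Returns 0 on success, -1 if `out` is too small (output truncated). */
int nested_mapping_name(char *out, size_t size,
                        const char *supernode, const char *node)
{
    assert(out != NULL && supernode != NULL && node != NULL);
    int n = snprintf(out, size, "%s/%s", supernode, node);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Applying the same call to an already nested name would produce a deeper path; whether CSIM names nodes in doubly nested supernodes exactly this way is an assumption.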
The hardware devices to which these nodes can be mapped are:
Listing 1 - flow2map.csim
<?xml version="1.0" standalone="yes"?>
<csim_sw_file>
CGUIformatVersion 1.850000
%define usearch1
%ifdef usearch1
macro Asource = /source
macro Asink = /sink
macro AsuperNode/Bsource = /source
macro AsuperNode/Bsink = /sink
%endif
%ifndef usearch1
macro Asource = /source
macro Asink = /sink
macro AsuperNode/Bsource = /dual_processor/processor1
macro AsuperNode/Bsink = /dual_processor/processor2
%endif
</csim_sw_file>
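Listing 1 selects between two sets of mappings with %define, %ifdef and %ifndef, much as the C preprocessor would. The C sketch below mirrors both branches as a lookup table, with the %ifndef branch written as #else. The type, table and function names are hypothetical, and this is not how the CSIM Scheduler is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Comment out to select the arch2.sim branch, as removing
 * "%define usearch1" would in Listing 1. */
#define USEARCH1

typedef struct {
    const char *mapping_name;  /* name used in flow2.dfg */
    const char *device_path;   /* hardware device in the .sim file */
} NodeMapping;

static const NodeMapping flow2_map[] = {
#ifdef USEARCH1   /* arch1.sim: all nodes run on /source and /sink */
    {"Asource", "/source"},
    {"Asink", "/sink"},
    {"AsuperNode/Bsource", "/source"},
    {"AsuperNode/Bsink", "/sink"},
#else             /* arch2.sim: the supernode runs on the dual_processor module */
    {"Asource", "/source"},
    {"Asink", "/sink"},
    {"AsuperNode/Bsource", "/dual_processor/processor1"},
    {"AsuperNode/Bsink", "/dual_processor/processor2"},
#endif
};

/* Return the device path for a mapping name, or NULL if it is unmapped. */
const char *device_for(const char *mapping_name)
{
    assert(mapping_name != NULL);
    for (size_t i = 0; i < sizeof flow2_map / sizeof flow2_map[0]; i++)
        if (strcmp(flow2_map[i].mapping_name, mapping_name) == 0)
            return flow2_map[i].device_path;
    return NULL;
}
```

With USEARCH1 defined, device_for("AsuperNode/Bsource") returns "/source"; commenting out the #define selects the arch2.sim branch, where it returns "/dual_processor/processor1".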
5.6 Running the Demonstration Performance Models
All four combinations of running either DFG model "flow1.dfg" or "flow2.dfg" on either architecture file "arch1.sim" or "arch2.sim" will work. Each combination is built and run by extending the procedures described in Instructions to build a Performance Modeling Demo and Instructions to run the simulation.
Dr. K. Burgess
Dr. R. Artz