# **TRIBASIM : A NOVEL NETWORK ON CHIP SIMULATOR BASED ON SYSTEM C**

Daniel Gakwaya<sup>1</sup>, GaoYuJin<sup>2</sup>, Jean Claude Gombaniro<sup>3</sup> and Jean Pierre Niyigena<sup>4</sup>

Department of Computer Science, Beijing Institute of Technology, Beijing,100081 <sup>1</sup>wayadn@yahoo.fr <sup>2</sup>paulgyj@gmail.com

<sup>3</sup>Gombaniro002@yahoo.fr <sup>4</sup>niyigelnx@yahoo.fr

### Abstract

In this paper, we develop a simulator for the Triplet-Based (TriBA) Network-on-Chip processor architecture. TriBA (Triplet-Based Architecture) is a multiprocessor architecture whose basic idea is to bring together the basic philosophy of object-oriented programming and hardware multicore systems [1]. In TriBA, nodes are connected in recursive triplets. The performance of the TriBA network topology has been analyzed from different perspectives [2], and routing algorithms have been developed [3][4], but the architecture still lacks a simulator that researchers can use to run simple and fast behavioral analyses based on parameters common in the Network-on-Chip arena. In this paper we present TriBASim, a simulator for TriBA based on SystemC [6]. TriBASim lessens the burden on TriBA researchers by letting them plug in the desired parameters and obtain nodes and a topology that are set up and ready for analysis.

### Keywords

NoC, TriBA, simulator, SystemC

## **1. INTRODUCTION**

The last decade has seen Networks on Chip (NoCs) emerge as a viable replacement for the traditional bus-based interconnection systems that have dominated Systems on Chip (SoCs) for at least three decades. This is due to the design flexibility that Networks on Chip offer and, most importantly, to the reduction in energy consumption they bring to the computing chips inside our electronic devices [5].

Networks on Chip were introduced by a few pioneering papers, which pointed out that future System-on-Chip designs will be limited by the quality of the interconnection system between computing modules [6,7,8]. They proposed a brand new idea that views the System on Chip as a micronetwork of components. New designs would borrow ideas from the data networks research area and replace bus-based interconnection systems with packet-switched networks between modules within the System on Chip.

Although Networks on Chip have many similarities with data networks, there are differences one needs to consider. For instance, NoCs are constrained to work within small distances inside the SoC, while data networks can span kilometers [6]. The link connection structure is also more predictable for NoCs than it is for data networks. These differences led to completely new designs, protocol stacks and routing algorithms on which Networks on Chip are built. It is also important to note that viewing the system as a micronetwork of components, as NoCs do, allows abstraction in traffic modeling [9].

Natarajan Meghanathan et al. (Eds) : CSEN, ADCO - 2014 pp. 23–30, 2014. © CS & IT-CSCP 2014

DOI: 10.5121/csit.2014.41003

### 24 Computer Science & Information Technology (CS & IT)

Numerous Network-on-Chip architectures have been proposed in academia and industry, and topologies such as 2-D mesh, torus and hypercube have been used in various Network-on-Chip designs. Along with these topologies, new routing algorithms, switching techniques and flow control mechanisms are selectively combined to meet the particular needs of each System-on-Chip design [9].

TriBA is a Network-on-Chip architecture that enforces the concept of object-oriented design in the way SoCs are designed [10]. It is suitable for sophisticated embedded applications with multiple concurrent processing centers. Its advantage over other 2D topologies such as the hypercube topology is ease of realization and assembly [1]. Its nodes are connected in triplets, and higher-order TriBA networks are recursively derived from lower-order ones. This paper introduces TriBASim, a SystemC-based simulator specifically designed to meet the daily needs of researchers working on TriBA.

The rest of this paper is organized as follows: Section 2 explores existing NoC simulators and their intended uses. Section 3 introduces TriBA and discusses the details relevant to our design. We delve into the design in Section 4, and Section 5 shows practical uses of the simulator. Future plans for TriBASim are addressed in Section 6, and Section 7 concludes the paper.

## **2. RELATED WORK**

Numerous Network-on-Chip simulators have been developed, targeting different areas in research and industry. Orion [11,12] was developed to run power and area analysis for Networks on Chip; users supply router and link components to build different network configurations and run their analysis. The power and area analysis in TriBASim is based on the Orion power models. The Noxim [13] NoC simulator is based on SystemC and can be used to evaluate the quality of a NoC in terms of delay, throughput, area and power consumption. Modified versions of Noxim have been used to run performance analysis on popular topologies such as the torus and twisted torus [14].

NIRGAM [15] is another NoC simulator based on SystemC. It is discrete-event and cycle-accurate, and it is very useful for testing routing algorithms on regular topologies. One should also mention Nostrum [16], a project focused on developing a Network-on-Chip architecture that addresses communication issues from the physical level to the application level. These are the simulators that have been relevant to this research; interested readers can refer to [17] for a more detailed list.

## **3. TRIBA OVERVIEW**

"A picture is worth a thousand words": Fig[1] and Fig[2] are the basis for our description of TriBA. Fig[1] displays the low-level architecture of a TriBA node, and Fig[2] emphasizes the network aspects of a TriBA interconnection, which is the focus of our design. We only scratch the surface of the concepts used in our design and refer the interested reader to more in-depth references where appropriate. Like most common computer architectures, ours is composed of computing modules, memory modules and an interconnection system that allows the two to communicate [18].

For TriBA, however, special care was taken to separate computation from communication. A node is composed of three submodules, as shown in Fig[1]: ProcUnit carries out computations; DataUnit is simply a chunk of read/write memory that stores data; and InterUnit, the focus of our design, takes care of communications [1,18]. ProcUnit and DataUnit are abstracted away in our design in order to focus on the network aspects of TriBA, and InterUnit is viewed as a node from here on.



Fig[1] TriBA Architecture



Fig[2] IDC132 addressed interconnected nodes

Each node is assigned an address. TriBA uses IDC132, an addressing mechanism specifically designed for nodes in triplets. It has useful properties such as the reflexive symmetry of IDC132 addresses and the 120° rotation. Combined with the vertex distance computation, these properties help considerably when computing the distance (in hops) between nodes in our routing algorithms [19].

Routing algorithms have been developed for TriBA; TDRA (Table Look-up Deterministic Routing Algorithm) is one of them. When a node receives a message, it has to decide whether it is the recipient of the message or whether it has to forward it to a neighbouring node. When determining the route in TDRA, there is no need to store all the network information in the node, so the transmission overhead this would have generated is avoided [20].

The algorithm uses two tables: a Channel Status Table (CST), which stores the working state of all the output ports of the node, and a Route Table, which stores the output port to be chosen from the current node for each destination node in the network.
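The two-table lookup can be illustrated with a minimal C++ sketch. It is not the published TDRA implementation: the names `Port`, `TdraNode` and `selectPort`, the hash map keyed by destination address, and the choice to report failure when the preferred channel is down are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical labels for the three inter-node output ports.
enum class Port : std::size_t { North = 0, West = 1, East = 2 };

struct TdraNode {
    std::uint32_t address = 0;

    // Channel Status Table: working state of each output port.
    std::array<bool, 3> channelUp{{true, true, true}};

    // Route Table: output port to use for each destination address.
    std::unordered_map<std::uint32_t, Port> routeTable;

    // Writes the output port for `dest` into `out` and returns true.
    // Returns false when the message is for this node, when the
    // destination is unknown, or when the chosen channel is not working.
    bool selectPort(std::uint32_t dest, Port& out) const {
        if (dest == address) return false;           // deliver locally
        auto it = routeTable.find(dest);
        if (it == routeTable.end()) return false;    // no route entry
        if (!channelUp[static_cast<std::size_t>(it->second)]) return false;
        out = it->second;
        return true;
    }
};
```

A real router would need a policy for the failure cases (for example an alternative port); the sketch leaves that decision to the caller.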

DDRA (Distributed Deterministic Routing Algorithm) [21] is another routing algorithm for TriBA. It has no routing table at all; messages are transferred based on the inherent addressing properties of TriBA nodes. IDC132 enforces locality: a message goes directly to the destination node if it is local and crosses triplet boundaries only when it needs to. IDC132 also makes it possible to tell the exact location of a node in the entire interconnection network just by looking at its address. The current version of TriBASim supports DDRA, uses packet switching, and implements credit-based flow control.

## **4. THE TRIBASIM ROUTER**

TriBASim has been implemented in SystemC. SystemC is a set of C++ classes and macros that provide an event-driven simulation interface in C++. These facilities enable a designer to simulate concurrent processes, each described using plain C++ syntax. SystemC processes can communicate in a simulated real-time environment using signals of all the data types offered by C++, some additional ones offered by the SystemC library, and user-defined types. In certain respects, SystemC deliberately mimics the hardware description languages VHDL and Verilog, but it is more aptly described as a system-level modeling language [22].

### ARCHITECTURE

Fig[3] gives a closer view of the InterUnit module of a typical node in TriBASim. It comprises three major internal submodules: the pre-processor, the routing module, and the switching and flow control module. Each node has four input/output ports: three for communication with other nodes and one to interact with ProcUnit and DataUnit. The ports are color-coded for clarity: red marks the input ports, which pass received data to the pre-processor submodule; blue marks the output ports; and green squares represent the output buffers.



Fig[3] TriBASim Router Architecture

Fig[4] Node Ports interconnections

The ports serve as the interface between nodes, and links (channels) connect nodes through these ports. SystemC provides convenience classes to implement ports and channels: sc\_port<sc\_fifo\_in\_if<sc\_bv<64>>> was used for input ports, sc\_port<sc\_fifo\_out\_if<sc\_bv<64>>> for output ports, and sc\_fifo<sc\_bv<64>> for channels. The buffers, implemented as FIFOs using the sc\_fifo class, sit on the output ports. Their depths can be set at the start of the simulation by passing the appropriate parameters to TriBASim.

```
if( destAddr.3rdDoublet != myAddr.3rdDoublet )
    if(11) sendNorth; exit;
    if(01) sendWest; exit;
    if(10) sendEast; exit;

if( destAddr.2ndDoublet != myAddr.2ndDoublet )
    if(11) sendNorth; exit;
    if(01) sendWest; exit;
    if(10) sendEast; exit;

if( destAddr.firstDoublet != myAddr.firstDoublet )
    if(11) sendNorth; exit;
    if(01) sendWest; exit;
    if(10) sendEast; exit;
```

Fig[5]A Simplified Version of DDRA.

Upon reception of the packet the pre-processor checks whether the destination is the current node. The packet is passed to the local port if it is the case and passed to the routing module for destination port processing otherwise. Timings for sending and receiving overheads are also implemented in this sub module.


The routing module implements DDRA [21]. The algorithm enforces the principle of locality by dividing IDC132 addresses into sections, which allows a level-by-level computation of the output port. A simplified version of the algorithm is shown in Fig[5].
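The level-by-level comparison of Fig[5] can be written as a small C++ function. This is a sketch of the simplified figure only, not of the full algorithm in [21], and it rests on assumptions: an address is a sequence of 2-bit doublets with the highest-numbered doublet in the most significant bits, the value tested at each level is the destination's doublet, and the encodings 11, 01 and 10 stand for North, West and East. The names `Port`, `doublet` and `selectPort` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Output choices; Local means the packet has reached its destination.
enum class Port { Local, North, West, East, Invalid };

// Extract doublet `level` (0 = least significant pair of bits).
inline unsigned doublet(std::uint32_t addr, unsigned level) {
    return (addr >> (2 * level)) & 0x3u;
}

// Assumed encoding from Fig[5]: 11 -> North, 01 -> West, 10 -> East.
inline Port portForDoublet(unsigned d) {
    switch (d) {
        case 0x3: return Port::North;
        case 0x1: return Port::West;
        case 0x2: return Port::East;
        default:  return Port::Invalid;  // 00 is unused in this encoding
    }
}

// Compare doublets from the highest level down; at the first level
// where the addresses differ, choose the port named by the
// destination's doublet at that level.
inline Port selectPort(std::uint32_t myAddr, std::uint32_t destAddr,
                       unsigned levels) {
    for (unsigned level = levels; level-- > 0;) {
        unsigned mine = doublet(myAddr, level);
        unsigned dest = doublet(destAddr, level);
        if (dest != mine) return portForDoublet(dest);
    }
    return Port::Local;  // all doublets match
}
```

With `levels` set to 1, 2 or 3 the same function covers the 3-, 9- and 27-node configurations.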

The information from the routing module is then passed to the switching and flow control module. Data is switched to the appropriate port through a simple virtual crossbar switch that we have implemented. This module also manages the buffer space, making sure that we write to a buffer only when it has free space and read from it only when it is not empty. Our flow control is credit based.
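As an illustration of credit-based flow control over a bounded buffer, the following plain C++ sketch tracks one credit per free buffer slot. It is a stand-in written without SystemC, not TriBASim's code; the class name `CreditLink` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// One link with a fixed-depth buffer. The sender holds one credit per
// free slot and may only send while it has credits left.
class CreditLink {
public:
    explicit CreditLink(std::size_t depth) : credits_(depth) {}

    // Sender side: write only when there is free space (a credit).
    bool trySend(std::uint64_t flit) {
        if (credits_ == 0) return false;
        --credits_;
        buffer_.push_back(flit);
        return true;
    }

    // Receiver side: read only when the buffer is not empty.
    // Freeing a slot hands a credit back to the sender.
    bool tryReceive(std::uint64_t& flit) {
        if (buffer_.empty()) return false;
        flit = buffer_.front();
        buffer_.pop_front();
        ++credits_;
        return true;
    }

    std::size_t credits() const { return credits_; }

private:
    std::size_t credits_;
    std::deque<std::uint64_t> buffer_;
};
```

The `depth` argument plays the role of the buffer depth that is passed to the simulator at start-up.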

We followed the principle of incremental design. A tribaNode class was designed first, with addresses, buffers, ports and submodules as data members, and with methods that implement node functionality such as sending and receiving data. The submodules are themselves a set of C++ classes. With the node in place, we designed a tribaTriplet class that takes nodes and connects them in groups of three. This class only exposes interface ports for connecting to other triplets. The simulator can currently be configured to connect 3, 9 or 27 nodes.

The latency computation involves the sending overhead, the receiving overhead and the time of flight; these time values are based on experimental values. Combined with the packet transmission time, which depends on the packet size and the link bandwidth, they give the latency experienced when a packet is sent through one link, as in the formula below. The channel bandwidth is set at the start of the simulation when a user runs TriBASim, and is on the order of Gbit/s.

$$\text{Latency} = \text{Sending overhead} + \text{Time of flight} + \frac{\text{Packet size}}{\text{Bandwidth}} + \text{Receiving overhead}$$
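The single-link latency formula translates directly into code. The sketch below assumes times in nanoseconds, packet size in bits and bandwidth in Gbit/s, so that 1 Gbit/s equals 1 bit per nanosecond; the struct and function names are hypothetical.

```cpp
#include <cassert>

// Timing parameters of one link traversal (all times in ns).
struct LinkTiming {
    double sendingOverheadNs;
    double timeOfFlightNs;
    double receivingOverheadNs;
    double bandwidthGbitPerS;  // 1 Gbit/s == 1 bit per ns
};

// Latency = sending overhead + time of flight
//         + packet size / bandwidth + receiving overhead.
inline double linkLatencyNs(const LinkTiming& t, double packetSizeBits) {
    assert(t.bandwidthGbitPerS > 0.0);
    double transmissionNs = packetSizeBits / t.bandwidthGbitPerS;
    return t.sendingOverheadNs + t.timeOfFlightNs + transmissionNs +
           t.receivingOverheadNs;
}
```

For example, a 64-bit packet on a 2 Gbit/s link contributes 32 ns of transmission time, to which the two overheads and the time of flight are added.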

### POWER AND AREA COMPUTATIONS

Orion is used for power and area analysis in our simulator. Orion is a power-performance interconnection network simulator capable of providing detailed power characteristics, in addition to performance characteristics, to enable rapid power-performance tradeoffs at the architectural level [23]. Orion's power models are based on real characteristics of the hardware composing the interconnect, such as buffers, gates and wires.

Considering a flit traversing our router, the total energy consumed per flit can be computed as follows:

$$E_{flit} = E_{wrt} + E_{arb} + E_{xb} + E_{link}$$

where E\_wrt is the energy dissipated when writing to the buffer, E\_arb the energy dissipated on arbitration, E\_xb the energy dissipated on switching through the crossbar, and E\_link the energy dissipated on the link. From a user perspective, all we needed to specify were the parameters of the interconnect components, and Orion provided the results based on the data its authors have collected. More details on how power is modelled can be found in [23].
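The per-flit sum E\_flit can be sketched as a small C++ struct. The field names are hypothetical, the values would come from Orion, and the `pathEnergy` helper additionally assumes that every hop uses the same router and link configuration.

```cpp
#include <cassert>

// Components of the per-flit sum, in whatever unit the power model reports.
struct FlitEnergy {
    double bufferWrite;  // E_wrt
    double arbitration;  // E_arb
    double crossbar;     // E_xb
    double link;         // E_link

    // E_flit = E_wrt + E_arb + E_xb + E_link
    double total() const {
        return bufferWrite + arbitration + crossbar + link;
    }
};

// Assumption: identical routers and links on every hop, so the cost of
// a path is the per-hop cost times the hop count.
inline double pathEnergy(const FlitEnergy& e, unsigned hops) {
    return static_cast<double>(hops) * e.total();
}
```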

## **5. CASE STUDIES**

The figures below show a set of simulations we ran with TriBASim. In Fig[6] we traced the path followed by a packet from source to destination, logging port information at each intermediate node. Critical network information can be obtained easily by calling convenience methods on the node and triplet classes. In Fig[7] we studied how the average latency in the network varies with link throughput and with the number of nodes. The results show that lower-level TriBA networks saturate earlier than their higher-level counterparts. We used the same link area and power configuration across entire networks, but networks with different configurations can also be studied.

| At time 29 nstripletWest.West sent a packet 1011>> 1111 The data is 1011111100110010010010010010110001010001010 |
|---|
| tripletWest.East: At time :29 ns A packet from 1011 to 1111 passed through me .It came in through my WEST input port and Went out through my NORTH output portThis is second doublet analysis |
| tripletWest.North: At time :29 ns A packet from 1011 to 1111 passed through me .It came in through my EAST input port and Went out through my NORTH output portThis is second doublet analysis |
| tripletNorth.West: At time :29 ns A packet from 1011 to 1111 passed through me .It came in through my WEST input port and Went out through my NORTH output portThis is FIRST doublet analysis |
| At time 29 ns : tripletNorth.North :The packet from 1111 to 1111Has reached its destination. My address is 1111 and the entire packet is as follows1011111101110011010100100110011001001011111 |
| gakwaya@xubuntu:~/orion2/orion_2.0_20091106\$ ./orion_router_power -pm -d 1 -l 1 tribaRouter<br>tribaRouter: 1.08755e-10<br>Buffer:0.0159507 Crossbar:0.0574852 VC_allocator:0.00115078 SW_allocator:0.00234997 Clock:0.00419481 Total:0.0811314 |
| gakwaya@kubuntu:~/orion2/orion_2.0_20091106\$ ./orion_router_area<br>Abuffer:83097.6 ACrossbar:101580 AVCAllocator:24570 ASWAllocator:2457 Atotal:211705 |
| gakwaya@kubuntu:~/orion2/orion_2.0_20091106\$ ./orion_link 500000 1<br>Link power is 13.422<br>Link power is 13.422<br>Link area is 6.86567e+07 |

Fig[6] Packet tracing Simulation within TriBASim



Fig[7] Delay-latency-node count analysis within TriBASim

## **6. FUTURE WORK**

TriBASim can already carry out the common tasks expected of a Network-on-Chip simulator. We hope to add support for routing algorithms other than DDRA. The simulations we have run are based on random traffic models; we hope to study the characteristics of the traffic patterns of our in-house SoCs and incorporate them in future versions.

## **7. CONCLUSIONS**

A new simulator for the triplet-based NoC architecture has been presented. We gave a broad overview of TriBA, covering its basic characteristics and the state of the art. We then described the design of our simulator in detail and ended the paper with practical uses that show its usefulness to TriBA researchers and to anyone interested in NoCs in general.

### REFERENCES

- [1] 一种新的非冯•诺依曼计算机体系结构 TriBA, 石峰, 计卫星, 乔保军, 刘滨, 北京理工大学学报, Vol. 26 No. 10, Oct. 2006; A New Non Von Neumann Architecture TriBA, SHI Feng, JI Wei-xing, QIAO Bao-jun, LIU Bin, Transactions of Beijing Institute of Technology, Vol. 26 No. 10, Oct. 2006.
- [2] TriBA 互联拓扑结构及其性能分析,刘彩霞,石峰,乔保军, HAROON Ur Rashid, 宋 红,北京 理工大学学报,计算机工程 Vol.36 No.15; TriBA Interconnection Topology Structure and Its Performance Analysis, LIU Cai-xia, SHI Feng, QIAO Bao-jun, HAROON Ur Rashid, SONG Hong, Journal of Beijing Institute of Technology, Computer Engineering, Vol.36 No.15.
- [3] 基三分层网络中一种基于查表的确定路由算法,乔保军,石峰,计卫星,刘滨,北京理工大学学报,计算机应用 Vol. 26 No .9 ,Table lookup determined routing algorithm for triplet based hierarchical interconnection network QIAO Bao jun, SHI Feng, JI Wei xing, LIU Bin,Journal of Beijing Institute of Technology ,Computer Applications ,Vol .26 No .9
- [4] 基三分层互连网络及其路由算法设计,计算机工程与设计,乔保军,石峰, 计卫星,北京理工大 学学报,计算机工程与设计, Vol .28 No .18; Triplet-based hierarchical interconnection network and design of its routing algorithm,QIAO Bao-jun, SHI Feng, JIWei-xing, Journal of Beijing Institute of Technology, Computer Engineering and Design, Vol .28 No .18.
- [5] From "Bus" and "Crossbar" to "Network-On-Chip", Arteris S.A. Copyright 2009 Arteris S.A. All rights reserved.
- [6] A generic architecture for on-chip packet-switched interconnections ,Guerrier, P.Greiner, A;Univ. Pierre et Marie Curie, Paris, France Design, Automation and Test in Europe Conference and Exhibition 2000. Proceedings
- [7] A Router Architecture for Networks on Silicon, Edwin Rijpkema, Kees Goossens, and Paul Wielage, Philips Research Laboratories, Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands, Proceedings of Progress 2001, 2nd Workshop on Embedded Systems.
- [8] Networks on Chip A New Paradigm for Systems on Chip Design ,Luca Benini ,Giovanni De Micheli ,IEEE ,System-on-Chip, 2005. Proceedings. 2005 International Symposium on System On Chip ,17-17 Nov.Page(s) 2 – 6
- [9] Survey of Network on Chip (NoC) Architectures & Contributions, Ankur Agarwal, Cyril Iskander, Ravi Shankar, Journal of Engineering, Computing and Architecture ISSN 1934-7197 Volume 3, Issue 1, 2009
- [10] Locality Aware Optimal Task Scheduling Algorithm for TriBA, A Novel Scalable Architecture, KHAN Haroon-Ur-Rashid, SHI Feng (石峰), Journal of Beijing Institute of Technology, 2008, Vol. 17, No. 3.
- [11] ORION 2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration, Andrew B. Kahng, Bin Li, Li-Shiuan Peh and Kambiz Samadi, Proceedings of Design Automation and Test in Europe (DATE), Nice, France, April 2009.
- [12] Orion: A Power-Performance Simulator for Interconnection Networks ,Hang-Sheng Wang Xinping Zhu Li-Shiuan Peh Sharad Malik,Proceedings of MICRO 35, Istanbul, Turkey, November 2002
- [13] http://www.noxim.org/
- [14] Enhanced Noxim simulator for performance evaluation of network on chip topologies, Swaminathan, K., Thakyal, D.; Nambiar, S.G.; Lakshminarayanan, G.; Seok-Bum Ko, Engineering and Computational Sciences (RAECS), 2014 Recent Advances, 978-1-4799-2290-1
- [15] open source simulator for network on chip ,Monika Gupta1, S. R. Biradar2, B. P. Singh,International Journal of Computers & Technology ,Volume 4 No. 2, March-April, 2013, ISSN 2277-3061
- [16] A High Level Power Model for the Nostrum NoC, Penolazzi, S, Jantsch, A. ,Digital System Design: Architectures, Methods and Tools, 2006. DSD 2006. 9th EUROMICRO Conference


- [17] http://networkonchip.wordpress.com/2011/02/22/simulators/
- [18] A New Non Von Neumann Architecture TriBA, SHI Feng, JI Wei-xing, QIAO Bao-jun, LIU Bin, Transactions of Beijing Institute of Technology, Vol. 26 No. 10, Oct. 2006.
- [19] 基三分层互连网络和 2D-Mesh 的比较,北京理工大学学报, 计算机科学 2007 Vol. 34 No.9 ;Comparison of the Triplet-based Hierarchical Interconnection Network and 2-D Mesh for Multi-core Processor, Journal of Beijing Institute of Technology, 2007, Vol. 34 No.9
- [20] 基三分层网络中一种基于查表的确定路由算法,乔保军,石峰,计卫星,刘滨,北京理工大学学报,计算机应用 Vol. 26 No.9, Table lookup determined routing algorithm for triplet based hierarchical interconnection network QIAO Bao jun, SHI Feng, JI Wei xing, LIU Bin, Journal of Beijing Institute of Technology, Computer Applications, Vol. 26 No.9
- [21] 基三分层互连网络及其路由算法设计,计算机工程与设计,乔保军,石峰, 计卫星,北京理工大 学学报,计算机工程与设计, Vol .28 No .18; Triplet-based hierarchical interconnection network and design of its routing algorithm,QIAO Bao-jun, SHI Feng, JIWei-xing ,Journal of Beijing Institute of Technology, Computer Engineering and Design, Vol .28 No .18.
- [22] SystemC: From the Ground Up, Second Edition, David C. Black (Author), Jack Donovan (Author), Bill Bunton (Author), Anna Keist (Author) Springer; 2nd edition (December 30, 2010),ISBN-10: 0387699570 ISBN-13: 978-0387699578
- [23] Orion: A Power-Performance Simulator for Interconnection Networks, Hang-Sheng Wang, Xinping Zhu, Li-Shiuan Peh, Sharad Malik, Microarchitecture, 2002. (MICRO-35). Proceedings. 35th Annual IEEE/ACM International Symposium, 0-7695-1859-1.

#### AUTHORS

#### Daniel GAKWAYA

A student at BEIJING INSTITUTE OF TECHNOLOGY currently pursuing his master's degree, School of Computer Science, Department of Advanced Embedded Computing. His research interests lie in Network Optimizations and Computer Graphics.

#### Gao Yu Jin

Associate Professor, BEIJING INSTITUTE OF TECHNOLOGY, School of Computer Science, Department of Advanced Embedded Computing. His research interests include Embedded Multicore Processors.

#### Jean Claude GOMBANIRO

Master's student at the School of Computer Science of BEIJING INSTITUTE OF TECHNOLOGY, Department of Natural Languages Processing. His research interests lie in Big Data Processing and Language Recognition Algorithms.

#### Jean Pierre NIYIGENA

Master's student at the School of Computer Science of BEIJING INSTITUTE OF TECHNOLOGY, Department of Data Networks. His research interests lie in Geolocation Algorithms.






