# SmartCell Architecture, Design and Performance Analysis for Reconfigurable Embedded Computing

Xinming Huang Worcester Polytechnic Institute, Worcester, MA 01609 Email: xhuang@ece.wpi.edu

## Abstract

This abstract presents SmartCell, a coarse-grained reconfigurable architecture that tiles a large number of computing units with flexible interconnection fabrics on a single chip. SmartCell is a highly evolvable system in the sense that the number of processing units can be dynamically changed to adapt to different system requirements. It can also be configured to operate in various computing styles such as SIMD, MIMD and systolic array. After a discussion of the fundamental features, the design of a seedling SmartCell system with 64 processing elements is presented. The simulation results show that, on average, SmartCell is about 2 and 5 times more power efficient than RaPiD and the Stratix II FPGA, respectively.

## Introduction

Reconfigurable architectures have long been proposed as a way to bridge the flexibility and performance gap between processors and ASICs. Field programmable gate arrays (FPGAs) are still the dominant technology in this area. However, their bit-level flexibility comes at a significant cost in area, power consumption and speed, due to the large routing area overhead and timing penalty. In recognition of these issues, much research has been conducted to develop more coarse-grained operators as the basis for reconfigurable computing architectures. A comprehensive review of the existing coarse-grained reconfigurable architectures (CGRAs) can be found in [1]. However, to demonstrate the full advantages of CGRAs, more research is needed to evaluate their performance through design optimization and benchmark applications.

This paper presents SmartCell, a novel CGRA for high performance, low power applications. SmartCell provides high computing capacity by exploiting deep data pipelining and parallelism. Moreover, SmartCell can be configured to perform different tasks and to meet different system requirements. The simulation results also indicate that SmartCell achieves good power efficiency. Section 2 introduces the SmartCell architecture and its building blocks. In Section 3, the standard cell implementation results for a seedling SmartCell system are provided along with the simulation results, followed by the conclusions in Section 4.

## SmartCell Architecture

An overview of the SmartCell architecture is depicted in Fig. 1. In a typical SmartCell system, a set of cell units is organized in a 2D tiled structure. Each cell unit consists of four processing elements (PEs) placed at its four edges for nearest-neighbor connections.



Fig. 1 Block diagram of the SmartCell architecture

The PEs can be configured to perform basic 16-bit logic, shift and arithmetic functions. A three-level layered on-chip network is designed for flexible intra- and inter-cell data communications: a fully connected crossbar inside the cell unit; short wire connections between adjacent PEs in neighboring cells; and a modified CMesh [2] network as the system-level interconnection network. Because the targeted applications are data streaming in nature, intermediate results flow among the active cells, which avoids the need for high-bandwidth on-chip networks.
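The three interconnect levels can be illustrated with a minimal sketch that picks the lowest level able to link two PEs. The edge labels (`N`, `S`, `E`, `W`), the `PE` record and the `network_level` function are hypothetical names introduced here, and the rule that only PEs on facing edges of adjacent cells share a short wire is an assumption based on the description above, not a detail confirmed by the source.

```python
from dataclasses import dataclass

# Assumed edge labels for the four PEs of a cell (not named in the source).
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
# Grid offset (row, col) of the neighboring cell that each edge faces.
FACING = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


@dataclass(frozen=True)
class PE:
    row: int   # cell row in the 2D tile
    col: int   # cell column in the 2D tile
    edge: str  # edge of the cell on which the PE sits: N, S, E or W


def network_level(src: PE, dst: PE) -> str:
    """Return the lowest interconnect level that can link src to dst."""
    if (src.row, src.col) == (dst.row, dst.col):
        return "crossbar"  # level 1: fully connected crossbar inside a cell
    d_row, d_col = FACING[src.edge]
    faces_dst_cell = (src.row + d_row, src.col + d_col) == (dst.row, dst.col)
    if faces_dst_cell and dst.edge == OPPOSITE[src.edge]:
        return "neighbor"  # level 2: short wire between facing PEs
    return "cmesh"         # level 3: system-level network


# Example: the east PE of cell (0, 0) and the west PE of cell (0, 1).
level = network_level(PE(0, 0, "E"), PE(0, 1, "W"))
```

Under these assumptions, a streaming data path that only hands results to the next cell stays on the first two levels and never touches the system-level network.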

A specific data path is generated by configuring the computing components and the on-chip networks through the instruction codes stored in the memory attached to each PE. Fig. 2 shows the cell architecture and the configuration scheme. At run time, a configuration context is loaded into the instruction register and is then decoded to control the data flow and functionality on a cycle-by-cycle basis. A serial peripheral interface (SPI) is designed to chain all instruction memories into a linear array for instruction loading and updating. Dynamic reconfiguration is achieved by loading new contexts into unused instruction memory and using a global select signal to specify the operational memory range. SmartCell can be configured to operate in SIMD, MIMD and systolic array styles. The number of active PEs can also be changed to meet different system requirements.
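The dynamic reconfiguration scheme can be sketched as a small behavioral model of one instruction memory. The class and method names, the memory depth and the looping fetch over the operational range are all assumptions made for illustration; the source does not specify the instruction format or how the select signal is encoded.

```python
class InstructionMemory:
    """Behavioral model of one PE's instruction memory (hypothetical)."""

    def __init__(self, depth: int):
        self.words = [0] * depth
        # Operational range chosen by the global select signal; empty at reset.
        self.active = range(0, 0)

    def load(self, start: int, context: list[int]) -> None:
        """Write a context, refusing to overwrite the operational range."""
        end = start + len(context)
        if start < 0 or end > len(self.words):
            raise ValueError("context does not fit in the instruction memory")
        if start < self.active.stop and self.active.start < end:
            raise ValueError("target range is currently operational")
        self.words[start:end] = context

    def select(self, start: int, length: int) -> None:
        """Model the global select signal: switch the operational range."""
        if start < 0 or length <= 0 or start + length > len(self.words):
            raise ValueError("invalid operational range")
        self.active = range(start, start + length)

    def fetch(self, cycle: int) -> int:
        """Instruction word for a cycle, looping over the operational range
        (the looping behavior is an assumption)."""
        if len(self.active) == 0:
            raise RuntimeError("no context selected")
        return self.words[self.active.start + cycle % len(self.active)]


mem = InstructionMemory(depth=8)
mem.load(0, [0x11, 0x12, 0x13])  # first context (placeholder words)
mem.select(0, 3)                 # make it operational
mem.load(4, [0x21, 0x22])        # load a new context into unused memory
mem.select(4, 2)                 # switch contexts via the select signal
```

In this model the new context is written while the old one keeps executing, and the switch itself is a single change of the selected range.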



Fig. 2 Diagram of cell unit and configuration controller

## System Design and Performance

A proof-of-concept 4 by 4 SmartCell system with 64 PEs is developed and synthesized using standard cell ASIC technology. The design parameters and the synthesis results are listed in Table 1. The entire system consists of about 1.6M gates with an average power consumption of 156 mW at 100 MHz.

Table 1. System design and synthesis results

| Parameter            | Value                |
|----------------------|----------------------|
| System dimension     | 4 by 4               |
| Library              | TSMC 0.13 µm process |
| Maximum frequency    | 107 MHz              |
| Area                 | 1.6M gates           |
| Simulation frequency | 100 MHz              |
| Average power        | 156 mW               |
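A few derived quantities follow directly from the values in Table 1, as the short calculation below illustrates. The function and key names are hypothetical, and the per-PE figure simply divides total power by the PE count, so it includes each PE's share of the memories and connection fabrics.

```python
def derived_metrics(power_mw, sim_freq_mhz, max_freq_mhz,
                    cells_per_side, pes_per_cell):
    """Derive simple figures of merit from the Table 1 values."""
    num_pes = cells_per_side * cells_per_side * pes_per_cell
    return {
        "num_pes": num_pes,
        # mW / MHz = (1e-3 W) / (1e6 1/s) = 1e-9 J, i.e. nanojoules per cycle.
        "energy_per_cycle_nj": power_mw / sim_freq_mhz,
        # Total power spread evenly over all PEs (includes fabric and memory).
        "power_per_pe_mw": power_mw / num_pes,
        # Ratio of the maximum frequency to the simulated frequency.
        "freq_headroom": max_freq_mhz / sim_freq_mhz,
    }


metrics = derived_metrics(power_mw=156.0, sim_freq_mhz=100.0,
                          max_freq_mhz=107.0, cells_per_side=4,
                          pes_per_cell=4)
```

With these inputs the 4 by 4 array of four-PE cells yields 64 PEs, about 1.56 nJ per clock cycle for the whole system, and about 2.44 mW per PE.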



Fig. 3. Area and power consumption (a) Area breakdown (b) Average power consumption breakdown @ 100 MHz.

System area and power breakdowns (averaged over the benchmark simulations) are depicted in Fig. 3. The processing units account for more than 50% of both area and power. The area and power of the connection fabrics are derived by subtracting those of the processing units and memories from the system totals. The power consumption is also compared with other computing platforms, including RaPiD [3] and the Stratix II FPGA [4], as shown in Fig. 4. Because these platforms use different processes and operating voltages, power scaling is applied for a fair comparison. The results show that, on average, the SmartCell system is about 2 and 5 times more power efficient than RaPiD and the FPGA, respectively.

Fig. 4 Power consumption comparisons in FIR filter, matrix-matrix multiplication (MMM), 2D discrete cosine transform (2D-DCT) and motion estimation (ME) benchmarks

## Conclusions

In this abstract, the SmartCell architecture is proposed as an evolvable and reconfigurable system for high performance and power efficient applications. After a description of the fundamental features, a seedling SmartCell system is developed and evaluated using standard cell ASIC technology. The results show that it is a promising architecture for embedded computing systems. As future work, the dynamic reconfiguration performance will be evaluated, and large real-time applications will be developed and implemented on the SmartCell platform to demonstrate its computing capacity and performance.

## References

- [1] R. Hartenstein, "A decade of reconfigurable computing: a visionary retrospective," in *Proceedings of IEEE DATE'01*.
- [2] J. Balfour and W. Dally, "Design tradeoffs for tiled CMP on-chip networks," in *Proceedings of the International Conference on Supercomputing*, pp. 187-198, 2006.
- [3] D. C. Cronquist et al., "Architecture design of reconfigurable pipelined datapaths," in *Proceedings of the Conference on Advanced Research in VLSI*, 1999.
- [4] Altera Corporation, http://www.altera.com