# A Prototype FPGA Tile for Subthreshold-Optimized CMOS

Peter Grossmann (grossmann.p@husky.neu.edu), Miriam Leeser (mel@coe.neu.edu) Northeastern University

Field-programmable gate arrays (FPGAs) are frequently used in low power systems because they can implement the same function as a microprocessor more energy-efficiently, while still offering lower development time and cost than an ASIC. The same could be true in ultra-low power applications operating at subthreshold supply voltages, where performance is sacrificed in favor of increased energy efficiency. Process technology research has demonstrated the benefits of tailoring device design to subthreshold operation. Subthreshold FPGA research, however, is only beginning and has yet to consider the use of subthreshold-optimized devices. This work presents simulations of an FPGA tile with a two-input logic block in a subthreshold-optimized FDSOI process. The goals are to demonstrate the potential benefits of this technology for subthreshold FPGAs and to provide a starting point for a more thorough investigation into the best circuit design choices for subthreshold FPGA building blocks.

#### **Motivation for Subthreshold FPGAs**

Ultra-low power systems typically require some combination of custom integrated circuits (which may be analog, digital, or mixed-signal) and ultra-low power microprocessors to meet system requirements and power goals. This has led some researchers to use subthreshold logic both for custom ASICs and for microprocessors. Subthreshold circuits attain peak energy efficiency by using a supply voltage below the threshold voltage of the transistors. The power savings obtained in this mode are significant because dynamic power scales with the square of the supply voltage. The tradeoff for running at such low voltages is a performance loss of several orders of magnitude. At present, subthreshold logic is a reasonable design choice when operating frequencies in the tens or hundreds of kilohertz are acceptable. For some applications, such as wireless sensor nodes, digital hearing aids, and RFID tags, this can be the case.

Ultra-low power systems turning to subthreshold logic would benefit from having a subthreshold FPGA option for implementing digital components. A subthreshold FPGA offers several potential benefits, depending upon system requirements. In processor-based systems, it can offer an energy-efficiency benefit. In ASIC-based systems, it can offer reconfigurability. In systems using both, it can enable integration of the processor function and the ASIC function onto a single chip.

A subthreshold FPGA design has been presented for 90 nm bulk CMOS[1]. Simulation of this design showed the energy-delay space accessible by the design for a representative logic function. Other simulations showed the challenges that process variation poses for subthreshold FPGA routing circuitry and demonstrated a buffering scheme for mitigating that variation. The potential performance gains from using subthreshold-optimized devices were not explored in this work.

## Benefits of Subthreshold-Optimized Device Technology

Research has shown that altering device design can improve performance for subthreshold operation. In bulk silicon, standard deep-submicron transistors require halo and retrograde doping to suppress undesired short-channel effects. It has been argued that this doping is unnecessary for subthreshold operation because short-channel effects are reduced at ultra-low supply voltages. Eliminating these dopants reduces junction capacitance, which in turn improves delay and power consumption[2].

For fully-depleted silicon-on-insulator (FDSOI) transistors, another optimization can be made. Lightly doped transistors are desirable because they reduce threshold-voltage variation due both to variation in oxide thickness and to random dopant fluctuations[3]. In addition to performance benefits, subthreshold device optimizations for FDSOI thus have the potential to curtail the large performance variation seen in many subthreshold circuits to date. This technology is not yet mature, but its inherent advantages over commercial bulk processes have been demonstrated at the device level. One goal of this work is to extend that demonstration to the circuit level through the development of subthreshold FPGA circuits.

## **Proposed Test Circuits**

Figure 1 shows a template for an FPGA configurable logic block (CLB) that will be used to investigate circuit tradeoffs for FPGA building blocks in subthreshold-optimized CMOS. The circuit consists of a two-input lookup table (LUT) whose output is connected to a D flip-flop. The output of the LUT and the output of the flip-flop are multiplexed so that either can be programmed as the CLB output. The LUT is SRAM-based, with the memory storage nodes tied to multiplexer inputs. The multiplexers perform the table lookup based on logic inputs A and B. Input B drives the second-stage multiplexer, so a change on B passes through fewer multiplexer levels than a change on A and therefore switches faster.
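The two-stage lookup can be sketched behaviorally as follows. This is a minimal model, not the circuit itself: the class and helper names are hypothetical, and the assumption that the SRAM bits are indexed as `2*B + A` (A selecting in the first stage, B in the second) is chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoInputLUT:
    """Behavioral sketch of an SRAM-based 2-input LUT built from two mux stages.

    bits[i] is the SRAM configuration bit read when the inputs encode
    i = 2*B + A. This bit ordering is an assumption for illustration.
    """
    bits: tuple  # four configuration bits, each 0 or 1

    def evaluate(self, a: int, b: int) -> int:
        # First stage: input A selects within each pair of SRAM bits.
        low = self.bits[1] if a else self.bits[0]   # pair used when B = 0
        high = self.bits[3] if a else self.bits[2]  # pair used when B = 1
        # Second stage: input B chooses between the first-stage outputs,
        # so a change on B passes through only one multiplexer level.
        return high if b else low


def lut_bits_for(func):
    """Derive the four SRAM bits that make the LUT implement func(a, b)."""
    return tuple(func(i & 1, i >> 1) & 1 for i in range(4))


if __name__ == "__main__":
    or_lut = TwoInputLUT(lut_bits_for(lambda a, b: a | b))
    for b in (0, 1):
        for a in (0, 1):
            print(f"A={a} B={b} -> {or_lut.evaluate(a, b)}")
```

Programming the LUT amounts to writing the four bits returned by `lut_bits_for`; any two-input Boolean function fits in those four bits.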



Figure 1: Configurable Logic Block (CLB) template.

The first implementation based on this template uses a single power supply to power all circuitry and a standard 6T SRAM bit cell. Future designs will vary how the power supply is connected and how the SRAM bit cell is implemented.

Figure 2 shows a template for a complete FPGA tile. A tile consists of a CLB plus routing resources, and is organized so that it can be replicated as-is to form a seamless array of logic and routing. The upper left corner shows the FPGA switch box selected for the template. The switch box consists of four programmable interconnect points (pips), denoted by diamonds, each of which provides programmable connections between adjacent CLBs along one routing track. Each of the two CLB inputs may be programmed to connect to any of the four routing tracks in the tile, and the output may be configured to connect to any number of the four routing tracks in the adjacent tile via transmission gates. The overall style of the CLB and tile is based on the Xilinx XC2000 and XC3000 FPGAs[4].
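To make the tile's programmable routing concrete, the sketch below records the configuration choices described above: a track selection for each of the two CLB inputs, and a four-bit mask of adjacent-tile tracks driven by the CLB output. The class and field names are hypothetical, and modeling each switch-box pip as a single enable flag is a simplifying assumption; a physical pip may need several configuration bits.

```python
from dataclasses import dataclass, field
from typing import List

NUM_TRACKS = 4  # routing tracks per tile, as in the template


@dataclass
class TileRoutingConfig:
    """Hypothetical record of one tile's routing configuration.

    input_select[k] is the routing track (0-3) that drives CLB input k.
    output_mask has one bit per track in the adjacent tile; a set bit
    closes the transmission gate from the CLB output onto that track.
    pip_enable holds one flag per switch-box pip (one pip per track);
    a single flag per pip is a simplifying assumption.
    """
    input_select: List[int] = field(default_factory=lambda: [0, 0])
    output_mask: int = 0
    pip_enable: List[bool] = field(default_factory=lambda: [False] * NUM_TRACKS)

    def validate(self) -> None:
        """Raise ValueError if the configuration is not realizable."""
        if len(self.input_select) != 2:
            raise ValueError("the CLB has exactly two inputs")
        for track in self.input_select:
            if not 0 <= track < NUM_TRACKS:
                raise ValueError(f"input track {track} out of range")
        if not 0 <= self.output_mask < (1 << NUM_TRACKS):
            raise ValueError("output mask must fit in four bits")
        if len(self.pip_enable) != NUM_TRACKS:
            raise ValueError("expected one pip per routing track")

    def driven_tracks(self) -> List[int]:
        """Tracks in the adjacent tile that the CLB output drives."""
        return [t for t in range(NUM_TRACKS) if (self.output_mask >> t) & 1]


if __name__ == "__main__":
    cfg = TileRoutingConfig(input_select=[2, 0], output_mask=0b1010)
    cfg.validate()
    print(cfg.driven_tracks())
```

Using a bit mask for the output reflects the fact that the output may drive any number of tracks, whereas each input selects exactly one track.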

The first implementation based on this template uses transmission gates to implement the pips and the CLB output switches. Future designs will match CLB choices for how the power supply is connected and how the SRAM bit cells for the programming bits are implemented.



#### **Prototype CLB Area and Performance**

The first implementation of the CLB template has been laid out in the MIT Lincoln Laboratory subthreshold-optimized SOI process[3] and simulated using Cadence IC design tools. The CLB measures $65.6 \,\mu\text{m} \times 23.4 \,\mu\text{m}$. Some space within these dimensions is available for minor circuit modifications such as adding sleep transistors or additional buffering. To improve design reliability, the minimum transistor gate length used is 0.2 µm rather than the minimum allowed 0.15 µm. Wherever possible, minimum gate widths are used, both to take advantage of the subthreshold-optimized technology's 1:1 ratio between NMOS and PMOS drive strength and to minimize area so that the routing tracks connecting adjacent CLBs can be as short as possible. Minimizing area is preferred over improving raw CLB performance because delay through routing resources dominates in FPGAs.
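For reference, the stated dimensions correspond to the footprint below, and the gate-length ratio quantifies the reliability margin taken relative to the minimum allowed length. This is only arithmetic on the reported figures; the symbols $A_{\text{CLB}}$, $L_{\text{used}}$, and $L_{\min}$ are introduced here for convenience.

$$
A_{\text{CLB}} = 65.6\,\mu\text{m} \times 23.4\,\mu\text{m} \approx 1535\,\mu\text{m}^2,
\qquad
\frac{L_{\text{used}}}{L_{\min}} = \frac{0.2\,\mu\text{m}}{0.15\,\mu\text{m}} \approx 1.33
$$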

The circuit is tested by first programming it with a desired two-input logic function and then exercising the function exhaustively to measure propagation delay through each circuit path. Energy consumption is measured separately for programming, which for most applications is a one-time operation after power-up, and for logic function operation. In general, both energies depend on which function is programmed into the CLB. In the initial test case, the OR function driven by a testbench running at 1 MHz, the energy consumed during programming was 364 fJ, and the energy consumed during exhaustive logic testing, with a return to zero between input combinations, was 89.5 fJ. This translates into an average programming power of 12.3 nW and an average logic power of 5.4 nW at the 0.3 V nominal supply voltage used in simulation.
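The bookkeeping behind these figures can be sketched as follows. Average power is energy divided by the averaging window, so the reported energy and power pairs imply windows of roughly 29.6 µs for programming and 16.6 µs for logic testing; the source does not state these windows, so they are inferred here. The return-to-zero ordering of test vectors is likewise an illustrative assumption, and all function names are hypothetical.

```python
from itertools import product


def rtz_stimulus(n_inputs: int = 2):
    """Exhaustive input sequence with a return to all-zero between vectors.

    The vector ordering used in the original testbench is not stated;
    this ordering is an assumption for illustration.
    """
    zero = (0,) * n_inputs
    seq = [zero]
    for vec in product((0, 1), repeat=n_inputs):
        if vec == zero:
            continue
        seq.extend([vec, zero])  # apply the vector, then return to zero
    return seq


def average_power(energy_j: float, duration_s: float) -> float:
    """Average power over a window: P = E / T."""
    return energy_j / duration_s


def implied_window(energy_j: float, power_w: float) -> float:
    """Averaging window implied by a reported energy and power: T = E / P."""
    return energy_j / power_w


if __name__ == "__main__":
    clock_hz = 1e6  # testbench frequency from the text
    reported = (("programming", 364e-15, 12.3e-9), ("logic", 89.5e-15, 5.4e-9))
    for label, energy, power in reported:
        t = implied_window(energy, power)
        print(f"{label}: {t * 1e6:.1f} us, about {t * clock_hz:.1f} periods at 1 MHz")
    print(rtz_stimulus(2))
```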

## Prototype Tile Implementation and Verification

The first implementation of the tile template has been captured using Cadence schematic capture tools. Functional verification has been performed on a 3×2 array of tiles by configuring the array as a serial adder. The serial adder is a convenient choice for verification because it requires logic blocks to be configured both with and without the D flip-flop, requires several different two-input functions, and is simple enough that routing resources can be allocated by hand. Eight-bit serial addition has been successfully demonstrated on the 3×2 tile array.
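A behavioral sketch shows why a serial adder exercises these features. The mapping below is one plausible decomposition of a bit-serial full adder into two-input functions, not necessarily the one used in the verification above: two XORs, two ANDs, and an OR whose output is captured by that CLB's D flip-flop to hold the carry between clock cycles. Five CLBs are used, which would fit in a 3×2 array. The function names and the LSB-first bit order are assumptions.

```python
def lut2(bits):
    """Return a 2-input LUT evaluator; bits are indexed by 2*b + a."""
    return lambda a, b: bits[2 * b + a]


def bits_for(func):
    """Configuration bits that make a 2-input LUT implement func(a, b)."""
    return tuple(func(i & 1, i >> 1) for i in range(4))


XOR = lut2(bits_for(lambda a, b: a ^ b))
AND = lut2(bits_for(lambda a, b: a & b))
OR = lut2(bits_for(lambda a, b: a | b))


def serial_add(x: int, y: int, width: int = 8):
    """Add x and y one bit per clock, LSB first; return (sum, carry_out)."""
    carry_ff = 0  # D flip-flop in the carry CLB, cleared before the first bit
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        p = XOR(a, b)          # CLB 1: propagate (combinational output)
        g = AND(a, b)          # CLB 2: generate (combinational output)
        s = XOR(p, carry_ff)   # CLB 3: sum bit (combinational output)
        t = AND(p, carry_ff)   # CLB 4: propagated carry (combinational output)
        # CLB 5: the LUT computes OR(g, t); the clock edge loads the flip-flop,
        # so s and t above used the carry from the previous bit.
        carry_ff = OR(g, t)
        result |= s << i
    return result, carry_ff


if __name__ == "__main__":
    print(serial_add(200, 100))
```

Only the carry CLB selects its flip-flop output; the other four select the LUT output directly, matching the requirement that CLBs be configured both with and without the flip-flop.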

#### **Future Work**

The prototype tile will be used as a reference design against which tiles containing alternate circuit implementations will be compared. These tile variations will be placed on a subthreshold-optimized CMOS test chip. Successful fabrication of such FPGA test circuits will make it possible to demonstrate both the potential energy benefits of subthreshold FPGAs for ultra-low power applications and the benefits of building such devices in subthreshold-optimized technology.

#### References

- [1] B.H. Calhoun, J.F. Ryan, S. Khanna, M. Putic, and J. Lach, "Flexible Circuits and Architectures for Ultralow Power," *Proceedings of the IEEE*, vol. 98, 2010, pp. 267-282.
- [2] B. Paul, A. Raychowdhury, and K. Roy, "Device optimization for digital subthreshold logic operation," *Electron Devices, IEEE Transactions on*, vol. 52, 2005, pp. 237-247.
- [3] S. Vitale, P.W. Wyatt, N. Checka, J. Kedzierski, and C.L. Keast, "FDSOI Process Technology for Subthreshold-Operation Ultralow-Power Electronics," *Proceedings of the IEEE*, 2010, pp. 333-342.
- [4] S. Trimberger, *Field-programmable gate array technology*, Boston: Kluwer Academic Publishers, 1994.