

# Sc2 C-to-FPGA Compiler

#### Maya Gokhale, Janette Frigo, Christine Ahrens, Marc Popkin-Paine

Los Alamos National Laboratory

Janice M. Stone

Stone Ergonaut





## **Overview**

#### Language

- C subset augmented with parallel communicating processes
- FIFO-based streams to communicate data between processes
- Signals and Parameters for coordination and flow control
- Process located on hardware (FPGA) or software (Linux PC)

### Compiler

- Based on Stanford University Intermediate Format (SUIF) library
- Targets Linux PC based AMS Firebird board
- Easily re-targetable: board architecture described in a file
- Generates Register-Transfer-Level VHDL
- Source code available at http://rcc.lanl.gov

### Applications

- Signal and image processing
- Fixed-point arithmetic, using external memory and Block RAM





### **Sc2 Processes**

- Process body (the code it contains) is described in a process function
  - Process function header describes streams, signals, and parameters that the process function uses
- Each process is an independent unit
  - /// PROCESS directive describes the process
- Processes execute concurrently
  - Sc\_initiate intrinsic is used to start a process
  - Any software process may initiate another software process or hardware process
- Arrays of processes can be defined
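The process model above can be sketched in plain C. This is only an illustration, not Sc2 code: `process_t`, `worker_run`, `initiate`, and `initiate_all` are hypothetical names, and the sketch runs each process body to completion in turn instead of concurrently so that it stays deterministic. It shows an array of processes that share one process function, each started with its own word of data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WORKERS 4  /* hypothetical size of the process array */

typedef struct process process_t;
typedef void (*process_fun_t)(process_t *self);

/* Hypothetical descriptor for one process. */
struct process {
    process_fun_t fun;     /* process function: the body of the process */
    uint32_t      param;   /* one word of data handed over at initiation */
    uint32_t      result;  /* written by the body, for illustration only */
    int           started; /* 1 once the process has been initiated */
};

/* Hypothetical process function: squares its parameter. */
void worker_run(process_t *self)
{
    self->result = self->param * self->param;
}

/* Stand-in for process initiation. Real processes execute concurrently;
   this sketch simply runs the body to completion. */
void initiate(process_t *p, process_fun_t fun, uint32_t param)
{
    assert(p != NULL && fun != NULL);
    p->fun = fun;
    p->param = param;
    p->result = 0;
    p->started = 1;
    p->fun(p);
}

/* Start every process in an array, giving each a unique parameter. */
void initiate_all(process_t workers[], int n)
{
    for (int i = 0; i < n; i++) {
        initiate(&workers[i], worker_run, (uint32_t)i);
    }
}
```

Calling `initiate_all(workers, NUM_WORKERS)` on an array of four descriptors leaves each element marked as started, with `result` equal to the square of its index.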





### **Example: Process Function directives**

Two process functions with input and output streams:

/// PROCESS\_FUN host1\_run

/// OUT\_STREAM sc\_uint32 output\_stream

/// PROCESS\_FUN\_BODY

...

/// PROCESS\_FUN\_END

/// PROCESS\_FUN controller\_run

/// IN\_STREAM sc\_uint32 input\_stream

/// OUT\_STREAM sc\_uint32 output\_stream

/// PROCESS\_FUN\_BODY

...

/// PROCESS\_FUN\_END





### Example: Process and Connect Directives

/// PROCESS controller PROCESS\_FUN controller\_run TYPE HP ON PE0

/// PROCESS host1 PROCESS\_FUN host1\_run



Connections can also be described graphically, and /// directives are generated



[Figure: process graph showing the host1 and controller processes]



# **Streams and Signals**

- Streams transmit data between processes
- Streams can be defined between software, hardware-software, and hardware processes
- Stream intrinsic functions are defined to
  - Read
  - Check for end of stream
  - Write
- Hardware streams are implemented as hardware FIFOs with user-defined FIFO depth in the Streams-C hardware library
- Software streams are managed by the thread-based Streams-C software runtime library
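The stream behavior described above (read, write, end-of-stream check, bounded FIFO depth) can be modeled with a small ring buffer in C. This is a sketch under assumptions, not the Streams-C runtime or hardware library: `stream_t`, `stream_write`, `stream_read`, `stream_eos`, `stream_close`, and `FIFO_DEPTH` are hypothetical names, and a full or empty FIFO is reported through a return value where real hardware would stall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 4  /* hypothetical depth; in practice it is user-defined */

/* Hypothetical model of a stream: a bounded FIFO plus a "closed" flag. */
typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head;    /* index of the next word to read */
    unsigned count;   /* number of words currently buffered */
    bool     closed;  /* set once the writer has finished */
} stream_t;

void stream_init(stream_t *s)
{
    s->head = 0;
    s->count = 0;
    s->closed = false;
}

/* Write one word. Returns false when the FIFO is full. */
bool stream_write(stream_t *s, uint32_t word)
{
    assert(!s->closed);  /* writing after close is a programming error */
    if (s->count == FIFO_DEPTH) {
        return false;
    }
    s->data[(s->head + s->count) % FIFO_DEPTH] = word;
    s->count++;
    return true;
}

/* Mark the stream as finished; buffered words can still be read. */
void stream_close(stream_t *s)
{
    s->closed = true;
}

/* End of stream: the writer has closed and every word was consumed. */
bool stream_eos(const stream_t *s)
{
    return s->closed && s->count == 0;
}

/* Read one word. Returns false when the FIFO is empty. */
bool stream_read(stream_t *s, uint32_t *word)
{
    if (s->count == 0) {
        return false;
    }
    *word = s->data[s->head];
    s->head = (s->head + 1) % FIFO_DEPTH;
    s->count--;
    return true;
}
```

A consumer would typically loop on `stream_eos` and call `stream_read` inside the loop, which mirrors the read and end-of-stream intrinsics listed above.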

- Signals are used to synchronize processes and coordinate phases of processing
- Signal intrinsic functions are defined to
  - Post a signal, along with a single word of data
  - Wait for a signal and receive a single word of data
- Hardware and software signal implementation is similar to streams
- Parameters provide a mechanism for giving each newly initiated process a word of unique data.
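A signal that carries a single word of data can be pictured as a one-slot mailbox. The C sketch below is an assumption-laden model, not the Sc2 signal intrinsics: `signal_t`, `signal_post`, and `signal_try_wait` are hypothetical names, and the wait is non-blocking (it reports whether a signal was pending) so the example can run in a single thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox standing in for a signal. */
typedef struct {
    bool     pending;  /* true while a posted signal awaits its receiver */
    uint32_t word;     /* the single word of data sent with the signal */
} signal_t;

void signal_init(signal_t *g)
{
    g->pending = false;
    g->word = 0;
}

/* Post a signal together with one word of data. */
void signal_post(signal_t *g, uint32_t word)
{
    assert(!g->pending);  /* this sketch allows one outstanding signal */
    g->word = word;
    g->pending = true;
}

/* Non-blocking stand-in for "wait": returns true and delivers the word
   if a signal was pending, false otherwise. */
bool signal_try_wait(signal_t *g, uint32_t *word)
{
    if (!g->pending) {
        return false;
    }
    *word = g->word;
    g->pending = false;
    return true;
}
```

One process could post, for example, a phase number once it has finished a phase of processing, and a second process would pick that number up before starting its own work on the next phase.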





### Sc2 Code Example: Polyphase Filter



## **Compiler Structure**







# **Synthesis Compiler Features**

- Uses the SUIF 1.3 library (suif.stanford.edu)
- Uses Tim Callahan's inline pass to inline function calls
- Optimizations include
  - SUIF optimizations such as constant folding, common sub-expression elimination, dead code elimination
  - Loop pipelining of innermost loops
  - Loop unrolling (directive)
- Compiler schedules sequential code, performs fine-grained parallelization
- Compiler reads board architecture from a file
  - Easily retargetable
- Compiler source is available at rcc.lanl.gov
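To make the loop unrolling item concrete, the C sketch below writes the same fixed-point dot product twice: once as a plain loop and once unrolled by four by hand. It is not compiler output, and the function names and the unroll factor are chosen for illustration. The point is that the four products in the unrolled body do not depend on each other, which is the kind of fine-grained parallelism a hardware schedule can exploit.

```c
#include <assert.h>
#include <stdint.h>

/* Rolled form: one multiply-accumulate per iteration. */
int32_t dot_rolled(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Unrolled by four: the four products are independent, so hardware
   could compute them in the same cycle and sum them with an adder tree. */
int32_t dot_unrolled4(const int16_t *a, const int16_t *b, int n)
{
    assert(n % 4 == 0);  /* this sketch omits the remainder loop */
    int32_t acc = 0;
    for (int i = 0; i < n; i += 4) {
        int32_t p0 = (int32_t)a[i]     * b[i];
        int32_t p1 = (int32_t)a[i + 1] * b[i + 1];
        int32_t p2 = (int32_t)a[i + 2] * b[i + 2];
        int32_t p3 = (int32_t)a[i + 3] * b[i + 3];
        acc += (p0 + p1) + (p2 + p3);
    }
    return acc;
}
```

Both functions return the same value for any input whose length is a multiple of four.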





## **Board Definition File**

Memory Type EXTERNAL64

- Data size 64 bits
- Read/Write Port
  - OUT MAR width 32 bits
  - BUFFER MDR width 64 bits
  - OUT R\_EN width 1 bit
  - OUT W\_EN width 1 bit
- Identify MAR name MAR
- Identify MDR name MDR
- Identify Read enable name R\_EN
- Identify Write enable name W\_EN
- Load Store latency 1 (MAR, MDR)
- Memcopy latency 8 MAR, MDR, MDR, MDR, MDR, MDR, MDR, MDR, (MAR, MDR)

Architecture Firebird Board Virtex2000 Processor PE0

- 4 EXTERNAL64 memory mem size 1000000 memory-number 0 controller Mem641 generics (schedule = priority, LADbase=0x1000, LADinc=0x200, mem component=EXTERNAL)





# **Applications**

#### Polyphase filter bank of four

- Ppf\_a: 32-bit stream input data, external memory for coefficients
- Ppf\_ab: 32-bit stream input data, Block RAM for coefficients
- Ppf1: 64-bit external memory input data, registers for coefficients
- Ppf: 32-bit stream input data, registers for coefficients
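For readers unfamiliar with polyphase filtering, the C sketch below shows the generic textbook idea with four phases. It is not the source of any of the Ppf variants listed above. All names, the tap count, and the decimate-by-four arrangement are assumptions made for illustration. It computes one decimated output in two ways, directly and as a sum of four branch filters, using integer arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

#define PHASES         4   /* four polyphase branches */
#define TAPS_PER_PHASE 2   /* hypothetical branch length */
#define NTAPS          (PHASES * TAPS_PER_PHASE)

/* Input sample at idx, or zero outside the buffer. */
int32_t sample_or_zero(const int32_t *x, int n, int idx)
{
    return (idx >= 0 && idx < n) ? x[idx] : 0;
}

/* Direct form: full-length FIR evaluated only at output index 4*m. */
int32_t direct_decimated(const int32_t *x, int n, const int32_t *h, int m)
{
    int32_t acc = 0;
    for (int k = 0; k < NTAPS; k++) {
        acc += h[k] * sample_or_zero(x, n, PHASES * m - k);
    }
    return acc;
}

/* Polyphase form: branch p uses coefficients h[4t + p] and input
   samples x[4(m - t) - p]; the branch outputs are summed. */
int32_t polyphase(const int32_t *x, int n, const int32_t *h, int m)
{
    int32_t acc = 0;
    for (int p = 0; p < PHASES; p++) {
        int32_t branch = 0;
        for (int t = 0; t < TAPS_PER_PHASE; t++) {
            branch += h[PHASES * t + p]
                    * sample_or_zero(x, n, PHASES * (m - t) - p);
        }
        acc += branch;
    }
    return acc;
}
```

Substituting k = 4t + p into the direct sum gives the branch sum, so the two functions agree for every output index. The branch form is attractive in hardware because each branch works on its own coefficient subset, which could be kept in registers, Block RAM, or external memory.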

#### K-Means Clustering

- Unsupervised clustering of multi-spectral data
- 32-bit stream input data, Block RAM for centers
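The core step of K-means, assigning each pixel to its nearest cluster center, can be written with integer arithmetic only. The C sketch below is a generic illustration, not the application's source code: `nearest_center`, the band count, and the flat array layout are assumptions. The small `centers` table plays the role of the data the slide places in Block RAM.

```c
#include <assert.h>
#include <stdint.h>

#define BANDS 3  /* hypothetical number of spectral bands per pixel */

/* Return the index of the center closest to the pixel, using squared
   Euclidean distance. centers holds k * BANDS values, one center after
   another. */
int nearest_center(const int32_t *pixel, const int32_t *centers, int k)
{
    assert(k > 0);
    int best = 0;
    int64_t best_dist = INT64_MAX;
    for (int c = 0; c < k; c++) {
        int64_t dist = 0;
        for (int b = 0; b < BANDS; b++) {
            int64_t diff = (int64_t)pixel[b] - centers[c * BANDS + b];
            dist += diff * diff;
        }
        if (dist < best_dist) {  /* ties keep the lower index */
            best_dist = dist;
            best = c;
        }
    }
    return best;
}
```

A full clustering pass would call this for every pixel and then recompute each center from the pixels assigned to it.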

#### Fast Folding

Modified butterfly FFT

- Performance evaluation in progress: automatically generated hardware for ppf1 is faster than a GHz Pentium
- Applications source code available on web site





# Summary

- Streams-C compiler synthesizes hardware circuits that run on reconfigurable hardware from parallel C programs
- C-to-hardware tool with parallel programming model and efficient hardware libraries
- Functional Simulator runs on host workstation
- 5-10x faster development time than hand-coded designs
- Performance comparable to a GHz Pentium
- Open source: we welcome collaborations



