



Altera Corporation 101 Innovation Drive San Jose, CA 95134

#### High Performance Embedded Computing Workshop



18-20 September 2007



### **Algorithmic Gains with FPGA CoProcessing**

- Many Off-Processor Acceleration Functions have been benchmarked for FPGA CoProcessing
- Acceleration Depends on Algorithm and Logic Division Process

| Application                                                     | SW Only                                          | HW Co-Processing                       | HW Speed Up |
|-----------------------------------------------------------------|--------------------------------------------------|----------------------------------------|-------------|
| Hough & inverse Hough Processing                                | 12 minutes processing time<br>(Pentium 4, 3 GHz)    | 2 seconds of processing<br>time @ 20 MHz | 370x Faster |
| AES 1 MB data processing/crypto rate<br>Encryption<br>Decryption | 5,558 ms / 1.51 MB/s<br>5,562 ms / 1.51 MB/s        | 424 ms / 19.7 MB/s<br>424 ms / 19.7 MB/s | 13x Faster  |
| Smith-Waterman ssearch34 from<br>FASTA                          | 6461 sec processing time<br>(Opteron)               | 100 sec FPGA processing                  | 64x Faster  |
| Multi-dimensional hypercube search                              | 119.5 sec<br>(Opteron, 2.2 GHz)                     | 1.06 sec FPGA @ 140 MHz                  | 113x Faster |
| Callable Monte-Carlo Analysis<br>(64,000 paths)                 | 100 sec processing time<br>(Opteron, 2.4 GHz)       | 10 sec of processing<br>@ 200 MHz FPGA   | 10x Faster  |
| BJM Financial Analysis (5M paths)                               | 6300 sec processing time<br>(Pentium 4, 1.5 GHz)    | 242 sec of processing<br>@ 61 MHz FPGA   | 26x Faster  |
| Mersenne Twister Random Number<br>Generation                    | 10M 32-bit integers/sec<br>(Opteron, 2.2 GHz)       | 319M 32-bit integers/sec                 | 3x Faster   |
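
As a quick sanity check on the speed-up column, the factor for each timed benchmark is the software-only time divided by the co-processed time. The sketch below recomputes it for the rows that quote both times in comparable units. The dictionary name `timings` and the helper `speedup` are illustrative only and do not come from the benchmarks themselves.

```python
# (software-only seconds, FPGA co-processing seconds), copied from the table above.
# The AES row is converted from milliseconds to seconds.
timings = {
    "AES encryption (1 MB)": (5.558, 0.424),
    "Smith-Waterman ssearch34": (6461.0, 100.0),
    "Multi-dimensional hypercube search": (119.5, 1.06),
    "Callable Monte-Carlo (64,000 paths)": (100.0, 10.0),
    "BJM financial analysis (5M paths)": (6300.0, 242.0),
}


def speedup(sw_seconds, hw_seconds):
    """Return how many times faster the co-processed run is."""
    return sw_seconds / hw_seconds


speedups = {name: speedup(sw, hw) for name, (sw, hw) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

Each computed ratio lands within one unit of the factor quoted in the table, so the column appears to be the plain time ratio, rounded or truncated.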



# Example: Financial Solution (50x+)






## **Solution Examples**

#### (XtremeData Dual & Notional Quad Architecture)



- FPGA uses all motherboard resources meant for the CPU:
  - HyperTransport links, memory interface, power supply, heat sink
- Usable in rack-mount or high-density "blade" server systems, where PC boards don't work
  - Process is scalable for quad processors
- Cores available to interface with AMD HyperTransport
  - High bandwidth, low latency
- Applications and benchmarks coming soon!

