

### **Photonic Many-Core Architecture Study**

Nadya Bliss<sup>1</sup>, Krste Asanović<sup>2</sup>, Keren Bergman<sup>3</sup>, Luca Carloni<sup>3</sup>, Jeremy Kepner<sup>1</sup>, Sanjeev Mohindra<sup>1</sup>, Vladimir Stojanović<sup>4</sup>

<sup>1</sup>*MIT Lincoln Laboratory,* <sup>2</sup>*University of California Berkeley,* <sup>3</sup>*Columbia University,* <sup>4</sup>*MIT Research Laboratory of Electronics* 

September 23<sup>rd</sup>, 2008



PM: Jagdeep Shah

This work is sponsored by DARPA under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

MIT Lincoln Laboratory

HPEC2008 1 NTBliss 9/29/2008



## Outline

- Introduction
- Logical Architecture Abstraction
- Modeling and Mapping
- Experiments and Results
- Summary



## **Emerging Device Trends**



Emerging device technologies create a large parameter space of possible future architectures


## **Benefits of Photonic Interconnects**

#### TO MEMORY: ELECTRONICS

- Communication to memory banks is chip power and pin/wire density limited
- Poor scaling of on-chip memory controllers with cores
- At most 3-6 Tb/sec in the next few years

#### TO MEMORY: OPTICS

- Use optical network as an efficient global crossbar
- Better scaling with N groups
- Expected performance 40-80 Tb/sec

#### CORE-TO-CORE: ELECTRONICS

- Buffer, receive and re-transmit at every switch
- Power dissipation grows with data rate

#### CORE-TO-CORE: OPTICS

- Modulate/receive data once per communication
- Scalable, low power switch fabric
- Balanced communication and computation

Photonics can provide high bandwidth, low latency communication while meeting power requirements of embedded systems.



#### **System Level View**

-Photonic Many-core Architecture Network: PhotoMAN-

Selecting a system level architecture allows the parameter space to be narrowed while meeting requirements of DoD applications.

- Manycore processor chip
  - 64-256 cores (in 22nm node)
- Off-chip memory
  - a set of DRAM chips
  - minimum capacity 128 GB (at 22nm)
- Evaluate interaction of the photonic network and memory hierarchy
- Board power limit 500 W
  - Consistent with power constraints of medium-sized UAV





To evaluate the architecture, develop:

1. Expressive logical abstraction
2. Modeling and mapping framework



## Outline

- Introduction
- Logical Architecture Abstraction
- Modeling and Mapping
- Experiments and Results
- Summary



### **Logical Abstraction**

-Kuck\* Memory Hierarchy-



The Kuck notation provides a clear way of describing a hardware architecture along with the memory and communication hierarchy.


\*High Performance Computing: Challenges for Future Systems, David Kuck, 1996



#### **PhotoMAN Logical Representation**

-MIT/UCB 1 Group Memory Configuration-



The Kuck notation is suitable for both high-level and detailed physical descriptions of the architecture, such as groups and access points.

Legend: • AP - access point • APG - access point group



#### **PhotoMAN Logical Representation**

-MIT/UCB 4 Group Memory Configuration-





## Outline

- Introduction
- Logical Architecture Abstraction
- Modeling and Mapping
- Experiments and Results
- Summary



## pMapper: Modeling and Mapping





## **PhotoMAN Machine Description**



Given a hardware model *H* and a program parse tree *T*, pMapper finds maps *M* that minimize execution latency:

$\operatorname{argmin}_M f(T, H, M)$

Focus of the PhotoMAN study

| Parameter                                             | Symbol                                        | Unit           |
|-------------------------------------------------------|-----------------------------------------------|----------------|
| Number of processing cores                            | $N_P$                                         | N/A            |
| $P_0^i$ speed                                         | $R_P$                                         | (FL)OPs/second |
| $P_0^i$ latency                                       | $L_P$                                         | second         |
| $P_0^i$ efficiency on operation $k$                   | $E_{P_{i,k}}$                                 | N/A            |
| $M_0^i$, local memory capacity of $P_0^i$             | $C_{local}$                                   | bytes          |
| Number of memory banks                                | $N_{MB}$                                      | N/A            |
| Locations of access points (core indices)             | $A$                                           | N/A            |
| Number of groups                                      | $N_G$                                         | N/A            |
| $SM_1$, on-chip shared memory capacity                | $C_{on-chip}$                                 | bytes          |
| Size of a single data element of data type ${\cal T}$ | $S_T$                                         | bytes          |
| $N_{0.5}$, inter-core network bandwidth               | $\mathbf{R}_{\mathbf{N}}$                     | bytes/second   |
| $N_{0.5}$, inter-core network latency                 | $\mathbf{L}_{\mathbf{N}}$                     | second         |
| $SMN_1$, on-chip memory network bandwidth             | $\mathbf{R}_{\mathrm{M_{on}}}$                | bytes/second   |
| $SMN_1$, on-chip memory network latency               | $L_{M_{on}}$                                  | second         |
| Access point to on-chip memory bandwidth              | $\mathbf{R}_{\mathrm{A}_{\mathrm{M_{on}}}}$   | bytes/second   |
| Access point to on-chip memory latency                | $L_{A_{M_{on}}}$                              | second         |
| Access point to crossbar bandwidth                    | $\mathbf{R}_{A_{XS_{on}}}$                    | bytes/second   |
| Access point to crossbar latency                      | $L_{A_{XS_{on}}}$                             | second         |
| Crossbar to on-chip memory bandwidth                  | $R_{XS_{M_{on}}}$                             | bytes/second   |
| Crossbar to on-chip memory latency                    | $L_{XS_{M_{on}}}$                             | second         |
| $SM_2$, off-chip shared memory capacity               | $C_{off-chip}$                                | bytes          |
| $SMN_2$, off-chip memory network bandwidth            | $\mathbf{R}_{\mathrm{M_{off}}}$               | bytes/second   |
| $SMN_2$, off-chip memory network latency              | $L_{M_{off}}$                                 | second         |
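The minimization over maps stated before the parameter table can be illustrated with a toy exhaustive search. This is a minimal sketch under stated assumptions, not pMapper itself: the cost model `latency`, the list `T` standing in for a parse tree, the dictionary keys, and all numeric values are hypothetical. A map `M` here is simply the number of cores assigned to each step.

```python
from itertools import product

# Hypothetical hardware model H, with keys named after symbols in the table.
H = {
    "N_P": 4,       # number of processing cores
    "R_P": 1.0e9,   # core speed, (FL)OPs/second
    "R_N": 1.0e9,   # inter-core network bandwidth, bytes/second
    "L_N": 1.0e-6,  # inter-core network latency, seconds
}

# Hypothetical stand-in for the parse tree T:
# one (operation count, output size in bytes) pair per processing step.
T = [(4.0e6, 8.0e6), (2.0e6, 8.0e6)]


def latency(T, H, M):
    """Toy cost model f(T, H, M); M[i] is the core count used by step i."""
    total = 0.0
    for i, ((flops, nbytes), cores) in enumerate(zip(T, M)):
        total += flops / (cores * H["R_P"])  # compute time of step i
        # Charge a redistribution only when the next step uses a different core count.
        if i + 1 < len(M) and M[i + 1] != cores:
            total += H["L_N"] + nbytes / H["R_N"]
    return total


def best_map(T, H):
    """Exhaustive argmin over all maps M (one core count per step)."""
    choices = range(1, H["N_P"] + 1)
    return min(product(choices, repeat=len(T)), key=lambda M: latency(T, H, M))


M_star = best_map(T, H)
print(M_star, latency(T, H, M_star))
```

Exhaustive enumeration is only workable for a toy space like this one; it is used here because it makes the argmin explicit.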




#### **Memory Hierarchy Formulation**

-MIT/UCB 1 Group Memory Configuration-



- Bandwidth and latency matrices have the same pattern of non-zeros
- Topology for  $N_{0.5}$  and  $SMN_1$  is the same for the 1-Group configuration
- Diagonal entries encode
  - in R<sub>N</sub>: bandwidth to local store
  - in R<sub>Mon</sub>: whether P<sup>i</sup> is an access point

#### CORE-TO-CORE NETWORK, N<sub>0.5</sub>

$$
\mathbf{R}_{\mathbf{N}} =
\begin{array}{c|cccccccc}
 & P^0 & P^1 & P^2 & P^3 & \cdots & P^{16} & \cdots & P^{255} \\
\hline
P^0 & r_{M_0^0} & r_{N_{0.5}} & 0 & 0 & \cdots & r_{N_{0.5}} & \cdots & 0 \\
P^1 & r_{N_{0.5}} & r_{M_0^1} & r_{N_{0.5}} & 0 & \cdots & 0 & \cdots & 0 \\
P^2 & 0 & r_{N_{0.5}} & r_{M_0^2} & r_{N_{0.5}} & \cdots & 0 & \cdots & 0 \\
P^3 & 0 & 0 & r_{N_{0.5}} & r_{M_0^3} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
P^{16} & r_{N_{0.5}} & 0 & 0 & 0 & \cdots & r_{M_0^{16}} & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
P^{255} & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & r_{M_0^{255}}
\end{array}
$$

#### SHARED MEMORY NETWORK, SMN<sub>1</sub>

$$
\mathbf{R}_{\mathrm{M_{on}}} =
\begin{array}{c|cccccccccc}
 & P^0 & P^1 & P^2 & P^3 & \cdots & P^{16} & \cdots & P^{112} & P^{113} & \cdots \; P^{255} \\
\hline
P^0 & 0 & r_{SMN_1} & 0 & 0 & \cdots & r_{SMN_1} & \cdots & 0 & 0 & \cdots \; 0 \\
P^1 & r_{SMN_1} & 0 & r_{SMN_1} & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots \; 0 \\
P^2 & 0 & r_{SMN_1} & 0 & r_{SMN_1} & \cdots & 0 & \cdots & 0 & 0 & \cdots \; 0 \\
P^3 & 0 & 0 & r_{SMN_1} & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots \; 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots \\
P^{16} & r_{SMN_1} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots \; 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots \\
P^{112} & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 1^{112} & r_{SMN_1} & \cdots \; 0 \\
P^{113} & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & r_{SMN_1} & 1^{113} & \cdots \; 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots \\
P^{255} & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots \; 0
\end{array}
$$

ACCESS POINTS: the diagonal entries $1^{112}$ and $1^{113}$ mark cores $P^{112}$ and $P^{113}$ as access points.

#### AP-TO-MEMORY

$\mathbf{R}_{\mathrm{A}_{\mathrm{M_{on}}}} = r_{A_{M_{on}}} * I$
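The structure of the two 1-group matrices can be sketched in a few lines of Python. This is an illustration under assumptions, not the study's model: it assumes the 256 cores form a 16 x 16 mesh (consistent with $P^0$ neighbouring $P^1$ and $P^{16}$), uses placeholder bandwidth values, and treats only $P^{112}$ and $P^{113}$ as access points. The helper `mesh_bandwidth` is hypothetical.

```python
def mesh_bandwidth(side, r_link, diag):
    """Bandwidth matrix for a side x side mesh of cores.

    Off-diagonal entry (i, j) is r_link when cores i and j are mesh
    neighbours and 0 otherwise; diag[i] fills the diagonal.
    """
    n = side * side
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row, col = divmod(i, side)
        R[i][i] = diag[i]
        if col + 1 < side:  # east neighbour
            R[i][i + 1] = R[i + 1][i] = r_link
        if row + 1 < side:  # south neighbour
            R[i][i + side] = R[i + side][i] = r_link
    return R


SIDE = 16                   # assumed 16 x 16 mesh, 256 cores
N = SIDE * SIDE
ACCESS_POINTS = {112, 113}  # assumed for illustration

# Core-to-core network: the diagonal holds the local-store bandwidth (placeholder 2.0).
R_N = mesh_bandwidth(SIDE, r_link=1.0, diag=[2.0] * N)

# Shared memory network: the diagonal is 1 for access points and 0 elsewhere.
R_Mon = mesh_bandwidth(
    SIDE, r_link=1.0,
    diag=[1.0 if i in ACCESS_POINTS else 0.0 for i in range(N)],
)

# Both networks share one topology: off-diagonal non-zero patterns coincide.
same_topology = all(
    (R_N[i][j] != 0) == (R_Mon[i][j] != 0)
    for i in range(N) for j in range(N) if i != j
)
print(same_topology)
```

Only the diagonals differ between the two matrices, which is why a single topology description can serve both networks in the 1-group configuration.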




#### **Memory Hierarchy Formulation**

-MIT/UCB N<sub>G</sub> Group Memory Configuration-



- The core-to-core network is not shown; it is the same as in the 1-group case
- Although memory access requires one additional transfer, the topology is still represented with a single matrix, R<sub>AXSon</sub>

#### SHARED MEMORY NETWORK, SMN<sub>1</sub>

$$
\mathbf{R}_{\mathrm{M_{on}}} =
\begin{array}{c|ccccccccccc}
 & P^0 & P^1 & P^2 & P^3 & \cdots & P^{16} & \cdots & P^{48} & \cdots & P^{64} & \cdots \; P^{255} \\
\hline
P^0 & 0 & r_{SMN_1} & 0 & 0 & \cdots & r_{SMN_1} & \cdots & 0 & \cdots & 0 & \cdots \; 0 \\
P^1 & r_{SMN_1} & 0 & r_{SMN_1} & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots \; 0 \\
P^2 & 0 & r_{SMN_1} & 0 & r_{SMN_1} & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots \; 0 \\
P^3 & 0 & 0 & r_{SMN_1} & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots \; 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & & \vdots & \vdots \\
P^{16} & r_{SMN_1} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots \; 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & & \vdots & \vdots \\
P^{48} & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 1^{48} & \cdots & r_{SMN_1} & \cdots \; 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & & \vdots & \vdots \\
P^{64} & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & r_{SMN_1} & \cdots & 1^{64} & \cdots \; 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & & \vdots & \vdots \\
P^{255} & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots \; 0
\end{array}
$$

#### AP-XS-MEMORY NETWORK

$$
\mathbf{R}_{A_{XS_{on}}} =
\begin{array}{c|ccccc}
 & APG^0 & APG^1 & APG^2 & APG^3 & XSG^0 \\
\hline
APG^0 & 0 & 0 & 0 & 0 & r_{A_{XS_{on}}} \\
APG^1 & 0 & 0 & 0 & 0 & r_{A_{XS_{on}}} \\
APG^2 & 0 & 0 & 0 & 0 & r_{A_{XS_{on}}} \\
APG^3 & 0 & 0 & 0 & 0 & r_{A_{XS_{on}}} \\
XSG^0 & 0 & 0 & 0 & 0 & r_{XS_{M_{on}}}
\end{array}
$$

AP-XS BANDWIDTH: the $r_{A_{XS_{on}}}$ entries in the $XSG^0$ column.
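The single-matrix encoding of the access-point-to-crossbar-to-memory path can be sketched as follows. This is a hedged illustration: the layout (access point groups first, one crossbar group last, crossbar-to-memory bandwidth on the crossbar's diagonal entry) is an assumed reading of the slide's matrix, and the function name and numeric values are hypothetical.

```python
def ap_xs_matrix(n_groups, r_a_xs, r_xs_m):
    """Bandwidth matrix over [APG^0 .. APG^(n_groups-1), XSG^0].

    Assumed layout: each access point group connects only to the
    crossbar, and the crossbar's diagonal entry carries the
    crossbar-to-on-chip-memory bandwidth.
    """
    size = n_groups + 1
    xs = n_groups  # index of the crossbar row/column
    R = [[0.0] * size for _ in range(size)]
    for g in range(n_groups):
        R[g][xs] = r_a_xs  # access point group -> crossbar
    R[xs][xs] = r_xs_m     # crossbar -> on-chip memory
    return R


# 4-group configuration with placeholder bandwidths (bytes/second).
R_AXS = ap_xs_matrix(n_groups=4, r_a_xs=2.0, r_xs_m=5.0)
for row in R_AXS:
    print(row)
```

Changing `n_groups` resizes the matrix, so the same construction covers any number of groups, including the 16-group case.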



## Outline

- Introduction
- Logical Architecture Abstraction
- Modeling and Mapping
- Experiments and Results
- Summary



### Maps



High programmability is a desirable architecture characteristic

- The complexity of the mapping chosen to optimize performance (minimize execution time) provides insight into the programmability of the hardware
  - The higher the complexity of the mapping, the lower the programmability



## Synthetic Aperture Radar (SAR)

The SAR processing chain is common to many defense applications and requires a significant amount of both computation and communication.






## **Airborne Video Surveillance**

Georegistration is a key computational kernel in airborne video surveillance and other image processing algorithms.





### **PhotoMAN Performance**






## **PhotoMAN Programmability**



See J. Kepner and N. Bliss, "Evaluating the Productivity of a Multicore Architecture"



## **Best Performing Architecture**



- 16 groups
- Optical to memory
- Optical mesh
- 256 cores



#### **Current/future research**

- Network topology
- Power optimization
- Processor characteristics
- Cache architecture
- Hierarchical mapping



- Emerging device trends are motivating the need for logical architecture abstractions and robust modeling, mapping and simulation environments
- PhotoMAN study focus: photonic networks
- Kuck diagrams provide an expressive logical abstraction
- Detailed hardware model describes the mapping and modeling optimization space explored by pMapper and allows for architecture evaluation
- Initial results show over an order of magnitude improvement in *application* performance with photonics, while maintaining scalability