# Leveraging Multi-Core Processors in a CDMA-2000 SDR Base Station

Steve Muir, John Chapin, Andrew Chiu, Victor Lum and Jeremy Nimmer

Vanu, Inc.

steve@vanu.com

#### Abstract

The software radio technology used by Vanu, Inc. is designed to deliver ongoing performance improvements to customers. An all-software approach, which eliminates low-level firmware such as VHDL for FPGAs or DSP assembly code, enables low-cost porting of waveforms to new processors and platforms as they become available. This portability lets the software track the Moore's Law growth curve of the semiconductor industry. This paper evaluates the success of this approach by reporting performance improvements delivered for a particular waveform, CDMA2000 1xRTT, across multiple hardware generations.

### Introduction

The exponential performance increase of computer systems due to Moore's Law is extremely well known. The potential benefits for software radio systems and users are immense: ongoing improvements in capacity, and ongoing reductions in size and power consumption, without increase in hardware cost. Yet there is little published data showing whether the raw performance increases offered by semiconductor manufacturers translate into full-system performance increases for users of software radio systems.

This paper analyzes the performance of a single waveform across multiple Intel-based systems representing several years of technology improvement. We chose the CDMA2000 1xRTT cellular waveform, specifically the base-station-side implementation of its RC3 voice channel, because its computation requirements are high enough to stress current state-of-the-art processors.



Figure 1: Channel capacity for CDMA2000 1xRTT RC3 (Vanu Anywave software). Circled data points include PHY, control, and I/O. Others are PHY-only CPU benchmarks

### Channel Capacity

Figure 1 shows the number of voice channels that the Vanu Anywave<sup>TM</sup> CDMA2000 physical layer can support on a single core. Over the 5.5 years separating the acquisition of the first and last machines in the study, channel capacity per core increased by a factor of 16. Not shown in the figure is a parallel increase in the number of cores. The "sweet spot" for price-performance is a two-CPU server, which had 2 cores in the 2001 1.0 GHz system and is offered with 8 cores as of late 2007. If the core count increase provided linear performance growth (which in practice it does not), there would be a further factor of 4 capacity increase at the system level, for roughly the same server price.
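The system-level arithmetic in this paragraph can be sketched as follows. This is a minimal illustration, not part of the Anywave software: the function name `ideal_system_gain` and the `scaling_efficiency` parameter are hypothetical. An efficiency of 1.0 reproduces the linear-scaling upper bound described above; lower values are one simple, assumed way to model the sub-linear scaling seen in practice.

```python
def ideal_system_gain(per_core_gain, cores_before, cores_after,
                      scaling_efficiency=1.0):
    """Estimate the capacity gain of a server when both per-core capacity
    and core count increase.

    scaling_efficiency is a hypothetical knob in (0, 1]: 1.0 means every
    added core contributes a full core's worth of capacity (linear scaling).
    """
    if not 0.0 < scaling_efficiency <= 1.0:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    core_gain = cores_after / cores_before
    # Only the capacity contributed by the additional cores is discounted.
    effective_core_gain = 1.0 + (core_gain - 1.0) * scaling_efficiency
    return per_core_gain * effective_core_gain


# Figures from the text: 16x per core, 2 cores growing to 8 cores.
linear_bound = ideal_system_gain(16, cores_before=2, cores_after=8)

# Assumed, purely illustrative: added cores deliver half their ideal value.
discounted = ideal_system_gain(16, cores_before=2, cores_after=8,
                               scaling_efficiency=0.5)
print(linear_bound, discounted)
```

Under linear scaling the two factors simply multiply (16 x 4 = 64); the discounted case shows how quickly the system-level figure falls when added cores contribute less than their ideal share.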

Figure 1 also shows full-system benchmarks, where a radio head is connected and the BTS runs in an operational software configuration, although still using only a single core. Due to laboratory equipment limitations we were only able to benchmark two of the servers as full systems. These tests show that adding control processing, a BSC connection and radio head I/O to the PHY processing reduces the channel capacity by 50%.

### Platform Effects

In addition to CPU speed and number of cores, other platform features significantly affect performance. Table 1 describes the systems used for the study, labeled A-E.

One critical part of the platform is the compiler used for the signal processing code. We have found that the Intel C compiler (ICC) generates substantially more efficient code than the GNU C compiler (GCC) for our applications. Figure 2 compares the PHY-only channel capacity for the two compilers (vertical bars and left axis). Because capacity is quantized into an integer number of channels, the bars mask the magnitude of the benefit of using ICC. To show the benefit more clearly, the plotted line and right axis show the performance improvement before quantization. It ranges from 25% for a 2003-vintage machine with a Xeon up to 52% for a current Core2 server. (Since the Core2 machines were introduced only recently, we can expect GCC's code generation for the target to improve over the next year or two.)
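The masking effect of quantization can be illustrated with a short sketch. The input values below are made up for illustration and are not measurements from this study; the function names are likewise hypothetical. The sketch assumes that the supported channel count is the floor of a fractional "raw" capacity.

```python
import math


def quantized_capacity(raw_capacity):
    """Whole channels supportable for a fractional (pre-quantization) capacity."""
    return math.floor(raw_capacity)


def compare_compilers(raw_baseline, speedup):
    """Compare the raw and quantized gains for a given compiler speedup.

    raw_baseline: fractional channel capacity with the baseline compiler
                  (hypothetical input, must be at least 1.0).
    speedup:      raw throughput ratio of the better compiler, e.g. 1.25.
    """
    if raw_baseline < 1.0:
        raise ValueError("baseline must support at least one channel")
    base = quantized_capacity(raw_baseline)
    improved = quantized_capacity(raw_baseline * speedup)
    return {
        "raw_gain": speedup - 1.0,
        "channels_baseline": base,
        "channels_improved": improved,
        "quantized_gain": improved / base - 1.0,
    }


# Illustrative inputs only: a 25% raw speedup on a 6.0-channel baseline.
result = compare_compilers(raw_baseline=6.0, speedup=1.25)
print(result)
```

With these assumed inputs, a 25% raw improvement yields only one additional whole channel (6 to 7), an apparent gain of about 17%, which is why the pre-quantization line in Figure 2 is the more informative measure.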

Another critical part of the platform is the availability of a pipelined vector execution unit, such as SSE and SSE2 for Intel processors. These execution units are very efficient for the streaming data computations characteristic of signal processing code. SSE2 on the Core2 microarchitecture is particularly efficient because a 128-bit wide operation can be issued every cycle, compared to every two cycles on Netburst. Figure 2 shows an overall capacity benefit of 3x-4x when the vector instruction unit is exploited.

Figure 2: Performance effects due to compiler choice and use of the SSE2 vector execution unit. ICC is the Intel C compiler, while GCC is the GNU C compiler.

The performance results shown in Figure 1 all use ICC with SSE2, except for platform A, whose Pentium III Coppermine does not have SSE2 capability.

### Conclusions

This paper shows that the Moore's Law curve of the semiconductor industry can deliver significant, ongoing performance growth to users of software radio systems. Platform improvements over a 3.5-year period (September 2003 to April 2007) increased CDMA2000 channel capacity per core from 6 to 16, and the number of cores per dual-CPU server from 2 to 4. We expect this trend to continue. For example, 8-core servers are now widely available. A radio operator who invests in software-radio-based systems can expect ongoing reductions in cost or increases in capacity (or both) as nodes are incrementally added to a deployed system.
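The per-core figures quoted above can be converted into an annualized growth rate and a doubling time, which makes them easier to compare against the usual statement of Moore's Law. This is a minimal sketch that assumes smooth exponential growth between the two endpoints; the function names are illustrative.

```python
import math


def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years)


def doubling_time_years(start, end, years):
    """Years needed to double, assuming the same exponential growth rate."""
    return years * math.log(2.0) / math.log(end / start)


# Figures from the text: 6 to 16 channels per core over 3.5 years.
factor = annual_growth_factor(6, 16, 3.5)
doubling = doubling_time_years(6, 16, 3.5)
print(f"per-core growth per year: x{factor:.2f}")
print(f"per-core doubling time:   {doubling:.2f} years")
```

Under this assumption the per-core capacity grows by a factor of about 1.32 per year, doubling roughly every two and a half years, before the additional gain from the higher core count is considered.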

Not every software radio vendor can deliver the benefits of Moore's Law to end users. Waveform software for software radio is expensive to develop. If it is not portable across hardware generations, the vendor is locked in to obsolete components until the software's cost has been fully recovered. This is often a problem when vendors select DSPs or FPGAs as their SDR signal processing platform.

Vanu, Inc. has focused its engineering efforts on developing portable waveform software. All the signal processing executes as an application, on a standard OS, on processors such as Intel x86 and PowerPC. The software is highly modular so individual components can be specialized to exploit processor features (such as the new SSE3 vector instruction set from Intel) without affecting the rest of the software. This system design gives Vanu's products, such as the Anywave<sup>TM</sup> Radio Access Network, a unique ability to deliver ongoing performance improvements to operators and users.

| ID | Mfr  | Model          | Acq. date | Total cores | CPU brand (core arch.)   | Core speed (GHz) | Total L2 cache (KB) | Memory bus (MHz) |
|----|------|----------------|-----------|-------------|--------------------------|------------------|---------------------|------------------|
| A  | Dell | GX150          | 4/1/01    | 1           | Pentium III (Coppermine) | 1.0              | 256                 | 133              |
| B  | HP   | DL380 G3       | 9/1/03    | 2           | Xeon (Netburst)          | 2.8              | 512                 | 266              |
| C  | IBM  | xSeries 336    | 7/1/06    | 2           | Xeon (Netburst)          | 3.6              | 2048                | 266              |
| D  | Dell | Optiplex GX745 | 4/1/07    | 2           | Xeon 6400 (Core 2)       | 2.13             | 2048                | 667              |
| E  | IBM  | System x3550   | 4/1/07    | 4           | Woodcrest (Core 2)       | 2.6              | 8192                | 266              |
**Table 1: System Characteristics**