Understanding the CPU Requirements
When analyzing the machine specifications for performance, you have to target throughput rates that take into account failure conditions.
For instance, to account for a GGSN reset that might attempt to reconnect 20,000 users in one minute after a failure, you would need to add the load for an Accounting-ON, which attempts to close 20,000 sessions (at approximately 1000 sessions per second, this takes 20 seconds), plus 500 authorizations per second and 500 accounts per second for 40 seconds, to whatever baseline of authorizations, accounts, and interims you have as an average. Double each of those values to support two GGSNs restarting at the same time, for instance, after a power outage on a shared UPS.
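The arithmetic in this example can be sketched as follows. This is a minimal illustration, not part of SBRC: the function name `ggsn_reset_load` and its parameters are hypothetical, and it assumes that the reconnects fill whatever is left of the one-minute window after the Accounting-ON phase (60 - 20 = 40 seconds) and that doubling for two GGSNs applies to the rates and session counts, not to the durations.

```python
def ggsn_reset_load(users, window_s, close_rate, ggsn_count=1):
    """Estimate the extra load added by simultaneous GGSN resets.

    users       -- sessions held by one GGSN (20,000 in the text)
    window_s    -- time allowed for the whole recovery, in seconds (60)
    close_rate  -- sessions per second closed by Accounting-ON (about 1000)
    ggsn_count  -- GGSNs restarting at once (2 for the shared-UPS case)
    """
    close_s = users / close_rate        # Accounting-ON phase: 20 s in the text
    reconnect_s = window_s - close_s    # assumption: reconnects use the rest of the window
    if reconnect_s <= 0:
        raise ValueError("window too short to close all sessions")
    per_ggsn_rate = users / reconnect_s  # 500 per second in the text
    return {
        "sessions_to_close": users * ggsn_count,
        "close_seconds": close_s,
        "reconnect_seconds": reconnect_s,
        "extra_auth_per_second": per_ggsn_rate * ggsn_count,
        "extra_acct_per_second": per_ggsn_rate * ggsn_count,
    }


one_ggsn = ggsn_reset_load(20_000, 60, 1000)
two_ggsns = ggsn_reset_load(20_000, 60, 1000, ggsn_count=2)
print(one_ggsn)
print(two_ggsns)
```

Add the returned rates to your own measured baseline of authorizations, accounts, and interims to obtain the throughput target.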
When analyzing the expected performance of a given SBRC system, you must evaluate the following three items in terms of the CPU requirements:
Number of CPUs (includes cores, as well as virtual processors [VPs])
A single chip may have multiple cores.
A core represents a single CPU unit and multiple cores on a chip often share a memory bus or I/O bus.
A virtual processor is a way to optimize the use of a core by permitting more threads to execute on the same core while one thread is awaiting a memory or bus operation.
Cores are a more accurate metric of actual performance than VPs, because the VP’s optimization is not constant, but depends on the workload.
For example, the SPARC64 VII processor chip used in the M3000 has four cores and each has two VPs.
Speed of single CPU or single core
Each core of a SPARC64 VII processor executes at 2.52 GHz, which means that one thread can execute 2.52 billion simple instructions per second.
This is the most important item for NDB, which has fewer execute threads (two or four Local Query Handlers to do most of the work) running simultaneously.
Total CPU in terms of GHz.
This number represents the amount of work that the whole system can execute.
Multiply the clock-rate of the processor by the number of cores.
For example, the SPARC64 VII running at 2.52 GHz can execute 2.52 x 4 cores, or 10.08 GHz of work.
This is the most important number for SBR, which is well-behaved and generally scales to use all the CPU resources available until:
The I/O limit is reached
A single thread or other threads (such as the thread that reads RADIUS transactions from the network) utilizes all of one core’s worth of CPU.
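As a rough illustration of the "total GHz" figure described above, the sketch below multiplies clock rate by core count, and optionally by the number of processor chips. The helper name `total_ghz` is hypothetical; the clock rates and core counts are the ones listed in Table 12, using the low end where a range is given.

```python
def total_ghz(clock_ghz, cores, processors=1):
    """Total GHz = clock rate x cores per chip x number of chips."""
    return clock_ghz * cores * processors


# (clock GHz, cores per chip) as listed in Table 12
chips = {
    "SPARC64 VII": (2.52, 4),
    "SPARC T3": (1.65, 16),
    "SPARC T4": (3.0, 8),
}

for name, (clock, cores) in chips.items():
    # Single-core speed matters most for NDB; total GHz matters most for SBR.
    print(f"{name}: {clock} GHz per core, {total_ghz(clock, cores):.2f} GHz total")
```

Virtual processors are deliberately left out of the calculation, because their benefit depends on the workload.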
Table 12 describes the SPARC processor models. For the latest information on SPARC processor models available for shipping, contact your Oracle Sales Engineer.
Table 12: SPARC Processor Models
SPARC Processor Model | Number of Cores | Number of Virtual Processors | Single-Core GHz Range | Total GHz |
---|---|---|---|---|
UltraSPARC T1 | 8 | 32 | 1.0-1.4 | 8-11.2 |
UltraSPARC T2 | 8 | 64 | 1.2-1.4 | 9.6-11.2 |
SPARC64 VI | 2 | 4 | 2.15 | 4.3 |
SPARC64 VII | 4 | 8 | 2.52-2.88 | 10.08-11.52 |
SPARC T3 | 16 | 128 | 1.65 | 26.4 |
SPARC T4 | 8 | 64 | 3.0 | 24.0 |
Table 13 shows the processor type and the maximum number of processors for each SPARC system model.
Table 13: SPARC Processors
System Model Number | Processor Type | Maximum Number of Processors |
---|---|---|
M3000 | SPARC64 VI, SPARC64 VII (recommended) | 1 |
M4000 | SPARC64 VI, SPARC64 VII (recommended) | 4 |
M5000 | SPARC64 VI, SPARC64 VII (recommended) | 8 |
M8000 | SPARC64 VI, SPARC64 VII (recommended) | 16 |
M9000 | SPARC64 VI, SPARC64 VII (recommended) | 32–64 |
T1000 | UltraSPARC T1 | 1 |
T2000 | UltraSPARC T1 | 1 |
T5120 | UltraSPARC T2 | 1 |
T5140 | UltraSPARC T2+ | 2 |
T5220 | UltraSPARC T2 | 1 |
T5240 | UltraSPARC T2+ | 2 |
T5440 | UltraSPARC T2+ | 4 |
T3-1 | SPARC T3 | 1 |
T3-2 | SPARC T3 | 2 |
T3-4 | SPARC T3 | 4 |
T4-1 | SPARC T4 | 1 |
T4-2 | SPARC T4 | 2 |
T4-4 | SPARC T4 | 4 |
Using the total GHz, as explained in Table 12 and Table 14, you can estimate the expected performance based on the results in the current installation, or on the expected results based on test results recorded in SBRC System-of-Systems Performance Reference.
For example, the result recorded in SBRC System-of-Systems Performance Reference for SBRC doing TTLS authentication with cipher suite 0x002f on a 2.75 GHz M3000 is 888 accepts per second.
2.75 GHz x 4 cores = 11 GHz.
A SPARC T3-4 model (which can have 4 processors) can perform 26.4 x 4 = 105.6 GHz total.
Thus, the SPARC T3-4 model should be able to perform upwards of 888 x (105.6 / 11) = 8524 TTLS accepts per second. This calculation should prove accurate, but throughput might be limited by the single thread that handles authentication network traffic once its utilization reaches the entirety of one 1.65 GHz core.
T4 has special acceleration for single-thread throughput.
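The TTLS estimate above is a linear scaling by total GHz, which can be sketched as follows. The function name `scale_by_total_ghz` is hypothetical, and the sketch assumes throughput scales in proportion to total GHz; it does not model the single-thread limit mentioned above.

```python
def scale_by_total_ghz(measured_rate, measured_total_ghz, target_total_ghz):
    """Scale a measured rate linearly by the ratio of total GHz."""
    return measured_rate * target_total_ghz / measured_total_ghz


m3000_total_ghz = 2.75 * 4       # one SPARC64 VII chip, 4 cores at 2.75 GHz
t3_4_total_ghz = 1.65 * 16 * 4   # four SPARC T3 chips, 16 cores each at 1.65 GHz

estimate = scale_by_total_ghz(888, m3000_total_ghz, t3_4_total_ghz)
print(f"about {int(estimate)} TTLS accepts per second")
```

Treat the result as an upper bound: if one thread saturates a single 1.65 GHz core first, the real figure will be lower.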
Consider an existing SBRC system running WiMAX on three T5120s (one SM node and two SSR D nodes). Suppose you record an average transaction rate of 800 SPS, the total average utilization of the SM node is 20 percent, and the SSR’s single maximum thread utilization is 1.0 percent out of a maximum of 1.56 percent (100% / 64 virtual processors = 1.56% per VP, as reported by prstat -L). If you wish to upgrade to three M4000s with two CPUs each:
For the SBRC front-end application, you can expect a maximum of 8400 SPS [800 SPS x (20.16 total GHz) / (20% x 9.6 total GHz) = 8400 SPS].
For the SBRC SSR, you can expect a maximum of 2995 SPS [800 SPS x 2.88 GHz / (1.2 GHz x 1.0/1.56) = 2995 SPS].
The back-end CPS will be slightly higher than these numbers due to decreased contention, as discussed in Understanding Performance Metrics, but this methodology allows a safe estimate.
In this example, you are limited by the back-end utilization. You would see a greater overall increase by going to four D nodes of the M3000, each with one processor running at 2.52 GHz (for approximately 2 x 2995 = 5990 SPS), than by going to two D nodes of the M4000 with two processors each.
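The two upgrade estimates in this example can be reproduced with the sketch below. The function names are hypothetical, and the inputs are the figures quoted in the example, including the rounded 1.56 percent per-VP maximum. The front-end estimate scales by total GHz and measured utilization; the SSR estimate scales by single-core GHz and by how busy the busiest thread is.

```python
def front_end_max_sps(rate, utilization, current_total_ghz, target_total_ghz):
    """Maximum front-end rate: scale by total GHz, then by unused headroom."""
    return rate * target_total_ghz / (utilization * current_total_ghz)


def ssr_max_sps(rate, thread_util_pct, vp_max_pct, current_core_ghz, target_core_ghz):
    """Maximum SSR rate: limited by the busiest single thread on one core."""
    busy_fraction = thread_util_pct / vp_max_pct  # share of one VP already in use
    return rate * target_core_ghz / (current_core_ghz * busy_fraction)


# T5120 (UltraSPARC T2 at 1.2 GHz, 9.6 GHz total) to M4000 with two CPUs
front_end = front_end_max_sps(800, 0.20, 9.6, 20.16)
ssr = ssr_max_sps(800, 1.0, 1.56, 1.2, 2.88)

print(f"front end: about {round(front_end)} SPS")
print(f"SSR:       about {round(ssr)} SPS")
print("limiting tier:", "SSR back end" if ssr < front_end else "front end")
```

Comparing the two results shows which tier limits the system, which is the basis for preferring more D nodes over larger ones.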
Table 14 describes some of the Intel processor examples.
Table 14: Intel Processor Examples
Intel Processor Model | Number of Cores | Number of Virtual Processors | Single-Core GHz Range | Total GHz |
---|---|---|---|---|
X5687 | 4 | 8 | 3.6 | 14.4 |
X5660 | 6 | 12 | 2.8 | 16.8 |
X5690 | 6 | 12 | 3.46 | 20.76 |
E7-4870 | 10 | 20 | 2.4 (1) | 24 |
(1) E7-4870 uses Intel “Turbo Boost Technology,” which can attain a maximum frequency of 2.8 GHz under heavy use.
For example servers built on the Intel processors described in Table 14, the following configurations are optional, and no single model number references all the options:
Standalone or Front-End Applications
The following server example can be optimized for throughput:
HP DL380 2 x X5687, 8G, 200G fast (512 Mbps+) SSD (mirrored or RAID 1), or HDD (RAID 0, RAID 1, RAID 10 for highest performance). In the case of SSD, a 6 Gbps SATA drive with a SandForce-based SSD (like the Intel 520) performs well for accounting and local CST because the data can be compressed. For long-term or compressed log storage, a separate HDD would be useful.
The following server examples can be used for better proxy and TTLS cases:
HP DL580 2 x E7-4870, 8G, 200G fast SSDx4 (RAID 10), or 2xRAID 10 HDD
HP DL580 2 x X5690, 8G, 200G fast SSDx4 (RAID 10), or 2xRAID 10 HDD
The following server example can be used for running four virtual instances of SBR:
HP DL580 4 x E7-4870, 36G memory, 2x (128G fast SSDx2), or RAID—one for each instance.
Back Ends
The following server examples can be used for back ends:
HP DL380 2 x X5687, 16-64G, 100G fast SSDx2, or HDD RAID 0 or 10 (mirrored or RAID 0)
HP DL980 2 x E7-4870, 16-64G, 100G fast SSDx2, or HDD RAID 0 or 10