Test Results Summary
This section summarizes the results of tests run on a given set of systems.
Unless otherwise stated, all results shown in this section are for a single front-end application.
PAP Authentication, Accounting Start, Accounting Stop
Tested on four D nodes with zero sessions preloaded.
The number of test client threads is varied to show the effect of simultaneous operations.
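As a rough illustration of how these runs were driven, the sketch below varies the client thread count and measures completed transactions per second. It is a minimal stand-in, not the actual test harness; `do_auth_start_stop()` is a hypothetical placeholder for a real PAP authentication plus Acct-Start/Acct-Stop exchange.

```python
# Minimal sketch of the thread-count experiment: N client threads issue
# auth/start/stop transactions in a loop, and the aggregate rate is
# reported as completed sessions per second (CPS).
import threading
import time

def do_auth_start_stop():
    time.sleep(0.001)  # placeholder for PAP auth + Acct-Start + Acct-Stop

def run_load(num_threads, duration_s=10):
    total = [0]
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker():
        local = 0
        while time.monotonic() < deadline:
            do_auth_start_stop()
            local += 1
        with lock:
            total[0] += local

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0] / duration_s  # CPS

for n in (80, 100, 120):  # thread counts from Table 18
    print(n, "threads ->", run_load(n), "CPS")
```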
Table 18: PAP Authentication, Accounting Start, Accounting Stop
Client Test Threads | CPS | Front-End Utilization | SSR Node Single Thread Utilization |
---|---|---|---|
100 | 4290 | 91 % | 3.8 % |
80 | 4252 | 91 % | 3.8 % |
120 | 4239 | 90 % | 3.8 % |
PAP Authentication, Accounting Start, Accounting Stop—Sessions Preloaded
Tested on four D nodes with sessions preloaded.
Table 19: PAP Authentication, Accounting Start, Accounting Stop—Sessions Preloaded
Number of Sessions Preloaded | CPS | Front-End Utilization | SSR Node Thread Utilization |
---|---|---|---|
1M | 4210 | 89 % | 5.7-6.1 % 1 |
4M | 4185 | 91 % | 6.5-6.8 % |
7M | 4064 | 89 % | 7.2-7.4 % |
1 CPU utilization varied with Local and Global Checkpoints (LCP and GCP); higher figures for the other results are shown in the rows below.
Accounting Only—Sessions Preloaded, One Start and One Stop, Four D Nodes
Tested on four D nodes with sessions preloaded.
Table 20: Accounting Only—Sessions Preloaded, One Start and One Stop, Four D Nodes
Number of Sessions Preloaded | Accts/Sec | SSR Node Single Thread Utilization |
---|---|---|
0 | 11,900 | |
4M | 11,446 | |
7M/100 threads | 11,070 | 8.7 % |
7M/120 threads | 11,160 | 8.8 % |
Accounting Only—Sessions Preloaded, One Start and One Stop, Two D Nodes
Tested on two D nodes with sessions preloaded, varying how accounting is logged to the local file.
Table 21: Accounting Only—Sessions Preloaded, One Start and One Stop, Two D Nodes
Number of Sessions Preloaded | Accts/Sec | SSR Node Single Thread Utilization |
---|---|---|
0 sessions/no account logging | 13,900 | 4.2 % |
0 sessions/default account logging | 12,280 | 4.4 % |
1M sessions/default account logging | 12,340 | 6.6 % |
4M sessions/default account logging | 12,300 | 7.5 % |
4M sessions/minimal account logging | 12,660 | 8.0 % |
4M sessions/no account logging | 13,740 | 8.3 % |
7M sessions/no account logging | 13,400 | 8.6 % |
Standalone: Auth/start/stop CPS
Tested on standalone SBR for Auth/start/stop CPS.
Table 22: Standalone: Auth/start/stop CPS
Case | CPS | CPU Utilization |
---|---|---|
0 sessions | 6330 | 97 %
1M sessions | 5930 | 96 % |
3M sessions | 5760 | 95 % |
Standalone: Accounting for 200 Threads
Tested accounting only on standalone SBR with 200 threads.
Standalone SBR, being a 32-bit application, is limited to a 4 GB working set.
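To make the limit concrete, the arithmetic below divides the 4 GB working set by the roughly 3M-session ceiling observed in Table 23. The resulting per-session budget is only an estimate, since the actual record size varies by use case.

```python
# Rough per-session memory budget implied by the 4 GB working-set limit.
# The ~3M-session ceiling in Table 23 implies about 1.4 KB per session;
# the actual record size varies by use case, per the table's note.
limit_bytes = 4 * 1024**3       # 4 GiB working set of a 32-bit process
sessions = 3_000_000            # approximate ceiling observed below
print(limit_bytes // sessions)  # -> 1431 bytes per session, roughly
```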
Table 23: Standalone: Accounting for 200 Threads
Case | Accts/s | CPU Utilization | Disk I/O |
---|---|---|---|
1M sessions | 22,500 | 95 % | 1.8 M/s |
3M sessions (this maximum varies with each use case, depending on the data stored in the SSR relative to the 4 GB limit) | 20,300 | 95 % | 2.1 M/s
3M Sessions/Account.ini Enabled | 15,200 | 96 % | 6.0 M/s |
Standalone: Authentication Only
Tested on standalone SBR for authentication only.
Table 24: Standalone: Authentication Only
Case | Auths/s | CPU Utilization | Disk I/O |
---|---|---|---|
PAP | 14,800 | 98 % | – |
CHAP | 14,760 | 98 % | – |
Rejects/s | ~22,000 | 98 % | – |
LogAccept=1 | 13,170 | 96 % | 1.2 M/s |
LogLevel=2 | 3,890 | 77 % | 6.8 M/s |
LogLevel=1 | 9,590 | 86 % | 3.6 M/s |
LDAP Authentication Only
Tested for LDAP authentication.
Table 25: LDAP Authentication Only
Case | Auths/Sec | Front-End Overall | Front-End Single-Thread |
---|---|---|---|
1 client thread, maxconnect = 1 | 454 | 11 % | 3.4 %
10 client threads, maxconnect = 1 | 1030 | 26 % | 6.2 % |
100 client threads / 10 [server/*] sections | 10,175 | 39 % 2 | — 3
2 The LDAP server's two CPUs were maxed out, running at 166 percent of a two-CPU VMware instance at 3.3 GHz.
3 Utilization was evenly split across threads, below the transport thread level.
Oracle 11g
Tested on an Oracle 11g database running on an M3000, 2.52 GHz.
Table 26: Oracle 11g
Case | Auths/Sec or CPS | Front-End % | Oracle DB Process Max % |
---|---|---|---|
Auth only—10 client threads | 9200 auths/sec | 24 % | 22 % overall |
Auth only—15 client threads | 12,500 auths/sec | 37 % | 30 % overall |
Directed realms, 20 client threads over two instances | 17,000–24,000 4 auths/sec | 71 % | 70 %
Auth/start/stop, MaxConnections = 1 | 2610 CPS | 37 % | 4.6 % |
Auth/start/stop, MaxConnections = 10 | 4255 CPS | 91 % | 6.9 % (spread over 8 Oracle processes) |
4 This result varied widely due to the latency of the single Oracle DB processing requests from too many simultaneous connections.
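The MaxConnections rows can be read as a connection-pooling effect: with one connection, every front-end thread serializes on the same Oracle session, while ten connections let requests proceed in parallel. The sketch below is a generic pool illustrating that behavior, not the actual SBR Oracle plug-in; `connect` and `fn` are hypothetical.

```python
# Generic blocking connection pool. With max_connections=1 every caller
# serializes on the single connection (the 2610 CPS row); with 10, up to
# ten requests run concurrently (the 4255 CPS row).
import queue

class ConnectionPool:
    def __init__(self, connect, max_connections):
        self._pool = queue.Queue()
        for _ in range(max_connections):
            self._pool.put(connect())  # pre-open all connections

    def execute(self, fn):
        conn = self._pool.get()        # blocks while all connections are busy
        try:
            return fn(conn)            # run the query on a checked-out connection
        finally:
            self._pool.put(conn)       # return it for the next caller
```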
LCI Queries against the SSR through SBRC
The number of LCI server threads is hard-limited to eight by the server to avoid interference with regular RADIUS traffic processing. 5
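Expressed as code, the cap behaves like a bounded worker pool: at most eight LCI queries are serviced concurrently, and the rest queue rather than competing with RADIUS packet threads. This is an illustrative sketch, not the SBRC implementation; `handle_lci_query` is hypothetical.

```python
# Bounded worker pool mirroring the eight-thread LCI ceiling: submissions
# beyond eight in flight wait in the executor's queue instead of taking
# CPU from regular RADIUS processing.
from concurrent.futures import ThreadPoolExecutor

LCI_MAX_THREADS = 8  # hard server-side limit described above

def handle_lci_query(query):
    """Placeholder for an SSR session lookup."""
    return query

lci_pool = ThreadPoolExecutor(max_workers=LCI_MAX_THREADS)

def submit_lci(query):
    return lci_pool.submit(handle_lci_query, query)
```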
Table 27: LCI Queries against the SSR through SBRC
Case | LCI Queries/Sec | Front-End CPU | SSR Node CPU |
---|---|---|---|
100 query client threads | 1452 | 53 % | 3.7 % |
200 query client threads | 1716 | 62 % | 4.4 % |
5 Performance improvements in SBRC 7.4.0 permit greater throughput at lower front-end CPU utilization.
IP Address Allocation
Tested on two D nodes with IP address allocation.
Table 28: IP Address Allocation with Two D Nodes
Case | CPS | SSR Node CPU |
---|---|---|
1M sessions, 1 pool, 1 front end | 3050 | 18 %, 18 %, 5.2 %, 3.5 % |
1M sessions, sharing 1 pool, 2 front ends | 1600 + 1550 = 3150 | 19 %, 18 %, 5.3 %, 3.5 % |
1M sessions, 2 pools, each front end using its own | 1885 + 1857 = 3742 | 22 %, 20 %, 5.1 %, 3.5 % |
1M sessions, LCI queries, 1 front end | 2650 + 550 LCI TPS | 17 % |
1M sessions, LCI queries, 2 front ends | 3200 + 750 LCI TPS | 22 % |
IP Address Allocation with Eight Threads, Two D Nodes
Tested on two D nodes with eight execute threads on eight virtual processors. This test case gives an indication of performance on the M4000: in the M3000 case shown here, two threads run on each physical processor, whereas on the M4000 one thread runs per physical processor.
Table 29: IP Address Allocation with Two D Nodes and Eight Execute Threads
Case | CPS | Max Single Thread (out of 12.5 %) SSR Node | Front-End CPU |
---|---|---|---|
0 sessions, 1 front end | 3635 | 5.3, 5.1, 5.1, 5.1, 3.3, 2.0 | 75 % |
0 sessions, 2 front ends, 2 pools | 3270 + 3390 = 6660 | 8.3, 8.2, 8.2, 8.2, 4.0, 2.0 | 68 % |
1M sessions, 2 front ends, 2 pools | 2509 + 2492 = 5001 | 10.0, 9.7, 9.7, 9.6, 3.4, 1.9 | |
1M sessions, 2 front ends, 1 pool 6 | 1853 + 1875 = 3728 | 8.4, 8.3, 8.0, 8.0, 3.3, 1.9 |
6 In the two front ends, one pool case, the limiting factor is NDB lock collision caused by multiple threads reaping old IP addresses at the same time.
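The collision is easy to picture as a single free list guarded by one lock: every reaper must take the same lock to claim the oldest address, so reclaims serialize across front ends. The sketch below models that shape under stated assumptions (a Python lock standing in for the NDB row lock); per-front-end pools, as in the two-pool rows, split the contention.

```python
# One shared pool means one contention point: allocate() always takes the
# same lock to claim the oldest free address, so concurrent reapers from
# both front ends serialize here. Two pools give each front end its own lock.
import threading

class IpPool:
    def __init__(self, addresses):
        self._free = list(addresses)    # oldest-first free list
        self._lock = threading.Lock()   # stands in for the NDB row lock

    def allocate(self):
        with self._lock:                # the collision point in footnote 6
            return self._free.pop(0) if self._free else None

    def release(self, addr):
        with self._lock:
            self._free.append(addr)
```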
IP Address Allocation with Eight Threads, Four D Nodes
Tested on four D nodes with eight execute threads on eight virtual processors. This test case gives an indication of performance on the M4000. In this case, two threads are running on each physical processor.
Table 30: IP Address Allocation with Four D Nodes and Eight Execute Threads
Case | CPS | Max Single Thread (out of 12.5 %) SSR Node | Front-End CPU |
---|---|---|---|
No IP, 1M sessions, 2 front ends (“No IP” is used as a baseline reference) | 5514 + 5472 = 10,986 7 | 3.8, 3.8, 3.2, 3.0, 2.9, 1.7 |
1M sessions, 1 front end | 3650 | 4.0, 3.9, 3.7, 3.7, 2.2, 1.9 | 74 % |
1M sessions, 2 front ends, 2 pools | 2550 + 2600 = 5150 | 5.4, 5.3, 5.1, 5.1, 2.7, 2.1 | 45 % |
1M sessions, 2 front ends, 1 pool | 2300 + 2400 = 4700 | 3.9, 3.9, 3.8, 3.7, 2.0, 1.6 | 35 % |
7 A 10 percent variation occurred due to GCP and LCP.
TTLS
Tested TTLS with five RADIUS auth/challenge pairs per accept, using various cipher suites and DH key sizes.
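For reference, and assuming the hex codes in Table 31 are the standard IANA TLS cipher suite identifiers, the mapping below names each suite. Read against the table, suite choice mostly selects the key exchange (DHE versus plain RSA) and bulk cipher, and the explicit DH-bits rows show the handshake cost growing with DH key size.

```python
# Standard IANA TLS cipher suite names for the hex codes in Table 31
# (an assumption about what the codes denote, not stated in the source).
CIPHER_SUITES = {
    0x39: "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
    0x38: "TLS_DHE_DSS_WITH_AES_256_CBC_SHA",
    0x33: "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    0x32: "TLS_DHE_DSS_WITH_AES_128_CBC_SHA",
    0x16: "TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA",
    0x13: "TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA",
    0x66: "TLS_DHE_DSS_WITH_RC4_128_SHA",
    0x35: "TLS_RSA_WITH_AES_256_CBC_SHA",
    0x2F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0x15: "TLS_DHE_RSA_WITH_DES_CBC_SHA",
    0x12: "TLS_DHE_DSS_WITH_DES_CBC_SHA",
    0x0A: "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
    0x05: "TLS_RSA_WITH_RC4_128_SHA",
    0x04: "TLS_RSA_WITH_RC4_128_MD5",
    0x07: "TLS_RSA_WITH_IDEA_CBC_SHA",
    0x09: "TLS_RSA_WITH_DES_CBC_SHA",
}
```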
Table 31: TTLS with Five RADIUS Auth/Challenge Pairs per Accept
Case | Accepts/Second | Front-End % |
---|---|---|
1024 DH bits, 0x39 | 565 | 95 %
512 DH bits, 0x39 | 871 | 95 %
1536 DH bits, 0x39 | 256 8 | 73 %
0x38 | 574 | (Maximum, around 95 %) |
0x33 | 575 | (Maximum, around 95 %) |
0x32 | 578 | (Maximum, around 95 %) |
0x16 | 575 | (Maximum, around 95 %) |
0x13 | 872 | (Maximum, around 95 %) |
0x66 | 886 | (Maximum, around 95 %) |
0x35 | 890 | (Maximum, around 95 %) |
0x2f | 888 | (Maximum, around 95 %) |
0x15 | 575 | (Maximum, around 95 %) |
0x12 | 1112 | (Maximum, around 95 %) |
0x0a | 1116 | (Maximum, around 95 %) |
0x05 | 1114 | (Maximum, around 95 %) |
0x04 | 1120 | (Maximum, around 95 %) |
0x07 | 1116 | (Maximum, around 95 %) |
0x09 | 1117 | (Maximum, around 95 %) |
8 The test clients ran out of CPU.
TTLS Plus Storing Resumption Context
Tested on two D nodes: TTLS plus the overhead of storing resumption context.
Table 32: TTLS Plus Storing Resumption Context
Case | Accepts/Second | NDB SSR Node Thread Utilization |
---|---|---|
Resumption (0x38) | 575 | 0.7, 0.7, 0.4, 0.4 |
Resumption (0x09) | 1117 | 0.9, 0.8, 0.5, 0.5 |
WiMAX
Tested WiMAX with TTLS plus one HA plus two starts and two stops; the 10 RADIUS transactions per accept produce 6 NDB hits.
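Scaled out, the per-accept transaction counts turn the Table 33 rates into aggregate RADIUS and NDB loads; the short calculation below assumes the 10 transactions and 6 NDB hits are per accept.

```python
# Aggregate load implied by Table 33, assuming 10 RADIUS transactions
# and 6 NDB hits per accept.
for accepts_per_sec in (355, 475):        # the 0x38 and 0x09 rows
    radius_tps = accepts_per_sec * 10     # RADIUS transactions per second
    ndb_hits_per_sec = accepts_per_sec * 6
    print(accepts_per_sec, radius_tps, ndb_hits_per_sec)
# 355 -> 3550 RADIUS tps, 2130 NDB hits/s
# 475 -> 4750 RADIUS tps, 2850 NDB hits/s
```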
Table 33: WiMAX
Case | Accepts/Second | NDB SSR Node Thread Utilization | Front-End % |
---|---|---|---|
WiMAX (0x38) | 355 | 3.5, 3.5, 1.3, 1.1 | 91 % |
WiMAX (0x09) | 475 | 4.7, 4.7, 1.6, 1.4 | 98 % |
Four D Node System: M5000
These results were tested on the M5000 with four CPUs (virtual CPUs disabled) at 2.66 GHz, switch-connected with 10G networking.
Simple accountings per second = 49,200 (minimal data per row stored):
Network bandwidth (NB) for D nodes recorded upwards of 42 MBps = 336 Mbps (byte-to-bit conversions are worked in the sketch after this list)
Disk bandwidth 5 MBps = 40 Mbps
Realistic accountings per second (7M rows preloaded) with more data per transaction (approximately 512 B per row stored):
Accountings per second = 23,240
Bandwidth (up to 7 MBps) = 56 Mbps
Network bandwidth = 80 MBps
IP address allocations: 7701 CPS
WiMAX with resumptions: 3300 CPS
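The byte-to-bit conversions quoted in the list above (1 MBps = 8 Mbps) are made explicit below; the 80 MBps figure, not converted in the source, works out to 640 Mbps.

```python
# MBps (megabytes/s) to Mbps (megabits/s): multiply by 8 bits per byte.
for mbytes_per_sec in (42, 5, 7, 80):
    print(f"{mbytes_per_sec} MBps = {mbytes_per_sec * 8} Mbps")
# 42 MBps = 336 Mbps, 5 MBps = 40 Mbps, 7 MBps = 56 Mbps, 80 MBps = 640 Mbps
```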
Standalone: M9000
These results were tested on the M9000 (16 CPUs x 8 cores) at 3.0 GHz, running in one global zone or in six non-global zones.
When SBR runs in non-global zones, performance is limited because SBR cannot set RealTime priority on the receiving threads.
Maximum authentications per second for one global zone = 31,000
Maximum accountings per second for one global zone (with logging) = 38,000
Maximum authentications per second across six non-global zones = 105,804
Maximum accountings per second across six non-global zones = 91,470
CSPS (without logging) across six non-global zones = 59,252
TTLS (512 RSA):
Maximum TTLS per second for one global zone = 3900
Maximum TTLS per second for six non-global zones = 11,900
Standalone: T3
These results were tested on the T3 (4 CPUs x 128 cores) at 1.65 GHz, running in six non-global zones.
Note that disabling virtual CPUs decreases performance in all cases.
Maximum authentications per second across six non-global zones = 69,850. Total CPU utilization is 18%.
Maximum accountings per second across six non-global zones = 18,800. Total CPU utilization is 15.6%.
Accounting and local session performance is severely limited by single-CPU speed and disk spindle I/O performance. Consequently, the T3 is not recommended for accounting workloads, but it offers a high ROI for TLS/TTLS and WiMAX cases.
TTLS (512 DHE)—Maximum TTLS per second across six non-global zones = 12,100.
Total CPU utilization is 78%, which is the maximum utilization available on this server.
TTLS (1024RSA)—Maximum TTLS per second across six non-global zones = 17,000.
Total CPU utilization is 66%.
TTLS (1024 DHE)—Maximum TTLS per second across six non-global zones = 6300.
Total CPU utilization is 52%.
TTLS (2048 DHE)—Maximum TTLS per second across six non-global zones = 2046.
Total CPU utilization is 43.2%.
SSR and Standalone Performance: E7-4870 and X5687 CPUs
These results were tested on E7-4870 CPUs and X5687 CPUs.
Table 34: SSR Performance
Case | 6D - E7-4870 x 1 CPU | 4D - E7-4870 x 1 CPU | 2D - E7-4870 x 1 CPU | 2D - X5687 x 2 CPU |
---|---|---|---|---|
Accountings per second | 137,900 | 97,000* | 78,100 | 89,000 |
Accountings per second 512 byte sessions, 15M preload | 65,600 | 47,700 | 24,400 | – |
SBR 7.4.1 with new NDB optimizations | 140,700 | 120,000* | 107,000 | 107,000
Table 35: Standalone Performance
Case | X5687 x 2 CPUs | E7-4870 x 1 CPU (2 for TTLS) | E7-4870 x 8 CPUs (with 6 to 12 virtual machines) |
---|---|---|---|
PAP authentications per second | 44,900 | 48,000 | 179,000
Accountings per second (CST and logging to disk) | 42,000 | 40,000* | 190,000 |
TTLS 512 DHE (0x39) authentications per second | 3,800 | 4,300 | – |
TTLS 1024 RSA (0x2F) authentications per second | 4,700 | 5,400 | 11,600 |
TTLS 1024 DHE (0x39) authentications per second | 2,900 | 3,000 | – |
* These numbers are based on estimation.
Proxy authentications and accountings per second = 22,000. Round-robin is used for multiple downstreams.
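The round-robin selection mentioned above can be sketched as a simple rotation over the configured targets; the server names here are hypothetical.

```python
# Round-robin over downstream RADIUS targets: each call returns the next
# server in a fixed rotation. Names are placeholders.
import itertools

_downstreams = itertools.cycle([
    "radius1.example.com",
    "radius2.example.com",
    "radius3.example.com",
])

def next_downstream():
    return next(_downstreams)
```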
JDBC/MySQL authentications per second = 18,800. Total CPU utilization is 15.6%.
JavaScript Engine Overhead = 10%.