IP Pools through the SSR

 

Because SQL-derived systems have well-known difficulty implementing efficient, reliable queuing across processes, the current SSR implementation has a performance limit when reclaiming addresses from very large IP pools on multiple threads. Multiple threads (running across multiple SBRC instances) that attempt to pre-fetch possible lock-up keys to reclaim unused IP addresses can contend for the same rows. This lock contention can interfere with sustained high-throughput SBRC operation. Lock back-off is governed by the TransactionDeadlockDetectionTimeout value in the config.ini file: a thread can wait up to this timeout before retrying the reclaim operation, which can seriously limit throughput.

There are several ways of managing this contention.

  • Use different IP pools for different NAS devices, and have each NAS device favor (or be directed by a load balancer to) a given SBRC front-end application. This reduces contention between front ends.

  • Shard the users or profiles so that different users receive a Framed-IP-Address from different pools. This spreads the contention evenly, so that the usual random sleep used to manage contention (configured in dbclusterndb.gen as a period between CacheThreadSleepMin and CacheThreadSleepMax) is sufficient back-off for a request to reclaim a chunk of IP addresses. An added benefit is that you can easily identify attributes such as class of service by tying addresses to pools with well-defined ranges.

  • Provision enough addresses that you can set the CacheLowWater and CacheHighWater marks in dbclusterndb.gen very high, so that contended operations do not affect effective throughput (that is, a pause of several hundred milliseconds in reclaiming addresses causes no transaction errors). However, values over 20,000 may cause SBRC to take longer to shut down, because it must return the cached addresses to the available state. Also, a CacheHighWater value near the available size of the pool can cause one SBRC to cache all available addresses, leaving other SBRCs with none, which leads to incorrectly failed authentications.
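
The trade-offs in the last item can be checked on paper before editing dbclusterndb.gen. The following Python sketch is a planning aid only, not part of SBRC: the PoolPlan class and check_plan function are hypothetical names, and the worst-case rule (every SBRC instance filling its cache to CacheHighWater at once) is an assumed heuristic rather than documented behavior. Only the 20,000 threshold comes from the text above.

```python
from dataclasses import dataclass
from typing import List

# From the text above: values over 20,000 may lengthen SBRC shutdown.
SLOW_SHUTDOWN_THRESHOLD = 20_000


@dataclass
class PoolPlan:
    """Hypothetical planning record for one shared IP pool."""
    pool_size: int         # addresses available in the pool
    sbrc_instances: int    # SBRC front ends drawing from the pool
    cache_high_water: int  # intended CacheHighWater value


def check_plan(plan: PoolPlan) -> List[str]:
    """Return human-readable warnings for a proposed water-mark setting."""
    warnings = []

    if plan.cache_high_water > SLOW_SHUTDOWN_THRESHOLD:
        warnings.append(
            "CacheHighWater exceeds 20,000: shutdown may take longer "
            "while cached addresses are returned"
        )

    # Assumed heuristic: in the worst case every instance caches up to
    # CacheHighWater addresses at the same time.
    worst_case_cached = plan.cache_high_water * plan.sbrc_instances
    if worst_case_cached > plan.pool_size:
        warnings.append(
            "Combined caches could hold the whole pool: some SBRC "
            "instances may be left with no addresses"
        )

    return warnings


if __name__ == "__main__":
    for warning in check_plan(PoolPlan(30_000, 2, 25_000)):
        print(warning)
```

A plan with 100,000 addresses, four SBRC instances, and a CacheHighWater of 5,000 passes both checks, while the 30,000-address example in the script triggers both warnings.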