Scaling
Growth Without Pain
Apstra supports network-transparent distributed state access and management, and achieves parallel execution by running separate processes. Real-time execution is supported by an event-driven, asynchronous execution model combined with real-time scheduling. Efficiency and predictability come from compiling through C++ as an intermediate language to achieve machine-level efficiency.
There are three dimensions to scaling:
- State
- Processing
- Network Traffic
Scaling the State
The Juniper Apstra data store scales horizontally by adding more high-availability (HA) pairs of servers. The intent and telemetry data stores are separate, so each can scale independently as needed.
Scaling Processing
Juniper Apstra can launch multiple copies of each type of processing agent when required, and the copies share the processing load. More agents can be added by adding more servers to host them, and Apstra manages each agent’s lifecycle.
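As an illustration of how several copies of one agent type might share work, the sketch below assigns each device to one agent copy with a stable hash. This is an assumption made for the example: the source does not describe how Apstra actually divides load, and the names `assign`, `load_per_agent`, and the `switch-N` device IDs are hypothetical.

```python
import hashlib
from collections import Counter


def assign(device_id: str, agent_count: int) -> int:
    """Map a device to one of `agent_count` agent copies using a stable hash."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % agent_count


# Hypothetical fleet of 400 devices.
devices = [f"switch-{i}" for i in range(400)]


def load_per_agent(agent_count: int) -> Counter:
    """Count how many devices each agent copy would handle."""
    return Counter(assign(d, agent_count) for d in devices)


load_2 = load_per_agent(2)  # two agent copies
load_4 = load_per_agent(4)  # four agent copies after adding servers
```

Every device lands on exactly one copy, and the mapping is the same each time it is computed, so no shared lookup table is needed. A simple modulo scheme like this reassigns many devices whenever the number of copies changes; a real system might prefer consistent hashing or an explicit scheduler.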
Apstra’s state-based pub/sub architecture allows agents to react (provide application logic) to a well-defined subset of state. The whole intent is covered by separate agents, each responsible for a different subset of state. This means that when the intent or operational state changes, each agent reacts only to the incremental change, and the cost of that reaction is independent of the size of the whole state.
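The idea of reacting only to changes within a subscribed subset of state can be sketched with a toy in-memory store. This is a minimal illustration under stated assumptions, not Apstra's implementation: the `StateStore` class, the path-prefix subscription scheme, and the `intent/...` paths are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# A callback receives (path, old_value, new_value) for one changed entry.
Callback = Callable[[str, Any, Any], None]


class StateStore:
    """Toy state store: agents subscribe to a path prefix and see only deltas."""

    def __init__(self) -> None:
        self._state: Dict[str, Any] = {}
        self._subs: List[Tuple[str, Callback]] = []

    def subscribe(self, prefix: str, callback: Callback) -> None:
        self._subs.append((prefix, callback))

    def set(self, path: str, value: Any) -> None:
        old = self._state.get(path)
        if old == value:
            return  # nothing changed, so nothing is published
        self._state[path] = value
        # Publish only the single changed entry to matching subscribers.
        for prefix, callback in self._subs:
            if path.startswith(prefix):
                callback(path, old, value)


store = StateStore()
interface_events: list = []
bgp_events: list = []
store.subscribe("intent/interfaces/", lambda p, o, n: interface_events.append((p, o, n)))
store.subscribe("intent/bgp/", lambda p, o, n: bgp_events.append((p, o, n)))

# A large amount of unrelated state does not reach either agent.
for i in range(1000):
    store.set(f"intent/vlans/{i}", {"id": i})

# One change inside the interface agent's subset produces one event.
store.set("intent/interfaces/leaf1/eth0", {"mtu": 9000})
```

After this runs, the interface agent has received a single event describing the one entry that changed, and the BGP agent has received none, regardless of how many VLAN entries the store holds.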
Apstra employs the traditional approach to dealing with scale and its associated complexity: decomposition. The “everyone knows everything” approach doesn’t scale, so Apstra distributes the knowledge about the desired state and lets each agent determine how to reach that state. This eliminates the need for centralized decision making, which is why the Apstra Server is not considered a “controller”.
Apstra’s support for live graph queries means that clients such as the UI can ask for exactly what they want and get exactly what they need and nothing more, allowing granular control over the amount of data fetched from the back end.
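To make the "ask for exactly what you want" point concrete, here is a small sketch of a query that filters graph nodes and returns only the requested fields. It does not use Apstra's query language or API; the `query` function, the node attributes, and the sample data are hypothetical, and the "live" (continuously updated) aspect is not modeled.

```python
from typing import Any, Dict, List

# Hypothetical graph nodes; the field names are invented for this example.
NODES: List[Dict[str, Any]] = [
    {"id": "n1", "type": "system", "role": "leaf", "hostname": "leaf1", "asn": 65001},
    {"id": "n2", "type": "system", "role": "spine", "hostname": "spine1", "asn": 65100},
    {"id": "n3", "type": "interface", "name": "eth0", "speed": "10G"},
]


def query(nodes: List[Dict[str, Any]], match: Dict[str, Any], fields: List[str]) -> List[Dict[str, Any]]:
    """Return only the requested fields of nodes whose attributes equal `match`."""
    return [
        {f: node[f] for f in fields if f in node}
        for node in nodes
        if all(node.get(k) == v for k, v in match.items())
    ]


# The client asks for leaf hostnames only; ASNs, IDs, and interfaces stay on the server.
leaf_names = query(NODES, {"type": "system", "role": "leaf"}, ["hostname"])
```

The result contains one small record per matching node, so the amount of data returned tracks what the client asked for, not the size of the graph.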
Scaling Network Traffic
The third dimension is scaling network traffic. Communication between the agents and the data store uses an optimized binary channel, which significantly reduces the amount of traffic compared with text-based protocols.
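The size difference between binary and text encodings can be seen with a small comparison. This is only an illustration: the source does not describe Apstra's wire format, so the record fields and the fixed layout below are assumptions chosen for the example.

```python
import json
import struct

# Hypothetical telemetry sample: an interface index plus two byte counters.
sample = {"if_index": 17, "rx_bytes": 123456789012, "tx_bytes": 987654321098}

# Assumed layout: unsigned 32-bit index and two unsigned 64-bit counters,
# in network byte order with no padding (4 + 8 + 8 = 20 bytes).
LAYOUT = struct.Struct("!IQQ")

binary = LAYOUT.pack(sample["if_index"], sample["rx_bytes"], sample["tx_bytes"])
text = json.dumps(sample).encode("utf-8")

# The binary form decodes back to the original values.
decoded = LAYOUT.unpack(binary)

print(len(binary), len(text))
```

The fixed binary layout carries no field names or digit characters, so it is smaller than the JSON text for the same record. The saving is multiplied across every message exchanged between agents and the data store.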
Fault tolerance is achieved in two ways. The Apstra application runs as multiple processes, possibly on separate hardware devices connected by a network. State is also separated from processing, with support for replicated state and fast state recovery.
Apstra has been tested in production deployments of 400+ switches, and internal testing has covered network fabrics of up to 1600 virtual devices. Physical fabric size limits depend on vendor form factor and software limitations.