An important aspect of any network management system is the ability to monitor, control, and plan the network infrastructure. As the operator network grows in size, heterogeneity, and complexity, effective management and planning for such a network become more important. The main challenges in this area include:
Identifying the data to be collected
Defining the measurement strategy and interpreting the collected data
Publishing threshold events in the network
Presenting the data in a way that helps in analyzing the network's performance
The Need and Benefits of Performance Manager
Before examining the capabilities and benefits of Performance Manager, consider what an operator might expect from any performance manager tool:
A source that provides holistic insight into network performance.
Data that captures the essential network information for planning, optimization, and operation.
Data that helps ensure a high-quality network at all times.
A tool that can proactively identify performance degradations.
Edge Services Director uses the Juniper Networks Device Management Interface (DMI) to directly connect to and discover devices. DMI is an extension to the NETCONF network management protocol. Performance Manager is designed to address the aforementioned requirements.
The PM functionality takes care of the following aspects:
The features that Performance Manager presents so that the operator can ensure service availability and verify or monitor individual services and the performance of the service network.
The mechanisms that Performance Manager uses to collect the different metrics and interpret them to deliver these features.
The following table describes the operator expectations and the corresponding features or screens available in Edge Services Director:
Expectation: Ensure a high-quality network
The operator can view the near-real-time health and performance of the SDG network.
The operator can monitor the health and performance of individual services and service instances.
The operator can monitor the health and performance of individual chassis parameters (for example, the packet path through an interface and port).
Expectation: Maximize the utilization of network investments
The operator can define event thresholds on different metrics, which generate alarms when a threshold limit is breached.
Analyze the trend or historical performance of an SDG or its logical and physical components when the real-time data shows a fault or performance degradation.
View the historical or trend performance of an individual service instance.
Compare service instance performance.
Expectation: Improve the efficiency of operations
View the historical or trend performance of individual hardware components, such as traffic flowing through an interface and port.
Analyze faults and syslog messages for a historical time period in which performance degradation was observed.
Generate consolidated PM KPI reports periodically for different services and chassis components, for easy information sharing across the operator's organization.
Type of Data Collected
It is important to identify and define the Performance Manager KPIs that help the operator measure the performance and operational status of the services running in their SDG network. Apart from services, the operator might also be interested in KPIs for the different chassis components that enable service execution in the SDG network. Different metrics, such as real service instance KPIs and HA Network KPIs, are collected. This collection enables the PM data collector and the PM data aggregator to operate effectively.
The SNMP collector that gathers these metrics supports the following capabilities:
SNMPv3 with MD5 and SHA authentication and DES, 3DES, AES 128, AES 192, and AES 256 privacy.
Pluggable Message Processing Models with implementations for MPv1, MPv2c, and MPv3.
Pluggable transport mappings. UDP, TCP, and TLS are supported out of the box.
Synchronous and asynchronous requests.
Logging based on Log4J.
Efficient row-based asynchronous table retrieval with GETBULK.
Retrieval of scalar counters with GET requests.
The following are the different sets of KPIs supported:
Service Instance KPIs: These KPIs are collected at the service instance level, for example for ADC instances and TLB instances. Some KPIs in this list provide the near-real-time performance of the service instance. The following are the KPIs supported for the different services, with the collection method for each:
ADC KPIs (collection method: SNMP/DMI)
VIP status: SNMP
Real server status: SNMP
Connection-table count: SNMP
CPU status for ADC control for last 64 seconds: SNMP and DMI
CPU status for ADC data cores for last 64 seconds/NPU: SNMP and DMI
CPU status for each ADC data core for last 64 seconds/NPU: SNMP and DMI
Allocation failures per NPU: SNMP
Allocation failures per DP: SNMP
ADC ms interface status: SNMP
ADC egress interface: SNMP
TLB KPIs (collection method: SNMP/DMI)
TLB routing instance composite next-hop index status: SNMP
Real server status: SNMP
Net-monitored and overall CPU utilization of TLB PIC: SNMP and DMI
TLB ms interface status: SNMP
TLB egress interface: SNMP
CGNAT KPIs (collection method: SNMP/DMI)
CPU status: SNMP and DMI
Packet drop status [delta for every 3 polling]: SNMP
Memory status: SNMP
NAT pool status [Utilization = Ports in use*100/Total configured ports]: SNMP
Port blocks in use (if configured): SNMP
CGNAT service PIC status: SNMP
CGNAT stateful sync CPU utilization: SNMP
Stateful Firewall KPIs (collection method: SNMP/DMI)
CPU status: SNMP and DMI
Packet drop status [delta for every 3 polling]: SNMP
Memory status: SNMP and DMI
SFW service PIC status: SNMP
Chassis KPIs: These KPIs are collected for all the interfaces, ports, and other physical components that the SDG service instances use to perform their tasks, such as ingress and egress on a service PIC, or inPacket and outPacket on an AE interface. Some of these KPIs are defined to provide the near-real-time performance of the physical component.
HA Network KPIs: These KPIs apply only when the SDG setup is an HA deployment. They indicate the performance and health of the HA deployment. Some of these KPIs provide the near-real-time performance of the HA setup (the master and the backup SDG). The following are the KPIs identified for the master and backup SDGs, with the collection method for each:
HA Master
SDG status: SNMP and DMI
BGP advertising AS path information for GI-PVT: SNMP
VRRP status: SNMP
CGNAT stateful sync status: SNMP
CGNAT default route status in GI-PVT: SNMP
ADC VIP route status in radware routing instance route table: SNMP
TLB routing instance default route status: SNMP
HA Backup
SDG status: SNMP and DMI
BGP advertising AS path information for GI-PVT: SNMP
VRRP status: SNMP and DMI
CGNAT stateful flows HA status: SNMP
CGNAT default route status in GI-PVT: SNMP
ADC VIP route status in radware routing instance route table: SNMP
TLB routing instance default route status: SNMP
KPI Threshold Definition: An operator can configure the KPIs for which thresholding is enabled. Using a Monitoring Profile, the operator can associate the thresholding KPIs with any SDG in the network. A Monitoring Profile is a predefined list of KPIs with a threshold value for each KPI. SDG NM Fault Manager, which is an OpenNMS solution integrated with Junos Space, defines different Unique Event Identifiers (UEIs). PM correlates these UEIs with the existing list of KPIs for which threshold definitions exist. When any of these threshold KPIs is breached, SDG NM Performance Manager sends an event trap to the OpenNMS FM system.
Throughput KPIs: The KPIs that measure the throughput of any service instance (that is, ADC, CGNAT, SFW, and TLB) running on the SDG are as follows:
TLB: Throughput on the IRB link where the firewall filter exists
ADC: Throughput on the associated ms interface
CGNAT: Ingress and egress on the associated sp interface
SFW: Ingress and egress on the associated sp interface
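The NAT pool utilization formula and the packet drop delta noted in the CGNAT KPIs, together with the threshold check described under KPI Threshold Definition, can be sketched as follows. This is a minimal illustration and not the Performance Manager implementation: the `ThresholdRule` class, the KPI names, the UEI strings, and the reading of "delta for every 3 polling" as the counter growth across three polling intervals are all assumptions.

```python
from dataclasses import dataclass


def nat_pool_utilization(ports_in_use: int, total_configured_ports: int) -> float:
    """Utilization = ports in use * 100 / total configured ports."""
    if total_configured_ports <= 0:
        raise ValueError("total_configured_ports must be positive")
    return ports_in_use * 100 / total_configured_ports


def packet_drop_delta(drop_counter_samples: list[int], polls: int = 3) -> int:
    """Growth of a cumulative drop counter across the last `polls` intervals.

    Assumption: one sample per polling cycle, oldest first.
    """
    if len(drop_counter_samples) < polls + 1:
        raise ValueError("need polls + 1 samples to span that many intervals")
    return drop_counter_samples[-1] - drop_counter_samples[-1 - polls]


@dataclass(frozen=True)
class ThresholdRule:
    """One entry of a hypothetical monitoring profile."""
    kpi: str      # KPI name (hypothetical)
    limit: float  # threshold value for the KPI
    uei: str      # event identifier raised on breach (hypothetical)


def breached_ueis(profile: list[ThresholdRule], measurements: dict[str, float]) -> list[str]:
    """Return the UEIs of every rule whose measured KPI exceeds its limit."""
    return [
        rule.uei
        for rule in profile
        if rule.kpi in measurements and measurements[rule.kpi] > rule.limit
    ]


profile = [
    ThresholdRule("nat_pool_utilization", 70.0, "uei.example/natPoolHigh"),
    ThresholdRule("cpu_status", 90.0, "uei.example/cpuHigh"),
]
measurements = {
    "nat_pool_utilization": nat_pool_utilization(4500, 6000),
    "cpu_status": 40.0,
}
events = breached_ueis(profile, measurements)
```

In this sketch, each returned identifier stands for one event that would be forwarded to the fault management system.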
Method of Collection of Data
The core capability of a PM system is a robust and scalable collection mechanism. Performance Manager supports multiprotocol data collection using DMI and SNMP. The DMI capabilities are extended from the Junos Space Platform, and the SNMP collector is a proprietary implementation. The current collection support in Performance Manager is as follows:
DMI-based collection: An SDG network deployed on Juniper Networks MX Series routers supports management interaction over a proprietary communication channel called the Device Management Interface (DMI), which is an implementation of the NETCONF protocol (RFC 6241). Edge Services Director uses the DMI capabilities of the Junos Space Network Application Platform and adopts an XML remote procedure call (RPC) collection mechanism. The following are the characteristics of this collector:
Data is collected for the KPIs marked above for real-time monitoring.
DMI is an XML RPC-based communication channel that allows the management application to run CLI commands and obtain near-real-time performance data.
DMI-based collection is supported only for the features that present near-real-time data to the operator, that is, the Monitor SDG and Monitor SDG Service UIs.
SDG NM also has an SNMP-based collection available for these KPIs. Therefore, the data collected over the DMI channel is not persisted, because the trends for these KPIs can be read from the SNMP-collected data store.
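The XML RPC exchange that a DMI-style collector relies on can be sketched as follows. This is a minimal illustration using only the Python standard library and no real device connection. The operation name `get-kpi-sample`, the reply element names, and the sample values are hypothetical; only the NETCONF base namespace and the `rpc`/`rpc-reply` envelope come from RFC 6241.

```python
import xml.etree.ElementTree as ET

# Base NETCONF namespace defined in RFC 6241.
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_rpc(operation: str, message_id: int) -> str:
    """Wrap an operation element in a NETCONF <rpc> envelope."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": str(message_id)})
    ET.SubElement(rpc, operation)
    return ET.tostring(rpc, encoding="unicode")


def parse_kpis(reply_xml: str, kpi_tags: list[str]) -> dict[str, str]:
    """Return the text of the first element found for each requested tag.

    Namespaces are ignored so that only the local element name is compared.
    """
    root = ET.fromstring(reply_xml.strip())
    found: dict[str, str] = {}
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in kpi_tags and local_name not in found:
            found[local_name] = (element.text or "").strip()
    return found


# Hypothetical reply; a real device reply would use its own element names.
SAMPLE_REPLY = """
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101">
  <kpi-sample>
    <cpu-utilization>37</cpu-utilization>
    <memory-utilization>52</memory-utilization>
  </kpi-sample>
</rpc-reply>
"""

request = build_rpc("get-kpi-sample", 101)
kpis = parse_kpis(SAMPLE_REPLY, ["cpu-utilization", "memory-utilization"])
```

Every counter read this way requires a request to be built and a reply document to be parsed, so the per-counter cost is easy to see even in this small sketch.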
SNMP-based collection: Performance Manager collects all SDG PM counters over SNMP. The reasons for not extending DMI data collection, and for implementing a proprietary SNMP collector instead, are:
The DMI channel is an SSH channel over which the two hosts communicate through XML RPC. It is suitable for collecting only a small number of counters. As the number of SDG devices in the network increases, collecting over an SSH channel becomes time-consuming.
Building XML RPC messages and parsing the replies for a large number of counters on a large network is processor-intensive. This degrades overall data collection and presentation, and affects the real-time aspect of performance monitoring.
Junos Space does not support the SNMP based collection for the Utility MIB object IDs.
Performance Manager therefore provides its own SNMP-based collection for SDG NM. The advantages this collector offers are the SNMP capabilities listed under Type of Data Collected.
Method of Measurement of Data
The PM counters polled by the SDG NM SNMP collector are aggregated and presented to the operator. For PM, all counters are polled by the SDG NM SNMP collector; the device does not send PM data to SDG NM. The following sections describe how the data collected by the SDG NM SNMP collector is stored in SDG NM.
Dashboard and Monitoring View
A few counters are required to display the daily performance trend of the SDG and the SDG network in the Dashboard and Monitoring views. These counters are stored in the Junos Space MySQL database. Retrieval from MySQL is fast because only a small number of KPIs are refreshed at frequent intervals to show the current day's performance. The graphical displays in the Dashboard and Monitoring views are more dynamic, with monitoring rules in place; therefore, the data needs to be stored so that it can be queried and the rules applied.
Performance Manager View
Only the trend or past performance data are displayed, which implies a large number of KPIs and a longer duration of data.
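Aggregating polled counters into fixed-width trend intervals, such as the 15-minute granularity used for trend data, can be sketched as follows. The source does not state the aggregation function, so the use of an average per bucket is an assumption, as are the function name and the sample data format of (Unix timestamp, value) pairs.

```python
from collections import defaultdict
from statistics import mean

BUCKET_SECONDS = 15 * 60  # 15-minute trend granularity


def aggregate(samples: list[tuple[int, float]],
              bucket_seconds: int = BUCKET_SECONDS) -> dict[int, float]:
    """Average (timestamp, value) samples per fixed-width time bucket.

    The result is keyed by the bucket start time, in ascending order.
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Five polls at irregular offsets, in seconds from an arbitrary origin.
polled = [(0, 10.0), (300, 20.0), (899, 30.0), (900, 40.0), (1200, 60.0)]
trend = aggregate(polled)
```

Storing one aggregated value per bucket instead of every poll keeps the data volume bounded for long retention periods.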
Performance Manager Functionalities
Performance Manager provides the following functionalities:
View trend data for periods from the last 1 day to the last 365 days at 15-minute granularity. (These numbers are proposed considering the current support for SNMP polling on the MX device and the storage capacity available on the Junos Space device.)
View Service performance for each SDG.
Compare different service instance metrics across SDGs.
View trends for HA metrics for Master and Standby.
View trends for the Chassis level metrics.
Switch to FM and Syslog view for any specific time period where any peak in data is observed.
Switch to the PM view from the Monitor SDG/Chassis view and the Monitor SDG Services view.
View the Top 3 Talkers of the day (based on the highest number of threshold alarms for the day).
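The Top 3 Talkers ranking, which is based on the highest number of threshold alarms for the day, can be sketched as a simple count per alarm source. The function name and the representation of the day's alarms as a list of source names are assumptions made for illustration.

```python
from collections import Counter


def top_talkers(alarm_sources: list[str], count: int = 3) -> list[tuple[str, int]]:
    """Return the `count` sources with the most threshold alarms, highest first."""
    return Counter(alarm_sources).most_common(count)


# One entry per threshold alarm raised during the day (hypothetical names).
todays_alarms = ["sdg-1"] * 4 + ["sdg-2"] * 3 + ["sdg-3"] * 2 + ["sdg-4"]
ranking = top_talkers(todays_alarms)
```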
There is no separate landing page for the PM view. On selection, the PM view launches as described here, with the first SDG selected by default in the navigation tree.
This view is split into three parts.
The first pane is a navigation tree showing all the deployed SDGs. Each root node is the name of an SDG, and its child nodes are Chassis, HA (this node appears only for SDGs that are deployed as HA), and the installed services: ADC, TLB, CGNAT, and SFW. Each service is a root node for its service instances.
The middle pane is the graphing area, which displays the trend graphs for the selected KPIs. For an SDG deployed as HA, the view changes slightly when the Chassis or HA node is selected in the navigation tree: this pane is shown as a tabbed view with one tab each for the master and the standby. For all other selections in the navigation tree, it is shown as a single tab. Other aspects of this pane are View Service Instance Metrics, View Chassis Metrics, and Compare Metrics.
The last pane is the KPI view, which lists the KPIs for the node selected in the navigation tree in the first pane. It presents different actions on the selected KPIs, such as Graph, Graph All, Select All, and Select None. These actions are further described below for individual views.
Performance Manager View After a Context-Switch from the Monitoring Page
If you perform a context switch to the PM view from the Monitoring view, or type in an SDG component by using the search utility, the Search text box displays the component that was selected in the Monitoring view before the context switch.
The navigation tree is filtered based on the search criteria. With a context switch, the navigation tree is filtered to display only the selected component for which the context switch happened and the same node is selected.
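The navigation tree layout and its filtering by search text can be sketched as follows. This is an illustration only: the `Node` class, the function names, the case-insensitive substring match, and the sample SDG and instance names are assumptions, not the actual UI logic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def build_sdg_node(sdg_name: str, services: dict[str, list[str]], is_ha: bool) -> Node:
    """Build one SDG root: Chassis, HA (HA deployments only), then services."""
    children = [Node("Chassis")]
    if is_ha:
        children.append(Node("HA"))
    for service, instances in services.items():
        children.append(Node(service, [Node(name) for name in instances]))
    return Node(sdg_name, children)


def filter_tree(node: Node, search_text: str) -> Optional[Node]:
    """Return a pruned copy that keeps matching nodes and their ancestors.

    A matching node keeps its whole subtree; None means nothing matched.
    """
    if search_text.lower() in node.name.lower():
        return node
    kept = []
    for child in node.children:
        filtered_child = filter_tree(child, search_text)
        if filtered_child is not None:
            kept.append(filtered_child)
    return Node(node.name, kept) if kept else None


tree = build_sdg_node(
    "sdg-east",
    {"ADC": ["adc-1"], "TLB": ["tlb-1"], "CGNAT": [], "SFW": []},
    is_ha=True,
)
filtered = filter_tree(tree, "adc-1")
```

After filtering on `adc-1`, only the path from the SDG root through the ADC service to that instance remains, which mirrors a tree reduced to the component selected before the context switch.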