Service Monitoring

 

Ceph Monitoring

Ceph is a unified, distributed storage system that provides object storage and block storage. AppFormix monitors Ceph performance, availability, and usage, with both charts and alarms.

In addition, AppFormix Agent can be installed on the Ceph object storage daemon (OSD) and monitor hosts, for real-time health and performance monitoring of the storage hosts that power a Ceph storage cluster.

Ceph Service Monitoring

From the context menu, select Services > Ceph. The Ceph service monitoring page displays a summary of the current usage of a Ceph cluster, including total cluster capacity, used capacity, and the number of OSDs, pools, and objects. The Health Status table displays errors and warnings for the Ceph cluster. Details about the usage of each storage pool are shown in table and chart views.

Figure 1 shows the Ceph service monitoring page and storage pool usage details in a table.

Figure 1: Ceph Service Summary of Current Usage of Ceph Cluster

Figure 2 shows the Ceph service monitoring page and storage pool usage details in a chart.

Figure 2: Ceph Service Summary of Storage Pool Usage in Chart View

Monitoring Ceph OSD and Monitor Nodes

With AppFormix Agent installed on the Ceph storage hosts, details are available about each OSD and Monitor node in the cluster. Using the context menu, select Services > Ceph > Nodes. Each host in the list has a tag of ceph-osd or ceph-monitor. When a host with a ceph-osd tag is selected, a summary of host performance metrics is shown, as well as the health and status of each OSD on the host. See Figure 3 for an example summary.

Figure 3: Performance Metrics, Health, and Status for Each OSD on Host

All AppFormix host monitoring functionality is available for the storage host, including Charts and Alarms. Navigate to Charts and Alarms in the left menu, as shown in Figure 4.

Figure 4: Navigating to Host Chart View from Monitoring Nodes

Service Alarms

Alarms can be configured to monitor the Ceph cluster metrics at the cluster, pool, or host level.

To configure an alarm for cluster-wide and per-pool metrics, select Alarms in the left menu. Choose the Service Alarms module, and select ceph from the Service drop-down list. Ceph service alarms can be created to monitor a cluster or a pool. With cluster scope, an alarm can be configured for cluster-wide metrics, such as the cluster storage usage. With pool scope, an alarm can be configured to monitor per-pool metrics for one or multiple pools.

To configure an alarm for a Ceph storage host, select the Alarms module in the Alarms pane. An alarm can be configured for one or multiple Ceph storage hosts. See Configuring Alarms in Alarms for details.

As with all alarms in AppFormix, Notifications can be configured for Ceph alarms. Figure 5 shows the alarm state for the Ceph cluster metrics.

Figure 5: Alarm State for Ceph Cluster Metrics

Configuration

See Service Monitoring for steps to configure AppFormix to monitor a Ceph cluster.

Contrail Monitoring

Contrail Networking is a software-defined networking (SDN) platform based on the open-source network virtualization project, OpenContrail. The Juniper platform automates and orchestrates the creation of highly scalable virtual networks.

AppFormix provides monitoring and orchestration for the OpenContrail Service. See the Service Monitoring instructions for how to configure Contrail monitoring.

Dashboard

The AppFormix service monitoring Dashboard for a Contrail cluster displays the overall state of the cluster and its components.

AppFormix provides real-time liveness for each Contrail service in the four Contrail service groups (Analytics Nodes, Config Nodes, Controller Nodes, and DB Nodes) running on all the hosts configured during the Contrail installation. Figure 6 shows real-time liveness for each Contrail service.

Figure 6: Contrail Real-Time Liveness

AppFormix also provides a historical liveness view of each Contrail service. Figure 7 shows a historical liveness view.

Figure 7: Contrail Historical Liveness

In addition, any alarm generated by the Contrail Service can be accessed on the AppFormix Dashboard. Figure 8 shows examples of Contrail service alarms.

Figure 8: Contrail Service Alarms

AppFormix monitors the real-time status of every element of the Contrail cluster. You can select an element from the Group drop-down list for the Contrail Service. For example, if the Analytics Nodes service group is selected, the Dashboard displays each service on every host that is configured for that particular service group. Liveness statistics and basic metrics are also available for each service in this view. Figure 9 shows statistics and metrics for the Contrail analytics nodes.

Figure 9: Contrail Service Analytics Nodes Statistics

For Contrail Config Nodes, AppFormix enables a Peer view for XMPP and BGP peers. The Peer view provides receive (rx) and transmit (tx) reachability statistics, as shown in Figure 10.

Figure 10: Contrail Service XMPP Peers

Service Alarms

An alarm can be configured for any of the Contrail metrics collected. In the Alarm pane, select the Service Alarms module. Then select Contrail from the Service drop-down list. Notifications can also be configured for Contrail service alarms. Figure 11 shows the Alarm pane for configuring Contrail service alarms. See Alarms and Notifications for details.

Figure 11: Alarm Pane for Configuring Contrail Service Alarms

Flow Monitoring with Contrail vRouter

When the Contrail vRouter is installed on a compute node, AppFormix provides debug mode functionality in the Network Topology panel. In this mode, the top flows on each compute node are available for visualization with details on flow tuples, packets, and bytes. Figure 12 shows the flow monitoring details and visualization.

Figure 12: Flow Monitoring with Contrail vRouter

In debug mode, you can analyze details on the top flows on any compute node that is part of the Network Topology view. Figure 13 shows the Contrail flow details.

Figure 13: Contrail Flow Monitoring Details

Configuration

For AppFormix to monitor Contrail metrics, the AppFormix Platform must be configured with the administrator credentials for the OpenStack environment with Contrail. In addition, the AppFormix Platform host must be able to open connections to the analytics API and configuration API, for example, on ports 8081 and 8082 of the Contrail controller.

Contrail cluster connection details can be configured in the AppFormix Dashboard or in the Ansible playbooks.

To configure Contrail cluster connection details from the Dashboard:

  1. Select Settings > Service Settings. Then select the Contrail tab, as shown in Figure 14.

    Figure 14: Configure Contrail Cluster Connection Details
  2. Click Add Cluster. Enter the cluster name, analytics URL, and configuration URL. The URLs should specify only the protocol, the address, and, optionally, the port.

    For example, http://contrail.example.com:8081 for the analytics URL and http://contrail.example.com:8082 for the configuration URL.

  3. Click Setup. On success, a Submission Successful message appears in the Dashboard.

For configuration using Ansible playbooks, see Service Monitoring for steps to configure AppFormix to monitor a Contrail cluster.
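
As a pre-check before entering the URLs in the Dashboard, the following Python sketch tests whether a URL contains only the protocol, address, and optional port, and whether the current host can open a TCP connection to it. The function names and the fallback to the protocol's well-known port are assumptions made for illustration; they are not part of AppFormix or Contrail.

```python
import socket
from urllib.parse import urlsplit


def is_base_url(url):
    """Return True if url has only a protocol, an address, and optionally a port."""
    parts = urlsplit(url)
    try:
        parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return (
        parts.scheme in ("http", "https")
        and bool(parts.hostname)
        and parts.path == ""
        and not parts.query
        and not parts.fragment
        and parts.username is None
    )


def can_connect(url, timeout=3.0):
    """Try to open a TCP connection to the host and port named in url."""
    parts = urlsplit(url)
    # Assumption: fall back to the protocol's well-known port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running can_connect from the AppFormix Platform host against the analytics and configuration URLs gives a quick indication of whether a firewall or routing problem would block monitoring.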

MySQL Monitoring

A MySQL database is integral to the operation of OpenStack infrastructure services. Metrics for MySQL performance are available in real-time charts and alarms. Multiple MySQL clusters can be configured for monitoring.

Resource Availability

The availability of MySQL nodes for each of the configured MySQL clusters is recorded periodically. You can view both the current status and the historical status over a specified period of time by selecting All Services > MySQL from the context menu at the top, and then selecting Dashboard in the left pane. Figure 15 shows the historical resource availability for the MySQL nodes.

Figure 15: MySQL Nodes Historical Availability

Figure 16 shows the real-time resource availability for the MySQL nodes.

Figure 16: MySQL Nodes Real-Time Availability

Dashboard

Each MySQL cluster has a dashboard displaying real-time usage metrics for each of its nodes, as shown in Figure 17.

Figure 17: Real-Time Usage Metrics for Cluster Nodes

Real-Time Charts

From the context menu, select All Services > MySQL. Click the Charts icon from the left navigation pane. Figure 18 shows MySQL performance metric charts.

Figure 18: MySQL Performance Metric Charts

Service Alarms

An alarm can be configured for any of the MySQL metrics collected. In the Alarm pane, select the Service Alarms module. Then select mysql from the Service drop-down list. MySQL alarms can be created for one or more MySQL nodes. Notifications can also be configured for MySQL alarms. Figure 19 shows the Alarm Input pane for MySQL alarm configuration.

Figure 19: Alarm Input Pane for MySQL

Configuration

For AppFormix to monitor MySQL metrics, a MySQL user with remote, read-only access must exist. This topic creates a new user with read-only access to the database. Alternatively, an existing user account can be used.

To configure MySQL monitoring:

  1. Create a read-only user account 'appformix' that can access the MySQL database from any host:

    Change 'mypassword' to a strong password. Optionally, you may restrict the 'appformix' account to only connect from a specific IP address or hostname by replacing '%' with the host on which AppFormix Controller runs.

  2. Next, configure the MySQL connection details in AppFormix. From the Settings menu, select Service Settings. Then, select the MySQL tab.
  3. Enter the host and port on which MySQL runs. The default port for MySQL is 3306.
  4. Enter the username and password from Step 1. Finally, click the Setup button. On success, the button changes to Submitted. Figure 20 shows MySQL connection and credential settings.
    Figure 20: MySQL Connection and Credential Settings
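
The account described in Step 1 can be created with standard MySQL statements. The Python sketch below only builds those statements as text, so that an administrator can review them and run them in a MySQL client; it does not connect to a database. The function name is hypothetical, and treating "read-only" as a SELECT grant on all databases is an assumption, not a statement of what AppFormix requires.

```python
def read_only_user_statements(user="appformix", host="%", password="mypassword"):
    """Build SQL statements that create a MySQL account limited to SELECT."""
    for value in (user, host, password):
        # Keep the sketch simple: refuse values that would need SQL escaping.
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported here")
    account = f"'{user}'@'{host}'"
    return [
        f"CREATE USER {account} IDENTIFIED BY '{password}';",
        # Assumption: read-only access is expressed as SELECT on all databases.
        f"GRANT SELECT ON *.* TO {account};",
    ]
```

Passing host="10.0.0.5" (a placeholder address) instead of the default '%' restricts the account to connections from that single host, as suggested in Step 1.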

RabbitMQ Monitoring

OpenStack depends on RabbitMQ to deliver messages between services. AppFormix Service Monitoring can be used to monitor RabbitMQ metrics through real-time charts. Service alarms can also be configured for these metrics.

Resource Availability

The connectivity of nodes for each of the configured RabbitMQ clusters is recorded periodically. You can view both the current status and the historical status over a specified period of time by selecting Services > RabbitMQ from the context menu at the top, and then selecting Dashboard in the left pane.

Dashboard

The Dashboard also provides detailed metrics for a single RabbitMQ cluster, as shown in Figure 21. Select Dashboard in the left pane, then Services > RabbitMQ in the top context menu, and then select a Rabbit Cluster by name.

Figure 21: Real-Time Usage Metrics for RabbitMQ Cluster

The counters in the top pane display the number of active channels, connections, consumers, exchanges, and queues. Below, tables display statistics about message rates across the cluster, and per-node resource consumption.

Real-Time Charts

For a real-time view of RabbitMQ metrics, select All Services > RabbitMQ from the context menu. Next, click the Charts icon in the left pane. Figure 22 shows RabbitMQ real-time metric charts.

Figure 22: RabbitMQ Real-Time Metric Charts

Service Alarms

To configure an alarm to monitor RabbitMQ metrics, select Alarms to open the Alarm pane. See Alarms. Select Service_Alarms for the module and rabbit for the service. An alarm can be configured for a metric on a per-cluster, per-node, or per-queue basis. Select the appropriate metric scope, and then choose a metric to monitor. As with other alarms, you can optionally configure Notifications in the Advanced settings. Figure 23 shows the RabbitMQ alarm configuration pane.

Figure 23: RabbitMQ Alarm Configuration

Configuration

For AppFormix to be able to collect metrics from RabbitMQ, the RabbitMQ management plug-in must be enabled, and AppFormix must be configured with user credentials to collect RabbitMQ metrics.

To configure RabbitMQ monitoring:

  1. Enable the RabbitMQ plugin by issuing the following commands on the host that runs RabbitMQ:
  2. AppFormix requires RabbitMQ user credentials with privileges to read the metrics. You can use an existing RabbitMQ user with an administrator or monitoring role, or create a new user account. To create a user account with “monitoring” privileges, issue the following commands on the host that runs RabbitMQ, granting the account the permission patterns "" "" ".*" (configure, write, and read, respectively):

    Replace the sample mypassword with a strong password.

  3. Verify the settings by opening http://<rabbit-host>:15672/ in a Web browser, and log in with the RabbitMQ user credentials.
  4. Configure AppFormix with the details of the RabbitMQ cluster. Click Settings from the Dashboard. In the Services Settings page, select the RabbitMQ tab.

    Enter the Rabbit Cluster URL from Step 3. Enter the username and password from Step 2. Click Setup. On success, the button changes to Submitted. Figure 24 shows the RabbitMQ URL and credential settings.

    Figure 24: RabbitMQ URL and Credential Settings
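
Steps 1 through 3 correspond to the standard RabbitMQ command-line tools and to the HTTP API of the management plug-in. The Python sketch below builds the argument lists for those commands and a request for the management API overview endpoint, without executing anything. The account name appformix, the default virtual host /, and the function names are assumptions for illustration; adjust them to the deployment.

```python
import base64
import urllib.request


def rabbitmq_setup_commands(user="appformix", password="mypassword", vhost="/"):
    """Return the CLI commands (as argument lists) for the plug-in and the user."""
    return [
        ["rabbitmq-plugins", "enable", "rabbitmq_management"],
        ["rabbitmqctl", "add_user", user, password],
        ["rabbitmqctl", "set_user_tags", user, "monitoring"],
        # Permission patterns: no configure, no write, read everything.
        ["rabbitmqctl", "set_permissions", "-p", vhost, user, "", "", ".*"],
    ]


def overview_request(host, user, password, port=15672):
    """Build an authenticated GET request for the management API overview."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(f"http://{host}:{port}/api/overview")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Each argument list can be passed to subprocess.run on the RabbitMQ host, and the request can be sent with urllib.request.urlopen. A successful response should indicate that the management plug-in is enabled and that the credentials are accepted.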

OpenStack Services Monitoring

AppFormix monitors Keystone, Nova, and Neutron services that power the OpenStack cloud management system. AppFormix performs status checks for processes that implement the services on both controller and compute hosts.

AppFormix monitors the overall connectivity to each API and the status of components that comprise the service.

  • Overall connectivity is monitored by issuing an API call to get the component service list in the case of Nova and Keystone, or the agent list in the case of Neutron. The status of this check is reflected in default_openstack_cluster_status for each of Keystone, Nova, and Neutron. If the API call is successful, the default_openstack_cluster_status is Good.

  • Latency of the API is recorded. An alarm can be configured for the API latency metric.

  • Each of the above API calls returns a list of sub-services. AppFormix examines the statuses of these individual sub-services. AppFormix displays the health of each sub-service in the list.

For example, if the nova-api sub-service is up and responds to the API call successfully, then the Health of the default_openstack_cluster_status for Nova is Good, even if an individual sub-service of Nova has failed. As another example, suppose nova-scheduler is not running. If the API call to list the status of Nova sub-services succeeds, then the default_openstack_cluster_status is Good, but the Health of nova-scheduler is Bad.
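
The relationship between the cluster status and sub-service health can be modeled in a few lines. The Python sketch below illustrates the rules described in this section and is not AppFormix code; the function name, the input format, and the "Bad" value reported for a failed API call are assumptions.

```python
def evaluate_service(api_call_ok, sub_service_up):
    """Derive the cluster status and per-sub-service health for one service.

    api_call_ok: True if the list API call (service list or agent list) succeeded.
    sub_service_up: mapping of sub-service name to True (running) or False.
    """
    # The cluster status reflects only whether the API call succeeded.
    cluster_status = "Good" if api_call_ok else "Bad"
    # Sub-service health is known only when the API call returned a list.
    health = {}
    if api_call_ok:
        health = {
            name: "Good" if up else "Bad"
            for name, up in sub_service_up.items()
        }
    return cluster_status, health
```

Calling evaluate_service(True, {"nova-api": True, "nova-scheduler": False}) reproduces the second example: the cluster status is Good while the health of nova-scheduler is Bad.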

Dashboard

You can view both the current status and the historical status over a specified period of time in the Dashboard. Select the name of a service from Services in the context menu at the top, and select Dashboard from the left pane.

Figure 25 shows the OpenStack Keystone nodes real-time availability.

Figure 25: OpenStack Keystone Nodes Real-Time Availability

Figure 26 shows the OpenStack Keystone nodes historical availability.

Figure 26: OpenStack Keystone Nodes Historical Availability

Figure 27 shows the OpenStack Nova nodes real-time availability.

Figure 27: OpenStack Nova Nodes Real-Time Availability

Figure 28 shows the OpenStack Nova nodes historical availability.

Figure 28: OpenStack Nova Nodes Historical Availability

Figure 29 shows the OpenStack Neutron nodes real-time availability.

Figure 29: OpenStack Neutron Nodes Real-Time Availability

Figure 30 shows the OpenStack Neutron nodes historical availability.

Figure 30: OpenStack Neutron Nodes Historical Availability

Service Alarms

An alarm can be configured for any of the OpenStack services. In the Alarm pane, select the Service Alarms module. Then, select openstack from the Service drop-down list. The metrics for which alarms can be configured are broadly categorized into three scopes:

  • Cluster: Heartbeat metrics, such as liveness checks for Nova, Neutron, and Keystone APIs.

  • Host: Allocation of resources on compute hosts. Alarms can be configured for absolute count or as a percentage of host capacity. Metrics include virtual CPU (vCPU), memory, and local storage.

  • Project: Allocation of resources by a project. Alarms can be configured for absolute count or as a percentage of project quota. Resource metrics include instances, vCPU, memory, storage, floating IP addresses, and security groups.
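
The difference between an absolute threshold and a percentage threshold can be illustrated as follows. This Python sketch does not describe how AppFormix evaluates alarms; the function name, the parameters, and the greater-than-or-equal comparison are assumptions chosen for the example.

```python
def allocation_exceeds(allocated, limit, threshold, as_percentage=False):
    """Return True if an allocation metric reaches the alarm threshold.

    allocated: resources currently allocated (for example, vCPUs).
    limit: host capacity or project quota for the same resource.
    threshold: an absolute count, or a percentage of limit if as_percentage is True.
    """
    if as_percentage:
        if limit <= 0:
            raise ValueError("limit must be positive for a percentage threshold")
        return allocated / limit * 100 >= threshold
    return allocated >= threshold
```

For a host with 40 vCPUs of which 30 are allocated, a threshold of 75 percent is reached, while an absolute threshold of 32 vCPUs is not.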

As with other alarms, notifications can also be configured for any OpenStack service alarm, as shown in Figure 31.

SLA profiles can be configured for Nova, Neutron, and Keystone by navigating to Settings > SLA Settings. Then select the appropriate tab for the service. A list of rules can be defined for both Health and Risk.

Figure 31: Alarm Input Pane for OpenStack Services

Configuration

The OpenStack configuration parameters provided during AppFormix installation are sufficient for monitoring OpenStack services. No additional configuration is required. To modify the current values, from the Settings menu, select Service Settings. Then select the OpenStack Services tab. Figure 32 shows the OpenStack services settings and configuration parameters.

Figure 32: OpenStack Services Settings and Configuration Parameters

ScaleIO Monitoring

ScaleIO provides software-defined block storage. AppFormix metrics for ScaleIO performance and availability are available in real-time charts and alarms.

Dashboard

The AppFormix service monitoring dashboard for a ScaleIO cluster displays the overall state of the cluster and its components. It also displays real-time storage capacity and read/write bandwidths of the cluster, as shown in Figure 33.

Figure 33: Real-Time Usage Metrics for ScaleIO Cluster

Real-Time Charts

To view cluster-wide metrics in the charts, select Services > ScaleIO from the top context menu. Select the Charts icon from the left pane. Figure 34 shows the ScaleIO service summary of cluster metrics in a chart view.

Figure 34: ScaleIO Service Summary of Cluster Metrics in Chart View

Real-Time Status of ScaleIO Components

AppFormix monitors the real-time status of every element of the ScaleIO cluster. You can select an element from the Resource drop-down list.

SDS

Figure 35 shows the real-time status of SDS elements of the ScaleIO cluster.

Figure 35: Real-Time Status of SDSs of the ScaleIO Cluster

SDC

Figure 36 shows the real-time status of SDC elements of the ScaleIO cluster.

Figure 36: Real-Time Status of SDCs of the ScaleIO Cluster

Protection Domain

Figure 37 shows the real-time status of the protection domains of the ScaleIO cluster.

Figure 37: Real-Time Status of Protection Domains of the ScaleIO Cluster

Storage Pools

Figure 38 shows the real-time status of the storage pools of the ScaleIO cluster.

Figure 38: Real-Time Status of Storage Pools of the ScaleIO Cluster

Devices

Figure 39 shows the real-time status of the devices of the ScaleIO cluster.

Figure 39: Real-Time Status of Devices of the ScaleIO Cluster

Volumes

Figure 40 shows the real-time status of the volumes of the ScaleIO cluster.

Figure 40: Real-Time Status of Volumes of the ScaleIO Cluster

Service Alarms

An alarm can be configured for any of the ScaleIO metrics collected. In the Alarm pane, select the Service Alarms module. Then select scaleio from the Service drop-down list. Notifications can also be configured for ScaleIO alarms, as shown in Figure 41.

Figure 41: Alarm Input Pane for ScaleIO

Per-Instance Storage Volume Metrics

When a virtual machine mounts a storage volume, AppFormix Agent monitors the disk latency and throughput to the network-attached storage volume. Instance metrics for storage I/O and latency (such as disk.* metrics) are available on a per-volume basis in the charts. An alarm on such a metric indicates the volume for which the alarm triggered.

Configuration

For AppFormix to monitor ScaleIO metrics, a ScaleIO user with administrator authorization for the cluster must exist. ScaleIO cluster connection details can be configured in AppFormix. From the Settings menu, select Service Settings. Then, select the ScaleIO tab.

Enter the cluster name and host on which ScaleIO runs. Enter the username and password, then click Setup. On success, the button changes to Submitted. Figure 42 shows the ScaleIO services and credentials settings.

Figure 42: ScaleIO Services and Credentials Settings