Metrics

 

A metric is a measured value for an element in the infrastructure. AppFormix Agent collects and calculates metrics for hosts and instances. AppFormix metrics are organized into hierarchical categories based on the type of metric.
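The dotted metric names used throughout this document reflect that hierarchy. As a minimal sketch, the following Python groups a handful of metric names from the tables below into a nested category tree. The helper name build_metric_tree is hypothetical and is not part of any AppFormix API; it only illustrates how the names decompose.

```python
from typing import Dict, Iterable


def build_metric_tree(metric_names: Iterable[str]) -> Dict[str, dict]:
    """Group dotted metric names into a nested dict keyed by category."""
    tree: Dict[str, dict] = {}
    for name in metric_names:
        node = tree
        # Each dot-separated part is one level of the hierarchy.
        for part in name.split("."):
            node = node.setdefault(part, {})
    return tree


# Metric names taken from the tables in this document.
sample_metrics = [
    "host.cpu.usage",
    "host.cpu.io_wait",
    "host.memory.usage",
    "instance.cpu.usage",
]

tree = build_metric_tree(sample_metrics)
print(sorted(tree))                 # top-level categories
print(sorted(tree["host"]["cpu"]))  # metrics under host.cpu
```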

Some metrics are percentages of total capacity. In such cases, the category of the metric determines the total capacity against which the percentage is computed. For instance, host.cpu.usage indicates the percentage of CPU consumed relative to the total CPU available on a host. In contrast, instance.cpu.usage is the percentage of CPU consumed relative to the total CPU available to an instance. As an example, consider an instance that is using 50% of one core on a host with 20 cores. Measured against the host's capacity (host.cpu.usage), the instance consumes 2.5% of the host's CPU. If the instance has been allocated two cores, then its instance.cpu.usage is 25%.
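The arithmetic in this example can be checked with a short Python sketch. The function name cpu_usage_percent and the variable names are illustrative assumptions, not AppFormix identifiers; the numbers come from the example above.

```python
def cpu_usage_percent(cores_consumed: float, cores_available: float) -> float:
    """Return CPU consumed as a percentage of the given total capacity."""
    if cores_available <= 0:
        raise ValueError("cores_available must be positive")
    return 100.0 * cores_consumed / cores_available


cores_consumed = 0.5  # the instance uses 50% of one core
host_cores = 20       # total cores on the host
instance_cores = 2    # cores allocated to the instance

# Same consumption, two different denominators.
share_of_host = cpu_usage_percent(cores_consumed, host_cores)
share_of_instance = cpu_usage_percent(cores_consumed, instance_cores)

print(f"relative to host capacity:     {share_of_host:.1f}%")      # 2.5%
print(f"relative to instance capacity: {share_of_instance:.1f}%")  # 25.0%
```

The consumption is identical in both calls; only the capacity used as the denominator changes, which is why the two percentages differ by a factor of ten.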

Alarms can be configured for any metric. Many metrics can also be displayed in charts. When an alarm triggers for a metric, the alarm is plotted on charts at the time of the event. In this way, metrics that cannot be plotted directly as a chart are still visually correlated in time with other metrics.

AppFormix Agent collects both raw metrics and calculated metrics. Raw metrics are values read directly from the underlying infrastructure. Calculated metrics are metrics that AppFormix Agent derives from raw metrics.

Hosts

Table 1 lists the raw metrics available for hosts.

Table 1: Raw Metrics for Hosts

| Metric | Chart | Alarm |
| --- | --- | --- |
| host.cpu.io_wait | x | x |
| host.cpu.ipc ** | x | x |
| host.cpu.l3_cache.miss ** | x | x |
| host.cpu.l3_cache.usage ** | x | x |
| host.cpu.mem_bw.local ** | x | x |
| host.cpu.mem_bw.remote ** | x | x |
| host.cpu.mem_bw.total ** | x | x |
| host.cpu.usage | x | x |
| host.disk.io.read | x | x |
| host.disk.io.write | x | x |
| host.disk.response_time | x | x |
| host.disk.read_response_time | x | x |
| host.disk.write_response_time | x | x |
| host.disk.smart.hdd.command_timeout |  | x |
| host.disk.smart.hdd.current_pending_sector_count |  | x |
| host.disk.smart.hdd.offline_uncorrectable |  | x |
| host.disk.smart.hdd.reallocated_sector_count |  | x |
| host.disk.smart.hdd.reported_uncorrectable_errors |  | x |
| host.disk.smart.ssd.available_reserved_space |  | x |
| host.disk.smart.ssd.media_wearout_indicator |  | x |
| host.disk.smart.ssd.reallocated_sector_count |  | x |
| host.disk.smart.ssd.wear_leveling_count |  | x |
| host.disk.usage.bytes | x | x |
| host.disk.usage.percent | x | x |
| host.memory.usage | x | x |
| host.memory.swap.usage | x | x |
| host.memory.dirty.rate | x | x |
| host.memory.page_fault.rate | x | x |
| host.memory.page_in_out.rate | x | x |
| host.network.egress.bit_rate | x | x |
| host.network.egress.drops | x | x |
| host.network.egress.errors | x | x |
| host.network.egress.packet_rate | x | x |
| host.network.ingress.bit_rate | x | x |
| host.network.ingress.drops | x | x |
| host.network.ingress.errors | x | x |
| host.network.ingress.packet_rate | x | x |
| host.network.ipv4tables.rule_count | x | x |
| host.network.ipv6tables.rule_count | x | x |
| openstack.host.disk_allocated | x | x |
| openstack.host.memory_allocated | x | x |
| openstack.host.vcpus_allocated | x | x |

Note: ** CPU cache and memory bandwidth metrics are available for the Intel® Xeon® processor family with Intel® Resource Director Technology. The AppFormix software automatically detects the processor family and makes the additional metrics available for display and analysis.

Table 2 lists the calculated metrics available for hosts.

Table 2: Calculated Metrics for Hosts

| Metric | Chart | Alarm |
| --- | --- | --- |
| host.cpu.normalized_load_1m | x | x |
| host.cpu.normalized_load_5m | x | x |
| host.cpu.normalized_load_15m | x | x |
| host.cpu.temperature |  | x |
| host.disk.smart.predict_failure |  | x |
| host.heartbeat |  | x |

host.cpu.normalized_load
Normalized load is calculated as the ratio of the number of running and ready-to-run threads to the number of CPU cores. This family of metrics indicates the level of demand for CPU. If the value exceeds 1, then more threads are ready to run than there are CPU cores to execute them. Normalized load is provided as an average over 1-minute, 5-minute, and 15-minute intervals.

host.cpu.temperature
CPU temperature is derived from multiple temperature sensors in the processor(s) and chassis. It provides a general indication of the temperature, in degrees Celsius, inside a physical host.

host.disk.smart.predict_failure
AppFormix Agent calculates predict_failure using multiple S.M.A.R.T. counters provided by disk hardware. The agent sets predict_failure to true (value=1) when it determines from a combination of S.M.A.R.T. counters that a disk is likely to fail. An alarm triggered for this metric contains the disk identifier in its metadata.

host.heartbeat
The host.heartbeat metric indicates whether AppFormix Agent is functioning on a host. AppFormix Controller periodically checks the status of each host by making a status request to AppFormix Agent. The host.heartbeat metric is incremented for each successful response. Alarms can be configured to detect missed heartbeats over a given interval.
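The normalized load ratio described above can be sketched in Python as follows. This is an illustration only: the document does not specify how AppFormix Agent counts running and ready-to-run threads, so the function names and the sample load averages here are assumptions chosen for the example.

```python
def normalized_load(runnable_threads: float, cpu_cores: int) -> float:
    """Ratio of running plus ready-to-run threads to CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return runnable_threads / cpu_cores


def cpu_is_oversubscribed(load: float) -> bool:
    """A value above 1 means more runnable threads than CPU cores."""
    return load > 1.0


# Hypothetical averages of runnable threads over 1, 5, and 15 minutes
# on a host with 12 cores.
cores = 12
averages = {"1m": 30.0, "5m": 12.0, "15m": 6.0}

for window, runnable in averages.items():
    load = normalized_load(runnable, cores)
    print(f"normalized_load_{window} = {load:.2f}, "
          f"oversubscribed = {cpu_is_oversubscribed(load)}")
```

With these sample values the 1-minute load is 2.5 (demand exceeds capacity), the 5-minute load is exactly 1.0, and the 15-minute load is 0.5.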

Instances

Table 3 lists the raw metrics available for instances.

Table 3: Raw Metrics for Instances

| Metric | Chart | Alarm |
| --- | --- | --- |
| instance.cpu.usage | x | x |
| instance.cpu.ipc ** | x | x |
| instance.cpu.l3_cache.miss ** | x | x |
| instance.cpu.l3_cache.usage ** | x | x |
| instance.cpu.mem_bw.local ** | x | x |
| instance.cpu.mem_bw.remote ** | x | x |
| instance.cpu.mem_bw.total ** | x | x |
| instance.disk.io.read_bandwidth | x | x |
| instance.disk.io.read_iops | x | x |
| instance.disk.io.read_iosize | x | x |
| instance.disk.io.read_response_time | x | x |
| instance.disk.io.write_bandwidth | x | x |
| instance.disk.io.write_iops | x | x |
| instance.disk.io.write_iosize | x | x |
| instance.disk.io.write_response_time | x | x |
| instance.disk.usage.bytes | x | x |
| instance.disk.usage.percentage | x | x |
| instance.memory.usage | x | x |
| instance.network.egress.bit_rate | x | x |
| instance.network.egress.drops | x | x |
| instance.network.egress.errors | x | x |
| instance.network.egress.packet_rate | x | x |
| instance.network.egress.total_bytes | x | x |
| instance.network.egress.total_packets | x | x |
| instance.network.ingress.bit_rate | x | x |
| instance.network.ingress.drops | x | x |
| instance.network.ingress.errors | x | x |
| instance.network.ingress.packet_rate | x | x |
| instance.network.ingress.total_bytes | x | x |
| instance.network.ingress.total_packets | x | x |

Note: ** CPU cache and memory bandwidth metrics are available for the Intel® Xeon® processor family with Intel® Resource Director Technology. The AppFormix software automatically detects the processor family and makes the additional metrics available for display and analysis.

Table 4 lists the calculated metric available for instances.

Table 4: Calculated Metrics for Instances

| Metric | Chart | Alarm |
| --- | --- | --- |
| instance.heartbeat |  | x |

instance.heartbeat
The instance.heartbeat metric indicates whether an instance is running. AppFormix Agent periodically checks the state of host processes associated with each instance. The instance.heartbeat metric is incremented for each successful status check. Alarms can be configured to detect missed heartbeats over a given interval.
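Because a heartbeat metric is incremented on each successful check, a missed heartbeat shows up as a counter that fails to advance between consecutive checks. The Python sketch below illustrates that idea under stated assumptions: the sampling scheme (one counter reading per check period), the function names, and the max_missed threshold are all hypothetical and do not describe how AppFormix evaluates alarms internally.

```python
from typing import Sequence


def count_missed_heartbeats(counter_samples: Sequence[int]) -> int:
    """Count check periods in which the heartbeat counter did not advance.

    counter_samples holds one reading of the heartbeat counter per check
    period, oldest first (an assumed sampling scheme for this sketch).
    """
    pairs = zip(counter_samples, counter_samples[1:])
    return sum(1 for previous, current in pairs if current <= previous)


def should_raise_alarm(counter_samples: Sequence[int], max_missed: int) -> bool:
    """Return True when more than max_missed heartbeats were missed."""
    return count_missed_heartbeats(counter_samples) > max_missed


# Counter readings over six check periods; it stalls at 12 for two periods.
samples = [10, 11, 12, 12, 12, 13]

print(count_missed_heartbeats(samples))            # 2
print(should_raise_alarm(samples, max_missed=1))   # True
print(should_raise_alarm(samples, max_missed=2))   # False
```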

Network Device

AppFormix can collect network device metrics using SNMP or Juniper Telemetry Interface (JTI). See Network Devices for details.

Table 5 lists the metrics available per interface with SNMP network device monitoring.

Table 5: Metrics Available per Interface with SNMP Network Device Monitoring

| Metric | Unit | Chart | Alarm |
| --- | --- | --- | --- |
| snmp.interface.out_discards | discards/s | x | x |
| snmp.interface.in_discards | discards/s | x | x |
| snmp.interface.in_errors | errors/s | x | x |
| snmp.interface.out_unicast_packets | packets/s | x | x |
| snmp.interface.in_octets | octets/s | x | x |
| snmp.interface.in_unicast_packets | packets/s | x | x |
| snmp.interface.out_packet_queue_length | count | x | x |
| snmp.interface.speed | bits/s | x | x |
| snmp.interface.out_octets | octets/s | x | x |
| snmp.interface.in_unknown_protocol | packets/s | x | x |
| snmp.interface.in_non_unicast_packets | packets/s | x | x |
| snmp.interface.out_errors | errors/s | x | x |
| snmp.interface.out_non_unicast_packets | packets/s | x | x |

Table 6 lists the metrics available per interface with JTI network device monitoring.

Table 6: Metrics Available per Interface with JTI Network Device Monitoring

| Metric | Unit | Chart | Alarm |
| --- | --- | --- | --- |
| junos.system.linecard.interface.egress_errors.if_errors | errors/s | x | x |
| junos.system.linecard.interface.egress_errors.if_discard | discards/s | x | x |
| junos.system.linecard.interface.egress_stats.if_1sec_pkts | packets/s | x | x |
| junos.system.linecard.interface.egress_stats.if_octets | octets/s | x | x |
| junos.system.linecard.interface.egress_stats.if_mc_pkts | packets/s | x | x |
| junos.system.linecard.interface.egress_stats.if_bc_pkts | packets/s | x | x |
| junos.system.linecard.interface.egress_stats.if_1sec_octets | octets/s | x | x |
| junos.system.linecard.interface.egress_stats.if_pkts | packets/s | x | x |
| junos.system.linecard.interface.egress_stats.if_uc_pkts | packets/s | x | x |
| junos.system.linecard.interface.egress_stats.if_pause_pkts | packets/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_in_fifo_errors | errors/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_in_frame_errors | errors/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_in_l3_incompletes | packets/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_in_runts | packets/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_errors | errors/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_in_l2chan_errors | errors/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_in_resource_errors | errors/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_in_qdrops | drops/s | x | x |
| junos.system.linecard.interface.ingress_errors.if_in_l2_mismatch_timeouts | packets/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_1sec_pkts | packets/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_octets | octets/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_mc_pkts | packets/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_bc_pkts | packets/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_1sec_octets | octets/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_error | errors/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_pkts | packets/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_uc_pkts | packets/s | x | x |
| junos.system.linecard.interface.ingress_stats.if_pause_pkts | packets/s | x | x |

Table 7 lists the metrics available per interface queue with JTI network device monitoring.

Table 7: Metrics Available per Interface Queue with JTI Network Device Monitoring

| Metric | Unit | Chart | Alarm |
| --- | --- | --- | --- |
| junos.system.linecard.interface.egress_queue_info.bytes | bytes/s | x | x |
| junos.system.linecard.interface.egress_queue_info.packets | packets/s | x | x |
| junos.system.linecard.interface.egress_queue_info.allocated_buffer_size | bytes | x | x |
| junos.system.linecard.interface.egress_queue_info.avg_buffer_occupancy | bytes | x | x |
| junos.system.linecard.interface.egress_queue_info.cur_buffer_occupancy | bytes | x | x |
| junos.system.linecard.interface.egress_queue_info.peak_buffer_occupancy | bytes | x | x |
| junos.system.linecard.interface.egress_queue_info.red_drop_bytes | bytes/s | x | x |
| junos.system.linecard.interface.egress_queue_info.red_drop_packets | packets/s | x | x |
| junos.system.linecard.interface.egress_queue_info.rl_drop_bytes | bytes/s | x | x |
| junos.system.linecard.interface.egress_queue_info.rl_drop_packets | packets/s | x | x |
| junos.system.linecard.interface.egress_queue_info.tail_drop_packets | packets/s | x | x |

OpenContrail vRouter on a Host

Table 8 lists raw metrics available for an OpenContrail vRouter on a host.

Table 8: Raw Metrics for OpenContrail vRouter

| Metric | Chart | Alarm |
| --- | --- | --- |
| plugin.contrail.vrouter.aged_flows | x | x |
| plugin.contrail.vrouter.total_flows | x | x |
| plugin.contrail.vrouter.exception_packets | x | x |
| plugin.contrail.vrouter.drop_stats_flow_queue_limit_exceeded | x | x |
| plugin.contrail.vrouter.drop_stats_flow_table_full | x | x |
| plugin.contrail.vrouter.drop_stats_vlan_fwd_enq | x | x |
| plugin.contrail.vrouter.drop_stats_vlan_fwd_tx | x | x |
| plugin.contrail.vrouter.flow_export_drops | x | x |
| plugin.contrail.vrouter.flow_export_sampling_drops | x | x |
| plugin.contrail.vrouter.flow_rate_active_flows | x | x |
| plugin.contrail.vrouter.flow_rate_added_flows | x | x |
| plugin.contrail.vrouter.flow_rate_deleted_flows | x | x |

OpenStack Project in Chart View

Table 9 lists the raw metrics available in the OpenStack Project Chart View.

Table 9: Raw Metrics for OpenStack Project

Metric

Chart

Alarm

openstack.project.active_instances

x

x

openstack.project.vcpus_allocated

x

x

openstack.project.volume_storage_allocated

x

x

openstack.project.memory_allocated

x

x

openstack.project.floating_ip_count

x

openstack.project.security_group_count

x

x

openstack.project.volume_count

x

x

ScaleIO Service

Table 10 lists the raw metrics available for ScaleIO monitoring.

Table 10: Raw Metrics for ScaleIO Monitoring

| Metric | Unit | Chart | Alarm |
| --- | --- | --- | --- |
| numOfDevices | count | x | x |
| numOfProtectionDomains | count | x | x |
| numOfSdc | count | x | x |
| numOfSds | count | x | x |
| numOfStoragePools | count | x | x |
| numOfVtrees | count | x | x |
| numOfSnapshots | count | x | x |
| numOfVolumes | count | x | x |
| numOfThickBaseVolumes | count | x | x |
| numOfThinBaseVolumes | count | x | x |
| numOfVolumesInDeletion | count | x | x |
| numOfMappedToAllVolumes | count | x | x |
| numOfUnmappedVolumes | count | x | x |
| capacityAvailableForVolumeAllocationInKb | Kbyte | x | x |
| capacityInUseInKb | Kbyte | x | x |
| capacityLimitInKb | Kbyte | x | x |
| unusedCapacityInKb | Kbyte | x | x |
| spareCapacityInKb | Kbyte | x | x |
| protectedCapacityInKb | Kbyte | x | x |
| maxCapacityInKb | Kbyte | x | x |
| snapCapacityInUseInKb | Kbyte | x | x |
| thickCapacityInUseInKb | Kbyte | x | x |
| thinCapacityInUseInKb | Kbyte | x | x |
| bckRebuildReadBandwidth | Kbyte/sec | x | x |
| bckRebuildWriteBandwidth | Kbyte/sec | x | x |
| fwdRebuildReadBandwidth | Kbyte/sec | x | x |
| fwdRebuildWriteBandwidth | Kbyte/sec | x | x |
| normRebuildReadBandwidth | Kbyte/sec | x | x |
| normRebuildWriteBandwidth | Kbyte/sec | x | x |
| primaryReadBandwidth | Kbyte/sec | x | x |
| primaryWriteBandwidth | Kbyte/sec | x | x |
| rebalanceReadBandwidth | Kbyte/sec | x | x |
| rebalanceWriteBandwidth | Kbyte/sec | x | x |
| secondaryReadBandwidth | Kbyte/sec | x | x |
| secondaryWriteBandwidth | Kbyte/sec | x | x |
| totalReadBandwidth | Kbyte/sec | x | x |
| totalWriteBandwidth | Kbyte/sec | x | x |
| bckRebuildReadIops | IOPS | x | x |
| bckRebuildWriteIops | IOPS | x | x |
| fwdRebuildReadIops | IOPS | x | x |
| fwdRebuildWriteIops | IOPS | x | x |
| normRebuildReadIops | IOPS | x | x |
| normRebuildWriteIops | IOPS | x | x |
| primaryReadIops | IOPS | x | x |
| primaryWriteIops | IOPS | x | x |
| rebalanceReadIops | IOPS | x | x |
| rebalanceWriteIops | IOPS | x | x |
| secondaryReadIops | IOPS | x | x |
| secondaryWriteIops | IOPS | x | x |
| totalReadIops | IOPS | x | x |
| totalWriteIops | IOPS | x | x |
| bckRebuildReadIosize | Kbyte | x | x |
| bckRebuildWriteIosize | Kbyte | x | x |
| fwdRebuildReadIosize | Kbyte | x | x |
| fwdRebuildWriteIosize | Kbyte | x | x |
| normRebuildReadIosize | Kbyte | x | x |
| normRebuildWriteIosize | Kbyte | x | x |
| primaryReadIosize | Kbyte | x | x |
| primaryWriteIosize | Kbyte | x | x |
| rebalanceReadIosize | Kbyte | x | x |
| rebalanceWriteIosize | Kbyte | x | x |
| secondaryReadIosize | Kbyte | x | x |
| secondaryWriteIosize | Kbyte | x | x |
| totalReadIosize | Kbyte | x | x |
| totalWriteIosize | Kbyte | x | x |

RabbitMQ Service

Table 11 lists the raw metrics available for RabbitMQ monitoring.

Table 11: Raw Metrics for RabbitMQ Monitoring

Metric

Unit

Chart

Alarm

rabbit.cluster.connection_totals.blocked_connections

count

x

x

rabbit.cluster.connection_totals.blocked_connections_details

messages/s

x

x

rabbit.cluster.message_stats.ack

count

x

x

rabbit.cluster.message_stats.ack_details

messages/s

x

x

rabbit.cluster.message_stats.deliver

count

x

x

rabbit.cluster.message_stats.deliver_details

messages/s

x

x

rabbit.cluster.message_stats.deliver_get

count

x

x

rabbit.cluster.message_stats.deliver_get_details

messages/s

x

x

rabbit.cluster.message_stats.get

count

x

x

rabbit.cluster.message_stats.get_details

messages/s

x

x

rabbit.cluster.message_stats.publish

count

x

x

rabbit.cluster.message_stats.publish_details

messages/s

x

x

rabbit.cluster.message_stats.redeliver

count

x

x

rabbit.cluster.message_stats.redeliver_details

messages/s

x

x

rabbit.cluster.object_totals.channels

count

x

x

rabbit.cluster.object_totals.connections

count

x

x

rabbit.cluster.object_totals.consumers

count

x

x

rabbit.cluster.object_totals.exchanges

count

x

x

rabbit.cluster.object_totals.queues

count

x

x

rabbit.cluster.queue_totals.blocked_queues

count

x

x

rabbit.cluster.queue_totals.blocked_queues_details

messages/s

x

x

rabbit.cluster.queue_totals.consumer_utilisation_percent

count

x

x

rabbit.cluster.queue_totals.messages

count

x

x

rabbit.cluster.queue_totals.messages_details

messages/s

x

x

rabbit.cluster.queue_totals.messages_ready

count

x

x

rabbit.cluster.queue_totals.messages_ready_details

messages/s

x

x

rabbit.cluster.queue_totals.messages_unacknowledged

count

x

x

rabbit.cluster.queue_totals.messages_unacknowledged_details

messages/s

x

x

rabbit.queue.consumers

count

x

rabbit.queue.consumer_utilisation

count

x

rabbit.queue.messages

count

x

rabbit.queue.messages_ready

count

x

rabbit.queue.messages_ready_detail

count

x

rabbit.queue.memory

count

x

rabbit.queue.messages_detail

count

x

rabbit.queue.messages_unacknowledged

count

x

rabbit.queue.messages_unacknowledged_detail

count

x

rabbit.queue.state

count

x

rabbit.node.sockets_total

count

x

x

rabbit.node.fd_total

count

x

x

rabbit.node.sockets_used_percent

count

x

x

rabbit.node.run_queue

count

x

x

rabbit.node.proc_used_percent

count

x

x

rabbit.node.proc_total

count

x

x

rabbit.node.mem_used_percent

count

x

x

rabbit.node.uptime

count

x

x

rabbit.node.disk_usage_ratio

count

x

x

rabbit.node.disk_free_alarm

count

x

x

rabbit.node.fd_used_percent

count

x

x

rabbit.node.mem_limit

count

x

x

rabbit.node.mem_alarm

count

x

x

rabbit.node.disk_free

count

x

x

rabbit.node.sockets_used

count

x

x

rabbit.node.processors

count

x

x

rabbit.node.running

count

x

x

rabbit.node.disk_free_limit

count

x

x

rabbit.node.fd_used

count

x

x

rabbit.node.proc_used

count

x

x

rabbit.node.mem_used

count

x

x

rabbit.node.heartbeat

count

x

x

rabbit.node.latency

count

x

x