    Metrics

    A metric is a measured value for an element in the infrastructure. AppFormix Agent collects and calculates metrics for hosts and instances. AppFormix metrics are organized into hierarchical categories based on the type of metric.

    Some metrics are percentages of total capacity. In such cases, the category of the metric determines the total capacity by which the percentage is computed. For instance, host.cpu.usage indicates the percentage of CPU consumed relative to the total CPU available on a host. In contrast, instance.cpu.usage is the percentage of CPU consumed relative to the total CPU available to an instance. As an example, consider an instance that is using 50% of one core on a host with 20 cores. The instance's host.cpu.usage will be 2.5%. If the instance has been allocated two cores, then its instance.cpu.usage will be 25%.
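    The arithmetic in the example above can be checked with a short sketch. The function name `cpu_usage_percent` and its parameters are illustrative only and are not part of AppFormix; the sketch assumes that CPU consumption is expressed in cores.

```python
def cpu_usage_percent(cores_consumed: float, cores_total: float) -> float:
    """Return CPU consumed as a percentage of the given total capacity."""
    if cores_total <= 0:
        raise ValueError("cores_total must be positive")
    return 100.0 * cores_consumed / cores_total


# An instance using 50% of one core consumes 0.5 cores.
consumed = 0.5

# Same consumption, two different capacities (values from the example above).
host_cpu_usage = cpu_usage_percent(consumed, cores_total=20)     # host has 20 cores
instance_cpu_usage = cpu_usage_percent(consumed, cores_total=2)  # instance has 2 cores

print(f"host.cpu.usage     = {host_cpu_usage}%")
print(f"instance.cpu.usage = {instance_cpu_usage}%")
```

    The same consumption yields 2.5% in the host category and 25% in the instance category because only the denominator changes.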

    Alarms can be configured for any metric. Many metrics can also be displayed in charts. When an alarm triggers for a metric, the alarm is plotted on charts at the time of the event. In this way, metrics that cannot be plotted directly as a chart are still visually correlated in time with other metrics.

    AppFormix Agent collects both raw metrics and calculated metrics. Raw metrics are values read directly from the underlying infrastructure. Calculated metrics are metrics that AppFormix Agent derives from raw metrics.

    Hosts

    Table 1 lists the raw metrics available for hosts.

    Table 1: Raw Metrics for Hosts

    | Metric | Chart | Alarm |
    |---|---|---|
    | host.cpu.io_wait | x | x |
    | host.cpu.ipc ** | x | x |
    | host.cpu.l3_cache.miss ** | x | x |
    | host.cpu.l3_cache.usage ** | x | x |
    | host.cpu.mem_bw.local ** | x | x |
    | host.cpu.mem_bw.remote ** | x | x |
    | host.cpu.mem_bw.total ** | x | x |
    | host.cpu.usage | x | x |
    | host.disk.io.read | x | x |
    | host.disk.io.write | x | x |
    | host.disk.response_time | x | x |
    | host.disk.read_response_time | x | x |
    | host.disk.write_response_time | x | x |
    | host.disk.smart.hdd.command_timeout | | x |
    | host.disk.smart.hdd.current_pending_sector_count | | x |
    | host.disk.smart.hdd.offline_uncorrectable | | x |
    | host.disk.smart.hdd.reallocated_sector_count | | x |
    | host.disk.smart.hdd.reported_uncorrectable_errors | | x |
    | host.disk.smart.ssd.available_reserved_space | | x |
    | host.disk.smart.ssd.media_wearout_indicator | | x |
    | host.disk.smart.ssd.reallocated_sector_count | | x |
    | host.disk.smart.ssd.wear_leveling_count | | x |
    | host.disk.usage.bytes | x | x |
    | host.disk.usage.percent | x | x |
    | host.memory.usage | x | x |
    | host.memory.swap.usage | x | x |
    | host.memory.dirty.rate | x | x |
    | host.memory.page_fault.rate | x | x |
    | host.memory.page_in_out.rate | x | x |
    | host.network.egress.bit_rate | x | x |
    | host.network.egress.drops | x | x |
    | host.network.egress.errors | x | x |
    | host.network.egress.packet_rate | x | x |
    | host.network.ingress.bit_rate | x | x |
    | host.network.ingress.drops | x | x |
    | host.network.ingress.errors | x | x |
    | host.network.ingress.packet_rate | x | x |
    | host.network.ipv4tables.rule_count | x | x |
    | host.network.ipv6tables.rule_count | x | x |
    | openstack.host.disk_allocated | x | x |
    | openstack.host.memory_allocated | x | x |
    | openstack.host.vcpus_allocated | x | x |

    Note: ** CPU cache and memory bandwidth metrics are available for the Intel® Xeon® processor family with Intel® Resource Director Technology. The AppFormix software automatically detects the processor family and makes the additional metrics available for display and analysis.

    Table 2 lists the calculated metrics available for hosts.

    Table 2: Calculated Metrics for Hosts

    | Metric | Chart | Alarm |
    |---|---|---|
    | host.cpu.normalized_load_1m | x | x |
    | host.cpu.normalized_load_5m | x | x |
    | host.cpu.normalized_load_15m | x | x |
    | host.cpu.temperature | | x |
    | host.disk.smart.predict_failure | | x |
    | host.heartbeat | | x |

    host.cpu.normalized_load
    Normalized load is calculated as the ratio of the number of running and ready-to-run threads to the number of CPU cores. This family of metrics indicates the level of demand for CPU. If the value exceeds 1, then more threads are ready to run than there are CPU cores to execute them. Normalized load is provided as an average over 1-minute, 5-minute, and 15-minute intervals.

    host.cpu.temperature
    CPU temperature is derived from multiple temperature sensors in the processor(s) and chassis. This metric provides a general indication of the temperature, in degrees Celsius, inside a physical host.

    host.disk.smart.predict_failure
    AppFormix Agent calculates predict_failure using multiple S.M.A.R.T. counters provided by disk hardware. The agent sets predict_failure to true (value=1) when it determines from a combination of S.M.A.R.T. counters that a disk is likely to fail. An alarm triggered for this metric contains the disk identifier in its metadata.

    host.heartbeat
    The host.heartbeat metric indicates whether AppFormix Agent is functioning on a host. AppFormix Controller periodically checks the status of each host by making a status request to AppFormix Agent. The host.heartbeat metric is incremented for each successful response. Alarms can be configured to detect missed heartbeats over a given interval.
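    The normalized load ratio described above can be sketched in a few lines. This is only an illustration of the stated definition, not the agent's implementation; the function names, the sample thread counts, and the core count are hypothetical.

```python
def normalized_load(runnable_threads: float, cpu_cores: int) -> float:
    """Ratio of running and ready-to-run threads to the number of CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return runnable_threads / cpu_cores


def is_oversubscribed(load: float) -> bool:
    """A value above 1 means more runnable threads than cores."""
    return load > 1.0


# Hypothetical average thread counts over each interval on a 24-core host.
averages = {"1m": 30.0, "5m": 18.0, "15m": 12.0}
cores = 24

for window, threads in averages.items():
    load = normalized_load(threads, cores)
    print(f"host.cpu.normalized_load_{window} = {load:.2f} "
          f"(oversubscribed: {is_oversubscribed(load)})")
```

    In this made-up sample, only the 1-minute value exceeds 1, which would suggest a recent burst of CPU demand rather than sustained pressure.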

    Instances

    Table 3 lists the raw metrics available for instances.

    Table 3: Raw Metrics for Instances

    | Metric | Chart | Alarm |
    |---|---|---|
    | instance.cpu.usage | x | x |
    | instance.cpu.ipc ** | x | x |
    | instance.cpu.l3_cache.miss ** | x | x |
    | instance.cpu.l3_cache.usage ** | x | x |
    | instance.cpu.mem_bw.local ** | x | x |
    | instance.cpu.mem_bw.remote ** | x | x |
    | instance.cpu.mem_bw.total ** | x | x |
    | instance.disk.io.read_bandwidth | x | x |
    | instance.disk.io.read_iops | x | x |
    | instance.disk.io.read_iosize | x | x |
    | instance.disk.io.read_response_time | x | x |
    | instance.disk.io.write_bandwidth | x | x |
    | instance.disk.io.write_iops | x | x |
    | instance.disk.io.write_iosize | x | x |
    | instance.disk.io.write_response_time | x | x |
    | instance.disk.usage.bytes | x | x |
    | instance.disk.usage.percentage | x | x |
    | instance.memory.usage | x | x |
    | instance.network.egress.bit_rate | x | x |
    | instance.network.egress.drops | x | x |
    | instance.network.egress.errors | x | x |
    | instance.network.egress.packet_rate | x | x |
    | instance.network.egress.total_bytes | x | x |
    | instance.network.egress.total_packets | x | x |
    | instance.network.ingress.bit_rate | x | x |
    | instance.network.ingress.drops | x | x |
    | instance.network.ingress.errors | x | x |
    | instance.network.ingress.packet_rate | x | x |
    | instance.network.ingress.total_bytes | x | x |
    | instance.network.ingress.total_packets | x | x |

    Note: ** CPU cache and memory bandwidth metrics are available for the Intel® Xeon® processor family with Intel® Resource Director Technology. The AppFormix software automatically detects the processor family and makes the additional metrics available for display and analysis.

    Table 4 lists the calculated metric available for instances.

    Table 4: Calculated Metrics for Instances

    | Metric | Chart | Alarm |
    |---|---|---|
    | instance.heartbeat | | x |

    instance.heartbeat
    The instance.heartbeat metric indicates whether an instance is running. AppFormix Agent periodically checks the state of host processes associated with each instance. The instance.heartbeat metric is incremented for each successful status check. Alarms can be configured to detect missed heartbeats over a given interval.
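    Because a heartbeat metric is a counter that advances on each successful check, a missed heartbeat shows up as a check period in which the counter does not advance. The sketch below illustrates that idea only; it assumes one counter sample per check period, and the function names and threshold semantics are hypothetical rather than AppFormix's alarm logic.

```python
from typing import Sequence


def missed_heartbeats(counter_samples: Sequence[int]) -> int:
    """Count check periods in which the heartbeat counter did not advance."""
    missed = 0
    for previous, current in zip(counter_samples, counter_samples[1:]):
        if current <= previous:
            missed += 1
    return missed


def should_alarm(counter_samples: Sequence[int], threshold: int) -> bool:
    """Alarm when the number of missed heartbeats in the interval reaches the threshold."""
    return missed_heartbeats(counter_samples) >= threshold


# Hypothetical counter values sampled once per check period over an interval.
samples = [10, 11, 12, 12, 12, 13]
print("missed:", missed_heartbeats(samples))
print("alarm:", should_alarm(samples, threshold=2))
```

    The same approach would apply to host.heartbeat, which is also incremented on each successful response.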

    Network Device

    AppFormix can collect network device metrics using SNMP or Juniper Telemetry Interface (JTI). See Network Devices for details.

    Table 5 lists the metrics available per interface with SNMP network device monitoring.

    Table 5: Metrics Available per Interface with SNMP Network Device Monitoring

    | Metric | Unit | Chart | Alarm |
    |---|---|---|---|
    | snmp.interface.out_discards | discards/s | x | x |
    | snmp.interface.in_discards | discards/s | x | x |
    | snmp.interface.in_errors | errors/s | x | x |
    | snmp.interface.out_unicast_packets | packets/s | x | x |
    | snmp.interface.in_octets | octets/s | x | x |
    | snmp.interface.in_unicast_packets | packets/s | x | x |
    | snmp.interface.out_packet_queue_length | count | x | x |
    | snmp.interface.speed | bits/s | x | x |
    | snmp.interface.out_octets | octets/s | x | x |
    | snmp.interface.in_unknown_protocol | packets/s | x | x |
    | snmp.interface.in_non_unicast_packets | packets/s | x | x |
    | snmp.interface.out_errors | errors/s | x | x |
    | snmp.interface.out_non_unicast_packets | packets/s | x | x |
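    The units in Table 5 differ between metrics: octet rates are in octets/s, while snmp.interface.speed is in bits/s. As a minimal sketch of combining them, the following computes a link utilization percentage. Utilization is not a metric listed in this document; the function name and the sample values are hypothetical.

```python
BITS_PER_OCTET = 8


def link_utilization_percent(octets_per_second: float, speed_bits_per_second: float) -> float:
    """Convert an octet rate to bits/s and express it as a percentage of link speed."""
    if speed_bits_per_second <= 0:
        raise ValueError("speed_bits_per_second must be positive")
    return 100.0 * octets_per_second * BITS_PER_OCTET / speed_bits_per_second


# Hypothetical readings: snmp.interface.in_octets and snmp.interface.speed.
in_octets = 12_500_000      # octets/s
speed = 1_000_000_000       # bits/s (a 1 Gb/s interface)

utilization = link_utilization_percent(in_octets, speed)
print(f"ingress utilization = {utilization:.1f}%")
```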

    Table 6 lists the metrics available per interface with JTI network device monitoring.

    Table 6: Metrics Available per Interface with JTI Network Device Monitoring

    | Metric | Unit | Chart | Alarm |
    |---|---|---|---|
    | junos.system.linecard.interface.egress_errors.if_errors | errors/s | x | x |
    | junos.system.linecard.interface.egress_errors.if_discard | discards/s | x | x |
    | junos.system.linecard.interface.egress_stats.if_1sec_pkts | packets/s | x | x |
    | junos.system.linecard.interface.egress_stats.if_octets | octets/s | x | x |
    | junos.system.linecard.interface.egress_stats.if_mc_pkts | packets/s | x | x |
    | junos.system.linecard.interface.egress_stats.if_bc_pkts | packets/s | x | x |
    | junos.system.linecard.interface.egress_stats.if_1sec_octets | octets/s | x | x |
    | junos.system.linecard.interface.egress_stats.if_pkts | packets/s | x | x |
    | junos.system.linecard.interface.egress_stats.if_uc_pkts | packets/s | x | x |
    | junos.system.linecard.interface.egress_stats.if_pause_pkts | packets/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_in_fifo_errors | errors/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_in_frame_errors | errors/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_in_l3_incompletes | packets/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_in_runts | packets/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_errors | errors/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_in_l2chan_errors | errors/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_in_resource_errors | errors/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_in_qdrops | drops/s | x | x |
    | junos.system.linecard.interface.ingress_errors.if_in_l2_mismatch_timeouts | packets/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_1sec_pkts | packets/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_octets | octets/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_mc_pkts | packets/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_bc_pkts | packets/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_1sec_octets | octets/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_error | errors/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_pkts | packets/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_uc_pkts | packets/s | x | x |
    | junos.system.linecard.interface.ingress_stats.if_pause_pkts | packets/s | x | x |

    Table 7 lists the metrics available per interface queue with JTI network device monitoring.

    Table 7: Metrics Available per Interface Queue with JTI Network Device Monitoring

    | Metric | Unit | Chart | Alarm |
    |---|---|---|---|
    | junos.system.linecard.interface.egress_queue_info.bytes | bytes/s | x | x |
    | junos.system.linecard.interface.egress_queue_info.packets | packets/s | x | x |
    | junos.system.linecard.interface.egress_queue_info.allocated_buffer_size | bytes | x | x |
    | junos.system.linecard.interface.egress_queue_info.avg_buffer_occupancy | bytes | x | x |
    | junos.system.linecard.interface.egress_queue_info.cur_buffer_occupancy | bytes | x | x |
    | junos.system.linecard.interface.egress_queue_info.peak_buffer_occupancy | bytes | x | x |
    | junos.system.linecard.interface.egress_queue_info.red_drop_bytes | bytes/s | x | x |
    | junos.system.linecard.interface.egress_queue_info.red_drop_packets | packets/s | x | x |
    | junos.system.linecard.interface.egress_queue_info.rl_drop_bytes | bytes/s | x | x |
    | junos.system.linecard.interface.egress_queue_info.rl_drop_packets | packets/s | x | x |
    | junos.system.linecard.interface.egress_queue_info.tail_drop_packets | packets/s | x | x |

    OpenContrail vRouter on a Host

    Table 8 lists raw metrics available for an OpenContrail vRouter on a host.

    Table 8: Raw Metrics for OpenContrail vRouter

    | Metric | Chart | Alarm |
    |---|---|---|
    | plugin.contrail.vrouter.aged_flows | x | x |
    | plugin.contrail.vrouter.total_flows | x | x |
    | plugin.contrail.vrouter.exception_packets | x | x |
    | plugin.contrail.vrouter.drop_stats_flow_queue_limit_exceeded | x | x |
    | plugin.contrail.vrouter.drop_stats_flow_table_full | x | x |
    | plugin.contrail.vrouter.drop_stats_vlan_fwd_enq | x | x |
    | plugin.contrail.vrouter.drop_stats_vlan_fwd_tx | x | x |
    | plugin.contrail.vrouter.flow_export_drops | x | x |
    | plugin.contrail.vrouter.flow_export_sampling_drops | x | x |
    | plugin.contrail.vrouter.flow_rate_active_flows | x | x |
    | plugin.contrail.vrouter.flow_rate_added_flows | x | x |
    | plugin.contrail.vrouter.flow_rate_deleted_flows | x | x |

    OpenStack Project in Chart View

    Table 9 lists the raw metrics available in the OpenStack Project Chart View.

    Table 9: Raw Metrics for OpenStack Project

    Metric

    Chart

    Alarm

    openstack.project.active_instances

    x

    x

    openstack.project.vcpus_allocated

    x

    x

    openstack.project.volume_storage_allocated

    x

    x

    openstack.project.memory_allocated

    x

    x

    openstack.project.floating_ip_count

    x

    openstack.project.security_group_count

    x

    x

    openstack.project.volume_count

    x

    x

    ScaleIO Service

    Table 10 lists the raw metrics available for ScaleIO monitoring.

    Table 10: Raw Metrics for ScaleIO Monitoring

    | Metric | Unit | Chart | Alarm |
    |---|---|---|---|
    | numOfDevices | count | x | x |
    | numOfProtectionDomains | count | x | x |
    | numOfSdc | count | x | x |
    | numOfSds | count | x | x |
    | numOfStoragePools | count | x | x |
    | numOfVtrees | count | x | x |
    | numOfSnapshots | count | x | x |
    | numOfVolumes | count | x | x |
    | numOfThickBaseVolumes | count | x | x |
    | numOfThinBaseVolumes | count | x | x |
    | numOfVolumesInDeletion | count | x | x |
    | numOfMappedToAllVolumes | count | x | x |
    | numOfUnmappedVolumes | count | x | x |
    | capacityAvailableForVolumeAllocationInKb | Kbyte | x | x |
    | capacityInUseInKb | Kbyte | x | x |
    | capacityLimitInKb | Kbyte | x | x |
    | unusedCapacityInKb | Kbyte | x | x |
    | spareCapacityInKb | Kbyte | x | x |
    | protectedCapacityInKb | Kbyte | x | x |
    | maxCapacityInKb | Kbyte | x | x |
    | snapCapacityInUseInKb | Kbyte | x | x |
    | thickCapacityInUseInKb | Kbyte | x | x |
    | thinCapacityInUseInKb | Kbyte | x | x |
    | bckRebuildReadBandwidth | Kbyte/sec | x | x |
    | bckRebuildWriteBandwidth | Kbyte/sec | x | x |
    | fwdRebuildReadBandwidth | Kbyte/sec | x | x |
    | fwdRebuildWriteBandwidth | Kbyte/sec | x | x |
    | normRebuildReadBandwidth | Kbyte/sec | x | x |
    | normRebuildWriteBandwidth | Kbyte/sec | x | x |
    | primaryReadBandwidth | Kbyte/sec | x | x |
    | primaryWriteBandwidth | Kbyte/sec | x | x |
    | rebalanceReadBandwidth | Kbyte/sec | x | x |
    | rebalanceWriteBandwidth | Kbyte/sec | x | x |
    | secondaryReadBandwidth | Kbyte/sec | x | x |
    | secondaryWriteBandwidth | Kbyte/sec | x | x |
    | totalReadBandwidth | Kbyte/sec | x | x |
    | totalWriteBandwidth | Kbyte/sec | x | x |
    | bckRebuildReadIops | IOPS | x | x |
    | bckRebuildWriteIops | IOPS | x | x |
    | fwdRebuildReadIops | IOPS | x | x |
    | fwdRebuildWriteIops | IOPS | x | x |
    | normRebuildReadIops | IOPS | x | x |
    | normRebuildWriteIops | IOPS | x | x |
    | primaryReadIops | IOPS | x | x |
    | primaryWriteIops | IOPS | x | x |
    | rebalanceReadIops | IOPS | x | x |
    | rebalanceWriteIops | IOPS | x | x |
    | secondaryReadIops | IOPS | x | x |
    | secondaryWriteIops | IOPS | x | x |
    | totalReadIops | IOPS | x | x |
    | totalWriteIops | IOPS | x | x |
    | bckRebuildReadIosize | Kbyte | x | x |
    | bckRebuildWriteIosize | Kbyte | x | x |
    | fwdRebuildReadIosize | Kbyte | x | x |
    | fwdRebuildWriteIosize | Kbyte | x | x |
    | normRebuildReadIosize | Kbyte | x | x |
    | normRebuildWriteIosize | Kbyte | x | x |
    | primaryReadIosize | Kbyte | x | x |
    | primaryWriteIosize | Kbyte | x | x |
    | rebalanceReadIosize | Kbyte | x | x |
    | rebalanceWriteIosize | Kbyte | x | x |
    | secondaryReadIosize | Kbyte | x | x |
    | secondaryWriteIosize | Kbyte | x | x |
    | totalReadIosize | Kbyte | x | x |
    | totalWriteIosize | Kbyte | x | x |

    RabbitMQ Service

    Table 11 lists the raw metrics available for RabbitMQ monitoring.

    Table 11: Raw Metrics for RabbitMQ Monitoring

    Metric

    Unit

    Chart

    Alarm

    rabbit.cluster.connection_totals.blocked_connections

    count

    x

    x

    rabbit.cluster.connection_totals.blocked_connections_details

    messages/s

    x

    x

    rabbit.cluster.message_stats.ack

    count

    x

    x

    rabbit.cluster.message_stats.ack_details

    messages/s

    x

    x

    rabbit.cluster.message_stats.deliver

    count

    x

    x

    rabbit.cluster.message_stats.deliver_details

    messages/s

    x

    x

    rabbit.cluster.message_stats.deliver_get

    count

    x

    x

    rabbit.cluster.message_stats.deliver_get_details

    messages/s

    x

    x

    rabbit.cluster.message_stats.get

    count

    x

    x

    rabbit.cluster.message_stats.get_details

    messages/s

    x

    x

    rabbit.cluster.message_stats.publish

    count

    x

    x

    rabbit.cluster.message_stats.publish_details

    messages/s

    x

    x

    rabbit.cluster.message_stats.redeliver

    count

    x

    x

    rabbit.cluster.message_stats.redeliver_details

    messages/s

    x

    x

    rabbit.cluster.object_totals.channels

    count

    x

    x

    rabbit.cluster.object_totals.connections

    count

    x

    x

    rabbit.cluster.object_totals.consumers

    count

    x

    x

    rabbit.cluster.object_totals.exchanges

    count

    x

    x

    rabbit.cluster.object_totals.queues

    count

    x

    x

    rabbit.cluster.queue_totals.blocked_queues

    count

    x

    x

    rabbit.cluster.queue_totals.blocked_queues_details

    messages/s

    x

    x

    rabbit.cluster.queue_totals.consumer_utilisation_percent

    count

    x

    x

    rabbit.cluster.queue_totals.messages

    count

    x

    x

    rabbit.cluster.queue_totals.messages_details

    messages/s

    x

    x

    rabbit.cluster.queue_totals.messages_ready

    count

    x

    x

    rabbit.cluster.queue_totals.messages_ready_details

    messages/s

    x

    x

    rabbit.cluster.queue_totals.messages_unacknowledged

    count

    x

    x

    rabbit.cluster.queue_totals.messages_unacknowledged_details

    messages/s

    x

    x

    rabbit.queue.consumers

    count

    x

    rabbit.queue.consumer_utilisation

    count

    x

    rabbit.queue.messages

    count

    x

    rabbit.queue.messages_ready

    count

    x

    rabbit.queue.messages_ready_detail

    count

    x

    rabbit.queue.memory

    count

    x

    rabbit.queue.messages_detail

    count

    x

    rabbit.queue.messages_unacknowledged

    count

    x

    rabbit.queue.messages_unacknowledged_detail

    count

    x

    rabbit.queue.state

    count

    x

    rabbit.node.sockets_total

    count

    x

    x

    rabbit.node.fd_total

    count

    x

    x

    rabbit.node.sockets_used_percent

    count

    x

    x

    rabbit.node.run_queue

    count

    x

    x

    rabbit.node.proc_used_percent

    count

    x

    x

    rabbit.node.proc_total

    count

    x

    x

    rabbit.node.mem_used_percent

    count

    x

    x

    rabbit.node.uptime

    count

    x

    x

    rabbit.node.disk_usage_ratio

    count

    x

    x

    rabbit.node.disk_free_alarm

    count

    x

    x

    rabbit.node.fd_used_percent

    count

    x

    x

    rabbit.node.mem_limit

    count

    x

    x

    rabbit.node.mem_alarm

    count

    x

    x

    rabbit.node.disk_free

    count

    x

    x

    rabbit.node.sockets_used

    count

    x

    x

    rabbit.node.processors

    count

    x

    x

    rabbit.node.running

    count

    x

    x

    rabbit.node.disk_free_limit

    count

    x

    x

    rabbit.node.fd_used

    count

    x

    x

    rabbit.node.proc_used

    count

    x

    x

    rabbit.node.mem_used

    count

    x

    x

    rabbit.node.heartbeat

    count

    x

    x

    rabbit.node.latency

    count

    x

    x

    Modified: 2018-05-23