  OpenNMS / NMS-4244

threshd processes wrong counter-type SNMP data after SNMP data collection fails or is restored

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.7
    • Fix Version/s: 1.8.13, 1.9.90, 1.11.0
    • Component/s: Thresholding
    • Security Level: Default (Default Security Scheme)
    • Labels:
      None
    • Environment:
      CentOS

      Description

      When SNMP data collection fails or is restored, threshd evaluates thresholds against incorrect counter-type SNMP data.

      For example:
      I set up a Relativechange threshold to monitor the uplink bandwidth (dsname=ifHCOutOctets/ifHCInOctets).

      It works fine except for one problem. When SNMP data collection fails, as in this log message:

      SNMP data collection on interface 158.205.192.19 failed with 'Timeout retrieving SnmpCollectors for 158.205.192.19 for 158.205.192.19/158.205.192.19: SnmpCollectors for 158.205.192.19: snmpTimeoutError for: 158.205.192.19/158.205.192.19'

      I get a Relativechange threshold notification like this:

      Relative change exceeded for SNMP datasource (ifHCOutOctets*8 - ((ifHCOutOctets *8) % 100000)) / 1000000 + 0.01 on interface 158.205.192.19, parms: ifLabel="Te3_4-00169c046400" ifIndex="32" ifIpAddress="158.205.134.6" label="Te3/4" ds="(ifHCOutOctets*8 - ((ifHCOutOctets *8) % 100000)) / 1000000 + 0.01" value="4044.01" instance="32" previousValue="2027.51" multiplier="1.2"

      The value is roughly doubled because OpenNMS calculates it against the sample before the last one: the counter delta spans two polling intervals but is still divided by a single interval. When SNMP data collection is restored, I get another notification like this:

      Relative change exceeded for SNMP datasource (ifHCOutOctets*8 - ((ifHCOutOctets *8) % 100000)) / 1000000 +0.01 on interface 158.205.192.19, parms: ifLabel="Te3_4-00169c046400" ifIndex="32" ifIpAddress="158.205.134.6" label="Te3/4" ds="(ifHCOutOctets*8 - ((ifHCOutOctets *8) % 100000)) / 1000000 +0.01" value="2060.01" instance="32" previousValue="4044.01" multiplier="0.8"
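Both notifications are consistent with a relative-change rule of the following form. This is a sketch under an assumed rule, not the OpenNMS evaluator: a multiplier above 1 fires when the new value rises above `previousValue * multiplier`, and a multiplier below 1 fires when it falls below that bound. `RelativeChangeCheck` and `exceeded` are hypothetical names.

```java
/**
 * Hypothetical sketch of relative-change evaluation (not OpenNMS code).
 * Assumed rule: multiplier > 1 fires on a rise above previous * multiplier,
 * multiplier < 1 fires on a drop below previous * multiplier.
 */
public class RelativeChangeCheck {

    // Returns true if 'current' crosses the bound derived from 'previous'.
    static boolean exceeded(double previous, double current, double multiplier) {
        double bound = previous * multiplier;
        if (multiplier < 1.0) {
            return current < bound;   // e.g. drop below 80% of the previous value
        }
        return current > bound;       // e.g. rise above 120% of the previous value
    }

    public static void main(String[] args) {
        // value/previousValue/multiplier copied from the two notifications.
        System.out.println(exceeded(2027.51, 4044.01, 1.2)); // prints true (spike after the failed poll)
        System.out.println(exceeded(4044.01, 2060.01, 0.8)); // prints true (drop after collection is restored)
    }
}
```

Under this rule, a single doubled sample produces two notifications: one when it arrives and one when the next correct sample is compared against it.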

      I think this is a bug. Before evaluating thresholds, threshd should divide the difference between two counter-type samples (such as ifHCOutOctets or ifHCInOctets) by the time actually elapsed between them.
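The difference between the two divisors can be shown with plain arithmetic. This sketch assumes a hypothetical constant traffic rate and exactly one timed-out poll between two successful samples. `CounterRateExample`, `rateFixedInterval`, and `rateElapsed` are illustrative names, and counter wrap is not handled.

```java
public class CounterRateExample {

    // Per-second rate using a fixed divisor (the configured polling interval).
    static double rateFixedInterval(long previous, long current, long intervalMillis) {
        return (current - previous) / (intervalMillis / 1000.0);
    }

    // Per-second rate using the time actually elapsed between the two samples.
    static double rateElapsed(long previous, long previousMillis, long current, long currentMillis) {
        return (current - previous) / ((currentMillis - previousMillis) / 1000.0);
    }

    public static void main(String[] args) {
        long interval = 300_000L;            // interval="300000" (milliseconds)
        long bytesPerSecond = 250_000_000L;  // hypothetical constant traffic rate

        // Successful samples at t=0 s and t=600 s; the poll at t=300 s timed out,
        // so the cached previous value is two intervals old.
        long c0 = 0L;
        long c2 = bytesPerSecond * 600;

        System.out.println(rateFixedInterval(c0, c2, interval)); // prints 5.0E8 (twice the real rate)
        System.out.println(rateElapsed(c0, 0L, c2, 600_000L));   // prints 2.5E8 (the real rate)
    }
}
```

With one missed poll, the fixed divisor reports twice the real rate, which matches the roughly doubled value (4044.01 vs. 2027.51) in the first notification.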

      I also found something interesting in the source code in git:

      src/main/java/org/opennms/netmgt/threshd/CollectionResourceWrapper.java

      Counter-type SNMP data is processed as follows:

      private Double getCounterValue(String id, Double current) {
      ...
      return m_localCache.get(id) / m_interval;
      ...
      }

      and m_interval is a constant taken from the service interval in threshd-configuration.xml:

      <service name="SNMP" interval="300000" user-defined="false" status="on">
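One possible direction, sketched here under the assumption that the collection time of each sample is available to threshd, is to cache a timestamp alongside each counter value and divide by the elapsed time instead of a constant like m_interval. `CounterCache` and `getCounterRate` are hypothetical names, not OpenNMS classes. Skipping evaluation when the counter goes backwards is also an assumed policy, and the fix shipped in the listed Fix Versions may work differently.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not OpenNMS code): keep each counter's last value with
 * its collection time and divide the delta by the real elapsed time.
 */
public class CounterCache {

    // Last sample for one counter id: raw value plus collection time in milliseconds.
    private static final class Sample {
        final double value;
        final long timestampMillis;

        Sample(double value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    private final Map<String, Sample> lastSamples = new HashMap<>();

    /**
     * Stores the new sample and returns the per-second rate since the previous one,
     * or null when there is no usable previous sample.
     */
    public Double getCounterRate(String id, double current, long nowMillis) {
        Sample previous = lastSamples.put(id, new Sample(current, nowMillis));
        if (previous == null) {
            return null;  // first sample: nothing to compare against yet
        }
        long elapsedMillis = nowMillis - previous.timestampMillis;
        double delta = current - previous.value;
        if (elapsedMillis <= 0 || delta < 0) {
            return null;  // assumed policy: skip evaluation (counter wrap/reset not handled)
        }
        return delta / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        CounterCache cache = new CounterCache();
        cache.getCounterRate("ifHCOutOctets", 0.0, 0L);  // t=0 s
        // The poll at t=300 s timed out, so the next sample arrives at t=600 s.
        System.out.println(cache.getCounterRate("ifHCOutOctets", 1.5e11, 600_000L)); // prints 2.5E8
    }
}
```

Because the divisor comes from the stored timestamps, a missed poll lengthens the elapsed time instead of inflating the rate, so neither the spike nor the follow-up drop would be reported.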


            People

            • Assignee: Alejandro Galue (agalue)
            • Reporter: Zign Zhao (zign)
            • Votes: 0
            • Watchers: 2
