OpenNMS / NMS-7839

Counter variables reported as strings (like Net-SNMP extend) are not stored properly when using RRDtool

    Details

      Description

      Here is how to reproduce the problem.

      On my CentOS 6.7 VM running Meridian 2015.1.0, I configured the local snmpd.conf with this:

      /etc/snmp/snmpd.conf
      extend sample /etc/snmp/counter.pl
      

      The script that simulates the counter is very simple:

      /etc/snmp/counter.pl
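      #!/usr/bin/perl
      
      use strict;
      use warnings;
      
      # Simulated counter: the last value is persisted between runs.
      my $data = 0;
      my $source_file = "/tmp/.counter.data";
      
      if (-e $source_file) {
         open(my $in, '<', $source_file) or die "Can't read data from $source_file\n";
         $data = <$in>;
         close $in;
      }
      
      # Add a random increment in the range 0..99 and report the new value.
      $data += int(rand(100));
      print $data, "\n";
      
      open(my $out, '>', $source_file) or die "Can't write data to $source_file\n";
      print $out $data;
      close $out;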
      

      Each time you execute the script, it returns a number at least as large as the previous one:

      [agalue@centos6srv ~]$  /etc/snmp/counter.pl 
      403
      [agalue@centos6srv ~]$  /etc/snmp/counter.pl 
      480
      [agalue@centos6srv ~]$  /etc/snmp/counter.pl 
      562
      

      In OpenNMS, I have a file called /opt/opennms/etc/datacollection/sample.xml with the following content:

      /opt/opennms/etc/datacollection/sample.xml
      <datacollection-group name="Sample">
       <group name="sample" ifType="ignore">
         <mibObj oid=".1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101" instance="1" alias="sample" type="counter"/>
       </group>
       <systemDef name="Net-SNMP Counter Sample">
         <sysoidMask>.1.3.6.1.4.1.8072.3.2.</sysoidMask>
         <collect>
           <includeGroup>sample</includeGroup>
         </collect>
       </systemDef>
      </datacollection-group>
      

      Of course, I've added a reference to it in datacollection-config.xml:

      [agalue@centos6srv ~]$ grep Sample /opt/opennms/etc/datacollection-config.xml 
           <include-collection dataCollectionGroup="Sample"/>
      

      Now, let's query the OID using snmp-request, a clone of the Net-SNMP tools based on SNMP4J (I use it instead of snmpwalk because I find SNMP4J's implementation of the SNMP protocol better):

      [agalue@centos6srv ~]$ /opt/opennms/bin/snmp-request -v 2c -c public -Ow localhost .1.3.6.1.4.1.8072.1.3.2 | grep 6.115.97.109.112.108.101
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      1.3.6.1.4.1.8072.1.3.2.2.1.2.6.115.97.109.112.108.101 = /etc/snmp/counter.pl
      1.3.6.1.4.1.8072.1.3.2.2.1.3.6.115.97.109.112.108.101 = 
      1.3.6.1.4.1.8072.1.3.2.2.1.4.6.115.97.109.112.108.101 = 
      1.3.6.1.4.1.8072.1.3.2.2.1.5.6.115.97.109.112.108.101 = 5
      1.3.6.1.4.1.8072.1.3.2.2.1.6.6.115.97.109.112.108.101 = 1
      1.3.6.1.4.1.8072.1.3.2.2.1.7.6.115.97.109.112.108.101 = 1
      1.3.6.1.4.1.8072.1.3.2.2.1.20.6.115.97.109.112.108.101 = 4
      1.3.6.1.4.1.8072.1.3.2.2.1.21.6.115.97.109.112.108.101 = 1
      1.3.6.1.4.1.8072.1.3.2.3.1.1.6.115.97.109.112.108.101 = 1102
      1.3.6.1.4.1.8072.1.3.2.3.1.2.6.115.97.109.112.108.101 = 1102
      1.3.6.1.4.1.8072.1.3.2.3.1.3.6.115.97.109.112.108.101 = 1
      1.3.6.1.4.1.8072.1.3.2.3.1.4.6.115.97.109.112.108.101 = 0
      1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101.1 = 1102
      

      The last entry is the one referenced in the datacollection-group: the OID of the mibObj followed by instance 1.
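
The numeric suffix 6.115.97.109.112.108.101 in these OIDs is the extend name "sample" encoded as a string index: the length (6) followed by the ASCII code of each character. The trailing .1 appears to be the index of the output line. A minimal sketch of that encoding follows; the class and method names are hypothetical and not part of OpenNMS or Net-SNMP.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class ExtendOidSketch {
    // Encodes a name as its length followed by the ASCII code of each character,
    // e.g. "sample" becomes "6.115.97.109.112.108.101".
    static String encodeIndex(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.US_ASCII);
        StringBuilder sb = new StringBuilder().append(bytes.length);
        for (byte b : bytes) {
            sb.append('.').append(b & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Column prefix copied from the mibObj definition in sample.xml.
        String column = ".1.3.6.1.4.1.8072.1.3.2.4.1.2";
        // Append the encoded name and the instance used in the mibObj.
        System.out.println(column + "." + encodeIndex("sample") + ".1");
    }
}
```

The same helper can be used to build the mibObj OID for an extend entry with a different name.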

      Now, let's see how Collectd is handling that data:

      [agalue@centos6srv logs]$ grep "Visiting attribute.*sample" collectd.log 
      2015-08-13 11:14:51,886 DEBUG [Collectd-Thread-2-of-50] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute node[1].sample [.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101] = 1559
      2015-08-13 11:15:23,005 DEBUG [Collectd-Thread-3-of-50] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute node[1].sample [.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101] = 1625
      2015-08-13 11:15:54,132 DEBUG [Collectd-Thread-4-of-50] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute node[1].sample [.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101] = 1654
      2015-08-13 11:16:24,396 DEBUG [Collectd-Thread-5-of-50] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute node[1].sample [.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101] = 1705
      
      [agalue@centos6srv logs]$ grep "updating RRD.*sample" collectd.log 
      2015-08-13 11:14:51,932 INFO  [Collectd-Thread-2-of-50] o.o.n.r.RrdUtils: updateRRD: updating RRD file /opt/opennms/share/rrd/snmp/1/sample.rrd with values '1439478892:1559.0'
      2015-08-13 11:15:23,006 INFO  [Collectd-Thread-3-of-50] o.o.n.r.RrdUtils: updateRRD: updating RRD file /opt/opennms/share/rrd/snmp/1/sample.rrd with values '1439478923:1625.0'
      2015-08-13 11:15:54,132 INFO  [Collectd-Thread-4-of-50] o.o.n.r.RrdUtils: updateRRD: updating RRD file /opt/opennms/share/rrd/snmp/1/sample.rrd with values '1439478954:1654.0'
      2015-08-13 11:16:24,396 INFO  [Collectd-Thread-5-of-50] o.o.n.r.RrdUtils: updateRRD: updating RRD file /opt/opennms/share/rrd/snmp/1/sample.rrd with values '1439478984:1705.0'
      

      As you can see, the values are passed to RRDtool as floating-point numbers (i.e. numbers with a decimal part), which is not valid for counters in RRDtool.

      As a result, the values are not being stored in the RRD file:

      [agalue@centos6srv logs]$ /opt/opennms/bin/rrdtool dump /opt/opennms/share/rrd/snmp/1/sample.rrd | grep "2015-08-13 11:1[456]"
      			<!-- 2015-08-13 11:14:00 EDT / 1439478840 --> <row><v>NaN</v></row>
      			<!-- 2015-08-13 11:14:30 EDT / 1439478870 --> <row><v>NaN</v></row>
      			<!-- 2015-08-13 11:15:00 EDT / 1439478900 --> <row><v>NaN</v></row>
      			<!-- 2015-08-13 11:15:30 EDT / 1439478930 --> <row><v>NaN</v></row>
      			<!-- 2015-08-13 11:16:00 EDT / 1439478960 --> <row><v>NaN</v></row>
      			<!-- 2015-08-13 11:16:30 EDT / 1439478990 --> <row><v>NaN</v></row>
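
For comparison, a counter data source in RRDtool stores the per-second rate of change between consecutive updates rather than the raw value. The sketch below estimates the rates these rows would be expected to show had the updates been accepted. It uses the timestamps and values from the updateRRD log lines, and it ignores counter wrap-around and RRDtool's interpolation onto step boundaries, so the numbers are approximations. The class name is hypothetical.

```java
// Hypothetical helper for illustration only.
class CounterRateSketch {
    // Per-second rate between two counter samples (no wrap handling).
    static double rate(long t0, long v0, long t1, long v1) {
        return (double) (v1 - v0) / (t1 - t0);
    }

    public static void main(String[] args) {
        // Timestamps and values copied from the updateRRD log lines.
        long[] ts = {1439478892L, 1439478923L, 1439478954L, 1439478984L};
        long[] vs = {1559L, 1625L, 1654L, 1705L};
        for (int i = 1; i < ts.length; i++) {
            System.out.printf("%d -> %.3f/s%n",
                    ts[i], rate(ts[i - 1], vs[i - 1], ts[i], vs[i]));
        }
    }
}
```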
      

      Looking at the source code, I found the following:

      org.opennms.netmgt.collectd.SnmpAttribute
          @Override
          public String getNumericValue() {
              if (getValue() == null) {
                  LOG.debug("No data collected for attribute {}. Skipping", this);
                  return null;
              } else if (getValue().isNumeric()) {
                  return Long.toString(getValue().toLong());
              } else {
                  // Check to see if this is a 63-bit counter packed into an octetstring
                  Long value = SnmpUtils.getProtoCounter63Value(getValue());
                  if (value != null) {
                      return value.toString();
                  }
      
                  try {
                      return Double.valueOf(getValue().toString()).toString();
                  } catch(NumberFormatException e) {
                      LOG.trace("Unable to process data received for attribute {} maybe this is not a number? See bug 1473 for more information. Skipping.", this);
                      if (getValue().getType() == SnmpValue.SNMP_OCTET_STRING) {
                          try {
                              return Long.valueOf(getValue().toHexString(), 16).toString();
                          } catch(NumberFormatException ex) {
                              LOG.trace("Unable to process data received for attribute {} maybe this is not a number? See bug 1473 for more information. Skipping.", this);
                          }
                      }
                  }
                  return null;
              }
          }
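
The two conversion paths in getNumericValue() produce different strings for the same input. The standalone sketch below mimics the Long branch and the Double fallback for a value received as text, assuming the Counter63 check does not match. The class and method names are hypothetical; only the JDK calls are real.

```java
// Hypothetical helper for illustration only.
class ConversionSketch {
    // Mimics the fallback branch: parse the text as a double, then print it.
    static String viaDouble(String raw) {
        return Double.valueOf(raw).toString();
    }

    // Mimics the isNumeric() branch: the value is rendered as a long.
    static String viaLong(String raw) {
        return Long.toString(Long.parseLong(raw));
    }

    public static void main(String[] args) {
        System.out.println(viaDouble("1559"));     // 1559.0
        System.out.println(viaLong("1559"));       // 1559
        // Double.toString switches to scientific notation from 10^7 upwards.
        System.out.println(viaDouble("12345678")); // 1.2345678E7
    }
}
```

The first output matches the '1559.0' seen in the updateRRD log line for the same sample.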
      

      The key element here is how isNumeric is implemented:

      org.opennms.netmgt.snmp.snmp4j.Snmp4JValue
          public boolean isNumeric() {
              switch (m_value.getSyntax()) {
              case SMIConstants.SYNTAX_INTEGER:
              case SMIConstants.SYNTAX_COUNTER32:
              case SMIConstants.SYNTAX_COUNTER64:
              case SMIConstants.SYNTAX_TIMETICKS:
              case SMIConstants.SYNTAX_UNSIGNED_INTEGER32:
                  return true;
              default:
                  return false;
              }
          }
      

      Because the extend feature of Net-SNMP returns the value as a string, isNumeric returns false, so the value is converted into a double, which is not correct in this case.
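
One possible direction, shown here only as a sketch and not as the change actually made in OpenNMS, is to try parsing the string as a whole number before falling back to a double. That way integer-valued strings keep an integer representation. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration only.
class NumericStringSketch {
    // Returns an integer string when the text is a whole number, a decimal
    // string when it is another kind of number, and null when it is not numeric.
    static String toNumericString(String raw) {
        if (raw == null) {
            return null;
        }
        String text = raw.trim();
        try {
            return Long.toString(Long.parseLong(text));
        } catch (NumberFormatException e) {
            // Not a plain integer; try a floating-point parse next.
        }
        try {
            return Double.valueOf(text).toString();
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toNumericString("1559")); // 1559
        System.out.println(toNumericString("12.5")); // 12.5
        System.out.println(toNumericString("abc"));  // null
    }
}
```

With this ordering, gauges reported as decimal strings would still be handled by the double fallback, while counters reported as integer strings would no longer gain a trailing ".0".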

            People

            • Assignee:
              agalue Alejandro Galue
              Reporter:
              agalue Alejandro Galue