Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 14.0.3, Meridian-2015.1.0, 15.0.2, 16.0.2
Fix Version/s: 16.0.3, Meridian-2015.1.1, 17.0.0
Component/s: Data Output - RRD
Security Level: Default (Default Security Scheme)
Labels:
Description
Here is how to reproduce the problem.
On my CentOS 6.7 VM running Meridian 2015.1.0, I configured the local snmpd.conf with this:
extend sample /etc/snmp/counter.pl
The script that simulates the counter is very simple:
#!/usr/bin/perl
use strict;
my $data = 0;
my $source_file = "/tmp/.counter.data";
if (-e $source_file) {
    open READ, $source_file;
    $data = <READ>;
    close READ;
}
$data += int(rand(100));
print $data, "\n";
open WRITE, ">$source_file" or die "Can't write data on $source_file\n";
print WRITE $data;
close WRITE;
If you execute the script it will always return an increasing number:
[agalue@centos6srv ~]$ /etc/snmp/counter.pl
403
[agalue@centos6srv ~]$ /etc/snmp/counter.pl
480
[agalue@centos6srv ~]$ /etc/snmp/counter.pl
562
In OpenNMS, I have a file called /opt/opennms/etc/datacollection/sample.xml with the following content:
<datacollection-group name="Sample">
  <group name="sample" ifType="ignore">
    <mibObj oid=".1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101" instance="1" alias="sample" type="counter"/>
  </group>
  <systemDef name="Net-SNMP Counter Sample">
    <sysoidMask>.1.3.6.1.4.1.8072.3.2.</sysoidMask>
    <collect>
      <includeGroup>sample</includeGroup>
    </collect>
  </systemDef>
</datacollection-group>
Of course, I've added a reference to it in datacollection-config.xml:
[agalue@centos6srv ~]$ grep Sample /opt/opennms/etc/datacollection-config.xml
<include-collection dataCollectionGroup="Sample"/>
Now, let's try the OID using snmp-request (I use snmp-request, a clone of Net-SNMP's tools based on SNMP4J, because the snmpwalk command is not as good as SNMP4J in terms of the SNMP Protocol implementation):
[agalue@centos6srv ~]$ /opt/opennms/bin/snmp-request -v 2c -c public -Ow localhost .1.3.6.1.4.1.8072.1.3.2 | grep 6.115.97.109.112.108.101
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
1.3.6.1.4.1.8072.1.3.2.2.1.2.6.115.97.109.112.108.101 = /etc/snmp/counter.pl
1.3.6.1.4.1.8072.1.3.2.2.1.3.6.115.97.109.112.108.101 =
1.3.6.1.4.1.8072.1.3.2.2.1.4.6.115.97.109.112.108.101 =
1.3.6.1.4.1.8072.1.3.2.2.1.5.6.115.97.109.112.108.101 = 5
1.3.6.1.4.1.8072.1.3.2.2.1.6.6.115.97.109.112.108.101 = 1
1.3.6.1.4.1.8072.1.3.2.2.1.7.6.115.97.109.112.108.101 = 1
1.3.6.1.4.1.8072.1.3.2.2.1.20.6.115.97.109.112.108.101 = 4
1.3.6.1.4.1.8072.1.3.2.2.1.21.6.115.97.109.112.108.101 = 1
1.3.6.1.4.1.8072.1.3.2.3.1.1.6.115.97.109.112.108.101 = 1102
1.3.6.1.4.1.8072.1.3.2.3.1.2.6.115.97.109.112.108.101 = 1102
1.3.6.1.4.1.8072.1.3.2.3.1.3.6.115.97.109.112.108.101 = 1
1.3.6.1.4.1.8072.1.3.2.3.1.4.6.115.97.109.112.108.101 = 0
1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101.1 = 1102
The last entry is the OID configured in the datacollection-group, with the instance (.1) appended.
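The trailing 6.115.97.109.112.108.101 in these OIDs is the extend name "sample" encoded as a table index: the string length (6) followed by the ASCII code of each character. A minimal Java sketch of that encoding follows; the class and method names are illustrative only and are not part of OpenNMS or SNMP4J.

```java
import java.nio.charset.StandardCharsets;

final class ExtendOidIndex {
    // Encodes a name as a length-prefixed sequence of ASCII codes,
    // the form seen in the walk output above for the "sample" entry.
    static String encodeIndex(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.US_ASCII);
        StringBuilder sb = new StringBuilder().append(bytes.length);
        for (byte b : bytes) {
            sb.append('.').append(b & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Column OID copied from the mibObj definition in sample.xml.
        String column = ".1.3.6.1.4.1.8072.1.3.2.4.1.2";
        System.out.println(column + "." + encodeIndex("sample"));
    }
}
```

Running it prints the same OID used in the mibObj element, which is a quick way to derive the OID for an extend entry with a different name.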
Now, let's see how Collectd is handling that data:
[agalue@centos6srv logs]$ grep "Visiting attribute.*sample" collectd.log
2015-08-13 11:14:51,886 DEBUG [Collectd-Thread-2-of-50] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute node[1].sample [.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101] = 1559
2015-08-13 11:15:23,005 DEBUG [Collectd-Thread-3-of-50] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute node[1].sample [.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101] = 1625
2015-08-13 11:15:54,132 DEBUG [Collectd-Thread-4-of-50] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute node[1].sample [.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101] = 1654
2015-08-13 11:16:24,396 DEBUG [Collectd-Thread-5-of-50] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute node[1].sample [.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.115.97.109.112.108.101] = 1705
[agalue@centos6srv logs]$ grep "updating RRD.*sample" collectd.log
2015-08-13 11:14:51,932 INFO [Collectd-Thread-2-of-50] o.o.n.r.RrdUtils: updateRRD: updating RRD file /opt/opennms/share/rrd/snmp/1/sample.rrd with values '1439478892:1559.0'
2015-08-13 11:15:23,006 INFO [Collectd-Thread-3-of-50] o.o.n.r.RrdUtils: updateRRD: updating RRD file /opt/opennms/share/rrd/snmp/1/sample.rrd with values '1439478923:1625.0'
2015-08-13 11:15:54,132 INFO [Collectd-Thread-4-of-50] o.o.n.r.RrdUtils: updateRRD: updating RRD file /opt/opennms/share/rrd/snmp/1/sample.rrd with values '1439478954:1654.0'
2015-08-13 11:16:24,396 INFO [Collectd-Thread-5-of-50] o.o.n.r.RrdUtils: updateRRD: updating RRD file /opt/opennms/share/rrd/snmp/1/sample.rrd with values '1439478984:1705.0'
As you can see, Collectd reads whole numbers (for example 1559) but the update is written as a floating-point number (1559.0), which is not a valid update for a counter in RRDtool.
As a result, the values are not stored in the RRD file:
[agalue@centos6srv logs]$ /opt/opennms/bin/rrdtool dump /opt/opennms/share/rrd/snmp/1/sample.rrd | grep "2015-08-13 11:1[456]"
<!-- 2015-08-13 11:14:00 EDT / 1439478840 --> <row><v>NaN</v></row>
<!-- 2015-08-13 11:14:30 EDT / 1439478870 --> <row><v>NaN</v></row>
<!-- 2015-08-13 11:15:00 EDT / 1439478900 --> <row><v>NaN</v></row>
<!-- 2015-08-13 11:15:30 EDT / 1439478930 --> <row><v>NaN</v></row>
<!-- 2015-08-13 11:16:00 EDT / 1439478960 --> <row><v>NaN</v></row>
<!-- 2015-08-13 11:16:30 EDT / 1439478990 --> <row><v>NaN</v></row>
Looking at the source code, I found the following:
@Override
public String getNumericValue() {
    if (getValue() == null) {
        LOG.debug("No data collected for attribute {}. Skipping", this);
        return null;
    } else if (getValue().isNumeric()) {
        return Long.toString(getValue().toLong());
    } else {
        // Check to see if this is a 63-bit counter packed into an octetstring
        Long value = SnmpUtils.getProtoCounter63Value(getValue());
        if (value != null) {
            return value.toString();
        }
        try {
            return Double.valueOf(getValue().toString()).toString();
        } catch(NumberFormatException e) {
            LOG.trace("Unable to process data received for attribute {} maybe this is not a number? See bug 1473 for more information. Skipping.", this);
            if (getValue().getType() == SnmpValue.SNMP_OCTET_STRING) {
                try {
                    return Long.valueOf(getValue().toHexString(), 16).toString();
                } catch(NumberFormatException ex) {
                    LOG.trace("Unable to process data received for attribute {} maybe this is not a number? See bug 1473 for more information. Skipping.", this);
                }
            }
        }
        return null;
    }
}
The key element here is how isNumeric is implemented:
public boolean isNumeric() {
    switch (m_value.getSyntax()) {
    case SMIConstants.SYNTAX_INTEGER:
    case SMIConstants.SYNTAX_COUNTER32:
    case SMIConstants.SYNTAX_COUNTER64:
    case SMIConstants.SYNTAX_TIMETICKS:
    case SMIConstants.SYNTAX_UNSIGNED_INTEGER32:
        return true;
    default:
        return false;
    }
}
Because the extend feature of Net-SNMP returns the value as a string, isNumeric returns false, so the value is converted into a double and rendered with a decimal part, which is not correct in this case.
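To make the effect concrete, the sketch below contrasts the double-based fallback with an integer-first conversion for a value that arrives as a string. It is only an illustration under assumed names (NumericValueSketch, viaDouble, integerFirst); it is neither the OpenNMS code nor the fix applied in the versions listed above.

```java
final class NumericValueSketch {
    // Mirrors the fallback quoted above: parse as a double and print it back.
    static String viaDouble(String raw) {
        return Double.valueOf(raw.trim()).toString();
    }

    // Hypothetical alternative: prefer a whole-number rendering and only
    // fall back to a double when the text is not a plain integer.
    static String integerFirst(String raw) {
        String text = raw.trim();
        try {
            return Long.toString(Long.parseLong(text));
        } catch (NumberFormatException e) {
            return Double.valueOf(text).toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(viaDouble("1559"));     // 1559.0
        System.out.println(integerFirst("1559"));  // 1559
        System.out.println(integerFirst("12.5"));  // 12.5
    }
}
```

With an integer-first conversion, the update for the first sample in the log would read 1439478892:1559 rather than 1439478892:1559.0, while genuinely fractional string values would still be passed through as doubles.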