Thresholding on HTTP collections is broken

Description

The title says it all. Thresholding packages fail for anything which uses HTTP data collection. Attaching logs so you can see that the value is found and stored via HTTP, but thresholding can't find the attribute. The error comes from CollectionResourceWrapper, so I'm opening it against that component instead of thresholding.

Acceptance / Success Criteria

None

Attachments

1

Activity

Richard Hesse September 12, 2012 at 5:35 PM

I've verified that the fix works. Thanks!

Alejandro Galue September 11, 2012 at 12:54 PM

Fixed in revision f1a468dbb5dee1426e86382bcc4a5975952a8fdf for 1.10

I've verified the solution in my test environment. The new way of generating a unique ID works no matter which kind of collector is being used, and it is now completely independent of how the toString() method is implemented by classes that implement the CollectionResource interface.
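To illustrate why a stable ID matters here, the following is a minimal sketch of a counter cache keyed by ID. It is not the actual CollectionResourceWrapper code: the class name CounterCacheSketch, the map, and the simplified delta calculation (no counter wrap handling, no time interval) are assumptions made only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a counter cache; not OpenNMS code.
public class CounterCacheSketch {
    // Last seen raw value for each metric, keyed by its unique ID.
    private final Map<String, Double> lastValues = new HashMap<>();

    // Returns the increase since the previous sample for this ID,
    // or NaN when no previous sample is cached under that ID.
    public double getCounterValue(String id, double current) {
        Double last = lastValues.put(id, current);
        if (last == null) {
            return Double.NaN;
        }
        return current - last;
    }

    public static void main(String[] args) {
        CounterCacheSketch cache = new CounterCacheSketch();
        // First collection: nothing cached yet, so the result is NaN.
        System.out.println(cache.getCounterValue("node[3].my_counter", 100.0));
        // Second collection with the same ID: the previous value is found.
        System.out.println(cache.getCounterValue("node[3].my_counter", 130.0));
    }
}
```

With an ID that stays the same between collections, only the first sample is unknown. If the ID changed on every collection, each lookup would miss and every sample would be NaN.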

About the Http Collector:

As you can see, the value is not NaN anymore.

Here are some examples for other resource types:

Richard Hesse September 11, 2012 at 12:36 PM

Sorry for getting a little cranky there, but thanks for getting this turned around so quickly! I realize that most OpenNMS opened "issues" are indeed configuration problems, but some of us have filed a half dozen bugs that really were bugs.

Maybe we need some sort of badge or avatar that identifies us "legitimate bug finders" differently than the randoms that open a bug because they just installed the app and can't get notifications working.

Alejandro Galue September 11, 2012 at 12:01 PM

Now I see the problem more clearly:

2012-09-11 11:13:42,060 DEBUG [CollectdScheduler-50 Pool-fiber0] CollectionResourceWrapper: getCounterValue: id=org.opennms.netmgt.collectd.HttpCollector$HttpCollectionResource@1f870b40.my_counter, last=null, current=443.16408755547

The cache that handles counter values identifies each metric by an ID that must be unique per resource and per node. For the HttpCollector that ID is wrong. In this particular case it should be something like this:

id=node[3].my_counter

or something like that.
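The ID in the log line above has the shape that Java's default Object.toString() produces (class name, "@", hex hash code). Here is a small sketch of the difference between such an identity-based key and a node-based one. The HttpResource class and both helper methods are hypothetical stand-ins for this example, not the real HttpCollector classes.

```java
// Hypothetical illustration; none of these names come from OpenNMS.
public class CounterIdSketch {
    // Stand-in for a resource class that does not override toString().
    static class HttpResource {
        final int nodeId;

        HttpResource(int nodeId) {
            this.nodeId = nodeId;
        }
    }

    // Identity-based key: uses Object.toString(), i.e. ClassName@hexHash.
    // The hash is tied to the object instance, not to the monitored node.
    static String identityKey(HttpResource resource, String attribute) {
        return resource + "." + attribute;
    }

    // Node-based key in the spirit of "node[3].my_counter".
    static String stableKey(HttpResource resource, String attribute) {
        return "node[" + resource.nodeId + "]." + attribute;
    }

    public static void main(String[] args) {
        // Two objects describing the same node, as if each collection
        // cycle created a fresh resource object (an assumption here).
        HttpResource first = new HttpResource(3);
        HttpResource second = new HttpResource(3);

        // These will almost always differ between the two instances.
        System.out.println(identityKey(first, "my_counter"));
        System.out.println(identityKey(second, "my_counter"));

        // These are identical for both instances.
        System.out.println(stableKey(first, "my_counter"));
        System.out.println(stableKey(second, "my_counter"));
    }
}
```

If a new resource object is created for each collection, an identity-based key rarely matches the one stored previously, which would be consistent with last=null in the log line.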

I'm going to fix that soon.

Thanks for reporting this problem.

Alejandro Galue September 11, 2012 at 11:50 AM

I was able to reproduce the problem, and now I know where it is:

I've added a node with the HTTP service using the latest snapshot of 1.10.

I've created a Perl script and exposed it via Apache on the target node. Here is the script:

It always generates a number that keeps increasing (i.e., a counter).

This is the content of the configuration files:

collectd-configuration.xml:

http-datacollection-config.xml:

threshd-configuration.xml:

thresholds.xml:

Here are the logs:

The relevant lines are the following:

Now, the subsequent collections show that the value is increasing, but getCounterValue still returns unknown.

I'm going to reproduce the problem through a JUnit test to see where the error is inside getCounterValue.

Fixed

Details

Created August 28, 2012 at 7:18 PM
Updated January 27, 2017 at 4:21 PM
Resolved September 11, 2012 at 12:54 PM