The threshold processor doesn't work well with complex JEXL Expressions

Description

In theory, the correct way to create a threshold expression for the In/Out Octets from IF-MIB is the following:

With a 5-minute data collection interval, the 32-bit octet counters of any SNMP interface faster than 100 Mbps (FastEthernet) may wrap between collections, so the threshold expression should use the High Capacity counters (from ifXTable) instead of the standard 32-bit counters (from ifTable). If the speed is less than or equal to 100 Mbps, the 32-bit counters can be used.

This is important to know because not all SNMP agents support ifXTable.

But if the data collection interval is 15 minutes (instead of 5 minutes), the FastEthernet counters may wrap within the collection interval, so the rule changes. Now, if the speed is greater than 10 Mbps, the 64-bit counters must be used.
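The speed limits above follow from how quickly a 32-bit octet counter can roll over. A minimal Python sketch of that arithmetic, assuming the link runs at sustained line rate (the worst case); the function names are made up for illustration:

```python
# Sketch: shortest time for an SNMP octet counter to wrap.
# Assumption: traffic flows at full line rate for the whole interval.

def seconds_to_wrap(speed_bps: float, counter_bits: int = 32) -> float:
    """Seconds until an octet counter of the given width rolls over."""
    octets_per_second = speed_bps / 8
    return (2 ** counter_bits) / octets_per_second


def may_wrap(speed_bps: float, interval_seconds: float) -> bool:
    """True if a 32-bit counter can wrap within one collection interval."""
    return seconds_to_wrap(speed_bps) < interval_seconds


for label, speed in [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
    print(f"{label}: 32-bit counter wraps in about {seconds_to_wrap(speed):.0f} s")
```

At 100 Mbps the counter lasts roughly 344 seconds, which is longer than a 300-second interval but shorter than a 900-second one. That matches the 100 Mbps limit for 5 minutes and the 10 Mbps limit for 15 minutes.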

It seems that the following simple formulas can work for each case:

Speed <= 100Mbps (@5min):

utilization = ((ifInOctets * 8) / ifSpeed) * 100

Speed > 100Mbps (@5min):

utilization = ((ifHCInOctets * 8) / (ifHighSpeed * 1000000)) * 100

ifHighSpeed can be used only on interfaces whose speed is greater than or equal to 1 Mbps.

The question is: how can I know which formula must be used in a given environment?

Well, considering the limitation of the ifHighSpeed, we can try the following:

utilization = ((ifHCInOctets * 8) / ifSpeed) * 100

But ifSpeed is a 32-bit gauge, so that is not going to work on interfaces whose speed is greater than about 4 Gbps.
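To make the two limitations concrete, here is a small Python sketch of what each object would report for a given real speed. It assumes the behaviour described in the IF-MIB (RFC 2863): ifSpeed saturates at the maximum Gauge32 value, and ifHighSpeed is expressed in units of 1,000,000 bps rounded to the nearest unit. The function names are hypothetical.

```python
# Sketch of how an agent would report interface speed, per my reading of IF-MIB.
GAUGE32_MAX = 2**32 - 1  # 4,294,967,295 bps, a little under 4.3 Gbps


def reported_if_speed(true_bps: int) -> int:
    """ifSpeed is in bits per second and cannot exceed the Gauge32 maximum."""
    return min(true_bps, GAUGE32_MAX)


def reported_if_high_speed(true_bps: int) -> int:
    """ifHighSpeed is in units of 1,000,000 bps (assumed rounded to nearest)."""
    return (true_bps + 500_000) // 1_000_000


for bps in (64_000, 100_000_000, 10_000_000_000):
    print(bps, reported_if_speed(bps), reported_if_high_speed(bps))
```

A 64 kbps link yields ifHighSpeed = 0, and a 10 Gbps link yields a saturated ifSpeed, so neither object alone is a safe divisor for every interface.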

The ultimate solution is to use something like the following:

ifSpeed > 0 and ifSpeed < 100000000 ? ((ifInOctets * 8 / ifSpeed) * 100) : (ifHighSpeed > 0 ? (((ifHCInOctets * 8) / (ifHighSpeed * 1000000)) * 100) : 0)

So, if ifSpeed is greater than 0 and less than 100 Mbps, use the 32-bit version of the formula. Otherwise, use the 64-bit version of the formula if ifHighSpeed is not 0 (it is 0 on interfaces where the speed is less than 1 Mbps). If neither applies, return 0.

For 15-minute data collection, use 10 Mbps instead of 100 Mbps in the first comparison.
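The same selection logic can be sketched in Python to check the arithmetic outside of OpenNMS. This is only an illustration, not the thresholding code: it assumes the octet datasources arrive as per-second rates, and the `fast_limit_bps` parameter is a made-up name for the 100 Mbps / 10 Mbps cut-off.

```python
# Sketch of the conditional formula.
# Assumption: ifInOctets / ifHCInOctets are octets-per-second rates.

def in_utilization(ds: dict, fast_limit_bps: int = 100_000_000) -> float:
    """Inbound utilization in percent, choosing 32-bit or 64-bit counters."""
    if 0 < ds["ifSpeed"] < fast_limit_bps:
        # Slow link: the standard 32-bit counter is safe to use.
        return (ds["ifInOctets"] * 8 / ds["ifSpeed"]) * 100
    if ds["ifHighSpeed"] > 0:
        # Fast link: High Capacity counter, speed in Mbps.
        return (ds["ifHCInOctets"] * 8 / (ds["ifHighSpeed"] * 1_000_000)) * 100
    return 0.0  # Speed unknown: nothing meaningful to report.


t1 = {"ifSpeed": 1_544_000, "ifHighSpeed": 2, "ifInOctets": 96_500, "ifHCInOctets": 96_500}
gig = {"ifSpeed": 1_000_000_000, "ifHighSpeed": 1000, "ifInOctets": 0, "ifHCInOctets": 25_000_000}
print(in_utilization(t1), in_utilization(gig))
```

For a 15-minute interval the call would be `in_utilization(ds, fast_limit_bps=10_000_000)`. Like the JEXL ternary, the branches are evaluated lazily, so the HC values are only read when the 64-bit branch is taken.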

That formula is 100% valid and is going to work fine with a standalone Apache JEXL.

Problem:

The ExpressionConfigWrapper (which uses Apache JEXL to evaluate the expression) has a hack to determine the variables that must be used in the expression. This hack doesn't work with complex expressions that contain conditionals, so the complex formula is not going to work with the current versions of OpenNMS.

This hack is not necessary anymore with JEXL 2.1, because the ExpressionImpl also implements the Script interface which contains a method called getVariables, that provides the required information.
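For reference, the information needed here is simply the set of datasource names the expression refers to. The Python sketch below illustrates that idea with a naive identifier scan. It is not how JEXL does it and not what ExpressionConfigWrapper does; the reserved-word list is a partial, assumed one.

```python
import re

# Assumed, incomplete list of JEXL operator words and literals to skip.
RESERVED = {"and", "or", "not", "eq", "ne", "lt", "le", "gt", "ge",
            "div", "mod", "true", "false", "null", "empty", "size"}

IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def referenced_names(expression: str) -> list[str]:
    """Return distinct identifiers in order of first appearance."""
    names: list[str] = []
    for token in IDENTIFIER.findall(expression):
        if token not in RESERVED and token not in names:
            names.append(token)
    return names


expr = ("ifSpeed > 0 and ifSpeed < 100000000 ? ((ifInOctets * 8 / ifSpeed) * 100) : "
        "(ifHighSpeed > 0 ? (((ifHCInOctets * 8) / (ifHighSpeed * 1000000)) * 100) : 0)")
print(referenced_names(expr))
```

A text scan like this breaks on string literals, method calls and similar constructs, which is why asking the parsed expression for its variables is the more robust route.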

So, the idea is to upgrade JEXL from 2.0.1 to 2.1.1 and modify the implementation of ExpressionConfigWrapper to use it.

The other benefit is that the huge number of warnings written to tomcat-internal.log because of the JEXL hack won't be a problem anymore.

Acceptance / Success Criteria

None

Activity

Alejandro Galue October 17, 2013 at 3:35 PM

Fixed on revision cf9b310334aa5b00ae05c7e538559d346753ede0 for 1.12

Alejandro Galue October 17, 2013 at 12:25 PM

The solution still requires another feature:

Be able to evaluate the expression even if some datasources are not available or don't exist on the resource.

The reason for this is the following:

On Cisco devices (probably because of an old Cisco IOS version), the High Capacity (HC) counters for In/Out octets are not supported (i.e., the SNMP agent doesn't return any data for them). So, evaluating the expression on slow frame-relay links is not possible because ifHCInOctets and ifHCOutOctets are not available, even though I know that those metrics won't be used.

The solution is to add an optional boolean parameter called relaxed (default false) to the Basethresholddef class (i.e., common to all threshold definitions), which will be used by the ThresholdingSet class to determine whether the evaluation of the threshold must be strict or not.

By default relaxed=false, so all the parameters must be available. But if relaxed=true, the evaluation will be performed even if not all the parameters are available. Of course, an exception could be thrown if a missing parameter is actually required. In that case, a warning message will be written to the logs, but the threshold processor and Collectd will continue operating.
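The intended behaviour can be sketched in Python as follows. This is a rough model of the proposal, not the ThresholdingSet code: `evaluate_threshold`, its arguments and the logger name are hypothetical, and the octet values are assumed to be per-second rates.

```python
import logging

log = logging.getLogger("thresholding-sketch")  # hypothetical logger name


def in_utilization(ds):
    """Conditional formula from the description (5-minute variant)."""
    if 0 < ds["ifSpeed"] < 100_000_000:
        return (ds["ifInOctets"] * 8 / ds["ifSpeed"]) * 100
    if ds["ifHighSpeed"] > 0:
        return (ds["ifHCInOctets"] * 8 / (ds["ifHighSpeed"] * 1_000_000)) * 100
    return 0.0


def evaluate_threshold(expression, required, values, relaxed=False):
    """Return the expression value, or None if it cannot be evaluated."""
    missing = [name for name in required if name not in values]
    if missing and not relaxed:
        return None  # Strict mode: every datasource must be present.
    try:
        return expression(values)
    except KeyError as exc:
        # Relaxed mode, but the missing datasource was needed after all.
        log.warning("Skipping threshold, missing datasource %s", exc)
        return None


required = ["ifSpeed", "ifHighSpeed", "ifInOctets", "ifHCInOctets"]
frame_relay = {"ifSpeed": 1_544_000, "ifHighSpeed": 2, "ifInOctets": 96_500}
print(evaluate_threshold(in_utilization, required, frame_relay))
print(evaluate_threshold(in_utilization, required, frame_relay, relaxed=True))
```

With relaxed left at false, the frame-relay link is skipped because ifHCInOctets is missing. With relaxed=true, it is evaluated through the 32-bit branch, and a fast link lacking the HC counter would only produce a warning.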

BTW, the parameter could be called strict (default true), if that fits better the use case.

Fixed

Created October 16, 2013 at 5:32 PM
Updated January 27, 2017 at 4:21 PM
Resolved October 17, 2013 at 3:35 PM
