Capsd may reparent duplicate interfaces from requisitioned nodes

Description

This client's network contains many duplicate IPv4 addresses in the RFC1918 ranges, most of which cannot be eliminated because they are in use by customers. Unsurprisingly, Capsd has a hard time with some of their nodes. One IP address in particular, 192.168.254.54, is present but not reachable on many managed customer-premise nodes. The client also has a core switch with this address and wants to manage that switch with OpenNMS. Discovery, of course, ignores the address since it's already in the DB. Sending a manual newSuspect event does trigger Capsd to do a suspect scan, which completes; once the SNMP interface scan phase finishes, though, Capsd decides that it should not create a new node because (you guessed it) that address is already in the DB.

To get out of this quagmire, we created a requisition yesterday and imported it. The client was happy with the result and wants to use this approach to deal with all problem addresses that fit the same pattern (there's at least a /24 of them, maybe more). When we arrived this morning, though, the node we created still existed but its one IP interface (192.168.254.54) had disappeared overnight. Forcing a rescan on the requisitioned node brought back the IP interface, so I went digging in the DB to work out when and how it had disappeared. I found a duplicateNodeDeleted event from about four hours after we first imported the new requisition, with an event-source indicating it came from Capsd.

I've tracked down the few code paths that generate this event, and I believe it must be happening in org.opennms.netmgt.capsd.RescanProcessor.updateInterface(...). The node the interface is being reparented away from has a foreign source (it came from a requisition), so Capsd should not be touching it. Right?
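A minimal sketch of the guard being argued for here, under a simplified in-memory model. `Node`, `Interface`, and `selectReparentable` are hypothetical names rather than OpenNMS classes, and the foreign-source name `core-switches` is made up. The rule it encodes: a rescan may only pull a duplicate address away from nodes whose foreign source is null, i.e. Capsd-discovered nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical model of the reparenting decision; not OpenNMS code. */
public class ReparentGuard {

    /** A node as Capsd sees it; foreignSource is null for Capsd-discovered nodes. */
    record Node(int id, String foreignSource) {
        boolean isProvisioned() {
            return foreignSource != null;
        }
    }

    /** An IP interface and the node that currently owns it. */
    record Interface(String ipAddr, Node owner) {}

    /**
     * Returns the interfaces a rescan of targetNodeId may move onto that node:
     * same address, owned by a different node, and that owner is not provisioned.
     */
    static List<Interface> selectReparentable(List<Interface> candidates, String ipAddr, int targetNodeId) {
        List<Interface> result = new ArrayList<>();
        for (Interface iface : candidates) {
            boolean sameAddress = Objects.equals(iface.ipAddr(), ipAddr);
            boolean otherNode = iface.owner().id() != targetNodeId;
            // Skip interfaces whose owner came from a requisition.
            if (sameAddress && otherNode && !iface.owner().isProvisioned()) {
                result.add(iface);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node capsdNode = new Node(10, null);
        Node requisitioned = new Node(20, "core-switches"); // hypothetical foreign source
        Node rescanned = new Node(30, null);

        List<Interface> existing = List.of(
            new Interface("192.168.254.54", capsdNode),
            new Interface("192.168.254.54", requisitioned));

        // Only the copy on Capsd-discovered node 10 is eligible; prints 1.
        System.out.println(selectReparentable(existing, "192.168.254.54", rescanned.id()).size());
    }
}
```

Putting the check on the owning node, rather than on the address, means the same duplicate can still be reconciled among Capsd-managed nodes while requisitioned nodes are left alone.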

The client will be moving to an entirely requisition-driven way of provisioning, but the data in their provisioning DB is not yet of sufficient quality to support it, so for now they're using mostly Capsd with a few requisitions.

Environment

App server: RHEL 5.5 on x86_64. See attached system report output.

DB server:

[root@opennmsdb ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[root@opennmsdb ~]# uname -a
Linux opennmsdb 2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@opennmsdb ~]# egrep -c '^processor' /proc/cpuinfo
16
[root@opennmsdb ~]# rpm -q postgresql84
postgresql84-8.4.5-1.el5_5.1

Acceptance / Success Criteria

None

Attachments: 1

Activity

Seth Leger June 28, 2011 at 11:36 AM

These changes were merged into the code back in early May. I'm not sure whether they made it into the current release, but they will be in the next 1.8 and 1.9 releases.

Seth Leger May 9, 2011 at 5:30 PM

I checked in a proposed fix for this earlier today. After we have a chance to test it out, we will merge the code into the release branches for the next versions of 1.8 and 1.9 (if we can test it before those versions go out).

commit 88495f6891263ac7e6a687f7845c658945bff130

Seth Leger May 9, 2011 at 11:20 AM

There are several SQL queries in the RescanProcessor that do not ignore provisioned nodes. I should be able to have an update that will fix this shortly.
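One way to make such a query skip provisioned nodes is to join to the node table and require a null foreign source. The sketch below is hypothetical: the table and column names (`ipInterface.ipAddr`, `ipInterface.nodeID`, `node.foreignSource`) are assumed from the OpenNMS schema of that era, and neither the query text nor `findOtherOwners` is taken from RescanProcessor or from the actual fix.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical duplicate-address lookup that ignores provisioned nodes. */
public class ProvisionedAwareLookup {

    // Assumed schema; adjust names to the real tables before using.
    static final String FIND_OTHER_OWNERS_SQL =
        "SELECT ipInterface.nodeID FROM ipInterface "
        + "JOIN node ON node.nodeID = ipInterface.nodeID "
        + "WHERE ipInterface.ipAddr = ? "
        + "AND ipInterface.nodeID <> ? "
        // Requisitioned nodes carry a foreign source; leave them out.
        + "AND node.foreignSource IS NULL";

    /** Returns IDs of non-provisioned nodes, other than currentNodeId, that also own ipAddr. */
    static List<Integer> findOtherOwners(Connection conn, String ipAddr, int currentNodeId)
            throws SQLException {
        List<Integer> nodeIds = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(FIND_OTHER_OWNERS_SQL)) {
            stmt.setString(1, ipAddr);
            stmt.setInt(2, currentNodeId);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    nodeIds.add(rs.getInt(1));
                }
            }
        }
        return nodeIds;
    }
}
```

Filtering in SQL, rather than after the rows come back, keeps every caller of the query from having to repeat the provisioned-node check.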

Matt Brozowski May 6, 2011 at 1:04 PM

We should probably have Capsd's RescanProcessor make sure that it only reparents interfaces from nodes that are not provisioned (that is, nodes whose foreign source is null).

Jeff Gehlbach May 6, 2011 at 1:03 PM

I've put a database dump at mail1.opennms.com:~jeffg/NMS-4663_opennms-db-201105050934Z.pgdump to assist in debugging / reproducing the issue.

Fixed

Details

Created May 5, 2011 at 6:15 AM
Updated January 27, 2017 at 4:20 PM
Resolved June 28, 2011 at 11:36 AM