Uploaded image for project: 'OpenNMS'
  1. OpenNMS
  2. NMS-9415

NullPointerException during nodeScan on devices with broken IP-MIB::ipAddressIfIndex

    Details

      Description

      OpenNMS is unable to finish scanning nodes with broken / missing IP-MIB::ipAddressIfIndex entries, and will retry forever without ever marking the node as scanned. This has been confirmed on OpenNMS 19.x and 20.0, but it's probably been the case in older versions as well.

      The underlying cause - in my situation, at least - is a presumably broken SNMP implementation on specific equipment / versions. I'm only interested in the primary SNMP interface's IP address, which I provision through a foreign requisition, so the problem with the missing ipAddressIfIndex data is entirely harmless in my environment.

      Unfortunately, IPAddressTableTracker fails to check whether it received any data from ipAddressIfIndex before attempting to parse the result, and because of this, the node scan is aborted due to a NullPointerException:
      "The Node with Id: 1; ForeignSource: Dev; ForeignId:testdevice has aborted for the following reason: Aborting node scan : Agent failed while scanning the IP address tables : java.lang.NullPointerException"

      I've attached a sample of the relevant snmp data... or rather, the data that is missing, along with the relevant debug logs from provisiond.log.

      I'm also attaching a tiny patch with a quick and easy fix, primarily to pinpoint the problem and offer a solution along with my bug report. If the patch is used as-is, that's perfectly fine by me, since it solves the problem in my dev environment, by adding an extra warning before returning null:

      2017-06-09 20:24:12,919 WARN [DefaultUDPTransportMapping_0.0.0.0/0] o.o.n.p.s.IPAddressTableTracker: BAD AGENT: Device is missing IP-MIB::ipAddressIfIndex. Skipping.
      2017-06-09 20:24:12,919 INFO [DefaultUDPTransportMapping_0.0.0.0/0] o.o.n.p.s.NodeScan: Processing IPAddress table row with ipAddr null

      This way, OpenNMS is able to successfully detect all services, snmp interfaces and finally update the node's lastcapsdpoll timestamp.

      1. IPAddressTableTracker-broken-ipAddressIfIndex.patch
        1.0 kB
        Brynjar Eide
      2. provisiond.log
        3 kB
        Brynjar Eide
      3. snmpwalk-example.txt
        1 kB
        Brynjar Eide

        Activity

        Hide
        jeffg Jeff Gehlbach added a comment -

        Thanks for the contribution! We'll keep an eye out for your OCA.

        Show
        jeffg Jeff Gehlbach added a comment - Thanks for the contribution! We'll keep an eye out for your OCA.
        Hide
        brynjar Brynjar Eide added a comment -

        OCA submitted earlier today.

        I deployed the patch in our production environment about a week ago, and so far I haven't noticed any downsides or side effects.
        We originally had about 1000 devices (from one vendor) that OpenNMS was unable to scan properly due to the NullPointerException, but everything looks good now!

        Show
        brynjar Brynjar Eide added a comment - OCA submitted earlier today. I deployed the patch in our production environment about a week ago, and so far I haven't noticed any downsides or side effects. We originally had about 1000 devices (from one vendor) that OpenNMS was unable to scan properly due to the NullPointerException, but everything looks good now!
        Hide
        indigo Ronny Trommer added a comment -

        Branch and PR created, git sign-off required. https://github.com/OpenNMS/opennms/pull/1549

        Show
        indigo Ronny Trommer added a comment - Branch and PR created, git sign-off required. https://github.com/OpenNMS/opennms/pull/1549
        Hide
        brynjar Brynjar Eide added a comment -

        I've never done any sign-offs before, so I just want to check how to proceed. Do I simply sign-off my original commit by 'git commit --amend --signoff', or do I need to do something else?

        Show
        brynjar Brynjar Eide added a comment - I've never done any sign-offs before, so I just want to check how to proceed. Do I simply sign-off my original commit by 'git commit --amend --signoff', or do I need to do something else?
        Hide
        brynjar Brynjar Eide added a comment -

        I just realised that I only linked to JIRA from Github, and never added a link to the original pull request here in JIRA: https://github.com/OpenNMS/opennms/pull/1533
        My apologies! I believe we may have duplicate pull requests because of this, and that's perhaps the reason why I need to do an extra sign-off?

        In either case, please let me know how to proceed, and I'll try to avoid making even more of a mess with the pull requests.

        Show
        brynjar Brynjar Eide added a comment - I just realised that I only linked to JIRA from Github, and never added a link to the original pull request here in JIRA: https://github.com/OpenNMS/opennms/pull/1533 My apologies! I believe we may have duplicate pull requests because of this, and that's perhaps the reason why I need to do an extra sign-off? In either case, please let me know how to proceed, and I'll try to avoid making even more of a mess with the pull requests.
        Hide
        indigo Ronny Trommer added a comment -

        The target version is not correct. The problem, the PR is made against develop which means it will end up in Horizon 21.0.0. If you want this change in 20.0.1 as set in "Fix Verisons" the PR needs to be changed to the "release-20.0.1" branch or somebody has to cherry-pick the PR to the release 20.0.1 when it is merged to develop.

        Show
        indigo Ronny Trommer added a comment - The target version is not correct. The problem, the PR is made against develop which means it will end up in Horizon 21.0.0. If you want this change in 20.0.1 as set in "Fix Verisons" the PR needs to be changed to the "release-20.0.1" branch or somebody has to cherry-pick the PR to the release 20.0.1 when it is merged to develop.
        Hide
        brynjar Brynjar Eide added a comment -

        I see. I would definitely like to see this in release-20.0.1, so thank you for your explanation. Unfortunately, I couldn't just change the merge base to release-20.0.1, as that would include the bump from v20 to v21, but I'll see if I can figure out a way to solve this.

        (In the mean time, I'd appreciate any suggestions anyone may have for how to best solve this, as there seem to be hundreds of ways to do anything with Git... and only a few of them would be appropriate in most circumstances.)

        Show
        brynjar Brynjar Eide added a comment - I see. I would definitely like to see this in release-20.0.1, so thank you for your explanation. Unfortunately, I couldn't just change the merge base to release-20.0.1, as that would include the bump from v20 to v21, but I'll see if I can figure out a way to solve this. (In the mean time, I'd appreciate any suggestions anyone may have for how to best solve this, as there seem to be hundreds of ways to do anything with Git... and only a few of them would be appropriate in most circumstances.)
        Hide
        brynjar Brynjar Eide added a comment -

        I did a recommit, based on the latest commit in release-20.0.1 before changing the base branch from develop to release-20.0.1. It looks alright to me, but I'm happy to change anything if my solution turns out to have any drawbacks.
        Thanks again for your quick responses and helpful feedback!

        Show
        brynjar Brynjar Eide added a comment - I did a recommit, based on the latest commit in release-20.0.1 before changing the base branch from develop to release-20.0.1. It looks alright to me, but I'm happy to change anything if my solution turns out to have any drawbacks. Thanks again for your quick responses and helpful feedback!
        Hide
        jeffg Jeff Gehlbach added a comment -

        I would like to see this fix go as far back as possible. If it can be cleanly applied to the foundation branch, let's do that. Otherwise foundation-2016 or foundation-2017.

        Show
        jeffg Jeff Gehlbach added a comment - I would like to see this fix go as far back as possible. If it can be cleanly applied to the foundation branch, let's do that. Otherwise foundation-2016 or foundation-2017 .

          People

          • Assignee:
            Unassigned
            Reporter:
            brynjar Brynjar Eide
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development