OpenNMS / NMS-10621

Bad response from SNMP agent leads to infinite loop in SNMP tracker


    Details

    • Sprint:
      Horizon - March 20th 2019

      Description

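      While investigating a system that was hitting out-of-memory (OOM) errors, we saw back-to-back full GCs, each taking about a minute and reclaiming only about 1G of the 24G heap: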

      $ cat /opt/opennms/logs/gc.log.*.current  | grep "Full GC"
      28225.877: [Full GC (Allocation Failure)  23G->22G(24G), 57.6808883 secs]
      28287.806: [Full GC (Allocation Failure)  23G->22G(24G), 58.2689434 secs]
      28352.225: [Full GC (Allocation Failure)  23G->22G(24G), 58.7228746 secs]
      28417.907: [Full GC (Allocation Failure)  23G->22G(24G), 66.6455167 secs]
      28491.769: [Full GC (Allocation Failure)  23G->22G(24G), 60.6501385 secs]
      28560.747: [Full GC (Allocation Failure)  23G->22G(24G), 60.9638456 secs]
      28630.703: [Full GC (Allocation Failure)  23G->22G(24G), 74.1622266 secs]
      

      Taking a class histogram of the live heap, we found that it was filled with SNMP results:

      $ jmap -histo:live $(cat /opt/opennms/logs/opennms.pid)
       num     #instances         #bytes  class name
      ----------------------------------------------
         1:     167500792     4044939384  [I
         2:     166866828     4004803872  org.opennms.netmgt.snmp.SnmpResult
         3:     169186543     2706984688  org.opennms.netmgt.snmp.snmp4j.Snmp4JValue
         4:     166956271     2671300336  org.opennms.netmgt.snmp.SnmpInstId
         5:      47970259     1089158136  [B
         6:      67030507     1072488112  org.opennms.netmgt.snmp.snmp4j.Integer32IgnoreTooManyBytes
      

      Investigating the heap dump further, we found that most of these results were duplicates.

      Looking at the SNMP traffic on the host, we found a large number of GETBULK request/response pairs with the following variable bindings (VBs):

      User Datagram Protocol, Src Port: 40440, Dst Port: 161
      Simple Network Management Protocol
          version: v2c (1)
          community: funkymonkey
          data: getBulkRequest (5)
              getBulkRequest
                  request-id: 1705470191
                  non-repeaters: 0
                  max-repetitions: 2
                  variable-bindings: 10 items
                      1.3.6.1.2.1.2.2.1.1.2: Value (Null)
                      1.3.6.1.2.1.2.2.1.2.2: Value (Null)
                      1.3.6.1.2.1.2.2.1.3.2: Value (Null)
                      1.3.6.1.2.1.2.2.1.4.2: Value (Null)
                      1.3.6.1.2.1.2.2.1.5.2: Value (Null)
                      1.3.6.1.2.1.2.2.1.6.2: Value (Null)
                      1.3.6.1.2.1.2.2.1.7.2: Value (Null)
                      1.3.6.1.2.1.2.2.1.8.2: Value (Null)
                      1.3.6.1.2.1.2.2.1.9.2: Value (Null)
                      1.3.6.1.2.1.31.1.1.1.1.2: Value (Null)
      
      User Datagram Protocol, Src Port: 161, Dst Port: 40440
      Simple Network Management Protocol
          version: v2c (1)
          community: funkymonkey
          data: get-response (2)
              get-response
                  request-id: 1705470191
                  error-status: noError (0)
                  error-index: 0
                  variable-bindings: 20 items
                      1.3.6.1.2.1.2.2.1.1.2: 2
                      1.3.6.1.2.1.2.2.1.2.2: <MISSING>
                      1.3.6.1.2.1.2.2.1.3.2: 6
                      1.3.6.1.2.1.2.2.1.4.2: 1500
                      1.3.6.1.2.1.2.2.1.5.2: 100000000
                      1.3.6.1.2.1.2.2.1.6.2: funkymonkey
                      1.3.6.1.2.1.2.2.1.7.2: 1
                      1.3.6.1.2.1.2.2.1.8.2: 1
                      1.3.6.1.2.1.2.2.1.9.2: 0
                      1.3.6.1.2.1.31.1.1.1.1.2: funkymonkey
                      1.3.6.1.2.1.2.2.1.1.2: 2
                      1.3.6.1.2.1.2.2.1.2.2: <MISSING>
                      1.3.6.1.2.1.2.2.1.3.2: 6
                      1.3.6.1.2.1.2.2.1.4.2: 1500
                      1.3.6.1.2.1.2.2.1.5.2: 100000000
                      1.3.6.1.2.1.2.2.1.6.2: funkymonkey
                      1.3.6.1.2.1.2.2.1.7.2: 1
                      1.3.6.1.2.1.2.2.1.8.2: 1
                      1.3.6.1.2.1.2.2.1.9.2: 0
                      1.3.6.1.2.1.31.1.1.1.1.2: funkymonkey
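
      For reference, RFC 3416 lays out a GETBULK response row by row: with N non-repeaters and R repeating VBs, response VB i (for i >= N) belongs to column (i - N) mod R, and its OID must be lexicographically greater than the previous OID in that column (the requested OID, for the first row). The Java sketch below, with hypothetical names BulkResponseCheck and nonAdvancing (not OpenNMS code), applies that rule to the captured exchange. It assumes no VB carries an endOfMibView exception, since RFC 3416 allows such a VB to repeat the requested name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not OpenNMS code): checks a GETBULK response against the
 * RFC 3416 layout. Assumes no varbind carries an endOfMibView exception, which
 * may legitimately repeat the requested name.
 */
public class BulkResponseCheck {

    // "1.3.6.1" -> {1, 3, 6, 1}; sub-identifiers are unsigned 32-bit values.
    static int[] parse(String oid) {
        return Arrays.stream(oid.split("\\.")).mapToInt(Integer::parseUnsignedInt).toArray();
    }

    // Lexicographic OID order; a proper prefix sorts before its extensions.
    static int compare(String a, String b) {
        return Arrays.compareUnsigned(parse(a), parse(b));
    }

    /**
     * Returns the positions of response varbinds that fail to advance past the OID
     * they should follow: the requested OID for non-repeaters and for the first row
     * of repeaters, or the same column's OID in the previous row otherwise.
     */
    static List<Integer> nonAdvancing(List<String> requested, int nonRepeaters, List<String> response) {
        int repeaters = requested.size() - nonRepeaters;
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < response.size(); i++) {
            String previous;
            if (i < nonRepeaters) {
                previous = requested.get(i);
            } else if (repeaters > 0) {
                int offset = i - nonRepeaters;
                previous = offset < repeaters
                        ? requested.get(i)             // first row: follows the request
                        : response.get(i - repeaters); // later rows: follows the row above
            } else {
                break; // no repeaters: nothing beyond the non-repeaters is expected
            }
            if (compare(response.get(i), previous) <= 0) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // The ten columns requested in the captured GETBULK (non-repeaters 0, max-repetitions 2).
        List<String> requested = List.of(
                "1.3.6.1.2.1.2.2.1.1.2", "1.3.6.1.2.1.2.2.1.2.2", "1.3.6.1.2.1.2.2.1.3.2",
                "1.3.6.1.2.1.2.2.1.4.2", "1.3.6.1.2.1.2.2.1.5.2", "1.3.6.1.2.1.2.2.1.6.2",
                "1.3.6.1.2.1.2.2.1.7.2", "1.3.6.1.2.1.2.2.1.8.2", "1.3.6.1.2.1.2.2.1.9.2",
                "1.3.6.1.2.1.31.1.1.1.1.2");
        // The captured response: the same ten OIDs, returned twice.
        List<String> response = new ArrayList<>(requested);
        response.addAll(requested);
        List<Integer> bad = nonAdvancing(requested, 0, response);
        // prints "20 of 20 varbinds did not advance"
        System.out.println(bad.size() + " of " + response.size() + " varbinds did not advance");
    }
}
```

      Every VB in the captured response is flagged, because each one repeats the OID it should have moved past.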
      

      In this case, the agent responds with OIDs that are equal to the requested OIDs rather than their lexicographic successors (and, since max-repetitions is 2, each one appears twice).
      Our SNMP tracking code does not account for this: it keeps requesting the OIDs that follow the last ones received, which are the same OIDs it just asked for, so the walk never advances and loops forever.
      Every time we receive a response, the results are added to memory, so the heap fills up with duplicate SnmpResult objects, which leads to the observed problems.
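
      The failure mode can be reproduced in miniature. The following Java sketch is hypothetical: GuardedWalk, walk, WalkResult and the agent-as-function model are illustrative names, not OpenNMS APIs, and it walks a single column with GETNEXT semantics rather than the tracker's GETBULK batching. It shows one way to break the loop: reject any response OID that is not strictly greater than the OID that was requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch (not the OpenNMS tracker): walks one column with GETNEXT-style
 * requests and stops if the agent fails to return a strictly greater OID.
 * The agent is modeled as a function from the requested OID to the returned OID,
 * or null at the end of the MIB view.
 */
public class GuardedWalk {

    static int[] parse(String oid) {
        return Arrays.stream(oid.split("\\.")).mapToInt(Integer::parseUnsignedInt).toArray();
    }

    // True if 'oid' lies strictly below 'root' in the OID tree.
    static boolean isUnder(int[] root, int[] oid) {
        return oid.length > root.length && Arrays.equals(Arrays.copyOf(oid, root.length), root);
    }

    static final class WalkResult {
        final List<String> oids = new ArrayList<>();
        boolean aborted; // true if the walk stopped because the agent did not advance
    }

    static WalkResult walk(String root, UnaryOperator<String> agent) {
        WalkResult result = new WalkResult();
        int[] rootOid = parse(root);
        String last = root;
        while (true) {
            String next = agent.apply(last);
            if (next == null) {
                break; // end of MIB view
            }
            int[] nextOid = parse(next);
            if (!isUnder(rootOid, nextOid)) {
                break; // walked past the end of the column
            }
            // Guard: a compliant agent always returns an OID greater than the one
            // requested. Without this check, an agent that echoes 'last' back would
            // keep this loop, and the result list, growing forever.
            if (Arrays.compareUnsigned(nextOid, parse(last)) <= 0) {
                result.aborted = true;
                break;
            }
            result.oids.add(next);
            last = next;
        }
        return result;
    }

    public static void main(String[] args) {
        String root = "1.3.6.1.2.1.2.2.1.1"; // ifIndex column
        // Assumed model of the captured agent: after the first step it echoes the requested OID.
        UnaryOperator<String> echoingAgent = oid -> oid.equals(root) ? root + ".2" : oid;
        WalkResult r = walk(root, echoingAgent);
        // prints "[1.3.6.1.2.1.2.2.1.1.2] aborted=true"
        System.out.println(r.oids + " aborted=" + r.aborted);
    }
}
```

      Aborting the walk and flagging it is one design choice; a tracker might instead log the misbehaving agent and mark the column as complete. The sketch is not necessarily how this issue was fixed in OpenNMS.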

              People

              • Assignee:
                Jesse White (j-white)
                Reporter:
                Jesse White (j-white)