Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
23.0.3, Meridian-2018.1.5
-
Security Level: Default (Default Security Scheme)
-
None
-
Horizon - March 20th 2019
Description
While investigating a system hitting OOM errors:
$ cat /opt/opennms/logs/gc.log.*.current | grep "Full GC"
28225.877: [Full GC (Allocation Failure) 23G->22G(24G), 57.6808883 secs]
28287.806: [Full GC (Allocation Failure) 23G->22G(24G), 58.2689434 secs]
28352.225: [Full GC (Allocation Failure) 23G->22G(24G), 58.7228746 secs]
28417.907: [Full GC (Allocation Failure) 23G->22G(24G), 66.6455167 secs]
28491.769: [Full GC (Allocation Failure) 23G->22G(24G), 60.6501385 secs]
28560.747: [Full GC (Allocation Failure) 23G->22G(24G), 60.9638456 secs]
28630.703: [Full GC (Allocation Failure) 23G->22G(24G), 74.1622266 secs]
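To put these numbers in perspective, the following is a minimal sketch that estimates how much wall-clock time the JVM spent inside Full GC pauses. It assumes that the leading timestamp marks the start of each collection, which is how HotSpot GC logs are normally read. The function name `full_gc_pause_fraction` is made up for this illustration and is not part of OpenNMS.

```python
import re

# The Full GC lines quoted above, copied verbatim from the gc.log excerpt.
GC_LOG = """\
28225.877: [Full GC (Allocation Failure) 23G->22G(24G), 57.6808883 secs]
28287.806: [Full GC (Allocation Failure) 23G->22G(24G), 58.2689434 secs]
28352.225: [Full GC (Allocation Failure) 23G->22G(24G), 58.7228746 secs]
28417.907: [Full GC (Allocation Failure) 23G->22G(24G), 66.6455167 secs]
28491.769: [Full GC (Allocation Failure) 23G->22G(24G), 60.6501385 secs]
28560.747: [Full GC (Allocation Failure) 23G->22G(24G), 60.9638456 secs]
28630.703: [Full GC (Allocation Failure) 23G->22G(24G), 74.1622266 secs]
"""

# Captures the start timestamp and the pause duration of each Full GC event.
PATTERN = re.compile(r"(\d+\.\d+): \[Full GC \([^)]*\) \S+, (\d+\.\d+) secs\]")


def full_gc_pause_fraction(log_text):
    """Fraction of time spent in Full GC between the first and last event.

    Assumption: the timestamp is the start of the collection, so the window
    runs from the first start to the last start and the last pause is not
    counted.
    """
    events = [(float(start), float(secs)) for start, secs in PATTERN.findall(log_text)]
    if len(events) < 2:
        raise ValueError("need at least two Full GC events")
    window = events[-1][0] - events[0][0]
    paused = sum(secs for _, secs in events[:-1])
    return paused / window


if __name__ == "__main__":
    print(f"time spent in Full GC: {full_gc_pause_fraction(GC_LOG):.1%}")
```

Under that assumption the JVM is paused for roughly 90% of the sampled window, and each collection only brings the heap from 23G down to 22G of the 24G available.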
we found that the heap was filled with SNMP results:
$ jmap -histo:live $(cat /opt/opennms/logs/opennms.pid)
 num     #instances         #bytes  class name
----------------------------------------------
   1:     167500792     4044939384  [I
   2:     166866828     4004803872  org.opennms.netmgt.snmp.SnmpResult
   3:     169186543     2706984688  org.opennms.netmgt.snmp.snmp4j.Snmp4JValue
   4:     166956271     2671300336  org.opennms.netmgt.snmp.SnmpInstId
   5:      47970259     1089158136  [B
   6:      67030507     1072488112  org.opennms.netmgt.snmp.snmp4j.Integer32IgnoreTooManyBytes
Investigating the heap dump further, we found that most of these results were duplicates.
Looking at the SNMP traffic on the host, we found a large number of request/response packets with the following variable bindings (VBs):
User Datagram Protocol, Src Port: 40440, Dst Port: 161
Simple Network Management Protocol
    version: v2c (1)
    community: funkymonkey
    data: getBulkRequest (5)
        getBulkRequest
            request-id: 1705470191
            non-repeaters: 0
            max-repetitions: 2
            variable-bindings: 10 items
                1.3.6.1.2.1.2.2.1.1.2: Value (Null)
                1.3.6.1.2.1.2.2.1.2.2: Value (Null)
                1.3.6.1.2.1.2.2.1.3.2: Value (Null)
                1.3.6.1.2.1.2.2.1.4.2: Value (Null)
                1.3.6.1.2.1.2.2.1.5.2: Value (Null)
                1.3.6.1.2.1.2.2.1.6.2: Value (Null)
                1.3.6.1.2.1.2.2.1.7.2: Value (Null)
                1.3.6.1.2.1.2.2.1.8.2: Value (Null)
                1.3.6.1.2.1.2.2.1.9.2: Value (Null)
                1.3.6.1.2.1.31.1.1.1.1.2: Value (Null)

User Datagram Protocol, Src Port: 161, Dst Port: 40440
Simple Network Management Protocol
    version: v2c (1)
    community: funkymonkey
    data: get-response (2)
        get-response
            request-id: 1705470191
            error-status: noError (0)
            error-index: 0
            variable-bindings: 20 items
                1.3.6.1.2.1.2.2.1.1.2: 2
                1.3.6.1.2.1.2.2.1.2.2: <MISSING>
                1.3.6.1.2.1.2.2.1.3.2: 6
                1.3.6.1.2.1.2.2.1.4.2: 1500
                1.3.6.1.2.1.2.2.1.5.2: 100000000
                1.3.6.1.2.1.2.2.1.6.2: funkymonkey
                1.3.6.1.2.1.2.2.1.7.2: 1
                1.3.6.1.2.1.2.2.1.8.2: 1
                1.3.6.1.2.1.2.2.1.9.2: 0
                1.3.6.1.2.1.31.1.1.1.1.2: funkymonkey
                1.3.6.1.2.1.2.2.1.1.2: 2
                1.3.6.1.2.1.2.2.1.2.2: <MISSING>
                1.3.6.1.2.1.2.2.1.3.2: 6
                1.3.6.1.2.1.2.2.1.4.2: 1500
                1.3.6.1.2.1.2.2.1.5.2: 100000000
                1.3.6.1.2.1.2.2.1.6.2: funkymonkey
                1.3.6.1.2.1.2.2.1.7.2: 1
                1.3.6.1.2.1.2.2.1.8.2: 1
                1.3.6.1.2.1.2.2.1.9.2: 0
                1.3.6.1.2.1.31.1.1.1.1.2: funkymonkey
In this case, the agent is responding with OIDs that are equal to the requested OIDs, rather than successors of the requested OIDs.
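The successor rule that the agent violates can be written as a lexicographic comparison of OIDs. The sketch below is an illustration only, not the OpenNMS implementation; `parse_oid` and `is_successor` are hypothetical helper names.

```python
def parse_oid(text):
    """Convert a dotted OID string such as '1.3.6.1.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def is_successor(requested, returned):
    """True if `returned` sorts strictly after `requested` in OID order.

    Python compares tuples element by element and treats a shorter prefix
    as smaller, which matches the lexicographic ordering used for OIDs.
    """
    return parse_oid(returned) > parse_oid(requested)


if __name__ == "__main__":
    requested = "1.3.6.1.2.1.2.2.1.1.2"
    # The agent in the capture echoes the requested OID back: not a successor.
    print(is_successor(requested, "1.3.6.1.2.1.2.2.1.1.2"))
    # A well-behaved agent would return a later OID, for example the next instance.
    print(is_successor(requested, "1.3.6.1.2.1.2.2.1.1.3"))
```

Comparing integer tuples rather than strings matters here: as strings, "1.10" would sort before "1.9".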
Our SNMP tracking code does not account for this and keeps requesting the OIDs that follow the last ones received, which leads to an infinite loop.
Every time we receive a response, the results are added to memory, which eventually exhausts the heap and causes the observed OOM errors.
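The loop can be reproduced in miniature. The following sketch is a simplified simulation under stated assumptions, not the actual tracker code: `EchoAgent` is a hypothetical stand-in for the misbehaving agent, `walk` is a toy GETNEXT-style walk, and `max_requests` exists only so the unvalidated case terminates in the demo.

```python
class EchoAgent:
    """Hypothetical agent that answers with the requested OID instead of its successor."""

    def get_next(self, oid):
        return oid, 2  # same OID back, with an arbitrary value


def walk(agent, start, validate, max_requests=1000):
    """Toy walk: request the OID after the last one received and store every result.

    With validate=False the walk never advances against EchoAgent and only
    stops at max_requests (a real walk would have no such cap). With
    validate=True it aborts as soon as a response is not a strict successor.
    """
    results = []
    current = start
    for _ in range(max_requests):
        oid, value = agent.get_next(current)
        if validate and oid <= current:
            raise ValueError(f"agent returned non-increasing OID {oid}")
        results.append((oid, value))  # every response is kept in memory
        current = oid
    return results


if __name__ == "__main__":
    start = (1, 3, 6, 1, 2, 1, 2, 2, 1, 1, 2)
    stored = walk(EchoAgent(), start, validate=False, max_requests=50)
    print(len(stored), "results stored,", len(set(stored)), "distinct")
    try:
        walk(EchoAgent(), start, validate=True)
    except ValueError as err:
        print("walk aborted:", err)
```

Without validation every stored result is a duplicate of the first, which matches what the heap dump showed. With validation the walk fails fast and stores nothing.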
Issue Links
- is triggering: NMS-10622 Backport SNMP successor validation (Resolved)