OpenNMS / NMS-10678

Memory Leak on Drools while reloading config


    Details

    • Sprint:
      Horizon 2019 - 19, Horizon 2019 - May 15th 2019

      Description

      A customer has reported a memory leak in the correlation feature (more precisely, in Drools).

      The problem seems to be a race condition that triggers an internal failure in the Drools library while it marshals the state of the working memory (i.e. creates a snapshot of it) so that the configuration can be reloaded and the memory state restored afterwards.

      The trigger seems to be that the engine keeps receiving events (i.e. the correlation engine tries to inject events into the working memory) while the marshaling process is still running.
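
      To illustrate the suspected race, here is a minimal sketch of how the snapshot and the event injection could be serialized with a shared lock. It does not use the Drools API and is not the actual fix: the class GuardedWorkingMemory, its methods, and the use of plain strings as facts are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the engine's working memory.
// This is NOT OpenNMS or Drools code; it only illustrates the locking idea.
final class GuardedWorkingMemory {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> facts = new ArrayList<>();

    // Called by the event-handling thread.
    // Blocks while a snapshot is in progress, so the state cannot change mid-copy.
    void insert(String fact) {
        lock.lock();
        try {
            facts.add(fact);
        } finally {
            lock.unlock();
        }
    }

    // Called by the reload thread.
    // Copies the current state while holding the lock, so no insert can interleave.
    List<String> snapshot() {
        lock.lock();
        try {
            return new ArrayList<>(facts);
        } finally {
            lock.unlock();
        }
    }
}
```

      With this arrangement, an event that arrives during a reload simply waits until the snapshot is complete. Whether the real engine should block, queue, or drop such events is a separate design decision that this sketch does not address.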

      The following is a section of the Leak Suspect Report generated by Eclipse Memory Analyzer:

      FireTask
        at org.drools.core.marshalling.impl.ProtobufMessages$FactHandle$Builder.buildPartial()Lorg/drools/core/marshalling/impl/ProtobufMessages$FactHandle; (ProtobufMessages.java:24720)
        at org.drools.core.marshalling.impl.ProtobufMessages$FactHandle$Builder.build()Lorg/drools/core/marshalling/impl/ProtobufMessages$FactHandle; (ProtobufMessages.java:24712)
        at org.drools.core.marshalling.impl.ProtobufOutputMarshaller.writeFromNodeMemory(ILorg/drools/core/common/Memory;)Lorg/drools/core/marshalling/impl/ProtobufMessages$NodeMemory; (ProtobufOutputMarshaller.java:494)
        at org.drools.core.marshalling.impl.ProtobufOutputMarshaller.writeNodeMemories(Lorg/drools/core/marshalling/impl/MarshallerWriteContext;Lorg/drools/core/marshalling/impl/ProtobufMessages$RuleData$Builder;)V (ProtobufOutputMarshaller.java:376)
        at org.drools.core.marshalling.impl.ProtobufOutputMarshaller.serializeSession(Lorg/drools/core/marshalling/impl/MarshallerWriteContext;)Lorg/drools/core/marshalling/impl/ProtobufMessages$KnowledgeSession; (ProtobufOutputMarshaller.java:162)
        at org.drools.core.marshalling.impl.ProtobufOutputMarshaller.writeSession(Lorg/drools/core/marshalling/impl/MarshallerWriteContext;)V (ProtobufOutputMarshaller.java:118)
        at org.drools.core.marshalling.impl.ProtobufMarshaller.marshall(Ljava/io/OutputStream;Lorg/kie/api/runtime/KieSession;J)V (ProtobufMarshaller.java:162)
        at org.drools.core.marshalling.impl.ProtobufMarshaller.marshall(Ljava/io/OutputStream;Lorg/kie/api/runtime/KieSession;)V (ProtobufMarshaller.java:146)
        at org.opennms.netmgt.correlation.drools.DroolsCorrelationEngine.marshallStateToDisk(Z)V (DroolsCorrelationEngine.java:292)
        at org.opennms.netmgt.correlation.drools.DroolsCorrelationEngine.reloadConfig()V (DroolsCorrelationEngine.java:384)
        at org.opennms.netmgt.correlation.drools.DroolsCorrelationEngine.lambda$initialize$2()V (DroolsCorrelationEngine.java:232)
        at org.opennms.netmgt.correlation.drools.DroolsCorrelationEngine$$Lambda$151.run()V (Unknown Source)
        at java.lang.Thread.run()V (Thread.java:748)
      

       
      There is a single leak suspect consuming the entire 16 GB of heap configured for OpenNMS. The problem was reported on Meridian 2018, but Horizon should be affected by the same problem.

      The Leak Suspects report is attached, and the heap dump is available on NextCloud.

      Interestingly, the problem is triggered by new code introduced as part of the fix for NMS-10363.

            People

            • Assignee:
              cgorantla Chandra Gorantla
              Reporter:
              agalue Alejandro Galue
