OpenNMS / NMS-12236

Kafka RPC: StackOverflowError while unmarshaling causes processing to halt


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 24.1.3
    • Fix Version/s: 25.0.0
    • Component/s: None
    • Security Level: Default (Default Security Scheme)
    • Labels:
      None
    • Sprint:
      Horizon 2019 - August 14th

      Description

      A user reported that consumer lag on the Kafka RPC response topic would build up and would not recover until a reboot.

      A thread dump revealed that the consumer thread was no longer consuming; the remaining consumer pool thread was idle, waiting for a new task:

      "rpc-client-kafka-consumer-1" #80887 prio=5 os_prio=0 cpu=0.14ms elapsed=30594.16s tid=0x00007f02f0363000 nid=0x6e25 waiting on condition  [0x00007effc3ffd000]
         java.lang.Thread.State: WAITING (parking)
          at jdk.internal.misc.Unsafe.park(java.base@11.0.2/Native Method)
          - parking to wait for  <0x000000020b6769f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
          at java.util.concurrent.locks.LockSupport.park(java.base@11.0.2/LockSupport.java:194)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.2/AbstractQueuedSynchronizer.java:2081)
          at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.2/LinkedBlockingQueue.java:433)
          at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.2/ThreadPoolExecutor.java:1054)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.2/ThreadPoolExecutor.java:1114)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.2/ThreadPoolExecutor.java:628)
          at java.lang.Thread.run(java.base@11.0.2/Thread.java:834)
      
         Locked ownable synchronizers:
          - None
      

      The output.log file contained the following exception, which helps explain why the original consumer thread exited:

      Exception in thread "rpc-client-kafka-consumer-0" java.lang.StackOverflowError
              at java.base/java.util.ArrayList.add(ArrayList.java:512)
              at org.eclipse.persistence.internal.oxm.record.deferred.DeferredContentHandler$AttributeList.addAttribute(DeferredContentHandler.java:245)
              at org.eclipse.persistence.internal.oxm.record.deferred.DeferredContentHandler.buildAttributeList(DeferredContentHandler.java:107)
              at org.eclipse.persistence.internal.oxm.record.deferred.DeferredContentHandler.startElement(DeferredContentHandler.java:87)
              at org.eclipse.persistence.internal.oxm.record.BinaryDataUnmarshalRecord.startElement(BinaryDataUnmarshalRecord.java:41)
              at org.eclipse.persistence.internal.oxm.record.UnmarshalRecordImpl.startElement(UnmarshalRecordImpl.java:779)
              at org.eclipse.persistence.internal.oxm.XMLBinaryAttachmentHandler.startElement(XMLBinaryAttachmentHandler.java:78)
              at org.eclipse.persistence.internal.oxm.record.deferred.StartElementEvent.processEvent(StartElementEvent.java:40)
              at org.eclipse.persistence.internal.oxm.record.deferred.DeferredContentHandler.executeEvents(DeferredContentHandler.java:64)
              at org.eclipse.persistence.internal.oxm.record.deferred.BinaryMappingContentHandler.executeEvents(BinaryMappingContentHandler.java:75)
        ...
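One plausible way the two traces fit together, assuming the consumer loop was started as a one-off task with `ExecutorService.execute()`: the `StackOverflowError` escapes the loop and kills the worker ("rpc-client-kafka-consumer-0"), `ThreadPoolExecutor` creates a replacement worker ("rpc-client-kafka-consumer-1"), and since the loop task is gone, the replacement parks in `LinkedBlockingQueue.take()` exactly as in the thread dump. The sketch below reproduces that pattern with plain JDK classes only; the class name, the thread factory, and the simulated error are hypothetical and not taken from the OpenNMS code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConsumerThreadDeathDemo {

    /** What the simulation observed. */
    public static final class Result {
        public final int loopStarts;     // how many times the consumer loop began
        public final int threadsCreated; // how many pool threads were ever created

        Result(int loopStarts, int threadsCreated) {
            this.loopStarts = loopStarts;
            this.threadsCreated = threadsCreated;
        }
    }

    public static Result simulate() {
        AtomicInteger threadsCreated = new AtomicInteger();
        // Hypothetical factory producing names like those in the thread dump.
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "rpc-client-kafka-consumer-" + threadsCreated.getAndIncrement());
            t.setUncaughtExceptionHandler((thread, err) -> { }); // keep demo output quiet
            return t;
        };
        ExecutorService executor = Executors.newSingleThreadExecutor(factory);
        AtomicInteger loopStarts = new AtomicInteger();

        // execute() (unlike submit()) lets an Error escape and kill the worker thread.
        executor.execute(() -> {
            loopStarts.incrementAndGet();
            // Stand-in for the unmarshaller overflowing the stack on one record.
            throw new StackOverflowError("simulated");
        });

        try {
            // Wait for the pool to replace the dead worker; the replacement has
            // no task, so it blocks in LinkedBlockingQueue.take().
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (threadsCreated.get() < 2 && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            executor.shutdownNow();
        }
        return new Result(loopStarts.get(), threadsCreated.get());
    }

    public static void main(String[] args) {
        Result r = simulate();
        System.out.println("loop starts: " + r.loopStarts + ", threads created: " + r.threadsCreated);
    }
}
```

The loop runs once while two threads are created: the pool keeps a thread alive, but nothing resubmits the consumer loop, so the response topic is never polled again.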
      

      We should:
      1) Make sure that the consumer thread is restarted if it ever exits.
      2) Catch all errors that may occur when issuing callbacks (including java.lang.Error subclasses such as StackOverflowError, not just exceptions), and log the corresponding payload (in this case, what was the actual XML body, and which RPC module was it for?).
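A minimal sketch of both items, assuming the consumer hands each polled payload to a callback. `ResilientConsumerSketch`, `processBatch`, `runSupervised`, the module name, and the in-memory `LOG` list are hypothetical stand-ins; no Kafka client or real logger is used. Catching `Throwable` rather than `Exception` is what lets a `StackOverflowError` be logged together with its payload, and the supervisor restarts the loop if it exits anyway (the restart bound exists only so the example terminates).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ResilientConsumerSketch {

    /** Collected log lines, standing in for a real logger (hypothetical). */
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    /**
     * Item 2: hand each payload to the callback, catching Throwable so an
     * Error is logged with the module and payload instead of killing the loop.
     * Returns the number of payloads whose callback failed.
     */
    static int processBatch(String module, List<String> payloads, Consumer<String> handler) {
        int failures = 0;
        for (String payload : payloads) {
            try {
                handler.accept(payload);
            } catch (Throwable t) {
                failures++;
                LOG.add("RPC callback failed for module " + module + ", payload=" + payload + ": " + t);
            }
        }
        return failures;
    }

    /**
     * Item 1: restart the consumer loop whenever it exits abnormally, giving up
     * after maxRestarts restarts. Returns how many restarts were performed.
     */
    static int runSupervised(Runnable consumerLoop, int maxRestarts) {
        int restarts = 0;
        while (true) {
            try {
                consumerLoop.run();
                return restarts; // loop returned normally, e.g. on shutdown
            } catch (Throwable t) {
                LOG.add("consumer loop exited: " + t);
                if (restarts == maxRestarts) {
                    return restarts; // give up
                }
                restarts++;
            }
        }
    }

    public static void main(String[] args) {
        // A callback that fails on one payload, mimicking the unmarshaller.
        Consumer<String> handler = xml -> {
            if (xml.contains("bad")) throw new StackOverflowError("simulated");
        };
        processBatch("example-module", List.of("<ok/>", "<bad/>", "<ok/>"), handler);

        // A loop that fails twice before running cleanly.
        AtomicInteger attempts = new AtomicInteger();
        int restarts = runSupervised(() -> {
            if (attempts.incrementAndGet() < 3) throw new IllegalStateException("poll failed");
        }, 5);

        LOG.forEach(System.out::println);
        System.out.println("restarts: " + restarts);
    }
}
```

Catching `Throwable` also catches errors such as `OutOfMemoryError`; a real implementation may want to handle those differently from a per-record `StackOverflowError`.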


            People

            • Assignee:
              cgorantla Chandra Gorantla
              Reporter:
              j-white Jesse White
            • Votes:
              0
              Watchers:
              3
