  1. OpenNMS
  2. NMS-10593

Alarmd gets stuck in a deadlock and stops processing events

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 23.0.0, 23.0.1, 23.0.2, 23.0.3
    • Fix Version/s: 23.0.4
    • Component/s: None
    • Security Level: Default (Default Security Scheme)
    • Labels:
      None
    • Sprint:
      Horizon - Feb 27th 2019

      Description

      I've seen a few cases now where alarmd gets stuck with a stack similar to:

      "Timer-1" #290 prio=5 os_prio=0 tid=0x00007fa78d55e800 nid=0x665b runnable [0x00007fa818ce0000]
         java.lang.Thread.State: RUNNABLE
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
              at java.net.SocketInputStream.read(SocketInputStream.java:170)
              at java.net.SocketInputStream.read(SocketInputStream.java:141)
              at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
              at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
              at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:70)
              at org.postgresql.core.PGStream.receiveChar(PGStream.java:283)
              at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1919)
              at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:291)
              - locked <0x000000065d030720> (a org.postgresql.core.v3.QueryExecutorImpl)
              at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:432)
              at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:358)
              at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:171)
              at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:138)
              at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
              at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
              at org.hibernate.jdbc.NonBatchingBatcher.addToBatch(NonBatchingBatcher.java:46)
              at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2591)
              at org.hibernate.persister.entity.AbstractEntityPersister.updateOrInsert(AbstractEntityPersister.java:2495)
              at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2822)
              at org.hibernate.action.EntityUpdateAction.execute(EntityUpdateAction.java:113)
              at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:273)
              at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:265)
              at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:185)
              at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:321)
              at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:51)
              at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1216)
              at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:383)
              at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:133)
              at org.springframework.orm.hibernate3.HibernateTransactionManager.doCommit(HibernateTransactionManager.java:662)
              at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:761)
              at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:730)
              at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:484)
              at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:291)
              at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
              at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
              at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:208)
              at com.sun.proxy.$Proxy158.setSeverity(Unknown Source)
              at org.opennms.netmgt.alarmd.drools.Rule_setSituationSeverityToMaxAlarmSeverity2065334369.defaultConsequence(Rule_setSituationSeverityToMaxAlarmSeverity2065334369.java:15)
              at org.opennms.netmgt.alarmd.drools.Rule_setSituationSeverityToMaxAlarmSeverity2065334369DefaultConsequenceInvokerGenerated.evaluate(Unknown Source)
              at org.opennms.netmgt.alarmd.drools.Rule_setSituationSeverityToMaxAlarmSeverity2065334369DefaultConsequenceInvoker.evaluate(Unknown Source)
      

      The query is stuck waiting on Postgres.

      Further inspection shows that the query is actually stuck waiting for a lock, which is held by another transaction that has not yet committed.

      Looking back at the thread dump, we can find another thread that has an open transaction and is parked waiting on a lock in DroolsAlarmContext. Its transaction will remain open until the first thread unblocks, hence the deadlock.

      "alarmd-Thread-3-of-4" #409 prio=10 os_prio=0 tid=0x00007fa62409c000 nid=0x6704 waiting on condition [0x00007fa8002f0000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x0000000650457b38> (a java.util.concurrent.locks.ReentrantLock$FairSync)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
              at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
              at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
              at org.opennms.netmgt.alarmd.drools.DroolsAlarmContext.handleNewOrUpdatedAlarm(DroolsAlarmContext.java:304)
              at org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.lambda$onNewOrUpdatedAlarm$0(AlarmLifecycleListenerManager.java:138)
              at org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager$$Lambda$793/2064039797.accept(Unknown Source)
              at org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.forEachListener(AlarmLifecycleListenerManager.java:209)
              at org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.onNewOrUpdatedAlarm(AlarmLifecycleListenerManager.java:138)
              at org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.onAlarmUpdatedWithReducedEvent(AlarmLifecycleListenerManager.java:158)
              at org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.lambda$didUpdateAlarmWithReducedEvent$1(AlarmEntityNotifierImpl.java:60)
              at org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl$$Lambda$842/41578666.accept(Unknown Source)
              at org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.forEachListener(AlarmEntityNotifierImpl.java:121)
              at org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.didUpdateAlarmWithReducedEvent(AlarmEntityNotifierImpl.java:60)
              at org.opennms.netmgt.alarmd.AlarmPersisterImpl.addOrReduceEventAsAlarm(AlarmPersisterImpl.java:204)
              at org.opennms.netmgt.alarmd.AlarmPersisterImpl.lambda$persist$0(AlarmPersisterImpl.java:122)
              at org.opennms.netmgt.alarmd.AlarmPersisterImpl$$Lambda$777/2084621144.doInTransaction(Unknown Source)
              at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
              at org.opennms.netmgt.alarmd.AlarmPersisterImpl.persist(AlarmPersisterImpl.java:122)
              at org.opennms.netmgt.alarmd.Alarmd.onEvent(Alarmd.java:86)
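
      The cycle visible in the two stacks can be sketched in isolation. This is a minimal illustration, not OpenNMS code: the class and method names are hypothetical, and a plain ReentrantLock named rowLock stands in for the Postgres lock held by the open transaction. It also assumes that the lock "alarmd-Thread-3-of-4" is parked on is held by the thread firing the Drools rules, which the truncated "Timer-1" stack suggests but does not show. Each thread takes its first lock, waits until the other holds its own, then tries the second lock with a timeout so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockCycleSketch {

    /**
     * Takes 'first', waits until the other thread also holds its first lock,
     * then tries 'second' with a timeout. Returns true if 'second' was acquired.
     * 'first' is only released after both threads have finished their attempt,
     * so neither attempt can succeed by accident.
     */
    static boolean holdThenAcquire(Lock first, Lock second,
                                   CountDownLatch bothHoldFirst,
                                   CountDownLatch bothAttempted) throws InterruptedException {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            bothAttempted.countDown();
            bothAttempted.await();
            return acquired;
        } finally {
            first.unlock();
        }
    }

    /** Returns true if both threads timed out on their second lock, i.e. the cycle occurred. */
    static boolean reproducesCycle() throws Exception {
        Lock rowLock = new ReentrantLock();        // stand-in for the database lock held by the open transaction
        Lock rulesLock = new ReentrantLock(true);  // stand-in for the fair lock seen in DroolsAlarmContext
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothAttempted = new CountDownLatch(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Like the alarmd thread: transaction open first, then the rules lock.
            Future<Boolean> alarmd = pool.submit(
                    () -> holdThenAcquire(rowLock, rulesLock, bothHoldFirst, bothAttempted));
            // Like the timer thread (assumed): rules lock first, then the database update.
            Future<Boolean> timer = pool.submit(
                    () -> holdThenAcquire(rulesLock, rowLock, bothHoldFirst, bothAttempted));
            return !alarmd.get() && !timer.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cycle reproduced: " + reproducesCycle());
    }
}
```

      In the real system neither side has a timeout, so both threads wait forever. Because one of the two locks lives in the database rather than in the JVM, the thread dump shows one thread as RUNNABLE in a socket read instead of reporting a Java-level deadlock.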
      

            People

            • Assignee: Jesse White (j-white)
            • Reporter: Jesse White (j-white)