Deadlocks on Demo

Description

We're seeing a large number of deadlocks on Demo, across multiple daemons. It's not clear whether this is a problem with the database or with OpenNMS, but it needs investigation.

Acceptance / Success Criteria

None

Attachments

Linked issues

depends on

NMS-7755

c.m.v.a.ThreadPoolAsynchronousRunner: com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@59804d53 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!

Activity

Jesse White November 5, 2015 at 9:40 AM

Fixed in release-16.0.4 with 87a587810dbd533c657faa29babeec79cf742ec7

Jesse White November 5, 2015 at 9:39 AM

The deadlocks haven't reappeared since demo was updated with this change (they normally would have by now). I'll go ahead and mark this as fixed, and we'll revisit it if it recurs.

Jesse White November 4, 2015 at 1:15 PM

I think I've isolated this one.

The deadlocks occur on updates to the assets table, in contexts where we should only be reading values.

I was able to reproduce this by calling org.opennms.features.vaadin.surveillanceviews.service.DefaultSurveillanceViewService#getNodeRtcsForCategories() from an integration test. The Hibernate logs revealed that the 'userLastModified' field was being marked as dirty, and as a result, Hibernate would attempt to update the row in question.

The 'userLastModified' field is of type 'char(20)', but the default value in the OnmsAssetRecord object is an empty string. In PostgreSQL, char(n) values are always of fixed length and are padded with spaces.

Changing the type of the 'userLastModified' field to 'varchar(20)' while keeping the default value of an empty string resolved the issue with the "update on read". I believe that this should fix the related deadlocks as well.
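The mechanism described above can be sketched in plain Java, without Hibernate or a database. This is a simplification under stated assumptions: it treats the dirty check as a plain equals() comparison between the loaded snapshot and the current field value, and assumes one side holds the space-padded char(20) value while the other holds the empty-string default. readBackAsChar, readBackAsVarchar and isDirty are hypothetical helpers, not OpenNMS or Hibernate APIs.

```java
import java.util.Objects;

public class CharPaddingDirtyCheck {

    // Hypothetical stand-in for reading a value back from a PostgreSQL char(n) column:
    // values shorter than n come back right-padded with spaces to exactly n characters.
    static String readBackAsChar(String value, int n) {
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < n) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Hypothetical stand-in for a varchar(n) column: the stored value comes back unchanged.
    static String readBackAsVarchar(String value) {
        return value;
    }

    // Simplified snapshot-based dirty check (assumption, not Hibernate's code):
    // a property counts as dirty when its current value is not equal to the snapshot.
    static boolean isDirty(String snapshot, String current) {
        return !Objects.equals(snapshot, current);
    }

    public static void main(String[] args) {
        String defaultValue = ""; // empty-string default, as on the entity field

        String fromChar = readBackAsChar(defaultValue, 20);
        String fromVarchar = readBackAsVarchar(defaultValue);

        System.out.println("char(20) dirty: " + isDirty(fromChar, defaultValue));
        System.out.println("varchar(20) dirty: " + isDirty(fromVarchar, defaultValue));
    }
}
```

Running main prints `char(20) dirty: true` followed by `varchar(20) dirty: false`: the padded value never equals the empty default, so every flush would schedule an UPDATE, while the varchar value round-trips unchanged.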

Andreas Fuchs October 19, 2015 at 8:22 AM

I took a series of thread dumps on the server running the GUI and on the "core" server running all the other daemons. The dumps were taken at intervals of about 30 seconds (each dump takes some time).
Some threads stayed in the BLOCKED state for over 4 minutes (I added a timestamp to the first line of every dump), but I can't say whether that is normal.
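Besides `jstack <pid>` or `kill -3 <pid>`, a comparable snapshot of blocked threads can be taken from inside a JVM with the standard java.lang.management API. This is an illustrative sketch only: BlockedThreadReport, blockedThreads and demoReport are hypothetical names, the code is not part of OpenNMS, and it reports only threads that are BLOCKED on a Java monitor at the moment of the call, not database-level lock waits.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {

    // Returns one line per thread that is currently BLOCKED waiting to enter a monitor.
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            // Enables ThreadInfo.getBlockedTime(); without it the method returns -1.
            mx.setThreadContentionMonitoringEnabled(true);
        }
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waiting on " + info.getLockName()
                        + " held by " + info.getLockOwnerName()
                        + " (blocked ms: " + info.getBlockedTime() + ")");
            }
        }
        return report;
    }

    // Demo: starts a thread that contends for a monitor held by the caller,
    // waits until it is BLOCKED, and returns the report taken at that moment.
    static List<String> demoReport() throws InterruptedException {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // Empty body: the thread only needs to contend for the monitor.
            }
        }, "demo-waiter");
        List<String> report;
        synchronized (lock) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            report = blockedThreads();
        }
        waiter.join();
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        demoReport().forEach(System.out::println);
    }
}
```

Comparing reports taken about 30 seconds apart shows whether the same thread stays blocked on the same lock. Note that getBlockedTime() only counts time accumulated after contention monitoring was enabled, so it can read 0 for a thread that has in fact been blocked longer.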

Seth Leger October 16, 2015 at 11:03 AM

Please get a thread dump from the system whenever it is slow, and we can look at it and see what is going on inside the system to diagnose the slowness.

I don't think that database deadlocks would contribute much to slowing the system down. In your logs, the deadlock condition lasts for a couple of seconds and then goes away. I think that the worst effect of the deadlocks would be that Provisiond may not be updating some data properly or adding nodes to the database when the deadlocks occur.

Fixed

Details
Assignee
Jesse White
Reporter
Benjamin Reed
Components
Sprint
None
Fix versions
16.0.4
17.0.0
Affects versions
16.0.3
Priority
Blocker

Created September 30, 2015 at 11:01 AM

Updated November 9, 2015 at 3:03 PM

Resolved November 5, 2015 at 9:40 AM
