No Reason Code on IPv6 HTTPS outage

Description

The following timeout:

2011-10-27 15:14:58,643 INFO [PollerScheduler-30 Pool-fiber1] HttpsMonitor: checkStatus: HTTP socket connection timed out with timeout: 9000ms retry: 0 of 1
2011-10-27 15:14:59,058 DEBUG [PollerScheduler-30 Pool-fiber1] HttpsMonitor: HttpMonitor: connected to host: 2002:4c4a:ffe6:b:230:48ff:fec5:5622/2002:4c4a:ffe6:b:230:48ff:fec5:5622 on port: 443
2011-10-27 15:15:08,067 INFO [PollerScheduler-30 Pool-fiber1] HttpsMonitor: checkStatus: HTTP socket connection timed out with timeout: 9000ms retry: 1 of 1

resulted in a nodeLostService event with a reason code of "Unknown".

HTTPS outage identified on interface 2002:4c4a:ffe6:000b:0230:48ff:fec5:5622 with reason code: Unknown.

The reason code should instead report a timeout connecting to the port.

Acceptance / Success Criteria

None

Activity

Seth Leger December 6, 2011 at 2:20 PM

I added a catch(Throwable) to the HttpMonitor so that it will not let any RuntimeExceptions get by (which cause "Unknown" polling failures). Marking as fixed.

commit 2ccf2a5fe50b35c359d528836af6eb5a6a0f5044
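
The catch-all described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `PollReasonSketch`, `Result`, and `poll` are hypothetical names, and `Result` is only a stand-in for whatever status object the poller actually returns. The idea is that an expected timeout gets its own reason text, and anything else is caught and turned into a descriptive reason instead of escaping the monitor.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical sketch; none of these names come from the OpenNMS code base.
class PollReasonSketch {

    /** Stand-in for a poll result: availability plus a human-readable reason. */
    static final class Result {
        final boolean available;
        final String reason;

        Result(boolean available, String reason) {
            this.available = available;
            this.reason = reason;
        }
    }

    /** Runs one check and always returns a Result with a reason on failure. */
    static Result poll(Callable<Void> check) {
        try {
            check.call();
            return new Result(true, null);
        } catch (SocketTimeoutException e) {
            // Expected failure mode: report it explicitly as a timeout.
            return new Result(false, "HTTP socket connection timed out: " + e.getMessage());
        } catch (Throwable t) {
            // Catch-all: RuntimeExceptions and Errors no longer escape the
            // monitor, so the caller never has to fall back to "Unknown".
            return new Result(false,
                    "Unexpected " + t.getClass().getName() + ": " + t.getMessage());
        }
    }

    public static void main(String[] args) {
        Result r = poll(() -> { throw new IllegalStateException("simulated failure"); });
        System.out.println(r.available + " / " + r.reason);
    }
}
```

Catching `Throwable` is broader than usually recommended, but at the boundary of a poller it keeps one misbehaving check from producing an outage with no explanation.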

Seth Leger November 3, 2011 at 5:19 PM

Actually, it looks like the logs are different on the system where Tarus is observing the problem: they contain the "connected to host: 2002:4c4a:ffe6..." message, which indicates that the monitor is actually connecting to something, unlike my unit test, which just times out cleanly against a non-existent address. So there could be another problem somewhere in the connection, maybe at the SSL level. Continuing to investigate...
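
The difference between a timeout while connecting and a timeout after the connection is established can be reproduced with plain `java.net` sockets. The sketch below is an assumption-laden illustration, not the HttpsMonitor code: `TimeoutPhaseDemo`, `probe`, and `probeSilentLoopback` are hypothetical names, and a loopback listener that never sends data stands in for a peer that accepts the TCP connection but stalls afterwards (for example during an SSL handshake).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch; not taken from the OpenNMS monitors.
class TimeoutPhaseDemo {

    /** Connects, then waits for one byte, and reports which phase failed. */
    static String probe(InetAddress host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            try {
                // Phase 1: TCP connect, bounded by the connect timeout.
                socket.connect(new InetSocketAddress(host, port), timeoutMs);
            } catch (IOException e) {
                return "connect failed: " + e.getClass().getSimpleName();
            }
            // Phase 2: wait for data, bounded by the read timeout.
            socket.setSoTimeout(timeoutMs);
            try {
                int b = socket.getInputStream().read();
                return b < 0 ? "connected, peer closed" : "connected, data received";
            } catch (SocketTimeoutException e) {
                return "connected, read timed out";
            }
        } catch (IOException e) {
            return "I/O error: " + e.getClass().getSimpleName();
        }
    }

    /** Probes a local listener that completes the TCP handshake but stays silent. */
    static String probeSilentLoopback(int timeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silent = new ServerSocket(0, 1, loopback)) {
            return probe(loopback, silent.getLocalPort(), timeoutMs);
        } catch (IOException e) {
            return "could not open test listener: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(probeSilentLoopback(200));
    }
}
```

Reporting the phase in the reason text would make it possible to tell an unreachable address apart from a host that connects and then stops responding.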

Seth Leger November 3, 2011 at 5:12 PM

I wrote a simple unit test for this and it appears to be working properly inside the test so I'll have to do more digging. Here's the test output:

Seth Leger November 3, 2011 at 12:44 PM

Please check for a stack trace in the logs. I'm pretty sure that the "Unknown" error code is only filled in when the poll encounters an unexpected exception, and that exception should be logged.

Fixed

Created October 27, 2011 at 3:33 PM
Updated January 27, 2017 at 4:20 PM
Resolved December 6, 2011 at 2:20 PM