Any SNMP error-status > 5 treated as unrecognized, aborts AggregateTracker
Attachments
- 05 Dec 2016, 01:18 PM
Activity
Benjamin Reed December 21, 2016 at 11:47 AM
This got merged yesterday.
Jeff Gehlbach December 6, 2016 at 3:35 PM
Your reading of RFC 3416 tracks with my own. noAccess(6) indicates a MIB view problem, which may go away for a future collection but is unlikely to be transient within the lifetime of a single tracker.
The RFC doesn't really say much about authorizationError(16), so I'm not sure what to do there. Could we make the behavior in case of this (and maybe other?) statuses user-configurable? That would enable us to adapt to a broader range of lame agents.
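To make the user-configurable idea concrete, here is a minimal sketch of per-status overrides parsed from a single string. Everything in it is hypothetical: the class name, the `Behavior` values, and the `status=behavior` format are assumptions for illustration, not an existing OpenNMS setting or the change that was merged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: per-error-status overrides read from a config string. */
final class ErrorStatusOverrides {
    /** Assumed set of behaviors an operator could choose from. */
    enum Behavior { FATAL, RETRY, SKIP }

    private final Map<Integer, Behavior> overrides = new HashMap<>();
    private final Behavior fallback;

    /**
     * @param spec     comma-separated "status=behavior" pairs, e.g. "16=retry,6=skip"
     * @param fallback behavior used for any status that has no override
     */
    ErrorStatusOverrides(String spec, Behavior fallback) {
        this.fallback = fallback;
        if (spec == null || spec.trim().isEmpty()) {
            return; // no overrides configured
        }
        for (String entry : spec.split(",")) {
            String[] kv = entry.trim().split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Bad override entry: " + entry);
            }
            int status = Integer.parseInt(kv[0].trim());
            Behavior behavior = Behavior.valueOf(kv[1].trim().toUpperCase(Locale.ROOT));
            overrides.put(status, behavior);
        }
    }

    /** Returns the configured behavior for a status, or the fallback. */
    Behavior behaviorFor(int errorStatus) {
        return overrides.getOrDefault(errorStatus, fallback);
    }
}
```

With a spec such as `"16=retry, 6=skip"`, an agent that misreports authorizationError(16) could be retried while every other status keeps the built-in default.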
Benjamin Reed December 6, 2016 at 3:07 PM (edited)
Changes I'm making:
- Removed the LOG.* calls I put in the AggregateTracker, which could stomp on thread-tracking.
- Changed badValue(3) and readOnly(4) to continue to be non-fatal (in case we can continue to retrieve other values), but to return false when determining retries.
- If I am reading RFC 3416 correctly, wrongType(7), wrongLength(8), wrongEncoding(9), wrongValue(10), noCreation(11), inconsistentValue(12), resourceUnavailable(13), commitFailed(14), undoFailed(15), notWritable(17), and inconsistentName(18) are all related only to things that would happen from a SET* call, so they, too, should be non-fatal but non-retried.
Also, currently noAccess(6) and authorizationError(16) are set to fatal: false, retry: true. Should we throw an exception if auth fails? Or retry in the hope it's a transient error? Or just do non-fatal, non-retried, like the others?
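The classification in this comment can be summarized as a lookup from error-status to a disposition. The sketch below is only an illustration: the class and enum names are invented, the numeric codes are the RFC 3416 values, and the treatment of noAccess(6) and authorizationError(16) reflects the "fatal: false, retry: true" setting mentioned above, which is still an open question in this thread.

```java
/** Hypothetical sketch of the error-status classification discussed above. */
final class ErrorStatusSketch {
    /** Invented names; they do not correspond to OpenNMS types. */
    enum Disposition {
        OK,                // noError: nothing to do
        ADAPT_AND_RETRY,   // tooBig: shrink the request, then retry
        DELEGATE_TO_CHILD, // noSuchName, genErr: handled by processChildError
        SKIP_NO_RETRY,     // non-fatal, not retried
        RETRY,             // non-fatal, retried
        FATAL              // unrecognized value
    }

    static Disposition classify(int errorStatus) {
        switch (errorStatus) {
            case 0:  // noError
                return Disposition.OK;
            case 1:  // tooBig
                return Disposition.ADAPT_AND_RETRY;
            case 2:  // noSuchName
            case 5:  // genErr
                return Disposition.DELEGATE_TO_CHILD;
            case 3:  // badValue
            case 4:  // readOnly
            case 7:  // wrongType
            case 8:  // wrongLength
            case 9:  // wrongEncoding
            case 10: // wrongValue
            case 11: // noCreation
            case 12: // inconsistentValue
            case 13: // resourceUnavailable
            case 14: // commitFailed
            case 15: // undoFailed
            case 17: // notWritable
            case 18: // inconsistentName
                return Disposition.SKIP_NO_RETRY;
            case 6:  // noAccess
            case 16: // authorizationError
                return Disposition.RETRY; // current setting; still under discussion
            default: // outside the 0-18 range defined by RFC 3416
                return Disposition.FATAL;
        }
    }

    public static void main(String[] args) {
        // Print the disposition for every defined status plus one unknown value.
        for (int status = 0; status <= 19; status++) {
            System.out.println(status + " -> " + classify(status));
        }
    }
}
```

Keeping the mapping in one place would make it easy to change the answer for 6 and 16 once the question above is settled.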
Benjamin Reed December 5, 2016 at 2:53 PM
Yeah, a bunch of them were related to setting. I erred on the side of caution but we can certainly make all the set-based ones just reject cleanly...
Details
Assignee: Benjamin Reed
Reporter: Jeff Gehlbach
Sprint: None
Priority: Blocker

The AggregateTracker code is unaware of SNMP error-status values greater than 5 (the original set). Higher values are treated as unknown and invariably fatal to the tracker's operation. RFC 3416 defines values 0-18. We need to put in the work to understand which of these values are actually show-stoppers, which can be ignored, and which may require some kind of adaptive behavior as in the case of TOO_BIG_ERR.

    public boolean processErrors(int errorStatus, int errorIndex) {
        if (errorStatus == TOO_BIG_ERR) {
            int maxVarsPerPdu = m_pduBuilder.getMaxVarsPerPdu();
            if (maxVarsPerPdu <= 1) {
                throw new IllegalArgumentException("Unable to handle tooBigError when maxVarsPerPdu = "+maxVarsPerPdu);
            }
            m_pduBuilder.setMaxVarsPerPdu(maxVarsPerPdu/2);
            reportTooBigErr("Reducing maxVarsPerPdu for this request to "+m_pduBuilder.getMaxVarsPerPdu());
            return true;
        } else if (errorStatus == GEN_ERR) {
            return processChildError(errorStatus, errorIndex);
        } else if (errorStatus == NO_SUCH_NAME_ERR) {
            return processChildError(errorStatus, errorIndex);
        } else if (errorStatus != NO_ERR){
            throw new IllegalArgumentException("Unrecognized errorStatus "+errorStatus);
        } else {
            // Continue on.. no need to retry
            return false;
        }
    }