Intermittent error starting Telemetryd: No adapter found for class: org.opennms.netmgt.telemetry.protocols.netflow.adapter.netflow5.Netflow5Adapter

Description

Periodically seen in CI:

This seems to occur across a variety of Core, Minion, and Sentinel tests. It looks more like a general startup problem that is unrelated to the specifics of any test. When this happens, karaf.log is almost empty, e.g.:

By comparison, if I look at a run that didn’t fail in this way, I see karaf.log has what appears to be normal startup content. E.g.:

I also got this error fairly consistently when I enabled Sentinel in onms-k8s-poc with Horizon 30. Sentinel worked fine with Horizon 29.

Maybe https://opennms.discourse.group/t/karafstartupmonitor-waiting-for-loading-karafhealthservice-could-not-start-daemon/2115 will help.

Saw a somewhat similar issue that might or might not be related: Failed to create listener from registry for listener named: Flow-UDP-50000:

Could it be that org.opennms.netmgt.telemetry.protocols.netflow.adapter.netflow5.Netflow5Adapter is set up in Karaf, but Telemetryd doesn’t wait for that startup to complete? For example:
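
To illustrate the suspected race in isolation, here is a minimal sketch of a registry lookup that waits (with a timeout) for an adapter to be published instead of failing immediately. It is not taken from the OpenNMS code base: the class AwaitingAdapterRegistry and its methods register, getNow, and await are hypothetical names invented for this example, and the real Telemetryd/Karaf wiring may look quite different.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical registry for illustration only; not the actual OpenNMS telemetry registry.
class AwaitingAdapterRegistry {
    // One future per adapter class name; completed once a factory is published.
    private final Map<String, CompletableFuture<Object>> adapters = new ConcurrentHashMap<>();

    // Called by the side that publishes adapters (e.g. a container finishing startup).
    void register(String className, Object factory) {
        adapters.computeIfAbsent(className, k -> new CompletableFuture<>()).complete(factory);
    }

    // Fail-fast lookup: empty if nothing has been registered yet.
    // A daemon that starts early and uses only this lookup would report "no adapter found".
    Optional<Object> getNow(String className) {
        CompletableFuture<Object> future = adapters.get(className);
        return future == null ? Optional.empty() : Optional.ofNullable(future.getNow(null));
    }

    // Waiting lookup: blocks until the adapter is registered or the timeout elapses.
    // Returns empty on timeout or interruption.
    Optional<Object> await(String className, Duration timeout) {
        CompletableFuture<Object> future =
                adapters.computeIfAbsent(className, k -> new CompletableFuture<>());
        try {
            return Optional.of(future.get(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

With a lookup like getNow, whether startup succeeds depends on which side wins the race, which would match the intermittent nature of the failure. A bounded wait like await trades a slower worst-case startup for a deterministic outcome.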

Acceptance / Success Criteria

None

Activity

Benjamin Reed August 16, 2023 at 2:09 PM

My changes have made startup a bit more regular, but I'm honestly not sure it's enough to "solve" this issue. However, I've already created a lot of churn and I'm afraid to do more until we can see how the system behaves for a bit.

So I'm going to go ahead and close this now, and we'll have to keep a close eye on whether we still see the flow startup issue.

Alex May March 3, 2023 at 6:22 PM

Offering my 2 cents: I have seen this in the past when working with Docker containers. I think I was able to reliably stop it from happening by allocating more disk space to Docker.

DJ Gregor January 30, 2023 at 3:54 AM

Note: while looking at another issue, which might partially be a timing issue between when collectd/pollerd/etc. start and first try to store metrics and when the timeseries persister is ready in Karaf, I realized that this issue with telemetryd might also be timing related with Kafka.

Fixed

Details

Created November 4, 2022 at 5:48 PM
Updated August 16, 2023 at 2:09 PM
Resolved August 16, 2023 at 2:09 PM