Scriptd consumes CPU even when it does nothing

Description

The default configuration for Scriptd is empty, meaning it should do nothing.

However, the CPU usage of the Scriptd threads increases proportionally to the event injection rate (fluctuating around some average). On a busy system processing thousands of events per second, the CPU taken by Scriptd can degrade overall OpenNMS performance and prevent other features from working properly.

I think it would be useful for Scriptd to analyze its configuration and stop listening to events when no configuration requires it. And when a listener is needed, it should be built so that it won't overwhelm the rest of the JVM.
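A minimal sketch of both ideas, assuming a hypothetical `EventSource` subscription interface (it is NOT the real OpenNMS event API, and `ScriptdSketch` is an illustrative name): the daemon registers a listener only when at least one script is configured, and the listener hands events to a bounded queue without blocking, so a burst of events cannot pile up unbounded work.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical stand-in for an event subscription API; NOT the real OpenNMS interface.
interface EventSource {
    void subscribe(Consumer<String> listener);
}

// Sketch of a Scriptd-like daemon that stays idle when nothing is configured.
class ScriptdSketch {
    private final BlockingQueue<String> queue;           // bounded hand-off to a worker
    private final AtomicLong dropped = new AtomicLong(); // events rejected because the queue was full
    private boolean subscribed = false;

    ScriptdSketch(int queueCapacity) {
        this.queue = new ArrayBlockingQueue<>(queueCapacity);
    }

    // Registers a listener only if at least one script is configured.
    void start(List<String> configuredScripts, EventSource source) {
        if (configuredScripts.isEmpty()) {
            return; // no scripts: never subscribe, so incoming events cost this daemon nothing
        }
        source.subscribe(this::onEvent);
        subscribed = true;
    }

    // Called on the event-delivery thread; offer() never blocks it.
    void onEvent(String event) {
        if (!queue.offer(event)) {
            dropped.incrementAndGet(); // queue full: drop and count instead of piling up work
        }
    }

    // A worker thread (not shown) would loop on this and run the configured scripts.
    String nextEvent() throws InterruptedException {
        return queue.take();
    }

    boolean isSubscribed() { return subscribed; }
    int pending() { return queue.size(); }
    long droppedCount() { return dropped.get(); }
}
```

Dropping on a full queue is only one possible back-pressure policy; blocking the caller or rate-limiting are alternatives, and this issue does not decide which one Scriptd should use.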

On the system where I first observed this, Actiond behaved similarly. I've never seen a customer use Actiond, whereas Scriptd is more widely used, which is why I focused this issue on Scriptd.

I'm targeting M2020 and the latest H27 because, before the refactoring to use Immutable Events, the CPU impact was not this high, which makes me believe that code change might be related.

I used jvm-tools to analyze a clean system running 27.1.0:

Also, when using stress-events via the Karaf shell to generate 2000 events per second, I can see:

I believe that's excessive for something that is not in use.
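To compare per-thread numbers like these without jvm-tools, CPU time can also be sampled with the standard `java.lang.management.ThreadMXBean` API. This is a rough, hypothetical helper (`ThreadCpuSampler` is not part of OpenNMS): it only sees threads in the JVM it runs in, so it would have to execute inside the OpenNMS JVM, and filtering on a name such as "Scriptd" assumes those threads carry that string in their names, which this issue does not confirm.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-thread CPU sampler (not an OpenNMS or jvm-tools API).
class ThreadCpuSampler {

    // CPU time consumed over a wall-clock interval, as a percentage of one core.
    static double cpuPercent(long cpuDeltaNanos, long wallDeltaNanos) {
        return wallDeltaNanos <= 0 ? 0.0 : 100.0 * cpuDeltaNanos / wallDeltaNanos;
    }

    // Cumulative CPU time (ns) for live threads whose name contains nameFilter.
    static Map<Long, Long> snapshot(ThreadMXBean mx, String nameFilter) {
        Map<Long, Long> cpu = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long t = mx.getThreadCpuTime(id);
            // info is null / t is -1 if the thread died or CPU timing is unavailable
            if (info != null && t >= 0 && info.getThreadName().contains(nameFilter)) {
                cpu.put(id, t);
            }
        }
        return cpu;
    }

    // Samples twice, intervalMillis apart, and returns CPU % per matching thread id.
    static Map<Long, Double> sample(String nameFilter, long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return Map.of(); // this JVM cannot report per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long wallStart = System.nanoTime();
        Map<Long, Long> before = snapshot(mx, nameFilter);
        Thread.sleep(intervalMillis);
        Map<Long, Long> after = snapshot(mx, nameFilter);
        long wall = System.nanoTime() - wallStart;

        Map<Long, Double> result = new HashMap<>();
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            Long start = before.get(e.getKey());
            if (start != null) { // skip threads that appeared during the interval
                result.put(e.getKey(), cpuPercent(e.getValue() - start, wall));
            }
        }
        return result;
    }
}
```

Sampling twice and dividing the CPU-time delta by the wall-clock delta is the usual top-style approach; a single snapshot would only give cumulative CPU since each thread started.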

Acceptance / Success Criteria

None

Activity

Zoë Knox March 16, 2022 at 6:26 PM

Zoë Knox March 15, 2022 at 5:56 PM

It is simple enough to disable Scriptd and Actiond when no scripts or actions are configured. It saves a small amount of CPU and may help under high event loads. Before the changes, at 2000 ev/s:

and with Scriptd auto-disabled because no scripts are configured:

So, is it worth disabling Scriptd when it isn't configured? (Detecting whether Actiond has a configuration is harder and possibly not a "quick win".)

Alberto November 18, 2021 at 12:40 AM

I'm new to OpenNMS and tried to follow the same steps.

  • Started a clean 27.1.2 instance

  • Started monitoring Scriptd

  • Started stress-events at 2000 events/s

I couldn't reproduce the CPU usage problem.

Running the command:

The highest values found were:

Are there other steps I should have followed to reproduce it?

Fixed

Details

Created March 23, 2021 at 2:54 PM
Updated March 29, 2023 at 1:27 PM
Resolved March 29, 2023 at 1:25 PM