Reload Collectd and Pollerd Configuration without restarting OpenNMS

Description

For a long time, OpenNMS has not supported reloading changes made to poller-configuration.xml and collectd-configuration.xml, the configuration files for its two most important daemons: Pollerd (Service Assurance / Availability) and Collectd (Data Collection).

Because each of these daemons contains a thread pool and a scheduler to do its work, the idea is to reload the configuration by re-creating the schedulers as if OpenNMS were starting up.

Right now, if either of these files is modified, OpenNMS must be restarted for the changes to take effect.

Acceptance / Success Criteria

None

Attachments

1
  • 18 Feb 2015, 01:55 PM

Activity

Alejandro Galue February 19, 2015 at 9:01 AM

Merged into develop on revision 656686446567cfbfa3cd762fc7e3174e31c86024.

Alejandro Galue February 19, 2015 at 8:53 AM

Fixed on foundation on revision 0b1cfc4f783851cc6ec00bd424e41ad2bda8d177

Alejandro Galue February 18, 2015 at 2:13 PM

At large scale, I would say that reloading the configuration is expensive because the solution works on a per-node basis, but I think this is better than restarting OpenNMS.

Alejandro Galue February 18, 2015 at 2:06 PM

The solution for the "develop" branch is different due to some API changes, but it is essentially the same. The problem is that after committing the solution to "foundation", a manual merge is required to bring the solution into "develop".

Alejandro Galue February 18, 2015 at 2:05 PM

Collectd

The solution is based on the current implementation of the handler for nodeCategoryMembershipChanged inside Collectd.java.

Because reloading the configuration means rescheduling everything, the handler gets the list of node IDs from the NodeDAO, and then for each node it calls the same two methods that handleNodeCategoryMembershipChanged uses to reschedule data collection.
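The per-node loop described above could look roughly like the following. This is only a minimal sketch, not the committed code: the NodeIdSource and NodeScheduler interfaces, their method names, and the unschedule-then-schedule order are hypothetical stand-ins for the NodeDAO lookup and for the two methods reused from the existing handler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectdReloadSketch {

    /** Hypothetical stand-in for the NodeDAO lookup: supplies all node IDs. */
    interface NodeIdSource {
        List<Integer> getNodeIds();
    }

    /** Hypothetical stand-in for the two per-node calls reused from the existing handler. */
    interface NodeScheduler {
        void unschedule(int nodeId); // stop collections built from the old configuration
        void schedule(int nodeId);   // create collections from the reloaded configuration
    }

    /** Reschedules every node and returns how many nodes were processed. */
    static int reload(NodeIdSource nodes, NodeScheduler scheduler) {
        int processed = 0;
        for (int nodeId : nodes.getNodeIds()) {
            scheduler.unschedule(nodeId);
            scheduler.schedule(nodeId);
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        final List<String> calls = new ArrayList<>();
        // Recording scheduler used only to show the order of the calls.
        NodeScheduler recorder = new NodeScheduler() {
            @Override public void unschedule(int nodeId) { calls.add("unschedule " + nodeId); }
            @Override public void schedule(int nodeId) { calls.add("schedule " + nodeId); }
        };
        int processed = reload(() -> Arrays.asList(1, 2), recorder);
        System.out.println(processed + " nodes: " + calls);
    }
}
```

The cost grows linearly with the number of nodes, since every node is visited once regardless of whether its configuration actually changed.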

Besides that, I've added logic to manage the hash of collection implementations (in case something is added or removed).
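One way to keep such a hash in sync with a reloaded configuration is sketched below. It assumes the hash is a map keyed by collector name; the class name, the sync method, and the factory parameter are all hypothetical and are not taken from Collectd.java.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

public class CollectorRegistrySketch {

    /**
     * Aligns the loaded collectors with the names found in the reloaded
     * configuration: entries that disappeared are dropped, entries that are
     * new are created through the factory, and existing entries are kept.
     */
    static <C> void sync(Map<String, C> loaded, Set<String> configured,
                         Function<String, C> factory) {
        // Removing from the key set removes the matching map entries.
        loaded.keySet().retainAll(configured);
        for (String name : configured) {
            // Only names without an entry trigger the factory.
            loaded.computeIfAbsent(name, factory);
        }
    }

    public static void main(String[] args) {
        Map<String, String> loaded = new TreeMap<>();
        loaded.put("SNMP", "existing SNMP collector");
        loaded.put("JMX", "existing JMX collector");

        // Reloaded configuration: JMX was removed, HTTP was added.
        Set<String> configured = new HashSet<>(Arrays.asList("SNMP", "HTTP"));
        sync(loaded, configured, name -> "new " + name + " collector");

        System.out.println(loaded.keySet());
    }
}
```

Collectors that are still configured keep their existing instance, so only the added or removed ones cause any work.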

In collectd.log, I can see the ABORT_COLLECTION messages, which tell me the old collection won't be executed anymore.

Pollerd

As with the Collectd solution, the new handler uses the serviceReschedule method defined on PollerEventProcessor to reschedule the polling of the services for each node. To get the list of nodes, I've added a new method to PollableNetwork called getNodeIds().
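As a rough illustration of the Pollerd side, the sketch below uses a toy Network class in place of PollableNetwork and a plain callback in place of serviceReschedule. Everything except the getNodeIds name is hypothetical, and returning a snapshot copy of the IDs is an assumption made here so that rescheduling can modify the network while the loop is running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntConsumer;

public class PollerdReloadSketch {

    /** Toy stand-in for the pollable network: node ID mapped to a label. */
    static class Network {
        private final Map<Integer, String> nodes = new TreeMap<>();

        void addNode(int id, String label) { nodes.put(id, label); }

        void removeNode(int id) { nodes.remove(id); }

        /** Returns a copy, so callers may change the network while iterating. */
        List<Integer> getNodeIds() { return new ArrayList<>(nodes.keySet()); }
    }

    /** Calls the reschedule callback once for every node in the network. */
    static void reload(Network network, IntConsumer reschedule) {
        for (int nodeId : network.getNodeIds()) {
            reschedule.accept(nodeId);
        }
    }

    public static void main(String[] args) {
        Network network = new Network();
        network.addNode(1, "router");
        network.addNode(2, "switch");

        // The callback removes and re-adds the node, mimicking a reschedule
        // that rebuilds the node from the reloaded configuration.
        reload(network, id -> {
            network.removeNode(id);
            network.addNode(id, "rescheduled");
            System.out.println("rescheduled node " + id);
        });
    }
}
```

Iterating over the live key set instead of a copy would risk a ConcurrentModificationException as soon as the callback changes the network.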

Fixed

Details

Created February 13, 2015 at 1:00 PM
Updated February 19, 2015 at 1:53 PM
Resolved February 19, 2015 at 8:53 AM
