Provide a way to selectively detect services on requisitions

Description

The OpenNMS Provisioning system is the way to populate the inventory of the elements that must be monitored by OpenNMS.

The provisioning system is configured by creating provisioning groups. You can tell the provisioner exactly what to include in the inventory, discover everything (subject to certain configurable restrictions), or use a hybrid approach (partial discovery plus partial explicit declaration).

In theory, there are no restrictions on how nodes are grouped into a requisition. However, the nodes in the same requisition must follow these rules:

1) All of them must have a unique Foreign ID (that is how a node is uniquely identified within a requisition).

2) All the detectors defined on the requisition affect all the IPs (physically defined on the requisition or discovered through SNMP).

3) All the policies defined on the requisition affect all the nodes and interfaces (physically defined on the requisition or discovered through SNMP).

The last two rules limit how nodes can be grouped.
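To make rule 2 concrete, here is a minimal sketch of its effect: every detector on the requisition is paired with every IP. The function name and the sample data are hypothetical and are not taken from the Provisiond code.

```python
from itertools import product

def detection_tasks(ip_addresses, detectors):
    """Return one (ip, service) pair per IP/detector combination.

    Hypothetical model of rule 2: there is no way to exclude a pair.
    """
    return list(product(ip_addresses, detectors))

# Sample data, for illustration only.
ips = ["10.0.1.5", "10.0.2.7"]
detectors = ["HTTP", "MySQL", "SSH"]
tasks = detection_tasks(ips, detectors)
```

With 2 IPs and 3 detectors this yields 6 tasks, including pairs such as MySQL on a web server that the user may never want scheduled.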

Here are several use cases:

1) Replace Capsd with Provisiond.

Capsd was the old way to fully auto-discover the inventory and persist it into the OpenNMS database. Because discovered nodes are not added to a specific requisition, they all share the same set of detectors when Provisiond checks which services exist on each discovered IP address. In other words, Provisiond will try to detect every declared service on every discovered IP. That is not what some users want, especially when a user wants to detect certain services only on a specific range of IP addresses, or to detect a service on every IP except a specific range of addresses.

Capsd is able to perform this selective discovery, but unfortunately that is not possible with the current Provisiond code.

2) Suppose, for example, that you have 100 servers: 50 are web servers and 50 are DB servers. To detect HTTP and MySQL only on the proper devices, you must put the nodes on different requisitions and create the proper detector on each one. Now imagine that 20 of the web servers run Linux and the rest run Solaris. All of them support SSH, but this service is required only on the Linux devices. That means 3 groups are required instead of 2, because only a subset of the web servers needs SSH. There are also several HTTP-related services that must exist on all the web servers (that is, shared services), and those services must be defined twice (once per requisition). If the way we detect such a service changes, we have to change it twice (because it exists on two requisitions).

The above situation extends to more complex scenarios where both the number of shared services and the number of services unique to a server or subset of servers are large. This complicates the administration of the shared services and can lead to errors. It also forces the servers to be split across several requisitions, instead of creating one requisition for all the servers (which makes sense and simplifies administration), or one requisition per device kind, per physical location, etc.

In order to put all the servers into the same requisition, we need a way to "selectively" detect a service. That means being able to tell the Provisioner: detect service X only on A, B and C, and do not attempt to detect it on D, E and F.

As I mentioned before, this is not possible with the current code, which is why external scripts and tools like PRIS have been developed to work around the restrictions imposed by Provisiond.

Enforcing a service on the requisition might be a solution, but that means the service must exist and must be valid and reachable before the requisition is synchronized; otherwise this could lead to problems and unwanted notifications. It also requires knowing the list of services that must be added to a node before adding the node. In addition, it breaks the ability to selectively disable the polling of services (for maintenance purposes, for example) when required (assuming that scheduled outages are not an option because they affect packages instead of single services per IP).

Also, if a customer wishes to use auto-discovery, that is not the way to go when the customer has a complex scenario like the one described above.

There are several ways to provide selective detection. Here are some ideas:

1) Provide a rule to describe the list of IPs on which the service should be detected, or the list of IPs on which the service should not be detected, for example:

This tells the class responsible for creating the detection tasks to create a task only if the IP being processed matches the rule defined on the detector (e.g. 10.0.1.*). It is a special parameter that controls on which IPs the service must be detected. In theory, this special parameter could be treated as "where" the service must be detected (because the detector implementation already defines "how" the service must be detected).

In this example, the value starts with an exclamation mark, meaning that HTTP must be detected on every IP except those matching 10.0.1.*.

The "ipMatch" parameter is targeted at the detection task creator and not at the detector implementations. The detector implementations can certainly see the attribute, but they are not required to use it.
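The rule evaluation described above could be sketched roughly as follows. This is only an illustration and not the attached patch. It assumes the rule is a simple `*` wildcard pattern with an optional leading `!` for negation, and the helper names `ip_matches` and `create_tasks` are hypothetical.

```python
from fnmatch import fnmatchcase

def ip_matches(ip, rule):
    """Evaluate an ipMatch-style rule against one IP (illustrative only).

    No rule means the detector applies everywhere, which preserves the
    current behavior. A leading "!" inverts the result.
    """
    if not rule:
        return True
    negate = rule.startswith("!")
    pattern = rule[1:] if negate else rule
    matched = fnmatchcase(ip, pattern)
    return not matched if negate else matched

def create_tasks(ip, detectors):
    """Return the names of the detectors whose rule accepts this IP."""
    return [d["name"] for d in detectors if ip_matches(ip, d.get("ipMatch"))]

# Hypothetical detector list: HTTP only on 10.0.1.*, SSH everywhere else.
detectors = [
    {"name": "HTTP", "ipMatch": "10.0.1.*"},
    {"name": "SSH", "ipMatch": "!10.0.1.*"},
    {"name": "ICMP"},  # no ipMatch: detected on every IP
]
on_web = create_tasks("10.0.1.25", detectors)
on_other = create_tasks("10.0.2.9", detectors)
```

The filtering happens before any detector runs, so the detector implementations themselves never need to look at the parameter.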

In terms of code changes, this is the easiest and cleanest way to provide the required functionality, especially because if the parameter is not provided, the provisioner works as originally designed.

Also, there is no need to change the WebUI to use this feature.

For this reason, a patch to provide this functionality is included (it has been tested on master, but it could be possible to port it to 1.12 if necessary).

2) Be able to add a service to the requisition and tell the provisioner to detect it instead of forcing its addition into the database, for example:

If the service has the flag detectable=true (the default is false, to be consistent with the current behavior), the provisioner should schedule the detection of the service instead of forcing it into the database.
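As a rough sketch of that behavior, the provisioner would split the services declared on the requisition into two groups. The dictionary layout and the function name `split_services` are assumptions made for illustration; only the `detectable` flag and its default come from the proposal.

```python
def split_services(requisition_services):
    """Split declared services into (forced, to_detect) name lists.

    A missing "detectable" key is treated as False, matching the
    proposed default that keeps the current behavior.
    """
    forced, to_detect = [], []
    for svc in requisition_services:
        if svc.get("detectable", False):
            to_detect.append(svc["name"])
        else:
            forced.append(svc["name"])
    return forced, to_detect

# Hypothetical services declared on one interface.
services = [
    {"name": "ICMP"},
    {"name": "HTTP", "detectable": True},
    {"name": "SSH", "detectable": False},
]
forced, to_detect = split_services(services)
```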

In terms of code changes, this seems complicated to implement, considering that the persisting and scanning phases are separate, and it also requires changes to the WebUI.

3) Create a ServiceDetectionPolicy to control the detection phase.

Unfortunately, policies are enforced after the scanning phase, so this seems complicated both in terms of code changes and in the way policies are executed within the provisioner.

4) Emulate the concept of packages inside the foreign source definition, in order to specify a filter per list of detectors.

The idea is to create several lists of detectors and apply an optional filter to each list, for example:

Then, the provisioner should process all the detector lists to generate the final list that applies to each IP.
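A minimal sketch of that merge step, under the assumption that each list carries an optional `*` wildcard filter and that a detector appearing in several matching lists is applied only once. The data layout and the name `detectors_for_ip` are hypothetical.

```python
from fnmatch import fnmatchcase

def detectors_for_ip(ip, detector_lists):
    """Merge every detector list whose filter accepts the IP.

    A list without a filter applies to all IPs. Order is preserved and
    duplicate detector names are dropped.
    """
    result = []
    for group in detector_lists:
        ip_filter = group.get("filter")
        if ip_filter is None or fnmatchcase(ip, ip_filter):
            for name in group["detectors"]:
                if name not in result:
                    result.append(name)
    return result

# Hypothetical package-like definition: one shared list, two filtered lists.
groups = [
    {"detectors": ["ICMP", "SNMP"]},
    {"filter": "10.0.1.*", "detectors": ["HTTP", "ICMP"]},
    {"filter": "10.0.2.*", "detectors": ["MySQL"]},
]
web = detectors_for_ip("10.0.1.4", groups)
db = detectors_for_ip("10.0.2.4", groups)
other = detectors_for_ip("192.168.0.1", groups)
```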

This is doable in a similar way to solution 1, but it seems more complicated (more places must be modified for this solution).

Fortunately, old formats can still work. Unfortunately, the WebUI requires drastic changes to support this way of configuring a requisition, while the first solution does not require any WebUI changes (only changes to the class responsible for creating the list of detection tasks).

Conclusion

The easiest, most powerful, and least intrusive solution is the first one.

Acceptance / Success Criteria

None

Activity

Matt Brozowski September 18, 2014 at 6:08 PM

I committed this to rc/stable/1.14.0

2188398c6d5ea190bbab7d6a2ebcdf77f04d8597

Matt Brozowski September 18, 2014 at 5:05 PM

I am going to allow this for 1.14 but I cannot guarantee that this feature will continue to be supported in minion/dominion.

Alejandro Galue September 15, 2014 at 11:19 AM

Here is another useful use case that can be done with the proposed changes:

Imagine that you want to detect the HTTP service on different servers but using different parameters (for example, the URL is different). You can create the detectors like this:

As you can see, the ipMatch filter can be used to schedule the detection of the same service using different parameters on different sets of IP addresses.

Of course, you must be careful when creating the filters, otherwise Provisiond might schedule the detection of the same service several times.
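One way to check for this kind of overlap before synchronizing is sketched below. It assumes `*` wildcard rules with an optional leading `!`, and the detector entries, URLs, and function names are hypothetical examples rather than part of the committed change.

```python
from fnmatch import fnmatchcase

def ip_matches(ip, rule):
    """Illustrative ipMatch evaluation: "*" wildcards, leading "!" negates."""
    if not rule:
        return True
    negate = rule.startswith("!")
    matched = fnmatchcase(ip, rule[1:] if negate else rule)
    return not matched if negate else matched

def duplicate_detections(ip, detectors):
    """Return the services that more than one detector would probe on this IP."""
    counts = {}
    for det in detectors:
        if ip_matches(ip, det.get("ipMatch")):
            counts[det["service"]] = counts.get(det["service"], 0) + 1
    return sorted(service for service, n in counts.items() if n > 1)

# Three HTTP detectors with different URLs; the third filter overlaps the second.
http_detectors = [
    {"service": "HTTP", "ipMatch": "10.0.1.*", "url": "/app"},
    {"service": "HTTP", "ipMatch": "10.0.2.*", "url": "/status"},
    {"service": "HTTP", "ipMatch": "!10.0.1.*", "url": "/"},
]
overlap = duplicate_detections("10.0.2.8", http_detectors)
clean = duplicate_detections("10.0.1.8", http_detectors)
```

Here 10.0.2.8 matches both the second and third filters, so HTTP would be scheduled twice on it, while 10.0.1.8 matches only the first.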

Alejandro Galue September 12, 2014 at 9:18 AM

One of the advantages of adding services with detectors instead of forcing them on the requisition is that you gain the ability to force the services to be unmanaged, either from the node's page or through the ReST API, in those cases where you would like to temporarily disable the polling without using scheduled outages. That way there is no need to touch the requisitions or the detectors.

Christopher Rodman September 11, 2014 at 9:36 AM

Another good use case: suppose you have a requisition whose foreign-source has HTTP as a detector, and there are 20 nodes in that requisition. What happens if you are having issues with HTTP on one of those nodes and need to temporarily suspend HTTP on that node? If you delete it from the node, it will get detected again. You could remove HTTP from the foreign-source, but that would prevent future nodes from being able to detect HTTP. So this would provide the granular flexibility that is needed in a production environment.

Fixed

Details

Assignee: Alejandro Galue
Reporter: Alejandro Galue
Components: (none listed)
Fix versions: 14.0.0
Affects versions: 1.12.9, 1.13.4
Priority: Blocker

Created September 10, 2014 at 10:32 AM

Updated September 26, 2017 at 9:38 PM

Resolved September 18, 2014 at 6:08 PM
