Cannot debug Telemetry persistence on Sentinel

Description

I remember that with Telemetry in OpenNMS it was possible to see all the persistence work in DEBUG mode, especially the messages produced when compiling and running the Groovy scripts that prepare the CollectionSets, including the debug messages emitted by the scripts themselves.

This is the information I cannot see in karaf.log for Sentinel, which means I don't know why Sentinel is not persisting the Telemetry data. It is receiving the data: Minion is sending it properly, and Sentinel is consuming it properly.

In other words, when Sentinel consumes a message from the Telemetry topics (I'm using Kafka), I don't know what it does next, so if there is an error, I'm not aware of it.

To clarify what's required:

1) Be able to see the raw message in human-readable form in the logs, decoded from the original GPB message sent by the Nexus switch.
2) Be able to see debug messages generated from within the Groovy script in the logs.
3) Be able to see any error message generated from within the script in the logs.

The above should work regardless of whether Sentinel or OpenNMS is used to analyze telemetry data.

This is important because, in the case of Cisco, the data provided by the switches comes in an undocumented form that depends on how telemetry was configured. For this reason, a human-readable rendering of the data is required to build the Groovy script; otherwise, the script cannot be implemented. Then, once the script is implemented, it is very important to be able to see the debug messages and errors it generates, as there are no tools to validate the script outside the platform, meaning either OpenNMS or Sentinel must be used for this purpose.

Acceptance / Success Criteria

None

Activity

Alejandro Galue September 18, 2019 at 4:33 PM

Something that might have been buried in my last reply:

Because Karaf follows different logging rules than OpenNMS, it was not obvious where or how the DEBUG level should be configured to get the expected information.

Fortunately, I now know how to do it, but it is worth adding a note to the documentation so that any user without Karaf experience (i.e. the majority of them) can properly debug the Groovy scripts.

That said, in OpenNMS those logs also end up in karaf.log, and I believe it would be more intuitive to find them in telemetry.log. Again, I now know where to find the logs, but as this is not obvious, the documentation should provide some hints.

My $.02

Alejandro Galue September 18, 2019 at 4:17 PM

Good news!

At least with Sentinel, the problem was setting the DEBUG level appropriately. By default, log:get shows an entry for org.opennms set to INFO. That's why enabling DEBUG at the ROOT level was not enough. After enabling DEBUG at the org.opennms level, I was able to see the raw NX-OS messages, as well as the logs from the Groovy scripts.

I believe it is very likely that the problem in OpenNMS is similar, so with this in mind, I'll re-check. If it works, I'll resolve this issue as WON'T FIX, at least from H25's perspective.
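
For anyone without Karaf experience, the steps above can be sketched as the following Karaf console session. This is a minimal sketch, not an official procedure: it assumes the stock Karaf log:get, log:set and log:tail commands, run from the Sentinel Karaf console, and uses the org.opennms logger name mentioned above.

```
log:get
log:set DEBUG org.opennms
log:tail
```

The first command lists the configured log levels, where org.opennms is expected to show INFO. The second raises only that logger to DEBUG, which is the step that changing ROOT alone does not cover. The third follows the log so that the raw NX-OS messages and the Groovy script output can be watched as they arrive. Running log:set INFO org.opennms afterwards should restore the previous level.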

Sounds good?

Jesse White September 18, 2019 at 4:08 PM

Log statements in the Groovy script show up in karaf.log, and these statements can be used to render the GPB payloads as a human-readable string. What's missing?

Alejandro Galue September 17, 2019 at 7:31 PM

The attached .tar.gz contains a test program I wrote to bypass Minion and send GPB telemetry data directly to the Sink topic in Kafka. This simple producer takes a binary file with a valid Nexus GPB payload and sends it to the default Sink topic for NX-OS. That way, OpenNMS or Sentinel can pick it up and process it without having GNS3 or a real Nexus around.

In my lab, I have OpenNMS H25-build85 running on a VM, Minion H25-build85 running on another VM, and Kafka/Zookeeper running on Docker. OpenNMS and Minion talk to each other through Kafka. A requisition is then created with a fake node that represents the Nexus. The foreignID and/or the label must be nexus9k, as that is how the device identifies itself in the provided test data (change it if you use a different payload). It is very important to mark the fake IP as a primary SNMP interface (even if SNMP is unavailable); otherwise telemetry won't work (I have no idea why, but this is how it works). The IP itself doesn't matter.

Once all this is in place, I use the single JAR generated after compiling the tar.gz like this:

In the above example, 192.168.0.17 is the IP of my machine (I'm exposing Kafka running on Docker on port 9092; I can provide the docker-compose file if that helps). The Minion ID and the location must match the ones used on the Minion VM (in my case minion01 and Apex), as this JAR generates a message the same way the Minion would, but with the mock data.

As far as OpenNMS/Sentinel are concerned, the data was sent by the Minion, even though we know that was not the case.

I'll attach the Groovy script that "works" with that payload.

Of course, I'm available for testing whatever is required.

Alejandro Galue September 17, 2019 at 7:04 PM

When OpenNMS receives a telemetry message, the following appears in /opt/opennms/logs/karaf.log:

In other words, the NxosGpbAdapter shows a human-readable version of the GPB message when DEBUG is enabled. This is what a user would need in order to understand what the Nexus is sending and build the Groovy script. Unless I'm missing something, this doesn't seem to be available on Sentinel.

As for the Groovy script, there is no evidence of its execution in any logs, neither in OpenNMS nor in Sentinel.

I know the script is executed because I can see the data in RRD files or Newts (depending on whether I'm using OpenNMS or Sentinel), and if I introduce an error, I can see the compilation error in the logs. However, if the script compiles cleanly and contains multiple debug/info messages, like the example scripts we provide, none of those messages are displayed. Its parent class doesn't show evidence of the script's execution either, even though I can see the logs for the persistence phase, which happens after the script is executed.

Because there are no tools to test the script outside OpenNMS, and there are no log messages to help a user debug the scripts while implementing them within OpenNMS or Sentinel, the user is blind and completely unaware of what might be missing from a script that compiles.

Makes sense?

Won't Fix

Details
Assignee: Markus von Rüden (Deactivated)
Reporter: Alejandro Galue
Labels: drift2
Components:
Sprint: None
Fix versions: 25.0.0
Affects versions: 24.1.0
Priority: Major

Created June 20, 2019 at 3:02 PM

Updated September 19, 2019 at 10:48 AM

Resolved September 19, 2019 at 10:48 AM
