Integrating Alerts via the OpsCenter REST API
The Alerts feature in DataStax OpsCenter notifies users when specific metrics exceed configurable thresholds. While OpsCenter ships with two push notification plugins (email and custom URL), we've heard from some users that they would rather fetch active alerts from a custom script so they can integrate with systems they already have in place. In this post, I'll walk through retrieving this information with Python via the OpsCenter REST API.
Note: These steps assume you have already set up alert rules in OpsCenter and are working with a cluster called MyCluster.
Getting Started
The first thing we need is a way to talk to the API.
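A minimal sketch of such a helper, written against Python 3's standard library. The base address `http://localhost:8888` and the `opener` parameter are assumptions made for illustration (adjust the address for your installation), and authentication is not handled.

```python
import json
from urllib.request import urlopen

# Assumption: OpsCenter listens here; change this to match your installation.
OPSCENTER_URL = "http://localhost:8888"


def get_url(path, opener=urlopen):
    """Fetch an API path from OpsCenter and return the decoded JSON body."""
    url = OPSCENTER_URL.rstrip("/") + "/" + path.lstrip("/")
    # `opener` defaults to urlopen; it is a parameter only so the helper
    # can be exercised without a live OpsCenter instance.
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))
```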
This fetches any URL from the API and parses the returned JSON into a native Python type. Here's a simple example that uses it to get the load for a single node:
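As a sketch of that example, the function below builds the node-property path and hands it to whatever fetch helper you use. The path layout `<cluster>/nodes/<node_ip>/load` and the function name `get_node_load` are assumptions; check the path against the API reference for your OpsCenter version.

```python
def get_node_load(fetch, cluster_id, node_ip):
    """Return the load property of a single node.

    `fetch` is any callable that takes an API path and returns parsed JSON.
    """
    # Assumed path layout: <cluster>/nodes/<node_ip>/<property>
    return fetch("{}/nodes/{}/load".format(cluster_id, node_ip))
```

With the URL helper described above passed in as `fetch`, a call would look like `get_node_load(get_url, "MyCluster", "127.0.0.1")`, where the node address is a placeholder.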
Getting Alerts Information
Now we're ready to find out which alerts are currently fired (i.e., alerts that have passed their thresholds). See the API Reference.
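One way this step might look, again taking the fetch helper as a parameter. The endpoint path `<cluster>/alerts/fired` is my assumption about the API layout, and the type check is an extra safeguard that the post itself does not call for.

```python
def get_fired_alerts(fetch, cluster_id):
    """Return the list of currently fired alerts for one cluster.

    `fetch` is any callable that takes an API path and returns parsed JSON.
    """
    # Assumed endpoint: <cluster>/alerts/fired
    alerts = fetch("{}/alerts/fired".format(cluster_id))
    # Fail loudly if the response is not the expected list of dictionaries.
    if not isinstance(alerts, list):
        raise ValueError("expected a list of fired alerts, got %r" % (alerts,))
    return alerts
```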
This will give us a list of dictionaries that look like this:
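The literal below is only an illustration of the shape, not real output. The `alert_rule_id` and `current_value` keys are the ones discussed in this post; `node` and `first_fired` are field names I am assuming, and every value is a made-up placeholder.

```python
# Illustrative shape only: every value below is a placeholder.
example_fired_alerts = [
    {
        "alert_rule_id": "RULE_ID_PLACEHOLDER",  # links the alert to its rule
        "current_value": 0,                      # meaning depends on the rule
        "node": "127.0.0.1",                     # assumed field name
        "first_fired": 0,                        # assumed field name
    },
]
```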
That gives us some information, but we still don't have the full picture of what's going on, or what the "current_value" property really means. The next step is to get our list of configured alert rules, so we can look up more information via the "alert_rule_id" property. See the API Reference.
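A sketch of that lookup table, under two assumptions that should be verified against the API reference: that the rules live at `<cluster>/alert-rules`, and that each rule stores its identifier under an `id` key.

```python
def get_alert_rules_by_id(fetch, cluster_id):
    """Return the configured alert rules as a dict keyed by rule ID.

    `fetch` is any callable that takes an API path and returns parsed JSON.
    """
    # Assumed endpoint: <cluster>/alert-rules, returning a list of rule dicts.
    rules = fetch("{}/alert-rules".format(cluster_id))
    # Assumption: each rule carries its identifier under the "id" key.
    return {rule["id"]: rule for rule in rules}
```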
This retrieves a list of all configured alert rules and converts it to a dictionary keyed by alert rule ID, for easy lookup. See the AlertRule definition for a full reference of the properties in a single rule.
Putting It All Together
Now that we have a list of fired alerts and the rules that they reference, it's time to connect the dots.
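The matching step could be sketched roughly as follows. The function name `process_fired_alerts` is hypothetical, and its `do_something` parameter stands in for this post's doSomething; collecting alerts whose rule is missing is my own addition, not something the post requires.

```python
def process_fired_alerts(fired_alerts, rules_by_id, do_something):
    """Call do_something(alert, rule) for every fired alert with a known rule.

    Returns the alerts whose rule could not be found, so the caller can
    decide what to do with them.
    """
    unmatched = []
    for alert in fired_alerts:
        rule = rules_by_id.get(alert.get("alert_rule_id"))
        if rule is None:
            # The rule is not in the lookup table (for example, it may have
            # been removed), so there is nothing to describe this alert with.
            unmatched.append(alert)
            continue
        do_something(alert, rule)
    return unmatched
```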
For this example, I've defined doSomething as a function that simply builds a human-readable message and prints it, but this is where you could use the same data to integrate OpsCenter alerts into your existing systems via their APIs.
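Here is one hedged take on such a function, renamed `do_something` to follow Python naming conventions. It branches on the rule type, using the two type names this post mentions (rolling-avg and node-down). The `node`, `metric`, and `threshold` field names are assumptions; see the AlertRule definition for the real property names.

```python
def do_something(alert, rule):
    """Build a human-readable message for one fired alert and print it."""
    node = alert.get("node", "unknown node")  # "node" is an assumed field name
    if rule.get("type") == "node-down":
        message = "Node {} is down".format(node)
    else:
        # Treated as a rolling-avg rule: a metric compared to a threshold.
        # "metric" and "threshold" are assumed field names.
        message = "{} on {}: current value {}, threshold {}".format(
            rule.get("metric", "metric"),
            node,
            alert.get("current_value"),
            rule.get("threshold"),
        )
    print(message)
    return message
```

Returning the message as well as printing it makes it easier to forward the same text to another system later.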
One thing to notice here is that there are two types of alert rules: rolling-avg and node-down. Every rule except the one that checks whether a node is down is of type rolling-avg, which means it compares the average value of a given metric to the configured threshold.
Wrap Up
And that's all there is to it. If you'd like to see all of this code put together, as well as some added functionality for looping through all of your clusters, check out fetch-opscenter-alerts.py.
We would love to hear how you're using the OpsCenter API, or would like to use it, so don't hesitate to drop us a message in the comments.