Monitoring Concepts

Definitions

Specificity

Sensitivity
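
As a rough illustration of the two terms, treat an alert as a test for "is something actually wrong?". Sensitivity is then the fraction of real problems that raised an alert, and specificity is the fraction of healthy periods that stayed quiet. The counts below are hypothetical, chosen only to make the arithmetic concrete.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real problems that raised an alert: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of healthy periods that stayed quiet: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for one check over 1000 check intervals.
true_pos, false_neg = 9, 1      # 10 real problems, 9 of them alerted
true_neg, false_pos = 940, 50   # 990 healthy intervals, 50 false alarms

print(f"sensitivity = {sensitivity(true_pos, false_neg):.2f}")
print(f"specificity = {specificity(true_neg, false_pos):.2f}")
```

A check that misses real problems loses sensitivity; a check that cries wolf loses specificity.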

Car Alarms

Nobody likes car alarms.

Smoke Alarms

_images/160209+smoke+alarms.png

Smoke Alarms

Even though smoke alarms don't score well, we're okay with this.

Why?

Example

Suppose you have a server you know gets a lot of network connections.

Thinking ahead, you raised nf_conntrack_max and set up a Nagios check that determines whether a new connection can be opened.

Additionally, you have monitoring that graphs ss -4 -a | tail -n +2 | wc -l so you know the number of connections over time.

Example

One day you wake up to an alert reporting the error ip_conntrack: table full, dropping packet. You quickly check your connection graph and observe... that the number of connections is significantly lower than the maximum.

What you know

Example

It turns out that your monitoring via ss was incorrect.

ss doesn't list all connections! It doesn't list NAT or routed connections at all. Instead, let's look at the connection tracking table in /proc!

$ cat /proc/sys/net/netfilter/nf_conntrack_max
65535
$ wc -l < /proc/net/nf_conntrack
65535
$ ss -4 -a | tail -n +2 | wc -l
75
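
One way to measure the resource that actually ran out is a check that compares the kernel's own conntrack counters. The sketch below assumes the nf_conntrack module exposes nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter, and it follows the Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The 80% and 95% thresholds and the function names are illustrative assumptions, not part of the original setup.

```python
from pathlib import Path

# Sysctl files exposed by the nf_conntrack kernel module (assumed to be loaded).
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def conntrack_status(count, maximum, warn=0.80, crit=0.95):
    """Return (exit_code, message) for a given table fill level.

    The warn and crit thresholds are hypothetical defaults.
    """
    if maximum <= 0:
        return UNKNOWN, "UNKNOWN - nf_conntrack_max is not positive"
    usage = count / maximum
    if usage >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif usage >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"{label} - conntrack table {usage:.0%} full ({count}/{maximum})"


def main(count_path=COUNT_PATH, max_path=MAX_PATH):
    """Read the kernel counters and return a Nagios exit code."""
    try:
        count = int(count_path.read_text())
        maximum = int(max_path.read_text())
    except (OSError, ValueError) as exc:
        print(f"UNKNOWN - cannot read conntrack counters: {exc}")
        return UNKNOWN
    code, message = conntrack_status(count, maximum)
    print(message)
    return code


# The numbers from the incident: the table is full even though ss reported 75.
print(conntrack_status(65535, 65535)[1])
```

To use this as a plugin, a wrapper script would call sys.exit(main()) so that Nagios sees the exit code.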

Example

Tests

Possibilities:

  1. low sensitivity and low specificity
  2. low sensitivity, high specificity
  3. high sensitivity, low specificity
  4. high sensitivity, high specificity

Is 2 or 3 worse?

Example

How does this matter?

Lessons

Time Series Data

Collecting time series data can be very interesting. A time series is a sequence of measurements taken at successive points in time, such as the number of open connections sampled once a minute.

Some time series software attempts to integrate monitoring, but much of it is better used alongside a monitoring solution such as Nagios.

Time series data is often collected using tools based on RRD (round-robin database).
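
The round-robin idea can be sketched as a fixed-size buffer: once every slot is used, each new sample overwrites the oldest one, so storage never grows. This is only an illustration of the concept and not RRDtool's file format or API. The class name, slot count, and sample values are hypothetical.

```python
from collections import deque


class RoundRobinSeries:
    """Fixed-size time series: once full, each new sample evicts the oldest."""

    def __init__(self, slots):
        # deque(maxlen=...) drops the oldest item automatically when full.
        self.samples = deque(maxlen=slots)

    def record(self, timestamp, value):
        """Store one (timestamp, value) sample."""
        self.samples.append((timestamp, value))

    def average(self):
        """Mean of the values currently retained."""
        return sum(value for _, value in self.samples) / len(self.samples)


# Hypothetical connection counts sampled every 60 seconds into 3 slots.
series = RoundRobinSeries(slots=3)
for timestamp, connections in [(0, 70), (60, 75), (120, 80), (180, 90)]:
    series.record(timestamp, connections)

print(list(series.samples))  # the sample at timestamp 0 has been overwritten
print(series.average())
```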

Why Time Series Data is Important

Kinds of Analysis

What is Analysis Used For

Learning More