Definitions¶
When it comes to timeseries data, there are lots of terms tossed about that can lead to some confusion. This page is a sort of glossary that helps to define words related to the use of OpenTSDB.
Cardinality¶
Cardinality is a mathematical term defined as the number of elements in a set. In database lingo, it’s often used to refer to the number of unique items in an index. With regards to OpenTSDB it can refer to:
The number of unique time series for a given metric
The number of unique tag values associated with a tag name
Due to the nature of the OpenTSDB storage schema, metrics with higher cardinality may take longer return results during query execution than those with lower cardinality. E.g. we may have metric foo
with the tag name datacenter
and there are 100 possible values for datacenter. Then we have metric bar
with the tag host
and 50,000 possible values for host. Metric bar
has a higher cardinality than foo
: 50,000 possible time series for bar
an only 100 for foo
.
Compaction¶
An OpenTSDB compaction takes multiple columns in an HBase row and merges them into a single column to reduce disk space. This is not to be confused with HBase compactions where multiple edits to a region are merged into one. OpenTSDB compactions can occur periodically for a TSD after data has been written, or during a query.
Data Point¶
Each of the metrics above can be recorded as a number at a specific time. For example, we could record that Sue worked 8 hours at the end of each day. Or that “mylogo.jpg” was downloaded 400 times in the past hour. Thus a datapoint consists of:
A metric
A numeric value
A timestamp when the value was recorded
One or more sets of tags
Metric¶
A metric is simply the name of a quantitative measurement. Metrics include things like:
hours worked by an employee
webserver downloads of a file
snow accumulation in a region
Note
Notice that the metric
did not include a specific number or a time. That is becaue a metric
is just a label of what you are measuring. The actual measurements are called datapoints
, as you’ll see later.
Unfortunately OpenTSDB requires metrics to be named as a single, long word without spaces. Thus metrics are usually recorded using “dotted notation”. For example, the metrics above would have names like:
hours.worked
webserver.downloads
accumulation.snow
Time Series¶
A collection of two or more data points for a single metric and group of tag name/value pairs.
Timestamp¶
Timestamps are simply the absolute time when a value for a given metric was recorded.
Value¶
A value represents the actual numeric measurement of the given metric. One of our employees, Sue, worked 8 hours yesterday, thus the value would be 8
. There were 1,024 downloads of logo.jpg
from our webserver in the past hour. And 12 inches of snow fell in New England today.