Query Filters¶
A critical function of any database system is to enable fetching subsets of the full data set using some form of filtering. OpenTSDB has provided filtering since version 1.x with expanded capabilities starting with 2.2 and beyond. Filters currently operate on tag values at this time. That means that any metrics and tag keys must be specified exactly as they appear in the database when fetching data.
Example Data
As each filter is explained below, the following data set is used. It consists of a single metric with multiple time series defined on various tags. Only one data point is given at T1 as an example.
TS# |
Metric |
Tags |
Value @ T1 |
---|---|---|---|
1 |
sys.cpu.system |
dc=dal host=web01 |
3 |
2 |
sys.cpu.system |
dc=dal host=web02 |
2 |
3 |
sys.cpu.system |
dc=dal host=web03 |
10 |
4 |
sys.cpu.system |
host=web01 |
1 |
5 |
sys.cpu.system |
host=web01 owner=jdoe |
4 |
6 |
sys.cpu.system |
dc=lax host=web01 |
8 |
7 |
sys.cpu.system |
dc=lax host=web02 |
4 |
Grouping¶
Grouping, also referred to as group-by, is the process of combining multiple time series into one using the required aggregation function and filters. By default, OpenTSDB groups everything by metric so that if the query returned 10 time series with an aggregator of sum
, all 10 series would be added together over time to arrive at one value. See Aggregation for details on how time series are merged.
To avoid grouping and fetch each underlying time series without any aggregation, use the none
aggregator included in version 2.2. Alternatively, you can disable grouping on a per-filter bases with OpenTSDB 2.2 and later. See API documentation on how to do so.
OpenTSDB 1.x - 2.1¶
In the original OpenTSDB release and up to 2.1, only two types of filters were available and they were implicitly configured for grouping. The two operators allowed were:
* - The asterisk (or wildcard) will return a separate result for each unique tag value detected. E.g. if the tag key
host
was paired withweb01
andweb02
then there would be two groups emitted, one onweb01
and one onweb02
.| - The pipe (or literal_or) will return a separate result only for the exact tag values specified. I.e. it will match only time series with the given tag value and group on each of those matches.
Multiple filters can be provided per query and the results are always ANDed together. These filters are still available for use in 2.x and later.
Examples¶
The following examples use the v1 HTTP URI syntax wherein the m
parameter consists of the aggregator, a colon, then the metric and tag filters in brackets separated by equal signs.
Example 1:
http://host:4242/q?start=1h-ago&m=sum:sys.cpu.system{host=web01}
Time Series Included |
Tags |
Aggregated Tags |
Value @ T1 |
---|---|---|---|
1, 4, 5, 6 |
host=web01 |
16 |
In this case the aggregated tags set will be empty as time series 4 and 5 have tags that are not in common with the entire set.
Example 2:
http://host:4242/q?start=1h-ago&m=sum:sys.cpu.system{host=web01,dc=dal}
Time Series Included |
Tags |
Aggregated Tags |
Value @ T1 |
---|---|---|---|
1 |
host=web01,dc=dal |
3 |
Example 3:
http://host:4242/q?start=1h-ago&m=sum:sys.cpu.system{host=*,dc=dal}
Time Series Included |
Tags |
Aggregated Tags |
Value @ T1 |
---|---|---|---|
1 |
host=web01,dc=dal |
3 |
|
2 |
host=web02,dc=dal |
2 |
|
3 |
host=web03,dc=dal |
10 |
This time we provided the *
for the host and an explicit match for dc
. This will group on the host
tag key and return a time series per unique host tag value, in this case 3 series.
Example 4:
http://host:4242/q?start=1h-ago&m=sum:sys.cpu.system{dc=dal|lax}
Time Series Included |
Tags |
Aggregated Tags |
Value @ T1 |
---|---|---|---|
1, 2, 3 |
dc=dal |
host |
15 |
6, 7 |
dc=lax |
host |
12 |
Here the |
operator is used to match only the values for the dc
tag key that are provided in the query. Therefore the TSD will group together any time series with those values. The host
tag is moved to the Aggregated Tags list as every time series in the set has a host
tag and there are multiple values for the tag key.
Warning
Because these filters are limited, if users write time series like #1, #4 and #5, unexpected results can be returned as a result of aggregating time series that may have one common tag but varying additional tags. This problem is somewhat addressed with 2.3 and Explicit Tags.
OpenTSDB 2.2¶
In OpenTSDB 2.2 a more flexible filter framework was added that allows for disabling grouping as well as additional filter types such as regular expressions and wild cards. The filter framework is plugable to allow for tying into external systems such as asset management or provisioning systems.
Multiple filters on the same tag key are allowed and when processed, they are ANDed together e.g. if we have two filters host=literal_or(web01)
and host=literal_or(web02)
the query will always return empty. If two or more filters are included for the same tag key and one has group by enabled but another does not, then group by will effectively be true for all filters on that tag key.
Warning
Some type of filters may cause queries to execute slower than others, particularly the regexp
, wildcard
and case-insensitive filters. Before fetching data from storage, the filters are processed to create a database filter based on UIDs so using the case sensitive literal_or
filter is always faster than regexp
because we can resolve the strings to UIDs and send those to the storage system for filtering. Instead if you ask for regex or wildcards with pre, post or infix filtering the TSD must retrieve all of the rows from storage with the tag key UID, then for each unique row, resolve the UIDs back to strings and then run the filter over the results. Also, filter sets with a large list of literals will be processed post storage to avoid creating a massive filter for the backing store to process. This limit defaults to 4096
and can be configured via the tsd.query.filter.expansion_limit
parameter.
Built-in 2.x Filters¶
The following list are built-in filters included with OpenTSDB. Additional filters can be loaded as plugins. Each heading is the type
of filter to use in a URI or JSON query. When writing a URI query, the filter is used by placing the filter name after the tag key’s equals sign and placing the filter value in parentheses. E.g. {host=regexp(web[0-9]+.lax.mysite.com)}
. For JSON queries simply use the filter name as the type
parameter and the filter value as the filter
parameter, e.g.
{
"type": "regexp",
"filter": "web[0-9]+.lax.mysite.com",
"tagk": "host",
"groupBy": false
}
The examples below use the URI syntax.
literal_or¶
Takes a single literal value or a |
pipe delimited list of values and returns any time series matching the results on a case sensitive bases. This is a very efficient filter as it can resolve the strings to UIDs and send that to the storage layer for pre-filtering. In SQL this is similar to the IN
or =
predicates.
Examples
host=literal_or(web01|web02|web03)
In SQL:WHERE host IN ('web01', 'web02', 'web03')
host=literal_or(web01)
In SQL:WHERE host = 'web01'
iliteral_or¶
The same as a literal_or
but is case insensitive. Note that this is not efficient like the literal or as it must post-process all rows from storage.
not_literal_or¶
Case sensitive literal_or
that will return series that do NOT match the given list of values. Efficient as it can be pre-processed by storage.
not_iliteral_or¶
Case insensitive not_literal_or
.
wildcard¶
Provides case sensitive postfix, prefix, infix and multi-infix filtering. The wildcard character is an asterisk (star) *
. Multiple wildcards can be used. If only the asterisk is given, the filter effectively returns any time series that include the tag key (and is an efficient filter that can be pre-processed). In SQL land, this is similar to LIKE
predicate with a bit more flexibility.
Examples
host=wildcard(*mysite.com)
In SQL:WHERE host='%mysite.com'
host=wildcard(web*)
host=wildcard(web*mysite.com)
host=wildcard(web*mysite*)
host=wildcard(*)
This is equivalent to the v1 basic group by operator and is efficient.
iwildcard¶
The same as wildcard
but case insensitive.
regexp¶
Filters using POSIX compliant regular expressions post fetching from storage. The filter uses Java’s built-in regular expression operation. Be careful to escape special characters depending on the query method used.
Examples
regexp(web.*)
In SQL:WHERE host REGEXP 'web.*'
regexp(web[0-9].mysite.com)
Loaded Filters¶
To show the loaded filters in OpenTSDB 2.2 and later, call the HTTP /api/config/filters
endpoint. This will list loaded plugins along with a description and example usage.
Plugins¶
As developers add plugins we will list them here.
To develop a plugin, simply extend the net.opentsdb.query.filter.TagVFilter
class, create JAR per the Plugins documentation and drop it in your plugins directory. On start, the TSD will search for the plugin and load it. If there was an error with the implementation the TSD will not start up and will log the exception.