TCollector
tcollector is a client-side process that gathers data from local collectors and pushes the data to OpenTSDB. You run it on all your hosts, and it does the work of sending each host’s data to the TSD.
OpenTSDB is designed to make it easy to collect and write data to it. It has a simple protocol, simple enough for even a shell script to start sending data. However, to do so reliably and consistently is a bit harder. What do you do when your TSD server is down? How do you make sure your collectors stay running? This is where tcollector comes in.
Tcollector does several things for you:
Runs all of your data collectors and gathers their data
Does all of the connection management work of sending data to the TSD
You don’t have to embed all of this code in every collector you write
Does de-duplication of repeated values
Handles all of the wire protocol work for you, as well as future enhancements
Collectors in tcollector can be written in any language. They just need to be executable and output the data to stdout; tcollector will handle the rest. However, most collectors are written in Python for easy portability.
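For example, a one-shot collector can be as small as the sketch below. It prints one datapoint per line to stdout in the format tcollector forwards to the TSD: metric name, Unix timestamp, value, and optional tag=value pairs. The script name and the metric example.loadavg.1min are illustrative, not part of the bundled collectors.

```python
#!/usr/bin/env python
# Minimal illustrative one-shot collector. Each output line is:
#   <metric> <unix-timestamp> <value> [tag1=value1 ...]
# tcollector reads these lines from stdout and forwards them to the TSD
# (it also adds the host= tag for you).
import sys
import time


def main():
    ts = int(time.time())
    with open("/proc/loadavg") as f:
        load1 = f.readline().split()[0]
    print("example.loadavg.1min %d %s" % (ts, load1))
    sys.stdout.flush()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```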
Deduplication
Typically you want to gather data about everything in your system. This generates a lot of datapoints, the majority of which don’t change very often over time (if ever). However, you want fine-grained resolution when they do change. Tcollector remembers the last value and timestamp that was sent for all of the time series for all of the collectors it manages. If the value doesn’t change between sample intervals, it suppresses sending that datapoint. Once the value does change (or 10 minutes have passed), it sends the last suppressed value and timestamp, plus the current value and timestamp. In this way all of your graphs and such are correct. Deduplication typically reduces the number of datapoints TSD needs to collect by a large fraction. This reduces network load and storage in the backend. A future OpenTSDB release however will improve on the storage format by using RLE (among other things), making it essentially free to store repeated values.
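The suppression rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not tcollector's actual implementation; the constant MAX_SUPPRESS stands in for the 10-minute limit.

```python
# Sketch of the deduplication rule described above (illustrative only).
MAX_SUPPRESS = 600  # resend at least every 10 minutes, per the text above

# series key -> [timestamp of last point sent, its value, newest suppressed timestamp]
state = {}


def dedup(series, ts, value, emit):
    entry = state.get(series)
    if entry is None:
        state[series] = [ts, value, None]
        emit(series, ts, value)
        return
    sent_ts, last_value, suppressed_ts = entry
    if value == last_value and ts - sent_ts < MAX_SUPPRESS:
        entry[2] = ts                # remember the newest suppressed point...
        return                       # ...but don't send it
    if suppressed_ts is not None:
        emit(series, suppressed_ts, last_value)  # close the flat segment first
    emit(series, ts, value)
    state[series] = [ts, value, None]


# Only the first point, the last repeated point, and the changed point are sent,
# so the graph still shows the flat segment and the step.
for t, v in [(0, 1.0), (15, 1.0), (30, 1.0), (45, 2.0)]:
    dedup("example.metric", t, v, lambda s, ts, val: print(s, ts, val))
```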
Runtime Requirements for tcollector
OS-specific collectors, located in collectors/available/Linux|FreeBSD|MacOS, should run using the base tools installed with the respective OS. Other software collectors, located in collectors/available/software, may have additional requirements.
Installation of tcollector
Clone or download the tcollector archive from GitHub at https://github.com/OpenTSDB/tcollector .
Consider using the shell script tcollector/tcollector to run tcollector if you want it to start automatically after an OS restart. Sources to build RPM or DEB packages are also available in the tcollector folder.
Collecting metrics with tcollector
Available collectors are placed in the collectors/available directory, which is divided into OS-specific (linux, freebsd, macos) and software-specific (software) subdirectories. To make tcollector use a collector, create a symbolic link in the collectors/enabled sub-directory that points to the collector in the collectors/available sub-directory, as shown in the sketch below.
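Enabling a collector is typically a one-line ln -s; the sketch below does the same thing from Python. The collector name (mysql.py), its location under available/, and the interval directory 0 are illustrative and depend on your checkout.

```python
import os

# Link a collector from collectors/available into an interval directory under
# collectors/enabled. Paths are illustrative; run this from the tcollector checkout.
src = os.path.abspath("collectors/available/software/long-lived/mysql.py")
dst = "collectors/enabled/0/mysql.py"

os.makedirs(os.path.dirname(dst), exist_ok=True)
if not os.path.islink(dst):
    os.symlink(src, dst)
```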
Tcollector iterates over every numerically named directory in collectors/enabled and runs all the collectors in each directory. If you name a directory 60, then tcollector will try to run every collector in that directory every 60 seconds. The shortest supported interval is 15 seconds; for intervals shorter than that, you should use a long-running collector in the 0 folder. Tcollector sleeps for 15 seconds after each time it runs the collectors, so only intervals that are multiples of 15 seconds are actually supported. For example, this allows you to run a collector every 15, 30, 45, 60, 75, or 90 seconds, but not every 80 or 55 seconds. Use the directory 0 for any collectors that are long-lived and run continuously; tcollector will read their output and respawn them if they die. Any non-numerically named directories in the collectors directory are ignored.
Generally you want to write long-lived collectors since that has less overhead. OpenTSDB is designed to have lots of datapoints for each metric (for most metrics we send datapoints every 15 seconds by default).
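A long-lived collector is just a process that never exits and keeps writing datapoints to stdout; tcollector reads its output continuously and respawns it if it dies. A minimal sketch follows; the metric name example.uptime.seconds and the 15-second interval are illustrative.

```python
#!/usr/bin/env python
# Illustrative long-lived collector for the 0 directory: it loops forever,
# emitting one datapoint every INTERVAL seconds.
import sys
import time

INTERVAL = 15  # seconds between datapoints


def main():
    while True:
        ts = int(time.time())
        with open("/proc/uptime") as f:
            uptime = f.readline().split()[0]
        print("example.uptime.seconds %d %s" % (ts, uptime))
        # Flush so tcollector sees each line immediately instead of waiting
        # for the stdout buffer to fill.
        sys.stdout.flush()
        time.sleep(INTERVAL)


if __name__ == "__main__":
    sys.exit(main())
```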
We’ve included lib, etc, and test directories holding library code, config data, and unit tests for all collectors.
The directory structure:
collectors/
┣ available/
┃ ┣ linux/
┃ ┃ ┣ long-lived/
┃ ┃ ┗ one-shot/
┃ ┣ freebsd/
┃ ┃ ┣ long-lived/
┃ ┃ ┗ one-shot/
┃ ┣ software/
┃ ┃ ┣ long-lived/
┃ ┃ ┗ one-shot/
┣ enabled/
┃ ┣ 0/
┃ ┣ 60/
┃ ┣ 3600/
┃ ┗ (...)
┣ etc/
┣ lib/
┗ test/
Note
To avoid having to run mkmetric for every metric that
tcollector tracks, you can start TSD with the --auto-metric
flag. This is useful to get started quickly, but it’s not recommended to
keep this flag in the long term, to avoid accidental metric creation.
Collectors bundled with tcollector
The following are the collectors we’ve included as part of the package, together with all of the metric names they report on and what they mean. If you have any others you’d like to contribute, we’d love to hear about them so we can reference them or include them with your permission in a future release.
Linux OS collectors
0/dfstat.py
These stats are similar to those provided by the /usr/bin/df utility.
They show the disk space usage of mounted file systems.
df.bytes.total
Total size of data
df.bytes.used
Bytes used
df.bytes.free
Bytes free
df.bytes.percentused
Percent used
df.inodes.total
Total number of inodes
df.inodes.used
Number of inodes used
df.inodes.free
Number of inodes free
df.inodes.percentused
Percent used
These metrics include time series tagged with each mount point and the
filesystem’s fstype. This collector filters out any cgroup, debugfs, devtmpfs,
nfs, rpc_pipefs, and rootfs filesystems, as well as any mountpoints mounted under
/dev/, /sys/, /proc/, and /lib/.
With these tags you can select to graph just a specific filesystem, or all filesystems with a particular fstype (e.g. ext3).
0/ifstat.py
These stats are from /proc/net/dev, which provides network interface statistics.
proc.net.bytes
Bytes in/out
proc.net.packets
Packets in/out
proc.net.errs
Packet errors in/out
proc.net.dropped
Dropped packets in/out
proc.net.fifo.errs
FIFO errors in/out
proc.net.frame.errs
Frame errors in/out
proc.net.compressed
Compressed packets in/out
proc.net.multicast
Multicast packets in/out
proc.net.collisions
Collisions transmitting packets in/out
proc.net.carrier.errs
Carrier errors in/out
These are interface counters, tagged with the interface, iface=
, and
direction=
in or out. Only ethN
interfaces are tracked. We
intentionally exclude bondN
interfaces, because bonded interfaces
still keep counters on their child ethN
interfaces and we don’t want
to double-count a box’s network traffic if you don’t select on iface=
.
0/iostat.py
Data is from /proc/diskstats
.
iostat.disk.*
Per-disk stats
iostat.part.*
Per-partition stats (kernel specific)
See iostats.txt for kernel-specific metrics, depending on whether you have a 2.6 kernel before or after 2.6.25.
/proc/diskstats
has stats for a given physical device.
These are all rate counters, except ios_in_progress
.
kernel 2.6+
iostat.read_requests
Number of reads completed
iostat.read_merged
Number of reads merged
iostat.read_sectors
Number of sectors read
iostat.msec_read
Time in msec spent reading
iostat.write_requests
Number of writes completed
iostat.write_merged
Number of writes merged
iostat.write_sectors
Number of sectors written
iostat.msec_write
Time in msec spent writing
iostat.ios_in_progress
Number of I/O operations in progress
iostat.msec_total
Time in msec doing I/O
iostat.msec_weighted_total
Weighted time doing I/O (multiplied by ios_in_progress)
kernel 4.18+
iostat.discards_requests
Number of discards completed
iostat.discards_merged
Number of discards merged
iostat.sectors_discarded
Number of sectors discarded
iostat.msecs_discard
Time in msec spent discarding
kernel 5.5+
iostat.flush_requests
Number of flush requests completed
iostat.msecs_flushing
Time in msec spent flushing
In 2.6.25 and later, by-partition stats are reported the same as disks.
For partitions, the *_issued counters are collected before requests are
merged, so they aren’t the same as *_requests (which are post-merge and
more closely represent the actual number of disk transactions).
Given that diskstats provides both per-disk and per-partition data, for
TSDB purposes we put them under different metrics (versus the same
metric and different tags). Otherwise, if you look at a given metric, the data
for a given box will be double-counted, since a given operation will increment
both the disk series and the partition series. To fix this, we output by-disk
data to iostat.disk.*
and by-partition data to iostat.part.*
.
0/mountstats.py
NFS mountstats data, deduped by mount point and put into the following namespaces:
proc.mountstats.<rpccall>.<metric> nfshost=<nfsserver> nfsvol=<nfsvolume>
proc.mountstats.getattr.totaltime
proc.mountstats.getattr.ops
proc.mountstats.getattr.timeouts
proc.mountstats.getattr.qtime
proc.mountstats.getattr.txbytes
proc.mountstats.getattr.rttime
proc.mountstats.getattr.rxbytes
proc.mountstats.getattr.txs
proc.mountstats.access.totaltime
proc.mountstats.access.ops
proc.mountstats.access.timeouts
proc.mountstats.access.qtime
proc.mountstats.access.txbytes
proc.mountstats.access.rttime
proc.mountstats.access.rxbytes
proc.mountstats.access.txs
proc.mountstats.read.totaltime
proc.mountstats.read.ops
proc.mountstats.read.timeouts
proc.mountstats.read.qtime
proc.mountstats.read.txbytes
proc.mountstats.read.rttime
proc.mountstats.read.rxbytes
proc.mountstats.read.txs
proc.mountstats.write.totaltime
proc.mountstats.write.ops
proc.mountstats.write.timeouts
proc.mountstats.write.qtime
proc.mountstats.write.txbytes
proc.mountstats.write.rttime
proc.mountstats.write.rxbytes
proc.mountstats.write.txs
proc.mountstats.other.totaltime
proc.mountstats.other.ops
proc.mountstats.other.timeouts
proc.mountstats.other.qtime
proc.mountstats.other.txbytes
proc.mountstats.other.rttime
proc.mountstats.other.rxbytes
proc.mountstats.other.txs
proc.mountstats.bytes.normalread
proc.mountstats.bytes.normalwrite
proc.mountstats.bytes.directread
proc.mountstats.bytes.directwrite
proc.mountstats.bytes.serverread
proc.mountstats.bytes.serverwrite
proc.mountstats.bytes.readpages
proc.mountstats.bytes.writepages
0/netfilter.py
Metrics from /proc/sys/net/ipv4/netfilter/*
proc.sys.net.netfilter.nf_conntrack_buckets
proc.sys.net.netfilter.nf_conntrack_checksum
proc.sys.net.netfilter.nf_conntrack_count
proc.sys.net.netfilter.nf_conntrack_generic_timeout
proc.sys.net.netfilter.nf_conntrack_icmp_timeout
proc.sys.net.netfilter.nf_conntrack_log_invalid
proc.sys.net.netfilter.nf_conntrack_max
proc.sys.net.netfilter.nf_conntrack_tcp_be_liberal
proc.sys.net.netfilter.nf_conntrack_tcp_loose
proc.sys.net.netfilter.nf_conntrack_tcp_max_retrans
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_close
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_close_wait
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_established
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_fin_wait
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_last_ack
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_max_retrans
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_syn_recv
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_syn_sent
proc.sys.net.netfilter.nf_conntrack_tcp_timeout_time_wait
proc.sys.net.netfilter.nf_conntrack_udp_timeout
proc.sys.net.netfilter.nf_conntrack_udp_timeout_stream
0/netstat.py
Metrics from /proc/net/sockstat
(socket allocation).
net.sockstat.num_sockets
Number of sockets allocated (only TCP)
net.sockstat.num_timewait
Number of TCP sockets currently in TIME_WAIT state
net.sockstat.sockets_inuse
Number of sockets in use (TCP/UDP/raw)
net.sockstat.num_orphans
Number of orphan TCP sockets (not attached to any file descriptor)
net.sockstat.memory
Memory allocated for this socket type (in bytes)
net.sockstat.ipfragqueues
Number of IP flows for which there are currently fragments queued for reassembly
Metrics from /proc/net/netstat
(netstat -s
command).
net.stat.tcp.abort
Number of connections that the kernel had to abort. “type=memory” is especially bad, the kernel had to drop a connection due to having too many orphaned sockets. Other types are normal (e.g. timeout)
net.stat.tcp.abort.failed
Number of times the kernel failed to abort a connection because it didn’t even have enough memory to reset it (bad)
net.stat.tcp.congestion.recovery
Number of times the kernel detected spurious retransmits and was able to recover part or all of the CWND
net.stat.tcp.delayedack
Number of delayed ACKs sent of different types.
net.stat.tcp.failed_accept
Number of times a connection had to be dropped after the 3WHS. “reason=full_acceptq” indicates that the application isn’t accepting connections fast enough. You should see SYN cookies too
net.stat.tcp.invalid_sack
Number of invalid SACKs we saw of diff types. (requires Linux v2.6.24-rc1 or newer)
net.stat.tcp.memory.pressure
Number of times a socket entered the “memory pressure” mode (not great).
net.stat.tcp.memory.prune
Number of times a socket had to discard received data due to low memory conditions (bad)
net.stat.tcp.packetloss.recovery
Number of times we recovered from packet loss by type of recovery (e.g. fast retransmit vs SACK)
net.stat.tcp.receive.queue.full
Number of times a received packet had to be dropped because the socket’s receive queue was full. (requires Linux v2.6.34-rc2 or newer)
net.stat.tcp.reording
Number of times we detected re-ordering and how
net.stat.tcp.syncookies
SYN cookies (both sent & received)
0/nfsstat.py
These stats are from /proc/net/rpc/nfs
.
nfs.client.rpc.stats
RPC stats counter
It is tagged with the type (type=) of operation. There are 3
operations: authrefrsh - the number of times the authentication information
was refreshed, calls - the number of calls conducted, and retrans - the number
of retransmissions.
nfs.client.rpc
RPC calls counter
It is tagged with the version (version=) of the NFS server that conducted
the operation, and the name of the operation (op=).
Descriptions of the operations can be found in the appropriate RFC: NFS ver. 3 in RFC 1813, NFS ver. 4 in RFC 3530, NFS ver. 4.1 in RFC 5661.
0/ntpstats.py
It runs ntpq to get NTP offset
ntp.offset
Estimated offset
0/procnettcp.py
These stats are all from /proc/net/tcp{,6}. (Note that if IPv6 is enabled,
some IPv4 connections seem to get put into /proc/net/tcp6.) The collector
sleeps 60 seconds between intervals. Due in part to a kernel performance
issue in older kernels and in part to systems with many TCP connections,
this collector can sometimes take 5 minutes or more to run one interval, so
the frequency of datapoints can be highly variable depending on the
system.
proc.net.tcp
Number of TCP connections
For each run of the collector, we classify each connection and generate
subtotals. TSD will automatically total these up when displaying the graph,
but you can drill down for each possible total or a particular one. Each
connection is broken down with a tag for user=username (with a fixed list
of users we care about, or put under “other” if not in the list). It is also
broken down by state with state= (established, time_wait, etc). It is
also broken down by service with service= (http, mysql,
memcache, etc). Note that once a connection is closed, Linux seems to forget
who opened/handled the connection. Connections in time_wait, for example,
will always show user=root. This collector does generate a large number
of datapoints, as the number of points is (S*(U+1)*V), where S is the number of TCP
states, U is the number of users you track, and V is the number of services (collections of
ports). The deduper does dedup this down very well, as only 3 of the 10 TCP
states are generally ever seen. On a typical server this can dedup down to
under 10 values per interval.
0/procstats.py
Miscellaneous stats from /proc
.
proc.stat.cpu
(rate) CPU counters (jiffies), tagged by cpu type (type=user, nice, system, idle, iowait, irq, softirq, etc). As a rate they should aggregate up to approximately 100*numcpu per host. Best viewed as type=* or maybe type={user|nice|system|iowait|irq}
proc.stat.intr
(rate) Number of interrupts
proc.stat.ctxt
(rate) Number of context switches
See http://www.linuxhowtos.org/System/procstat.htm
proc.vmstat.*
A subset of VM Stats from
/proc/vmstat
(mix of rate and non-rate). See http://www.linuxinsight.com/proc_vmstat.html .
proc.meminfo.*
Memory usage stats from
/proc/meminfo
. See the Linux kernel documentation
proc.loadavg.*
1min, 5min, 15min, runnable, total_threads metrics from
/proc/loadavg
proc.uptime.total
(rate) Seconds since boot
proc.uptime.now
(rate) Seconds since boot that the system has been idle
proc.kernel.entropy_avail
Amount of entropy (in bits) available in the input pool (the one that’s cryptographically strong and backing
/dev/random
among other things). Watch this value on your frontend servers that do SSL unwrapping, if it gets too low, your SSL performance will suffer
sys.numa.zoneallocs
Number of pages allocated from the preferred node (
type=hit
) or not (type=miss
)
sys.numa.foreign_allocs
Number of pages this node allocated because the preferred node didn’t have a free page to accommodate the request
sys.numa.allocation
Number of pages allocated locally (
type=local
) or remotely (type=remote
) for processes executing on this node
sys.numa.interleave
Number of pages allocated successfully by the interleave strategy
0/smart-stats.py
Stats from SMART disks.
smart.raw_read_error_rate
Data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number. (vendor specific)
smart.throughput_performance
Overall throughput performance of a hard disk drive
smart.spin_up_time
Average time of spindle spin up (from zero RPM to fully operational [millisecs])
smart.start_stop_count
A tally of spindle start/stop cycles
smart.reallocated_sector_ct
Count of reallocated sectors
smart.seek_error_rate
Rate of seek errors of the magnetic heads. (vendor specific)
smart.seek_time_performance
Average performance of seek operations of the magnetic heads
smart.power_on_hours
Count of hours in power-on state, shows total count of hours (or minutes, or seconds) in power-on state. (vendor specific)
smart.spin_retry_count
Count of retry of spin start attempts
smart.recalibration_retries
The count that recalibration was requested (under the condition that the first attempt was unsuccessful)
smart.power_cycle_count
The count of full hard disk power on/off cycles
smart.soft_read_error_rate
Uncorrected read errors reported to the operating system
smart.program_fail_count_chip
Total number of Flash program operation failures since the drive was deployed
smart.erase_fail_count_chip
“Pre-Fail” Attribute
smart.wear_leveling_count
The maximum number of erase operations performed on a single flash memory block
smart.used_rsvd_blk_cnt_chip
The number of a chip’s used reserved blocks
smart.used_rsvd_blk_cnt_tot
“Pre-Fail” Attribute (at least HP devices)
smart.unused_rsvd_blk_cnt_tot
“Pre-Fail” Attribute (at least Samsung devices)
smart.program_fail_cnt_total
Total number of Flash program operation failures since the drive was deployed
smart.erase_fail_count_total
“Pre-Fail” Attribute
smart.runtime_bad_block
The total count of all read/program/erase failures
smart.end_to_end_error
The count of parity errors which occur in the data path to the media via the drive’s cache RAM (at least Hewlett-Packard)
smart.reported_uncorrect
The count of errors that could not be recovered using hardware ECC
smart.command_timeout
The count of aborted operations due to HDD timeout
smart.high_fly_writes
HDD producers implement a Fly Height Monitor that attempts to provide additional protections for write operations by detecting when a recording head is flying outside its normal operating range. If an unsafe fly height condition is encountered, the write process is stopped, and the information is rewritten or reallocated to a safe region of the hard drive. This attribute indicates the count of these errors detected over the lifetime of the drive
smart.airflow_temperature_celsius
Airflow temperature
smart.g_sense_error_rate
The count of errors resulting from externally induced shock & vibration
smart.power-off_retract_count
The count of times the heads are loaded off the media
smart.load_cycle_count
Count of load/unload cycles into head landing zone position
smart.temperature_celsius
Current internal temperature
smart.hardware_ecc_recovered
The count of errors that were recovered using hardware ECC
smart.reallocated_event_count
Count of remap operations. The raw value of this attribute shows the total count of attempts to transfer data from reallocated sectors to a spare area
smart.current_pending_sector
Count of “unstable” sectors (waiting to be remapped, because of unrecoverable read errors)
smart.offline_uncorrectable
The total count of uncorrectable errors when reading/writing a sector
smart.udma_crc_error_count
The count of errors in data transfer via the interface cable as determined by ICRC (Interface Cyclic Redundancy Check)
smart.write_error_rate
The total count of errors when writing a sector
smart.media_wearout_indicator
The normalized value starts at 100 (when the SSD is new) and declines to a minimum value of 1
smart.transfer_error_rate
Count of times the link is reset during a data transfer
smart.total_lba_writes
Total count of LBAs written
smart.total_lba_read
Total count of LBAs read
Descriptions of these metrics can be found in the S.M.A.R.T. article on Wikipedia. The best way to understand a metric is to look at the manufacturer’s specification.
0/sysload.py
CPU detailed statistics gathered from mpstat
.
cpu.usr
cpu.nice
cpu.sys
cpu.irq
cpu.idle
0/tcp_bridge.py
Statistics for the collector that listens on a local TCP socket for incoming metrics.
tcollector.tcp_bridge.lines_read
tcollector.tcp_bridge.connections_processed
tcollector.tcp_bridge.processing_time
tcollector.tcp_bridge.active
FreeBSD OS collectors
0/gstat.py
Disks detailed statistics gathered from gstat
.
disk.queue
disk.ops.read
disk.b.read
disk.bps.read
disk.ms.read
disk.ops.write
disk.b.write
disk.bps.write
disk.ms.write
disk.ops.delete
disk.b.delete
disk.bps.delete
disk.ms.delete
disk.ops.other
disk.ms.other
disk.busy
0/ifrate.py
Network interfaces detailed statistics gathered from netstat
.
ifrate.byt.in
ifrate.byt.out
ifrate.pkt.in
ifrate.pkt.out
ifrate.err
ifrate.drp
ifrate.err.in
ifrate.drp.in
ifrate.err.out
ifrate.drp.out
ifrate.col
0/sysload.py
CPU detailed statistics gathered from top
.
cpu.usr
cpu.nice
cpu.sys
cpu.irq
cpu.idle
load.1m
load.5m
load.15m
ps.all
ps.start
ps.run
ps.sleep
ps.stop
ps.zomb
ps.wait
ps.lock
mem.active
mem.inact
mem.wired
mem.cache
mem.buf
mem.free
arc.total
arc.mru
arc.mfu
arc.anon
arc.header
arc.other
swap.total
swap.used
swap.free
swap.inuse
swap.inps
swap.outps
Software collectors
0/couchbase.py
Stats from couchbase (document-oriented NoSQL database).
All metrics are tagged with the name of the related bucket (bucket=). A bucket is
a logical grouping of physical resources within a cluster of Couchbase
Servers. Buckets can be used by multiple client applications across a cluster,
and provide a secure mechanism for organizing, managing, and analyzing
data storage resources.
Refer to the following documentation for metrics description: Cbstats documentation.
0/docker.py
see source code.
0/docker_engine.py
see source code.
0/elasticsearch.py
Stats from Elastic Search (search and analytics engine).
Refer to the following documentation for metrics description: ElasticSearch cluster APIs.
0/flume.py
see source code.
0/g1gc.py
see source code.
0/graphite_bridge.py
see source code.
0/hadoop_datanode.py
Stats from Hadoop (framework for distributed processing), DataNode stats.
The following metrics are disabled in the collector by default: revision, hdfsUser, hdfsDate, hdfsUrl, date, hdfsRevision, user, hdfsVersion, url, version, NamenodeAddress, Version, RpcPort, HttpPort, CurrentThreadCpuTime, CurrentThreadUserTime, StorageInfo, VolumeInfo.
Refer to the following documentation for metrics description: HBase metrics.
0/hadoop_journalnode.py
see source code.
0/hadoop_namenode.py
see source code.
0/hadoop_yarn_node_manager.py
see source code.
0/hadoop_yarn_resource_manager.py
see source code.
0/haproxy.py
Stats from Haproxy (TCP/HTTP load balancer).
haproxy.current_sessions
Current number of sessions
haproxy.session_rate
Number of new sessions per second
All metrics are tagged with server (server=
) and cluster (cluster=
).
Refer to the following documentation for metrics description: Haproxy configuration, section 9.2 (Unix Socket commands).
0/hbase_master.py
see source code.
0/hbase_regionserver.py
Stats from Hadoop (framework for distributed processing), RegionServer stats.
The following metrics are disabled in the collector by default: revision, hdfsUser, hdfsDate, hdfsUrl, date, hdfsRevision, user, hdfsVersion, url, version, Version, RpcPort, HttpPort, HeapMemoryUsage, NonHeapMemoryUsage.
Refer to the following documentation for metrics description: HBase metrics.
0/jolokia.py
see source code.
0/mapr_metrics.py
see source code.
0/mongo.py
Stats from Mongo (document NoSQL database).
Refer to the following documentation for metrics description: Mongo DB server-status.
0/mongo3.py
see source code.
0/mysql.py
Stats from MySQL (relational database).
Refer to the following documentation for metrics description: InnoDB: Innodb monitors, Global: Show status, Engine: Show engine, Slave: Show slave status, Process list: Show process list.
0/opentsdb.sh
see source code.
0/postgresql.py
Stats from PostgreSQL (relational database).
Refer to the following documentation for metrics description: PostgreSQL monitoring stats.
0/postgresql_replication.py
see source code.
0/prometheus.py
see source code.
0/pxc-collector.py
see source code.
0/redis-stats.py
Stats from Redis (key-value store).
Refer to the following documentation for metrics description: Redis info commands.
0/riak.py
Stats from Riak (document NoSQL database).
Refer to the following documentation for metrics description: Riak statistics.
0/tcollector.py
Tcollector collector.
tcollector.processes
Number of running tcollector processes
tcollector.cputime
CPU time spent on tcollector
tcollector.mem_bytes
Memory consumed in bytes, has 2 types: VSIZE and RSS.
0/varnishstat.py
Stats from Varnish (HTTP accelerator).
By default all metrics are collected; this can be changed by editing the “vstats” array in the collector.
For metric descriptions, run “varnishstat -l” to list the available metrics.
0/zabbix_bridge.py
see source code.
0/zfsiostats.py
see source code.
0/zfskernstats.py
see source code.
0/zookeeper.py
Stats from Zookeeper (centralized service for distributed synchronization).
Refer to the following documentation for metrics description: Zookeeper admin commands.
60/aws_cloudwatch_stats.py
see source code.
60/zabbix_bridge_cache.py
see source code.