Disadvantages of Sensu, the "solution for multi-cloud monitoring".

In 5 minutes you will learn:
  1. Why Sensu monitoring is not suitable for production use.

RabbitMQ certificate expired.

Sensu often relies on RabbitMQ as its transport, and if you configured RabbitMQ as a secured (SSL) transport, its certificates will eventually expire.
If Sensu suddenly stops working, check the RabbitMQ log for the following message:
/var/log/rabbitmq/rabbit@`hostname -a`.log
=ERROR REPORT==== X-Mon-XXXX::06:43:42 ===
SSL: certify: ssl_handshake.erl:1387:Fatal error: certificate expired
In this case, use sensu_ssl_tool to regenerate the SSL certificates and copy them to the following locations:
cat sensu_ssl_tool/sensu_ca/cacert.pem    > /etc/rabbitmq/ssl/cacert.pem
cat sensu_ssl_tool/server/cert.pem        > /etc/rabbitmq/ssl/cert.pem
cat sensu_ssl_tool/server/key.pem         > /etc/rabbitmq/ssl/key.pem
cat sensu_ssl_tool/client/cert.pem        > /etc/sensu/ssl/cert.pem
cat sensu_ssl_tool/client/key.pem         > /etc/sensu/ssl/key.pem

cat /etc/rabbitmq/ssl/cacert.pem >> /etc/ssl/certs/ca-certificates.crt
cat /usr/share/ca-certificates/extra/foo.crt >> /etc/ssl/certs/ca-certificates.crt
systemctl restart rabbitmq-server.service
Now your certificates are up to date.

Suggestions

Better yet, tune sensu_ssl_tool and set the expiration time as long as you need.
Let's say, 100 years.
Do not wait for the problem to appear; do it right after you install the Sensu environment.
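
However long the certificates live, it helps to get a warning well before they expire. Below is a minimal, dependency-free Ruby sketch of such a check. It assumes the usual Sensu/Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL); the method names and the 30- and 7-day thresholds are illustrative choices, not part of sensu_ssl_tool.

```ruby
require "openssl"

# Sensu checks use the Nagios exit-code convention.
OK = 0
WARNING = 1
CRITICAL = 2

# Whole days left before the certificate's notAfter date (negative once expired).
def days_until_expiry(cert, now = Time.now)
  ((cert.not_after - now) / 86_400).floor
end

# Map the remaining days to a check status and a one-line message.
# The thresholds are illustrative defaults, not Sensu settings.
def expiry_status(days, warn_days: 30, crit_days: 7)
  if days < 0
    [CRITICAL, "CRITICAL: certificate expired #{-days} day(s) ago"]
  elsif days < crit_days
    [CRITICAL, "CRITICAL: certificate expires in #{days} day(s)"]
  elsif days < warn_days
    [WARNING, "WARNING: certificate expires in #{days} day(s)"]
  else
    [OK, "OK: certificate expires in #{days} day(s)"]
  end
end

# Run as a check only when called directly with a certificate path.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  cert = OpenSSL::X509::Certificate.new(File.read(ARGV[0]))
  code, message = expiry_status(days_until_expiry(cert))
  puts message
  exit code
end
```

For example, `ruby check-cert-expiry.rb /etc/rabbitmq/ssl/cert.pem` (the script name is hypothetical) reports how many days the RabbitMQ server certificate has left.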

Sensu repository not accessible.

If you are deploying, upgrading or checking configuration (usually with Ansible / Puppet / Chef), you may find that your playbooks/cookbooks stop working and you cannot proceed with other tasks, with an error like this:
[XXXX-XX-XXT17:20:10+03:00] ERROR: Server returned error 503 for http://repositories.sensuapp.org/apt/pubkey.gpg, retrying 2/5 in 7s
[XXXX-XX-XXT17:20:17+03:00] ERROR: Server returned error 503 for http://repositories.sensuapp.org/apt/pubkey.gpg, retrying 3/5 in 11s
[XXXX-XX-XXT17:20:28+03:00] ERROR: Server returned error 503 for http://repositories.sensuapp.org/apt/pubkey.gpg, retrying 4/5 in 27s
[XXXX-XX-XXT17:20:55+03:00] ERROR: Server returned error 503 for http://repositories.sensuapp.org/apt/pubkey.gpg, retrying 5/5 in 54s
[XXXX-XX-XXT17:21:49+03:00] WARN: remote_file[/var/chef/cache/pubkey.gpg] cannot be downloaded from http://repositories.sensuapp.org/apt/pubkey.gpg: 503 "Service Unavailable"
For example, if you use Chef with a Chef server, the Chef client runs on all nodes will fail at the same time.
All nodes will report errors in this case.
To the monitoring team (without additional digging) it will look like a global problem.
Meanwhile, Sensu advertises itself as: "Monitoring for mission-critical systems". ;)

Suggestions

Tune your software provisioning system to ignore errors on the Sensu side.
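
For Chef, a minimal sketch, assuming you manage the repository key download in a recipe you control (the resource below mirrors the one in the log above; where it lives in your cookbooks is an assumption). The standard `ignore_failure` resource property logs the failure but lets the rest of the run continue. Resources that depend on the key, such as the apt repository definition, would need the same treatment. In Ansible, `ignore_errors: yes` on the task plays a similar role.

```ruby
# Sketch of a Chef recipe resource; adapt the name and path to your cookbook.
remote_file "#{Chef::Config[:file_cache_path]}/pubkey.gpg" do
  source "http://repositories.sensuapp.org/apt/pubkey.gpg"
  # A 503 from the Sensu repository is logged, but the chef-client run goes on.
  ignore_failure true
end
```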

Sensu stops writing its local log on network failure.

{"timestamp":"XXXX-XX-XXT20:13:20.695784+0300","level":"error","message":"[amqp] Detected missing amqp heartbeats"}
{"timestamp":"XXXX-XX-XXT20:13:20.696023+0300","level":"warn","message":"reconnecting to transport"}
{"timestamp":"XXXX-XX-XXT20:13:25.698631+0300","level":"error","message":"[amqp] Detected TCP connection failure: Errno::ETIMEDOUT"}
{"timestamp":"XXXX-XX-XXT20:13:29.699197+0300","level":"error","message":"[amqp] Detected TCP connection failure: Errno::ETIMEDOUT"}
When the client loses its connection to the RabbitMQ server, it stops writing check results and metrics even to the log file.
If RabbitMQ is killed or stops responding for whatever reason, you will lack even the local statistics in the log file.

Suggestions

Do not use Sensu in any production environment or mission-critical system.
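
If you have to keep Sensu anyway, one possible mitigation (not a Sensu feature, and not tested in production) is to wrap each check so that it appends its result to a local file regardless of whether the transport is reachable. A minimal Ruby sketch, assuming a JSON-lines file; the path `/var/log/sensu/local-checks.log` and the wrapper itself are hypothetical. Note that it still needs free disk space to write anything.

```ruby
require "json"
require "open3"
require "time"

# Run a check command and append its result to a local JSON-lines file,
# independently of the Sensu transport.
def run_and_record(command, log_path)
  output, status = Open3.capture2e(*command) # stdout and stderr merged
  record = {
    "timestamp" => Time.now.iso8601,
    "command" => command.join(" "),
    # exitstatus is nil if the command was killed by a signal; report 3 (UNKNOWN).
    "status" => status.exitstatus || 3,
    "output" => output.strip
  }
  File.open(log_path, "a") { |f| f.puts(JSON.generate(record)) }
  record
end

# Used as a wrapper: pass the real check command as arguments.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  record = run_and_record(ARGV, "/var/log/sensu/local-checks.log")
  puts record["output"]     # Sensu still sees the original check output
  exit record["status"]     # ...and the original exit code
end
```

The Sensu check command would then become something like `ruby record-check.rb /etc/sensu/plugins/check-foo.rb` (both file names hypothetical).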

Inconsistent state of Sensu plugins.

Sensu plugins can end up in a broken dependency state.
root@web003-vps945514:~# sensu-install -vvvp raid-checks 
[SENSU-INSTALL] installing Sensu plugins ...
[SENSU-INSTALL] provided Sensu plugins: ["raid-checks"]
[SENSU-INSTALL] compiled Sensu plugin gems: ["sensu-plugins-raid-checks"]
[SENSU-INSTALL] determining if Sensu gem 'sensu-plugins-raid-checks' is already installed ...
[SENSU-INSTALL] gem list -i sensu-plugins-raid-checks
false
[SENSU-INSTALL] Sensu gem 'sensu-plugins-raid-checks' has not been installed
[SENSU-INSTALL] Sensu plugin gems to be installed: ["sensu-plugins-raid-checks"]
[SENSU-INSTALL] installing Sensu gem 'sensu-plugins-raid-checks'
[SENSU-INSTALL] gem install sensu-plugins-raid-checks --no-document --verbose
HEAD https://api.rubygems.org/api/v1/dependencies
200 OK
GET https://api.rubygems.org/api/v1/dependencies?gems=sensu-plugins-raid-checks
200 OK
Getting SRV record failed: DNS result has no information for _rubygems._tcp.api.rubygems.org
GET https://api.rubygems.org/api/v1/dependencies?gems=english,sensu-plugin
200 OK
ERROR:  Could not find a valid gem 'english' (= 0.6.3) in any repository
GET https://api.rubygems.org/latest_specs.4.8.gz
304 Not Modified
ERROR:  Possible alternatives: english
[SENSU-INSTALL] failed to install Sensu gem 'sensu-plugins-raid-checks'
[SENSU-INSTALL] please take note of any failure messages above
[SENSU-INSTALL] make sure you have build tools installed (e.g. gcc)
[SENSU-INSTALL] trying to determine the Sensu plugin homepage for sensu-plugins-raid-checks ...
homepage: https://github.com/sensu-plugins/sensu-plugins-raid-checks
root@web003-vps945514:~# echo $?
2
root@web003-vps945514:~# 
You can reproduce the same problem with plain Ruby, too:
root@web003-vps945514:~# gem install sensu-plugins-raid-checks
ERROR:  Could not find a valid gem 'english' (= 0.6.3) in any repository
ERROR:  Possible alternatives: english
root@web003-vps945514:~# 
You cannot rely on Sensu when installing plugins with the official sensu-install tool from the official repository.
This will cause errors when you deploy Sensu on a server.
Also, you cannot rely on a given plugin being available and working at any given time.

Suggestions

Use your own plugin code.
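
As an illustration of plugin code with no gem dependencies at all, here is a minimal sketch of a RAID check. It only reads Linux software RAID status from /proc/mdstat and flags arrays whose member map (for example `[U_]`) shows a missing or failed disk. It covers md arrays only, not hardware RAID controllers, and it is not a drop-in replacement for sensu-plugins-raid-checks; the method names are hypothetical.

```ruby
# Names of md arrays whose member map contains "_" (a missing or failed disk).
def degraded_arrays(mdstat)
  current = nil
  degraded = []
  mdstat.each_line do |line|
    if (m = line.match(/\A(md\d+)\s*:/))
      current = m[1] # start of a new array block, e.g. "md0 : active raid1 ..."
    elsif current && (m = line.match(/\[([U_]+)\]\s*\z/))
      degraded << current if m[1].include?("_") # e.g. "[U_]"
    end
  end
  degraded
end

# Sensu/Nagios-style result: 0 OK, 2 CRITICAL.
def raid_status(mdstat)
  bad = degraded_arrays(mdstat)
  if bad.empty?
    [0, "OK: no degraded md arrays"]
  else
    [2, "CRITICAL: degraded md arrays: #{bad.join(', ')}"]
  end
end

# Run as a check when given the mdstat path, e.g. /proc/mdstat.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  code, message = raid_status(File.read(ARGV[0]))
  puts message
  exit code
end
```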

Running out of disk space makes Sensu unusable.

If your server runs out of free disk space:
  1. You will not be able to start or restart sensu-client.
  2. sensu-client will not work and will not send any data to the server.
Ironically, Sensu will fail to report that the disk is full precisely because the disk is full.

Suggestions

Using a separate /var partition may be a solution (not tested).
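
Since a full disk takes the client down with it, the practical goal is to alert well before the filesystem reaches 100%. A minimal dependency-free sketch, assuming the POSIX `df -P` output format and illustrative 85%/95% thresholds, that checks the filesystem holding a given path (for example /var). It cannot help once sensu-client itself is down; it only gives an earlier warning.

```ruby
require "open3"

# Parse POSIX `df -P` output into [mount_point, used_percent] pairs.
def parse_df(output)
  output.lines.drop(1).map do |line| # drop(1) skips the header line
    fields = line.split
    next if fields.size < 6 # ignore blank or malformed lines
    # The 5th column is Capacity (e.g. "78%"); the 6th onward is the mount point.
    [fields[5..-1].join(" "), fields[4].to_i]
  end.compact
end

# Turn the fullest filesystem into a Sensu/Nagios-style status.
def disk_status(usage, warn_pct: 85, crit_pct: 95)
  return [3, "UNKNOWN: no filesystems found"] if usage.empty?
  mount, pct = usage.max_by { |_, used| used }
  code =
    if pct >= crit_pct then 2
    elsif pct >= warn_pct then 1
    else 0
    end
  [code, "#{%w[OK WARNING CRITICAL][code]}: #{mount} is #{pct}% full"]
end

# Run as a check when given a path, e.g. /var.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  output, status = Open3.capture2("df", "-P", ARGV[0])
  unless status.success?
    puts "UNKNOWN: df -P #{ARGV[0]} failed"
    exit 3
  end
  code, message = disk_status(parse_df(output))
  puts message
  exit code
end
```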

Installed plugins can suddenly stop working.

Please note that this happens even though the plugin itself has not been upgraded.
root@big32:~#  /etc/sensu/plugins/checks/check-disk.rb
Traceback (most recent call last):
        2: from /etc/sensu/plugins/checks/check-disk.rb:29:in `<main>'
        1: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in `require'
/usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in `require': cannot load such file -- sensu-plugin/check/cli (LoadError)
root@big32:~# echo $?
1
root@big32:~#

Suggestions

Running sensu-install -vvvp check-disk is not enough to fix the problem.
Try reinstalling the whole Sensu client.
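
The traceback means that the Ruby running the plugin (an RVM Ruby 2.7.0, judging by the paths) cannot load the sensu-plugin gem. One possible cause is that this interpreter is not the one sensu-install installed the gems into. To see what a particular interpreter can and cannot load, a small diagnostic sketch like the one below may help. The script is hypothetical; run it with the same ruby binary the plugin's shebang line resolves to.

```ruby
require "rbconfig"

# Return the features (require paths) this Ruby interpreter cannot load.
def missing_features(features)
  features.reject do |feature|
    begin
      require feature
      true  # loadable: drop it from the result
    rescue LoadError
      false # not loadable: keep it in the result
    end
  end
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  puts "ruby:      #{RbConfig.ruby}"
  puts "gem paths: #{Gem.path.join(':')}"
  missing = missing_features(ARGV)
  puts(missing.empty? ? "all features load" : "missing: #{missing.join(', ')}")
  exit(missing.empty? ? 0 : 2)
end
```

For example, `ruby check-requires.rb sensu-plugin/check/cli` (script name hypothetical) prints the interpreter path, the gem search paths and whether the feature from the traceback can be loaded.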

Third-party code on your servers.

  1. Sensu has many plugins (Ruby gems) that install many other gems as dependencies.
  2. All this code comes from the Internet.
  3. These gems are developed by many different individuals and groups.
  4. These gems can even contain intentionally malicious code.
  5. A large amount of redundant dependency code may noticeably slow down your servers.

Suggestions

  1. It is better not to update plugins automatically.
  2. It is also better to write your plugin code yourself.
  3. Ideally, keep your code "native", without any dependencies on other Ruby gems.
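
To see how much third-party code a single plugin actually brings onto a server, you can walk its runtime dependencies. A minimal sketch using RubyGems' own `Gem::Specification.find_by_name` on locally installed gems. The traversal takes the lookup as a parameter so it can be tried on any dependency map; the constant, method and script names are hypothetical.

```ruby
# Direct runtime dependencies of an installed gem, or [] if it is not installed.
INSTALLED_GEM_DEPS = lambda do |name|
  begin
    Gem::Specification.find_by_name(name).runtime_dependencies.map(&:name)
  rescue Gem::LoadError
    []
  end
end

# Every gem reachable from `root` through runtime dependencies (root excluded).
def transitive_deps(root, lookup)
  seen = {}
  stack = [root]
  until stack.empty?
    lookup.call(stack.pop).each do |dep|
      next if dep == root || seen[dep] # skip cycles and repeats
      seen[dep] = true
      stack.push(dep)
    end
  end
  seen.keys.sort
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  deps = transitive_deps(ARGV[0], INSTALLED_GEM_DEPS)
  puts "#{ARGV[0]} pulls in #{deps.size} gem(s)"
  deps.each { |dep| puts "  #{dep}" }
end
```

For example, `ruby gem-deps.rb sensu-plugins-raid-checks` (script name hypothetical) lists every gem the plugin depends on, directly or indirectly, among those installed locally.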