Keycloak is a versatile open-source identity provider. Among its many features is the ability to build a high-availability cluster. Keycloak itself uses two different storages: a database, mostly MariaDB or Postgres, for persistent data, and Infinispan for more frequently accessed data such as sessions and used action tokens.


The "Fast" Volatile Database

As mentioned before, Keycloak stores all its more frequently accessed data in Infinispan. This store can be replicated to other Keycloak instances with a few configuration changes. It is also possible to persist this data to the slower database. This is a trade-off between more database traffic on one side and the loss of sessions (among other things) when restarting all Keycloak instances on the other. We will opt for the second option and not persist this data in the database.
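On recent Quarkus-based Keycloak versions, the replicated Infinispan cache is selected with the cache option; a minimal sketch of a node start (hostname and config file name are placeholders, the discovery configuration itself is covered in the Architecture section) could look like this:

    # Sketch: start a Keycloak node with the distributed Infinispan cache.
    # cache-ispn.xml holds the cache and discovery configuration.
    bin/kc.sh start --cache=ispn --cache-config-file=cache-ispn.xml --hostname=login.example.com
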
When running larger instances (in terms of the number of users), one should also consider the number of logins happening simultaneously after the loss of the session cache; this can be a lot of traffic that would normally be distributed over the day. It could, for example, be mitigated by running a separate Infinispan cluster outside of the main data center, but that is beyond the scope of this article.

The "Slow" Persistent Database

As far as persistent data is concerned, there are several ways to operate a distributed database. Both Postgres and MariaDB allow the simultaneous operation of a cluster of database servers, but there is a big difference in the manageability of the two systems.
Postgres only supports one master (read/write) and multiple standby or hot-standby nodes, which require manual intervention if the master fails. Of course there are solutions [1] that help work around this issue by running a daemon to supervise the master node, but this also means that the complexity of the system grows to a point where it is no longer easily maintainable for "just" a high-availability setup.
Another way of running a redundant database is MariaDB with the write-set replication (wsrep) provider Galera [2]. This provides a multi-master setup in which every node can accept write requests. As for setup complexity, every modern Linux distribution (we mostly use Ubuntu with minimal packages) already ships all components necessary to build a MariaDB Galera cluster. In this case we will use the standard MariaDB Docker image with a few custom additions: one to generate the wsrep configuration on the fly from environment variables, and one to be able to bootstrap the cluster after all database servers have been shut down. MariaDB with Galera has some special cases when it comes to restarting the complete cluster, so be sure to read the documentation to understand how the cluster works and how to recover from failures [6].
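A rough sketch of such an entrypoint addition is shown below; the WSREP_* environment variable names are our own convention, and the provider path matches Debian-based MariaDB images:

    #!/bin/sh
    # Sketch: render the Galera config from environment variables, then hand
    # over to the image's regular entrypoint.
    cat > /etc/mysql/conf.d/galera.cnf <<EOF
    [galera]
    wsrep_on              = ON
    wsrep_provider        = /usr/lib/galera/libgalera_smm.so
    wsrep_cluster_name    = ${WSREP_CLUSTER_NAME}
    wsrep_cluster_address = gcomm://${WSREP_CLUSTER_ADDRESSES}
    wsrep_node_name       = ${WSREP_NODE_NAME}
    wsrep_node_address    = ${WSREP_NODE_ADDRESS}
    binlog_format         = ROW
    innodb_autoinc_lock_mode = 2
    EOF

    # After a full cluster shutdown, exactly one node (the one with the most
    # recent state, see [6]) must be started with --wsrep-new-cluster to form
    # a new primary component.
    if [ "${WSREP_BOOTSTRAP:-no}" = "yes" ]; then
        set -- "$@" --wsrep-new-cluster
    fi
    exec docker-entrypoint.sh "$@"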

Load Balancer and TLS Termination

Concerning the load balancer, we will use the Load Balancer service provided by Hetzner [3]. As there is enough documentation on how to use it, we will simply note that we use a least-connections load balancer with the PROXY protocol enabled. TLS termination will be done on the nodes themselves with NGINX, so we keep the certificates to ourselves. We will use Let's Encrypt certificates, as this is the most convenient and up-to-date method of obtaining certificates.
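A minimal NGINX sketch for this setup; the hostname, certificate paths, and the load balancer's private address are assumptions:

    # Sketch: TLS termination in front of a local Keycloak instance, with the
    # PROXY protocol header sent by the Hetzner load balancer decoded here.
    # 10.0.0.10 stands in for the load balancer's private address.
    server {
        listen 443 ssl proxy_protocol;
        server_name login.example.com;

        ssl_certificate     /etc/letsencrypt/live/login.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/login.example.com/privkey.pem;

        # Trust the client address carried in the PROXY protocol header.
        set_real_ip_from 10.0.0.10;
        real_ip_header   proxy_protocol;

        location / {
            proxy_pass       http://127.0.0.1:8080;
            proxy_set_header Host              $host;
            proxy_set_header X-Real-IP         $proxy_protocol_addr;
            proxy_set_header X-Forwarded-For   $proxy_protocol_addr;
            proxy_set_header X-Forwarded-Proto https;
        }
    }
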
The load balancer will check the /health/ready endpoint of each Keycloak instance and will only route traffic to an instance that reports all green. Be aware that already connected clients will not be reconnected: if you accessed Keycloak in a working state and the health check then fails, you will still be connected to the failed instance until the browser or NGINX closes the connection.
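Note that Keycloak's health endpoints have to be enabled explicitly; a quick way to check what the load balancer will see (the port is an assumption, and newer Keycloak releases serve health on a separate management port):

    # Enable the health endpoints when starting Keycloak
    bin/kc.sh start --health-enabled=true ...

    # What the load balancer polls; expect HTTP 200 and "status": "UP"
    curl -s http://127.0.0.1:8080/health/ready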

Architecture

The overall architecture will look like the following:

As the Infinispan traffic is not secured by default (and neither is the MariaDB cluster traffic), all nodes are connected via a WireGuard mesh network. To be able to avoid split-brain situations, the cluster should consist of an uneven number of nodes [4]. To avoid single points of failure, the VMs must not run on the same VM host. Hetzner allows virtual servers to be put into placement groups that ensure no more than one server of the group runs on a given physical host, which is important to reduce the risk of service outages caused by hardware failures.
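A sketch of one node's WireGuard configuration in such a mesh (keys, addresses, and ports are placeholders); every node simply lists all other nodes as peers:

    # Sketch: /etc/wireguard/wg0.conf on node1
    [Interface]
    Address    = 10.10.0.1/24
    PrivateKey = <node1-private-key>
    ListenPort = 51820

    # node2
    [Peer]
    PublicKey  = <node2-public-key>
    Endpoint   = node2.example.com:51820
    AllowedIPs = 10.10.0.2/32

    # node3
    [Peer]
    PublicKey  = <node3-public-key>
    Endpoint   = node3.example.com:51820
    AllowedIPs = 10.10.0.3/32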

As mentioned before, Keycloak uses Infinispan as volatile storage, which also integrates mechanisms to synchronize with other Infinispan instances. The Infinispan configuration allows for various ways in which the instances communicate and discover each other. In this case we will use JDBC_PING over the TCP stack (you can read about this and other discovery methods here [5]), as we already have a shared database this communication can take place over. This way, the nodes write information about themselves to the database and the other nodes can then connect using this information. As detecting split-brain situations is done as a combination of reachable servers and information from MariaDB, it is important that the MariaDB cluster works.
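A sketch of the relevant JGroups part of such a cache configuration (connection details are placeholders; newer Keycloak versions can also select a comparable stack via the cache-stack option), extending the built-in TCP stack and replacing its default discovery protocol:

    <!-- Sketch: JGroups stack for cache-ispn.xml; JDBC_PING takes the place
         of the TCP stack's default MPING discovery. -->
    <jgroups>
        <stack name="jdbc-ping-tcp" extends="tcp">
            <JDBC_PING connection_driver="org.mariadb.jdbc.Driver"
                       connection_url="jdbc:mariadb://127.0.0.1:3306/keycloak"
                       connection_username="keycloak"
                       connection_password="secret"
                       stack.combine="REPLACE"
                       stack.position="MPING"/>
        </stack>
    </jgroups>
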
To take care of the start-up time of a resyncing MariaDB cluster, we need to implement a small health check within the MariaDB Docker container so that Keycloak waits until it can connect to the MariaDB server. Otherwise Keycloak will fail instantly because it cannot connect to the database.
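In a Compose-based deployment this can be sketched as follows (service names and image tags are placeholders; recent official MariaDB images ship a healthcheck.sh that can also verify the Galera sync state):

    # Sketch: gate the Keycloak start on a healthy, synced database node.
    services:
      mariadb:
        image: mariadb
        healthcheck:
          test: ["CMD", "healthcheck.sh", "--connect", "--galera_online"]
          interval: 10s
          timeout: 5s
          retries: 30
      keycloak:
        image: quay.io/keycloak/keycloak
        depends_on:
          mariadb:
            condition: service_healthy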

Monitoring

Monitoring is as important as backups are. The cluster could run in an unsynchronized state for a long time before anyone even starts to notice. One example is a communication failure between the MariaDB nodes: as the isolated node only becomes read-only, everything might seem to work fine until someone tries to write on exactly that node. As we use Prometheus-based monitoring, we will monitor the wsrep status variables with the mysqld_exporter [7], especially the cluster size and state. Keycloak is able to export metrics on its own, which we use to monitor it.
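A sketch of matching Prometheus alerting rules (metric names as exposed by mysqld_exporter; the thresholds assume a three-node cluster):

    # Sketch: alert when a node leaves the Galera cluster or loses sync.
    groups:
      - name: galera
        rules:
          - alert: GaleraClusterDegraded
            expr: mysql_global_status_wsrep_cluster_size < 3
            for: 5m
            annotations:
              summary: "Galera cluster has fewer than 3 members"
          - alert: GaleraNodeNotSynced
            # wsrep_local_state 4 means "Synced"
            expr: mysql_global_status_wsrep_local_state != 4
            for: 5m
            annotations:
              summary: "Galera node is not in the Synced state"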

Conclusion

Careful consideration has to go into designing the database and monitoring infrastructure to quickly spot faults and be resilient to short network outages. Nonetheless, running a highly available Keycloak cluster is possible without too much overhead. If some downtime is acceptable, it may be enough to run only one instance and persist the session tokens, which reduces the complexity of the setup and still avoids logging users out.

[1] https://github.com/sorintlab/stolon
[2] https://mariadb.com/kb/en/getting-started-with-mariadb-galera-cluster/#restarting-the-cluster
[3] https://docs.hetzner.com/cloud/load-balancers/getting-started/creating-a-load-balancer/
[4] https://galeracluster.com/library/documentation/weighted-quorum.html
[5] http://jgroups.org/manual3/index.html#_jdbc_ping
[6] https://galeracluster.com/library/documentation/index.html
[7] https://github.com/prometheus/mysqld_exporter