A few weeks ago the cobot website was down for a few hours. At the time we were running on a rented server, and since it was 10:30pm, our provider’s support staff had already gone home half an hour earlier. We only learned the next morning that the server’s mainboard had failed.
When that downtime happened we promised to take action so that it wouldn’t happen again. cobot isn’t just a website that is inconvenient to lose for a while: with our Radius Wifi authentication feature, cobot is also a core part of some coworking spaces’ network infrastructure. If cobot goes down, the people in those coworking spaces can’t log into their wifi anymore.
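So obviously we don’t want any more downtime. Here is what we’ve done: we moved cobot to Amazon EC2 and now manage our servers with Scalarium. This gives us two big advantages: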
- It has enabled us to fully automate setting up new EC2 instances, which means we can start up and configure as many servers as we want at the click of a button.
- If a server goes down, Scalarium’s auto-healing feature detects it and brings up a brand-new instance within a few minutes.
Since we’re still very small, we are currently running on a single EC2 instance. If it ever goes down, it will only be for a very short time: without any human intervention, the site will be running again within a few minutes.
Over the next few months we will add a second (or more) instance and a load balancer to our cluster. First, this will allow us to scale to handle more traffic. Second, it will reduce the potential for downtime even further: when one instance goes down, the other(s) can simply take over until it is back.
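To illustrate why a load balancer in front of several instances reduces downtime, here is a minimal Python sketch of round-robin balancing that skips any instance a health check has marked as down. The `LoadBalancer` class, its methods and the instance names are hypothetical and only illustrate the idea; this is not how Scalarium or the load balancer we will actually use is implemented.

```python
from itertools import cycle


class LoadBalancer:
    """Hypothetical round-robin balancer that skips unhealthy instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)   # all instances start out healthy
        self._ring = cycle(self.instances)   # endless round-robin iterator

    def mark_down(self, instance):
        # Called (in this sketch) when a health check fails.
        self.healthy.discard(instance)

    def mark_up(self, instance):
        # Called when the instance is back, e.g. after auto-healing replaced it.
        self.healthy.add(instance)

    def pick(self):
        # Try each instance at most once per request, skipping unhealthy ones.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = LoadBalancer(["app-1", "app-2"])
print(lb.pick(), lb.pick())  # requests alternate between both instances
lb.mark_down("app-1")
print(lb.pick(), lb.pick())  # app-2 takes over all traffic while app-1 is down
```

With a single instance, `pick()` has nothing to fall back on once that instance is marked down, which is exactly the situation a second instance removes.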