I've been thinking a lot about https://eieio.games/essays/scaling-one-million-checkboxes/ lately. Not only is it a fascinating story, it also has a bunch of interesting technical challenges to solve. So, naturally, I've been considering alternative solutions, and I'm starting to play with the idea of creating my own version of One Million Checkboxes (OMCB).
Something I've read about, but have never actually needed myself, is sharding. So instead of doing the pragmatic thing and using Redis like the original, I was thinking of creating a distributed monstrosity where state is spread out over several different instances. The basic architecture would consist of three layers:
1) Caddy to serve static content and act as a reverse proxy for 2)
2) webserver responsible for handling the requests from clients, including
* establishing and maintaining websocket connections
* forwarding updates to the correct shard
* subscribing to incremental updates from all shards and forwarding
them to all connected clients
* regularly broadcasting a full state to all connected clients to reset
state in case we lose some incremental updates along the way
3) application responsible for maintaining the state of a given shard
* stores state in-memory as a bitmap
* regularly backs up state to disk so that we can recover in case of failure
* has methods for updating shard state, fetching whole shard state, and broadcasting
incremental updates to subscribed webservers
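To make layers 2) and 3) a bit more concrete, here's a minimal sketch in Python of the routing step the webserver would do and the in-memory bitmap a shard would hold. It assumes contiguous, equally sized ranges per shard (so the grid size must divide evenly by the shard count). All names and constants here (`route`, `ShardBitmap`, `TOTAL_BOXES`, `NUM_SHARDS`) are hypothetical and only there for illustration; networking, persistence, and pub/sub are left out.

```python
TOTAL_BOXES = 1_000_000  # assumed grid size
NUM_SHARDS = 4           # assumed shard count; must divide TOTAL_BOXES evenly
BOXES_PER_SHARD = TOTAL_BOXES // NUM_SHARDS


def route(index: int) -> tuple[int, int]:
    """Map a global checkbox index to (shard_id, index_within_shard)."""
    if not 0 <= index < TOTAL_BOXES:
        raise IndexError(f"checkbox {index} is out of range")
    return divmod(index, BOXES_PER_SHARD)


class ShardBitmap:
    """State of a single shard: one bit per checkbox, packed into bytes."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # round up to whole bytes

    def get(self, i: int) -> bool:
        if not 0 <= i < self.size:
            raise IndexError(f"local index {i} is out of range")
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def toggle(self, i: int) -> bool:
        """Flip checkbox i and return its new value.

        The returned value is what a shard would broadcast as an
        incremental update to subscribed webservers.
        """
        if not 0 <= i < self.size:
            raise IndexError(f"local index {i} is out of range")
        self.bits[i >> 3] ^= 1 << (i & 7)
        return self.get(i)

    def snapshot(self) -> bytes:
        """Immutable copy of the whole shard state (for full broadcasts or backups)."""
        return bytes(self.bits)


# Usage: the webserver resolves the shard, the shard applies the update.
shards = [ShardBitmap(BOXES_PER_SHARD) for _ in range(NUM_SHARDS)]
shard_id, local_index = route(250_001)
new_value = shards[shard_id].toggle(local_index)
```

Range-based routing keeps each shard's bitmap contiguous, which makes stitching shard snapshots into a full grid a simple concatenation. A hash-based scheme would spread hot spots more evenly but would make full-state broadcasts more work.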
There are probably a billion things that can go wrong with this architecture, which is one of the reasons I'd like to try it out. It should provide ample opportunity for learning new things.
Some more random ideas:
* Create an orchestrator/control plane that can handle redistribution of data if we increase or decrease the number of shards
* The control plane can also be responsible for backing up the entire grid
* It should be able to reconfigure the webservers when the set of shards changes, without having to restart them
* It should be possible to dynamically set the shard state so that we can recover the entire grid from a snapshot
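As a rough illustration of the redistribution and snapshot-recovery ideas, here's a small Python sketch of the simplest approach I can think of: join all shard snapshots into one grid, then cut it into new, equally sized pieces. It assumes each shard owns a contiguous range of the grid and that shard boundaries fall on byte boundaries. The function names (`join_shards`, `split_grid`, `reshard`) are made up for this example. It also assumes writes are paused while it runs; handling in-flight updates is exactly the kind of thing a real control plane would have to solve.

```python
def join_shards(snapshots: list[bytes]) -> bytes:
    """Concatenate per-shard bitmaps (in shard order) into the full grid."""
    return b"".join(snapshots)


def split_grid(grid: bytes, num_shards: int) -> list[bytes]:
    """Cut a full-grid snapshot into equally sized per-shard bitmaps."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    if len(grid) % num_shards != 0:
        # Assumption: shards are byte-aligned and equally sized.
        raise ValueError("grid size in bytes must be divisible by num_shards")
    chunk = len(grid) // num_shards
    return [grid[i * chunk:(i + 1) * chunk] for i in range(num_shards)]


def reshard(snapshots: list[bytes], new_num_shards: int) -> list[bytes]:
    """Redistribute state from the current shards onto new_num_shards shards."""
    return split_grid(join_shards(snapshots), new_num_shards)


# Usage: a tiny 12-byte (96-checkbox) grid, moved from 3 shards to 4.
full_grid = bytes(range(12))
three_shards = split_grid(full_grid, 3)   # recover the grid from a snapshot
four_shards = reshard(three_shards, 4)    # scale up by one shard
```

Each piece returned by `split_grid` is what the control plane would push to a shard through the "set shard state" method mentioned above.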