I have a small websocket server, running on top of a set of libraries:
- ZeroMQ, using a
The code is basically the same as in the tutorials.
The event loop starts correctly, users can connect to the server, and they receive the correct messages when the other side pushes something. But after a while, usually a few days (depending on usage), the messages stop arriving.
Usage is far from heavy: only one or two frontend developers connect at the moment, as the project is still in development.
The loop is still running and correctly returns HTTP 101 Switching Protocols on connect, but it no longer broadcasts messages that were broadcast correctly before. No errors appear anywhere. Restarting the event loop helps.
My questions are:
1) What can cause this? Has anyone encountered similar behaviour?
2) Can you recommend a way to debug this in the long-running event loop process?
Currently, I have to stop the loop, change the code (add logging calls), restart the loop and wait for it to go wrong again, which is tedious to say the least.
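One way to avoid the stop/edit/restart cycle is to leave the debug logging calls in the code permanently and switch the log level while the process is running. Below is a minimal sketch of that idea in Python, assuming a Unix-like system where SIGUSR1 is available. The logger name `ws_server` and the function names are made up for illustration and are not taken from the server in question.

```python
import logging
import signal

# Hypothetical logger name; use whatever the real server logs under.
logger = logging.getLogger("ws_server")


def toggle_debug(signum, frame):
    """Flip the logger between DEBUG and INFO each time the signal arrives."""
    if logger.level == logging.DEBUG:
        new_level = logging.INFO
    else:
        new_level = logging.DEBUG
    logger.setLevel(new_level)
    logger.warning("log level switched to %s", logging.getLevelName(new_level))


def install_toggle():
    """Call once at startup, from the main thread, before the loop starts."""
    logging.basicConfig(level=logging.DEBUG)  # handler lets everything through
    logger.setLevel(logging.INFO)             # logger itself starts at INFO
    signal.signal(signal.SIGUSR1, toggle_debug)
```

With this in place, running `kill -USR1 <pid>` against the server process would turn verbose logging on once the messages stop arriving, and sending the signal again would turn it off.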
Any help greatly appreciated.
Well, I guess ZMQ was the culprit.
When multiple applications were using ZMQ on the same machine, messages sometimes ended up at the wrong consumer, even though every application had a different port specified for its ZMQ socket connections.
So users were sometimes getting websocket frames from a completely different application, and when there was no corresponding user for the message, the frame vanished on the way. So websockets didn’t stop broadcasting, messages were just routed incorrectly.
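For anyone trying to confirm this kind of misrouting, one option is to tag every message with the sending application's identifier and have each consumer check the tag before forwarding anything to websocket clients. This is only a sketch in Python, not what I actually ran: the `APP_ID` value, the JSON envelope layout and the function names are all assumptions made up for the example.

```python
import json
import logging

logger = logging.getLogger("routing_check")

# Hypothetical identifier; each application would use its own unique value.
APP_ID = "app-a"


def wrap(payload, app_id=APP_ID):
    """Wrap a payload in an envelope that records which app produced it."""
    return json.dumps({"app": app_id, "payload": payload})


def unwrap(raw, expected_app=APP_ID):
    """Return the payload, or None if the message came from another app."""
    message = json.loads(raw)
    if message.get("app") != expected_app:
        # A foreign message reaching this consumer is the symptom to look for.
        logger.warning("dropping message from app %r", message.get("app"))
        return None
    return message["payload"]
```

A warning in the log then shows directly that a frame from another application arrived, instead of the frame just disappearing.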
I don't know ZMQ well enough to say whether this is documented or otherwise known behaviour.
I solved the problem by rewriting the backend to use RabbitMQ, with a separate vhost and channel for every application. The problems are gone now, and every frame ends up where it should.
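To illustrate the vhost separation: in an AMQP connection URI the vhost is the path component, so each application can be given its own connection string. The Python sketch below only builds such URIs and does not connect to anything. The host, port and `guest` credentials are placeholder defaults, and the application names are hypothetical.

```python
from urllib.parse import quote


def amqp_url(vhost, host="localhost", port=5672, user="guest", password="guest"):
    """Build an AMQP URI whose path component selects the given vhost."""
    # Percent-encode every part so that characters such as "/" or "@"
    # inside a name cannot be mistaken for URI delimiters.
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )


# Hypothetical application names, one vhost each.
urls = {name: amqp_url(name) for name in ("app_a", "app_b")}
```

Because each application connects to a different vhost, its exchanges and queues are isolated from those of the other applications on the same broker.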