
Node.js reliability for large application

Posted by: admin February 12, 2018


I am new to Node.js and am currently questioning its reliability.

Based on what I’ve seen so far, there seems to be a major flaw: any uncaught error or exception crashes the server. Sure, you can try to bullet-proof your code or put try/catch in key areas, but there will almost always be bugs that slip through the cracks. And it seems dangerous that one problematic request could affect all other requests. There are two workarounds that I found:

  1. Use a daemon or a module like forever to automatically restart the server when it crashes. The thing I don’t like about this is that the server is still down for a second or two (for a large site, that could be hundreds, or even thousands, of requests).

  2. Catch uncaught exceptions using process.on('uncaughtException'). The problem with this approach (as far as I know) is that there is no way to get a reference to the request that caused the exception, so that particular request is left hanging (the user sees a loading indicator until the timeout). But at least in this case, other, non-problematic requests can still be handled.
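Workaround 2 can be sketched in a few lines; the handler body and helper name below are illustrative assumptions, not from the original post:

```javascript
// Last-resort handler: logs the error so the process survives and
// other requests keep being served, at the cost of leaving the
// offending request hanging (we have no reference to it here).
function installCrashHandler(logger = console.error) {
  process.on('uncaughtException', (err) => {
    logger('uncaught exception:', err && err.stack ? err.stack : err);
  });
}

installCrashHandler();
```

Note that the Node.js documentation treats 'uncaughtException' as a last-resort crash reporter rather than a keep-alive mechanism, which is worth weighing against workaround 1.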

Can any Node.js veteran pitch in?


For automatic restarts and load-balancing, I’d suggest you check out Learnboost’s up balancer.

It allows you to reload a worker behind the load balancer without dropping any requests. It stops directing new requests to the worker, and for requests that are already being served it provides a workerTimeout grace period so they can finish before the process truly shuts down.

You might adapt this strategy to also be triggered by the uncaughtException event.


You have full control of the base process, and that is a feature.

If you compare Node to an Apache/PHP setup, the latter is really just equivalent to a simple Node server that sends each incoming request to its own process, which is terminated after the request has been handled.

You can make that setup in Node if you wish, and in many cases something like that is probably a good idea. The great thing about Node is that you can break this pattern; you could, for instance, have the main process or another permanent process do session handling before a request is passed to its handler.

Node is a very flexible tool; that is good if you need the flexibility, but it takes some skill to handle.


Exceptions by themselves don’t crash the server; they raise exceptions, which you can catch.

Errors in Node.js that bring down the entire process are a different story.

Your best bet (as with any technology) is just to test it with your application as soon as possible to see if it fits.


An uncaught exception, something like a call to a misspelled function, will crash the server. I use process.on('uncaughtException') to capture such exceptions. If you do use it then, yes, the error sent to process.on('uncaughtException') is less informative.

I usually include a module like nomnom to allow for command-line flags. I include one called --exceptions which, when set, bypasses process.on('uncaughtException'). Basically, if I see that uncaught exceptions are happening, then in development I start the app with --exceptions so that when the error is raised it is not captured, which causes Node to spit out the stack trace and then die. That tells you what line it happened on, and in what file.
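That flag trick doesn't need nomnom specifically; a plain-argv sketch (the --exceptions flag name is from the post above, the helper name is made up):

```javascript
// Returns true when the last-resort handler should be installed,
// i.e. when --exceptions was NOT passed on the command line.
function shouldCapture(argv = process.argv) {
  return !argv.includes('--exceptions');
}

if (shouldCapture()) {
  process.on('uncaughtException', (err) => {
    console.error('captured:', err.stack || err);
  });
}
// With --exceptions the handler is never installed, so Node prints
// the full stack trace (file and line) and exits.
```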

Capturing the exceptions is one way to deal with it. But, like you said, that means that if an error happens, some users may never receive responses, et cetera. I would actually recommend letting the error crash the server (I use process.on('uncaughtException') in apps, not web servers) and using forever. The fact is that it is likely better for the web server to crash and then expose what you need to fix.

Let’s say you used PHP instead of Node. PHP does not abruptly crash the server (since it doesn’t really serve); it spits out really ugly errors. Sure, that doesn’t result in a whole server going down and then having to come back up, and nobody wants their clients to have any downtime. But it also means that a problem will persist and be less noticeable. We’ve all seen sites with such errors, and they don’t get patched very fast. If a bug like that were to take everything down for one small blip (which honestly isn’t all that bad in the larger picture), it would surely call attention to itself. You would see it happen and would track that bug down.

The fact is that bugs will exist in any system, independent of language or platform. And it is arguably better for them to be fatal so that you know they happened. Over time, it makes you more aware of how these errors occur. I don’t know about you, but I know a lot of PHP devs who make the same common mistakes time after time.