What are the thread limitations when working on Linux compared to processes for network/IO-bound apps?

Posted by: admin, November 29, 2017

Questions:

I’ve heard that under Linux on a multicore server it is impossible to reach top performance with just 1 process and multiple threads, because Linux has some limitations on IO, so that 1 process with 8 threads on an 8-core server might be slower than 8 processes.

Any comments? Are there other limitations which might slow the application down?
The application is a network C++ application serving hundreds of clients, with some disk IO.

Update: I am concerned that there are more IO-related issues beyond the locking I implement myself… Aren’t there any issues with doing simultaneous network/disk IO in several threads?

Answers:

Drawbacks of Threads

Threads:

  • Serialize on memory operations. All threads share one address space, so the kernel (and in turn the MMU) must service operations such as mmap() that perform page allocations one at a time for the whole process.
  • Share the same file descriptor table. Changes to and lookups in this table, which tracks each descriptor’s open file (and with it state such as the file offset and other flags), involve locking. Every system call that uses the table, such as open(), accept() or fcntl(), must lock it to translate the fd to the internal file handle and when making changes.
  • Share some scheduling attributes. Processes are constantly evaluated to determine the load they’re putting on the system, and are scheduled accordingly. Many threads imply a higher CPU load, which the scheduler typically dislikes, so the response time to events for that process (such as incoming data on a socket) increases.
  • May share some writable memory. Any memory written to by multiple threads generates all kinds of cache contention and convoying issues, and is especially slow if it requires fancy locking. For example, heap operations such as malloc() and free() operate on a global data structure (this can be worked around to some degree). There are other global structures as well.
  • Share credentials. This might be an issue for service-type processes.
  • Share signal handling. Signals interrupt the entire process while they’re handled.
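
To make the shared file descriptor table point concrete, here is a minimal sketch for Linux/POSIX. The helper names (fd_is_open, thread_close_is_visible, child_close_is_visible) are made up for this illustration and are not from any library. It only demonstrates the sharing itself, not the locking cost: a close() done by another thread is seen by the caller, while a close() done by a forked child only affects the child’s own copy of the table.

```cpp
#include <fcntl.h>     // fcntl, F_GETFD
#include <sys/wait.h>  // waitpid
#include <unistd.h>    // pipe, close, fork, _exit
#include <cassert>
#include <thread>

// True if `fd` is still a valid descriptor in the calling process.
static bool fd_is_open(int fd) {
    assert(fd >= 0);
    return fcntl(fd, F_GETFD) != -1;
}

// Hypothetical helper: close a pipe's read end from a second thread and
// report whether the calling thread sees the descriptor disappear.
bool thread_close_is_visible() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    int rd = fds[0];
    std::thread closer([rd] { close(rd); });
    closer.join();
    bool visible = !fd_is_open(rd);  // one shared table: the close shows up here
    close(fds[1]);
    return visible;
}

// Hypothetical helper: close the same kind of descriptor in a forked child
// and report whether the parent's descriptor was affected.
bool child_close_is_visible() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    pid_t pid = fork();
    if (pid == 0) {      // child: works on its own copy of the table
        close(fds[0]);
        _exit(0);
    }
    if (pid > 0) waitpid(pid, nullptr, 0);
    bool visible = !fd_is_open(fds[0]);  // parent's entry should be untouched
    close(fds[0]);
    close(fds[1]);
    return visible;
}
```

On a typical Linux system thread_close_is_visible() should return true and child_close_is_visible() should return false. Build with -pthread if your toolchain needs it.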

Processes or Threads?

  • If you want to make debugging easier, use threads.
  • If you are on Windows, use threads. (Processes are extremely heavyweight on Windows.)
  • If stability is a huge concern, try to use processes. (One SIGSEGV/PIPE is all it takes…).
  • If threads aren’t available, use processes. (Not so common now, but it did happen).
  • If your threads share resources that can’t be used from multiple processes, use threads. (Or provide an IPC mechanism that allows communicating with the “owner” thread of the resource.)
  • If you use resources that are only available on a one-per-process basis (and you need one per context), obviously use processes.
  • If your processing contexts share absolutely nothing (such as a socket server that spawns and forgets connections as it accept()s them), and CPU is a bottleneck, use processes and single-threaded runtimes (which are free of the intense locking found on the heap and in other places).
  • One of the biggest differences between threads and processes is this: threads use software constructs to protect data structures, whereas processes use hardware (which is significantly faster).
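
As a rough illustration of the stability point, the sketch below runs a piece of work in a forked child so that a fatal signal kills only that child. IsolatedResult and run_isolated are hypothetical names invented for this example, and it assumes a POSIX system; a real server would also need to think about descriptors, timeouts and zombie reaping.

```cpp
#include <sys/wait.h>  // waitpid, WIFEXITED, WIFSIGNALED, ...
#include <unistd.h>    // fork, _exit
#include <cassert>
#include <csignal>
#include <functional>

// Outcome of running a task in a child process (illustrative type).
struct IsolatedResult {
    bool crashed;    // true if the child was killed by a signal
    int  signal;     // the terminating signal, or 0
    int  exit_code;  // exit status on normal exit, otherwise -1
};

// Hypothetical helper: run `task` in a forked child and report how it ended.
// The task's return value travels back as the exit status (low 8 bits only).
IsolatedResult run_isolated(const std::function<int()>& task) {
    pid_t pid = fork();
    if (pid < 0) return {false, 0, -1};  // fork failed, nothing ran
    if (pid == 0) {
        _exit(task());                   // child: never returns to the caller
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return {false, 0, -1};
    if (WIFSIGNALED(status)) return {true, WTERMSIG(status), -1};
    if (WIFEXITED(status))   return {false, 0, WEXITSTATUS(status)};
    return {false, 0, -1};
}
```

For example, run_isolated([] { std::raise(SIGSEGV); return 0; }) should come back with crashed set and signal equal to SIGSEGV while the calling process keeps running. With threads, the same signal would take down the whole process.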

Answers:

It really should make no difference; it is mostly a matter of design.

A multi process app may have to do less locking but may use more memory. Sharing data between processes may be harder.
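
To show what “harder” can look like in practice, here is a small sketch of sharing one counter between forked workers. It assumes Linux (MAP_ANONYMOUS) and a platform where std::atomic<long> is lock-free; count_across_processes is a made-up name for this example. With threads the counter could simply be a variable; with processes the shared region has to be set up explicitly before fork().

```cpp
#include <sys/mman.h>  // mmap, munmap
#include <sys/wait.h>  // wait
#include <unistd.h>    // fork, _exit
#include <atomic>
#include <cassert>
#include <new>         // placement new

// Hypothetical helper: fork `workers` children that each add `increments`
// to a counter living in shared memory, then return the final value.
long count_across_processes(int workers, long increments) {
    // A MAP_SHARED mapping stays shared across fork(); ordinary memory is
    // copied (copy-on-write) and would not be.
    void* mem = mmap(nullptr, sizeof(std::atomic<long>),
                     PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    std::atomic<long>* counter = new (mem) std::atomic<long>(0);
    assert(counter->is_lock_free());  // assumption: needed for cross-process use

    int started = 0;
    for (int i = 0; i < workers; ++i) {
        pid_t pid = fork();
        if (pid == 0) {  // child: do the work and leave
            for (long n = 0; n < increments; ++n) counter->fetch_add(1);
            _exit(0);
        }
        if (pid > 0) ++started;
    }
    for (int i = 0; i < started; ++i) wait(nullptr);  // reap every child

    long total = counter->load();
    munmap(mem, sizeof(std::atomic<long>));
    return total;
}
```

If every fork() succeeds, count_across_processes(4, 1000) should return 4000. Anything richer than a counter (queues, strings, pointers) needs more care, because pointers into one process’s private memory mean nothing in another.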

On the other hand, a multi-process design can be more robust. A child can call exit() and quit safely, mostly without affecting the others.

It depends on how interdependent the clients are. I usually recommend the simplest solution.