ITNEXT

ITNEXT is a platform for IT developers & software engineers to share knowledge, connect…

Node.js: Heroes of Worker Threads ― C++ Addon

Novokhatskyi Oleksii
Published in ITNEXT
8 min read · Nov 15, 2021


Epigraph

No dragon can resist the fascination of riddling talk and of wasting time trying to understand it. ― J.R.R. Tolkien

Problem

Today we will look at ways to solve one of JavaScript’s biggest problems: CPU-bound tasks. We’ll do this in two parts. In Part I we’ll work with purely CPU-bound tasks. In Part II we’ll go a little further: modules that are also used on the frontend, mixed CPU- and I/O-bound tasks, and so on.

What do we have?

Node.js offers several ways to run such tasks:
1. Just run the CPU-bound task in a single process and block the event loop. Some will say this is not an option at all, but if the process was created specifically for this task, why not? Then again, not everyone has a couple of spare cores.
2. Create separate processes (Child Processes) and divide the tasks between them.
3. Create a cluster, fork worker processes, and put them to work.
4. Use Worker Threads to create additional threads.
5. Ask a C++ developer to write a C++ Addon that mysteriously executes the CPU-bound task. After all, I think everyone has heard the ancient legends about compiled programming languages and that a “native” implementation is always a success (at this phrase a React Native developer somewhere in the world should cry, looking at the performance of their application).

This article does not explain each of these methods in depth, because they are already described in detail in other articles and talks.

Tools

As examples of CPU-bound tasks we take various hash functions. For each one we compare a “native” implementation of a module with a pure JS version. The hardware is an Intel® Core™ i7-7700HQ CPU @ 2.80GHz with 8 logical cores.

Will there be something fun?

The last open question was how to convey the idea, process and results of the study…
And for this I chose the most popular, coolest, most advanced game…
of 1999: Heroes of Might and Magic III

And now, let’s dive into the legend

Our hero is Node.js, and like any hero, he has a path to follow. Our hero is strong and victorious. He has already defeated many evils and has decided it is time to get rid of one of the worst: CPU-bound tasks.

Node.js team

Our hero must have a team. So who will we take along?
- Cluster: 7 black dragons.
- 7 Child Processes: 7 red dragons.
- JS (1 process): 1 red dragon, called JS because he always sends only one stream of fire at his enemies.
- 7 Worker Threads: 7 young green dragons. Not experienced, but eager to fight.
- 1 C++ Addon: 1 archangel. An experienced warrior who does not reveal all the secrets of his strength, but has proven himself in past battles.


Part I

First battle

The first evil in his path is 1,400,000 strings (skeletal infantry), and the only way to defeat them is to drive them through Murmurhash (a non-cryptographic hash function suitable for general hash-based lookup) as quickly as possible.

The murmurhash3js module serves as the pure JS implementation and murmurhash-native as the “native” one.

Murmurhash

Implementation (JS 1 process) – nothing special: run the hash function in a loop and measure the time difference before and after:
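A minimal sketch of such a single-process run. It assumes SHA-256 from Node's built-in crypto module as a stand-in for murmurhash3js, so that it runs without installing anything. The names `hashString` and `runBenchmark` are illustrative, and the string count is much smaller than in the battle.

```javascript
const { createHash } = require('crypto');

// Stand-in for the hash under test (assumption: SHA-256 replaces murmurhash3js).
function hashString(input) {
  return createHash('sha256').update(input).digest('hex');
}

// Hash `count` generated strings in a plain loop and return the elapsed milliseconds.
function runBenchmark(count) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < count; i++) {
    hashString(`string-${i}`);
  }
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6;
}

console.log(`hashed 100000 strings in ${runBenchmark(100000).toFixed(1)} ms`);
```

While the loop runs, the event loop is blocked, which is exactly the trade-off of option 1 above.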

Implementation (Child Processes) – spawn new processes and wait until all calculations have finished (the “close” event):
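One way this could look, as a sketch under assumptions: each child is started with `node -e` and an inline script so the example fits in one file, and SHA-256 again stands in for the real hash. The helpers `chunkSizes`, `childScript` and `runInChildren` are hypothetical names.

```javascript
const { spawn } = require('child_process');

// Split `total` items into `parts` near-equal chunk sizes.
function chunkSizes(total, parts) {
  const base = Math.floor(total / parts);
  const extra = total % parts;
  return Array.from({ length: parts }, (_, i) => base + (i < extra ? 1 : 0));
}

// Script run by each child: hash `n` strings (SHA-256 is a stand-in, an assumption).
function childScript(n) {
  return `
    const { createHash } = require('crypto');
    for (let i = 0; i < ${n}; i++) createHash('sha256').update('s' + i).digest('hex');
  `;
}

// Spawn one child per chunk and resolve once every child has emitted "close".
function runInChildren(total, parts) {
  const jobs = chunkSizes(total, parts).map((n) => new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childScript(n)], { stdio: 'inherit' });
    child.on('error', reject);
    child.on('close', (code) => (code === 0 ? resolve() : reject(new Error(`exit code ${code}`))));
  }));
  return Promise.all(jobs);
}

const startedAt = Date.now();
runInChildren(140000, 7)
  .then(() => console.log(`all children finished in ${Date.now() - startedAt} ms`))
  .catch((err) => { console.error(err); process.exitCode = 1; });
```

The measured time includes process start-up, which is part of the real cost of this approach.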

Implementation (Cluster) – fork several worker processes (the number depends on the number of CPU cores) and wait until the main process has received a message from each of them saying that its work is finished. The process number is used as the “message”:

Implementation (Worker Threads) – almost the same as with Cluster: create several workers and wait in the main thread for a message from each one about its finished job:
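A sketch of the same idea with worker threads. To keep it in one file, the worker code is passed as a string with `eval: true`; a real project would more likely point `new Worker()` at a separate file. SHA-256 is again a stand-in, and `runWorkers` is a hypothetical helper.

```javascript
const { Worker } = require('worker_threads');

// Code each worker thread runs, passed as a string so the sketch stays in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { createHash } = require('crypto');
  for (let i = 0; i < workerData.count; i++) {
    createHash('sha256').update('s' + i).digest('hex'); // stand-in hash (assumption)
  }
  parentPort.postMessage(workerData.id);
`;

// Start `threads` workers and resolve with the ids they report once all are done.
function runWorkers(threads, countPerThread) {
  const jobs = Array.from({ length: threads }, (_, id) => new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { id, count: countPerThread },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  }));
  return Promise.all(jobs);
}

const startedAt = Date.now();
runWorkers(7, 20000).then((ids) => {
  console.log(`workers ${ids.join(', ')} finished in ${Date.now() - startedAt} ms`);
});
```

Unlike child processes, the workers share one process, so start-up is cheaper and data can be passed through `workerData` instead of command-line arguments.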

Implementation (C++ Addon) – simply call the C++ Addon (module) in the main thread, just like the plain JS implementation:

…and “Action!”

Murmurhash battle

Results of the first battle:

Murmurhash results

As we can see, the C++ Addon is the fastest implementation in this case. Child Processes, Cluster and Worker Threads show almost the same result as one another.

Round two

The next evil in his path is 140 strings (skeletal dragons), and we can beat them only with Bcrypt (sync for now), a password-hashing function based on the Blowfish cipher. It is the default password hash algorithm for OpenBSD and was the default for some Linux distributions.

The implementation is exactly the same as for Murmurhash; only the modules differ: bcryptjs and bcrypt respectively.

Bcrypt sync

Implementation example:

Bcrypt sync battle

Results of round two:

Bcrypt sync results

In this (sync) case the best option is to divide the tasks and run them in parallel, so Child Processes, Cluster and Worker Threads come out on top.

Final fight

The next evil is 140 strings (more powerful skeletal dragons), and this time we can beat them only with async Bcrypt. The implementation is exactly the same as in the previous battles (and so are the modules).

Bcrypt async

Implementation example:

Bcrypt async battle

Results of final fight:

Bcrypt async results

We didn’t block the event loop and used the UV thread pool, so in this case the C++ Addon is on top again.
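To illustrate why the async variant behaves differently, here is a small sketch. It assumes `crypto.pbkdf2` from Node's standard library as a stand-in for async bcrypt; both hand their work to the libuv thread pool. The salt, iteration count and the `hashAll` helper are made up for the example.

```javascript
const { pbkdf2 } = require('crypto');
const { promisify } = require('util');

const pbkdf2Async = promisify(pbkdf2);

// Hash every password concurrently. The work runs on libuv pool threads,
// so the event loop stays free while the promises are pending.
function hashAll(passwords) {
  return Promise.all(
    passwords.map((pw) => pbkdf2Async(pw, 'static-salt-for-demo', 50000, 32, 'sha256'))
  );
}

// A timer that keeps firing during the hashing shows the event loop is not blocked.
let ticks = 0;
const timer = setInterval(() => { ticks++; }, 5);

const passwords = Array.from({ length: 16 }, (_, i) => `password-${i}`);
hashAll(passwords).then((hashes) => {
  clearInterval(timer);
  console.log(`${hashes.length} hashes done, event loop ticked ${ticks} times meanwhile`);
});
```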

…and results of all previous battles:

First part results

It is worth mentioning that our archangel has a secret weapon that lets him fight even more effectively: the number of threads in the UV thread pool (UV_THREADPOOL_SIZE=size), which defaults to 4 and can be increased. In our case bcrypt uses crypto.randomBytes(), so setting the pool size to 8 cut the execution time of async bcrypt almost in half.
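A sketch of how the effect of the pool size could be measured. It assumes `crypto.pbkdf2` as a stand-in for async bcrypt, and `timeWithPoolSize` is a hypothetical helper. The variable is set in the environment of a fresh child process, so libuv sees it when it starts its pool.

```javascript
const { spawnSync } = require('child_process');

// Child script: run 8 async PBKDF2 jobs (stand-in for async bcrypt) and print elapsed ms.
const script = `
  const { pbkdf2 } = require('crypto');
  const start = Date.now();
  let pending = 8;
  for (let i = 0; i < 8; i++) {
    pbkdf2('pw' + i, 'salt', 100000, 32, 'sha256', (err) => {
      if (err) throw err;
      if (--pending === 0) console.log(Date.now() - start);
    });
  }
`;

// Run the script in a new Node.js process with the given thread pool size.
function timeWithPoolSize(size) {
  const result = spawnSync(process.execPath, ['-e', script], {
    env: { ...process.env, UV_THREADPOOL_SIZE: String(size) },
    encoding: 'utf8',
  });
  if (result.status !== 0) throw new Error(result.stderr);
  return Number(result.stdout.trim());
}

for (const size of [4, 8]) {
  console.log(`UV_THREADPOOL_SIZE=${size}: ${timeWithPoolSize(size)} ms`);
}
```

How much a larger pool helps depends on how many cores are free; past that point, extra threads only compete with each other.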

Part II

Argon 2 fortress

The second part of our epic story starts near a castle named “Argon 2”. It is named after a key derivation function that was selected as the winner of the Password Hashing Competition in July 2015 and comes in 3 versions:
- Argon2d maximizes resistance to GPU cracking attacks. It accesses the memory array in a password dependent order, which reduces the possibility of time–memory trade-off (TMTO) attacks, but introduces possible side-channel attacks.
- Argon2i is optimized to resist side-channel attacks. It accesses the memory array in a password independent order.
- Argon2id is a hybrid version. It follows the Argon2i approach for the first half pass over memory and the Argon2d approach for subsequent passes.

Node.js and the team need to take this fortress using the argon2-browser (js) and hash-wasm (native) modules.

Implementation example:

Argon2

Results of the fight:

Argon2 results

The C++ Addon is on top again and is the best choice for a purely CPU-bound task.

Rebuild the castle

For now all the battles are over. We need to rebuild the city and make friends with the local population. To do that we’ll read up on all the laws and traditions. Fortunately, everything is in xlsx format: 7 files with 5000 rows each (again, the js xlsx and native xlsx-util modules will be used as “magic readers”).

Implementation example (read and parse the file):

Xlsx

Results of reading and parsing:

Xlsx results

In this case we have a mix of I/O-intensive (reading the file) and CPU-intensive (parsing it) work. The C++ Addon is faster here because of the second, CPU-bound component.

Time for changes

Finally, we can now change the old bad laws and build a new prosperous society. We’ll use the jsonnet templating language for this purpose. It helps us to:
- Generate config data.
- Manage sprawling config.
- Have no side-effects.
- Organize, simplify and unify our code.

The modules are @rbicker/jsonnet (js) and @unboundedsystems/jsonnet (native).

Implementation example:

Jsonnet

The results are a little strange, but they mostly depend on the internal implementation of the modules.
In any case, here are the final results:

Jsonnet results

And final results of Part II:

Second part results

Conclusions

From our study we can draw the following:

  1. Select modules responsibly. Read the code of the libraries and consider the environment where your application will be deployed. Module popularity is far from the most important criterion.
  2. Choose a solution depending on the specific task. Child Processes, Cluster, Worker Threads: each of these tools has its own characteristics and areas of use.
  3. Don’t forget about other programming languages that can help to solve some of the tasks (C++ Addons, Node-API, Neon library).
  4. Plan your resource utilization (number of CPU or GPU cores).
  5. Make rational architectural decisions (implement your own thread pool, run CPU-bound tasks in a separate microservice, etc.).
  6. Find the best possible combination (C/C++/Rust/Go code can run outside the main thread so that the event loop is not blocked) and you’ll get something like this:
Logo
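Conclusion 5 mentions implementing your own thread pool. As a rough illustration only, such a pool could be built on Worker Threads along the following lines. The `WorkerPool` class is hypothetical, SHA-256 stands in for the real CPU-bound work, and worker error handling is left out for brevity.

```javascript
const { Worker } = require('worker_threads');

// Worker code as a string (eval mode) so the sketch stays in one file.
const poolWorkerSource = `
  const { parentPort } = require('worker_threads');
  const { createHash } = require('crypto');
  parentPort.on('message', (input) => {
    parentPort.postMessage(createHash('sha256').update(input).digest('hex'));
  });
`;

class WorkerPool {
  constructor(size) {
    this.all = Array.from({ length: size }, () => new Worker(poolWorkerSource, { eval: true }));
    this.idle = [...this.all]; // workers with nothing to do
    this.queue = [];           // tasks waiting for a free worker
  }

  // Queue one input; the promise resolves with the hash computed off the main thread.
  run(input) {
    return new Promise((resolve) => {
      this.queue.push({ input, resolve });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers, one task per worker at a time.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      worker.once('message', (hash) => {
        this.idle.push(worker);
        task.resolve(hash);
        this.dispatch();
      });
      worker.postMessage(task.input);
    }
  }

  // Stop all workers so the process can exit.
  close() {
    return Promise.all(this.all.map((worker) => worker.terminate()));
  }
}

const pool = new WorkerPool(2);
Promise.all(['a', 'b', 'c', 'd'].map((s) => pool.run(s)))
  .then((hashes) => console.log(hashes.map((h) => h.slice(0, 8))))
  .finally(() => pool.close());
```

Reusing a fixed set of workers avoids paying the thread start-up cost for every task, which matters when there are many small jobs.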

Thanks for reading.
I hope you enjoyed this epic story and felt like a part of the legend.
Please clap and follow me so as not to miss new articles.

For more info, and to check the results by running the scripts yourself, please visit the GitHub repository.

Written by Novokhatskyi Oleksii

Software Engineer, Co-Founder of Purport (purportapp.com), Tech Read channel (t.me/technicalread)