MongoDB: watch your pipeline
Epigraph
This is the rat that ate the malt
That lay in the house that Jack built. ― British nursery rhyme
Love is blind
Everybody likes new features, especially ones that make life easier.
For example this one: https://docs.mongodb.com/manual/reference/operator/aggregation/merge
With the $merge stage you can save the results of your aggregation from within… your aggregation. Wow.
You can forget about this:
… and just do this:
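As a rough sketch of that contrast (not the original snippets): the collection names `users` and `users_stats` and the helper names below are assumptions for illustration, and `db` is assumed to be a connected Node.js driver `Db` object.

```javascript
// Old approach: run the aggregation, then write the results yourself.
async function saveStatsManually(db, pipeline) {
  const results = await db.collection("users").aggregate(pipeline).toArray();
  if (results.length > 0) {
    await db.collection("users_stats").insertMany(results);
  }
  return results.length;
}

// New approach: append a final $merge stage so the pipeline writes its own output.
function withMerge(pipeline) {
  return [...pipeline, { $merge: { into: "users_stats" } }];
}

async function saveStatsWithMerge(db, pipeline) {
  // $merge returns no documents; draining the cursor is what triggers the write.
  await db.collection("users").aggregate(withMerge(pipeline)).toArray();
}
```

The second variant saves a round trip through the application, but it also removes the one place where the code could see that the result set was empty.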
What can go wrong?..
Task
We need daily statistics on user status changes to build some charts.
We need to know the number of users “added” and “removed” today.
The final result should look like this:
Important: logs of status changes are stored in a separate collection.
So we need to do some magic to get the result. Let’s look at some parts of “some_super_long_pipeline”
What do we have here?
1. Some match conditions for users and for logs (the log entries must fall within a single day)
2. A $group stage that pushes all of our users’ info into a “users” array for later sorting
3. The actual “filtering” of users into two separate arrays (“added” and “removed”)
4. Saving this info into the “users_stats” collection
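The four steps above could look roughly like the following. This is only a sketch of the stage order: the collection `status_logs`, the fields `userId`, `changedAt`, `status` and `day`, and the function name are all hypothetical, not taken from the real pipeline.

```javascript
// Builds a pipeline to run against a (hypothetical) "users" collection.
function buildStatsPipeline(dayStart, dayEnd) {
  return [
    // 1. Attach each user's status logs that fall within the given day...
    { $lookup: {
        from: "status_logs",
        let: { userId: "$_id" },
        pipeline: [
          { $match: { $expr: { $and: [
            { $eq: ["$userId", "$$userId"] },
            { $gte: ["$changedAt", dayStart] },
            { $lt: ["$changedAt", dayEnd] },
          ] } } },
        ],
        as: "logs",
    } },
    // ...and keep only users that have at least one such log.
    { $match: { "logs.0": { $exists: true } } },
    // 2. Collect every matched user into a single "users" array.
    { $group: {
        _id: null,
        users: { $push: { userId: "$_id", status: { $arrayElemAt: ["$logs.status", -1] } } },
    } },
    // 3. Split that array into "added" and "removed".
    { $project: {
        _id: 0,
        day: { $literal: dayStart },
        added: { $filter: { input: "$users", as: "u", cond: { $eq: ["$$u.status", "added"] } } },
        removed: { $filter: { input: "$users", as: "u", cond: { $eq: ["$$u.status", "removed"] } } },
    } },
    // 4. Save the result into "users_stats".
    { $merge: { into: "users_stats" } },
  ];
}
```

In mongosh this would be run as something like `db.users.aggregate(buildStatsPipeline(start, end))`, again assuming those names.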
Have you already found the problem?
Failure story
Everything was ok during development and testing and… even on prod.
But a few weeks later we found a hole in our charts! There had been a long holiday (several days in a row) during which no users were “added” or “removed”. So $merge hadn’t saved anything. Why?
Because this is how it works. If a stage of the pipeline produces no results, the next stage has nothing to do. It’s easy (for inattentive developers like me) to miss this if you have a large number of stages in the aggregation.
In our case, no users passed the $match conditions, so there was nothing to $push into the “users” array at the $group stage. $group emitted no documents at all, and $merge had nothing to write.
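The effect can be reproduced without a database. The functions below are toy in-memory stand-ins for the stages, not MongoDB APIs, and the document shape (`day`, `added`, `removed`) is an assumption made for the illustration.

```javascript
// Toy stand-in for $match: keep documents that satisfy the predicate.
const match = (docs, pred) => docs.filter(pred);

// Toy stand-in for { $group: { _id: null, users: { $push: "$$ROOT" } } }.
// $group emits one document per group, so zero inputs means zero groups.
const groupAll = (docs) => (docs.length === 0 ? [] : [{ _id: null, users: docs }]);

// Toy stand-in for the "added"/"removed" split, reduced to counts.
const splitByStatus = (groups) => groups.map((g) => ({
  added: g.users.filter((u) => u.status === "added").length,
  removed: g.users.filter((u) => u.status === "removed").length,
}));

// Toy stand-in for $merge: write whatever arrives into the target collection.
const merge = (docs, target) => { target.push(...docs); return target; };

function runDailyStats(changes, day, target) {
  const matched = match(changes, (c) => c.day === day);
  const stats = splitByStatus(groupAll(matched)).map((s) => ({ day, ...s }));
  return merge(stats, target);
}

const usersStats = [];
const changes = [
  { user: "ann", status: "added", day: "2020-01-03" },
  { user: "bob", status: "removed", day: "2020-01-03" },
];

runDailyStats(changes, "2020-01-03", usersStats); // working day: one document written
runDailyStats(changes, "2020-01-04", usersStats); // holiday: nothing reaches merge

console.log(usersStats); // only the 2020-01-03 document is present
```

The holiday does not produce a document with zero counts; it produces no document, which is exactly the hole in the chart.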
Conclusion
1. Be attentive
2. Try not to build toooooo loooooong aggregation pipelines if it’s possible.
If it’s impossible ― see point 1
PS. I haven’t described a solution for this problem because there are actually many ways and you can choose the one that suits your particular case.
In this article I just wanted to warn you about such errors, because it’s better to learn from others’ mistakes.
Thanks for reading and feel free to comment
References
https://www.codeproject.com/Articles/1149682/Aggregation-in-MongoDB
https://stackoverflow.com/questions/49077775/cant-group-when-root-has-no-results-mongodb