MongoDB: $lookup or not $lookup?

Novokhatskyi Oleksii
4 min readMay 3, 2020

Epigraph

There’s a difference between knowing the path and walking the path. ― Morpheus (The Matrix, 1999)

JOINs? No, God, please, no. NO!!!

As developers, we always try to make our database queries as fast as possible. For this we use various schemas, structures, relations. But still, sometimes we need to combine information taken from several sources (collections, tables, etc.). In this article I will try to describe possible ways of collecting data from different collections of MongoDB

Tools

  1. MongoDB v4.0.3 used (module “mongodb-memory-server”) ― replica set
    https://www.npmjs.com/package/mongodb-memory-server
  2. Using MongoDB driver v3.3.4 (module “mongodb”)
    https://www.npmjs.com/package/mongodb
  3. “microseconds” module
    https://www.npmjs.com/package/microseconds

Task

We have data in 3 collections: “teachers”, “students” and “exams”

We need to get a result as a number of student’s “passed” exams for each teacher

Before start

Let’s create “teachers”, “students” and “exams” collections

And insert 100 teachers, 300 students (3 for each teacher) and 900 exams (3 for each studens)

Way 1: Multiple requests (bad but works)

We can get all teachers -> then students for each teacher -> then exams for each student -> and finally filter only “passed” exams

But what if the number of teachers will dramatically increase? Or number of students? Or exams? We’ll have a bigger number of requests to database. It’s bad for performance

Way 2: Get whole collections (bad but depends)

We can get all teachers -> get all students -> get exams and combine the data in code

… or do this in parallel (but for current example there’ll not be big difference)

But again, what if number of document in our collections increases? In some cases performance can be even worse than in previous example

Way 3: Aggregate (let MongoDB do it’s job)

We can get teachers -> aggregate data from “students” collection by “teacherId” -> aggregate data from “exams” collection by “studentId” -> add/remove needed/not needed fields.
For all of this we can use MongoDB’s “aggregate” method

  1. First “$lookup” gets data from “students” collection by “teacherId” and puts it into “students” array
  2. Second “$lookup” gets data from “exams” collection by “studentId” (in current documents “studentIds” are in “students” array but MongoDB understands us and checks every “_id” in array) and puts it into “exams” array
  3. “$project” used for removing unnecessary data (“students” array is not needed anymore)
  4. Than we need to get only “passed” exams. For this we’ll filter “exams” array and put filtered values in new field “examsPassedArray” (with “$addFields”)
  5. We need only the number of passed exams ($size of “examsPassedArray”) and _id of teacher (_id passes $project stage of aggregation by default)

Just for information (result may vary)

Only default indexes (_id):

Number of iterations: 100
Multiple Requests: 182.40 +- 14.18 milliseconds
Get collections: 94.55 +- 1.26 milliseconds
Aggregation: 39.30 +- 0.30 milliseconds

Some indexes added (teacherId for “students” collection and studentId for “exams”):

Number of iterations: 100
Multiple Requests: 99.51 +- 14.38 milliseconds
Get collections: 104.60 +- 4.42 milliseconds
Aggregation: 10.66 +- 0.53 milliseconds

Conclusion

Instrument provided by a service in most cases are the best for the service.
So don’t forget to check official documentation

For more info and to have possibility to check results by running scripts by yourself ― please, visit github repository

Thanks for reading.
Feel free to comment, hope we’ll have a good discussion

--

--