MongoDB: no aggregation, no problem
How I implemented a quick version of group by in NoSQL without using aggregation pipelines
MongoDB has no traditional SQL GROUP BY command. Instead, you use aggregation pipelines, which are also available through libraries such as Mongoose or through MongoDB's own aggregate command.
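For comparison, here is roughly what the aggregation route looks like. This is only a sketch: the field path address.name and the Invoice model named in the comment are assumptions about the schema, not code from the project.

```javascript
// Hypothetical pipeline: group invoices by the name of the address that created them.
// The field path "address.name" is an assumption about the schema.
const pipeline = [
  {
    $group: {
      _id: '$address.name',          // the group key, like GROUP BY address.name
      invoices: { $push: '$$ROOT' }, // collect each full invoice document per group
    },
  },
];

// With Mongoose this would typically be run as Invoice.aggregate(pipeline),
// where Invoice is a hypothetical model for the invoices collection.
```

Note that $group works on the stored documents, so related records that a plugin would otherwise populate need a separate stage or query.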
While working on a file export feature for invoices, I had already written simple code to retrieve the data. The initial requirements only called for exporting flat data. Later, I also needed to group the invoices by the address (person) who created them.
I did not want to modify the existing code to use aggregation pipelines, as I would have had to adapt it and test the initial feature (simple data export) all over again. The Mongoose Restify plugin already did most of the heavy lifting for me, such as populating related records.
Assuming that a group by would be straightforward to implement, I wrote a small function that looped through the collection to group the invoices by address.
Yet to my surprise, I got stuck on the string comparison. The field values, however similar, were never equal, and I could not understand why. I tried type coercion with the loose equality operator (==), with no better result. I lost a couple of hours solely trying to understand the reason. I added console.log everywhere to check the state of the data. The addresses appeared to match.
I chose to group the data in an object called listOfInvoices, which contained an array of objects, each with the properties address and invoices. The address property held the address name shared by the grouped invoices, and the invoices property held the list of those invoices.
I wrote the code to extract the address name and add it to the main object, listOfInvoices. However, when I checked whether the name on the next invoice already existed or was new, the comparison failed. I grew confused and, instead of making hasty attempts, I spent time understanding the cause.
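The hand-rolled grouping described above might look roughly like this. It is a sketch, not the original code: the function name groupByAddress, the sample data and the invoice shape are all assumptions, and listOfInvoices is treated as a plain array of { address, invoices } entries.

```javascript
// Group invoices into [{ address, invoices }] with a plain loop.
// The invoice shape ({ id, address: { name } }) is assumed for illustration.
function groupByAddress(invoices) {
  const listOfInvoices = [];
  for (const invoice of invoices) {
    const name = invoice.address.name;
    // Check whether a group for this address name already exists.
    let group = listOfInvoices.find((entry) => entry.address === name);
    if (!group) {
      // First invoice for this address: start a new group.
      group = { address: name, invoices: [] };
      listOfInvoices.push(group);
    }
    group.invoices.push(invoice);
  }
  return listOfInvoices;
}

// Hypothetical sample data: two invoices share the same address name.
const sample = [
  { id: 1, address: { name: 'Alice' } },
  { id: 2, address: { name: 'Bob' } },
  { id: 3, address: { name: 'Alice' } },
];
const grouped = groupByAddress(sample);
```

With this sample, grouped holds two entries. A data set in which every address is different produces one group per invoice, which can look as if the comparison never matches.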
A better approach would have been Test Driven Development (TDD). I could have taken a set of test data from the actual collection and written a series of tests with their expected results. I would then have written a function to return the grouped data and, finally, made the tests pass with minimal code.
I had perhaps underestimated the nature of this task. When dealing with logic, I usually write unit tests.
After a few moments of mindless code tinkering and searching Google for a solution to the equality check with Lodash, I remembered that Lodash has a groupBy function.
I mapped the address name, found in the invoice details, to the top level of each record so that I could pass a plain property name to Lodash's groupBy function instead of a deeper reference like invoice.address.name.
I then copied and pasted the code verbatim from Stack Overflow, changing only the groupBy parameter name and the final field names.
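The original snippet is not reproduced here, so what follows is a sketch of the same pattern under stated assumptions. To keep it runnable without dependencies, a small groupBy helper stands in for Lodash's _.groupBy(collection, 'propertyName'); the sample invoices and the addressName field are hypothetical.

```javascript
// Minimal stand-in for lodash's _.groupBy(collection, 'propertyName').
// With lodash installed, replace it with: const { groupBy } = require('lodash');
function groupBy(collection, key) {
  return collection.reduce((acc, item) => {
    (acc[item[key]] = acc[item[key]] || []).push(item);
    return acc;
  }, {});
}

// Hypothetical invoices, shaped as the existing export query might return them.
const invoices = [
  { id: 1, address: { name: 'Alice' } },
  { id: 2, address: { name: 'Bob' } },
  { id: 3, address: { name: 'Alice' } },
];

// Step 1: lift the nested address name to the top level of each record.
const flattened = invoices.map((invoice) => ({
  ...invoice,
  addressName: invoice.address.name,
}));

// Step 2: group by the top-level property name.
const byAddress = groupBy(flattened, 'addressName');

// Step 3: reshape the result into the { address, invoices } entries used for the export.
const listOfInvoices = Object.entries(byAddress).map(([address, items]) => ({
  address,
  invoices: items,
}));
```

Grouping happens in application code after the query, so the existing retrieval logic and its populated records stay untouched.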
Finally, as I later observed, the issue was not with the code I had initially written. The invoices in the list I was working with all had different addresses. In other words, there was nothing to group. My code handled the comparison correctly. Nonetheless, I ended up with a battle-proven, tested and less hacky solution.