MongoDB: Data Modeling Patterns
There are many reasons, such as performance and scalability, for choosing a NoSQL database over a relational one; however, whether those requirements are actually met depends largely on your schema design.
Data modeling is a journey: it starts with understanding the requirements, transforms them into a model using patterns, and then evolves that model continuously.
Where to start?
Traditionally we tend to start with ER or UML diagrams once we have some clarity on the requirements, but a better way to begin data modeling is to write an application with an in-memory model and study it against a few critical parameters: the usage pattern, how the data is accessed, the workload profile, and the ratio of reads to writes.
Model for Simplicity or Performance
Simplicity means gathering all the data in a flexible way that is easy for developers to understand. In other words, use the embedded-document approach to keep most relationships within a single document rather than spreading them across multiple collections in your schema.
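The embedded-document approach can be sketched as a plain Python dict; the customer document and its field names below are illustrative assumptions, not a prescribed schema:

```python
# A hypothetical "customer" document modeled for simplicity: all related
# data is embedded, so a single read returns everything the app needs.
customer = {
    "_id": "cust-1001",
    "name": "Jane Doe",
    "addresses": [  # one-to-few relationship embedded directly
        {"type": "home", "city": "Berlin"},
        {"type": "work", "city": "Munich"},
    ],
    "recent_orders": [  # embedded instead of a separate orders collection
        {"order_id": "ord-1", "total": 42.50},
    ],
}

def summarize(doc):
    # One document lookup yields the full picture -- no joins, no extra queries.
    return (f"{doc['name']} has {len(doc['addresses'])} addresses "
            f"and {len(doc['recent_orders'])} recent orders")
```

The trade-off is document growth: this shape stays simple only while the embedded arrays remain small, which is exactly the tension the patterns below address.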
Performance always demands fine-grained data exchange to reduce disk and CPU usage, but it also increases the complexity of your model through techniques such as sharding and aggregation. So it is usually better to start with a simple model and evolve it to meet your performance criteria. Along the way your understanding of the domain and data deepens, and you waste less modeling effort.
Too many documents or too big a document
When document size grows, the working set can exceed RAM, forcing MongoDB to go to disk to return the data set. The idea behind the Subset Pattern is to split a collection in two, based on the most-used versus least-used parts of the documents.
Basically, we look for one-to-many relationships in our documents and move the less-used part into a separate collection. For example, a product may have many reviews, and we don't need to show all of them on the website. Keeping only a limited number of reviews with the product in the same collection, and moving the rest into another collection, reduces the document size, which in turn shrinks the working set.
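The product/reviews split can be sketched as follows; the field names and the helper function are illustrative assumptions:

```python
# A minimal sketch of the Subset Pattern, assuming a hypothetical product
# document with an embedded "reviews" array.
def split_reviews(product, keep=3):
    """Keep only the first `keep` reviews on the product document; return
    the rest as documents destined for a separate `reviews` collection,
    each carrying the product _id so they can still be looked up."""
    reviews = product["reviews"]
    kept, overflow = reviews[:keep], reviews[keep:]
    subset_doc = {**product, "reviews": kept}
    overflow_docs = [{"product_id": product["_id"], **r} for r in overflow]
    return subset_doc, overflow_docs
```

The main collection now holds small documents that fit the working set, while the full review history remains queryable in the secondary collection.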
In contrast to big documents, modeling too many small documents can also cause issues as the application scales in terms of data and index size. With the Bucket Pattern we group data based on how the application uses it. Typical use cases for this pattern are Internet of Things (IoT), real-time analytics, and time-series data.
For example, when a customer searches the website we want to list only the available products from the inventory. Here we can apply the Bucket Pattern to group the available products per day.
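A sketch of that grouping, with per-record documents collapsed into one bucket document per day (the field names are illustrative assumptions):

```python
from collections import defaultdict

# Bucket Pattern sketch: instead of one document per availability record,
# emit one bucket document per day holding all available product ids.
def bucket_by_day(records):
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["date"]].append(rec["product_id"])
    return [
        {"_id": f"inventory-{day}",       # deterministic bucket id per day
         "date": day,
         "available": sorted(products),   # all products available that day
         "count": len(products)}
        for day, products in sorted(buckets.items())
    ]
```

A day's availability query now reads a single bucket document instead of scanning many tiny ones, which also keeps the index on the bucket collection far smaller.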
Pattern to boost your read performance
The Computed Pattern is used when data needs to be computed repeatedly by our application. For example, when a customer searches the website we want to list only the cheapest product for a given property and date. This computation can be performed on the write path, creating the required precomputed document. In MongoDB, the aggregation pipeline is a good framework for applying these kinds of computations effectively.
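The write-path computation can be sketched like this; the quote shape and the in-memory store standing in for a computed collection are illustrative assumptions:

```python
# Computed Pattern sketch: whenever a new price quote is written, update a
# precomputed "cheapest" document so the read path never recomputes it.
def apply_quote(computed, quote):
    """`computed` maps (property_id, date) -> cheapest quote seen so far,
    standing in for a precomputed MongoDB collection."""
    key = (quote["property_id"], quote["date"])
    current = computed.get(key)
    if current is None or quote["price"] < current["price"]:
        computed[key] = quote  # new cheapest quote for this property/date
    return computed
```

Each write does a small amount of extra work, and every customer search becomes a single keyed lookup instead of a min() over all quotes.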
This powerful pattern reduces CPU workload and improves performance, especially in read-intensive applications.
Pattern to Evolve your schema
Altering a schema is a nightmare in a relational database, often requiring application downtime and a well-coordinated rollout plan. The Schema Versioning Pattern allows you to alter the schema without any downtime. It takes advantage of MongoDB's support for differently shaped documents existing in the same collection.
Basically, we save documents with the new schema to the database along with a schema_version field. The application can then distinguish document models by schema_version and support a hybrid model as a rollout strategy. This also avoids a big-bang data migration, which is a critical concern for applications with large amounts of data.
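The version dispatch in the application can be sketched as below; the contact document and its v1/v2 shapes are hypothetical examples, not from the original article:

```python
# Schema Versioning Pattern sketch. Assume a hypothetical contact document:
# v1 stored a single "phone" string, v2 stores a list under "phones".
# Both shapes coexist in the same collection; the reader dispatches on
# schema_version instead of requiring a big-bang migration.
def get_phones(doc):
    version = doc.get("schema_version", 1)  # absent field => original schema
    if version == 1:
        return [doc["phone"]]
    if version == 2:
        return doc["phones"]
    raise ValueError(f"unsupported schema_version: {version}")
```

New writes carry schema_version 2, old documents keep working unchanged, and they can be upgraded lazily (for example, on next update) with no downtime.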
There are a few more patterns, such as Attribute, Polymorphic, and Tree, but I have highlighted only the ones I have experienced in our project. It is always important to pick the right pattern for your domain and use case to get a successful outcome, and you will often get even better results by combining patterns when modeling your schema.
Modeling is a journey; combining it with evolutionary thinking will produce better outcomes.