Passing MongoDB data modeling

Track the website visitors

Dear readers of our blog, we'd like to recommend you to visit the main page of our website, where you can learn about our product SQLS*Plus and its advantages.

SQLS*Plus - best SQL Server command line reporting and automation tool! SQLS*Plus is several orders of magnitude better than SQL Server sqlcmd and osql command line tools.

REQUEST COMPLIMENTARY SQLS*PLUS LICENCE

Enteros UpBeat offers a patented database performance management SaaS platform. It proactively identifies root causes of complex revenue-impacting database performance issues across a growing number of RDBMS, NoSQL, and deep/machine learning database platforms. We support Oracle, SQL Server, IBM DB2, MongoDB, Casandra, MySQL, Amazon Aurora, and other database systems.

30 September 2020

Flexibility

SQLS*Plus - 1649775553725513F3569 EC1B 48DD A605 65E2BDE047AC optimize

One of the most advertised features of MongoDB is its flexibility. Flexibility, however, is a double-edged sword. More flexibility means more choices when modeling data (this reminds me of Zen Python: ” There must be one – and preferably only one – obvious way to do it “). However, I like the flexibility that MongoDB provides, it’s just important to consider some best practices before choosing a data model.

This is a problem

In this post, we will look at how we modeled the mailing lists and the people who belong to them. Here are the requirements:

Each person can have one or more e-mail addresses.
Every person can belong to any number of mailing lists.
Everyone who belongs to a mailing list can choose which name he wants to use for the mailing list.

These requirements are obviously somewhat simplified, but they are enough to express the mechanics of the kernel on which Fiesta is based.

0-Code

Let’s take a look at what our data model looks like if we never embed anything – we’ll call it a 0 investment strategy.

We have People who have a name and a password:

{
_id: PERSON_ID,
name: "Mike Dirolf"
pw: "Some Hashed Password"
}

We have a separate collection of addresses, where each address contains a link to one person:

{
_id: ADDRESS_ID,
person: PERSON_ID,
address: "mike@corp.fiesta.cc"
}

We have groups, each of which is basically just an identifier (the IRL has some more group-specific metadata that will also be here, but we are going to ignore them to focus on the relationship):

{
_id: GROUP_ID
}

Finally, we have a membership that links a person to the group. Each membership includes the name of the list that the person uses for the group and a link to the address where they want to receive mail for that group:

{
_id: MEMBERSHIP_ID,
person: PERSON_ID,
group: GROUP_ID,
address: ADDRESS_ID,
group_name: "family"
}

This data model is easy to design, easy to think about, and easy to maintain. However, we mainly model data as in the DBMS; we do not use the document-based MongoDB approach. For example, let’s see how we can get the addresses of other members of a group, considering one incoming address and the group name (this is a very common query for Fiesta):

Request a collection of addresses to obtain an ID for the person concerned.
Request membership in a collection with a person’s identifier from step 1 and the group name to obtain a group identifier.
Request a Memberships collection again to obtain all memberships with a group identifier from step 2.
Request an address collection to obtain the address that will be used for each of the participants in step 3.

Insert everything

The strategy that many novices use when modeling their data, we will call the strategy of implementing everything. To use this strategy for Fiesta, we would take all group memberships and implement them directly in the group document. We would also embed address and person metadata directly into each membership:

{
_id: GROUP_ID,
memberships: [{
address: "mike@corp.fiesta.cc",
name: "Mike Dirolf",
pw: "Some Hashed Password",
person_addresses = ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
group_name: "family"
}, ...]
}

The theory behind the Embed All strategy is that by storing all related data in one place, we can simplify common requests. With this strategy, the query we have done above is trivial (remember that the query is “given address and group name, what are the other addresses of the group members”):

Query the Groups collection for a group containing a membership with the same address as the person_addresses and group_name.
Go through the resulting document to get the other addresses of the members.

This is as easy as it can be. But what if we want to change a person’s name or password? We would have to change it in every built-in membership.

The same goes for adding a new person_address or removing an existing one. This underscores the characteristics of the embed everything model: it can be useful to perform one specific request (because we mostly do pre-join), but it can be a nightmare for long-term maintainability. I highly recommend this approach in general.

Insert trivial things

The approach we used in Fiesta and the approach I most often recommend is to start by thinking about the 0-embed model. Once you’ve figured out this model, you can choose simple cases where embedding makes sense. In most cases, these cases tend to be too many relationships.

For example, each of our addresses belongs to one user (and is also referred to by members). Addresses are also unlikely to change very often. Let’s build them into an array to save some queries and synchronize our data model with our human mental model.

Each membership is linked to one person and one group, so we could imagine that they are built into either the person model or the group. In such cases, it is important to consider both the data access patterns and the size of the relationships.

We expect people to have no more than 1000 memberships in groups and groups to have no more than 1000 memberships, so the size does not tell us much. However, our access template – when we display the Fiesta control panel, we need access to all memberships of a person.

To simplify this request, let’s build membership into the Person model. It also has the advantage that all person addresses are stored in the Person model (because they are referenced both at the highest level and within the membership). If an address needs to be deleted or changed, we can do it all in one place.

This is what it looks like now (this is the Person model – the only other model is Group, which is identical to the 0-embed case):

{
_id: PERSON_ID,
name: "Mike Dirolf",
pw: "Some Hashed Password",
addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
memberships: [{
address: "mike@corp.fiesta.cc",
group_name: "family",
group: GROUP_ID
}, ...]
}

The request that we discussed now looks like this:

Request for a user with the appropriate address and built-in membership with the correct group name.
Use the identifier of the group in the built-in membership, starting with step 1, to request other people who have membership in that group to get the addresses directly from their built-in membership.

This is still almost as easy as inline, but our data model is much cleaner and easier to maintain. We hope this step by step guide was useful – if you have any questions, let us know!

Data Modeling with MongoDB

Enteros

About Enteros

Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of RDBMS, NoSQL, and machine learning database platforms.

Tags: MongoDB, MySQL, SQL

NoSQL life without a schema

NoSQL: life without a schema

PreambleNoSql is not a replacement for SQL databases but is a valid alternative for many situations where standard SQL is not the best approach for...

SQLS*Plus - MongoDB Sample operators 1

MongoDB Sample operators

PreambleMongoDB Conditional operators specify a condition to which the value of the document field shall correspond.Comparison Query Operators $eq...

5 Database management trends impacting database administration

5 Database management trends impacting database administrationIn the realm of database management systems, moreover half (52%) of your competitors feel...

SQL Server data types

SQL Server data types

The data type is defined as the type of data that any column or variable can store in MS SQL Server. What is the data type? When you create any table or...

SQL Server Architecture

SQL Server Architecture

PreambleMS SQL Server is a client-server architecture. MS SQL Server process starts with the client application sending a query.SQL Server accepts,...

Get a better understanding of the MongoDB master slave configuration

Get a better understanding of the MongoDB master slave configuration

First the basics: what is the master/slave?One database server (“master”) responds and can do anything. A lot of other database servers store copies of all...

Run a MongoDB data adapter for Atom Hopper with Netbeans

Run a MongoDB data adapter for Atom Hopper with Netbeans

PreambleAtom Hopper (based on Apache Abdera) for those who may not know is an open-source project sponsored by Rackspace. Today we will figure out how to...

Improve MongoDB aggregation structure

Improve MongoDB aggregation structure

PreambleMongoDB recently introduced its new aggregation structure. This structure provides a simpler solution for calculating aggregated values rather...

SQLShell: SQL tool for multiple databases with NoSQL potential

SQLShell: SQL tool for multiple databases with NoSQL potential

PreambleSQLShell is a cross-platform command-line tool for SQL, similar to psql for PostgreSQL or MySQL command-line tool for MySQL.Why use it?If you...

Use Mongosniff to clarify what your MongoDB hears and says

Use Mongosniff to clarify what your MongoDB hears and says

PreambleWriting an application on top of the framework on top of the driver on top of the database is a bit like a game on the phone: you say “insert...

Notes on Oracle Coherence

Notes on Oracle Coherence

PreambleOracle Coherence is a distributed cache that is functionally comparable with Memcached. In addition to the basic function of the API cache, it...

FrankenQueries: when SQL and NoSQL collide

FrankenQueries: when SQL and NoSQL collide

PreambleIBM pureXML, a proprietary XML database built on a relational mechanism (designed for puns) that offers both relational ( SQL / XML ) and...

PostgreSQL array

PostgreSQL array

What is PostgreSQL array? In PostgreSQL we can define a column as an array of valid data types. The data type can be built-in, custom or enumerated....

7 steps to create a new Oracle database from the command line

7 steps to create a new Oracle database from the command line

PreambleIf you are a Linux sysadmin or developer, there comes a time when you need to manage an Oracle database that can work in your environment.In this...

Restoring access to the SQL Server instance without restarting it

Restoring access to the SQL Server instance without restarting

PreambleStarting with Microsoft SQL Server 2008, by default, the group of local administrators is no longer added to SQL Server administrators during the...

SQLS*Plus - How to output a list of PostgreSQL databases and tables using psql

How to output a list of PostgreSQL databases and tables using psql

PreambleWhen administering PostgreSQL database servers, one of the most common tasks you will probably perform is enumerating databases and their tables....