Many to Many Relation in DynamoDB

6 min readJul 26, 2020

If you have seen Rick Houlihan’s videos, you would know the importance of single table, and also experienced its many complexities. In this article, we will walk through one of them, ie the handling of many-to-many relations and solving their access patterns.

And the best way to do this, of course by taking an example and going through the process of defining our table. Buckle up, this will be lengthy.

Step 1 : Gathering the requirements

Let’s imagine we have two entities, User and Job.

User Entity has the following properties :
- UserId (always unique)
- UserName
- DoB
Job has the following properties :
- JobId (always unique)
- JobType
- StartDate (can change frequently — you’ll see why this is important later)
The relation between User and Job is that a User can have multiple Jobs, and a Job can involve multipleUsers.

So the 3rd point defines our many-to-many relationship.

Step 2 : Access Patterns

Fetch details of the User ← UserId
Fetch details of the Job ← JobId
For a particular User, get all the jobs he/she has ← UserId
For a particular Job, get all the users involved in it← JobId

Note that this only contains very generic use case, and usually, there are many more access patterns defined.

Step 3 : Formulate the Table

Lets see how we arrive at our final table by going through the info we have gathered one by one.

Step 1 — Requirement 1 (Defining User Entity)

We want to store the User details, and for this, we can use UserId as the Partition Key since we know that it will always be unique.

Step 1— Requirement 2 (Defining Job Entity)

Since now we have two entities in the same table, let’s override the Partition Key name to PK . This allows us to fetch items of both entities.

Also, since the IDs might clash and for better identification, we will prepend the IDs with constants denoting the entity type, ie USER# for User Entity and JOB# for Job Entity

Step 1 — Requirement 3 (Relation between User and Job)

Now we arrive at creating the relation, and for this we can use Adjacency List Design Pattern. To relate the User and Job Entities, we will utilise Sort Key. Like the PK, it too will be overloaded, and so its name is set as SK.

Now when we are searching for all the Jobs related to a User, we query based on USER#<UserId> and beginning with JOB# . And similarly, for all the Users having the Job, we can query based on JOB#<JobId> and beginning with USER# .

You might also enquire why we are using SK same as PK for the metadata/details. It is to keep the system extensible and this will help when new requirements come in the future.

So now that we have our basic table defined, let’s check how we can handle our access patterns.

Step 2 — Access Pattern 1 (Get User Details)

We have to now fetch the User Details based on the UserId . Now this is quite simple, and supposing the UserId was 1, we can query with

PK = USER#1 AND SK = USER#1

Step 2 — Access Pattern 2 (Get Job Details)

Similar to the above query, we can directly use the JobId to fetch the job details. Supposing the JobId was 9, we can query with

PK = JOB#9 AND SK = JOB#9

[Skipping Access Pattern 3 for now]

Step 2 — Access Pattern 4 (Get Users involved in a particular Job)

Here, we want the details of the Users involved in a particular Job given JobId . Now, based on our table, if we were to query the followingPK=JOB#9 AND SK BeginsWith USER# , we would get all the UserId s that we needed. And using this, we can certainly go and fetch the User details. But this would increase the overhead for us, and it would be quite costly when you have say 1000 users working on 1 job.

To solve this, we provide the details of the User in the Job section under its linkage. So our table now transforms as below:

This approach would let us fetch the User details along with the Job PK. Allowing us to query by the following to get the details on all the Users involved in a job.

PK=JOB#9 AND SK BeginsWith USER#

Step 2 — Access Pattern 3 (Get Jobs belonging to User)

Here, we want the details of the Job belonging to a particular User given UserId. Applying similar technique as above will indeed solve our problem, but what about when things need to be UPDATED? If you remember, the requirements mentioned that the StartDate of the Job is volatile and changes frequently. Now what if we used the above method to store the details containing StartDate? It's still fine if we have very few Users per Job. But what happens in case the number of Users are more, say 1000 or so? We have to do 1000 updates every time that the StartDate of the particular Job changes. That is a big overhead.

And this is the place where we finally see limitations to our model. To overcome this limitation, we can do few things, depending on multiple factors.

Update across all relations : As discussed above, when the updates are limited (number of updates based on frequency of updation to the data and/or number of related elements), this is a feasible solution. But when the number of updates increases, the overhead might turn out to be a bit too costly.
Aggregate your data : Another technique is to store your required data in form of aggregation, hence decreasing the number of updates required. This usually involves a separate service monitoring the DDB Streams and updating the aggregation as and when changes happen. While improves throughput, the codebase handling becomes a mess, with interwoven dependencies making life miserable.
Fetch metadata : Though this may seem like an Anti-Pattern (and from my point of view it is), we can improve our efficiency by Batching the query calls once you have the links (IDs). Also note, the batching happens on the client-side, and is not handled by DynamoDB, meaning the cost is still there, just that you don’t have to worry about multi-threading them.

TL;DR,

The basic idea here is to only include non-mutable data in our linkings and to fetch the details that we require using BatchGetItem operation.

A common use case here would be to do a prefetch of sorts (partial data) so that the call returns fast and you can show the partial data while you load the complete data.

So getting back to our Access Pattern, we have the table populated as below