
Bitpanda's new trade engine: Part #3 - Leveraging a single-table design for high performance

By Vladimír Vráb

In Part #3, we examine the main philosophy behind the single-table design, why it makes sense to use it, and some practical examples. Let’s dive in!

We previously explored the programming languages we selected for our trade engine and the frameworks we decided to use. Now we are shifting our focus to a more technical topic: the extensive use of single-table design in our core services. 

What is a single-table design?

For those of you who do not have experience with single-table design, here’s my definition:
“A single-table design is a method in which an application holds some, the majority, or even all data in one table.”

You might wonder whether we mean one database, but no: it is literally just one table. With single-table design, there are no rules about what kinds of entities you can store. It's up to you to create a table setup that accommodates them all.

Here are some common phrases you’ll notice when people mention single-table design:

  • It is typically applied to NoSQL databases.

    • A relational database expert might be horrified to see all data stored in a single SQL-based table, breaking all normal forms and eliminating the possibility of using relational databases' advanced querying features.

  • It is primarily mentioned in the context of DynamoDB.

    • DynamoDB is a fast, scalable NoSQL database service that lets you easily store and access large amounts of data.

    • Single-table design is almost always associated with AWS and AWS promotes a design with as few tables as possible (ideally one, if applicable).

  • It can also work with other NoSQL databases (such as Redis or Cassandra).

    • An engineer just needs to be fully aware of the constraints they may face.

As single-table design is strongly associated with DynamoDB, this article will focus solely on this database.

Why NoSQL instead of SQL?

Before we continue, let's take a moment to explore why we chose a NoSQL database for some critical parts of our system instead of a SQL database, and what the main differences are between them.

We could spend a lot of time going through the differences between NoSQL and SQL databases, but the biggest one is the trade-off they make: most well-known NoSQL databases, such as DynamoDB, Redis, and Cassandra, prioritise availability and partition tolerance, while traditional SQL databases like PostgreSQL, Oracle, and MySQL prioritise consistency and availability.

For those unfamiliar with the concepts of consistency, availability, and partition tolerance, these guarantees come from the CAP theorem. This theorem states that any distributed data store can only provide two out of the following three guarantees:

  • C - Consistency

    • This means that every read operation on the database returns the most recent write for a given piece of data. This ensures that every client sees the same data at any given time, which is crucial for scenarios where accurate and up-to-date information is required.

    • For example, if two users are accessing the same account balance at the same time and one user updates the balance, consistency ensures that both users will see the updated balance immediately.

  • A - Availability

    • This means that every request to the database will receive a response, regardless of whether it is successful or not.

    • In practice, this implies that if you make a request to a database, you will get a response - either the data you requested or an error indicating why the request could not be fulfilled.

    • For example, even if some servers in a distributed database cluster fail, the database system will still respond to requests, although it may not be able to return the most recent data due to potential delays in synchronisation.

  • P - Partition tolerance

    • This refers to the database's ability to continue operating even when there are network partitions or communication failures between different nodes in the system. 

    • For example, if a distributed database is divided into multiple partitions due to network issues, the system can continue to operate and process requests on the remaining partitions, even if some nodes are temporarily inaccessible.

So, if you are willing to sacrifice strict consistency in exchange for availability and partition tolerance, you can use DynamoDB. And if you use DynamoDB, you should use a single-table design.

Why choose a single-table design?

Let’s outline a single-table design’s main benefits:

  • One table for everything

    • This is one of the biggest benefits. It provides simplicity that radically eases the setup of monitoring, auto-scaling, and backups.

  • Small roundtrips

    • You have the ability to retrieve some or all data with just one request, reducing the number of roundtrips needed.

  • Schemaless

    • Having no schema makes it extremely easy to store new models and relatively easy to update existing ones. This is crucial for faster development.

      • However, some might argue that this merely shifts the responsibility of defining a schema from the database to the application. They may have a point as the application must be capable of supporting all versions of an entity throughout the table's lifecycle.

  • “Unlimited” throughput

    • As a single-table design is strongly coupled with DynamoDB, you will benefit from automated auto-scaling options. This allows you to support tens of thousands of requests per second without manual intervention.

How to use a single-table design

Now we have a good understanding of what single-table design is, why integrating NoSQL into your ecosystem can be beneficial, and why a single-table design is worth trying out. Before we dive into the implementation, we need to set aside everything we’ve learned about designing relational databases and the normal forms we follow.

Let’s go through the process of designing and executing a simplified version of an “offer service” - a service responsible for creating offers within our ecosystem. For more context on the role of the offer service, feel free to revisit Part #1 which explores the development of the Trade Engine.

Understand your use case

The first step is to analyse our use cases based on a product description that might look like this: 

“An application needs to be able to create an offer for a user for any given asset using the latest price.” 

We will start with the following considerations:

  • We need to get a user

  • We need to get an asset

  • We need to get a price

  • We need to store an offer

Know your access patterns

Once we have listed all the use cases in a spreadsheet, the next step is to write the exact access pattern next to each use case. For example:

  • We need to get a user -> get user by id

  • We need to get an asset -> get asset by id

  • We need to get a price -> get asset price by asset id OR get asset price by asset symbol

It is crucial to perform this step carefully. If you find that your access patterns are too broad, it may be better not to use a single-table design. A classic example of overly broad access patterns is having a filter that allows you to query data across any column in a dataset.

Build the table

Once we have listed our use cases and carefully identified all our access patterns, we can proceed to building the table. When creating the table, you need to set up your primary key, which can be defined in two ways:

  • Partition key only: The partition key alone uniquely identifies each item.

  • Partition key and sort key: The combination of the two uniquely identifies each item (a composite primary key).

To fully leverage the potential of single-table design, we will choose the second option, using one column as the partition key and another as the sort key. We recommend using the following names for clarity:

  • Partition key: PK

  • Sort key: SK
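
To make this concrete, here is a minimal sketch of the key schema such a table would be created with. The table name offer-service and the billing mode are illustrative assumptions, not our actual configuration:

```python
# Key schema for the single table: the generic PK/SK names keep the
# table entity-agnostic; no other attributes are declared up front.
table_spec = {
    "TableName": "offer-service",  # illustrative name
    "AttributeDefinitions": [
        {"AttributeName": "PK", "AttributeType": "S"},
        {"AttributeName": "SK", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
        {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# With boto3, the table would then be created with:
# boto3.client("dynamodb").create_table(**table_spec)
```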

Once this setup is in place, the primary constraint is that every item must be uniquely identified by the primary key. By default, DynamoDB does not reject requests if you attempt to insert two different items with the same primary key. Instead, one item will overwrite the other. To ensure the uniqueness of your primary key during insertion, you can use condition expressions.

In a DynamoDB table, you don’t define columns. The table is schemaless, so every item can have a different set of attributes.
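
As a sketch of how such a guarded insert might look (the table name, the exact key formats, and the helper function are hypothetical):

```python
def user_put_request(user_id: str, email: str) -> dict:
    """Build a PutItem request that refuses to overwrite an existing item."""
    return {
        "TableName": "offer-service",        # illustrative name
        "Item": {
            "PK": {"S": "USERS"},            # constant partition key for users
            "SK": {"S": f"USER#{user_id}"},  # assumed sort-key format
            "EMAIL": {"S": email},
        },
        # Evaluated before the write: if an item with the same PK/SK already
        # exists, DynamoDB rejects the request with a
        # ConditionalCheckFailedException instead of overwriting it.
        "ConditionExpression": "attribute_not_exists(PK) AND attribute_not_exists(SK)",
    }

# With boto3:
# boto3.client("dynamodb").put_item(**user_put_request("1", "a@example.com"))
```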

Entities

Once the table is set up and operational, we can insert data to illustrate the extreme simplicity of the process.

General format

The only attributes defined in our abstract entity will be the columns for the partition key (PK) and sort key (SK). When extending this class to create a specific model, you would only need to specify the entity name and provide a list of eligible attributes.

It’s important to note that we are using a constant value for the partition key, such as ‘USERS’. This approach can lead to a hot partitioning problem if the load is very high as all requests would be directed to the same partition, potentially reducing the table’s responsiveness. To mitigate this, you might add a suffix like #{ID} to the partition key to ensure that the load is distributed more evenly across all partitions.
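
A hypothetical sketch of this abstract entity and one concrete subclass (class names, and any key formats beyond those described above, are assumptions):

```python
class Entity:
    """Abstract entity: only the partition key and sort key are fixed."""

    def __init__(self, pk: str, sk: str, **attributes):
        self.pk = pk
        self.sk = sk
        self.attributes = attributes  # schemaless: any set of attributes

    def to_item(self) -> dict:
        """Flatten into a schemaless item ready for the single table."""
        return {"PK": self.pk, "SK": self.sk, **self.attributes}


class Asset(Entity):
    def __init__(self, asset_id: str, **attributes):
        # The #{asset_id} suffix spreads items across partitions,
        # avoiding the hot-partition problem of a constant "ASSETS" key.
        super().__init__(pk=f"ASSETS#{asset_id}", sk=f"ASSET#{asset_id}", **attributes)
```

For example, Asset("btc", symbol="BTC", trading_active=True).to_item() yields an item whose PK is "ASSETS#btc".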

Users

We should keep the user entity as simple as possible by including only two optional attributes: email and language.

Assets

We’ll continue with our minimalistic approach and add just two optional attributes: symbol and trading_active.

You may notice how easily we avoided the hot partitioning problem by adding the #{ID} suffix to the partition key. While this adjustment effectively addresses one issue, it introduces another: the challenge of elegantly listing all assets. When querying the data, you must always specify the partition key name and value as an equality condition, which can complicate elegantly querying for a complete list of assets.

For more details on how to manage these queries, you might want to read up on key condition expressions.
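
A sketch of such a query (the table name and exact key format are assumptions): because the partition key carries the #{ID} suffix, the mandatory equality condition can only ever address one asset at a time, which is exactly why listing all assets is no longer straightforward.

```python
def asset_query_request(asset_id: str) -> dict:
    """Build a Query request; the partition key must appear as an equality."""
    return {
        "TableName": "offer-service",  # illustrative name
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {
            # Only one asset's partition can be addressed per query.
            ":pk": {"S": f"ASSETS#{asset_id}"},
        },
    }

# With boto3: boto3.client("dynamodb").query(**asset_query_request("btc"))
```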

Asset prices

The last of our minimalistic models is the asset price, which has two optional attributes: price_bid and price_ask. Here we again use the #{ID} suffix to avoid the hot partitioning problem, while accepting that we cannot list all asset prices, as that makes no sense in this scenario.

Offers

Next, consider the offer entity, which has four attributes: price, fee, amount, and created_at. Notice the difference in the partition key format between an asset and an offer. In this case, we use the USERS#{USER_ID} format. This allows us not only to retrieve an offer by its ID but also to efficiently list all offers associated with a specific user.
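
This access pattern maps naturally onto a key condition expression: equality on the partition key plus a begins_with condition on the sort key. The table name and the OFFER# prefix are assumptions:

```python
def user_offers_request(user_id: str) -> dict:
    """Build a Query request listing every offer belonging to one user."""
    return {
        "TableName": "offer-service",  # illustrative name
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USERS#{user_id}"},  # one partition per user
            ":prefix": {"S": "OFFER#"},        # assumed sort-key prefix for offers
        },
    }

# With boto3: boto3.client("dynamodb").query(**user_offers_request("1"))
```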

Final table

Finally, our table could look like this.

Fantastic! We were able to store four different entities in one table, and there is “no limit” to how many entity types we can store. Just keep in mind that there is a maximum item size limit of 400 KB.
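
To illustrate, a handful of items in such a table might look like this (all IDs, values, and non-key attribute names are illustrative):

```python
# Four entity types coexisting in one table, distinguished only by their keys.
items = [
    {"PK": "USERS", "SK": "USER#1", "EMAIL": "a@example.com", "LANGUAGE": "en"},
    {"PK": "ASSETS#btc", "SK": "ASSET#btc", "SYMBOL": "BTC", "TRADING_ACTIVE": True},
    {"PK": "ASSET_PRICES#btc", "SK": "ASSET_PRICE#btc", "PRICE_BID": "99.5", "PRICE_ASK": "100.5"},
    {"PK": "USERS#1", "SK": "OFFER#7", "PRICE": "100.0", "FEE": "1.5", "AMOUNT": "2.0", "CREATED_AT": "2024-01-01T00:00:00Z"},
]
```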

GSI and LSI to the rescue

Often, it may be challenging to satisfy all your access pattern requirements using just the partition and sort keys. To accommodate complex features with various needs, DynamoDB offers two additional options for indexing your data. You can choose the index that best suits your requirements:

  • Global secondary index (GSI): A GSI allows you to create an index on a selection of attributes from the base table, organised by a different primary key than that of the base table. This enables you to query your data using different attributes and key structures.

  • Local secondary index (LSI): An LSI provides an alternate sort key for a given partition key. This allows you to query the data within the same partition key using different sorting criteria.

These indexing options provide the flexibility to efficiently query your data in various ways, enhancing your ability to handle complex access patterns.

GSI

Imagine we have a new access pattern: getting a user by email. With our current approach, we have one undesirable option to achieve this - using the Scan operation along with filter expressions. Because the Scan operation is something we usually want to avoid (especially in single-table design, as retrieval gets more expensive with more data), we will create a new index.

Given the base table in this form, we can create a GSI on the columns PK and EMAIL, allowing us to efficiently query a user by their email address.
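
With such an index in place (the index name email-index is an assumption), the lookup becomes an ordinary Query instead of a Scan:

```python
def user_by_email_request(email: str) -> dict:
    """Query the email GSI rather than scanning the base table."""
    return {
        "TableName": "offer-service",   # illustrative name
        "IndexName": "email-index",     # assumed GSI keyed on PK + EMAIL
        "KeyConditionExpression": "PK = :pk AND EMAIL = :email",
        "ExpressionAttributeValues": {
            ":pk": {"S": "USERS"},      # constant partition key for users
            ":email": {"S": email},
        },
    }

# With boto3: boto3.client("dynamodb").query(**user_by_email_request("a@example.com"))
```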

LSI

Imagine we have a new access pattern: getting an asset by symbol. This can be easily handled with an LSI, but don’t forget that there is one hard restriction - it must be specified during table creation.

Given the base table in this form, this allows us to create an alternate index by providing an alternate sort key to an existing partition key. 
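
A sketch of how the LSI would be declared as part of the create-table request (names are illustrative; note that, unlike a GSI, it cannot be added to an existing table):

```python
# LSIs must be declared at table creation; this spec adds SYMBOL as an
# alternate sort key under the same partition key.
lsi_table_spec = {
    "TableName": "offer-service",  # illustrative name
    "AttributeDefinitions": [
        {"AttributeName": "PK", "AttributeType": "S"},
        {"AttributeName": "SK", "AttributeType": "S"},
        {"AttributeName": "SYMBOL", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "PK", "KeyType": "HASH"},
        {"AttributeName": "SK", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "symbol-index",  # assumed name
            "KeySchema": [
                {"AttributeName": "PK", "KeyType": "HASH"},
                {"AttributeName": "SYMBOL", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# With boto3: boto3.client("dynamodb").create_table(**lsi_table_spec)
```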

Drawbacks

Every application design has its drawbacks. If there was a perfect solution for every use case, the internet wouldn't be full of resources and tutorials where various experts explain different integration patterns based on specific product requirements. A single-table design is no exception. Here are some of the main drawbacks:

  • Great for OLTP but not for OLAP

    • A single-table design is excellent for Online Transaction Processing (OLTP) but terrible for Online Analytical Processing (OLAP). NoSQL tables, especially DynamoDB, do not offer convenient options for querying and aggregating data. Additionally, exporting a DynamoDB table is not a trivial task and requires significant expertise. Full exports necessitate the use of slow and expensive Scan operations.

  • Complex access patterns require many GSI

    • Although you can theoretically store dozens or even hundreds of various entities in the same table, you often encounter one or two additional access patterns that necessitate creating indices. Be aware that DynamoDB has a default quota of 20 Global Secondary Indexes (GSIs) per table.

  • Complicated bulk updates

    • While updating every item in a table with 100 records is relatively easy, doing the same task with 1 million records is much more complex.

  • Steep learning curve

    • Adapting to a different way of working with data can be frustrating for engineers, especially those who have been working with relational databases for decades.

Conclusion 

As I mentioned in our previous article, we should grow and expand our horizons as our value is based on how much we know. Being aware of NoSQL databases is great, but knowing how to leverage single-table design to your advantage is a significant leap. I'll conclude this article with a practical example.

If you are a Bitpanda user, you know every trade starts with an offer. This offer is created and persisted in the offer service, which uses a single-table design to store all its entities. Since we have very few access patterns for the offer entity itself, it is an excellent candidate for a single-table design. Remember, our primary goals are high availability and partition tolerance.

We benefit from high throughput capabilities, which we have tested extensively. We’ve seen spikes from a regular load of 2-5 trades per second to an astronomical 80-90 trades per second. Additionally, we can easily scale our resources up and down based on our needs, resulting in very low spending during nights, calm days, and weekends. This wouldn’t be possible with relational databases, as scaling one instance or a cluster vertically often means downtime, which is unacceptable for the Tier 1 services our team - the trade engine - owns.

The green line represents the consumed write resources (write capacity units), while the white line indicates the provisioned resources. Notice how resources automatically scale up as the load increases and gradually scale down as the load decreases.

Therefore, implementing a single-table design was one of our greatest ideas, as it perfectly supports our mission to become the number one broker in Europe.
