RogueDB Throughput: Effects of Client Hardware


Introduction

The total throughput achieved with RogueDB depends on both the database's ability to process data efficiently and the client's ability to generate data and requests. In our experience, the bottleneck primarily lies in transmitting data quickly. gRPC channels with active bidirectional streams consume the client's available CPU time to process their work. Maximizing throughput therefore requires the client to do as little work as possible between requests, so that requests go out over the network as fast as possible. Any time spent on other work directly causes a marked decline in throughput. The impact of client-side performance can be partly mitigated when multiple independent clients send workloads, so this blog focuses primarily on the single-client scenario.
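
A minimal Python sketch of that principle, under stated assumptions: this is not RogueDB's client API. `build_request` is a hypothetical stand-in for whatever per-request work a client performs (key generation, serialization), and `send` is a hypothetical callable standing in for writing one message to a gRPC stream. Moving the work out of the loop leaves the loop with nothing to do but send.

```python
from typing import Callable, Iterable, List


def build_request(key: int) -> bytes:
    """Hypothetical per-request work, e.g. key generation and serialization."""
    return f"key-{key}".encode()


def send_inline(keys: Iterable[int], send: Callable[[bytes], None]) -> None:
    # Anti-pattern: building each request sits between sends,
    # so any time spent here delays the next message on the wire.
    for key in keys:
        send(build_request(key))


def send_precomputed(keys: Iterable[int], send: Callable[[bytes], None]) -> None:
    # Do the expensive work up front, before the hot loop starts...
    requests: List[bytes] = [build_request(key) for key in keys]
    # ...so the loop only hands ready-made messages to the transport.
    for request in requests:
        send(request)
```

For example, calling `send_precomputed(range(3), sent.append)` with an empty list `sent` leaves it holding `[b"key-0", b"key-1", b"key-2"]`. The trade-off is memory: precomputing holds every request at once, so very large workloads may prefer to prepare requests in chunks between bursts of sends.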

Setup

For this benchmark, the YCSB General Purpose benchmark serves as the measuring stick for differences in client-side hardware. The code exactly matches what is in our public GitHub repo. The only variable altered between the benchmarks is the client's hardware: E2 vs. C4D. On Google Cloud, E2 here corresponds to the Intel Broadwell chip series, specifically the standard machine types (e.g., not shared-core). C4D corresponds to the AMD Turin chip series. Both clients use 4 cores, with 16 GB of memory on E2 and 15 GB on C4D. The RogueDB instance runs on C4D compute in the same 4-core, 15 GB configuration. The throughput numbers also include Read operations at different batch sizes to demonstrate that batching does not compensate for the hardware differences.

Throughput Results

Side-by-side comparisons (E2 vs. C4D):

  • YCSB: 128,809 op/s vs. 452,039 op/s
  • Read Batch 1: 206,784 op/s vs. 746,983 op/s
  • Read Batch 10: 364,917 op/s vs. 1,129,685 op/s
  • Read Batch 100: 363,553 op/s vs. 1,170,998 op/s
  • Read Batch 1,000: 362,054 op/s vs. 1,117,608 op/s
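
The per-workload speedups can be reproduced from the figures above. In this short Python sketch, the `RESULTS` table and `speedups` helper are illustrative names, not part of the benchmark code; it simply divides each C4D result by its E2 counterpart.

```python
# Reported throughput in op/s: (E2 client, C4D client)
RESULTS = {
    "YCSB": (128_809, 452_039),
    "Read Batch 1": (206_784, 746_983),
    "Read Batch 10": (364_917, 1_129_685),
    "Read Batch 100": (363_553, 1_170_998),
    "Read Batch 1,000": (362_054, 1_117_608),
}


def speedups(results):
    """Return the C4D / E2 throughput ratio for each workload."""
    return {name: c4d / e2 for name, (e2, c4d) in results.items()}


for name, ratio in speedups(RESULTS).items():
    print(f"{name}: {ratio:.2f}x")
```

Running it prints ratios from about 3.09x (Read Batch 1,000) up to about 3.61x (Read Batch 1), with YCSB at about 3.51x.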

Switching the client from E2 (Intel Broadwell) to C4D (AMD Turin) increases throughput by roughly 3.1x to 3.6x across the workloads above. To repeat, the client-side hardware was the only difference between the two benchmarks. Comparable gains and penalties appear when expensive operations and computations are removed from, or added to, the loop that ships requests to RogueDB.
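
One way to see this effect on your own client is to time the send loop with and without the extra work. The harness below is a hypothetical sketch, not part of RogueDB's tooling: `send` is any callable that transmits one request, `per_request_work` models computation placed inside the loop, and `expensive` is an illustrative stand-in for costly per-request work.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    requests: Sequence[bytes],
    send: Callable[[bytes], None],
    per_request_work: Callable[[bytes], None] = lambda _: None,
) -> float:
    """Return requests per second for a loop that does optional work before each send."""
    start = time.perf_counter()
    for request in requests:
        per_request_work(request)  # extra computation sitting in the hot loop
        send(request)
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed if elapsed > 0 else float("inf")


def expensive(request: bytes) -> None:
    # Hypothetical stand-in for costly per-request computation.
    sum(i * i for i in range(1_000))
```

Calling `measure_throughput(requests, send)` and then `measure_throughput(requests, send, expensive)` on the same inputs shows how much the in-loop work costs. Only the relative numbers are meaningful here, since a local callable is far cheaper than a real network send.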

While we discuss batching extensively in other blog posts, the key takeaway in this scenario is that batching narrows the relative gap only marginally and cannot overcome generational hardware improvements. Even the best E2 result (364,917 op/s at batch size 10) is less than half of the unbatched C4D result (746,983 op/s).
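
Batching groups several operations into a single message, so per-message overhead is paid once per batch rather than once per operation. The helper below is a minimal grouping function written out by hand for illustration; it does not show how RogueDB's client actually forms batches. Python 3.12 and later also ship `itertools.batched`, which yields tuples instead of lists.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

With 1,000 keys, batch sizes of 1, 10, 100, and 1,000 produce 1,000, 100, 10, and 1 messages respectively. In the results above, throughput on both clients levels off from batch size 10 onward, so larger batches do little to offset slower client hardware.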

Discussion

A number of differences exist between the E2 (Intel Broadwell) and C4D (AMD Turin) CPU series. The two belong to different processor generations, and newer generations have historically brought significant performance upgrades. The team at RogueDB are primarily casual hardware enthusiasts, so we leave the in-depth analysis to reviewers and experts in the space. Higher clock speeds and greater instruction throughput per clock, both products of hardware advancements, are the most likely contributors.

These benchmarks led us to select the C4D series by default for all our offerings. Although C4D costs about 30% more than E2, a 3x improvement made the decision easy. C4D instances also come with Hyperdisk Balanced by default, providing the latest generation of persistent storage with higher read and write bandwidth. When the next-generation Zen 6 AMD EPYC lineup becomes generally available, the team plans to migrate all customers to the latest hardware for a further throughput boost.
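
The cost argument can be made concrete as throughput per dollar. This sketch assumes, as the paragraph above does, that C4D costs about 30% more than E2 (an approximation, not a quoted list price), and it uses the client-side YCSB speedup measured in this benchmark as a proxy for the server-side gain. `throughput_per_cost_gain` is an illustrative helper name.

```python
def throughput_per_cost_gain(speedup: float, price_ratio: float) -> float:
    """Relative throughput per dollar of the new instance vs. the old one."""
    return speedup / price_ratio


ycsb_speedup = 452_039 / 128_809  # C4D vs. E2 client, YCSB workload
price_ratio = 1.30                # assumption: C4D costs ~30% more than E2
gain = throughput_per_cost_gain(ycsb_speedup, price_ratio)
print(f"{gain:.2f}x more throughput per dollar")
```

With these inputs it prints `2.70x more throughput per dollar`; using the rounder 3x figure instead gives about 2.31x.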

Conclusion

The key takeaways for customers requiring maximal throughput from RogueDB: match client-side hardware to the demands of the workload, and minimize expensive computation in the loop that sends requests. Deeper insight into the hardware design differences behind the throughput gap lies outside our expertise, but a performance difference of more than 3x warranted coverage to raise awareness for those interested in pushing the limits.

Given a likely 3x throughput increase for RogueDB on the server side, choosing C4D for customers was an obvious decision.
