Building data cubes using dedicated servers improves darwin.Cloud performance
At AccountTECH, we constantly innovate to improve darwin.Cloud performance and efficiency. One area we're enhancing is how our data cubes are updated to ensure minimal impact on darwin.Cloud users while maintaining timely and accurate analytics. Here's how we achieve it:
Improving Data Cube Efficiency with Kafka and SQL Replica
The Challenge: Resource-Intensive Updates
Data cubes, which power darwin.Cloud reports, spotlights, and analytics, need to be updated whenever data in the darwin database changes. Previously, these updates occurred directly on the production server, creating challenges such as:
- Increased load on the production server.
- Slower performance for users performing day-to-day tasks.
The Solution: Offloading Updates with Kafka and SQL Replica
To address these challenges, we re-engineered our data cube update process with the following enhancements:
- Offloading Processing:
- All cube update tasks have been moved off the production server, so user operations like data entry and retrieval remain fast and uninterrupted.
- Event-Driven Updates via Kafka:
- A Kafka-based system now asynchronously processes data change events (INSERT, UPDATE, DELETE) into the darwin.Cloud cubes. This decouples cube updates from real-time production operations as users work in darwin.Cloud throughout the day.
- Using SQL Replica for Data:
- Cube updates now retrieve data exclusively from our Always On SQL Replica, a read-only copy of the production database. The heavy queries required to rebuild data cubes therefore never touch the production servers.
- Delaying Updates for Synchronization:
- Using Kafka, we have introduced a deliberate delay before processing cube update events. This gives the SQL Replica sufficient time to synchronize with the primary production database, guaranteeing that updates are based on the most complete and accurate data. Each client's SQL Replica typically synchronizes within milliseconds, but in case asynchronous replication ever lags by minutes, the Kafka cube-update code waits 10 minutes before processing an event, ensuring the replica has received all the latest changes.
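To make the delayed, event-driven flow concrete, here is a minimal sketch of the buffering logic. This is illustrative only: the names (`CubeUpdater`, `REPLICA_SYNC_DELAY`) are hypothetical, and a real deployment would consume events from a Kafka topic rather than an in-memory queue.

```python
import time
from collections import deque

# Hypothetical sketch: hold each change event for a safety delay
# so the SQL Replica has certainly synchronized before the cube is updated.
REPLICA_SYNC_DELAY = 10 * 60  # seconds (the 10-minute wait described above)

class CubeUpdater:
    """Buffers INSERT/UPDATE/DELETE events and releases them only after the delay."""

    def __init__(self, delay=REPLICA_SYNC_DELAY, clock=time.time):
        self.delay = delay
        self.clock = clock        # injectable clock keeps the logic testable
        self.pending = deque()    # (event_time, event) pairs, oldest first

    def on_event(self, event, event_time=None):
        """Called for each data change event consumed from the Kafka topic."""
        stamp = event_time if event_time is not None else self.clock()
        self.pending.append((stamp, event))

    def ready_events(self):
        """Return events old enough that the replica is guaranteed caught up."""
        now = self.clock()
        ready = []
        while self.pending and now - self.pending[0][0] >= self.delay:
            ready.append(self.pending.popleft()[1])
        return ready
```

A consumer loop would call `ready_events()` periodically and apply only the released events to the cubes, leaving newer events buffered until their delay elapses.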
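Routing the cube-rebuild reads to the replica relies on SQL Server's Always On feature: a connection that declares `ApplicationIntent=ReadOnly` is directed by the availability group listener to a readable secondary instead of the primary. The sketch below shows the idea; the server and database names are placeholders, not darwin.Cloud's actual configuration.

```python
def connection_string(server, database, readonly=False):
    """Build a SQL Server connection string. ApplicationIntent=ReadOnly asks
    the Always On listener to route the session to a readable secondary."""
    parts = [
        f"Server={server}",
        f"Database={database}",
        "Integrated Security=True",
    ]
    if readonly:
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)

# Production writes keep the default (primary) intent...
oltp = connection_string("darwin-ag-listener", "darwin")
# ...while heavy cube-rebuild reads declare read-only intent and land on the replica.
cube_reads = connection_string("darwin-ag-listener", "darwin", readonly=True)
```

This split is what keeps the expensive cube queries off the production primary while writes continue uninterrupted.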
Benefits of the New Approach
- Faster Production Performance:
- The darwin.Cloud production servers are no longer burdened by cube processing, enabling faster data entry and retrieval for users.
- Efficient Resource Utilization:
- By leveraging the SQL Replica, we distribute workloads more effectively across our infrastructure.
- Scalability:
- Kafka ensures the system can handle a high volume of data changes without slowing down.
- Accurate and Timely Analytics:
- While updates are not real-time, the delay is negligible for most use cases: cube updates happen automatically in “real-enough” time, keeping analytics relevant and reliable.
Why this matters to you
This innovative approach allows darwin.Cloud to deliver faster, more efficient, and reliable analytics without compromising the performance of darwin's day-to-day production servers. At AccountTECH, we're committed to leveraging cutting-edge technology to provide the best experience for our users.