Why Software Engineers Should Stop Stuffing Everything in MySQL

Aggregating data from multiple sources into a centralized place can be a challenging task when creating reports. In the early stages, many software engineering teams tend to rely on familiar tools, often their application databases. Since the majority of data for tech startups is generated from their apps, it may seem logical to incorporate additional data. However, while databases like MySQL are excellent for powering applications, they may not be as effective in handling analytical workflows.

MySQL databases can be compared to single-lane roads in a peaceful countryside town. They are efficient and designed to handle the regular traffic. However, when faced with a sudden influx of data, congestion can occur. On the other hand, Data Warehouses can be likened to the vast multi-lane highways of a bustling city. They are built to accommodate a large volume of data from different sources, ensuring a smooth flow even during peak data periods. While MySQL is well-suited for specific tasks, Data Warehouses excel in consolidating extensive data from various origins.

Dealing with Changing Schemas is a Headache

One of the realities of SaaS applications like Salesforce is that the object schemas change frequently. Salesforce, in particular, is continuously updated with new and modified fields to meet business needs. This creates a headache for software engineers trying to replicate Salesforce data into MySQL. The MySQL schema inevitably falls behind, and then engineering resources must be spent on identifying discrepancies and modifying the schema to catch up.

Log Volume Overwhelms MySQL

A common pitfall I see is teams storing all their log data in MySQL by default. The rationale is understandable – if it’s data, it should be in the database, right? However observational log data accumulates extremely quickly, and suddenly, engineers realize their MySQL instance is dominated by log storage needs. The compute capability required to support obscure analytical queries over these logs forces overprovisioning and leads to bloated costs. A data warehouse elegantly handles the separation of storage and compute, so it is cheap to store the data and cost-effectively analyze it since it’s not accessed frequently.

BI Tools Do not Play Nice with MySQL

Business intelligence tools like Sigma, Tableau, and Power BI have become table stakes in modern analytics stacks. However, because these tools are designed for data warehousing architectures, getting them to work with MySQL requires jumping through hoops. The right connectors must be installed and configured, SQL queries have to be optimized differently, and performance still tends to lag. Modern BI tools will have native connectors for Data warehouses.

Transactional Workloads Bog Down MySQL

While MySQL performs well for transactional applications, its performance deteriorates significantly under analytic workloads. Running complex queries on large datasets is exactly the type of workload MySQL was not designed for. The more data stuffed into MySQL, the worse this issue becomes. Modern data warehouses offer much snappier performance for these types of analytical and reporting queries.

Security and Governance Risks

Centralizing all data into MySQL increases security and governance risks. Features like fine-grained access controls, data masking, and robust audit logs are native to modern data warehouses. However, adding these enterprise-grade controls to MySQL requires additional tools and complexity. There is also the risk of transactional data being queried inappropriately. With a modern data warehouse, governance policies like row-level security can be enforced.

Lack of Support for Machine Learning

MySQL lacks native support for machine learning capabilities required in modern analytics. While some SQL functions and extensions are available, they pale in comparison to the advanced ML tooling built into cloud data warehouses. Tasks like model training/scoring, feature engineering, and model drift monitoring require added platforms and services if relying solely on MySQL.

Final Thoughts

Defaulting to MySQL for analytics and data consolidation inevitably leads to pain. Trying to force rigid schemas, performance bottlenecks, cost overruns, and tool incompatibilities to make it work is not the solution. Instead, use the right technology – purpose-built data warehouses that handle analytics at scale – alongside MySQL for transactional apps. This divides responsibilities based on strengths. The bottom line is leveraging the right tools, not trying to make one tool do everything.

Thanks for reading!

Posted

August 24, 2023

Data Engineering

Paul

Tags: