Archives

All Posts

  • The Problems with Data Warehousing for Modern Analytics

    Cloud data warehouses have become the cornerstone of modern data analytics stacks, providing a centralized repository for storing and efficiently querying data from multiple sources. They offer a rich ecosystem of integrated data apps, enabling seamless team collaboration. However, as data analytics has evolved, cloud data warehouses have become expensive and slow. In this post,…


  • How to Export Data from MySQL to Parquet with DuckDB

    In this post, I will guide you through the process of using DuckDB to seamlessly transfer data from a MySQL database to a Parquet file, highlighting its advantages over the traditional Pandas-based approach.


  • The Reality of Self-Service Reporting in Embedded BI Tools

    Letting end users create their own reports in an app sounds innovative, but it often turns out to be impractical. While this approach aims to give users more control and reduce the workload for developers, it usually ends up being too complex for non-technical users who find themselves lost in the data,…


  • Unlocking Real-Time Data with Webhooks: A Practical Guide for Streamlining Data Flows

    Webhooks are like the internet’s way of sending instant updates between apps. Think of them as automatic phone calls between applications that let each other know when something new happens. For people working with data, this means getting the latest information without having to constantly check for it. But setting them up can be challenging. This…


  • Streamlining Data Analysis with Dynamic Date Ranges in BigQuery

    Effective data analysis hinges on having complete data sets. Grouping data by day or month often leaves significant gaps where data points are missing. In this post, I’ll guide you through a more efficient strategy: dynamically creating date ranges in BigQuery. This approach allows for on-the-fly date range generation without the overhead of…


  • Effortless Python Automation: Simple Script Scheduling Solutions

    If you want your Python script to run daily, it might seem as simple as setting a time and starting it. However, it’s not that straightforward, as most Python environments lack built-in scheduling features. There’s a range of advice out there, and common suggestions often involve complex cloud services, which are overkill for simple tasks.…


  • Solving Pandas Memory Issues: When to Switch to Apache Spark or DuckDB

    Data engineers often face the challenge of Jupyter Notebooks crashing when loading large datasets into Pandas DataFrames. This problem signals a need to explore alternatives to Pandas for data processing. While common solutions like processing data in chunks or using Apache Spark exist, they come with their own complexities. In this post, we’ll examine these…


  • From JSON Snippets to PySpark: Simplifying Schema Generation in Data Pipelines

    When managing data pipelines, there’s this crucial step that can’t be overlooked: defining a PySpark schema upfront. It’s a safeguard to ensure every new batch of data lands consistently. But if you’ve ever wrestled with creating Spark schemas manually, especially for those intricate JSON datasets, you know that it’s challenging and time-consuming. In this post,…


  • Getting BI Right the First Time: An Insider’s Guide to High-Impact BI

    Business Intelligence (BI) implementations go wrong more often than right. I’ve experienced this firsthand, and this post outlines the top challenges that get in the way of successfully deploying a dashboard at a lean tech startup. In this post, BI encompasses reports and dashboards used for internal and external (customer-facing) purposes.


  • Why Software Engineers Should Stop Stuffing Everything in MySQL

    Aggregating data from multiple sources into a centralized place can be a challenging task when creating reports. In the early stages, many software engineering teams tend to rely on familiar tools, often their application databases. Since the majority of data for tech startups is generated from their apps, it may seem logical to incorporate additional…
