Data Analytics Simplified
Welcome to Data Analytics Simplified, a blog dedicated to helping you streamline data workflows, automate processes, and scale your infrastructure without the headaches. Whether you’re battling messy spreadsheets or inefficient pipelines, or trying to get the most out of your data analytics investments, you’re in the right place.
I’ll share proven strategies, tips, and frameworks from my experience in data engineering and analytics, focusing on:
Data doesn’t have to be overwhelming. With the right approach, you can declutter, optimize, and build a solid foundation for data science and analytics.
Let’s get to work.
Running multiple aggregations on a grouped pandas DataFrame can be done with the .agg method by passing it a dictionary that maps columns to aggregation functions. In this post, I’ll show you how.
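Here is a minimal sketch of the dictionary form of .agg. The sales data and the column names region, units, and price are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical sales data used only for illustration
df = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West"],
        "units": [10, 15, 7, 3, 12],
        "price": [2.5, 3.0, 4.0, 4.5, 3.5],
    }
)

# Each key is a column to aggregate; each value is one function name
# or a list of function names to apply to that column within each group
summary = df.groupby("region").agg({"units": ["sum", "mean"], "price": "max"})
print(summary)
```

With the dictionary form, the result has MultiIndex columns: one level for the source column and one for the aggregation function, e.g. ("units", "sum").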
Sometimes it’s just easier to work with a single-level index in a DataFrame. In this post, I’ll show you a trick to flatten MultiIndex columns in a pandas DataFrame into a single level.
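One common way to do this (not necessarily the exact trick from the post) is to join each column tuple into a single string. The sketch below assumes the column levels can be converted to strings, and the data is hypothetical:

```python
import pandas as pd

# Hypothetical data; grouping with a list of functions yields MultiIndex columns
df = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West"],
        "units": [10, 15, 7, 3, 12],
    }
)
summary = df.groupby("region").agg({"units": ["sum", "mean"]})
# summary.columns is now a MultiIndex of tuples: ("units", "sum"), ("units", "mean")

# Join each tuple into one label such as "units_sum";
# rstrip removes a trailing underscore if the last level is an empty string
summary.columns = ["_".join(map(str, col)).rstrip("_") for col in summary.columns]

# Move the group keys out of the index into a regular column as well
flat = summary.reset_index()
print(flat)
```

The separator is a design choice; an underscore keeps the new names valid for attribute-style access such as flat.units_sum.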
If you import a Python file (module) into a Jupyter Notebook and later edit it, you’ll need to reload it for the notebook to pick up the changes.
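A minimal sketch using importlib.reload from the standard library. To keep it self-contained, it writes a throwaway module to a temporary folder; the module name my_helpers and its greet function are hypothetical stand-ins for your own file:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a hypothetical module on disk so the example runs anywhere
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / "my_helpers.py"
module_path.write_text("def greet():\n    return 'hello'\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()  # let the import system notice the new file

import my_helpers

print(my_helpers.greet())  # hello

# Simulate editing the file after it has already been imported
module_path.write_text("def greet():\n    return 'hello, again'\n")

# Running "import my_helpers" again would just reuse the cached module;
# reload re-executes the module's source and updates the existing module object
my_helpers = importlib.reload(my_helpers)
print(my_helpers.greet())  # hello, again
```

In Jupyter you can also let IPython do this automatically by running %load_ext autoreload followed by %autoreload 2 at the top of the notebook.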
Random or test data is a great way to try out functions before applying them to real data. Here are a few ways to generate it in pandas.
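One approach, sketched below, builds a small DataFrame from NumPy's random Generator. The column names, distributions, and seed are arbitrary assumptions for illustration, not necessarily the methods the post covers:

```python
import numpy as np
import pandas as pd

# Seeding the generator makes the "random" data reproducible between runs
rng = np.random.default_rng(seed=42)

n_rows = 5
df = pd.DataFrame(
    {
        "id": np.arange(1, n_rows + 1),                          # sequential ids 1..n
        "value": rng.normal(loc=100, scale=15, size=n_rows),     # normally distributed numbers
        "category": rng.choice(["A", "B", "C"], size=n_rows),    # random labels
        "date": pd.date_range("2024-01-01", periods=n_rows, freq="D"),  # consecutive days
    }
)
print(df)
```

Changing n_rows scales the same recipe up to as many rows as you need for testing.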
Easily and quickly combine multiple Excel files that contain the same type of data.
Easily and automatically capture data from websites using some built-in functionality in Google Sheets.
Subqueries in SQL can make a query dynamic or significantly reduce its execution time. In this post, I’ll show you two tricks I use often to make my queries more efficient.
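The sketch below illustrates two general subquery patterns using Python's built-in sqlite3 module; they are not necessarily the two tricks from the post. The orders and customers tables and their data are hypothetical. The first query filters on a value computed by a subquery, so nothing is hard-coded; the second aggregates in a subquery before joining, which can reduce the number of rows the join processes, although the actual speedup depends on your data and the database's query planner:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 1, '2024-01-01', 50.0),
        (2, 2, '2024-01-02', 20.0),
        (3, 1, '2024-01-03', 30.0),
        (4, 3, '2024-01-03', 10.0);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    """
)

# 1) Dynamic filter: the scalar subquery finds the latest date at run time,
#    so the query keeps working as new orders arrive
latest = conn.execute(
    """
    SELECT order_id, amount
    FROM orders
    WHERE order_date = (SELECT MAX(order_date) FROM orders)
    ORDER BY order_id
    """
).fetchall()
print(latest)  # [(3, 30.0), (4, 10.0)]

# 2) Pre-aggregation: collapse orders to one row per customer in a subquery,
#    then join that smaller result to the customers table
totals = conn.execute(
    """
    SELECT c.name, t.total
    FROM customers AS c
    JOIN (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    ) AS t ON t.customer_id = c.customer_id
    ORDER BY c.name
    """
).fetchall()
print(totals)  # [('Ana', 80.0), ('Ben', 20.0), ('Cy', 10.0)]

conn.close()
```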
I recently created my own SQLite database for a one-off analysis on a special project. The database was simple: a couple of very large tables containing millions of rows.
In this post, I’ll show you how to quickly and easily read multiple Excel files and combine them into one pandas DataFrame.
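A minimal sketch of one way to do this with pd.read_excel and pd.concat, assuming every file stores the same columns on its first sheet. The function name combine_excel_files and the source_file column are hypothetical additions for illustration, and reading .xlsx files requires the openpyxl package to be installed:

```python
from pathlib import Path

import pandas as pd


def combine_excel_files(paths):
    """Read each Excel file and stack the results into one DataFrame.

    Assumes every file has the same columns on its first sheet.
    """
    frames = []
    for path in paths:
        df = pd.read_excel(path)  # reads the first sheet by default
        df["source_file"] = Path(path).name  # record which file each row came from
        frames.append(df)
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index
    return pd.concat(frames, ignore_index=True)
```

For example, combine_excel_files(sorted(Path("reports").glob("*.xlsx"))) would combine every .xlsx file in a hypothetical reports folder. Note that pd.concat raises an error if the list of files is empty, so check that the glob matched something first.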