Speed up Long-Running SQL Queries Using a For Loop in Python

If you have a long-running query, splitting it up into smaller queries can help with performance. With Python, we can dynamically loop through each query.

I have run into situations where my query joins two large tables and it can take forever to run. However, when I filter each table down to a single account, the query performance is greatly improved. With Python, we loop through a list of items, dynamically fill the query statement, and run them using a for loop. Let’s get started.

Step 1: Generate a List

The first step is to generate a list of id’s or whatever you are using in your SQL WHERE statement. In my case, I am converting a Pandas column to a list.

ids = reminders_df.ids.unique().tolist()

You could also create a random list:

from numpy import random
ids = [random.randint(100) for i in range(0,100)]

Step 2: Break up List into Smaller Lists

Next, we can break up the list into smaller chunks. In this example, my chunk size will be 10.

chunk_size = 10
ids_chunks = [ids[x:x+chunk_size] for x in range(0, len(ids), chunk_size)]

The final result is a list of lists. Each sublist has 10 items. The last last item in the list will likely have less than 10 items unless your original list size is a multiple of 10.

Step 3: Loop Through your Query

Now that we have our list of lists, we can perform the following:

  1. Dynamically add them into the WHERE statement of our SQL query
  2. Run the query
  3. Append the results to a Pandas DataFrame

Here’s an example:

#Query starts here with brackets to insert your list items
query ='''Select * from accounts where account_id in ({})'''

#Function starts here that will take your list as the parameter
def query_loop(lst):
  
  #create an empty DataFrame that will hold the results once the query runs
  combined_df = pd.DataFrame()
  
  #Here is the loop. We use first need to convert the list into a string then use format to insert the string into the query
  for i in lst:
    df = function_to_run_query(query.format(",".join(str(int(j)) for j in i)) )
    
    #Once the Loop Finishes, we concat the results into the DataFrame
    combined_df = pd.concat([combined_df,df])
  
  return combined_df

Once you do this, you can call query_loop(ids_chunks).

Note that I have a function called function_to_run_query() and I don’t define that anywhere in this post. The basic idea of that function is that it takes one parameter (your query), then runs it in your database, and will return the results as a Pandas DataFrame. You can check out my post on connecting to a remote MySQL database to learn more.

Final Thoughts

Looping through a query is one way to speed up a long-running query. The other thing to consider is why your query takes so long to run. Sometimes, it can be faster to break apart the query and import the raw data from the tables into Python. Then you can leverage Pandas to merge the datasets.

Thanks for reading!


Posted

in

,

by

Tags: