How to Extract all Numbers from a String Column in Python Pandas

In this post, I’ll show you how to extract every number from a string in Python using regular expressions.

Extracting phone numbers or simply removing the symbols from numbers are the two most common reasons I am looking to strip out everything but the actual numbers from a Python Pandas string column.

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame(['+1-555-555-5555','+1 (123) 456 7899', 'Here is my number 1-555-555-6666',123456789],columns=['Numbers and Text'])
df

Here is how you can run to return a new column with only the numbers:

df['Numbers Only'] = df['Numbers and Text'].astype('str').str.extractall('(\d+)').unstack().fillna('').sum(axis=1).astype(int)

Breakdown

.astype('str')

First, we need to make sure that the entire column is a string in order to apply regex. This step is important if you have mixed data types in your column.

.str.extractall('(\d+)')

Using extract all, we can return every number from the string. The (\d)+ will look for any digit or any length in the string.

The result ends up being a longer DataFrame where every match becomes its own row in a multi-index DataFrame.

.unstack()

This brings the matched numbers into the same row as the original string. Each grouping will be in a separate column.

.fillna('')

Since certain strings may have more matched groups than others, there may be NaN values in your DataFrame. This step removes them because otherwise, any operation that involves a NaN value results in NaN value.

.sum(axis=1)

Finally, to bring all the numbers together into one cell, we can run a sum. The axis=1 argument will sum the row values. The groupings are each stored as a string so it doesn’t add the numbers but will concatenate the numbers. The resulting column will be stored as a float data type.

.astype(int)

This last step is optional if you don’t want your number to be a float.

Final Thoughts

Check out more Python tricks in this Colab Notebook or in my recent Python Posts.

Thanks for reading!


Posted

in

by

Tags: