How to Calculate a Net Promoter Score (NPS) using Python

The Net Promoter Score has become a popular way to analyze survey data. Instead of calculating a straight average for 0 through 10 scores, scores are bucketed into Detractors, Passives, and Promoters.

In this post, I’ll use the Net Promoter Score methodology and apply it to a dataset of raw scores using Python.

What is the Net Promoter Score?

If you’ve ever flown on an airplane, stayed at a hotel, or purchased something online, you’ve probably received a survey emailed to you after the experience. The survey has a number of questions where you would rate on a scale of 0 to 10. The most important question of the survey is something phrased about how likely you would be to recommend the product, service, or company to friends or family.

  • Promoters – this group gave a 9 or a 10 rating and was very satisfied with the product, service, or company. They will likely speak about it proactively with friends and family.
  • Passives – this group rated a 7 or an 8 and had an okay experience with the product, service, or company. They are only counted in the denominator of the equation.
  • Detractors – this group rated a 6 or lower and had a bad experience. They will likely proactively speak poorly about the product, service, or company.

The Net Promoter Score methodology involves bucketing scores into three groups as illustrated above. After that, you subtract the Detractors from the Promoters and then divide by the All Survey Responses.

Here is the calculation:

The final result can mean that the NPS score can range anywhere from 100 to -100. Yes, scores can be so bad they go negative.

Overview of the Analysis

Here are the steps to calculating and plotting the NPS calculation I’ll follow in this post:

  1. Create Sample Data
  2. Melt the DataFrame
  3. Categorize each Score
  4. Calculate the Net Promoter Score
  5. Chart the Data

Create Sample Data

For this project, I am creating a random data set with numbers from 0 to 10. 9 and 10 are weighted more heavily than 7 and 8 scores, and 7 and 8 scores more heavily than 0 and 6 scores. I typically run across data sets that are skewed to higher numbers so that is why I am applying this weighting.

In addition to random scores, I am assigning a random country and traveler type (business or leisure) to make the data set more interesting. This way we can break out the data by country and traveler type in our chart and analyze some trends.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

df1 = pd.DataFrame(np.random.randint(9,11,size=(1000, 1)), columns=['How likely are you to reccomend the product?']) #promoters
df2 = pd.DataFrame(np.random.randint(7,9,size=(400, 1)), columns=['How likely are you to reccomend the product?']) #passives
df3 = pd.DataFrame(np.random.randint(0,7,size=(100, 1)), columns=['How likely are you to reccomend the product?']) #detractors

df = pd.concat([df1,df2,df3], ignore_index=True)

df['Country Number'] = np.random.randint(1, 6, df.shape[0]) #assiging a random number to assign a country
df['Traveler Type Number'] = np.random.randint(1, 3, df.shape[0]) #assigning a random number to assign a traveler type

#Function to assign a country name
def country_name(x):
    if x['Country Number'] == 1:
        return 'United States'
    elif x['Country Number'] == 2:
        return 'Canada'
    elif x['Country Number'] == 3:
        return 'Mexico'
    elif x['Country Number'] == 4:
        return 'France'
    elif x['Country Number'] == 5:
        return 'Spain'
    else:
        pass

#Function to assign a traveler type
def traveler_type(x):
    if x['Traveler Type Number'] == 1:
        return 'Business'
    elif x['Traveler Type Number'] == 2:
        return 'Leisure'
    else:
        pass

#apply the function to the numbered columns
df['Country'] = df.apply(country_name, axis=1)
df['Traveler Type'] = df.apply(traveler_type, axis=1)

df[['How likely are you to reccomend the product?', 'Country', 'Traveler Type']] #view to remove the random number columns for country and traveler type

Here’s a quick preview of the data:

Melt the DataFrame

Melting the DataFrame turns the data into a long format. This makes it much easier for grouping the data.

It doesn’t really make a big difference for this sample data set and it’s more relevant if you have multiple survey questions.

melted_df = pd.melt(frame = df, id_vars = ['Country','Traveler Type'], value_vars = ['How likely are you to reccomend the product?'],value_name='Score', var_name = 'Question' )

melted_df = melted_df.dropna()

melted_df['Score'] = pd.to_numeric(melted_df['Score'])
melted_df

Categorize each Score

I am using a custom function to categorize each score.

def nps_bucket(x):
    if x > 8:
        bucket = 'promoter'
    elif x > 6:
        bucket = 'passive'
    elif x>= 0:
        bucket = 'detractor'
    else:
        bucket = 'no score'
    return bucket
melted_df['nps_bucket'] = melted_df['Score'].apply(nps_bucket)

melted_df

Here is a preview of the DataFrame so far.

New DataFrame preview with the nps_bucket column at the end

Calculate the Net Promoter Score

To calculate the actual NPS score, I am going to group my data frame by the Country and Traveler Type. At the same time, I am applying a lambda function that uses the equation above.

grouped_df = melted_df.groupby(['Country','Traveler Type','Question'])['nps_bucket'].apply(lambda x: (x.str.contains('promoter').sum() - x.str.contains('detractor').sum()) / (x.str.contains('promoter').sum() + x.str.contains('passive').sum() + x.str.contains('detractor').sum())).reset_index()

grouped_df_sorted = grouped_df.sort_values(by='nps_bucket', ascending=True)

Chart the Data

I am using seaborn to chart and style the data.

sns.set_style("whitegrid")
sns.set_context("poster", font_scale = 1)
f, ax = plt.subplots(figsize=(15,7))

sns.barplot(data = unpivoted_sorted, 
            x = 'nps_bucket', 
            y='Country', 
            hue='Traveler Type',
               ax=ax)
ax.set(ylabel='',xlabel='', title = 'NPS Score by Country and Traveler Type')
ax.set_xlim(0,1)
ax.xaxis.set_major_formatter(plt.NullFormatter())
ax.legend()

#data labels
for p in ax.patches:
    ax.annotate("{:.0f}".format(p.get_width()*100),
                (p.get_width(), p.get_y()),
                va='center', 
                xytext=(-35, -18), #offset points so that the are inside the chart
                textcoords='offset points', 
                color = 'white')
    
plt.tight_layout()
plt.savefig('NPS by Country.png')
plt.show()

In my sample data set, it looks like Leisure guests rated higher in every country except for Spain.

And that’s it! Leave a comment below if you enjoyed this post or have a question!


Posted

in

by

Tags: