Eye-Catching Python Sunburst Charts: One of a Kind Visuals That Really Shine
Practical examples showcasing UNHCR refugee multi-dimensional data with Plotly
A sunburst chart is a beautiful way to visualize multi-dimensional data. It can be quite spectacular to look at.
My eye is always drawn towards sunburst charts, making me curious about what data points they are representing.
Recently, I found a data set that seemed to be a perfect fit for trying out my beginner Python Plotly sunburst chart creation skills — the UNHCR asylum seeker data set.
Want to see how easy it really is?
Here’s how I put this dataset into action with a few simple sunburst charts.
A Dataset: UNHCR Database on Global Statistics
The UN High Commissioner for Refugees (UNHCR) tracks statistics on refugee movements across the globe.
Their data is freely accessible HERE.
After arriving at the download page we can be granular on the data that we select:
For this project, we want the dataset to contain the country of origin for each refugee and the country of asylum.
With this data, we can create visualizations that show the numbers of asylum seekers from two angles:
By country of origin: where asylum seekers are going to
By country of asylum: where asylum seekers are coming from
Once we download the dataset, we can open it up in spreadsheet format to see what we are dealing with:
The data fields that we are interested in for this project are:
Country of origin (including 3-letter ISO code) — where a person seeking asylum is coming from
Country of asylum (including 3-letter ISO code) — where a person is actually seeking asylum
Recognized decisions — whether the person seeking asylum was accepted (numeric total by country)
Both the country of origin and asylum have what is called a “3-letter ISO code” that we can use as a unique country identifier. Super useful!
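Before charting, it can help to confirm that the downloaded file actually contains these fields. The sketch below is a minimal check, assuming the column names match the ones used in the code later in this tutorial. The helper name missing_columns and the tiny sample DataFrame are made up for illustration, and the numbers are not real UNHCR figures.

```python
import pandas as pd

# Column names assumed from the code used later in this tutorial
REQUIRED_COLUMNS = ["Country of origin", "Country of asylum", "Recognized decisions"]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Tiny made-up sample standing in for the downloaded CSV
sample = pd.DataFrame({
    "Country of origin": ["Syria", "Iraq"],
    "Country of asylum": ["Germany", "Germany"],
    "Recognized decisions": [120, 45],
})

# An empty list means every required column is present
print(missing_columns(sample))
# Dropping a column shows what a failed check looks like
print(missing_columns(sample.drop(columns=["Country of asylum"])))
```

With the real file, the same check could be run on the DataFrame returned by pd.read_csv() before any filtering or grouping.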
Python Plotly provides a function called sunburst() that allows us to view this data in a fun, colorful, and useful way.
Now that we have our data ready to go, let’s create our first sunburst chart.
Attempt 1: The simplest sunburst chart
From a programmatic perspective, let’s start as simply as we can. Let’s create a sunburst data visualization that represents all of our data.
And let’s start by looking at the Country of Asylum angle. We want to answer the question: where are asylum seekers to THIS country coming FROM?
Let’s start with setting up our Python environment.
1. Import Libraries, Load and Filter Dataset
First, we need to make sure we have all of the libraries available to handle our dataset and to display our visualizations, and then we can access our dataset:
import pandas as pd
import plotly.express as px
# Load the dataset
file_path = 'asylum-decisions.csv'
data = pd.read_csv(file_path)
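# Filter out rows where 'Recognized decisions' is zero to avoid division errors,
# then total the recognized decisions for each asylum/origin pair
sunburst_data = data[data['Recognized decisions'] > 0].groupby(['Country of asylum', 'Country of origin'])['Recognized decisions'].sum().reset_index()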
Our code explained:
Pandas: A powerful data manipulation and analysis library in Python.
Plotly Express: A high-level data visualization library in Python, which is part of the Plotly library.
file_path: The path to the CSV file containing the asylum decisions data.
data: A Pandas DataFrame that stores the data from the CSV file.
Filter rows: Removes rows where the ‘Recognized decisions’ column has a value of zero to prevent errors in the sunburst chart.
Group by: Groups the data by ‘Country of asylum’ and ‘Country of origin’ to aggregate the ‘Recognized decisions’ for each combination.
sum(): Aggregates the data by summing the values for each group.
reset_index(): Resets the index of the DataFrame to turn the grouped columns into regular columns.
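To see what the filter, group by, sum() and reset_index() steps produce, here is a small worked sketch on made-up rows (not real UNHCR figures). It assumes the same column names as the tutorial code; the variable names toy, nonzero and toy_summary are hypothetical.

```python
import pandas as pd

# Made-up rows: the same asylum/origin pair can appear more than once
toy = pd.DataFrame({
    "Country of asylum": ["Germany", "Germany", "Germany", "France"],
    "Country of origin": ["Syria", "Syria", "Iraq", "Syria"],
    "Recognized decisions": [100, 50, 0, 30],
})

# Filter: keep only rows with at least one recognized decision
nonzero = toy[toy["Recognized decisions"] > 0]

# Group by the asylum/origin pair, sum the decisions,
# and turn the group keys back into regular columns
toy_summary = (
    nonzero.groupby(["Country of asylum", "Country of origin"])["Recognized decisions"]
    .sum()
    .reset_index()
)

print(toy_summary)
```

The two Germany/Syria rows collapse into a single row, and the Germany/Iraq row disappears because its value was zero.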
2. Create the Sunburst Chart
Now that the data is structured for data visualization, we can draw our first sunburst chart:
fig = px.sunburst(
    sunburst_data,
    path=['Country of asylum', 'Country of origin'],
    values='Recognized decisions',
    color='Recognized decisions',
    color_continuous_scale='Viridis',  # numeric color column, so use a continuous scale
    title='Asylum Decisions Sunburst Chart'
)
The Plotly sunburst() function creates a sunburst chart using Plotly Express. The data frame is passed in as a parameter, then:
path: Defines the hierarchy of the sunburst chart (first by ‘Country of asylum’, then by ‘Country of origin’).
values: The size of each segment in the sunburst chart, determined by ‘Recognized decisions’.
color: Colors each segment based on the ‘Recognized decisions’ value.
color_continuous_scale: The color scale used for the chart, here ‘Viridis’ is selected.
title: Sets the title of the chart.
3. Update Layout and Display the Chart
fig.update_layout( coloraxis_showscale=True, title_font_size=20 )
fig.show()
The update_layout() function updates the layout of the chart.
coloraxis_showscale: Displays the color scale legend.
title_font_size: Sets the font size of the title.
The show() function displays the sunburst chart.
And that is all the code we need. We can copy, paste, save, and run these code snippets together in our favourite Python editor (I use PyCharm). With Plotly, the results will be displayed as a webpage in your default browser.
The results of our first attempt:
Wow, that is a lot of information to view in one visualization! Too much as it turns out — it is extremely hard to make any sense of this visualization.
So how can we fix this? Well, we can narrow down what it is that we see.
Attempt 2: Narrowing the focus — top 10 countries of asylum
One approach is to display only the top ten countries by Recognized decisions. With this approach, all we need to do is provide further filtering to our data frame.
To modify our code, we can remove some of our previous code and add a few lines of new code.
Previous Code (remove the statement that builds sunburst_data):
# Filter out rows where 'Recognized decisions' is zero
sunburst_data = data[data['Recognized decisions'] > 0].groupby(['Country of asylum', 'Country of origin'])['Recognized decisions'].sum().reset_index()
New Code (add):
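# Filter out rows where 'Recognized decisions' is zero
filtered_data = data[data['Recognized decisions'] > 0]
# Group by 'Country of asylum' and aggregate 'Recognized decisions'
asylum_data = filtered_data.groupby('Country of asylum')['Recognized decisions'].sum().reset_index()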
# Get top 10 countries of asylum by 'Recognized decisions'
top_asylum_countries = asylum_data.nlargest(10, 'Recognized decisions')
# Filter the original data to include only top 10 countries of asylum
top_asylum_data = filtered_data[filtered_data['Country of asylum'].isin(top_asylum_countries['Country of asylum'])]
# Group by 'Country of asylum' and 'Country of origin' and aggregate 'Recognized decisions'
origin_data = top_asylum_data.groupby(['Country of asylum', 'Country of origin'])['Recognized decisions'].sum().reset_index()
The key new code additions:
Groups the data by ‘Country of asylum’ to get the total ‘Recognized decisions’.
Selects the top 10 countries of asylum based on these decisions.
Filters the original data to include only the top 10 countries.
Then groups the data by both ‘Country of asylum’ and ‘Country of origin’ to prepare for the sunburst chart.
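The steps above can also be wrapped in one small function so the "top N" value becomes a parameter. This is a sketch under assumptions, not part of the original code: the function name top_n_pairs and its arguments are hypothetical, and the sample data is made up.

```python
import pandas as pd

def top_n_pairs(df, group_col, detail_col, value_col, n=10):
    """Sum value_col per (group_col, detail_col) pair, keeping only the n largest groups."""
    # Drop rows with zero (or negative) values
    positive = df[df[value_col] > 0]
    # Total per group, then pick the n groups with the largest totals
    totals = positive.groupby(group_col)[value_col].sum()
    top_groups = totals.nlargest(n).index
    # Keep only rows belonging to those groups and aggregate per pair
    kept = positive[positive[group_col].isin(top_groups)]
    return kept.groupby([group_col, detail_col])[value_col].sum().reset_index()

# Made-up sample data
toy = pd.DataFrame({
    "Country of asylum": ["Germany", "Germany", "France", "Sweden", "Sweden"],
    "Country of origin": ["Syria", "Iraq", "Syria", "Syria", "Eritrea"],
    "Recognized decisions": [150, 40, 30, 60, 0],
})

top_two = top_n_pairs(toy, "Country of asylum", "Country of origin", "Recognized decisions", n=2)
print(top_two)
```

Passing "Country of origin" as group_col and "Country of asylum" as detail_col would give the reverse view, grouped by where asylum seekers come from.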
NOTE: Make sure you also change the data frame input parameter for the sunburst function from sunburst_data to origin_data.
Our new sunburst chart is a little clearer:
As an example of what we can see, Germany is accepting a large percentage of refugees from Syria during this time period.
We can narrow it down even more if we choose specific countries (e.g., only the top 5 countries). As a more advanced task, we can give the user the option to do this (left to another tutorial).
Now that we have a good idea of how to create a sunburst chart from our data, let’s create a second chart.
For this one, let’s show where folks are going TO from a particular country.
Attempt 3: Reversing direction — top 10 countries of origin
The code for this chart is very similar to our previous chart, except that we are now focusing on the country of origin rather than the country of asylum.
For brevity, here is the new code for filtering the data and for setting up our sunburst chart (the big change is the focus on “Country of Origin” instead of “Country of Asylum”):
# Filter out rows where 'Recognized decisions' is zero to avoid division errors
filtered_data = data[data['Recognized decisions'] > 0]
# Group by 'Country of origin' and aggregate 'Recognized decisions'
origin_data = filtered_data.groupby('Country of origin')['Recognized decisions'].sum().reset_index()
# Get top 10 countries of origin by 'Recognized decisions'
top_origin_countries = origin_data.nlargest(10, 'Recognized decisions')
# Filter the original data to include only top 10 countries of origin
top_origin_data = filtered_data[filtered_data['Country of origin'].isin(top_origin_countries['Country of origin'])]
# Group by 'Country of origin' and 'Country of asylum' and aggregate 'Recognized decisions'
asylum_data = top_origin_data.groupby(['Country of origin', 'Country of asylum'])['Recognized decisions'].sum().reset_index()
# Create the sunburst chart using a qualitative color scheme
fig = px.sunburst(
    asylum_data,
    path=['Country of origin', 'Country of asylum'],
    values='Recognized decisions',
    color='Country of origin',  # Use a discrete color based on 'Country of origin'
    color_discrete_sequence=px.colors.qualitative.Set1,  # Set a qualitative color scheme
    title='Asylum Decisions Sunburst Chart by Country of Origin'
)
Our resulting data visualization:
And if you want to make this chart “less busy”, we can narrow it down to just the top 5 countries. We need only change one value in this line of code:
# Get top 10 countries of origin by 'Recognized decisions'
top_origin_countries = origin_data.nlargest(10, 'Recognized decisions')
Change the first argument in the function nlargest() from 10 to 5. Save your new code and run it. The result:
Now if the characters are too small, or if you are unclear on what each sector is representing, Plotly has a terrific hover feature that displays a text box with all the relevant data for that sector. For example, for Iraq, you can see that the total number of recognized decisions (for folks seeking asylum from Iraq) is 237,597 for this time period.
And that’s all for now! Thank you for following along — I hope it all worked for you.
In Summary…
The purpose of this tutorial was to introduce how to create sunburst plots using the Python Plotly library. A real-life useful dataset was provided to help illustrate how sunburst plots can be used in a practical way.
There are folks out there who would never use a chart like this, and I agree that there are other ways that may better represent this data (e.g., choropleth maps, time-series charts, etc.).
But there is a visual appeal to sunburst charts that really does work for some folks. I am one of them: I am drawn towards this style of chart, with the color and the sections and all the puzzle pieces that connect together.
How about you?