Python Plotly and GPT-4: Engineering Beautiful Maps for Kicks
A modular prompting approach to maximize Python Plotly coding productivity
Modular prompt engineering with GPT-4 is a highly efficient method for generating complex Python data visualization code, such as choropleth maps.
This approach saves time and effort and reduces the complexity traditionally associated with crafting such maps. It’s so easy, you can do it just for fun!
By clearly defining the task and input parameters, you can get GPT-4 to generate relevant, usable code that requires only a copy/paste/run for instant data visualizations.
In this article, I present three hands-on examples (in increasing complexity) of prompting GPT-4 to create Python Plotly code. Specifically, to:
generate a global choropleth map of a dataset for a specific year;
generate an animated global choropleth map of a dataset across a number of years;
generate an animated choropleth map for a specific region, plus a line chart that gives a deeper view of the data.
And let’s do all this with a well-known global dataset: the Global Happiness Data Set.
The Dataset
To showcase how GPT-4 can generate Python Plotly interactive code, let’s use the global happiness data that can be found HERE.
For the sake of this exercise, the data from 2015–2022 has been aggregated into a single file called “happiness.csv”. The first five rows of the CSV look like this:
Country,2015,2016,2017,2018,2019,2020,2021,2022
Switzerland,7.587,7.509,7.493999958,7.487,7.48,7.559899807,7.571,7.345
Iceland,7.561,7.501,7.50400018692017,7.495,7.494,7.504499912,7.554,7.664
Denmark,7.527,7.526,7.52199983596802,7.555,7.6,7.645599842,7.62,7.743
Norway,7.522,7.498,7.53700017929077,7.594,7.554,7.487999916,7.392,7.365
Canada,7.427,7.404,7.31599998474121,7.328,7.278,7.23210001,7.103,7.417
In this dataset, each country’s happiness score is derived from a number of performance indicators. The higher the value, the happier a country is considered to be.
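Because later code selects columns by name, such as df['2020'], it is worth confirming how pandas reads this layout. A minimal sketch, using the five sample rows above wrapped in io.StringIO as a stand-in for the real happiness.csv file (SAMPLE_CSV and ranking_2020 are illustrative names, not part of the original code):

```python
import io

import pandas as pd

# The first rows of happiness.csv as shown above; with the full file,
# use pd.read_csv('happiness.csv') instead.
SAMPLE_CSV = """Country,2015,2016,2017,2018,2019,2020,2021,2022
Switzerland,7.587,7.509,7.493999958,7.487,7.48,7.559899807,7.571,7.345
Iceland,7.561,7.501,7.50400018692017,7.495,7.494,7.504499912,7.554,7.664
Denmark,7.527,7.526,7.52199983596802,7.555,7.6,7.645599842,7.62,7.743
Norway,7.522,7.498,7.53700017929077,7.594,7.554,7.487999916,7.392,7.365
Canada,7.427,7.404,7.31599998474121,7.328,7.278,7.23210001,7.103,7.417
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Year headers are read as strings, so columns are selected as df['2020'], not df[2020]
print(list(df.columns))

# Rank countries by their 2020 score (higher = happier)
ranking_2020 = df.sort_values('2020', ascending=False)[['Country', '2020']]
print(ranking_2020.to_string(index=False))
```

pandas keeps the header row as strings, so the year columns are '2015' through '2022' rather than integers; sorting on '2020' then lists the sample countries from happiest to least happy.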
With this dataset, we can use tools that compare countries on a global scale and perhaps reveal regional patterns.
A great tool to start with is a choropleth map.
A choropleth map uses differences in shading or colour within predefined areas (i.e., countries) to indicate the average value of a particular quantity in each area. For example, we can use a darker shade to indicate countries with higher values.
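The shading itself works by placing each value on a scale between a minimum and a maximum and picking the color at that relative position. A minimal plain-Python sketch of that normalization step is below; scale_position is a hypothetical helper, not a Plotly function, and the clamping reflects how values beyond a fixed color range take the end colors of the scale.

```python
def scale_position(value, vmin, vmax):
    """Map a value to its position (0.0 to 1.0) on a continuous color scale.

    Values outside [vmin, vmax] are clamped to the ends of the scale.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    return min(max(t, 0.0), 1.0)


# With a 6-to-8 range, a score of 7.5 sits three quarters of the way up the scale
for score in (5.5, 6.0, 7.0, 7.5, 8.2):
    print(score, scale_position(score, 6.0, 8.0))
```

A darker shade at the top of the scale then simply means a position closer to 1.0.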
Let’s get GPT-4 to help us draw one.
Example 1: Global Choropleth Map for a Single Year
Given the structure of our dataset, we’ll map the happiness scores for 2020 only.
Plotly Express will be our main tool for this task.
To prompt GPT-4 to generate the required code:
Prompt to GPT-4: “Please write Python code using Plotly that will generate a choropleth map of global happiness scores in 2020. The input data is a DataFrame named ‘df’ with columns ‘Country’ and ‘2020’.”
GPT-4 Response:
import pandas as pd
import plotly.express as px
# Load the happiness dataset
df = pd.read_csv('happiness.csv')
fig = px.choropleth(df,
                    locations="Country",
                    locationmode="country names",
                    color="2020",
                    hover_name="Country",
                    color_continuous_scale='YlOrRd',
                    title='Global Happiness Score for 2020',
                    labels={'2020': 'Happiness Score'},
                    projection="natural earth")
fig.show()
And there we have it, just like that! What is happening in this Python code?
First, we import the required libraries: pandas is used for data manipulation, while plotly.express provides the visualization functions. (NOTE: make sure both libraries are installed for your Python version.)
The read_csv() function of pandas then loads the 'happiness.csv' dataset into a DataFrame, df.
We create a choropleth map using the choropleth function of plotly.express. We specify the DataFrame (df), the column to be used for the locations on the map (Country), the location mode (country names), the column to be used for the color scale (2020), and the column to be displayed when hovering over the map (Country).
We set color_continuous_scale to 'YlOrRd' for a yellow-to-red color scale, and projection to 'natural earth' for a more geographically realistic map.
The title and labels parameters add a title to the map and rename the color axis label, respectively.
We display the map using the show() method.
If we copy, save, and run this code in our favourite Python tool (e.g., Jupyter Notebook or PyCharm), we get this beautiful result:
This map lets us visually compare the happiness scores of different countries in 2020. We can see parts of Europe shaded dark red (the happiest color on our color scale).
So this is a great start, but the map shows only a single year. We can generate additional code that lets the user select a year, which provides more complete access to this dataset.
And with Plotly, we can do this with a slider.
Example 2: Plotting Global Data Across a Number of Years on a Choropleth Map
To add a bit more complexity, let’s now prompt GPT-4 to create a choropleth map that displays data for all of the years in the dataset.
Prompt to GPT-4: Given a pandas DataFrame ‘df’ that contains the columns ‘Country’ and the years 2015–2022 (representing happiness scores), write a Python script using Plotly to create a choropleth map where the color of each country represents its happiness score, and display it for each year.
GPT-4 Response:
import pandas as pd
import plotly.graph_objects as go
# load the happiness dataset
df = pd.read_csv('happiness.csv')
fig = go.Figure(data=go.Choropleth(
    locations=df['Country'],
    locationmode='country names',  # match on country names; the default is ISO-3 codes
    z=df['2015'],
    text=df['Country'],
    colorscale='YlOrRd',
    autocolorscale=False,
    reversescale=False,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title='Happiness<br>Score',
))
fig.update_layout(
    title_text='Global Happiness Score (2015)',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    )
)
fig.show()
In this code snippet, we first import the necessary libraries and read our data from the ‘happiness.csv’ file, then:
Create a choropleth map using Plotly’s go.Choropleth trace, where the ‘locations’ parameter is set to the ‘Country’ column and the ‘z’ parameter is set to the ‘2015’ column of our DataFrame, which holds the happiness scores for 2015.
Set the ‘colorscale’ parameter to ‘YlOrRd’, which stands for Yellow-Orange-Red, a sequential color scale that is suitable for displaying continuous data.
Set the ‘colorbar_title’ parameter, which titles the color bar that serves as the legend for our color scale (in this case, happiness scores).
Use the fig.update_layout method to set the title of our plot and customize the appearance of the geographical layout.
You can see that this map contains a slider at the bottom that can be clicked to display the global results for a particular year.
In addition, the “Play” icon in the bottom left can be clicked to step through the years as an animation. Very cool.
Example 3: Plotting an Animated Choropleth Map (and Line Chart) for a Specific Region
Using the same approach and the same dataset from the previous examples, we can progress to slightly more complex visualizations.
For this example, let’s create a map for one region of the world, plus an accompanying line chart to provide a deeper analysis.
Part 1: Animated Choropleth Map
For the choropleth map, let’s just focus on two countries: Canada and the USA.
Prompt to GPT-4: Write Python code to create an animated choropleth map highlighting the United States and Canada with happiness scores from 2015 to 2022. The color scale should remain consistent across all years, ranging from 6 to 8. The input data is a DataFrame named ‘df’ with columns ‘Country’ and years from 2015 to 2022.
GPT-4 Response:
import pandas as pd
import plotly.express as px
# Load the happiness dataset
df = pd.read_csv('happiness.csv')
# Filter for Canada and USA
df_subset = df[df['Country'].isin(['United States', 'Canada'])]
# Reshape the data
df_melt = df_subset.melt(id_vars='Country', var_name='Year', value_name='Score')
# Create choropleth map
fig = px.choropleth(df_melt,
                    locations="Country",
                    locationmode="country names",
                    color="Score",
                    hover_name="Country",
                    animation_frame="Year",
                    color_continuous_scale='YlOrRd',
                    title='Happiness Score for United States and Canada (2015-2022)',
                    labels={'Score': 'Happiness Score'},
                    scope='north america',
                    range_color=(6, 8))  # consistent color scale
fig.show()
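In this code, melt() reshapes the data into long format (one row per country and year), animation_frame="Year" creates one animation frame per year, and the range_color parameter of px.choropleth() keeps the color scale consistent across all animation frames, ranging from 6 to 8.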
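The melt() call is what makes animation_frame work: it turns the wide table (one column per year) into a long one (one row per country-year pair). A minimal sketch on two sample rows; the United States row is not among the rows shown earlier, so Norway and Canada stand in for it here:

```python
import io

import pandas as pd

# Two sample rows in the wide layout of happiness.csv
SAMPLE_CSV = """Country,2015,2016,2017,2018,2019,2020,2021,2022
Norway,7.522,7.498,7.53700017929077,7.594,7.554,7.487999916,7.392,7.365
Canada,7.427,7.404,7.31599998474121,7.328,7.278,7.23210001,7.103,7.417
"""
df_wide = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Wide to long: one row per (Country, Year) pair; the former column headers become Year values
df_long = df_wide.melt(id_vars='Country', var_name='Year', value_name='Score')

print(df_long.head(4))
# Each distinct Year value becomes one animation frame in px.choropleth(animation_frame="Year")
print(sorted(df_long['Year'].unique()))
```

Two countries across eight years become sixteen rows, and the eight distinct Year values are exactly the frames the slider steps through.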
And our regional result, comparing Canada and the USA:
When you click the “Play” icon in the bottom left corner, the animation starts at 2015 and steps through each year in sequence up to 2022.
In this way, we can see how happiness changes over time in these two countries.
What we cannot easily see from this animated choropleth map is how these two countries compare across all of the years.
A simple and awesome way to provide this comparison is by using a time-series line chart.
Part 2: Line Chart
So let’s ask GPT-4 to generate a time-series line chart data visualization.
Prompt to GPT-4: Please write Python code to create a line chart comparing the happiness scores from 2015 to 2022 between the United States and Canada. The input data is the same DataFrame from Part 1.
GPT-4 response:
import plotly.graph_objects as go
# Assumes df_subset (the Canada and United States rows) from Part 1 is already defined
# Reshape the data
df_melt = df_subset.melt(id_vars='Country', var_name='Year', value_name='Score')
# Create the line chart
fig = go.Figure()
for country in ['United States', 'Canada']:
    fig.add_trace(go.Scatter(
        x=df_melt[df_melt['Country'] == country]['Year'],
        y=df_melt[df_melt['Country'] == country]['Score'],
        mode='lines+markers',
        name=country
    ))
fig.update_layout(
    title='Happiness Score: United States vs. Canada (2015-2022)',
    xaxis_title='Year',
    yaxis_title='Happiness Score'
)
fig.show()
Here, we reshape the DataFrame into a long format suitable for line chart plotting. Then we create the line chart by adding one Scatter trace per country with add_trace(). Setting the mode to ‘lines+markers’ connects the yearly points into a time-series line chart.
The trend of happiness scores for the United States and Canada from 2015 to 2022:
These two visualizations, when viewed together, provide both geographical context and a clear depiction of happiness score trends for the United States and Canada over time.
So who’s happier? Me (the Canadian), or my wife (the American)? On average, it would seem to be me. However, my wife is definitely an outlier - there’s no way I can be as happy as she is.
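The “on average” comparison can be checked with a row-wise mean over the year columns. A minimal sketch on the Norway and Canada sample rows (the United States row is not among the rows shown earlier); with the full file, applying the same line to df_subset from Part 1 gives the Canada vs. United States averages:

```python
import io

import pandas as pd

SAMPLE_CSV = """Country,2015,2016,2017,2018,2019,2020,2021,2022
Norway,7.522,7.498,7.53700017929077,7.594,7.554,7.487999916,7.392,7.365
Canada,7.427,7.404,7.31599998474121,7.328,7.278,7.23210001,7.103,7.417
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
years = [c for c in df.columns if c != 'Country']

# Mean score per country across 2015-2022 (row-wise mean over the year columns)
averages = df.set_index('Country')[years].mean(axis=1).sort_values(ascending=False)
print(averages.round(3))
```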
And that’s all, folks! I hope this all worked out for you as well.
One final note: a disclaimer is necessary here. LLMs like GPT-4 do not produce the same results every time. Some variability is inherent in their design, and it may manifest itself in different ways. For example, your code may not look exactly like mine or produce exactly the same visual display.
In Summary…
The power of modular prompt engineering for Python code generation is clear: it can seamlessly produce complex data visualization code.
The purpose of this article is to show how you can use a modular approach to gradually build complexity into your prompts, exactly as you would if you were writing the Python code directly.
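The modular idea can also be made concrete: keep reusable prompt pieces, such as the data description, and add task and constraint sentences one step at a time. A hypothetical sketch (DATA_SPEC and build_prompt are illustrative names; the sentences are taken from the prompts in this article):

```python
# Shared prompt module (hypothetical name), reused across the examples
DATA_SPEC = ("The input data is a DataFrame named 'df' with columns 'Country' "
             "and years from 2015 to 2022.")


def build_prompt(task, *constraints, data_spec=DATA_SPEC):
    """Join a task sentence, the shared data description, and any extra constraints."""
    return " ".join([task, data_spec, *constraints])


# Start simple, then add one module at a time
p1 = build_prompt("Please write Python code using Plotly that will generate "
                  "a choropleth map of global happiness scores in 2020.")
p3 = build_prompt("Write Python code to create an animated choropleth map highlighting "
                  "the United States and Canada with happiness scores from 2015 to 2022.",
                  "The color scale should remain consistent across all years, "
                  "ranging from 6 to 8.")
print(p3)
```

Each prompt then differs only in its task sentence and added constraints, mirroring how the three examples grew in complexity.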
The advantage of this model, of course, is that you only need to be able to clearly articulate what you want to do, rather than worry about the nit-picky syntax details of how to do it.
This allows you to produce high-level mapping code in record time.
Thank you for reading.
If you want to learn more about how to prompt engineer GPT-4, sign up for this free 5-day email course on prompting GPT-4 for data visuals.