Exploratory Data Visualization With GPT-4: Prompting for Python Folium Maps
Interacting with GPT-4 for data storytelling on the illicit Cheetah trade
The ease with which GPT-4 can create choropleth and heatmaps from a dataset is pretty astounding — IF you know which questions to ask.
Having worked with Python for many years, I have now integrated GPT-4 into all of my data visualization workflows.
It really is a terrific tool for the exploration phase of any data visualization project AND as a tutor to help you learn new data visualization libraries - like the Python folium library!
Let me step you through an example of how to learn and explore with GPT-4.
Finding Some Fun Data
One of my favourite places to find cool and interesting data is data-is-plural.com. Each week, they put out a newsletter with 4–5-ish sets of interesting and eclectic data.
This site pointed me towards a site containing an Illegal Cheetah Trade dataset.
In a previous article I created a bar chart visualization showing which countries have the highest incidence rate of “cheetah interventions”.
Perhaps a polite way of saying: Hey, you aren’t supposed to have that cheetah, and we know it.
Visualizing Our Data on A Map
Now everyone loves a good choropleth map.
A choropleth map is like a heat map drawn over geography: each region (in our case, each country) is shaded according to a data value.
So how can I visualize this data on a map? Turns out that the folium library for Python does a great job — it even generates a ready-made HTML page file for us.
The next question, then: how can I supercharge the speed of writing this code in Python? Some gentle “massaging” with GPT4 will do the trick.
To start with, we can download this data as an xlsx file (from HERE).
I exported the file to a CSV file to simplify the data analysis (I named it global_cheetah_data.csv).
Alternatively, we can upload the xlsx file and ask GPT4 to convert it to a CSV (easier to work with):
GPT4 responds with an overview:
Incident Data: Includes columns like 'Incident No.', 'Incident Type', 'Incident Date', and 'Discovery Date'.
Geographic Information: Details about the locations involved, including 'Region', 'Country', and 'City/Region'.
Source and Verification: Contains information about the source of the data, such as 'Ultimate source type', 'Ult. Verif. Source', and 'Original Source'.
Incident Details: Includes specific details about the incident like the number of cheetahs involved, whether the animals were confiscated, surrendered, alive, or died, and descriptions of the incidents.
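If you would rather verify that overview locally, pandas can do the same inspection. Here's a minimal sketch using a couple of invented rows in place of the real CSV — the column names come from GPT4's summary above, but the data values are made up:

```python
import io
import pandas as pd

# Hypothetical stand-in for global_cheetah_data.csv (invented rows; the
# real file has many more columns, e.g. 'Incident Date', 'Original Source')
csv_text = """Incident No.,Incident Type,Region,Country
1,Seizure,Middle East,Saudi Arabia
2,Sighting,East Africa,Somalia
"""
df = pd.read_csv(io.StringIO(csv_text))

# The kind of structural overview GPT4 reports after an upload
print(df.shape)            # rows x columns
print(df.columns.tolist())
```

On the real file you would pass the filename to pd.read_csv instead of the StringIO stand-in.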
Once we have the CSV file ready to go, the first question to ask:
Prompt to GPT4:
Using the provided dataset, can you create the Python and folium code to generate a heatmap that identifies areas with higher or lower cheetah populations?
Response from GPT4:
Let's start by examining the contents of the provided dataset to understand its structure and the data available for creating a heatmap of cheetah populations. I'll load the data from the file global_cheetah_data.xlsx and display the first few rows to see the relevant columns:
import pandas as pd
import folium
from folium.plugins import HeatMap
# Load dataset
df = pd.read_csv("global_cheetah_data.csv")
# Create map centered at (0, 0)
map = folium.Map(location=[0, 0], zoom_start=2)
# Create a HeatMap layer using the cheetah observations
heatmap = HeatMap(data=df[['Latitude', 'Longitude']], radius=8, blur=6)
# Add HeatMap layer to the map
heatmap.add_to(map)
# Show the map
map
Great!
But, hmmm…. peeking at the Global Cheetah Observations dataset, I know that there is no “Latitude” or “Longitude” field in there.
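As an aside, a quick defensive check would have caught this before folium complained. Here's a sketch using a tiny made-up frame that mirrors the dataset's actual situation — no coordinate columns at all:

```python
import pandas as pd

# Hypothetical stand-in for the cheetah CSV: it has fields like 'Country'
# and 'Incident Type', but no coordinate columns
df = pd.DataFrame({"Country": ["Saudi Arabia"], "Incident Type": ["Seizure"]})

# folium's HeatMap needs point coordinates, so check before plotting
required = {"Latitude", "Longitude"}
missing = required - set(df.columns)
if missing:
    print(f"Cannot build a point heatmap; missing columns: {sorted(missing)}")
```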
So I need to be clearer to ChatGPT about which fields to use in the dataset. Let’s try this:
Prompt to GPT4:
Referring to the dataset, what fields may be used to create a heatmap in Python using folium?
GPT4 response:
To create a heatmap using folium in Python from the dataset, you could use the following fields: Country
OK, GPT4 knows there’s a field called Country, so let’s go with that:
Prompt to GPT4:
Can you change the above code to create the heatmap based on “Country”?
GPT4 response:
Yes, of course! Here’s the code:
import pandas as pd
import folium
from folium.plugins import HeatMap
# Load dataset
df = pd.read_csv("global_cheetah_data.csv")
# Group the dataset by country and count the number of observations per country
country_counts = df.groupby('Country').size().reset_index(name='Observations')
# Base URL for GeoJSON data of country borders
geojson_data = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data'
# Create map centered at (0, 0)
map = folium.Map(location=[0, 0], zoom_start=2)
# Create a choropleth map layer using the cheetah observations by country
choropleth = folium.Choropleth(
geo_data=f'{geojson_data}/world-countries.json',
name='choropleth',
data=country_counts,
columns=['Country', 'Observations'],
key_on='feature.properties.name',
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Number of Cheetah Observations by Country'
).add_to(map)
# Add a layer control panel to the map
folium.LayerControl().add_to(map)
# Show the map
map
And boom, we now have a field that we can use.
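The key step in that code is the groupby count, which turns one-row-per-incident data into one-row-per-country totals for the choropleth. Here's what that transformation looks like on a toy frame (invented rows, just to show the shape of the output):

```python
import pandas as pd

# Toy frame with invented rows; the real dataset has one row per incident
df = pd.DataFrame({"Country": ["Saudi Arabia", "Somalia", "Saudi Arabia", "Kenya"]})

# The same groupby/count step the generated code uses to feed the choropleth
country_counts = df.groupby("Country").size().reset_index(name="Observations")
print(country_counts)
#         Country  Observations
# 0         Kenya             1
# 1  Saudi Arabia             2
# 2       Somalia             1
```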
You can Copy/Paste the code into your favourite editor (I use PyCharm) and click Run.
So, where’s my map?
This code will display the map in a Jupyter Notebook, but not as an HTML page. With the folium library, we need to add code to explicitly save the map to an HTML file.
Prompt to GPT4:
What needs to be changed to save to an html file?
GPT4 response:
You can use the map.save() method to save the map as an HTML file to your local directory. For example:
map.save('cheetah_observations_map.html')
Wow, pretty easy — I copied and pasted this line to replace the last line in the previous code block (and made sure I had installed the folium library). And boom, here's the result:
Wow, looks good — except that the heat map display is not very heatmap-ish. We really only have cold, warm, and hot, hot, hot (i.e., Saudi Arabia).
So how can we adjust this to provide a better representation? We know from the bar chart displayed earlier that the maximum value of this representation (Saudi Arabia) is well above the average value.
Let’s ask GPT4 for some possible work-arounds here.
Prompt to GPT4:
If you are coding a heat map scale and you have a maximum value that is well above the average range, how can you account for this?
GPT4 response:
One way to do this is to use a logarithmic scale for the color scale. A logarithmic scale compresses the high end of the data range, which can help to bring out more detail in the lower end of the range.
From this GPT4 answer, we need a logarithmic scale setting. So let’s ask GPT4 one more question:
Prompt to GPT4:
Can you please provide the code that will modify the previous code to use a logarithmic scale for the color scale? And set the maximum threshold value to 1000?
GPT4 response:
By setting the log_scale parameter to True, the color scale will use a logarithmic scale instead of a linear scale.
It took me a little fiddling (trial and error) to figure that last question out. The original code displayed the variance in color, no problem, but the legend was all squished to the left.
By hard-coding the maximum (from the data, I know that Saudi Arabia has the highest value at 942), the values in the legend are distributed more reasonably.
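For the curious: one way to build such a scale yourself is to compute log-spaced bin edges and hand them to folium's Choropleth via its threshold_scale parameter. A sketch — the cap of 1000 matches the hard-coded maximum above:

```python
import numpy as np

# Log-spaced bin edges for the choropleth legend; capping at 1000 keeps
# the top bin just above the dataset's maximum (Saudi Arabia at 942)
max_value = 1000
edges = [int(round(t)) for t in np.logspace(0, np.log10(max_value), num=6)]
print(edges)  # [1, 4, 16, 63, 251, 1000]
```

Passing threshold_scale=edges into the folium.Choropleth call from the earlier code should spread the legend bins logarithmically instead of linearly.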
Save and run the new code and voila:
An actual choropleth map, created in Python using GPT4 with minimal Python coding (I tinkered with 1 line, but I think that could have been avoided with better prompt engineering).
Not perfect, but certainly a decent representation.
And the loser is… Saudi Arabia. By far the most cheetah interventions and seizures are from the country of Saudi Arabia.
In Summary…
GPT works well with the Python folium library to generate working code.
It’s worth noting (in my opinion) that if you want to go beyond basic configurations (i.e., fine-tune this choropleth map), GPT-4 seems to work like a typical Microsoft product — it does 90% of what you need very easily, but the final 10% can be a bit of a challenge.
Why? From my interactions, it appears that the model currently being used has a good understanding of the well-used Python libraries, but tends to stumble more on the less-frequently-used ones.
Or maybe that’s just me. :-)
Hope you find this useful!
Thank you for reading.
If you want to learn more about how to prompt engineer GPT-4, sign up for this free 5-day email course on Prompting GPT-4 for data visuals. Let me know what you think: