Data at Depth 12: Surging Substack, Creative Reflections, Streamlit GIS Tutorial

Loving Substack, Intrinsic/extrinsic motivation, GIS forest fire maps with GPT-4 prompting

John Loewen

May 02, 2024

Dall-E image: mapping forest fire data for Canada.

Today, a few topics:

My Data at Depth: documenting my own creator journey - my raw and exposed data for the past 2 weeks. Substack is trending very well, Medium/LinkedIn are slow, and there is a definite uptick on Twitter-X.
Thinking and Creating: Some insight into what I have been creating and doing during the past 2 weeks (6 new articles).
Doing - Tutorial: Prompting GPT-4 to filter GIS data and display in an interactive Python Streamlit map.

Let’s get to the details!

1. My Data at Depth - My Real Statistics

Who doesn’t like to see someone else’s numbers? What platforms have done well for me the past two weeks? What has slowed down? Am I making ANY progress?

My Substack:

Substack growth for the past 2 weeks has been fantastic! I have 33 new subscribers including 2 new paid subscribers (thank you so much). Your support makes all this work worthwhile - VERY much appreciated.

And with this increase, I have seen a massive uptick in views. I finally reached my goal of averaging over 100 views a day, and then some (3.47K for the last 30 days).

This is all fantastic. I am attributing this growth to my continued dedication to engagement and to regularly publishing useful articles on this platform.

Substack is now the platform that I spend the most time on - by far.

My Medium:

My Medium stats for the past 2 weeks have been a bit slow.

And the results show in my earnings: this was my slowest month in the past 7 months. I am not surprised by this, though, as I am spending less time on Medium and more time on Substack.

I will continue to post articles on Medium as I have almost 13,000 followers and am generating significant traffic to my Substack newsletter.

My LinkedIn:

It has been somewhat slower on LinkedIn than in my previous 2 weeks. By the numbers, a few articles garnered some interest:

It’s my Python-related articles that do very well here. I post to a group called Python Web Developers - my Streamlit/Plotly tutorials do very well in this group.

Now even though the impressions were down, new followers were similar to my previous 2 weeks (75 in total). I am now approaching the 1500 mark.

My Twitter-X:

In my last newsletter, I pledged and committed to increasing my presence on Twitter-X. And I have! I signed up to a premium account and now I have the blue tick (https://twitter.com/jgloewen). With regular posting and engagement my numbers have skyrocketed - which is what happens when you start from zero.

I have been reading and applying a lot of what online guru Tim Denning has to say about managing your Twitter account - and so far it’s working pretty well.

2. Thinking and Creating

For the past 2 weeks I have had my nose to the grindstone on the creator front. It has been a really nice combination of thinking and creating.

In the spirit of life-long learning, I’ve been looking more deeply into how to tell better stories.

I have been spending time reading and listening to Lawrence Yeo, a well-known writer who has the ability to tell amazing stories from what most would consider “the mundane”.

He talks about looking at your reasons for writing. It boils down to intrinsic vs extrinsic motivation:

Extrinsic folks are more likely to burn out or quit when the money or notoriety doesn’t materialize
Intrinsic folks are more likely to find satisfaction in writing good content over the long haul

Where do I lie on all this? I certainly started out a year ago with extrinsic goals. It’s fair to say that in the beginning, my writing was garbage.

I found no joy in this approach. Recently, I have been investing more time and resources on writing better quality content.

The intrinsic vs extrinsic spectrum - and where I currently sit (it can change!)

And I feel like I’m finally sliding to the left. But even though it looks flat in this diagram, it can still be a slippery slope!

Where do you fit on this spectrum?

Articles I’ve written in the past 2 weeks:

Since my last newsletter update (Apr 18) I have published/submitted 6 articles.

3 articles are in the realm of AI, GPT-4, and Python:

Buckle up for Beautiful Python Plotly Mapping With GPT-4 Prompting
Creating an Awesome Interactive Dashboard using Python StreamLit
Sinfully Simple GPT-4 Prompting For Stunning Streamlit Interactive Maps

And 3 articles in the domain of in-depth data analysis/data storytelling:

The Art of Data Science: Crafting Better Data Visuals (doing very well)
W.E.B. Du Bois: Five Beautiful Visuals That Paved The Way For Modern Data Science
The Great Divide: What Happens When AI Models Meet Human Reason

And lastly, for the next two weeks my focus will be:

Continued Twitter (X) engagement - follow posting and engagement techniques recommended by the Twi-X gurus.
Product generation ideation - how-to manuals and book ideas (yep, time to start thinking this through - I do have a live ConvertKit account)
Continued outreach and engagement on Substack to grow my audience here.
More reading and research in how to integrate deeper storytelling into my writing.

Item #2 is a new one on my list - it’s been on my mind for a while, but now I am setting aside time to focus on the ideation phase. Stay tuned!

3. Tutorial: GPT-4 Prompting For Streamlit Interactive GIS Maps

This week, I went back and revisited an article I wrote earlier about how to prompt GPT-4 to access a set of GIS data points in a CSV file and to draw it on a map.

I wanted to know how well this would work using the Python Streamlit library.

The Problem and the Dataset

The forest fire situation in Canada over the past 10 years or so has been pretty terrible - particularly in British Columbia, which is where I am from.

To show the monthly effects of forest fires, I want to create a data visual that shows forest fires over time (by month) for British Columbia.

Now I know that NASA provides a comprehensive data set on global forest fires, so let’s start there!

The dataset (in GIS lat/long points) can be downloaded from NASA’s online library (HERE).

You can create a new request and select parameters (including a date range) for your download:

Accessing and downloading 2021 forest fire data for Canada

NOTE: When I requested this data set, it took about 2 hours for my request to be approved (via email).

I chose 2021 as the year because this was a particularly bad year for forest fires in my home province of British Columbia.

Once I received the notification email, I downloaded the file, unzipped it, and saved it as canada_fires_2021.csv.

Next, you can upload this file to GPT-4 and prompt GPT-4 for a list of the data fields.

The response from GPT-4 gives me all the fields in the dataset (there are 15 in total). The relevant fields for our data visualizations are:

latitude: The latitude coordinate of the fire detection.
longitude: The longitude coordinate of the fire detection.
acq_date: The acquisition date of the fire detection.
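If you would rather check the field names locally instead of asking GPT-4, here is a minimal sketch using only Python's built-in csv module. The two-line sample below is made up and only includes the three fields named above (the real download has more columns), and csv_fields is a helper name I chose for illustration.

```python
import csv
import io


def csv_fields(handle):
    """Return the column names from the header row of an open CSV file."""
    return next(csv.reader(handle))


# Made-up sample standing in for the real NASA download (assumption:
# the real file has a header row that includes these three names).
sample = io.StringIO("latitude,longitude,acq_date\n50.1,-120.5,2021-07-01\n")
fields = csv_fields(sample)
print(fields)

# With the real file, something like this should work:
# with open("canada_fires_2021.csv", newline="") as f:
#     print(csv_fields(f))
```

This only reads the first line, so it stays fast even on a large download.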

Next, to further refine our area of interest (for example, British Columbia), we need a way to determine whether a point (i.e. the location of a fire) falls within that area.

We can use a geoJSON file to give us the boundaries of each province in Canada. This file allows us to narrow our results to display fires for just British Columbia (BC).

Luckily, there are publicly accessible geoJSON files for Canada. We can download one from HERE.
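To make the "is this point inside the boundary?" idea concrete, here is a small pure-Python sketch of a ray-casting test. This is only an illustration of the concept, not what GeoPandas runs internally, and the rectangle below is a hypothetical rough box that I made up, not the real BC boundary from the geoJSON file.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex is
    joined back to the first to close the shape.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings that lie to the east of the point
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rough box (assumption), NOT the real BC boundary
rough_box = [(-139.0, 48.3), (-114.0, 48.3), (-114.0, 60.0), (-139.0, 60.0)]

in_box = point_in_polygon(-123.1, 49.3, rough_box)   # a point near Vancouver
out_box = point_in_polygon(-79.4, 43.7, rough_box)   # a point near Toronto
print(in_box, out_box)
```

A real provincial boundary has thousands of vertices and several separate islands, which is why it makes sense to hand this job to a GIS library and a proper boundary file.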

Now to filter all the data to just the fires in BC, I can upload both of our relevant files and prompt GPT-4:

Response from GPT-4:

import geopandas as gpd
import pandas as pd

# Load the fire dataset
fire_data_path = 'canada_fires_2021.csv'
df = pd.read_csv(fire_data_path)

# Load the provided GeoJSON file for Canada's geographical data, filter for BC
canada_geojson_path = 'canada.geojson'  # Replace with your file path
canada_geojson = gpd.read_file(canada_geojson_path)
bc_geo = canada_geojson[canada_geojson['name'] == 'British Columbia']

# Creating a GeoDataFrame for the fire data
fires_gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))
# Setting the CRS for the fires GeoDataFrame to match the GeoJSON CRS
fires_gdf.set_crs(bc_geo.crs, inplace=True)

# Performing a spatial join to filter out only the fires within British Columbia's boundaries
bc_fires = gpd.sjoin(fires_gdf, bc_geo, how="inner", predicate='intersects')

# Dropping the geometry column and any additional columns from the spatial join
bc_fires_filtered = bc_fires.drop(columns=[col for col in bc_fires.columns if col.startswith('index') or col == 'geometry'])

# Save the file
bc_fires_filtered.to_csv('BC_fires_2021.csv', index=False)

NOTE: You may need to install the pandas and GeoPandas libraries if they are not already installed.

From GPT-4, you can Copy/Paste the generated code into your favourite Python editor (I use PyCharm), Save it, and Run it.

You will now have a BC-focused dataset called BC_fires_2021.csv.
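Before moving on, it can be worth a quick sanity check that the filtered file has detections spread across the months you expect. Here is a minimal sketch using only the standard library. The sample rows are made up, detections_per_month is a name I chose, and I am assuming acq_date is stored as YYYY-MM-DD (check your own file).

```python
import csv
import io
from collections import Counter
from datetime import datetime


def detections_per_month(handle):
    """Count rows per month name, based on the acq_date column."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # Assumption: acq_date looks like 2021-07-01
        month = datetime.strptime(row["acq_date"], "%Y-%m-%d").strftime("%B")
        counts[month] += 1
    return counts


# Made-up rows standing in for BC_fires_2021.csv
sample_rows = io.StringIO(
    "latitude,longitude,acq_date\n"
    "50.1,-120.5,2021-07-01\n"
    "51.3,-121.0,2021-07-15\n"
    "49.8,-119.4,2021-08-02\n"
)
monthly_counts = detections_per_month(sample_rows)
print(monthly_counts)

# With the real file, something like this should work:
# with open("BC_fires_2021.csv", newline="") as f:
#     print(detections_per_month(f))
```

If a month you expected is missing entirely, that is a hint to re-check the date range of the original download or the filtering step.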

Now I can prompt GPT-4 for the Streamlit code to create the interactive dashboard.

Interactive Map For Fires of British Columbia

For this Streamlit interactive map, we want to create a dropdown menu that lists each month. The user selects a month and the resulting map shows all of the fires in British Columbia that were detected during that month (based on the acq_date field).

We first start by uploading our newly created file BC_fires_2021.csv.

Prompt to GPT-4: Using the attached file BC_fires_2021.csv, generate the Python Streamlit code that creates a map using the data from these coordinates. Please create a dropdown menu that displays by month. Make sure you convert to a written month. Increase the point radius size to 5000.

Response from GPT-4:

import streamlit as st
import pandas as pd
import pydeck as pdk  # For map visualization

# Function to load and preprocess data
def load_data():
    data = pd.read_csv('BC_fires_2021.csv')
    # Convert acq_date from string to datetime
    data['acq_date'] = pd.to_datetime(data['acq_date'])
    # Extract month and map it to month name
    data['month'] = data['acq_date'].dt.month_name()
    return data

# Load your dataset
data = load_data()

# Streamlit app
st.title('Map Visualization of Coordinates')

# Dropdown to select month
month_list = data['month'].unique()
selected_month = st.selectbox('Select a Month', month_list)

# Filter data based on selected month
filtered_data = data[data['month'] == selected_month]

# Display map only if there is data for the selected month
if not filtered_data.empty:
    st.pydeck_chart(pdk.Deck(
        map_style='mapbox://styles/mapbox/light-v9',
        initial_view_state=pdk.ViewState(
            latitude=filtered_data['latitude'].mean(),
            longitude=filtered_data['longitude'].mean(),
            zoom=5,
            pitch=50,
        ),
        layers=[
            pdk.Layer(
                'ScatterplotLayer',
                data=filtered_data,
                get_position='[longitude, latitude]',
                get_color='[200, 30, 0, 160]',
                get_radius=5000,
            ),
        ],
    ))
else:
    st.write('No data available for the selected month.')

Presto, working Python Streamlit code without any intervention.

GPT-4 was able to load the dataset into a pandas data frame and create a map using the pydeck library. The longitude and latitude fields are accessed from the data frame as part of the map creation.

In this code, we can see the added lines for the dropdown menu (using a month field that is derived from the acq_date column in our CSV file).
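One thing to watch for: unique() returns the months in the order they first appear in the file, which may not be calendar order. If your dropdown comes out jumbled, here is a small sketch for sorting month names using the built-in calendar module. The helper name calendar_order is my own, and it assumes English month names like the ones pandas produces by default.

```python
import calendar


def calendar_order(months):
    """Return the distinct month names sorted January to December."""
    # calendar.month_name[0] is an empty string, so skip it
    order = {name: i for i, name in enumerate(calendar.month_name) if name}
    return sorted(set(months), key=order.__getitem__)


ordered = calendar_order(["July", "August", "July", "April"])
print(ordered)  # ['April', 'July', 'August']
```

In the app, the result of calendar_order(data['month']) could be passed to st.selectbox in place of month_list.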

And it is the pydeck library that builds the map, which Streamlit then draws to our webpage (using the st.pydeck_chart() function). A map style is provided to the pydeck Deck object as well as the latitude and longitude values from our data frame (filtered_data).

NOTE: You may need to install the streamlit and pydeck libraries if they are not already installed.
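If you are not sure which libraries you already have, here is a quick sketch that checks for the four libraries used in this tutorial without importing them. The list is based on the import statements in the code above, and the pip hint assumes you install with pip (adjust for conda or another tool).

```python
from importlib.util import find_spec

# Libraries imported by the two scripts in this tutorial
REQUIRED = ["pandas", "geopandas", "streamlit", "pydeck"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in REQUIRED if find_spec(name) is None]

if missing:
    print("Missing libraries, try: pip install " + " ".join(missing))
else:
    print("All tutorial libraries are installed.")
```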

Let’s load this code into our Python editor of choice, Save it, and Run it.

I use PyCharm on Mac to write my Python code. Here is my PyCharm project for this tutorial:

I have highlighted the 3 files that are needed in addition to the Python files that are created:

BC_fires_2021.csv - file generated from Python code (above)
canada_fires_2021.csv - original file downloaded from the NASA site
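canada_geoJSON.json - JSON file with the boundary coordinates for each province in Canada (the filtering code above uses the path canada.geojson, so adjust that path to match your own file name).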

Running Our Streamlit App

PyCharm has a built-in terminal window that I can open to run the code in my Project:

You can see from the screenshot above that the Streamlit app is running on localhost. It will be displayed in the default browser on port 8502.

And if the GPT-4 code generated is indeed correct, the beautiful result:

Terrific. We have a dropdown menu to choose by month (highlighted in red). Each data point for that month is shown on the map. I increased the radius for each fire point (to 5000) to clearly show each point (the default value for this property is quite small).

GPT-4 really shines at this type of Python code generation. It creates ready-to-use interactive dashboard code out of the box.

Now if you’ve used Large Language Models (LLMs) like GPT-4 to write code for you, you probably know that there is a disclaimer needed here.

These tools are still inconsistent in how they generate code.

In my experience, GPT-4 creates working code to solve this problem, but the code it generates is not always exactly the same.

So don’t be surprised if your results don’t look exactly the same as mine.

Let me know how it went for you!

And thank you for reading. See you again soon!

If you want to learn more about how to prompt engineer GPT-4, sign up for this free 5-day email course on Prompting GPT-4 for data visuals. Let me know what you think:

https://stats-and-stories.ck.page/prompting_gpt4