Governing the Gold Rush: Visualizing AI Policy vs. Private Investment#
Project 2: Working Across Datasets#
Hello, it’s Wuhao here! Welcome to my Project 2 notebook. The goal of this assignment is to take two different datasets, combine them in Python, and create a single visualization that shows a relationship.
For my project, I wanted to explore a topic I’m passionate about: AI Governance.
My research question is: Globally, is the adoption of national AI policies growing at the same rate as private investment in AI?
To answer this, I’ll be combining two world-class datasets:
Dataset 1: The OECD AI Policy Observatory.
Dataset 2: Our World in Data (from Stanford AI Index).
Let’s get started!
Part 1: Loading & Cleaning Dataset 1 (AI Policies)#
First, I’ll load the data on AI policies from the OECD.
OECD.AI is an online interactive platform dedicated to promoting trustworthy, human-centric artificial intelligence (AI). Launched by the Organisation for Economic Co-operation and Development in 2020, the Observatory is an essential resource for policymakers, researchers, businesses, and civil society, offering a comprehensive view of global AI initiatives, trends, and governance frameworks.
Source: OECD.AI Database of National AI Policies
import plotly.io as pio
pio.renderers.default = "notebook_connected+plotly_mimetype"
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
policies_df = pd.read_csv("oecd-ai-all-ai-policies.csv", encoding='utf-8', encoding_errors='ignore')
print("OECD AI Policy Raw Data")
policies_df.info()
policies_df.head()
OECD AI Policy Raw Data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1884 entries, 0 to 1883
Data columns (total 52 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Policy initiative ID 1884 non-null object
1 Platform URL 1884 non-null object
2 English name 1883 non-null object
3 Original name(s) 985 non-null object
4 Acronym 529 non-null object
5 Country 1884 non-null object
6 Start date 1830 non-null float64
7 End date 466 non-null float64
8 Description 1854 non-null object
9 Theme area(s) 1884 non-null object
10 Theme(s) 1884 non-null object
11 Background 1151 non-null object
12 Objective(s) 1831 non-null object
13 Target group type(s) 1826 non-null object
14 Target group(s) 1826 non-null object
15 Responsible organisation(s) 1793 non-null object
16 Yearly budget range 1884 non-null object
17 Budget amount (in local currency) 5 non-null float64
18 Has funding from private sector ? 1884 non-null bool
19 Public access URL 1546 non-null object
20 Is a structural reform ? 1884 non-null bool
21 Is evaluated ? 1884 non-null bool
22 Evaluation URL 83 non-null object
23 AI Principle(s) 1745 non-null object
24 AI Policy Area(s) 1555 non-null object
25 Other AI Policy Area(s) 18 non-null object
26 Shift(s) related to Covid 36 non-null object
27 Evaluation performed by 71 non-null object
28 Evaluation type 69 non-null object
29 Evaluation provides input to 58 non-null object
30 Policy instrument ID 1884 non-null object
31 Policy instrument type category 1837 non-null object
32 Policy instrument type 1837 non-null object
33 Policy instrument name 859 non-null object
34 Policy instrument description(s) 500 non-null object
35 Strategy priority targets and deadlines 48 non-null object
36 Coordinating institution name 20 non-null object
37 Consultation process objective 15 non-null object
38 Consultation process begin date 18 non-null object
39 Consultation process end date 12 non-null object
40 Link 431 non-null object
41 Policy instrument mini-field(s) 1351 non-null object
42 Objective 70 non-null object
43 Deployment year 48 non-null float64
44 Cancellation reason 9 non-null object
45 Entities involvement 23 non-null object
46 Allocated funding 13 non-null float64
47 Methodology in place to assess the risk and evaluate the impact of AI in public services 5 non-null object
48 Measures taken to communicate the use of the AI system to citizens (transparency) 23 non-null object
49 Measures taken to enable citizens to understand and challenge the outcome of the AI system (explainability and accountability) 4 non-null object
50 Audit, certification, monitoring, evaluation or regulation process 12 non-null object
51 Entered into force on 42 non-null object
dtypes: bool(3), float64(5), object(44)
memory usage: 726.9+ KB
| Policy initiative ID | Platform URL | English name | Original name(s) | Acronym | Country | Start date | End date | Description | Theme area(s) | ... | Objective | Deployment year | Cancellation reason | Entities involvement | Allocated funding | Methodology in place to assess the risk and evaluate the impact of AI in public services | Measures taken to communicate the use of the AI system to citizens (transparency) | Measures taken to enable citizens to understand and challenge the outcome of the AI system (explainability and accountability) | Audit, certification, monitoring, evaluation or regulation process | Entered into force on | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021/data/policyInitiatives/1335 | https://oecd.ai/en/dashboards/policy-initiativ... | SPACERESOURCES.LU | NaN | NaN | Luxembourg | 2016.0 | NaN | Within the SpaceResources.lu initiative, the c... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2021/data/policyInitiatives/1337 | https://oecd.ai/en/dashboards/policy-initiativ... | DIGITAL LUXEMBOURG | Digital L??tzebuerg | NaN | Luxembourg | 2014.0 | NaN | Consolidating Luxembourgs position in the ICT ... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2021/data/policyInitiatives/1337 | https://oecd.ai/en/dashboards/policy-initiativ... | DIGITAL LUXEMBOURG | Digital L??tzebuerg | NaN | Luxembourg | 2014.0 | NaN | Consolidating Luxembourgs position in the ICT ... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2021/data/policyInitiatives/1355 | https://oecd.ai/en/dashboards/policy-initiativ... | DIGITAL TECH FUND | NaN | NaN | Luxembourg | 2016.0 | NaN | A seed fund was set up in 2016 jointly by the ... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2021/data/policyInitiatives/13968 | https://oecd.ai/en/dashboards/policy-initiativ... | GAMEINN | NaN | NaN | Poland | 2016.0 | NaN | Funding opportunities for the producers of vid... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 52 columns
Cleaning the Policy Data#
The raw data is rich, but info() shows that the Start date column needs cleaning: it is stored as a float and has missing values (1830 non-null out of 1884 rows).
My goal is to get a global, cumulative count of policies by year.
To get there, I'll take the following steps:
Drop any rows where the Start date is missing.
Rename Start date to Year for simplicity.
Convert Year to an integer.
Filter for the modern AI era (2015 onwards).
Group by Year and count policies.
Calculate the cumsum() (cumulative sum).
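The six steps above can be sketched as a single pandas chain. This is a minimal, self-contained illustration using a tiny stand-in frame (the column names match the real OECD file, but the rows here are hypothetical):

```python
import pandas as pd

# Hypothetical mini-frame standing in for the OECD CSV
raw = pd.DataFrame({
    "Policy initiative ID": ["a", "b", "c", "d"],
    "Start date": [2016.0, None, 2015.0, 2015.0],
})

policies_by_year = (
    raw.dropna(subset=["Start date"])                   # 1. drop rows with no start date
       .rename(columns={"Start date": "Year"})          # 2. rename for readability
       .assign(Year=lambda d: d["Year"].astype(int))    # 3. float -> int
       .query("Year >= 2015")                           # 4. keep the modern AI era
       .groupby("Year")["Policy initiative ID"]         # 5. count policies per year
       .count()
       .reset_index(name="annual_policies")
       .assign(cumulative_policies=lambda d: d["annual_policies"].cumsum())  # 6. running total
)
print(policies_by_year)
```

Chaining keeps each step visible in order, which mirrors the numbered list; the cells below do the same work step by step on the real data.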
# We'll drop rows without a 'Start date'.
policies_cleaned = policies_df.dropna(subset=['Start date']).copy()
# Rename 'Start date' to 'Year' for better understanding
policies_cleaned = policies_cleaned.rename(columns={'Start date': 'Year'})
# Convert 'Year' to integer
policies_cleaned['Year'] = policies_cleaned['Year'].astype(int)
# Finally, let's check the counts for the last 10 years.
print("Years present in data:\n", policies_cleaned['Year'].value_counts().sort_index().tail(10))
Years present in data:
2015 24
2016 56
2017 85
2018 299
2019 397
2020 363
2021 263
2022 119
2023 100
2024 9
Name: Year, dtype: int64
# Filter for the modern AI era (2015 onwards)
policies_modern = policies_cleaned[policies_cleaned['Year'] >= 2015].copy()
# Group by year and count policies
policies_by_year = policies_modern.groupby('Year')['Policy initiative ID'].count().reset_index()
policies_by_year = policies_by_year.rename(columns={'Policy initiative ID': 'annual_policies'})
# Calculate the cumulative sum
policies_by_year['cumulative_policies'] = policies_by_year['annual_policies'].cumsum()
print("\nProcessed Policy Data (Global, Cumulative)")
policies_by_year.tail(10)
Processed Policy Data (Global, Cumulative)
| Year | annual_policies | cumulative_policies | |
|---|---|---|---|
| 0 | 2015 | 24 | 24 |
| 1 | 2016 | 56 | 80 |
| 2 | 2017 | 85 | 165 |
| 3 | 2018 | 299 | 464 |
| 4 | 2019 | 397 | 861 |
| 5 | 2020 | 363 | 1224 |
| 6 | 2021 | 263 | 1487 |
| 7 | 2022 | 119 | 1606 |
| 8 | 2023 | 100 | 1706 |
| 9 | 2024 | 9 | 1715 |
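As a quick sanity check on the table above, the running total should equal the sum of the annual counts, and differencing it should recover the annual series. This sketch hard-codes the annual counts from the processed table:

```python
import pandas as pd

# Annual policy counts taken from the processed table above
annual = pd.Series([24, 56, 85, 299, 397, 363, 263, 119, 100, 9],
                   index=range(2015, 2025), name="annual_policies")

cumulative = annual.cumsum()

# The running total at 2024 equals the sum of all annual counts
assert cumulative.loc[2024] == annual.sum() == 1715

# Differencing the cumulative series (and restoring the first value) recovers the annual series
recovered = cumulative.diff().fillna(cumulative.iloc[0]).astype("int64")
assert recovered.equals(annual.astype("int64"))

print(cumulative.loc[[2018, 2024]])
```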
Part 2: Loading & Cleaning Dataset 2 (AI Investment)#
Now for the AI investment. I’m using the Our World in Data (OWID) dataset, sourced from the Stanford AI Index.
Source: Our World in Data - Private Investment in AI
investment_df = pd.read_csv("private-investment-in-artificial-intelligence.csv")
print("OWID AI Investment Raw Data")
investment_df.info()
investment_df.head()
OWID AI Investment Raw Data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Entity 48 non-null object
1 Code 36 non-null object
2 Year 48 non-null int64
3 Global total private investment in AI 48 non-null int64
dtypes: int64(2), object(2)
memory usage: 1.6+ KB
| Entity | Code | Year | Global total private investment in AI | |
|---|---|---|---|---|
| 0 | China | CHN | 2013 | 717196188 |
| 1 | China | CHN | 2014 | 771392286 |
| 2 | China | CHN | 2015 | 2385249620 |
| 3 | China | CHN | 2016 | 5102962786 |
| 4 | China | CHN | 2017 | 7314146469 |
Cleaning the Investment Data#
This data is already in good shape. The only steps needed are:
Filter for just the ‘World’ total investment amount in AI.
Rename the main investment column
Ensure Year is an integer
Select only the columns we need
# Filter for just the 'World' total
investment_global = investment_df[investment_df['Entity'] == 'World'].copy()
# Convert the investment figures to billions of USD, then rename the column
investment_global['Global total private investment in AI'] = investment_global['Global total private investment in AI'] / 1000000000
investment_global = investment_global.rename(columns={
'Global total private investment in AI': 'Investment_Billions_USD'
})
# Ensure Year is an integer
investment_global['Year'] = investment_global['Year'].astype(int)
# Select only the columns we need
investment_global_clean = investment_global[['Year', 'Investment_Billions_USD']]
print("\n--- Processed Investment Data (Global, Annual) ---")
print(investment_global_clean)
--- Processed Investment Data (Global, Annual) ---
Year Investment_Billions_USD
36 2013 6.013620
37 2014 10.942456
38 2015 15.262405
39 2016 19.339919
40 2017 28.432395
41 2018 46.509286
42 2019 61.664788
43 2020 77.256670
44 2021 145.400000
45 2022 104.636244
46 2023 92.789054
47 2024 130.255020
Part 3: Merging for the Final Visualization#
Now I’ll merge the two DataFrames on the Year column.
# Merge the two datasets on the 'Year' column
merged_df = pd.merge(policies_by_year, investment_global_clean, on='Year', how='inner')
print('Merged Data for Plotting')
merged_df
Merged Data for Plotting
| Year | annual_policies | cumulative_policies | Investment_Billions_USD | |
|---|---|---|---|---|
| 0 | 2015 | 24 | 24 | 15.262405 |
| 1 | 2016 | 56 | 80 | 19.339919 |
| 2 | 2017 | 85 | 165 | 28.432395 |
| 3 | 2018 | 299 | 464 | 46.509286 |
| 4 | 2019 | 397 | 861 | 61.664788 |
| 5 | 2020 | 363 | 1224 | 77.256670 |
| 6 | 2021 | 263 | 1487 | 145.400000 |
| 7 | 2022 | 119 | 1606 | 104.636244 |
| 8 | 2023 | 100 | 1706 | 92.789054 |
| 9 | 2024 | 9 | 1715 | 130.255020 |
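One thing the inner join does silently is drop years present in only one table (the investment data starts in 2013, before the filtered policy data begins). A quick way to audit this is an outer merge with pandas' `indicator` option; this sketch uses small stand-in frames rather than the full data:

```python
import pandas as pd

# Stand-in frames: policies cover 2015-2016, investment covers 2013-2016
policies = pd.DataFrame({"Year": [2015, 2016], "annual_policies": [24, 56]})
investment = pd.DataFrame({"Year": [2013, 2014, 2015, 2016],
                           "Investment_Billions_USD": [6.0, 10.9, 15.3, 19.3]})

# how='outer' keeps every year; indicator='source' labels where each row came from
audit = pd.merge(policies, investment, on="Year", how="outer", indicator="source")

# Rows an inner join would have dropped
print(audit[audit["source"] != "both"])
```

Here the audit shows 2013 and 2014 exist only on the investment side, which confirms the inner join is dropping exactly the pre-policy years.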
Part 4: The Main Visualization#
In this section I’ll use a dual-axis chart to show annual_policies (bars) and Investment_Billions_USD (line).
# Create a figure with a secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])
# Add Annual Policies as a Bar chart
fig.add_trace(
go.Bar(
x=merged_df['Year'],
y=merged_df['annual_policies'],
name='Annual Number of New AI Policies',
marker_color='royalblue'
),
secondary_y=False,
)
# Add Annual Investment as a Line chart
fig.add_trace(
go.Scatter(
x=merged_df['Year'],
y=merged_df['Investment_Billions_USD'],
name='Annual AI Investment (Billions USD)',
marker_color='red'
),
secondary_y=True,
)
# Add figure titles and axis labels
fig.update_layout(
title_text='<b>AI Policy Adoption vs. Private Investment (Global)</b>',
xaxis_title='Year',
legend_title='Metrics',
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
)
)
# Set the y-axes titles
fig.update_yaxes(
title_text='Annual Number of New AI Policies',
secondary_y=False,
color='royalblue'
)
fig.update_yaxes(
title_text='Annual Private AI Investment (Billions USD)',
secondary_y=True,
color='red'
)
# Make the X-axis show proper years
fig.update_xaxes(
tickvals=merged_df['Year']
)
# Display the interactive plot
fig.show()
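Dual-axis charts can mislead, because each axis is scaled independently. An alternative worth sketching is to index both series to 100 at 2015 so they share one axis; this uses the merged table's values inlined as a stand-in for `merged_df`:

```python
import pandas as pd

# Values from the merged table above, inlined for a self-contained sketch
merged = pd.DataFrame({
    "Year": range(2015, 2025),
    "annual_policies": [24, 56, 85, 299, 397, 363, 263, 119, 100, 9],
    "Investment_Billions_USD": [15.26, 19.34, 28.43, 46.51, 61.66,
                                77.26, 145.40, 104.64, 92.79, 130.26],
}).set_index("Year")

# Rebase each column to 100 at its 2015 value so both series fit one axis
indexed = merged / merged.iloc[0] * 100
print(indexed.round(0).loc[[2015, 2021]])
```

On this common scale, annual policy counts at their 2019 peak stood at over 16x their 2015 level, while investment at its 2021 peak stood at under 10x, so the indexed view tells a somewhat different story than the dual-axis one.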
Part 5: Takeaways#
This “annual vs. annual” chart tells a complex and interesting story about proactive governments and an explosive market.
Takeaway 1: Policy and Investment Come in Waves#
Policy (Blue Bars): Starting around 2018, governments around the world suddenly got busy. There’s a clear policy wave with new strategies, regulations, guidelines—building up year after year and hitting a peak around 2020.
Investment (Red Line): Private investment doesn’t follow that pattern at all. Instead of increasing steadily, it goes absolutely vertical in 2021.
These aren’t two curves following each other; they are two totally different rhythms.
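One way to quantify the “different rhythms” claim is to compare year-over-year growth rates of the two series. This sketch inlines the values from the merged table above (it is an illustration, not part of the original pipeline):

```python
import pandas as pd

# Values from the merged table above
merged = pd.DataFrame({
    "Year": range(2015, 2025),
    "annual_policies": [24, 56, 85, 299, 397, 363, 263, 119, 100, 9],
    "Investment_Billions_USD": [15.26, 19.34, 28.43, 46.51, 61.66,
                                77.26, 145.40, 104.64, 92.79, 130.26],
}).set_index("Year")

# Year-over-year percentage change for each series
growth = merged.pct_change() * 100
print(growth.round(1).loc[[2018, 2021]])
```

The largest policy growth spike lands in 2018 (roughly +250% year over year), while the largest investment spike lands in 2021 (roughly +88%), which is the timing gap the takeaway describes.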
Takeaway 2: Governments Weren’t Reacting. They Were Preparing.#
This is the most critical insight, and it overturns the common assumption that governments only regulate AI reactively.
The data shows governments were proactive. The global policy wave (2018-2020) clearly precedes the 2021 investment explosion. This suggests that governments saw the AI “gold rush” coming and were actively trying to build frameworks, strategies, and guardrails before the market peaked.
Takeaway 3: The Market’s Scale is Unimaginable#
Even though governments were proactive, the sheer scale of the 2021 investment spike ($140B+) shows that the market’s eventual force was beyond anyone’s predictions.
This suggests that while policy can be forward-thinking, it cannot fully contain or predict the explosive, speculative nature of a technological gold rush.
This finding is made even stronger by our knowledge from the readme.doc. That $140B+ spike is a conservative underestimate that excludes all R&D from public companies (Google, Microsoft, etc.) and all public spending. The true market explosion that policymakers were trying to get ahead of was even larger.
Takeaway 4: The Post-2021 Policy Decline#
The chart shows a sharp drop in new AI policies after 2021. This doesn’t mean governments gave up on governance. Rather, it signals a critical shift into the second phase of policymaking.
Phase I (2018-2020): This was the High-Level Strategy phase. Governments were racing to publish broad National AI Strategy blueprints, leading to the 2020 spike.
Phase II (2021-Present): This is the Execution & Regulation phase. The focus shifted from announcing new strategies to the much slower, harder work of writing specific regulations (like the multi-year EU AI Act) and handling implementation details. This work is more difficult, takes far longer, and doesn’t appear in the database as a large number of new initiatives.
Part 6: Conclusions#
This project successfully combined two datasets to illustrate the complex relationship between AI governance and private investment.
Our final analysis, using a more rigorous “annual vs. annual” comparison, refutes the simple narrative of a “governance gap.” Instead, it reveals a more sophisticated story: Proactive governments laid the policy groundwork from 2018-2020, only to be followed by a private investment explosion in 2021 of a magnitude no one could have fully anticipated. After this peak, policymaking has shifted from “strategic breadth” to “regulatory depth,” entering a slower, more difficult phase of implementation.
It’s a great reminder that in AI, the visualization you choose isn’t just about aesthetics; it can completely change the narrative.
Part 7: Data Sources#
AI Policy Data:
Source: OECD.AI Policy Observatory
Dataset: “Database of National AI Policies”
Link: https://wp.oecd.ai/app/uploads/2024/03/oecd-ai-all-ai-policies.csv
AI Investment Data:
Source: Our World in Data
Dataset: “Global total private investment in AI”
Link: https://ourworldindata.org/grapher/private-investment-in-artificial-intelligence