Governing the Gold Rush: Visualizing AI Policy vs. Private Investment#
Project 2: Working Across Datasets#
Hello, it’s Wuhao here! Welcome to my Project 2 notebook. The goal of this assignment is to take two different datasets, combine them in Python, and create a single visualization that shows a relationship.
For my project, I wanted to explore a topic I’m passionate about: AI Governance.
My research question is: Globally, is the adoption of national AI policies growing at the same rate as private investment in AI?
To answer this, I’ll be combining two world-class datasets:
Dataset 1: The OECD AI Policy Observatory.
Dataset 2: Our World in Data (from Stanford AI Index).
Let’s get started!
Part 1: Loading & Cleaning Dataset 1 (AI Policies)#
First, I’ll load the data on AI policies from the OECD.
OECD.AI is an online interactive platform dedicated to promoting trustworthy, human-centric artificial intelligence (AI). Launched by the Organisation for Economic Co-operation and Development in 2020, the Observatory is an essential resource for policymakers, researchers, businesses, and civil society, offering a comprehensive view of global AI initiatives, trends, and governance frameworks.
Source: OECD.AI Database of National AI Policies
import plotly.io as pio
pio.renderers.default = "notebook_connected+plotly_mimetype"
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
policies_df = pd.read_csv("oecd-ai-all-ai-policies.csv", encoding='utf-8', encoding_errors='ignore')
print("OECD AI Policy Raw Data")
policies_df.info()
policies_df.head()
OECD AI Policy Raw Data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1884 entries, 0 to 1883
Data columns (total 52 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Policy initiative ID 1884 non-null object
1 Platform URL 1884 non-null object
2 English name 1883 non-null object
3 Original name(s) 985 non-null object
4 Acronym 529 non-null object
5 Country 1884 non-null object
6 Start date 1830 non-null float64
7 End date 466 non-null float64
8 Description 1854 non-null object
9 Theme area(s) 1884 non-null object
10 Theme(s) 1884 non-null object
11 Background 1151 non-null object
12 Objective(s) 1831 non-null object
13 Target group type(s) 1826 non-null object
14 Target group(s) 1826 non-null object
15 Responsible organisation(s) 1793 non-null object
16 Yearly budget range 1884 non-null object
17 Budget amount (in local currency) 5 non-null float64
18 Has funding from private sector ? 1884 non-null bool
19 Public access URL 1546 non-null object
20 Is a structural reform ? 1884 non-null bool
21 Is evaluated ? 1884 non-null bool
22 Evaluation URL 83 non-null object
23 AI Principle(s) 1745 non-null object
24 AI Policy Area(s) 1555 non-null object
25 Other AI Policy Area(s) 18 non-null object
26 Shift(s) related to Covid 36 non-null object
27 Evaluation performed by 71 non-null object
28 Evaluation type 69 non-null object
29 Evaluation provides input to 58 non-null object
30 Policy instrument ID 1884 non-null object
31 Policy instrument type category 1837 non-null object
32 Policy instrument type 1837 non-null object
33 Policy instrument name 859 non-null object
34 Policy instrument description(s) 500 non-null object
35 Strategy priority targets and deadlines 48 non-null object
36 Coordinating institution name 20 non-null object
37 Consultation process objective 15 non-null object
38 Consultation process begin date 18 non-null object
39 Consultation process end date 12 non-null object
40 Link 431 non-null object
41 Policy instrument mini-field(s) 1351 non-null object
42 Objective 70 non-null object
43 Deployment year 48 non-null float64
44 Cancellation reason 9 non-null object
45 Entities involvement 23 non-null object
46 Allocated funding 13 non-null float64
47 Methodology in place to assess the risk and evaluate the impact of AI in public services 5 non-null object
48 Measures taken to communicate the use of the AI system to citizens (transparency) 23 non-null object
49 Measures taken to enable citizens to understand and challenge the outcome of the AI system (explainability and accountability) 4 non-null object
50 Audit, certification, monitoring, evaluation or regulation process 12 non-null object
51 Entered into force on 42 non-null object
dtypes: bool(3), float64(5), object(44)
memory usage: 726.9+ KB
| Policy initiative ID | Platform URL | English name | Original name(s) | Acronym | Country | Start date | End date | Description | Theme area(s) | ... | Objective | Deployment year | Cancellation reason | Entities involvement | Allocated funding | Methodology in place to assess the risk and evaluate the impact of AI in public services | Measures taken to communicate the use of the AI system to citizens (transparency) | Measures taken to enable citizens to understand and challenge the outcome of the AI system (explainability and accountability) | Audit, certification, monitoring, evaluation or regulation process | Entered into force on | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021/data/policyInitiatives/1335 | https://oecd.ai/en/dashboards/policy-initiativ... | SPACERESOURCES.LU | NaN | NaN | Luxembourg | 2016.0 | NaN | Within the SpaceResources.lu initiative, the c... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2021/data/policyInitiatives/1337 | https://oecd.ai/en/dashboards/policy-initiativ... | DIGITAL LUXEMBOURG | Digital L??tzebuerg | NaN | Luxembourg | 2014.0 | NaN | Consolidating Luxembourgs position in the ICT ... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2021/data/policyInitiatives/1337 | https://oecd.ai/en/dashboards/policy-initiativ... | DIGITAL LUXEMBOURG | Digital L??tzebuerg | NaN | Luxembourg | 2014.0 | NaN | Consolidating Luxembourgs position in the ICT ... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2021/data/policyInitiatives/1355 | https://oecd.ai/en/dashboards/policy-initiativ... | DIGITAL TECH FUND | NaN | NaN | Luxembourg | 2016.0 | NaN | A seed fund was set up in 2016 jointly by the ... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2021/data/policyInitiatives/13968 | https://oecd.ai/en/dashboards/policy-initiativ... | GAMEINN | NaN | NaN | Poland | 2016.0 | NaN | Funding opportunities for the producers of vid... | National AI Policies | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 52 columns
Cleaning the Policy Data#
The raw data is rich, but info() shows that the Start date column needs cleaning: it is stored as a float and has missing values (1830 non-null out of 1884 rows).
My goal is to get a global, cumulative count of policies by year.
To get there, I'll take the following steps:
Drop any rows where the Start date is missing.
Rename Start date to Year for simplicity.
Convert Year to an integer.
Filter for the modern AI era (2015 onwards).
Group by Year and count policies.
Calculate the cumsum() (cumulative sum).
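The six steps above can be sketched as a single pandas chain. This is a minimal, self-contained illustration using a tiny stand-in frame (the column names match the real OECD file, but the rows here are hypothetical):

```python
import pandas as pd

# Hypothetical mini-frame standing in for the OECD CSV
raw = pd.DataFrame({
    "Policy initiative ID": ["a", "b", "c", "d"],
    "Start date": [2016.0, None, 2015.0, 2015.0],
})

policies_by_year = (
    raw.dropna(subset=["Start date"])                   # 1. drop rows with no start date
       .rename(columns={"Start date": "Year"})          # 2. rename for readability
       .assign(Year=lambda d: d["Year"].astype(int))    # 3. float -> int
       .query("Year >= 2015")                           # 4. keep the modern AI era
       .groupby("Year")["Policy initiative ID"]         # 5. count policies per year
       .count()
       .reset_index(name="annual_policies")
       .assign(cumulative_policies=lambda d: d["annual_policies"].cumsum())  # 6. running total
)
print(policies_by_year)
```

Chaining keeps each step visible in order, which mirrors the numbered list; the cells below do the same work step by step on the real data.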
# We'll drop rows without a 'Start date'.
policies_cleaned = policies_df.dropna(subset=['Start date']).copy()
# Rename 'Start date' to 'Year' for better understanding
policies_cleaned = policies_cleaned.rename(columns={'Start date': 'Year'})
# Convert 'Year' to integer
policies_cleaned['Year'] = policies_cleaned['Year'].astype(int)
# Finally, let's check the counts for the last 10 years.
print("Years present in data:\n", policies_cleaned['Year'].value_counts().sort_index().tail(10))
Years present in data:
2015 24
2016 56
2017 85
2018 299
2019 397
2020 363
2021 263
2022 119
2023 100
2024 9
Name: Year, dtype: int64
# Filter for the modern AI era (2015 onwards)
policies_modern = policies_cleaned[policies_cleaned['Year'] >= 2015].copy()
# Group by year and count policies
policies_by_year = policies_modern.groupby('Year')['Policy initiative ID'].count().reset_index()
policies_by_year = policies_by_year.rename(columns={'Policy initiative ID': 'annual_policies'})
# Calculate the cumulative sum
policies_by_year['cumulative_policies'] = policies_by_year['annual_policies'].cumsum()
print("\nProcessed Policy Data (Global, Cumulative)")
policies_by_year.tail(10)
Processed Policy Data (Global, Cumulative)
| Year | annual_policies | cumulative_policies | |
|---|---|---|---|
| 0 | 2015 | 24 | 24 |
| 1 | 2016 | 56 | 80 |
| 2 | 2017 | 85 | 165 |
| 3 | 2018 | 299 | 464 |
| 4 | 2019 | 397 | 861 |
| 5 | 2020 | 363 | 1224 |
| 6 | 2021 | 263 | 1487 |
| 7 | 2022 | 119 | 1606 |
| 8 | 2023 | 100 | 1706 |
| 9 | 2024 | 9 | 1715 |
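As a quick sanity check on the table above, the running total should equal the sum of the annual counts, and differencing it should recover the annual series. This sketch hard-codes the annual counts from the processed table:

```python
import pandas as pd

# Annual policy counts taken from the processed table above
annual = pd.Series([24, 56, 85, 299, 397, 363, 263, 119, 100, 9],
                   index=range(2015, 2025), name="annual_policies")

cumulative = annual.cumsum()

# The running total at 2024 equals the sum of all annual counts
assert cumulative.loc[2024] == annual.sum() == 1715

# Differencing the cumulative series (and restoring the first value) recovers the annual series
recovered = cumulative.diff().fillna(cumulative.iloc[0]).astype("int64")
assert recovered.equals(annual.astype("int64"))

print(cumulative.loc[[2018, 2024]])
```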
Part 2: Loading & Cleaning Dataset 2 (AI Investment)#
Now for the AI investment. I’m using the Our World in Data (OWID) dataset, sourced from the Stanford AI Index.
Source: Our World in Data - Private Investment in AI
investment_df = pd.read_csv("private-investment-in-artificial-intelligence.csv")
print("OWID AI Investment Raw Data")
investment_df.info()
investment_df.head()
OWID AI Investment Raw Data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Entity 48 non-null object
1 Code 36 non-null object
2 Year 48 non-null int64
3 Global total private investment in AI 48 non-null int64
dtypes: int64(2), object(2)
memory usage: 1.6+ KB
| Entity | Code | Year | Global total private investment in AI | |
|---|---|---|---|---|
| 0 | China | CHN | 2013 | 717196188 |
| 1 | China | CHN | 2014 | 771392286 |
| 2 | China | CHN | 2015 | 2385249620 |
| 3 | China | CHN | 2016 | 5102962786 |
| 4 | China | CHN | 2017 | 7314146469 |
Cleaning the Investment Data#
This data is already in good shape. The only steps needed are:
Filter for just the ‘World’ total investment amount in AI.
Rename the main investment column
Ensure Year is an integer
Select only the columns we need
# Filter for just the 'World' total
investment_global = investment_df[investment_df['Entity'] == 'World'].copy()
# Convert the investment figures to billions of USD, then rename the column
investment_global['Global total private investment in AI'] = investment_global['Global total private investment in AI'] / 1000000000
investment_global = investment_global.rename(columns={
'Global total private investment in AI': 'Investment_Billions_USD'
})
# Ensure Year is an integer
investment_global['Year'] = investment_global['Year'].astype(int)
# Select only the columns we need
investment_global_clean = investment_global[['Year', 'Investment_Billions_USD']]
print("\n--- Processed Investment Data (Global, Annual) ---")
print(investment_global_clean)
--- Processed Investment Data (Global, Annual) ---
Year Investment_Billions_USD
36 2013 6.013620
37 2014 10.942456
38 2015 15.262405
39 2016 19.339919
40 2017 28.432395
41 2018 46.509286
42 2019 61.664788
43 2020 77.256670
44 2021 145.400000
45 2022 104.636244
46 2023 92.789054
47 2024 130.255020
Part 3: Merging for the Final Visualization#
Now I’ll merge the two DataFrames on the Year column.
# Merge the two datasets on the 'Year' column
merged_df = pd.merge(policies_by_year, investment_global_clean, on='Year', how='inner')
print('Merged Data for Plotting')
merged_df
Merged Data for Plotting
| Year | annual_policies | cumulative_policies | Investment_Billions_USD | |
|---|---|---|---|---|
| 0 | 2015 | 24 | 24 | 15.262405 |
| 1 | 2016 | 56 | 80 | 19.339919 |
| 2 | 2017 | 85 | 165 | 28.432395 |
| 3 | 2018 | 299 | 464 | 46.509286 |
| 4 | 2019 | 397 | 861 | 61.664788 |
| 5 | 2020 | 363 | 1224 | 77.256670 |
| 6 | 2021 | 263 | 1487 | 145.400000 |
| 7 | 2022 | 119 | 1606 | 104.636244 |
| 8 | 2023 | 100 | 1706 | 92.789054 |
| 9 | 2024 | 9 | 1715 | 130.255020 |
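One thing the inner join does silently is drop years present in only one table (the investment data starts in 2013, before the filtered policy data begins). A quick way to audit this is an outer merge with pandas' `indicator` option; this sketch uses small stand-in frames rather than the full data:

```python
import pandas as pd

# Stand-in frames: policies cover 2015-2016, investment covers 2013-2016
policies = pd.DataFrame({"Year": [2015, 2016], "annual_policies": [24, 56]})
investment = pd.DataFrame({"Year": [2013, 2014, 2015, 2016],
                           "Investment_Billions_USD": [6.0, 10.9, 15.3, 19.3]})

# how='outer' keeps every year; indicator='source' labels where each row came from
audit = pd.merge(policies, investment, on="Year", how="outer", indicator="source")

# Rows an inner join would have dropped
print(audit[audit["source"] != "both"])
```

Here the audit shows 2013 and 2014 exist only on the investment side, which confirms the inner join is dropping exactly the pre-policy years.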
Part 4: The Main Visualization#
In this section I’ll use a dual-axis chart to show annual_policies (bars) and Investment_Billions_USD (line).
# Create a figure with a secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])
# Add Annual Policies as a Bar chart
fig.add_trace(
go.Bar(
x=merged_df['Year'],
y=merged_df['annual_policies'],
name='Annual Number of New AI Policies',
marker_color='royalblue'
),
secondary_y=False,
)
# Add Annual Investment as a Line chart
fig.add_trace(
go.Scatter(
x=merged_df['Year'],
y=merged_df['Investment_Billions_USD'],
name='Annual AI Investment (Billions USD)',
marker_color='red'
),
secondary_y=True,
)
# Add figure titles and axis labels
fig.update_layout(
title_text='<b>AI Policy Adoption vs. Private Investment (Global)</b>',
xaxis_title='Year',
legend_title='Metrics',
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
)
)
# Set the y-axes titles
fig.update_yaxes(
title_text='Annual Number of New AI Policies',
secondary_y=False,
color='royalblue'
)
fig.update_yaxes(
title_text='Annual Private AI Investment (Billions USD)',
secondary_y=True,
color='red'
)
# Make the X-axis show proper years
fig.update_xaxes(
tickvals=merged_df['Year']
)
# Display the interactive plot
fig.show()
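Dual-axis charts can mislead, because each axis is scaled independently. An alternative worth sketching is to index both series to 100 at 2015 so they share one axis; this uses the merged table's values inlined as a stand-in for `merged_df`:

```python
import pandas as pd

# Values from the merged table above, inlined for a self-contained sketch
merged = pd.DataFrame({
    "Year": range(2015, 2025),
    "annual_policies": [24, 56, 85, 299, 397, 363, 263, 119, 100, 9],
    "Investment_Billions_USD": [15.26, 19.34, 28.43, 46.51, 61.66,
                                77.26, 145.40, 104.64, 92.79, 130.26],
}).set_index("Year")

# Rebase each column to 100 at its 2015 value so both series fit one axis
indexed = merged / merged.iloc[0] * 100
print(indexed.round(0).loc[[2015, 2021]])
```

On this common scale, annual policy counts at their 2019 peak stood at over 16x their 2015 level, while investment at its 2021 peak stood at under 10x, so the indexed view tells a somewhat different story than the dual-axis one.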
Part 5: Takeaways#
This “annual vs. annual” chart tells a complex and interesting story about proactive governments and an explosive market.
Takeaway 1: Policy and Investment Come in Waves#
Policy (Blue Bars): Starting around 2018, governments around the world suddenly got busy. There’s a clear policy wave with new strategies, regulations, guidelines—building up year after year and hitting a peak around 2020.
Investment (Red Line): Private investment doesn’t follow that pattern at all. Instead of increasing steadily, it goes absolutely vertical in 2021.
These aren’t two curves following each other; they are two totally different rhythms.
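One way to quantify the “different rhythms” claim is to compare year-over-year growth rates of the two series. This sketch inlines the values from the merged table above (it is an illustration, not part of the original pipeline):

```python
import pandas as pd

# Values from the merged table above
merged = pd.DataFrame({
    "Year": range(2015, 2025),
    "annual_policies": [24, 56, 85, 299, 397, 363, 263, 119, 100, 9],
    "Investment_Billions_USD": [15.26, 19.34, 28.43, 46.51, 61.66,
                                77.26, 145.40, 104.64, 92.79, 130.26],
}).set_index("Year")

# Year-over-year percentage change for each series
growth = merged.pct_change() * 100
print(growth.round(1).loc[[2018, 2021]])
```

The largest policy growth spike lands in 2018 (roughly +250% year over year), while the largest investment spike lands in 2021 (roughly +88%), which is the timing gap the takeaway describes.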
Takeaway 2: Governments Weren’t Reacting. They Were Preparing.#
This is the most critical insight, and it overturns the common assumption that governments only regulate AI reactively.
The data shows governments were proactive. The global policy wave (2018-2020) clearly precedes the 2021 investment explosion. This suggests that governments saw the AI “gold rush” coming and were actively trying to build frameworks, strategies, and guardrails before the market peaked.
Takeaway 3: The Market’s Scale is Unimaginable#
Even though governments were proactive, the sheer scale of the 2021 investment spike ($140B+) shows that the market’s eventual force was beyond anyone’s predictions.
This suggests that while policy can be forward-thinking, it cannot fully contain or predict the explosive, speculative nature of a technological gold rush.
This finding is made even stronger by our knowledge from the readme.doc. That $140B+ spike is a conservative underestimate that excludes all R&D from public companies (Google, Microsoft, etc.) and all public spending. The true market explosion that policymakers were trying to get ahead of was even larger.
Takeaway 4: The Post-2021 Policy Decline#
The chart shows a sharp drop in new AI policies after 2021. This doesn’t mean governments gave up on governance. Rather, it signals a critical shift into the second phase of policymaking.
Phase I (2018-2020): This was the High-Level Strategy phase. Governments were racing to publish broad National AI Strategy blueprints, leading to the 2020 spike.
Phase II (2021-Present): This is the Execution & Regulation phase. The focus shifted from announcing new strategies to the much slower, harder work of writing specific regulations (like the multi-year EU AI Act) and handling implementation details. This work is more difficult, takes far longer, and doesn’t appear in the database as a large number of new initiatives.
Part 6: Conclusions#
This project successfully combined two datasets to illustrate the complex relationship between AI governance and private investment.
Our final analysis, using a more rigorous “annual vs. annual” comparison, refutes the simple narrative of a “governance gap.” Instead, it reveals a more sophisticated story: Proactive governments laid the policy groundwork from 2018-2020, only to be followed by a private investment explosion in 2021 of a magnitude no one could have fully anticipated. After this peak, policymaking has shifted from “strategic breadth” to “regulatory depth,” entering a slower, more difficult phase of implementation.
It’s a great reminder that in AI, the visualization you choose isn’t just about aesthetics; it can completely change the narrative.
Part 7: Data Sources#
AI Policy Data:
Source: OECD.AI Policy Observatory
Dataset: “Database of National AI Policies”
Link: https://wp.oecd.ai/app/uploads/2024/03/oecd-ai-all-ai-policies.csv
AI Investment Data:
Source: Our World in Data
Dataset: “Global total private investment in AI”
Link: https://ourworldindata.org/grapher/private-investment-in-artificial-intelligence