Dorling Cartogram of US County Population

This document guides you through the steps to create a Dorling cartogram of US county population data from the 2020 Census using R. You’ll learn how to retrieve, transform, and visualize the data, and export the cartogram in SVG and PNG formats.
Author

Julian Hoffmann Anton

Published

01-11-2024

USA County Population Dorling Cartogram (2020 Census)

Prerequisites

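Install the required packages if they are not already installed, then load them.

# Install the required packages (uncomment on first run)
# ggrepel is used for the labels in Step 5; svglite is needed by ggsave() for SVG export in Step 6
# install.packages(c("tidyverse", "tidycensus", "sf", "devtools", "ggrepel", "svglite"))

# Install 'cartogram' from GitHub (uncomment on first run)
# devtools::install_github("sjewo/cartogram", force = TRUE)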

# Load libraries
library(tidyverse)
library(tidycensus)
library(sf)
library(cartogram)
library(ggplot2)

Step 1: Register for a Census API Key

To access Census data, you’ll need an API key. Register for a free API key on the U.S. Census Bureau API Key Request page: https://api.census.gov/data/key_signup.html

Once registered, you’ll receive the key via email.

Here is also the general guide: https://www.census.gov/data/developers/guidance/api-user-guide.html

Step 2: Set Up Your Census API

After receiving your key, replace the placeholder string “write_the_key_you_received” in the code below with your actual key.

# Set up the Census API key (add your actual API key)
your_actual_api_key <- "write_the_key_you_received"

# Uncomment the next line on first run to store the key in your .Renviron
# census_api_key(your_actual_api_key, install = TRUE, overwrite = TRUE)

# Reload .Renviron so the key is available in the current session
readRenviron("~/.Renviron")

When the key is installed, census_api_key() prints messages like these:

Your original .Renviron will be backed up and stored in your R HOME directory if needed.
Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY"). 
To use now, restart R or run `readRenviron("~/.Renviron")`
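
Before moving on, it can help to confirm that the key is actually visible to the current R session. The following is a minimal sketch in base R; `has_census_key` is a hypothetical helper name introduced here for illustration, not a tidycensus function.

```r
# Hypothetical helper (not part of tidycensus): returns TRUE when a
# non-empty CENSUS_API_KEY is set in the current session's environment
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (has_census_key()) {
  message("Census API key found.")
} else {
  message("No Census API key found; restart R or run readRenviron(\"~/.Renviron\").")
}
```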

Step 3: Load 2020 Decennial Census Variables

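Load the variable list for the 2020 Census redistricting file (PL 94-171), retrieve county-level total population with geometries, and reproject the result to an Albers Equal Area projection.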

# Load census variables
variables <- load_variables(2020, "pl", cache = TRUE)

# Retrieve county-level population data with geometries
county_data <- get_decennial(
  geography = "county",
  variables = "P1_001N",  # Total population variable
  year = 2020,
  sumfile = "pl",
  geometry = TRUE,
  output = "wide"
)

Getting data from the 2020 decennial Census
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Using the PL 94-171 Redistricting Data Summary File
Note: 2020 decennial Census data use differential privacy, a technique that
introduces errors into data to preserve respondent confidentiality.
ℹ Small counts should be interpreted with caution.
ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
This message is displayed once per session.

# Transform to Albers Equal Area projection
county_data_proj <- st_transform(county_data, crs = "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96")
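
The download message above suggests caching the shapefiles so later sessions do not fetch them again. A minimal sketch of that setting follows; it only sets the option named in the message, and it is assumed here that you run it before calling get_decennial().

```r
# Ask for downloaded shapefiles to be cached, as suggested by the
# message printed during the geometry download
options(tigris_use_cache = TRUE)

# Confirm the option is set for this session
getOption("tigris_use_cache")
```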

Step 4: Create the Dorling Cartogram

Generate the cartogram using the cartogram_dorling function, which replaces each county with a circle whose area is proportional to the weight variable (here, total population). Note: this can take several minutes.

# Create the Dorling cartogram (this can take several minutes)
county_dorling <- cartogram_dorling(county_data_proj, 
                                    weight = "P1_001N", 
                                    k = 0.1)
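
To see what "area proportional to population" means for circle sizes, here is a small base R sketch. It illustrates the general principle only and is not the internal code of the cartogram package; the population values are hypothetical and are not Census figures.

```r
# Illustrative populations (hypothetical values, not Census figures)
population <- c(a = 1e6, b = 4e6, c = 9e6)

# Area proportional to population: area = pi * r^2, so r = sqrt(area / pi)
radius <- sqrt(population / pi)

# Radii relative to the smallest circle
relative_radius <- radius / min(radius)
relative_radius
```

Because the radius grows with the square root of the population, a county with four times the population gets a circle only twice as wide.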

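Step 5: Plot the Cartogram of the United States of America

Rank the counties by population, build labels for the top 10, and plot the circles with a log-scaled fill.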

library(ggrepel)
# Identify the top 10 counties by population and create label columns
county_dorling_extra <- county_dorling %>%
  arrange(desc(P1_001N)) %>%
  mutate(
    rank = row_number(),                             # Rank by population
    pop_millions = if_else(rank <= 10, round(P1_001N / 1e6, 1), NA_real_), # Convert top 10 populations to millions with one decimal
    label = if_else(rank <= 10, NAME, NA_character_), # Label only top 10 counties with their names
    label_rank = if_else(rank <= 10, paste0(rank, " | ", NAME, " (", pop_millions, "M)"), NA_character_) # Rank, name, and population in millions
  )

# Calculate centroids for all geometries to act as repellers
centroids <- st_centroid(st_geometry(county_dorling_extra))

# Add centroid coordinates to the data frame for plotting
county_dorling_extra <- county_dorling_extra %>%
  mutate(
    centroid_x = st_coordinates(centroids)[, 1],
    centroid_y = st_coordinates(centroids)[, 2]
  )

# Plot with repelled labels, including rank, name, and population for top 10
USA_cartogram <- ggplot() +
  geom_sf(data = county_dorling_extra, aes(fill = P1_001N), color = "black") +
  scale_fill_viridis_c(option = "plasma", trans = "log10", labels = scales::comma) +
  
  # Invisible points for all geometries to act as repellers
  geom_point(data = county_dorling_extra, aes(x = centroid_x, y = centroid_y), color = NA) +
  
  # Add repelled labels with rank, name, and population in millions for top 10
  geom_label_repel(
    data = county_dorling_extra,
    aes(x = centroid_x, y = centroid_y, label = label_rank), # Display rank, name, and population in millions
    size = 2,
    color = "black",
    fill = scales::alpha("white", 0.8), # Set alpha for label background
    max.overlaps = Inf,
    box.padding = 0.7,                  # Padding around labels
    point.padding = 0,                  # Padding around points
    min.segment.length = 0,
    segment.color = "darkgrey",         # Segment color
    force = 2,                          # Repelling force
    segment.size = 0.5,                 # Segment thickness
    segment.curvature = 0.15,           # Curved lines for segments
    segment.square = TRUE,              # Connects to label edges
    seed = 69,
    label.size = NA                     # Remove label border
  ) +
  theme_void() +
  labs(
    title = "USA County Population",
    subtitle = "2020 Census - Top 10 Counties labelled",
    fill = "Population",
    caption = "Data source: US CENSUS 2020 API\nDorling Cartogram made in R - Map and Tutorial by www.Julian-Hoffmann-Anton.com"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, vjust = 0, margin = margin(t = 10, b = 0), size = 13, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, vjust = 0, margin = margin(t = 3, b = -110), size = 8),
    plot.margin = margin(t = 0, b = 0, l = 10, r = 8),
    plot.caption = element_text(size = 7)
  )

USA_cartogram
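
The label text built in the mutate() call above can be tried on its own. This is a minimal base R sketch using a hypothetical row; the county name and population are made up for illustration.

```r
# Hypothetical example row (not a real county or Census figure)
rank <- 1
name <- "Example County, Example State"
population <- 1234567

# Same formatting as the label_rank column: rank, name, population in millions
pop_millions <- round(population / 1e6, 1)
label_rank <- paste0(rank, " | ", name, " (", pop_millions, "M)")
label_rank
```

Note that paste0() drops a trailing zero, so a rounded value of 10.0 is shown as "10M". If you prefer a fixed decimal, formatC(pop_millions, format = "f", digits = 1) is one option.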

Step 6: Export the Plot as SVG and PNG

Use ggsave() to save your cartogram. Saving to SVG requires the svglite package to be installed.

# Save as SVG
ggsave("my_USA_cartogram_v1.svg", plot = USA_cartogram, width = 10, height = 6, dpi = 300)

# Save as PNG
ggsave("my_USA_cartogram_v3.png", plot = USA_cartogram, width = 10, height = 6, dpi = 300)
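
If the SVG export fails, a likely cause is a missing svglite package, which ggsave() uses for the .svg device. A minimal check, assuming nothing beyond base R (`svg_ready` is a name introduced here for illustration):

```r
# TRUE when svglite is installed and can be loaded, FALSE otherwise
svg_ready <- requireNamespace("svglite", quietly = TRUE)

if (!svg_ready) {
  message("Install svglite first: install.packages(\"svglite\")")
}
```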

Step 7: Blender 3D Version in Part 2

Check www.Julian-Hoffmann-Anton.com for updates.