library(tidyverse)

A GRAMMAR OF GRAPHICS

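ggplot2 implements a grammar of graphics (originally described by Leland Wilkinson and revised by Hadley Wickham in 2010) to describe the components of a plot: the data, the geometric objects, the aesthetic mappings, position adjustments, scales, coordinate systems, and facets.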

ggplot2 further organizes these components into layers, where each layer displays a single type of (highly configurable) geometric object.

In short, a plot is (data, aesthetics) + geometry:

# General template (replace the <PLACEHOLDERS>):
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# ?midwest  # help page for `midwest`, a data set built into ggplot2
# The data set contains information on each of 437 counties in 5 states in the midwestern United States (specifically, Illinois, Indiana, Michigan, Ohio, and Wisconsin)

# Plot the `midwest` data set, with the percentage of people with a college education on the x-axis and
# percentage of adult poverty on the y-axis
ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty))

Steps

  1. The ggplot() function is passed the data frame to plot as the named data argument (it can also be passed as the first positional argument). Calling this function creates the blank canvas on which the visualization will be created.

  2. You specify the type of geometric object (sometimes referred to as a geom) to draw by calling one of the many geom_ functions — in this case, geom_point(). Functions to render a layer of geometric objects all share a common prefix (geom_), followed by the name of the kind of geometry you wish to create. For example, geom_point() will create a layer with “point” (dot) elements as the geometry.

  3. In each geom_ function, you specify the aesthetic mappings, which determine how data from the data frame will be mapped to the visual aspects of the geometry. These mappings are defined using the aes() (aesthetic) function; mappings passed to ggplot() itself are shared by every layer.

  4. You add layers of geometric objects to the plot by using the addition (+) operator.
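These steps can also be seen in the plot object itself. A minimal sketch, assuming ggplot2 is loaded: calling ggplot() alone gives a plot with no layers, and each `+ geom_*()` call appends one. The `$layers` field is an internal part of the plot object, inspected here only for illustration.

```r
library(ggplot2)

# Step 1: the blank canvas, holding the data (and default x/y mappings)
p <- ggplot(data = midwest, mapping = aes(x = percollege, y = percadultpoverty))
length(p$layers)  # no geometry yet

# Steps 2-4: each `+ geom_*()` call adds one layer of geometric objects
p <- p +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
length(p$layers)  # one layer per geom_ function
```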

Specifying Geometries

You specify which geometry to draw by calling the appropriate geom_ function, including:

  • geom_point() for drawing individual points (e.g., for a scatterplot)

  • geom_line() for drawing lines (e.g., for a line chart)

  • geom_smooth() for drawing smoothed lines (e.g., for simple trends or approximations)

  • geom_col() for drawing columns (e.g., for a bar chart)

  • geom_polygon() for drawing arbitrary shapes (e.g., for drawing an area in a coordinate plane)

Each of these geom_ functions takes a set of aesthetic mappings (defined using the aes() function) as an argument.

Since graphics are two-dimensional representations of data, almost all geom_ functions require an x and y mapping.

# A bar chart of the total population of each state
# The `state` is mapped to the x-axis, and the `poptotal` is mapped
# to the y-axis

ggplot(data = midwest) +
  geom_col(mapping = aes(x = state, y = poptotal))
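Some geometries compute one of the positions for you, which is why almost (but not all) geom_ functions need both x and y. A minimal sketch: geom_bar() needs only an x mapping because its default statistic (stat = "count") counts the rows for each x value, while geom_col() uses the y values as given. `layer_data()` returns the data ggplot2 computes for a layer, so it can confirm the counts.

```r
library(ggplot2)

# geom_col(): bar heights come directly from the y mapping
col_plot <- ggplot(data = midwest) +
  geom_col(mapping = aes(x = state, y = poptotal))

# geom_bar(): only x is mapped; each bar's height is the number of rows (counties)
bar_plot <- ggplot(data = midwest) +
  geom_bar(mapping = aes(x = state))

# Inspect the computed counts: one row per state
counts <- layer_data(bar_plot)
counts[, c("x", "count")]
```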

# A hexagonal aggregation that counts the co-occurrence of college
# education rate and percentage of adult poverty (hexagonal bins)
# Note: geom_hex() requires the hexbin package (install.packages("hexbin"))
ggplot(data = midwest) +
  geom_hex(mapping = aes(x = percollege, y = percadultpoverty))

  • What makes this really powerful is that you can add multiple geometries to a plot, which lets you create complex graphics showing multiple aspects of your data.
# A plot with both points and a smoothed line
ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty)) +
  geom_smooth(mapping = aes(x = percollege, y = percadultpoverty))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# A plot with both points and a smoothed line, sharing aesthetic mappings
ggplot(data = midwest, mapping = aes(x = percollege, y = percadultpoverty)) +
  geom_point() + # uses the default x and y mappings
  geom_smooth()  # uses the default x and y mappings
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
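Mappings given to ggplot() are inherited by every layer, while mappings given to a single geom_ function apply to that layer only. A sketch of the difference, using `layer_data()` to count the groups that the smoothing layer fits; `method = "lm"` is chosen here only to keep the fit simple.

```r
library(ggplot2)

# Default x/y mappings are shared; `color` is mapped in the point layer only
p <- ggplot(data = midwest, mapping = aes(x = percollege, y = percadultpoverty)) +
  geom_point(mapping = aes(color = state)) + # points colored by state
  geom_smooth(method = "lm", se = FALSE)     # one trend line for all counties

# With `color = state` moved into ggplot(), geom_smooth() inherits it
# and fits a separate line for each state
p_by_state <- ggplot(
  data = midwest,
  mapping = aes(x = percollege, y = percadultpoverty, color = state)
) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Number of separate fits in the smoothing layer (layer 2) of each plot
length(unique(layer_data(p, 2)$group))
length(unique(layer_data(p_by_state, 2)$group))
```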

Aesthetic Mappings

The aesthetic mappings take properties of the data and use them to influence visual channels (graphical encodings), such as position, color, size, or shape.

# Change the color of each point based on the state it is in
ggplot(data = midwest) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty, color = state)
  )

# Set a consistent color ("red") and transparency (alpha) for all points -- not driven by data
ggplot(data = midwest) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty),
    color = "red",
    alpha = .3
  )
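The difference between the two plots above is whether `color` sits inside or outside aes(). A sketch of a common mistake: putting a constant inside aes() maps it as if it were a one-value variable, so the points are not actually drawn in red. `layer_data()` shows the colors ggplot2 computed (its columns use the spelling `colour`).

```r
library(ggplot2)

# Mapping: "red" inside aes() is treated as a one-value *variable*, so ggplot2
# picks a color from its default discrete scale and adds a legend
p_mapped <- ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty, color = "red"))

# Setting: "red" outside aes() is applied literally to every point
p_set <- ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty), color = "red")

unique(layer_data(p_mapped)$colour) # a palette color, not "red"
unique(layer_data(p_set)$colour)    # "red"
```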

COMPLEX LAYOUTS AND CUSTOMIZATION

Position Adjustments

  • A stacked bar chart of the number of people in each state (by race). Colors are added by setting a fill aesthetic based on the race column.
# Wrangle and reshape the data using `tidyr` and `dplyr` -- a common step!
# Select the columns for racial population totals, then
# `gather()` those column values into `race` and `population` columns
state_race_long <- midwest %>%
  select(state, popwhite, popblack, popamerindian, popasian, popother) %>%
  gather(key = race, value = population, -state) # all columns except `state`

# Create a stacked bar chart of the number of people in each state
# Fill the bars using different colors to show racial composition
ggplot(state_race_long) +
  geom_col(mapping = aes(x = state, y = population, fill = race))

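  • The position argument: a stacked bar shows totals, but position = "fill" stretches every stack to the same height so you can compare relative proportions within each state, and position = "dodge" places the bars for each group side by side.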
# Create a percentage (filled) column of the population (by race) in each state
ggplot(state_race_long) +
  geom_col(
    mapping = aes(x = state, y = population, fill = race), position = "fill"
  )

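# Create a grouped (dodged) column of the number of people (by race) in each state
# Unlike stacking, dodging does not add up rows that share a bar position, so
# first sum the county rows into one total per state and race
state_race_totals <- state_race_long %>%
  group_by(state, race) %>%
  summarize(population = sum(population), .groups = "drop")

ggplot(state_race_totals) +
  geom_col(
    mapping = aes(x = state, y = population, fill = race), position = "dodge"
  )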
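To check what position = "fill" does, a minimal sketch that rebuilds `state_race_long` with `pivot_longer()` (equivalent to the `gather()` call above) and inspects the computed bar tops with `layer_data()`.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Same reshaping as above: one row per county and racial group
state_race_long <- midwest %>%
  select(state, popwhite, popblack, popamerindian, popasian, popother) %>%
  pivot_longer(-state, names_to = "race", values_to = "population")

fill_plot <- ggplot(state_race_long) +
  geom_col(mapping = aes(x = state, y = population, fill = race), position = "fill")

# position = "fill" rescales each stack so its top sits at 1 (i.e., 100%)
fill_data <- layer_data(fill_plot)
tapply(fill_data$ymax, unclass(fill_data$x), max) # top of each state's bar
```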

Styling with Scales

Whenever you specify an aesthetic mapping, ggplot2 uses a particular scale to determine the range of values that the data encoding should be mapped to.

# Plot the `midwest` data set, with college education rate on the x-axis and
# percentage of adult poverty on the y-axis. Color by state.
ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty, color = state))

ggplot2 automatically adds a scale for each mapping, so the plot above is equivalent to the following one, in which the default scales are written out explicitly:

# Plot the `midwest` data set, with college education rate and
# percentage of adult poverty. Explicitly set the scales.
ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty, color = state)) +
  scale_x_continuous() + # explicitly set a continuous scale for the x-axis
  scale_y_continuous() + # explicitly set a continuous scale for the y-axis
  scale_color_discrete() # explicitly set a discrete scale for the color aesthetic
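Writing the scales out explicitly matters because their arguments customize how each mapping is done. A sketch of a few common ones; the limits, label format, and palette are illustrative choices, and `scales::label_percent()` comes from the scales package, which is installed along with ggplot2.

```r
library(ggplot2)

p <- ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty, color = state)) +
  scale_x_continuous(limits = c(0, 60)) +                         # fix the x-axis range
  scale_y_continuous(labels = scales::label_percent(scale = 1)) + # values are already in percent
  scale_color_brewer(palette = "Set1")                            # a ColorBrewer qualitative palette
p
```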

Coordinate Systems

  • coord_cartesian(): The default Cartesian coordinate system, where you specify x and y values: x values increase from left to right, and y values increase from bottom to top

  • coord_flip(): A Cartesian system with the x- and y-axes swapped

  • …..

# Filter down to top 10 most populous counties
top_10 <- midwest %>%
  top_n(10, wt = poptotal) %>%
  unite(county_state, county, state, sep = ", ") %>% # combine county and state into one label
  arrange(poptotal) %>% # sort the data by population
  mutate(location = factor(county_state, county_state)) # set the row order

# Render a horizontal bar chart of population
ggplot(top_10) +
  geom_col(mapping = aes(x = location, y = poptotal)) +
  coord_flip() # switch the orientation of the x- and y-axes
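Besides flipping, a coordinate system can zoom in on part of the plot. A sketch, with the 10 to 30 range chosen only for illustration: coord_cartesian(xlim = ...) crops the view but keeps every data point, whereas setting limits on the scale removes out-of-range points before anything is drawn.

```r
library(ggplot2)

base <- ggplot(data = midwest, mapping = aes(x = percollege, y = percadultpoverty)) +
  geom_point()

# Zoom with the coordinate system: all 437 points are kept, the view is cropped
zoomed <- base + coord_cartesian(xlim = c(10, 30))

# By contrast, scale limits drop out-of-range points (ggplot2 warns about
# removed rows), which would also change any fitted layer such as geom_smooth()
clipped <- base + scale_x_continuous(limits = c(10, 30))
```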

Facets

  • Facets are ways of grouping a visualization into multiple different pieces (subplots), allowing you to view a separate plot for each unique value in a categorical variable.

(Breaking a plot up into facets is similar to using the group_by() verb in dplyr: it creates the same visualization for each group separately, just as summarize() performs the same analysis for each group.)

  • You can construct a plot with multiple facets by using a facet_ function such as facet_wrap(). This function produces a sequence of subplots, one for each unique value of the faceting variable, and wraps them into a grid (the number of rows or columns can be set with the nrow or ncol argument).
# A comparison of each county’s adult poverty rate and college education rate. A separate plot is created for each state using the facet_wrap() function.

# Create a better label for the `inmetro` column
labeled <- midwest %>%
  mutate(location = if_else(inmetro == 0, "Rural", "Urban"))

# Create the same chart as Figure 16.9, faceted by state
ggplot(data = labeled) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty, color = location),
    alpha = .6
  ) +
  facet_wrap(~state) # pass the `state` column as a *formula* to `facet_wrap()`
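facet_wrap() accepts nrow/ncol to control the wrapping, and facet_grid() lays panels out by two variables (rows ~ columns). A minimal sketch, recreating the `labeled` data from above; `ggplot_build()` exposes the computed panel layout so the number of panels can be checked.

```r
library(ggplot2)
library(dplyr)

labeled <- midwest %>%
  mutate(location = if_else(inmetro == 0, "Rural", "Urban"))

base <- ggplot(data = labeled) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty), alpha = .6)

# facet_wrap(): one panel per state, wrapped into 2 rows
wrapped <- base + facet_wrap(~state, nrow = 2)

# facet_grid(): rows by location, columns by state (a 2 x 5 grid)
gridded <- base + facet_grid(location ~ state)

# Panel layout computed by ggplot2: one row per panel, with its ROW and COL
ggplot_build(wrapped)$layout$layout
```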

Labels and Annotations

# Adding better labels to the plot in Figure 16.10
ggplot(data = labeled) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty, color = location),
    alpha = .6
  ) +

  # Add title and axis labels
  labs(
    title = "Percent College Educated versus Poverty Rates", # plot title
    x = "Percentage of College Educated Adults", # x-axis label
    y = "Percentage of Adults Living in Poverty", # y-axis label
    color = "Urbanity" # legend label for the "color" property
  )

  • You can also add labels into the plot itself (e.g., to label each point or line) by adding a new geom_text() layer (for plain text) or geom_label() layer (for text drawn with a background and border). The ggrepel package provides the geom_text_repel() and geom_label_repel() functions, which position the labels so that they don’t overlap.
# Using labels to identify the county in each state with the highest level of poverty. The ggrepel package is used to prevent labels from overlapping.

library(ggrepel)

# Find the highest level of poverty in each state
most_poverty <- midwest %>%
  group_by(state) %>% # group by state
  top_n(1, wt = percadultpoverty) %>% # select the highest poverty county
  ungroup() %>% # drop the grouping before merging away the `state` column
  unite(county_state, county, state, sep = ", ") # for clear labeling

# Store the subtitle in a variable for cleaner graphing code
subtitle <- "(the county with the highest level of poverty in each state is labeled)"

# Plot the data with labels
ggplot(data = labeled, mapping = aes(x = percollege, y = percadultpoverty)) +

  # add the point geometry
  geom_point(mapping = aes(color = location), alpha = .6) +
  # add the label geometry
  geom_label_repel(
    data = most_poverty, # uses its own specified data set
    mapping = aes(label = county_state),
    alpha = 0.8
  ) +

  # set the scale for the axis
  scale_x_continuous(limits = c(0, 55)) +

  # add title and axis labels
  labs(
    title = "Percent College Educated versus Poverty Rates", # plot title
    subtitle = subtitle, # subtitle
    x = "Percentage of College Educated Adults", # x-axis label
    y = "Percentage of Adults Living in Poverty", # y-axis label
    color = "Urbanity" # legend label for the "color" property
  )
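Without ggrepel, plain geom_text() can label a handful of points. A minimal sketch; choosing the three most populous counties is an illustrative assumption, and `check_overlap = TRUE` simply skips text that would overlap earlier text rather than moving it, which is the problem ggrepel solves.

```r
library(ggplot2)
library(dplyr)

# Illustrative selection: the three most populous counties
top_3 <- slice_max(midwest, poptotal, n = 3)

p <- ggplot(data = midwest, mapping = aes(x = percollege, y = percadultpoverty)) +
  geom_point(alpha = .3) +
  geom_text(
    data = top_3,                  # the text layer uses its own data set
    mapping = aes(label = county), # x and y are inherited from ggplot()
    vjust = -0.7,                  # place the text just above each point
    check_overlap = TRUE           # skip text that would overlap earlier text
  )
p
```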

BUILDING MAPS

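  • You can also use ggplot2 to draw geographic maps. Because two-dimensional maps already depend on a coordinate system (latitude and longitude), you can map longitude and latitude to the x and y aesthetics and draw the shapes with geometries such as geom_polygon().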

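  • Two common kinds of maps are choropleth maps, which fill each geographic region with a color based on a data value, and dot distribution maps, which draw a point at each location of interest.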

# Load the outlines of U.S. states as a data frame using ggplot2's `map_data()` function
# (requires the maps package; coord_map() below requires the mapproj package)
# install.packages("maps")
# install.packages("mapproj")
state_shape <- map_data("state")

# Create a blank map of U.S. states
ggplot(state_shape) +
  geom_polygon(
    mapping = aes(x = long, y = lat, group = group),
    color = "white", # show state outlines
    linewidth = .1   # thinly stroked (this argument was `size` before ggplot2 3.4.0)
  ) +
  coord_map() # use a map-based coordinate system

# Create a data frame of city coordinates to display
cities <- data.frame(
  city = c("Seattle", "Denver"),
  lat = c(47.6062, 39.7392),
  long = c(-122.3321, -104.9903)
)

# Draw the state outlines, then plot the city points on the map
ggplot(state_shape) +
  geom_polygon(mapping = aes(x = long, y = lat, group = group)) +
  geom_point(
    data = cities, # plots own data set
    mapping = aes(x = long, y = lat), # points are drawn at given coordinates
    color = "red"
  ) +
  coord_map() # use a map-based coordinate system
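The maps above show state outlines and dots; a choropleth instead fills each region by a data value. A minimal sketch, assuming the maps and mapproj packages are installed: it sums `poptotal` from `midwest` per state and joins the totals to the outlines by the lowercase `region` names that `map_data()` uses. The `state_names` lookup vector is introduced here for illustration; states outside the midwest data get no value and are drawn in the scale's NA color.

```r
library(ggplot2)
library(dplyr)

# Outline coordinates for the contiguous U.S. states (requires the maps package)
state_shape <- map_data("state")

# Illustrative lookup from the abbreviations in `midwest` to `region` names
state_names <- c(IL = "illinois", IN = "indiana", MI = "michigan",
                 OH = "ohio", WI = "wisconsin")

# Total population per state, keyed by region name
state_pop <- midwest %>%
  group_by(state) %>%
  summarize(population = sum(poptotal)) %>%
  mutate(region = unname(state_names[state]))

# Attach the total to every outline point of the matching state
# (left_join() keeps the row order of `state_shape`, which the polygons need)
choropleth_data <- left_join(state_shape, state_pop, by = "region")

p <- ggplot(choropleth_data) +
  geom_polygon(
    mapping = aes(x = long, y = lat, group = group, fill = population),
    color = "white",
    linewidth = .1
  ) +
  coord_map() # requires the mapproj package
p
```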

Exercise

Shared notes link