library(tidyverse)
ggplot2 is built on the Grammar of Graphics (Hadley Wickham rev., 2010), which is used to describe the components of a plot:
The data being plotted
The geometric objects (e.g., circles, lines) that appear on the plot
The aesthetics (appearance) of the geometric objects, and the mappings from variables in the data to those aesthetics
A position adjustment for placing elements on the plot so they don’t overlap
A scale (e.g., a range of values) for each aesthetic mapping used
A coordinate system used to organize the geometric objects
The facets or groups of data shown in different plots
ggplot2 further organizes these components into layers, where each layer displays a single type of (highly configurable) geometric object.
(data,aesthetics) + geometry
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
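First call ggplot() to draw a canvas (it takes a data frame as its argument). This sets up a coordinate system onto which you can later add layers.
The ggplot2 library provides many geom functions, each of which adds a different type of layer to the plot.
Every geom function accepts a mapping argument, which defines how variables in the data set are mapped to visual properties.
# ?midwest (ggplot2 built-in data)
# The data set contains information on each of 437 counties in 5 states in the midwestern United States (specifically, Illinois, Indiana, Michigan, Ohio, and Wisconsin)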
# Plot the `midwest` data set, with the percentage of people with a college education on the x-axis and
# percentage of adult poverty on the y-axis
ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty))
The steps:
The ggplot() function is passed the data frame to plot as the named data argument (it can also be passed as the first positional argument). Calling this function creates the blank canvas on which the visualization will be created.
You specify the type of geometric object (sometimes referred to as a geom) to draw by calling one of the many geom_ functions, in this case geom_point(). Functions to render a layer of geometric objects all share a common prefix (geom_), followed by the name of the kind of geometry you wish to create. For example, geom_point() will create a layer with “point” (dot) elements as the geometry.
In each geom_ function, you must specify the aesthetic mappings, which specify how data from the data frame will be mapped to the visual aspects of the geometry. These mappings are defined using the aes() (aesthetic) function.
The aes() function takes a set of named arguments (like a list), where the argument name is the visual property to map to, and the argument value is the data feature (i.e., the column in the data frame) to map from. The value returned by the aes() function is passed to the named mapping argument (or passed as the first positional argument).
You can draw different kinds of geometric objects using the appropriate geom_ function, including:
geom_point() for drawing individual points (e.g., for a scatterplot)
geom_line() for drawing lines (e.g., for a line chart)
geom_smooth() for drawing smoothed lines (e.g., for simple trends or approximations)
geom_col() for drawing columns (e.g., for a bar chart)
geom_polygon() for drawing arbitrary shapes (e.g., for drawing an area in a coordinate plane)
Each of these geom_ functions requires as an argument a set of aesthetic mappings (defined using the aes() function).
Since graphics are two-dimensional representations of data, almost all geom_ functions require an x and y mapping.
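The description of aes() above can be checked directly in the console. The following is a minimal sketch, assuming only that ggplot2 is installed; the variable names `point_mapping`, `p_named`, and `p_positional` are illustrative and do not come from the notes.

```r
library(ggplot2)

# aes() returns a mapping object: the argument names are the aesthetics,
# and the argument values are (unevaluated) references to data columns
point_mapping <- aes(x = percollege, y = percadultpoverty)
names(point_mapping) # the aesthetics being mapped: "x" and "y"

# The mapping can be passed to a geom_ function by name...
p_named <- ggplot(data = midwest) +
  geom_point(mapping = point_mapping)

# ...or as the first positional argument
p_positional <- ggplot(midwest) +
  geom_point(point_mapping)
```

Both plots should render the same scatterplot; storing the mapping in a variable is only done here to make the returned value visible.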
# A bar chart of the total population of each state
# The `state` is mapped to the x-axis, and the `poptotal` is mapped
# to the y-axis
ggplot(data = midwest) +
  geom_col(mapping = aes(x = state, y = poptotal))
# A hexagonal aggregation that counts the co-occurrence of college
# education rate and percentage of adult poverty
# (drawn as hexagons; `geom_hex()` requires the `hexbin` package)
ggplot(data = midwest) +
  geom_hex(mapping = aes(x = percollege, y = percadultpoverty))
# A plot with both points and a smoothed line
ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty)) +
  geom_smooth(mapping = aes(x = percollege, y = percadultpoverty))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# A plot with both points and a smoothed line, sharing aesthetic mappings
ggplot(data = midwest, mapping = aes(x = percollege, y = percadultpoverty)) +
  geom_point() + # uses the default x and y mappings
  geom_smooth() # uses the default x and y mappings
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The aesthetic mappings take properties of the data and use them to influence visual channels (graphical encodings), such as position, color, size, or shape.
# Change the color of each point based on the state it is in
ggplot(data = midwest) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty, color = state)
  )
# Set a consistent color ("red") for all points -- not driven by data
ggplot(data = midwest) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty),
    color = "red",
    alpha = .3
  )
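The difference between mapping an aesthetic and setting it is easy to get wrong, so a short side-by-side sketch may help. It assumes only ggplot2; the names `p_set` and `p_mapped` are illustrative.

```r
library(ggplot2)

# Constant OUTSIDE aes(): every point is drawn in red and no legend is created
p_set <- ggplot(data = midwest) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty),
    color = "red"
  )

# Constant INSIDE aes(): "red" is treated as a data value with a single level,
# so ggplot2 picks a color from its default discrete scale and adds a legend
p_mapped <- ggplot(data = midwest) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty, color = "red")
  )
```

Printing `p_mapped` is a quick way to recognize this mistake: the points are not red, and a legend with the single entry "red" appears.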
# Wrangle and reshape the data using `tidyr` and `dplyr` -- a common step!
# Select the columns for racial population totals, then
# `gather()` those column values into `race` and `population` columns
state_race_long <- midwest %>%
  select(state, popwhite, popblack, popamerindian, popasian, popother) %>%
  gather(key = race, value = population, -state) # all columns except `state`
# Create a stacked bar chart of the number of people in each state
# Fill the bars using different colors to show racial composition
ggplot(state_race_long) +
  geom_col(mapping = aes(x = state, y = population, fill = race))
# Create a percentage (filled) column of the population (by race) in each state
ggplot(state_race_long) +
  geom_col(
    mapping = aes(x = state, y = population, fill = race), position = "fill"
  )
# Create a grouped (dodged) column of the number of people (by race) in each state
ggplot(state_race_long) +
  geom_col(
    mapping = aes(x = state, y = population, fill = race), position = "dodge"
  )
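To see what the "fill" position adjustment actually displays, the shares can be computed by hand. The sketch below is an illustration rather than part of the original notes: it rebuilds the long data with `pivot_longer()` (which has superseded `gather()` since tidyr 1.0.0) and derives each group's share of its state total. The name `state_race_share` is hypothetical, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)
library(ggplot2) # provides the `midwest` data set

# Same reshaping as above, written with `pivot_longer()`
state_race_long <- midwest %>%
  select(state, popwhite, popblack, popamerindian, popasian, popother) %>%
  pivot_longer(cols = -state, names_to = "race", values_to = "population")

# What `position = "fill"` shows: each group's share of its state's total
state_race_share <- state_race_long %>%
  group_by(state, race) %>%
  summarize(population = sum(population), .groups = "drop") %>%
  group_by(state) %>%
  mutate(share = population / sum(population)) %>%
  ungroup()
```

Within each state the `share` values add up to 1, which is why every bar in the filled chart has the same height.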
Whenever you specify an aesthetic mapping, ggplot2 uses a particular scale to determine the range of values that the data encoding should be mapped to.
# Plot the `midwest` data set, with college education rate on the x-axis and
# percentage of adult poverty on the y-axis. Color by state.
ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty, color = state))
Thus, when you specify a plot, ggplot2 automatically adds a scale for each mapping to the plot:
# Plot the `midwest` data set, with college education rate and
# percentage of adult poverty. Explicitly set the scales.
ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty, color = state)) +
  scale_x_continuous() + # explicitly set a continuous scale for the x-axis
  scale_y_continuous() + # explicitly set a continuous scale for the y-axis
  scale_color_discrete() # explicitly set a discrete scale for the color aesthetic
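Writing the scales out explicitly becomes useful once you pass them arguments. The following sketch assumes only ggplot2; the axis titles, the limits, and the "Set2" palette are arbitrary choices made for illustration, not values from the notes.

```r
library(ggplot2)

p_scaled <- ggplot(data = midwest) +
  geom_point(mapping = aes(x = percollege, y = percadultpoverty, color = state)) +
  # name the axis and fix its range (points outside the limits are dropped)
  scale_x_continuous(name = "College educated (%)", limits = c(0, 50)) +
  scale_y_continuous(name = "Adults in poverty (%)") +
  # replace the default discrete colors with a ColorBrewer palette
  scale_color_brewer(palette = "Set2")
```

Each `scale_` call here replaces the default scale that ggplot2 would otherwise add for that aesthetic.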
coord_cartesian(): The default Cartesian coordinate system, where you specify x and y values; x values increase from left to right, and y values increase from bottom to top
coord_flip(): A Cartesian system with the x and y flipped
…..
# Filter down to top 10 most populous counties
top_10 <- midwest %>%
  top_n(10, wt = poptotal) %>%
  unite(county_state, county, state, sep = ", ") %>% # combine state + county
  arrange(poptotal) %>% # sort the data by population
  mutate(location = factor(county_state, county_state)) # set the row order
# Render a horizontal bar chart of population
ggplot(top_10) +
  geom_col(mapping = aes(x = location, y = poptotal)) +
  coord_flip() # switch the orientation of the x- and y-axes
(Breaking a plot up into facets is similar to using the group_by() verb in dplyr: it creates the same visualization for each group separately, just as summarize() performs the same analysis for each group.)
Facets can be created with facet_wrap(). This function will produce a “row” of subplots, one for each value of a categorical variable (the number of rows can be specified with an additional argument).
# A comparison of each county’s adult poverty rate and college education rate. A separate plot is created for each state using the facet_wrap() function.
# Create a better label for the `inmetro` column
labeled <- midwest %>%
  mutate(location = if_else(inmetro == 0, "Rural", "Urban"))
# Create the same chart as Figure 16.9, faceted by state
ggplot(data = labeled) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty, color = location),
    alpha = .6
  ) +
  facet_wrap(~state) # pass the `state` column as a *formula* to `facet_wrap()` !!!!
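The group_by() analogy mentioned above can be made concrete with the dplyr counterpart of the faceted plot. This is a small sketch, assuming dplyr and ggplot2 are installed; the name `poverty_by_state` and the summary columns are illustrative.

```r
library(dplyr)
library(ggplot2) # provides the `midwest` data set

# One summary row per state, just as `facet_wrap(~state)` draws one panel per state
poverty_by_state <- midwest %>%
  group_by(state) %>%
  summarize(
    counties = n(), # number of counties (points) in each panel
    mean_poverty = mean(percadultpoverty) # unweighted mean of county rates
  )
```

The result has five rows, one for each state, matching the five panels in the faceted chart.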
# Adding better labels to the plot in Figure 16.10
ggplot(data = labeled) +
  geom_point(
    mapping = aes(x = percollege, y = percadultpoverty, color = location),
    alpha = .6
  ) +
  # Add title and axis labels
  labs(
    title = "Percent College Educated versus Poverty Rates", # plot title
    x = "Percentage of College Educated Adults", # x-axis label
    y = "Percentage of Adults Living in Poverty", # y-axis label
    color = "Urbanity" # legend label for the "color" property
  )
You can add text annotations to a plot with geom_text() (for plain text) or geom_label() (for boxed text). The geom_label_repel() function from the ggrepel package also draws a background and border for each piece of text, and it positions the labels so that they don’t overlap.
# Using labels to identify the county in each state with the highest level of poverty. The ggrepel package is used to prevent labels from overlapping.
library(ggrepel)
# Find the highest level of poverty in each state
most_poverty <- midwest %>%
  group_by(state) %>% # group by state
  top_n(1, wt = percadultpoverty) %>% # select the highest poverty county
  unite(county_state, county, state, sep = ", ") # for clear labeling
# Store the subtitle in a variable for cleaner graphing code
subtitle <- "(the county with the highest level of poverty
in each state is labeled)"
# Plot the data with labels
ggplot(data = labeled, mapping = aes(x = percollege, y = percadultpoverty)) +
  # add the point geometry
  geom_point(mapping = aes(color = location), alpha = .6) +
  # add the label geometry
  geom_label_repel(
    data = most_poverty, # uses its own specified data set
    mapping = aes(label = county_state),
    alpha = 0.8
  ) +
  # set the scale for the axis
  scale_x_continuous(limits = c(0, 55)) +
  # add title and axis labels
  labs(
    title = "Percent College Educated versus Poverty Rates", # plot title
    subtitle = subtitle, # subtitle
    x = "Percentage of College Educated Adults", # x-axis label
    y = "Percentage of Adults Living in Poverty", # y-axis label
    color = "Urbanity" # legend label for the "color" property
  )
You can also use the ggplot2 package to draw geographic maps, because two-dimensional maps already depend on a coordinate system (latitude and longitude).
Choropleth Maps and Dot distribution maps
# Load a shapefile of U.S. states using ggplot's `map_data()` function
# install.packages("maps")
# install.packages("mapproj")
state_shape <- map_data("state")
# Create a blank map of U.S. states
ggplot(state_shape) +
  geom_polygon(
    mapping = aes(x = long, y = lat, group = group),
    color = "white", # show state outlines
    size = .1 # thinly stroked
  ) +
  coord_map() # use a map-based coordinate system
# Create a data frame of city coordinates to display
cities <- data.frame(
  city = c("Seattle", "Denver"),
  lat = c(47.6062, 39.7392),
  long = c(-122.3321, -104.9903)
)
# Draw the state outlines, then plot the city points on the map
ggplot(state_shape) +
  geom_polygon(mapping = aes(x = long, y = lat, group = group)) +
  geom_point(
    data = cities, # plots own data set
    mapping = aes(x = long, y = lat), # points are drawn at given coordinates
    color = "red"
  ) +
  coord_map() # use a map-based coordinate system
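The example above is a dot distribution map. For the choropleth case, one possible sketch is to join a per-state value onto the shape data and map it to `fill`. Everything named here (`state_names`, `state_poverty`, `midwest_shape`, `choropleth`) is hypothetical, and the value plotted is an unweighted mean of county poverty rates, chosen only because the `midwest` data set is already at hand. The map step is guarded because it needs the `maps` and `mapproj` packages.

```r
library(dplyr)
library(ggplot2)

# `map_data("state")` identifies states by lowercase name in a `region` column,
# so translate the abbreviations used in `midwest`
state_names <- c(
  IL = "illinois", IN = "indiana", MI = "michigan", OH = "ohio", WI = "wisconsin"
)

# Unweighted mean of county poverty rates per state (illustrative value only)
state_poverty <- midwest %>%
  group_by(state) %>%
  summarize(mean_poverty = mean(percadultpoverty)) %>%
  mutate(region = unname(state_names[state]))

if (requireNamespace("maps", quietly = TRUE) &&
    requireNamespace("mapproj", quietly = TRUE)) {
  # Keep only the five midwest states and attach the value to every outline point
  midwest_shape <- map_data("state") %>%
    inner_join(state_poverty, by = "region")

  choropleth <- ggplot(midwest_shape) +
    geom_polygon(
      mapping = aes(x = long, y = lat, group = group, fill = mean_poverty),
      color = "white" # show state outlines
    ) +
    coord_map() # use a map-based coordinate system
}
```

Using `inner_join()` with the shape data on the left keeps the outline points in their original order, which `geom_polygon()` relies on to draw each state correctly.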