R | A Concise Guide to Plotting with ggplot2: Scatter Plots

Original · 大邓 · 大邓和他的Python

2024-09-09

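Preparation

First, I need to load the packages used in this section. The tidyverse is a collection of packages, and ggplot2 is one of its core members. The primer.data package provides more datasets than the ones built into R. Today we take the scatter plot as our example and build it up one step at a time.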
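
A minimal sketch of the setup, assuming both tidyverse and primer.data are already installed (installation is not covered here). The column names height, weight, and gender are the ones used throughout this tutorial; the quick peek at them is only a suggestion.

```r
library(tidyverse)    # attaches ggplot2 along with the other core tidyverse packages
library(primer.data)  # provides the nhanes dataset used below

# Quick look at the three columns used in this tutorial
nhanes %>%
  select(height, weight, gender) %>%
  glimpse()
```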

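The canvas: ggplot

Painting needs a canvas, and plotting for data analysis is no different. After loading the packages, call the ggplot function to create a canvas. Since no data has been set yet, the canvas is empty.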

ggplot()

We will use the nhanes dataset, passing it in with ggplot(data=nhanes).

ggplot(data=nhanes)

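The canvas still looks blank, but don't worry. To see why, think of image editors such as Photoshop, where retouching is a stack of many layers. Right now we are still on the bottom layer, the ggplot layer. Inside the ggplot function we add the mapping=aes() argument, getting ready to map the x-axis, the y-axis, and color.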

ggplot(data=nhanes,
       mapping=aes())

Pay attention: from here on the plot starts to change. We choose which fields go to the x-axis, the y-axis, and color.

x-axis: height
y-axis: weight
color: gender

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))

Now we start adding higher-level layers, and each one will show more information.
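
To see why the canvas shows no data until a geom is added, it can help to count the layers stored in the plot object. This is a minimal sketch, not part of the original walkthrough: toy is a made-up stand-in for nhanes, and reading the layers element of a plot object is assumed to behave as it does in recent ggplot2 releases.

```r
library(ggplot2)

# Made-up stand-in for nhanes (hypothetical values, for illustration only)
toy <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 70, 80),
  gender = c("Female", "Female", "Male", "Male")
)

# Data and mappings only: nothing tells ggplot2 how to draw the rows yet
base <- ggplot(data = toy,
               mapping = aes(x = height, y = weight, color = gender))
length(base$layers)         # 0 layers, so no points are drawn

# Adding a geom with + appends one layer on top of the base
with_points <- base + geom_point()
length(with_points$layers)  # 1 layer
```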

Adding a geom

Now we add a geom layer, which is stacked on top of the ggplot layer with +. Here we use geom_point to draw a scatter plot.

ggplot(data=nhanes,
       mapping = aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point()

Wow! A nice start, but the points overlap quite heavily. We can set the point size (size) and transparency (alpha) to reduce the overplotting.

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.3, size=0.5)

Much better! But can we draw separate scatter plots for men and women?

Faceting: facet

Next we add the faceting function facet_wrap. It produces one panel for men and one for women.

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.3, size=0.5)+
  facet_wrap(~gender)

Now we have two facet panels.
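
By default facet_wrap places the two panels side by side. If stacking them vertically reads better, its ncol argument can be set. A minimal sketch, assuming a made-up data frame toy in place of nhanes; the layout table inspected at the end is how recent ggplot2 releases expose panel positions and is used here only as a check.

```r
library(ggplot2)

# Made-up stand-in for nhanes (hypothetical values, for illustration only)
toy <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 70, 80),
  gender = c("Female", "Female", "Male", "Male")
)

p <- ggplot(data = toy,
            mapping = aes(x = height, y = weight, color = gender)) +
  geom_point() +
  facet_wrap(~gender, ncol = 1)  # one column, so the panels stack vertically

p

# One row per panel, with the ROW and COL position of each
panel_layout <- ggplot_build(p)$layout$layout
panel_layout[, c("PANEL", "ROW", "COL")]
```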

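Adding a second geom

Now we want a trend line, which the geom_smooth function provides. Since geom_smooth and geom_point are both geom functions, it makes sense to keep them next to each other, ahead of facet_wrap. To make the trend line stand out, we make the points more transparent, for example alpha=0.1.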

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.1, size=0.5)+
  geom_smooth()+
  facet_wrap(~gender)

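Now we want a smoother trend line. In geom_smooth we set method="loess", which fits a locally weighted regression curve, and formula=y~x, which says that y is modeled as a function of x. Spelling out both also stops geom_smooth from printing a message about the defaults it picked.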

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.1, size=0.5)+
  geom_smooth(method="loess", formula=y~x)+
  facet_wrap(~gender)
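
How smooth the loess curve looks is controlled by geom_smooth's span argument (default 0.75): larger values average over more neighboring points and give a smoother line. A minimal sketch, assuming simulated data in place of nhanes; the value 0.9 is only an illustration, not a recommendation from the original post.

```r
library(ggplot2)

# Simulated stand-in for nhanes (made-up numbers, for illustration only)
set.seed(1)
toy <- data.frame(
  height = rep(seq(150, 190, length.out = 100), times = 2),
  gender = rep(c("Female", "Male"), each = 100)
)
toy$weight <- 0.9 * toy$height - 90 + rnorm(nrow(toy), sd = 8)

p <- ggplot(data = toy,
            mapping = aes(x = height, y = weight, color = gender)) +
  geom_point(alpha = 0.3, size = 0.5) +
  # span above the 0.75 default asks loess for a smoother curve
  geom_smooth(method = "loess", formula = y ~ x, span = 0.9) +
  facet_wrap(~gender)

p
```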

Labels: labs

Now we use the labs function to add a label layer to the plot. It can set the title, subtitle, caption, x and y axis labels, and the legend title.

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.1, size=0.5)+
  geom_smooth(method="loess", formula=y~x)+
  facet_wrap(~gender)+
  labs(title = "Heights in the U.S.",
       subtitle = "On average, men weigh more and are taller than women")

Now we have a title and a subtitle, but the axes carry no units, which is not very nice. Here we change the axis labels to Height (cm) and Weight (kg).

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.1, size=0.5)+
  geom_smooth(method="loess", formula=y~x)+
  facet_wrap(~gender)+
  labs(title = "Heights in the U.S.",
       subtitle = "On average, men weigh more and are taller than women",
       x="Height (cm)",
       y="Weight (kg)")

Awesome! But the legend title gender is still lowercase, and I would like it capitalized. We know that x, y, and color map to height, weight, and gender, so to change the gender label we set color in labs.

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.1, size=0.5)+
  geom_smooth(method="loess", formula=y~x)+
  facet_wrap(~gender)+
  labs(title = "Heights in the U.S.",
       subtitle = "On average, men weigh more and are taller than women",
       x="Height (cm)",
       y="Weight (kg)",
       color="Gender")

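When other people see this plot, though, they will wonder what the raw data is and where it came from. So we should tell them that the nhanes dataset comes from the National Health and Nutrition Examination Survey. Setting the caption argument of labs does this.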

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.1, size=0.5)+
  geom_smooth(method="loess", formula=y~x)+
  facet_wrap(~gender)+
  labs(title = "Heights in the U.S.",
       subtitle = "On average, men weigh more and are taller than women",
       x="Height (cm)",
       y="Weight (kg)",
       color="Gender",
       caption="Source: National Health and Nutrition Examination Survey")

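Changing the colors

The plot is fairly complete by now, but the point colors in the geom layers may not be our favorite. How do we set them?

Because we are changing the colors of the geom layers, this layer goes right after the geom layers, on top of them. Use scale_color_manual() to do it. The values argument of scale_color_manual() accepts hexadecimal color strings as well as color names.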

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.1, size=0.5)+
  geom_smooth(method="loess", formula=y~x)+
  scale_color_manual(values=c("magenta", "blue"))+
  facet_wrap(~gender)+
  labs(title = "Heights in the U.S.",
       subtitle = "On average, men weigh more and are taller than women",
       x="Height (cm)",
       y="Weight (kg)",
       color="Gender",
       caption="Source: National Health and Nutrition Examination Survey")
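
As a sketch of the hexadecimal form mentioned above, values can also be given as a named vector so that each group is tied to its color explicitly rather than by position. This assumes the gender values are spelled Female and Male, as in the made-up data frame toy below; "#FF00FF" and "#0000FF" are the hex codes for magenta and blue.

```r
library(ggplot2)

# Made-up stand-in for nhanes (hypothetical values, for illustration only)
toy <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 70, 80),
  gender = c("Female", "Female", "Male", "Male")
)

p <- ggplot(data = toy,
            mapping = aes(x = height, y = weight, color = gender)) +
  geom_point() +
  # Names must match the values in the gender column
  scale_color_manual(values = c(Female = "#FF00FF", Male = "#0000FF"))

p
```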

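Displaying Chinese text

By default, Chinese characters in ggplot2 plots often fail to render properly. To display them, we use the showtext package.

library(ggplot2)
library(primer.data) # provides the data
library(showtext) # enables Chinese text rendering
showtext_auto() # turn on showtext for all subsequent plots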

ggplot(data=nhanes,
       mapping=aes(
         x=height,
         y=weight,
         color=gender))+
  geom_point(alpha=0.1, size=0.5)+
  geom_smooth(method="loess", formula=y~x)+
  scale_color_manual(values=c("magenta", "blue"))+
  facet_wrap(~gender)+
  labs(title = "美国身高",
       subtitle = "平均而言，男性群体的身高会高于女性群体",
       x="身高(cm)",
       y="体重(kg)",
       color="性别",
       caption="数据源: National Health and Nutrition Examination Survey")
