Learning R (Part 2): Data Visualization

Data Visualization

Bar plots

Basic bar plot: barplot(height), where height is a vector or a matrix.
Example:

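library(vcd)                        # provides the Arthritis data set
counts <- table(Arthritis$Improved) # frequency of each improvement level

# vertical bar plot
barplot(counts, main="Simple Bar Plot", xlab="Improvement", ylab="Frequency")

# horizontal bar plot
barplot(counts, main="Horizontal Bar Plot", xlab="Frequency", ylab="Improvement", horiz=TRUE)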

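main sets the plot title, and xlab and ylab set the x-axis and y-axis labels. The optional horiz=TRUE argument draws the bars horizontally; the default is vertical bars.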

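Stacked and grouped bar plots (ppt 4.8):
When height is a matrix, barplot() stacks the rows within each column by default; beside=TRUE draws them side by side instead.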

library(vcd)
counts <- table(Arthritis$Improved, Arthritis$Treatment)

# stacked barplot
barplot(counts,main="Stacked Bar Plot",xlab="Treatment", ylab="Frequency",col=c("red", "yellow","green"),
legend=rownames(counts))

# grouped barplot
barplot(counts,main="Grouped Bar Plot",xlab="Treatment", ylab="Frequency",col=c("red", "yellow", "green"),
legend=rownames(counts), beside=TRUE)
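To make the matrix layout concrete, here is a minimal sketch with a small made-up table (the numbers and the name counts_demo are illustrative assumptions, not the Arthritis data). Each column becomes one bar or one group of bars, and each row becomes one colored segment.

```r
# Made-up 3 x 2 table of counts: rows are outcome levels, columns are groups
counts_demo <- matrix(c(10, 5, 3,
                        4, 6, 12),
                      nrow = 3,
                      dimnames = list(Improved  = c("None", "Some", "Marked"),
                                      Treatment = c("Placebo", "Treated")))

# Stacked: one bar per column, so one midpoint per column is returned
mids_stacked <- barplot(counts_demo, legend = rownames(counts_demo),
                        col = c("red", "yellow", "green"))

# Grouped: one bar per cell, so the midpoints come back as a 3 x 2 matrix
mids_grouped <- barplot(counts_demo, beside = TRUE, legend = rownames(counts_demo),
                        col = c("red", "yellow", "green"))

# The height of each stacked bar is the column total
bar_totals <- colSums(counts_demo)
```

The returned midpoints are handy if you later want to place text labels above the bars with text().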

Bar plots of means: for example, a bar plot of the mean illiteracy rate in each US region, sorted in ascending order:

states <- data.frame(state.region, state.x77) # data sets that ship with R
means <- aggregate(states$Illiteracy, by=list(state.region), FUN=mean) # mean per region (see Chapter 3)

means <- means[order(means$x),] # sort by mean, ascending
barplot(means$x, names.arg=means$Group.1)
title("Mean Illiteracy Rate")
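The column names Group.1 and x used above come from how aggregate() labels its result when it receives a bare vector and an unnamed by list. A minimal sketch on made-up values (scores and grp are hypothetical names chosen for illustration):

```r
# Made-up scores for two groups (illustrative values only)
scores <- c(1, 3, 2, 6)
grp    <- c("a", "a", "b", "b")

# Unnamed grouping list: the result has columns "Group.1" and "x"
m <- aggregate(scores, by = list(grp), FUN = mean)
names(m)

# Naming the list element gives a more readable grouping column
m2 <- aggregate(scores, by = list(group = grp), FUN = mean)
m2 <- m2[order(m2$x, decreasing = TRUE), ]  # sort by mean, descending
```

Naming the grouping variable is optional, but it tends to make later calls such as barplot(m2$x, names.arg = m2$group) easier to read.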

Pie charts

pie(x, labels), where x is a vector of non-negative slice sizes and labels is a vector of slice labels.
Example:

slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany","France")
pie(slices, labels = lbls, main="Simple Pie Chart")
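Pie slices are hard to compare by eye, so it often helps to print each share in the label. The sketch below reuses the same slices and lbls values; pct and lbls_pct are names introduced here for illustration.

```r
slices <- c(10, 12, 4, 16, 8)
lbls   <- c("US", "UK", "Australia", "Germany", "France")

# Convert raw sizes to whole-number percentages of the total
pct <- round(slices / sum(slices) * 100)

# Append the percentage to each label, e.g. "US 20%"
lbls_pct <- paste0(lbls, " ", pct, "%")

pie(slices, labels = lbls_pct, col = rainbow(length(lbls_pct)),
    main = "Pie Chart with Percentages")
```

Because of rounding, the displayed percentages will not always add up to exactly 100 for other data.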

3D pie chart:

install.packages("plotrix")
library(plotrix)

pie3D(slices, labels=lbls, explode=0.1, main="3D Pie Chart")

Here explode controls how far the slices are pulled apart from each other.

Fan plots

install.packages("plotrix")
library(plotrix)

fan.plot(slices, labels=lbls, main="Fan Plot")

Histograms

hist(mtcars$mpg,  # any numeric vector
    breaks=12,    # suggested number of bins
    col="red",    # bar fill color
    xlab="Miles Per Gallon",
    main="Colored histogram with 12 bins")
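When breaks is a single number, hist() treats it as a suggestion and picks "pretty" breakpoints, so the bin count you get can differ from what you asked for. A minimal sketch for checking this without drawing anything, and for forcing exact bin edges (the edges vector is an assumption chosen to cover the range of mtcars$mpg):

```r
# Compute the histogram without plotting, then inspect what R chose
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)
n_bins <- length(h$counts)  # number of bins actually used
h$breaks                    # the breakpoints R picked

# To control the bins exactly, pass the breakpoints themselves
edges <- seq(10, 34, by = 2)  # 13 edges give 12 bins of width 2
h2 <- hist(mtcars$mpg, breaks = edges, col = "red",
           xlab = "Miles Per Gallon", main = "Histogram with explicit bin edges")
```

Explicit edges must span the whole data range, otherwise hist() stops with an error.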

Kernel density plots

plot(density(x)), where x is a numeric vector.

Use lines() to overlay a density curve on an existing plot, such as a histogram:

hist(mtcars$mpg, freq=FALSE, breaks=12, col="red", xlab="Miles Per Gallon", main="Histogram, density curve")
lines(density(mtcars$mpg), col="blue", lwd=2)
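The freq=FALSE argument matters here: it rescales the histogram so the bar areas sum to 1, which puts it on the same scale as the density curve. A rough numerical check, using a simple trapezoid rule (the variable names are illustrative):

```r
d <- density(mtcars$mpg)

# Approximate the area under the density curve with the trapezoid rule
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Total bar area of the density-scaled histogram
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)
hist_area <- sum(h$density * diff(h$breaks))

c(density_area = area, histogram_area = hist_area)  # both should be close to 1
```

With the default freq=TRUE the bars show raw counts, and the overlaid curve would sit almost flat along the bottom of the plot.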

Box plots

boxplot(mtcars$mpg, main="Box plot",ylab="Miles per Gallon")

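Side-by-side box plots:
Box plots can be drawn for a single variable or for a variable split by group, which makes them useful for comparing groups. The call has the form boxplot(formula, data=dataframe), where formula is a formula and dataframe is the data frame (or list) that supplies the data. For example, the formula y ~ A produces side-by-side box plots of the numeric variable y, one for each level of the categorical variable A. The formula y ~ A*B produces one box plot of y for every combination of levels of the categorical variables A and B.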

boxplot(mpg~cyl,data=mtcars,
main="Car Mileage Data",
xlab="Number of Cylinders",
ylab="Miles Per Gallon")
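The y ~ A*B form can be sketched with mtcars by crossing transmission type with cylinder count. The factor labels "auto" and "manual" and the name mt are assumptions made for readability; am is coded 0/1 in the data.

```r
# Copy mtcars and turn the two grouping variables into factors
mt <- transform(mtcars,
                cyl = factor(cyl),
                am  = factor(am, levels = c(0, 1), labels = c("auto", "manual")))

# One box per combination of transmission type and cylinder count
bp <- boxplot(mpg ~ am * cyl, data = mt,
              col  = c("gold", "darkgreen"),
              main = "MPG by Transmission and Cylinders",
              xlab = "Transmission.Cylinders",
              ylab = "Miles Per Gallon")

bp$names  # group labels in plotting order
bp$n      # number of cars in each box
```

The object returned by boxplot() also holds the summary statistics for each box in bp$stats, which is useful when some groups contain only a few observations.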

Dot plots

dotchart(x, labels=), where x is a numeric vector and labels is a vector of labels, one per point. Example 1:

dotchart(mtcars$mpg, labels=row.names(mtcars), cex=.7,
main="Gas Mileage for Car Models", xlab="Miles Per Gallon")
# cex sets the character size

Example 2 (dot plot grouped and colored by a factor):

x <- mtcars[order(mtcars$mpg),]
x$cyl <- factor(x$cyl)
x$color[x$cyl==4] <- "red"
x$color[x$cyl==6] <- "blue"
x$color[x$cyl==8] <- "darkgreen"
dotchart(x$mpg,
labels = row.names(x),
cex=.7,
pch=19,
groups = x$cyl,
gcolor = "black",
color = x$color,
main = "Gas Mileage for Car Models\ngrouped by cylinder",
xlab = "Miles Per Gallon")

Scatter plots

plot(x, y), where x and y are numeric vectors giving the coordinates of the points.

Example:

# explore the relationship between car weight and fuel economy (miles per gallon)
attach(mtcars)
plot(wt, mpg,
main="Basic Scatterplot of MPG vs. Weight",
xlab="Car Weight (lbs/1000)",
ylab="Miles Per Gallon ", pch=19)
abline(lm(mpg ~ wt), col="red", lwd=2, lty=1)
# abline() adds the least-squares line of best fit from lm()
detach(mtcars)
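The same plot can be produced without attach() and detach() by passing data= to lm() and wrapping plot() in with(). This is only one possible style, sketched here; keeping the fitted model in a variable (fit is a name introduced for illustration) also lets you read off the line's coefficients.

```r
# Fit the regression once and reuse it
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # intercept and slope of the fitted line

# with() evaluates plot() inside the data frame, so no attach() is needed
with(mtcars, plot(wt, mpg, pch = 19,
                  main = "Basic Scatterplot of MPG vs. Weight",
                  xlab = "Car Weight (lbs/1000)",
                  ylab = "Miles Per Gallon"))
abline(fit, col = "red", lwd = 2, lty = 1)

# Predicted mpg for a hypothetical 3000 lb car (wt is in units of 1000 lbs)
pred <- predict(fit, newdata = data.frame(wt = 3))
```

Avoiding attach() keeps variable names such as wt and mpg from leaking into the global search path, which can cause subtle name clashes in longer scripts.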

The pairs() function creates a basic scatter plot matrix.
Example:

pairs(~ mpg + disp + drat + wt, data=mtcars,
main="Basic Scatterplot Matrix")

install.packages("car")
library(car)
scatterplotMatrix(~ mpg + disp + drat + wt,
data=mtcars, spread=FALSE,
smoother.args=list(lty=2),
main="Scatter Plot Matrix via car Package")

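Linear and smoothed (loess) fit lines are added by default, and the main diagonal shows kernel density curves with rug plots. spread=FALSE suppresses the lines that show spread and asymmetry, and smoother.args=list(lty=2) draws the loess curve as a dashed rather than a solid line. These argument names follow the car interface used in this example; other versions of car may expect the smoother options in a different form, so check ?scatterplotMatrix if the call produces warnings.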

Mosaic plots

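Mosaic plots visualize two or more categorical variables at once (for a single categorical variable, a bar plot or pie chart is enough). The mosaic() function comes from the vcd package and takes a contingency table:
mosaic(table)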
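As a sketch, the Titanic data set that ships with R is a four-way contingency table (Class, Sex, Age, Survived) and can be passed to mosaic() directly. The requireNamespace() guard and the base-graphics fallback are additions for illustration: mosaicplot() is a different function from vcd::mosaic() and draws a similar but not identical figure.

```r
# Titanic is a built-in 4-way contingency table
ftable(Titanic)  # flattened view of the cell counts

if (requireNamespace("vcd", quietly = TRUE)) {
  # Tile area is proportional to the cell count; shading marks
  # cells that deviate from the independence model
  vcd::mosaic(Titanic, shade = TRUE, legend = TRUE)
} else {
  # Fallback in base graphics when vcd is not installed
  mosaicplot(Titanic, shade = TRUE, main = "Titanic")
}
```

In both versions, larger tiles correspond to more passengers in that combination of categories.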