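The function factor is used to encode a vector as a factor (the terms "category" and "enumerated type" are also used for factors). If argument ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S there is also a function ordered.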
factor(x = character(), levels, labels = levels,
exclude = NA, ordered = is.ordered(x), nmax = NA)
ordered(x, ...)
is.factor(x)
is.ordered(x)
as.factor(x)
as.ordered(x)
addNA(x, ifany = FALSE)
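As a rough illustration of what "encoding a vector as a factor" means, and of how the membership and coercion functions listed above behave, consider the minimal sketch below. The vector `sizes` and its values are made up for this illustration and do not come from the original page.

```r
# Hypothetical sample data, used only for illustration
sizes <- c("small", "large", "medium", "small")

f <- factor(sizes)            # levels default to the sorted unique values
levels(f)                     # "large" "medium" "small"
as.integer(f)                 # internal integer codes: 3 1 2 3
is.factor(f)                  # TRUE
is.ordered(f)                 # FALSE

# ordered() declares that the levels carry an order, in the order given
o <- ordered(sizes, levels = c("small", "medium", "large"))
is.ordered(o)                 # TRUE
o[1] < o[2]                   # TRUE: "small" comes before "large"

# Coercion helpers
is.factor(as.factor(sizes))   # TRUE
is.ordered(as.ordered(f))     # TRUE, reusing the existing level order
```

A factor is therefore an integer vector of codes plus a `levels` attribute; comparison operators such as `<` are meaningful only for ordered factors.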
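x : a vector of data, usually taking a small number of distinct values.
levels : an optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)).
labels : either an optional character vector of labels for the levels (in the same order as levels after removing those in exclude), or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.
exclude : a vector of values to be excluded when forming the set of levels. This may be a factor with the same level set as x, or should be a character vector.
ordered : logical flag to determine if the levels should be regarded as ordered (in the order given).
nmax : an upper bound on the number of levels; see "Details".
... : (in ordered(.)): any of the above, apart from ordered itself.
ifany : only add an NA level if it is used, i.e. if any(is.na(x)).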
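The arguments above interact in ways that are easier to see on a small example. The following is only a sketch: the vector `ans` and its values are hypothetical and chosen for illustration.

```r
# Hypothetical survey answers, used only for illustration
ans <- c("yes", "no", "maybe", "yes", NA)

# levels may be smaller than sort(unique(x)): unlisted values become NA
f1 <- factor(ans, levels = c("no", "yes"))
is.na(f1)                     # FALSE FALSE TRUE FALSE TRUE

# labels rename the levels, position by position
f2 <- factor(ans, levels = c("no", "maybe", "yes"),
             labels = c("N", "M", "Y"))
levels(f2)                    # "N" "M" "Y"

# exclude removes values from the level set; NA is listed explicitly
# because a non-default exclude no longer drops NA on its own
f3 <- factor(ans, exclude = c("maybe", NA))
levels(f3)                    # "no" "yes"

# ordered = TRUE treats the levels as ordered, in the order given
f4 <- factor(ans, levels = c("no", "maybe", "yes"), ordered = TRUE)
f4[2] < f4[1]                 # TRUE: "no" comes before "yes"

# addNA() adds NA as a level; with ifany = TRUE only if NAs are present
nlevels(addNA(f2, ifany = TRUE))                      # 4
nlevels(addNA(factor(c("a", "b")), ifany = TRUE))     # 2
```

Note that supplying exclude replaces the default exclude = NA, so NA has to be included in it if missing values should still be kept out of the level set.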
(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
as.integer(ff) # the internal codes
(f. <- factor(ff)) # drops the levels that do not occur
ff[, drop = TRUE] # the same, more transparently
factor(letters[1:20], labels = "letter")
class(ordered(4:1)) # "ordered", inheriting from "factor"
z <- factor(LETTERS[3:1], ordered = TRUE)
## and "relational" methods work:
stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z))
## suppose you want "NA" as a level, and to allow missing values.
(x <- factor(c(1, 2, NA), exclude = NULL))
is.na(x)[2] <- TRUE
x # [1] 1    <NA> <NA>
is.na(x)
# [1] FALSE TRUE FALSE
## More rational, since R 3.4.0 :
factor(c(1:2, NA), exclude = "" ) # keeps <NA> , as
factor(c(1:2, NA), exclude = NULL) # always did
## exclude = <character>
z # ordered levels 'A < B < C'
factor(z, exclude = "C") # does exclude
factor(z, exclude = "B") # ditto
## Now, labels maybe duplicated:
## factor() with duplicated labels allowing to "merge levels"
x <- c("Man", "Male", "Man", "Lady", "Female")
## Map from 4 different values to only two levels:
(xf <- factor(x, levels = c("Male", "Man" , "Lady",   "Female"),
                 labels = c("Male", "Male", "Female", "Female")))
#> [1] Male Male Male Female Female
#> Levels: Male Female
## Using addNA()
Month <- airquality$Month
table(addNA(Month))
table(addNA(Month, ifany = TRUE))