| Title: | Nonparametric Statistical Methods |
|---|---|
| Description: | Accompanies the book "Nonparametric Statistical Methods Using R, 2nd Edition" by Kloke and McKean (2024, ISBN:9780367651350). Includes methods, datasets, and random number generation useful for the study of robust and/or nonparametric statistics. Emphasizes classical nonparametric methods for a variety of designs --- especially one-sample and two-sample problems. Includes methods for general scores, including estimation and testing for the two-sample location problem as well as Hogg's adaptive method. |
| Authors: | John Kloke [aut, cre], Joseph McKean [aut] |
| Maintainer: | John Kloke <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 2.0.1 |
| Built: | 2026-05-20 07:44:13 UTC |
| Source: | https://github.com/kloke/npsm |
This is a simulated data set used as an example of analysis of covariance. The data frame acov231 contains the data: the responses are in column 1, column 2 contains the levels of factor A, column 3 contains the levels of factor B, and column 4 contains the covariate. All true parameters (effects) are 0 in this generated data set.
data(acov231)
A data frame with 33 observations and 4 variables.
response: numeric. The response.
fA: numeric. Factor A with 2 levels.
fB: numeric. Factor B with 3 levels.
covariate: numeric. A covariate.
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
levs = c(2,3)
data = acov231[,1:3]
xcov = matrix(acov231[,4],ncol=1)
temp = kancova(levs,data,xcov)
Aligned rank test for a group/treatment effect after adjusting for covariates.
aligned.test(x, y, g, scores = Rfit::wscores, ...)
x |
n by p design matrix |
y |
n by 1 response vector |
g |
n by 1 vector denoting group/treatment membership. |
scores |
Which scores should be used for the fit and the test. An object of class scores. |
... |
Optional arguments passed to rfit. |
Data are aligned based on the design matrix x using a rank-based fit via rfit.
statistic |
The value of the test statistic. |
p.value |
The p-value based on a chisq(k-1) distribution where k is the number of groups/treatments. |
John Kloke
Hettmansperger, T.P. and McKean, J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
y<-rt(30,2)
x<-runif(30)
g<-rep(1:3,each=10)
aligned.test(x,y,g)
Demographics and position information on 1000 randomly selected baseball players who debuted after 1945.
data("baseball_players1000")
A data frame with 1000 observations on the following 28 variables.
playerID: a character vector
birthYear: a numeric vector
birthMonth: a numeric vector
birthDay: a numeric vector
birthCountry: a character vector
birthState: a character vector
nameFirst: a character vector
nameLast: a character vector
weight: a numeric vector
height: a numeric vector
bats: a character vector
throws: a character vector
debutYear: a numeric vector
G_all: a numeric vector
G_p: a numeric vector
G_c: a numeric vector
G_1b: a numeric vector
G_2b: a numeric vector
G_3b: a numeric vector
G_ss: a numeric vector
G_lf: a numeric vector
G_cf: a numeric vector
G_rf: a numeric vector
G_of: a numeric vector
G_dh: a numeric vector
G_ph: a numeric vector
G_pr: a numeric vector
pitcher: a logical vector
A random subset of baseball players who debuted after 1945 and played in at least 160 games. Includes information on birth (date and location); height (inches) and weight (pounds); whether they bat left (L), right (R), or switch (B); and games played at each position. The variable pitcher is a derived variable based on whether the majority of games were played as a pitcher (i.e., G_pr/G_all > 0.5).
https://github.com/chadwickbureau/baseballdatabank
https://github.com/chadwickbureau/baseballdatabank/blob/master/readme2014.txt
data(baseball_players1000)
hist(baseball_players1000$weight,xlab="Weight (lbs)", probability=TRUE, ylim=c(0,0.02), main="Histogram of Weight for 1000 Baseball Players")
lines(density(baseball_players1000$weight,na.rm=TRUE))
Batting statistics (average, home runs, RBIs) for full-time players in 2010. By full time we mean that the batter had at least 450 official at bats during the season.
data(bb2010)
A data frame with 122 observations on the following 3 variables.
ave: batting average
hr: home runs
rbi: runs batted in
baseballguru.com
plot(hr~ave,data=bb2010)
Data from Table 9.11 of Hollander and Wolfe (1999). The data consist of triglyceride levels on 13 patients. Two factors, each at two levels, were recorded: Sex and Obesity. The concomitant variables are chylomicrons, age, and three lipid variables: very low-density lipoproteins (VLDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL).
data(blood.plasma)
A data frame with 13 observations on 8 variables.
Total: Triglyceride level, response
Sex: Sex, 2 levels
Obese: Obesity, 2 levels
Chylo: Chylomicrons, covariate
VLDL: Very low density lipids, covariate
LDL: Low density lipids, covariate
HDL: High density lipids, covariate
Age: Age
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
data(blood.plasma)
plot(Total~Age,data=blood.plasma)
boxplot(Total~Obese,data=blood.plasma)
Basic summaries of box scores for the Major League Baseball team Milwaukee (WI) Brewers, 1982 season. The Brewers won the American League championship that year, and Brewer Robin Yount won the Most Valuable Player (MVP) award.
data("brewers1982")
A data frame with 163 observations on the following 8 variables.
Date: a character vector
Opp: a character vector
R: a numeric vector
RA: a numeric vector
Time: a character vector
Attendance: a numeric vector
home: a logical vector
win: a logical vector
data(brewers1982)
# proportion of wins for a given number of runs scored
pwin <- with(brewers1982,tapply(win,R,mean))
pwin
# graphical display of the above
plot(names(pwin),pwin,xlab='Runs', ylab='Proportion of Wins',main='Brewers 1982')
Survival times (in days) for patients undergoing a standard treatment (S) or a new treatment (N).
data("cancertrt")
A data frame with 17 observations on the following 3 variables.
time: Survival time in days
event: Indicator for event
trt: a factor with levels N S
Higgins (2004), Introduction to Modern Nonparametric Statistics, Pacific Grove, CA:Brooks/Cole–Thomson Learning
data(cancertrt)
with(cancertrt,gehan.test(time,event,trt))
Centers a matrix.
centerx(x)
x |
a matrix |
Returns a centered matrix, i.e., each column of the matrix is replaced by deviations from its column mean.
The centered matrix.
John Kloke, Joseph McKean
scale
x <- cbind(seq(1,5,length=5),seq(10,20,length=5))
xc <- centerx(x)
apply(xc,2,mean) # column means of the centered matrix
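For intuition, the centering described above can be sketched in base R without the package. The helper name center_columns is hypothetical and is not part of npsm; it is only assumed to mirror the documented behavior of centerx (each column replaced by deviations from its column mean).

```r
# Hypothetical helper (not part of npsm): subtract each column's mean.
center_columns <- function(x) {
  x <- as.matrix(x)
  sweep(x, 2, colMeans(x), FUN = "-")
}

x <- cbind(seq(1, 5, length = 5), seq(10, 20, length = 5))
xc <- center_columns(x)
colMeans(xc)  # column means of the centered matrix
```

The base R function scale, called with scale = FALSE, performs the same centering and additionally records the column means as an attribute.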
A regression example with response cloud point of a liquid and predictor the percent of Iodine 8 added to the liquid; see Chapter 3 of Hettmansperger and McKean (2011) or Exercise 4.9.10 of Kloke and McKean (2014)/Exercise 4.7.7 of Kloke and McKean (2024).
data(cloud)
Nineteen observations on two variables.
cloud.point: Cloud point of the liquid
I8: Percent Iodine 8 added
Draper, N.R. and Smith, H. (1966), Applied Regression Analysis, New York: John Wiley and Sons.
Hettmansperger, T.P. and McKean, J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
rfit(cloud.point ~ I8,data=cloud)
Returns a bootstrap confidence interval for any of the correlations available in the base R cor function.
cor.boot.ci(x, y, method = "spearman", conf = 0.95, nbs = 3000)
x |
n by 1 vector |
y |
n by 1 vector |
method |
Which correlation to use. Argument passed to cor. |
conf |
Confidence level. |
nbs |
Number of bootstrap samples on which to base the confidence interval. |
Obtains a percentile bootstrap confidence interval.
The bootstrap samples are obtained via the function boot.
A confidence interval.
John Kloke, Joseph McKean
cor
library(boot)
with(bb2010,cor.boot.ci(ave,hr))
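The percentile bootstrap idea can be sketched in base R as follows. This is a minimal illustration, not the package's implementation (which obtains its samples via boot); the helper name boot_cor_ci and the simulated data are hypothetical.

```r
# Hypothetical helper (not part of npsm): percentile bootstrap CI for a correlation.
boot_cor_ci <- function(x, y, method = "spearman", conf = 0.95, nbs = 3000) {
  n <- length(x)
  stat <- numeric(nbs)
  for (b in seq_len(nbs)) {
    idx <- sample.int(n, n, replace = TRUE)  # resample (x, y) pairs together
    stat[b] <- cor(x[idx], y[idx], method = method)
  }
  alpha <- 1 - conf
  # Percentile interval: empirical quantiles of the bootstrap correlations
  unname(quantile(stat, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE))
}

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
ci <- boot_cor_ci(x, y, nbs = 500)
ci
```

Resampling the pairs jointly, rather than x and y separately, is what preserves the dependence whose strength is being estimated.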
A regression example with response energy output in watts and predictor temperature difference in kelvins; see Devore (2012) and Exercise 4.9.11 of Kloke and McKean (2014)/Exercise 4.7.8 of Kloke and McKean (2024).
data(energy)
Twenty-four observations on two variables.
output: Energy output in watts
temp.diff: Temperature difference in K
Devore, J. (2012), Probability and Statistics for Engineering and the Sciences, 8th ed., Boston: Brooks/Cole.
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
rfit(output ~ temp.diff,data=energy)
The amount of time it took 22 baseball players to round first base for each of three methods of rounding.
data(firstbase)
A data frame with 22 observations on the following 3 variables.
round.out: Time when using round out method.
narrow.angle: Time when using narrow angle method.
wide.angle: Time when using wide angle method.
Rounding methods are illustrated in Figure 7.1 of Hollander and Wolfe (1999).
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Returns the Fligner-Killeen test for homogeneous scales for two samples. An estimate of the ratio of scales, based on the logs of folded median-aligned samples, and a corresponding confidence interval are also computed. fk.test computes the statistic based on squared-normal scores, following the optimal (for normal errors) test of this type described in Section 2.10 of Hettmansperger and McKean (2011). Hence, it will differ from the core R routine fligner.test; see the discussion in Section 3.3 of Kloke and McKean (2014)/Section 3.5 of Kloke and McKean (2024).
fk.test(x,y,alternative = c("two.sided", "less", "greater"),conf.level = 0.95)
x |
vector of first sample responses |
y |
vector of second sample responses |
alternative |
alternative indicator for hypotheses |
conf.level |
confidence coefficient for the returned confidence intervals |
Returns the Fligner-Killeen test for the two-sample scale problem.
statistic |
chi-squared test statistic |
p.value |
p-value of the test |
estimate |
vector of estimates of ratio of scales |
conf.int |
table of confidence intervals |
John Kloke, Joseph McKean
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Hettmansperger, T.P. and McKean, J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
fkk.test
x<-rnorm(18)
y<-rnorm(22)*3
fk.test(x,y)
Returns the Fligner-Killeen test for homogeneous scales for k samples. Estimates of the ratios of scales, based on the logs of folded median-aligned samples, and corresponding confidence intervals are also computed. The first level (sample) is the reference. See the discussion in Section 5.7 of Kloke and McKean (2014)/Section 5.8 of Kloke and McKean (2024).
fkk.test(y,ind,conf.level = 0.95)
y |
vector of responses |
ind |
vector of corresponding levels |
conf.level |
confidence coefficient for the returned confidence intervals |
Returns the Fligner-Killeen test for the k-sample scale problem.
statistic |
chi-squared test statistic |
p.value |
p-value of the test |
estimate |
vector of estimates of ratio of scales |
conf.int |
table of confidence intervals |
cwts |
vector of weights based on the estimated differences in scales |
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean, J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
fk.test
y1 <- rnorm(10)
y2 <- rnorm(12)*3
y3 <- rnorm(15)*5
y<-c(y1,y2,y3)
ind<-rep(1:3,times=c(10,12,15))
fkk.test(y,ind)
Returns the test based on placements for the Behrens-Fisher problem. This test was developed by Fligner and Policello (1981); see, also, Section 2.11 of Hettmansperger and McKean (2011) and Section 4.4 of Hollander and Wolfe (1999). The version computed by fp.test is discussed in Section 3.4 of Kloke and McKean (2014)/Section 3.6 of Kloke and McKean (2024).
fp.test(x,y,delta0=0,alternative = "two.sided")
x |
vector of first sample responses |
y |
vector of second sample responses |
delta0 |
null value tested |
alternative |
alternative indicator for hypotheses |
Returns the Placement Test for the Behrens-Fisher problem.
statistic |
chi-squared test statistic |
p.value |
p-value of the test |
numerator |
numerator of test statistic |
denominator |
denominator of test statistic |
John Kloke, Joseph McKean
Fligner, M.A. and Policello, G.E. (1981), Robust rank procedures for the Behrens-Fisher problem, Journal of the American Statistical Association, 76, 162–168.
Hettmansperger, T.P. and McKean, J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, 2nd ed., New York: John Wiley and Sons.
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Generalization of the Wilcoxon rank sum test which allows for censored data.
gehan.test(time, event, trt)
time |
Time of event or of censoring |
event |
Indicator variable for whether the event occurred or not (i.e., the time is censored) |
trt |
Variable indicating treatment group. |
statistic |
Value of the test statistic |
p.value |
p-value |
John Kloke
Higgins (2004), Introduction to Modern Nonparametric Statistics, Pacific Grove, CA:Brooks/Cole–Thomson Learning
n<-76
y<-rexp(n)
event<-rbinom(n,1,0.7) # about 30% censored
trt<-sample(c(0,1),n,replace=TRUE)
gehan.test(y,event,trt)
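To illustrate how a Wilcoxon-type comparison extends to censored data, the sketch below computes Gehan's pairwise score: a pair contributes +1 or -1 only when the ordering of the two survival times is known despite censoring, and 0 otherwise. The helper gehan_score and its tie-handling rule are assumptions for illustration; this is not the code behind gehan.test, and it returns only the raw score, not a standardized statistic or p-value.

```r
# Hypothetical helper (not part of npsm): Gehan's pairwise score statistic.
# event = 1 means the time is an observed event, 0 means it is censored.
gehan_score <- function(time1, event1, time2, event2) {
  total <- 0
  for (i in seq_along(time1)) {
    for (j in seq_along(time2)) {
      # Observation i (group 1) is known to outlast observation j (group 2)
      first_larger <- (time1[i] > time2[j] && event2[j] == 1) ||
        (time1[i] == time2[j] && event1[i] == 0 && event2[j] == 1)
      # Observation j (group 2) is known to outlast observation i (group 1)
      second_larger <- (time2[j] > time1[i] && event1[i] == 1) ||
        (time1[i] == time2[j] && event2[j] == 0 && event1[i] == 1)
      total <- total + first_larger - second_larger
    }
  }
  total
}

# Group 1: event at 5, censored at 8; group 2: events at 3 and 6
gehan_score(c(5, 8), c(1, 0), c(3, 6), c(1, 1))
```

With no censoring and no ties, every pair contributes and the score reduces to a linear function of the Mann-Whitney count.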
Returns the heterogeneous slopes design matrix used in ANCOVA. It references the first level.
getxact(amat,bmat)
amat |
cell mean design matrix of factor. |
bmat |
matrix of covariates. |
Returns the heterogeneous slopes analysis of covariance matrix.
cmat |
heterogeneous slopes analysis of covariance matrix |
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Returns the heterogeneous slopes design matrix used in ANCOVA. It references the first level. Also, column names are supplied.
getxact2(amat,bmat)
amat |
cell mean design matrix of factor. |
bmat |
matrix of covariates. |
Returns the heterogeneous slopes analysis of covariance matrix.
cmat |
heterogeneous slopes analysis of covariance matrix with columns named |
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Hemorrhage data from Dupont.
data(hemorrhage)
A data frame with 71 observations on the following 3 variables.
genotype: a numeric vector
time: a numeric vector
recur: a numeric vector
Dupont
data(hemorrhage)
str(hemorrhage)
Hodges-Lehmann type estimation and confidence intervals.
hodges_lehmann.ci(x, y, var.equal = FALSE, conf.level = 0.95, ...)
x |
numeric vector. |
y |
numeric vector. |
var.equal |
logical. Assume scales are equal (TRUE) or not (FALSE). |
conf.level |
confidence level to be used for the confidence interval. |
... |
optional arguments. currently unused. |
Currently implements two-sample estimation and confidence intervals based on methods proposed by Hodges and Lehmann.
estimate |
parameter point estimate |
stderr |
estimated standard error of point estimate |
conf.int |
estimated confidence interval |
John Kloke, Joseph McKean
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
zoo<-c(390,258,298,255,324,240,416,319,225,284)
rh <- c(187,186,179,269,382,264,353,38,350,267,229,383,254,302,195,43,337,390)
hodges_lehmann.ci(zoo,rh)
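As a rough illustration of the point estimate only, the two-sample Hodges-Lehmann shift estimate is the median of all pairwise differences between the samples. The helper hl_shift is hypothetical, and the sign convention (y minus x) is an assumption that may differ from the one used by hodges_lehmann.ci; no standard error or confidence interval is computed here.

```r
# Hypothetical helper (not part of npsm): Hodges-Lehmann shift estimate.
hl_shift <- function(x, y) {
  diffs <- outer(y, x, "-")  # all m * n differences y_j - x_i
  median(diffs)
}

zoo <- c(390, 258, 298, 255, 324, 240, 416, 319, 225, 284)
rh <- c(187, 186, 179, 269, 382, 264, 353, 38, 350, 267, 229, 383,
        254, 302, 195, 43, 337, 390)
hl_shift(zoo, rh)
```

Because it is a median of differences, a handful of extreme observations (such as the values 38 and 43 in rh) has limited influence on the estimate.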
These data are described in Example 11.7 of Hollander and Wolfe (1999). Results from a clinical trial in early Hodgkin's disease. Subjects received one of two treatments: radiation of affected node (AN) or total nodal radiation (TN).
data("hodgkins")
A data frame with 49 observations on the following 3 variables.
time: Survival time
relapse: Indicator variable for relapse
trt: treatment, a factor with levels AN TN
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Based on the selector statistics Q1 and Q2, one of four score functions is chosen. A rank test and p-value are then calculated based on it.
hogg.test(x, y, ...)
x |
n by 1 vector |
y |
m by 1 vector |
... |
additional arguments. currently not used |
statistic |
Value of the test statistic. |
p.value |
p-value based on a normal approximation. |
scores |
Which of the score functions was chosen. |
John Kloke, Patrick Kimes
Hogg, R., McKean, J., and Craig, A. (2013), Introduction to Mathematical Statistics, 7th ed., Boston: Pearson.
hogg.test(rt(20,1),rt(22,1)+0.2)
Q1 is a measure of skewness and Q2 is a measure of tail heaviness.
Q1(z)
z |
n by 1 vector |
Used as selector statistics in adaptive schemes. Both Q1 and Q2 are ratios. For Q1, the numerator is the upper 5% mean minus the middle 50% mean, while the denominator is the difference between the middle 50% mean and the lower 5% mean. For Q2, the numerator is the upper 5% mean minus the lower 5% mean, while the denominator is the difference between the upper 50% mean and the lower 50% mean. These statistics are not robust.
Returns the calculated ratio as a numeric scalar.
John Kloke
Hogg, R., McKean, J., and Craig, A. (2013), Introduction to Mathematical Statistics, 7th ed., Boston: Pearson.
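The two ratios can be sketched in base R as below. All helper names are hypothetical, and the handling of fractional counts (rounding the number of observations in a tail up with ceiling, and using a 25% trimmed mean for the middle 50%) is an assumption; the package's internal routines may treat these details differently.

```r
# Hypothetical helpers (not part of npsm).
# Mean of the smallest / largest ceiling(n * p) observations.
lower_mean <- function(z, p) mean(sort(z)[seq_len(ceiling(length(z) * p))])
upper_mean <- function(z, p) {
  mean(sort(z, decreasing = TRUE)[seq_len(ceiling(length(z) * p))])
}
# Middle 50%: trim 25% from each end.
middle_mean <- function(z) mean(z, trim = 0.25)

# Q1: skewness selector; Q2: tail-heaviness selector.
hogg_q1 <- function(z) {
  (upper_mean(z, 0.05) - middle_mean(z)) / (middle_mean(z) - lower_mean(z, 0.05))
}
hogg_q2 <- function(z) {
  (upper_mean(z, 0.05) - lower_mean(z, 0.05)) /
    (upper_mean(z, 0.5) - lower_mean(z, 0.5))
}

z <- -10:10
c(Q1 = hogg_q1(z), Q2 = hogg_q2(z))
```

For a sample that is symmetric about its center, the numerator and denominator of Q1 coincide, so Q1 equals 1; values well above or below 1 suggest right or left skewness.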
A data set presented on page 496 of Huitema (2011). The design is a 2 by 2 with one covariate.
data(huitema496)
A 16 by 4 array with the following 4 columns:
y: number of novel responses.
i: type of reinforcement (2 levels).
j: type of program (2 levels).
x: covariate, a measure of verbal fluency.
Discussion can be found in both references listed below.
Huitema, B.E. (2011), The analysis of covariance and alternatives, 2nd ed., New York: Wiley.
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
huitema496 <- data.frame(huitema496)
fit <- rfit(y~factor(i)+factor(j)+x,data=huitema496)
summary(fit)
A study of the breakdown time of an electrical insulating fluid subject to seven different levels of voltage stress.
data("insulation")
A data frame with 76 observations on the following 2 variables.
log.stress: log of voltage stress
log.time: log of failure time
Nelson, W. (1982), Applied lifetime data analysis, New York: John Wiley and Sons.
Lawless, J.F. (1982), Statistical models and methods for lifetime data, New York: John Wiley and Sons.
Hettmansperger, T.P. and McKean, J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
myscores <- logGFscores
myscores@param <- c(1,5)
fit <- rfit(log.time ~ log.stress,scores=myscores,data=insulation)
summary(fit)
fit$tauhat
Internal functions not intended for general use. Used in calculation of Hogg's Qs.
lmean(z, p)
z |
n by 1 vector |
p |
scalar |
Returns the calculated value as a numeric scalar.
John Kloke, Joseph McKean
Computes Jonckheere's Test for Ordered Alternatives; see Section 5.6 of Kloke and McKean (2014)/Section 5.7 of Kloke and McKean (2024).
jonckheere(y, groups)
y |
vector of responses |
groups |
vector of associated groups (levels) |
Computes Jonckheere's Test for Ordered Alternatives. The main source was downloaded from the site:
smtp.biostat.wustl.edu/sympa/biostat/arc/s-news/2000-10/msg00126.html
Jonckheere |
test statistic |
ExpJ |
null expectation |
VarJ |
null variance |
p |
p-value |
John Kloke, Joseph McKean
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
smtp.biostat.wustl.edu/sympa/biostat/arc/s-news/2000-10/msg00126.html
r<-rnorm(30)
gp<-c(rep(1,10),rep(2,10),rep(3,10))
jonckheere(r,gp)
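The quantities returned above can be illustrated with a small base R sketch of the textbook Jonckheere statistic: the sum, over ordered pairs of groups, of Mann-Whitney counts, together with its null mean and variance (no-ties formulas). The helper jonckheere_stat is hypothetical, and the one-sided normal-approximation p-value for an increasing alternative is an assumption; the package routine may differ in its treatment of ties and of the p-value.

```r
# Hypothetical helper (not part of npsm): Jonckheere statistic for ordered groups.
jonckheere_stat <- function(y, groups) {
  levs <- sort(unique(groups))
  k <- length(levs)  # assumes at least two groups
  J <- 0
  for (a in seq_len(k - 1)) {
    for (b in (a + 1):k) {
      d <- outer(y[groups == levs[a]], y[groups == levs[b]], "-")
      # Count pairs where the lower group's value is smaller; ties count 1/2
      J <- J + sum(d < 0) + 0.5 * sum(d == 0)
    }
  }
  n <- as.numeric(table(groups))
  N <- sum(n)
  ExpJ <- (N^2 - sum(n^2)) / 4
  VarJ <- (N^2 * (2 * N + 3) - sum(n^2 * (2 * n + 3))) / 72
  z <- (J - ExpJ) / sqrt(VarJ)
  list(Jonckheere = J, ExpJ = ExpJ, VarJ = VarJ,
       p = pnorm(z, lower.tail = FALSE))  # assumed: increasing alternative
}

res <- jonckheere_stat(1:6, c(1, 1, 2, 2, 3, 3))
res
```

In this toy data every observation in a lower group is smaller than every observation in a higher group, so J attains its maximum.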
Returns a robust rank-based analysis of covariance for a k-way layout assuming heterogeneous slopes; see Section 5.4 of Kloke and McKean (2014)/Sections 5.6 and 7.3 of Kloke and McKean (2024). Currently only Wilcoxon scores are used.
kancova(levs,data,xcov,print.table=TRUE)
levs |
vector of levels corresponding to the factors A, B, C, etc. |
data |
matrix with response in column 1 and level in column 2 |
xcov |
matrix of covariates |
print.table |
logical indicating a table should be printed |
Returns the analysis of covariance table assuming heterogeneous slopes for a k-way layout.
tab2 |
analysis of covariance |
fint |
rank-based full model (heterogeneous slopes) |
fithomog |
rank-based full model (homogeneous slopes) |
John Kloke, Joseph McKean
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
levels <- c(2,2)
y.group <- huitema496[,c('y','i','j')]
xcov <- huitema496[,'x']
kancova(levels,y.group,xcov)
Routine used in making the display of the ANCOVA table obtained by kancova.
kancovarown(vec)
vec |
vector to be labeled. |
Returns the labels.
nm |
vector of labels |
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Train a k nearest neighbors (knn) classifier via cross validation (cv). The number of folds and the set of candidate numbers of neighbors may be specified.
knn_cv(xy, k.cv = 5, kvec = seq(1, 47, by = 2))
xy |
Data frame with the data matrix x as the first set of columns and the vector y as the last column. |
k.cv |
scalar. number of folds to use. default is 5. |
kvec |
vector. set of neighbors to consider. default is odd integers between 1 and 47 (inclusive). |
kvec |
set of neighbors considered |
error |
vector of misclassification error rates corresponding to kvec |
k.best |
number of neighbors with lowest error rate |
k.cv |
number of folds used |
John Kloke
Hastie, T., Tibshirani, R., and Friedman, J. (2017), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, New York: Springer.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), An Introduction to Statistical Learning with Applications in R, New York: Springer.
Venables, W. N. and Ripley, B. D. (2002) _Modern Applied Statistics with S._ Fourth edition. Springer.
train_set <- sim_class2[sim_class2$train==1,-1]
set.seed(19180511)
fit_cv <- knn_cv(train_set,k.cv=10)
fit_cv
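The cross-validation loop behind this kind of tuning can be sketched in base R. This is a simplified stand-in under stated assumptions (Euclidean distance, majority vote, ties resolved toward the first label), not the package's implementation; the helpers knn_predict and cv_error and the simulated two-cluster data are hypothetical.

```r
# Hypothetical helpers (not part of npsm): plain Euclidean k-NN with majority vote.
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(pt) {
    d <- sqrt(colSums((t(train_x) - pt)^2))  # distances to all training rows
    votes <- table(train_y[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]           # ties go to the first label
  })
}

cv_error <- function(x, y, k.cv = 5, kvec = c(1, 3, 5)) {
  n <- nrow(x)
  fold <- sample(rep(seq_len(k.cv), length.out = n))  # random fold labels
  error <- sapply(kvec, function(k) {
    wrong <- 0
    for (f in seq_len(k.cv)) {
      test <- fold == f
      pred <- knn_predict(x[!test, , drop = FALSE], y[!test],
                          x[test, , drop = FALSE], k)
      wrong <- wrong + sum(pred != y[test])
    }
    wrong / n  # misclassification rate pooled over folds
  })
  list(kvec = kvec, error = error, k.best = kvec[which.min(error)], k.cv = k.cv)
}

set.seed(19180511)
x <- rbind(matrix(rnorm(40, 0, 0.1), ncol = 2),
           matrix(rnorm(40, 5, 0.1), ncol = 2))
y <- rep(c("a", "b"), each = 20)
fit <- cv_error(x, y)
fit$error
```

The returned list reuses the component names documented above (kvec, error, k.best, k.cv) purely for readability; when several values of k share the lowest error, which.min picks the smallest.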
The response variable is the quality of a vintage, rated on a scale of 1 to 5, for the years 1961 to 2004. The predictor is end of harvest, the number of days between August 31st and the end of harvest for that year, and the factor of interest is whether or not it rained at harvest time.
data(latour)
A data frame with 44 rows and 4 columns.
year: Year of harvest
quality: Rating on a scale of 1-5
end.of.harvest: Days between August 31 and the end of harvest
rain: indicator variable for rain
Sheather, SJ (2009), A Modern Approach to Regression with R, New York: Springer.
data(latour)
plot(quality~end.of.harvest,pch='',data=latour)
points(quality~end.of.harvest,data=latour[latour$rain==0,],pch=3)
points(quality~end.of.harvest,data=latour[latour$rain==1,],pch=4)
Mood's classical nonparametric method for a confidence interval for the difference in population medians.
mood.ci(x, y, var.equal = FALSE, conf.level = 0.95, ...)
x |
n x 1 vector |
y |
m x 1 vector |
var.equal |
Logical. Assume the scales of the two populations are equal. |
conf.level |
numeric value. confidence level for the confidence interval. |
... |
not currently implemented |
A vector of length 2 containing the lower and upper endpoints of the confidence interval.
John Kloke, Joseph McKean
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
x <- rt(101,9)
y <- rt(108,9)+0.3
mood.ci(x,y)
Returns a test for homogeneous slopes and, assuming homogeneous slopes, a test for differences in level. Currently only Wilcoxon scores are used.
onecova(levs,data,xcov,print.table=TRUE)
levs |
Number of levels of the one-way design |
data |
matrix with response in column 1 and level in column 2 |
xcov |
matrix of covariates |
print.table |
logical indicating a table should be printed |
Returns the analysis of covariance table.
tab |
analysis of covariance |
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
data=latour[,c('quality','rain')]
xcov<-cbind(latour['end.of.harvest'])
onecova(2,data,xcov,print.table=TRUE)
Returns a robust rank-based analysis of covariance for a one-way layout assuming heterogeneous slopes; see Section 5.4 of Kloke and McKean (2014)/Sections 5.6 and 7.3 of Kloke and McKean (2024). Currently only Wilcoxon scores are used.
onecovaheter(levs,data,xcov,print.table=TRUE)
levs |
Number of levels of the one-way design |
data |
matrix with response in column 1 and level in column 2 |
xcov |
matrix of covariates |
print.table |
logical indicating a table should be printed |
Returns the analysis of covariance table assuming heterogeneous slopes.
tab |
analysis of covariance |
fit |
rank-based full model (heterogeneous slopes) |
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
data=latour[,c('quality','rain')]
xcov<-cbind(latour['end.of.harvest'])
onecovaheter(2,data,xcov,print.table=TRUE)
Returns a robust rank-based analysis of covariance for a one-way layout assuming homogeneous slopes; see Section 5.4 of Kloke and McKean (2014)/Sections 5.6 and 7.3 of Kloke and McKean (2024). Currently only Wilcoxon scores are used.
onecovahomog(levs,data,xcov,print.table=TRUE)
levs |
Number of levels of the one-way design |
data |
matrix with response in column 1 and level in column 2 |
xcov |
matrix of covariates |
print.table |
logical indicating a table should be printed |
Returns the analysis of covariance table assuming homogeneous slopes.
tab |
analysis of covariance |
fit |
rank-based full model (homogeneous slopes) |
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
data=latour[,c('quality','rain')]
xcov<-cbind(latour['end.of.harvest'])
onecovahomog(2,data,xcov,print.table=TRUE)
Returns the placements of the first vector in terms of the second vector, as used by the R function fp.test; see Section 2.11 of Hettmansperger and McKean (2011) and Section 4.4 of Hollander and Wolfe (1999). The version computed by fp.test is discussed in Section 3.4 of Kloke and McKean (2014)/Section 3.6 of Kloke and McKean (2024).
place(x,y)
x |
first vector |
y |
second vector (second sample responses) |
Returns the Placements for the routine fp.test.
ic |
vector of placements. |
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean, J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, 2nd ed., New York: John Wiley and Sons.
Kloke, J. and McKean, J.W. (2014), Nonparametric Statistical Methods Using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric Statistical Methods Using R, Second Edition, Boca Raton, FL: Chapman-Hall.
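As a small illustration of what a placement is, the sketch below counts, for each element of the first vector, how many elements of the second vector fall strictly below it. The helper placements is hypothetical, and the strict inequality (ties not counted) is an assumption; place may handle ties differently.

```r
# Hypothetical helper (not part of npsm): placement of each x among the y's.
placements <- function(x, y) {
  sapply(x, function(xi) sum(y < xi))  # number of y values strictly below xi
}

placements(c(1, 5, 9), c(2, 4, 6, 8))
```

Summing the placements of x among y gives the Mann-Whitney count of pairs with y below x, which is why placements appear naturally in rank tests for the Behrens-Fisher problem.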
Abebe et al. (2001) discuss a dataset resulting from a three-way layout for a neurological experiment in which the time required for a mouse to exit a narrow elevated wooden plank is measured. The response is the log of time (in seconds) to exit. Interest lies in assessing the effects of three factors: the Mouse Strain (Tg+, Tg-), the mouse's Gender (female, male), and the mouse's Age (Aged, Middle, Young). The design is a 2 by 2 by 3 factorial design.
data(plank)
A data frame with 64 observations on the following 4 variables.
response |
a numeric vector |
strain |
a factor with levels 1 2 |
gender |
a factor with levels 1 2 |
age |
a factor with levels 1 2 3 |
Abebe, A., Crimin, K., McKean, J. W., Vidmar, T. J., and Haas, J. V. (2001) "Rank-Based Procedures for Linear Models: Applications to Pharmaceutical Science Data" Drug Information Journal,
data(plank)
boxplot(response~strain,data=plank)
raov(response~strain:gender:age,data=plank)
Plots the misclassification error rate versus the number of neighbors, based on a call to knn_cv.
## S3 method for class 'knn_cv'
plot(x, ...)
x |
object of class knn_cv. |
... |
additional arguments. currently not used. |
The list x is assumed to have attributes kvec and error representing the number of neighbors and the corresponding misclassification rate, respectively.
No return value, called for side effects of creating plot.
John Kloke
Hastie, T., Tibshirani, R., and Friedman, J. (2017), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, New York: Springer.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), An Introduction to Statistical Learning with Applications in R, New York: Springer.
Venables, W. N. and Ripley, B. D. (2002) _Modern Applied Statistics with S._ Fourth edition. Springer.
A simulated polynomial (3rd degree) model discussed in Section 4.7.1 of Kloke and McKean (2014)/4.6.1 of Kloke and McKean (2024).
data(poly)
One-hundred observations on two variables.
y |
response variable |
x |
predictor |
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
plot(y ~ x,data=poly)
Tests for the degree of a polynomial. This test was suggested by Graybill (1976) and is discussed from a robust point of view in Section 4.7.1 of Kloke and McKean (2014)/4.6.1 of Kloke and McKean (2024).
polydeg(y, x, P, alpha = 0.05)
y |
vector of responses |
x |
Predictor |
P |
Super degree of polynomial which provides a satisfactory fit |
alpha |
Level of the testing |
Returns the degree of the polynomial based on the algorithm.
deg |
The determined degree |
coll |
Matrix of step information |
fitf |
Fit of the polynomial based on the determined degree |
Graybill, F.A. (1976), Theory and application of the linear model, North Scituate, Ma: Duxbury Press.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
x <- 1:20
xc <- x - mean(x)
y <- .2*xc + xc^3 + rt(20,3)*90
plot(y~x)
polydeg(y,xc,6)
Internal print functions
## S3 method for class 'hogg.test'
print(x, digits = max(5, .Options$digits - 2), ...)
## S3 method for class 'rank.test'
print(x,...)
## S3 method for class 'fkk.test'
print(x,...)
## S3 method for class 'knn_cv'
print(x,...)
## S3 method for class 'npsm.ci'
print(x, estimate=FALSE,stderr=FALSE,digits = max(5, .Options$digits - 2),...)
x |
Object to be printed. |
digits |
Number of digits to present. Passed to print function. |
... |
Additional arguments. |
estimate |
not currently implemented. |
stderr |
not currently implemented. |
No return value, called for side effects
John Kloke, Joseph McKean
Under investigation in this clinical trial was the pharmaceutical agent diethylstilbestrol (DES); subjects were assigned to 1.0 mg DES (treatment = 2) or to placebo (treatment = 1).
data(prostate)
A data frame with 38 observations on the following 8 variables.
patient |
a numeric vector |
treatment |
a numeric vector |
time |
a numeric vector |
status |
a numeric vector |
age |
a numeric vector |
shb |
a numeric vector |
size |
a numeric vector |
index |
a numeric vector |
http://www.crcpress.com/product/isbn/9781584883258
Collett, D. (2003), Modeling survival data in medical research, CRC Press.
data(prostate)
boxplot(size~treatment,data=prostate)
A regression example with response yearly upkeep of a home and the predictor value of home; see Bowerman et al. (2005) and Exercise 4.9.8 of Kloke and McKean (2014)/Exercise 7.6.2 of Kloke and McKean (2024).
data(qhic)
Forty observations on two variables.
upkeep |
annual upkeep expenditure of home (y) |
value |
value of the home (x) |
Bowerman, B.L., O'Connell, R.T., and Koehler, A.B. (2005), Forecasting, time series, and regression: An applied approach, Australia: Thomson.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
plot(upkeep~value,data=qhic,xlab='Value (in $1000s)',ylab='Annual upkeep (in $10s)')
Two sample quail data.
data(quail2)
A data frame with 30 observations on the following 2 variables.
treat |
indicator variable for treatment |
ldl |
ldl measurement |
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
McKean J.W., Vidmar, T.J., and Sievers, G.L. (1989), A robust two stage multiple comparison procedure with application to a random drug screen, Biometrics, 45, 1281–1297.
data(quail2)
boxplot(ldl~treat,data=quail2)
A generalization of the Wilcoxon rank-sum test in which a score function is applied to the ranks. Any scores from Rfit can be used, as well as user-defined scores. The default is to perform a Wilcoxon analysis.
rank.test(x, y, alternative = "two.sided", scores = Rfit::wscores, conf.int = FALSE, conf.level = 0.95)
x |
m x 1 vector |
y |
n x 1 vector |
alternative |
one of 'two.sided', 'less', or 'greater' |
scores |
an object of class scores |
conf.int |
logical indicating if a confidence interval should be estimated |
conf.level |
desired level of confidence for interval |
The test is based on T = sum_i a(R(y_i)), where R is the rank in the combined sample of size N = m + n and a(t) = varphi(t/(N+1)). The confidence interval, if requested, is based on a call to Rfit.
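The statistic above can be sketched in base R. This is an illustration only, assuming Wilcoxon scores varphi(u) = sqrt(12)(u - 1/2), no ties, and the usual permutation null mean and variance for a linear rank statistic; rank_test_sketch is a hypothetical name, and the package's own standardization may differ in detail.

```r
# Hypothetical illustration of a general-scores two-sample rank test.
rank_test_sketch <- function(x, y, phi = function(u) sqrt(12) * (u - 0.5)) {
  m <- length(x); n <- length(y); N <- m + n
  a <- phi(seq_len(N) / (N + 1))           # scores a(1), ..., a(N)
  r <- rank(c(x, y))                       # ranks in the combined sample
  Sphi <- sum(a[r[(m + 1):N]])             # sum of scores over the y ranks
  # Null mean and variance of a linear rank statistic (no ties)
  ET <- n * mean(a)
  VT <- m * n / (N * (N - 1)) * sum((a - mean(a))^2)
  z <- (Sphi - ET) / sqrt(VT)
  list(Sphi = Sphi, statistic = z, p.value = 2 * pnorm(-abs(z)))
}

x <- c(1.1, 2.3, 3.7, 4.2, 5.9)
y <- c(2.9, 6.1, 7.4, 8.8)
rank_test_sketch(x, y)
```

With Wilcoxon scores the standardized statistic is a linear function of the rank sum, so the two-sided p-value agrees with wilcox.test(y, x, exact = FALSE, correct = FALSE).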
statistic |
Standardized value of the test statistic |
Sphi |
Test statistic |
p.value |
p-value |
conf.int |
confidence interval for shift in location |
estimate |
point estimate for shift in location |
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
rank.test(rt(20,1),rt(22,1)+0.2)
Generate a random sample from a contaminated normal distribution.
rcn(n, eps, sigmac)
rcn_5_5(n)
n |
sample size |
eps |
proportion of contamination |
sigmac |
standard deviation of the contaminated component |
With probability (1-eps), deviates are drawn from a standard normal distribution. With probability eps, deviates are drawn from a normal distribution with mean 0 and standard deviation sigmac. rcn_5_5 is a special case where eps=0.05 and sigmac=5.
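The mixture described above can be sketched as follows. This is not the package's implementation; rcn_sketch is a hypothetical name, and the use of rbinom to flag contaminated draws is an assumption.

```r
# Hypothetical sketch of a contaminated normal generator.
rcn_sketch <- function(n, eps, sigmac) {
  contaminated <- rbinom(n, size = 1, prob = eps) == 1  # flag each draw
  z <- rnorm(n)                                         # standard normal deviates
  z[contaminated] <- sigmac * z[contaminated]           # rescale to N(0, sigmac^2)
  z
}

set.seed(101)
z <- rcn_sketch(10000, eps = 0.25, sigmac = 10)
var(z)  # theoretical variance is (1 - eps) + eps * sigmac^2 = 25.75
```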
n x 1 numeric vector containing the random deviates.
John Kloke, Joseph McKean
Hogg, R., McKean, J., and Craig, A. (2013), Introduction to Mathematical Statistics, 7th ed., Boston: Pearson.
qqnorm(rcn(100,.25,10))
set.seed(101); rcn(10,0.05,5)
set.seed(101); rcn_5_5(10)
Generate random data from a contaminated normal distribution in which the contamination is a multiplicative factor, as occurs, for example, when data are recorded in incorrect units or with a misplaced decimal point.
rcnx100(n,eps=0.001,x=100,mu=0,sigma=1,...)
rcnx(...)
rcnx_01_100(n)
n |
sample size to be drawn. |
eps |
amount (probability) of contaminated observations |
x |
multiplier for the contaminated observations |
mu |
mean of uncontaminated samples |
sigma |
standard deviation of uncontaminated samples |
... |
optional arguments. |
Samples are drawn from a normal distribution with mean mu and standard deviation sigma. A fraction of the observations (eps) are multiplied by the factor x. rcnx is an alias for rcnx100. rcnx_01_100 is a special case in which the observations are drawn from a standard normal distribution (i.e., mu=0 and sigma=1, the defaults in rcnx100) and eps and x are set to 0.01 and 100, respectively.
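A minimal sketch of multiplicative contamination, for illustration only. The name rcnx_sketch is hypothetical, and it assumes each observation is contaminated independently with probability eps; whether the package instead contaminates a fixed fraction is not stated here.

```r
# Hypothetical sketch: normal draws, a random subset multiplied by x.
rcnx_sketch <- function(n, eps = 0.001, x = 100, mu = 0, sigma = 1) {
  y <- rnorm(n, mean = mu, sd = sigma)  # uncontaminated sample
  hit <- runif(n) < eps                 # which observations get contaminated
  y[hit] <- x * y[hit]                  # e.g. a misplaced decimal point
  y
}

set.seed(101)
qqnorm(rcnx_sketch(1000, eps = 0.05, x = 10))
```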
Numeric vector of length n is returned.
John Kloke
https://en.wikipedia.org/wiki/Fat-finger_error
set.seed(101); x1 <- rcnx100(10)
set.seed(101); x2 <- rcnx(10)
set.seed(101); x3 <- rcnx_01_100(10)
qqnorm(rcnx(10000,eps=0.005,x=10))
qqnorm(rcnx(1000,eps=0.05,x=1/100))
Random generation for the Laplace (double exponential) distribution with location 0 and scale 1.
rlaplace(n)
n |
scalar. number of random draws. |
A Laplace (double exponential) distribution has heavier tails than a normal distribution, so a sample will tend to contain more outliers.
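One standard way to generate such data is the inverse-CDF method, sketched below for location 0 and scale 1. The names qlaplace_sketch and rlaplace_sketch are hypothetical; the method the package actually uses is not stated here.

```r
# Quantile function of the standard Laplace distribution (hypothetical helper).
qlaplace_sketch <- function(p) {
  u <- p - 0.5
  -sign(u) * log(1 - 2 * abs(u))
}

# Inverse-CDF sampling: apply the quantile function to uniform draws.
rlaplace_sketch <- function(n) qlaplace_sketch(runif(n))

set.seed(101)
x <- rlaplace_sketch(100)
qqnorm(x)  # heavier tails show up as departures at both ends
```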
A vector of length n is returned containing the random data.
John Kloke, Joseph McKean
Hogg, R.V., McKean, J., and Craig, A.T. (2005), Introduction to Mathematical Statistics, 6th ed.
x <- rlaplace(100)
qqnorm(x)
A simulated regression model with one response and one predictor. It is discussed in Exercise 6.5.6 of Kloke and McKean (2014)/Exercise 8.11.23 of Kloke and McKean (2024).
data(rs)
Fifty observations on two variables.
y |
simulated response |
x |
simulated predictor |
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
rfit(y ~ x,data=rs)
A data set discussed in Hollander and Wolfe (1999) and Exercise 5.8.9 of Kloke and McKean (2014)/Exercise 5.9.15 of Kloke and McKean (2024). It contains part of a study on the effects of cloud seeding of cyclones.
data(SCUD)
Twenty-one observations on three variables.
trt |
treatment indicator: 1 is seeded and 2 is control |
M |
predictor M, the geostrophic meridional circulation index |
RI |
measure of precipitation |
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
plot(RI ~ M,data=SCUD)
Counts of viewers for 9 seasons of Seinfeld
data("seinfeld")
A data frame with 180 observations on the following 4 variables.
episodeNumberOverall |
a numeric vector |
season |
a numeric vector |
episodeNumberSeason |
a numeric vector |
viewers |
a numeric vector |
Wikipedia https://en.wikipedia.org/wiki/List_of_Seinfeld_episodes (date unknown).
data(seinfeld)
# Comparison boxplots of viewers versus season
boxplot(viewers~season,data=seinfeld,ylab='Number of Viewers (in millions)',xlab='Season')
# Normal q-q plots for selected seasons.
oldpar_mfrow <- par()$mfrow
par(mfrow=c(2,2))
seasons2display <- c(4,5,6,9)
for( s in seasons2display) {
  v <- seinfeld[seinfeld$season==s,'viewers']
  qqnorm(v,main=paste("Season",s))
  abline(a=median(v),b=mad(v))
}
par(mfrow=oldpar_mfrow)
# Normal q-q plots for selected seasons
# using centered and scaled residuals.
oldpar_mfrow <- par()$mfrow
par(mfrow=c(2,2))
seasons2display <- c(4,5,6,9)
for( s in seasons2display) {
  v0 <- seinfeld[seinfeld$season==s,'viewers']
  v1 <- (v0 - median(v0))/mad(v0)
  qqnorm(v1,main=paste("Season",s))
  abline(a=0,b=1)
}
par(mfrow=oldpar_mfrow)
Doksum and Sievers (1976) describe an experiment involving the effect of ozone on weight gain of rats. The experimental group consisted of 22 rats which were placed in an ozone environment for seven days, while the control group contained 21 rats which were placed in an ozone-free environment for the same amount of time. The response was the weight gain in a rat over the time period.
data(sievers)
A data frame with 45 observations on the following 2 variables.
group |
indicator variable for treatment |
weight.gain |
response variable of weight gain |
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Doksum, K. A. and Sievers, G. L. (1976), Plotting with confidence: Graphical comparisons of two populations, Biometrika, 63, 421-434.
data(sievers)
boxplot(weight.gain~group,data=sievers)
p-value for a one sample sign test based on the binomial distribution.
signtest_pvalue(x, alternative = "two.sided", theta0 = 0, ...)
x |
numeric vector. |
alternative |
type of alternative hypothesis |
theta0 |
null value of the parameter |
... |
optional arguments. currently ignored. |
Returns p-value using the binomial distribution.
a numeric scalar — the p-value — is returned
John Kloke, Joseph McKean
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
x <- round(rt(19,9) + 2,1)
signtest_pvalue(x,alternative='greater')
S <- sum(x > 0)
M <- sum(x != 0)
1-pbinom(S-1,M,0.5)
x <- round(rt(19,9) + 0,1)
signtest_pvalue(x)
S <- sum(x > 0)
M <- sum(x != 0)
2*min(pbinom(S,M,0.5), 1-pbinom(S-1,M,0.5))
A simulated classification example with two variables and two classes (labels).
data("sim_class2")
A data frame with 1000 observations on the following 4 variables.
train |
an indicator for training and test sets |
x1 |
an explanatory variable |
x2 |
an explanatory variable |
y |
response variable - a factor with levels 0 1 |
Random points in the x1,x2 plane were generated. Class labels are based on location relative to two circles in the x1,x2 plane, with some random variation in the labels.
data(sim_class2)
dim(sim_class2)
train_set <- sim_class2[sim_class2$train==1,]
dim(train_set)
with(train_set,plot(x1,x2,main='Training Set',cex=0.625))
with(train_set,points(x1,x2,pch=20,col=y,cex=0.625))
An experiment in which the members of two groups of students each played the game Simon twice.
data("simon")
A data frame with 31 observations on the following 3 variables.
game1 |
score on first trial |
game2 |
score on second trial |
class |
group variable |
Demonstrates the concept of regression toward the mean. Simulated data to represent a realistic realization of the experiment. See Problem 4.9.20 of Kloke and McKean (2014)/Problem 4.7.17 of Kloke and McKean (2024).
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
data(simon)
plot(game2~game1,data=simon)
rfit(game2~game1,data=simon)
Simulated dataset
data("sincos")
A data frame with 197 observations on the following 2 variables.
x |
independent variable |
y |
dependent variable |
The data were generated using
x <- seq(1,50,by=.25) ; y <- 5*sin(3*x) + 6*cos(x/4)+rnorm(length(x),0,10)
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
data(sincos)
plot(y~x,sincos)
### code to create Figure 4.9 of Kloke & McKean 2014 ###
my.sincos<-sincos
my.sincos$y3<-my.sincos$y
my.sincos$y3[137] <- 800
plot(y3~x,ylim=c(-50,50),data=my.sincos)
fit4 <- loess(y3 ~ x,data=my.sincos)
# lines(fit4$x,fit4$fitted,lty=2)
with(fit4,lines(x,fitted,lty=2))
fit5 <- loess(y3 ~ x,family="symmetric",data=my.sincos)
with(fit5,lines(x,fitted,lty=1))
legend('bottomleft',legend=c('Local Robust Fit','Local LS Fit'),lty=1:2)
title("loess Fits of Sine-Cosine Data")
A sample of 82 cars with variables speed and miles per gallon collected.
data("speed")
A data frame with 82 observations on the following 2 variables.
mpg |
Miles per gallon |
sp |
a numeric vector |
Higgins (2003) Introduction to modern nonparametric statistics.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
data(speed)
plot(sp~mpg,data=speed)
rfit(sp~mpg+I(mpg^2),data=speed)
A data frame containing measurements of 48 turtles. The first three columns are the Length, Width, and Height measurements of the carapace of the turtle. The fourth column is a categorical variable sex with values of female and male. Data are drawn from Johnson and Wichern (2007).
data(turtle)
48 observations on four variables.
Length |
numeric vector. |
Width |
numeric vector. |
Height |
numeric vector. |
sex |
character vector. |
Johnson, R.A. and Wichern, D.W. (2007), Applied Multivariate Statistical Analysis, 6th ed., Upper Saddle River, NJ: Pearson.
with(turtle,boxplot(Length~sex))
with(turtle,boxplot(Length~sex,ylab='Length (units)'))
Performs the van Elteren extension of the Wilcoxon rank-sum test for stratified experiments.
vanElteren.test(g, y, b)
g |
n x 1 vector: treatment/group indicator |
y |
n x 1 vector: responses |
b |
n x 1 vector: denotes strata |
statistic |
Value of the test statistic. |
p.value |
p-value based on a normal approximation. |
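For orientation, a sketch of a van Elteren-type statistic for two groups is given below. It assumes the common form in which the within-stratum Wilcoxon rank sums are weighted by 1/(n_b + 1), with no ties and a two-sided normal approximation; van_elteren_sketch is a hypothetical name, and the package's vanElteren.test may differ in its weighting or tie handling.

```r
# Hypothetical sketch of a stratified (van Elteren-type) rank-sum test.
van_elteren_sketch <- function(g, y, b) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  num <- 0  # weighted sum of centered rank sums
  v <- 0    # its null variance
  for (s in unique(b)) {
    idx <- b == s
    r <- rank(y[idx])                        # ranks within the stratum
    in2 <- g[idx] == levels(g)[2]
    n1 <- sum(!in2); n2 <- sum(in2); nb <- n1 + n2
    w <- sum(r[in2])                         # rank sum of the second group
    num <- num + (w - n2 * (nb + 1) / 2) / (nb + 1)
    v <- v + n1 * n2 / (12 * (nb + 1))
  }
  z <- num / sqrt(v)
  list(statistic = z, p.value = 2 * pnorm(-abs(z)))
}

g <- c(1, 1, 2, 2, 1, 1, 2, 2)
y <- c(1, 2, 3, 4, 1, 2, 3, 4)
b <- c(1, 1, 1, 1, 2, 2, 2, 2)
van_elteren_sketch(g, y, b)
```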
January weather data for Kalamazoo, MI for the years 1900 to 1995. It is discussed in Example 4.7.4, page 105-106, of Kloke and McKean (2014)/Example 4.6.4, p.177-178, of Kloke and McKean (2024).
data(weather)
Ninety-six observations (1900-1995) for twelve weather variables.
avemaxavemax
aveminavemin
coldestmaxcoldestmax
hihesthihest
lowestlowest
maxdayprecmaxdayprec
maxdaysnowfallmaxdaysnowfall
meantmpmeantmp
totalprectotalprec
totalsnowtotalsnow
warmestwarmest
yearyear
http://weather-warehouse.com/WeatherHistory/
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
plot(avemax ~ year,data=weather)
Wilson (score) confidence interval for a population proportion.
wilson.ci(x, n, conf.level = 0.95)
x |
number of events |
n |
number of samples |
conf.level |
confidence level |
Uses the definition in Agresti (2002).
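The Wilson score interval can be written out directly, as in the sketch below. The function name wilson_ci_sketch is hypothetical and the code is an illustration of the standard formula, not the package source.

```r
# Hypothetical sketch of the Wilson score interval for a proportion.
wilson_ci_sketch <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  phat <- x / n
  denom <- 1 + z^2 / n
  center <- (phat + z^2 / (2 * n)) / denom
  half <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2)) / denom
  c(center - half, center + half)
}

wilson_ci_sketch(33, 100)
```

The same interval is returned by base R's prop.test(x, n, correct = FALSE)$conf.int, which provides a convenient cross-check.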
conf.int |
estimated confidence interval |
John Kloke, Joseph McKean
Agresti (2002), Categorical data analysis, New York: John Wiley & Sons, Inc.
n <- 100
x <- rbinom(1,n,0.33)
wilson.ci(x,n)