Title A Handbook of Statistical Analyses Using R
Author Brian S. Everitt and Torsten Hothorn
Maintainer Torsten Hothorn <Torsten.Hothorn@R-project.org>
Description Functions, data sets, analyses and examples from the book
‘A Handbook of Statistical Analyses Using R’ (Brian S. Everitt and Torsten Hothorn, Chapman & Hall/CRC, 2006). The first chapter of the book, which is entitled ‘An Introduction to R’, is completely included in this package; for all other chapters, a vignette containing all data analyses is available.
Suggests lattice, MASS, scatterplot3d (>= 0.3-23), ape (>= 1.6), coin
(>= 0.3-3), flexmix (>= 1.1-0), gee (>= 4.13-10), ipred (>= 0.8-3), lme4 (>= 0.98-1), mclust (>= 3.0-0), party (>= 0.2-8), randomForest (>= 4.5-12), rmeta (>= 2.12), vcd (>= 0.9-3), survival, KernSmooth, rpart, mvtnorm, Matrix, boot, TH.data
agefat
aspirin
BCG
birthdeathrates
bladdercancer
BtheB
clouds
CYGOB1
epilepsy
Forbes2000
foster
gardenflowers
GHQ
heptathlon
HSAURtable
Lanza
mastectomy
meteo
orallesions
phosphate
pistonrings
planets
plasma
polyps
polyps3
pottery
rearrests
respiratory
roomwidth
schizophrenia
schizophrenia2
schooldays
skulls
smoking
students
suicides
toothpaste
voting
water
watervoles
waves
weightgain
womensrole
Age and body fat percentage of 25 normal adults.
A data frame with 25 observations on the following 3 variables.
sex a factor with levels female and male.
The data come from a study investigating a new method of measuring body composition (see Mazess et al., 1984), and give the body fat percentage (percent fat), age and sex for 25 normal adults aged between 23 and 61 years. The questions of interest are how age and percent fat are related, and whether there is any evidence that the relationship is different for males and females.
R. B. Mazess, W. W. Peppler and M. Gibbons (1984), Total body composition by dual-photon (153Gd) absorptiometry. American Journal of Clinical Nutrition, 40, 834–839.
data("agefat", package = "HSAUR")
plot(fat ~ age, data = agefat)
Efficacy of Aspirin in preventing death after a myocardial infarct.
A data frame with 7 observations on the following 4 variables.
tp total number of subjects treated with placebo.
ta total number of subjects treated with Aspirin.
The data were collected for a meta-analysis of the effectiveness of Aspirin (versus placebo) in preventing death after a myocardial infarction.
J. L. Fleiss (1993), The statistical basis of meta-analysis. Statistical Methods in Medical Research, 2, 121–145.
data("aspirin", package = "HSAUR")
aspirin
A meta-analysis on the efficacy of BCG vaccination against tuberculosis (TB).
A data frame with 13 observations on the following 7 variables.
BCGTB the number of subjects suffering from TB after a BCG vaccination.
BCGVacc the number of subjects with BCG vaccination.
NoVaccTB the number of subjects suffering from TB without BCG vaccination.
NoVacc the total number of subjects without BCG vaccination.
Latitude geographic position of the place the study was undertaken.
Year the year the study was undertaken.
Bacille Calmette Guerin (BCG) is the most widely used vaccination in the world. Developed in the 1930s and made of a live, weakened strain of Mycobacterium bovis, the BCG is the only vaccination available against tuberculosis today. Colditz et al. (1994) report data from 13 clinical trials of BCG vaccine, each investigating its efficacy in the treatment of tuberculosis. The numbers of subjects suffering from TB with or without BCG vaccination are given here. In addition, the data contain the values of two other variables for each study, namely, the geographic latitude of the place where the study was undertaken and the year of publication. These two variables will be used to investigate and perhaps explain any heterogeneity among the studies.
G. A. Colditz, T. F. Brewer, C. S. Berkey, M. E. Wilson, E. Burdick, H. V. Fineberg and F. Mosteller (1994), Efficacy of BCG vaccine in the prevention of tuberculosis. Meta-analysis of the published literature. Journal of the American Medical Association, 271(2), 698–702.
data("BCG", package = "HSAUR")
boxplot(BCG$BCGTB/BCG$BCGVacc, BCG$NoVaccTB/BCG$NoVacc,
    names = c("BCG Vaccination", "No Vaccination"),
    ylab = "Percent BCG cases")
Birth and death rates for 69 countries.
A data frame with 69 observations on the following 2 variables.
J. A. Hartigan (1975), Clustering Algorithms. John Wiley & Sons, New York.
data("birthdeathrates", package = "HSAUR")
plot(birthdeathrates)
Data arise from 31 male patients who have been treated for superficial bladder cancer, and give the number of recurrent tumours during a particular time after the removal of the primary tumour, along with the size of the original tumour.
A data frame with 31 observations on the following 3 variables.
tumorsize a factor with levels <=3cm and >3cm.
The aim is to estimate the effect of the size of the tumour on the number of recurrent tumours.
G. U. H. Seeber (1998), Poisson Regression. In: Encyclopedia of Biostatistics (P. Armitage and T. Colton, eds), John Wiley & Sons, Chichester.
data("bladdercancer", package = "HSAUR")
mosaicplot(xtabs(~ number + tumorsize, data = bladdercancer))
Data from a clinical trial of an interactive multimedia program called ‘Beat the Blues’.
A data frame with 100 observations of 100 patients on the following 8 variables.
drug did the patient take anti-depressant drugs (No or Yes).
length the length of the current episode of depression, a factor with levels <6m (less than six months) and >6m (more than six months).
treatment treatment group, a factor with levels TAU (treatment as usual) and BtheB (Beat the Blues).
bdi.pre Beck Depression Inventory II before treatment.
bdi.2m Beck Depression Inventory II after two months.
bdi.4m Beck Depression Inventory II after four months.
bdi.6m Beck Depression Inventory II after six months.
bdi.8m Beck Depression Inventory II after eight months.
Longitudinal data from a clinical trial of an interactive, multimedia program known as "Beat the Blues" designed to deliver cognitive behavioural therapy to depressed patients via a computer terminal. Patients with depression recruited in primary care were randomised to either the Beating the Blues program, or to "Treatment as Usual (TAU)".
Note that the data are stored in the wide form, i.e., repeated measurements are represented by additional columns in the data frame.
J. Proudfoot, D. Goldberg and A. Mann (2003). Computerised, interactive, multimedia CBT reduced anxiety and depression in general practice: A RCT. Psychological Medicine, 33, 217–227.
data("BtheB", package = "HSAUR")
layout(matrix(1:2, nrow = 1))
ylim <- range(BtheB[,grep("bdi", names(BtheB))], na.rm = TRUE)
boxplot(subset(BtheB, treatment == "TAU")[,grep("bdi", names(BtheB))],
    main = "Treated as usual", ylab = "BDI",
    xlab = "Time (in months)", names = c(0, 2, 4, 6, 8), ylim = ylim)
boxplot(subset(BtheB, treatment == "BtheB")[,grep("bdi", names(BtheB))],
    main = "Beat the Blues", ylab = "BDI", xlab = "Time (in months)",
    names = c(0, 2, 4, 6, 8), ylim = ylim)
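The wide layout of BtheB (one column per visit) can be turned into long form (one row per patient and visit) with base R's reshape(). The following is only a minimal sketch and not part of the package examples: the data frame `wide` and its BDI values are made up for illustration and merely mimic two of the BtheB column names, so the code runs without HSAUR installed.

```r
# Toy data in the same wide layout as BtheB.
# ASSUMPTION: all values below are invented for illustration only.
wide <- data.frame(
  subject   = factor(1:3),
  treatment = factor(c("TAU", "BtheB", "TAU")),
  bdi.2m    = c(20, 12, NA),
  bdi.4m    = c(18, 10, 15)
)

# Stack the repeated measurements: each bdi.* column becomes a block of rows,
# identified by the new 'time' variable (months since start of treatment).
long <- reshape(
  wide,
  direction = "long",
  idvar     = "subject",
  varying   = c("bdi.2m", "bdi.4m"),
  v.names   = "bdi",
  timevar   = "time",
  times     = c(2, 4)
)

print(long)
```

With the real data, the same call would presumably use varying = c("bdi.2m", "bdi.4m", "bdi.6m", "bdi.8m") and times = c(2, 4, 6, 8), keeping bdi.pre as a baseline covariate.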
Data from an experiment investigating the use of massive amounts of silver iodide (100 to 1000 grams per cloud) in cloud seeding to increase rainfall.
A data frame with 24 observations on the following 7 variables.
seeding a factor indicating whether seeding action occurred (no or yes).
time number of days after the first day of the experiment.
cloudcover the percentage cloud cover in the experimental area, measured using radar.
prewetness the total rainfall in the target area one hour before seeding (in cubic metres times 1e+8).
echomotion a factor showing whether the radar echo was moving or stationary.
rainfall the amount of rain in cubic metres times 1e+8.
Weather modification, or cloud seeding, is the treatment of individual clouds or storm systems with various inorganic and organic materials in the hope of achieving an increase in rainfall. Introduction of such material into a cloud that contains supercooled water, that is, liquid water colder than zero Celsius, has the aim of inducing freezing, with the consequent ice particles growing at the expense of liquid droplets and becoming heavy enough to fall as rain from clouds that otherwise would produce none.
The data available in clouds were collected in the summer of 1975 from an experiment to investigate the use of massive amounts of silver iodide (100 to 1000 grams per cloud) in cloud seeding to increase rainfall. In the experiment, which was conducted in an area of Florida, 24 days were judged suitable for seeding on the basis that a measured suitability criterion (SNE).
W. L. Woodley, J. Simpson, R. Biondini and J. Berkeley (1977), Rainfall results 1970-75: Florida area cumulus experiment. Science, 195, 735–742.
R. D. Cook and S. Weisberg (1980), Characterizations of an empirical influence function for detecting influential cases in regression. Technometrics, 22, 495–508.
data("clouds", package = "HSAUR")
layout(matrix(1:2, nrow = 2))
boxplot(rainfall ~ seeding, data = clouds, ylab = "Rainfall")
boxplot(rainfall ~ echomotion, data = clouds, ylab = "Rainfall")
Energy output and surface temperature for Star Cluster CYG OB1.
A data frame with 47 observations on the following 2 variables.
logst log surface temperature of the star.
logli log light intensity of the star.
The Hertzsprung-Russell (H-R) diagram forms the basis of the theory of stellar evolution. The diagram is essentially a plot of the energy output of stars against their surface temperature. Data from the H-R diagram of Star Cluster CYG OB1, calibrated according to Vanisma and De Greve (1972), are given here.
F. Vanisma and J. P. De Greve (1972), Close binary systems before and after mass transfer. Astrophysics and Space Science, 87, 377–401.
D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway and E. Ostrowski (1994). A Handbook of Small Datasets, Chapman and Hall/CRC, London.
data("CYGOB1", package = "HSAUR")
plot(logst ~ logli, data = CYGOB1)
A randomised clinical trial investigating the effect of an anti-epileptic drug.
A data frame with 236 observations on the following 6 variables.
treatment the treatment group, a factor with levels placebo and Progabide.
base the number of seizures before the trial.
seizure.rate the number of seizures (response variable).
period treatment period, an ordered factor with levels 1 to 4.
subject the patient ID, a factor with levels 1 to 59.
In this clinical trial, 59 patients suffering from epilepsy were randomized to groups receiving either the anti-epileptic drug Progabide or a placebo in addition to standard chemotherapy. The numbers of seizures suffered in each of four two-week periods were recorded for each patient, along with a baseline seizure count for the 8 weeks prior to being randomized to treatment, and the patient's age. The main question of interest is whether taking Progabide reduced the number of epileptic seizures compared with placebo.
P. F. Thall and S. C. Vail (1990), Some covariance models for longitudinal count data with overdispersion. Biometrics, 46, 657–671.
data("epilepsy", package = "HSAUR")
library(lattice)
### seizure rate relative to baseline, placebo group
dotplot(I(seizure.rate / base) ~ period | subject, data = epilepsy,
    subset = treatment == "placebo")
### seizure rate relative to baseline, Progabide group
dotplot(I(seizure.rate / base) ~ period | subject, data = epilepsy,
    subset = treatment == "Progabide")
The Forbes 2000 Ranking of the World’s Biggest Companies (Year
The Forbes 2000 list is a ranking of the world’s biggest companies, measured by sales, profits, assets and market value.
A data frame with 2000 observations on the following 8 variables.
country a factor giving the country the company is situated in.
category a factor describing the products the company produces.
sales the amount of sales of the company in billion USD.
profits the profit of the company in billion USD.
assets the assets of the company in billion USD.
marketvalue the market value of the company in billion USD.
data("Forbes2000", package = "HSAUR")
summary(Forbes2000)
### number of countries
length(levels(Forbes2000$country))
### number of industries
length(levels(Forbes2000$category))
The data are from a foster feeding experiment with rat mothers and litters of four different genotypes. The measurement is the litter weight after a trial feeding period.
A data frame with 61 observations on the following 3 variables.
litgen genotype of the litter, a factor with levels A, B, I, and J.
motgen genotype of the mother, a factor with levels A, B, I, and J.
weight the weight of the litter after a feeding period.
Here the interest lies in uncovering the effect of genotype of mother and litter on litter weight.
D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway and E. Ostrowski (1994). A Handbook of Small Datasets, Chapman and Hall/CRC, London.
data("foster", package = "HSAUR")
plot.design(foster)
The dissimilarity matrix of 18 species of garden flowers.
The dissimilarity was computed based on certain characteristics of the flowers.
L. Kaufman and P. J. Rousseeuw (1990), Finding groups in data: an introduction to cluster analysis, John Wiley & Sons, New York.
data("gardenflowers", package = "HSAUR")
gardenflowers
Data from a psychiatric screening questionnaire.
A data frame with 22 observations on the following 4 variables.
GHQ the General Health Questionnaire score.
cases the number of diseased subjects.
non.cases the number of healthy subjects.
The data arise from a study of a psychiatric screening questionnaire called the GHQ (General Health Questionnaire, see Goldberg, 1972). Here the main question of interest is to see how caseness is related to gender and GHQ score.
D. Goldberg (1972). The Detection of Psychiatric Illness by Questionnaire, Oxford University Press, Oxford, UK.
data("GHQ", package = "HSAUR")
male <- subset(GHQ, sex == "male")
female <- subset(GHQ, sex == "female")
layout(matrix(1:2, ncol = 2))
barplot(t(as.matrix(male[,c("cases", "non.cases")])), main = "Male", xlab = "GHQ score")
barplot(t(as.matrix(female[,c("cases", "non.cases")])), main = "Female", xlab = "GHQ score")
Results of the Olympic heptathlon competition, Seoul, 1988.
A data frame with 25 observations on the following 8 variables.
The first combined Olympic event for women was the pentathlon, first held in Germany in 1928. Initially this consisted of the shot putt, long jump, 100m, high jump and javelin events held over two days. The pentathlon was first introduced into the Olympic Games in 1964, when it consisted of the 80m hurdles, shot, high jump, long jump and 200m. In 1977 the 200m was replaced by the 800m and from 1981 the IAAF brought in the seven-event heptathlon in place of the pentathlon, with day one containing the 100m hurdles, shot, high jump and 200m, and day two the long jump, javelin and 800m. A scoring system is used to assign points to the results from each event and the winner is the woman who accumulates the most points over the two days. The event made its first Olympic appearance in 1984.
In the 1988 Olympics held in Seoul, the heptathlon was won by one of the stars of women’s athletics in the USA, Jackie Joyner-Kersee. The results for all 25 competitors are given here.
D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway and E. Ostrowski (1994). A Handbook of Small Datasets, Chapman and Hall/CRC, London.
data("heptathlon", package = "HSAUR")
plot(heptathlon)
Generate longtable LaTeX environments.
HSAURtable(object, ...)
## S3 method for class ’table’
HSAURtable(object, xname = deparse(substitute(object)), pkg = NULL,
## S3 method for class ’data.frame’
HSAURtable(object, xname = deparse(substitute(object)), pkg = NULL,
## S3 method for class ’tabtab’
toLatex(object, caption = NULL, label = NULL,
## S3 method for class ’dftab’
toLatex(object, pcol = 1, caption = NULL,
    label = NULL, rownames = FALSE, topcaption = TRUE, index = TRUE, ...)
the package object comes from, optionally.
the number of rows actually printed for a data.frame.
the (optional) caption of the table without label.
the (optional) label to be defined for this table.
logical, should the rownames be printed in the first row without column name?
logical, should the captions be placed on top (default) of the table?
logical, should an index entry be generated?
additional arguments, currently ignored.
Based on the data in object, an object from which a LaTeX table (in a longtable environment) may be constructed (via toLatex) is generated.
An object of class tabtab or dftab for which methods are available.
toLatex produces objects of class Latex, a character vector, essentially.
data("rearrests", package = "HSAUR")
toLatex(HSAURtable(rearrests),
    caption = "Rearrests of juvenile felons.",
    label = "rearrests_tab")
Data from four randomised clinical trials on the prevention of gastrointestinal damage by Misoprostol reported by Lanza et al. (1987, 1988a,b, 1989).
A data frame with 198 observations on the following 3 variables.
study a factor with levels I, II, III, and IV describing the study number.
treatment a factor with levels Misoprostol and Placebo.
classification an ordered factor with levels 1 < 2 < 3 < 4 < 5 describing an ordered response.
The response variable is defined by the number of haemorrhages or erosions.
F. L. Lanza (1987), A double-blind study of prophylactic effect of misoprostol on lesions of gastric and duodenal mucosa induced by oral administration of tolmetin in healthy subjects. British Journal of Clinical Practice, May suppl, 91–101.
F. L. Lanza, R. L. Aspinall, E. A. Swabb, R. E. Davis, M. F. Rack, A. Rubin (1988a), Double-blind, placebo-controlled endoscopic comparison of the mucosal protective effects of misoprostol versus cimetidine on tolmetin-induced mucosal injury to the stomach and duodenum. Gastroenterology, 95(2), 289–294.
F. L. Lanza, K. Peace, L. Gustitus, M. F. Rack, B. Dickson (1988b), A blinded endoscopic comparative study of misoprostol versus sucralfate and placebo in the prevention of aspirin-induced gastric and duodenal ulceration. American Journal of Gastroenterology, 83(2), 143–146.
F. L. Lanza, D. Fakouhi, A. Rubin, R. E. Davis, M. F. Rack, C. Nissen, S. Geis (1989), A double-blind placebo-controlled comparison of the efficacy and safety of 50, 100, and 200 micrograms of misoprostol QID in the prevention of ibuprofen-induced gastric and duodenal mucosal lesions and symptoms. American Journal of Gastroenterology, 84(6), 633–636.
data("Lanza", package = "HSAUR")
layout(matrix(1:4, nrow = 2))
pl <- tapply(1:nrow(Lanza), Lanza$study, function(indx)
mosaicplot(table(Lanza[indx,"treatment"],
Survival Times after Mastectomy of Breast Cancer Patients
Survival times in months after mastectomy of women with breast cancer. The cancers are classified as having metastasized or not based on a histochemical marker.
A data frame with 42 observations on the following 3 variables.
event a logical indicating if the event was observed (TRUE) or if the survival time was censored
metastized a factor with levels yes and no.
B. S. Everitt and S. Rabe-Hesketh (2001), Analysing Medical Data using S-PLUS, Springer, New York, USA.
data("mastectomy", package = "HSAUR")
table(mastectomy$metastized)
Several meteorological measurements for a period between 1920 and 1931.
A data frame with 11 observations on the following 6 variables.
rainNovDec rainfall in November and December (mm).
radiation radiation in July (millilitres of alcohol).
yield average harvest yield (quintals per hectare).
Carry out a principal components analysis of both the covariance matrix and the correlation matrix of the data and compare the results. Which set of components leads to the most meaningful interpretation?
B. S. Everitt and G. Dunn (2001), Applied Multivariate Data Analysis, 2nd edition, Arnold, London.
data("meteo", package = "HSAUR")
meteo
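The comparison asked for in the meteo exercise (principal components from the covariance matrix versus the correlation matrix) can be sketched with prcomp() from base R. To keep the sketch runnable without HSAUR, it uses the built-in USArrests data as a stand-in; that choice, and all object names below, are assumptions made for illustration and not part of the package. With meteo one would presumably pass only the measurement columns.

```r
# Stand-in data: USArrests ships with base R (package 'datasets').
# ASSUMPTION: it is NOT the meteo data; it is used only so the sketch runs.
X <- USArrests

# scale. = FALSE analyses the covariance matrix,
# scale. = TRUE analyses the correlation matrix.
pc_cov <- prcomp(X, center = TRUE, scale. = FALSE)
pc_cor <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
prop_cov <- pc_cov$sdev^2 / sum(pc_cov$sdev^2)
prop_cor <- pc_cor$sdev^2 / sum(pc_cor$sdev^2)
print(round(rbind(covariance = prop_cov, correlation = prop_cor), 3))

# Variable with the largest absolute loading on the first component
top_cov <- names(which.max(abs(pc_cov$rotation[, 1])))
top_cor <- names(which.max(abs(pc_cor$rotation[, 1])))
print(c(covariance = top_cov, correlation = top_cor))
```

When variables are measured on very different scales, the covariance-based components tend to be dominated by the variable with the largest variance, which is usually the reason to look at the correlation-based solution as well.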
The distribution of the oral lesion site found in house-to-house surveys in three geographic regions of rural India.
Cyrus R. Mehta and Nitin R. Patel (2003), StatXact-6: Statistical Software for Exact Nonparametric Inference, Cytel Software Corporation, Cambridge, USA.
data("orallesions", package = "HSAUR")
mosaicplot(orallesions)
Plasma inorganic phosphate levels from 33 subjects.
A data frame with 33 observations on the following 9 variables.
group a factor with levels control and obese.
t0.5 phosphate level after half an hour.
t1 phosphate level after one hour.
t1.5 phosphate level after 1 1/2 hours.
t3 phosphate level after three hours.
t4 phosphate level after four hours.
t5 phosphate level after five hours.
C. S. Davis (2002), Statistical Methods for the Analysis of Repeated Measurements, Springer, New York.
data("phosphate", package = "HSAUR")
plot(t0 ~ group, data = phosphate)
Number of failures of piston rings in three legs of four steam-driven compressors.
The data are given in the form of a table. The table gives the number of piston-ring failures in each of three legs of four steam-driven compressors located in the same building. The compressors have identical design and are oriented in the same way. The question of interest is whether the two classification variables (compressor and leg) are independent.
S. J. Haberman (1973), The analysis of residuals in cross-classified tables. Biometrics, 29, 205–220.
data("pistonrings", package = "HSAUR")
mosaicplot(pistonrings)
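The independence question for the pistonrings table can be examined with Pearson's chi-squared test via chisq.test(). This is a minimal sketch only: the counts and the row and column labels in `tab` are hypothetical placeholders chosen to give a 4 x 3 table, not the pistonrings data.

```r
# ASSUMPTION: hypothetical 4 x 3 table of failure counts (invented values),
# laid out as compressors (rows) by legs (columns).
tab <- matrix(
  c(10, 12,  8,  9,    # leg L1, compressors C1..C4
    11, 10, 12,  9,    # leg L2
    11, 10, 10,  8),   # leg L3
  nrow = 4,
  dimnames = list(compressor = paste0("C", 1:4),
                  leg        = paste0("L", 1:3))
)

# Pearson chi-squared test of independence of rows and columns
res <- chisq.test(tab)
print(res)

# Pearson residuals show which cells deviate most from independence
print(round(res$residuals, 2))
```

With the package installed, chisq.test(pistonrings) would be the corresponding call, and the residuals give a numerical counterpart to the mosaic plot.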
Data on planets outside the Solar System.
A data frame with 101 observations from 101 exoplanets on the following 3 variables.
eccen the radial eccentricity of the planet.
From the properties of the exoplanets found up to now it appears that the theory of planetary development constructed for the planets of the Solar System may need to be reformulated. The exoplanets are not at all like the nine local planets that we know so well. A first step in the process of understanding the exoplanets might be to try to classify them with respect to their known properties.
M. Mayor and P. Frei (2003). New Worlds in the Cosmos: The Discovery of Exoplanets. Cambridge University Press, Cambridge, UK.
data("planets", package = "HSAUR")
require("scatterplot3d")
scatterplot3d(log(planets$mass), log(planets$period), log(planets$eccen),
The erythrocyte sedimentation rate and measurements of two plasma proteins (fibrinogen and globulin).
A data frame with 32 observations on the following 3 variables.
fibrinogen the fibrinogen level in the blood.
globulin the globulin level in the blood.
ESR the erythrocyte sedimentation rate, either less than or greater than 20 mm/hour.
The erythrocyte sedimentation rate (ESR) is the rate at which red blood cells (erythrocytes) settle out of suspension in blood plasma, when measured under standard conditions. If the ESR increases when the levels of certain proteins in the blood plasma rise in association with conditions such as rheumatic diseases, chronic infections and malignant diseases, its determination might be useful in screening blood samples taken from people suspected of suffering from one of the conditions mentioned. The absolute value of the ESR is not of great importance; rather, what matters is whether it is less than 20 mm/hr, since lower values indicate a healthy individual.
The question of interest is whether there is any association between the probability of an ESR reading greater than 20 mm/hr and the levels of the two plasma proteins. If there is not, then the determination of ESR would not be useful for diagnostic purposes.
D. Collett and A. A. Jemain (1985), Residuals, outliers and influential observations in regression analysis. Sains Malaysiana, 4, 493–511.
data("plasma", package = "HSAUR")
layout(matrix(1:2, ncol = 2))
boxplot(fibrinogen ~ ESR, data = plasma, varwidth = TRUE)
boxplot(globulin ~ ESR, data = plasma, varwidth = TRUE)
Data from a placebo-controlled trial of a non-steroidal anti-inflammatory drug in the treatment of familial adenomatous polyposis (FAP).
A data frame with 20 observations on the following 3 variables.
number number of colonic polyps at 12 months.
treat treatment arms of the trial, a factor with levels placebo and drug.
Giardiello et al. (1993) and Piantadosi (1997) describe the results of a placebo-controlled trial of a non-steroidal anti-inflammatory drug in the treatment of familial adenomatous polyposis (FAP). The trial was halted after a planned interim analysis had suggested compelling evidence in favour of the treatment. Here we are interested in assessing whether the number of colonic polyps at 12 months is related to treatment and age of patient.
F. M. Giardiello, S. R. Hamilton, A. J. Krush, S. Piantadosi, L. M. Hylind, P. Celano, S. V. Booker, C. R. Robinson and G. J. A. Offerhaus (1993), Treatment of colonic and rectal adenomas with sulindac in familial adenomatous polyposis. New England Journal of Medicine, 328(18), 1313–1316.
S. Piantadosi (1997), Clinical Trials: A Methodologic Perspective. John Wiley & Sons, New York.
data("polyps", package = "HSAUR")
plot(number ~ age, data = polyps, pch = as.numeric(polyps$treat))
legend(40, 40, legend = levels(polyps$treat), pch = 1:2, bty = "n")
Data from a placebo-controlled trial of a non-steroidal anti-inflammatory drug in the treatment of familial adenomatous polyposis (FAP).
A data frame with 22 observations on the following 5 variables.
sex a factor with levels female and male.
treatment a factor with levels placebo and active.
baseline the baseline number of polyps.
number3m the number of polyps after three months.
The data arise from the same study as the polyps data. Here, the number of polyps after three months is given.
F. M. Giardiello, S. R. Hamilton, A. J. Krush, S. Piantadosi, L. M. Hylind, P. Celano, S. V. Booker, C. R. Robinson and G. J. A. Offerhaus (1993), Treatment of colonic and rectal adenomas with sulindac in familial adenomatous polyposis. New England Journal of Medicine, 328(18), 1313–1316.
S. Piantadosi (1997), Clinical Trials: A Methodologic Perspective. John Wiley & Sons, New York.
data("polyps3", package = "HSAUR")
plot(number3m ~ age, data = polyps3, pch = as.numeric(polyps3$treatment))
legend("topright", legend = levels(polyps3$treatment), pch = 1:2, bty = "n")
Chemical composition of Romano-British pottery.
A data frame with 45 observations on the following 9 chemicals.
The data give the chemical composition of specimens of Romano-British pottery, determined by atomic absorption spectrophotometry, for nine oxides.
A. Tubb, N. J. Parker and G. Nickless (1980), The analysis of Romano-British pottery by atomic absorption spectrophotometry. Archaeometry, 22, 153–171.
data("pottery", package = "HSAUR")
plot(pottery)
Rearrests of juvenile felons by type of court in which they were tried.
The data (taken from Agresti, 1996) arise from a sample of juveniles convicted of felony in Florida in 1987. Matched pairs were formed using criteria such as age and the number of previous offences. For each pair, one subject was handled in the juvenile court and the other was transferred to the adult court. Whether or not the juvenile was rearrested by the end of 1988 was then noted. Here the question of interest is whether the true proportions rearrested were identical for the adult and juvenile court assignments.
A. Agresti (1996). An Introduction to Categorical Data Analysis. Wiley, New York.
data("rearrests", package = "HSAUR")
rearrests
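Because the rearrests data are matched pairs, one natural way to compare the two rearrest proportions is McNemar's test, available as mcnemar.test() in base R. The sketch below is an illustration under assumptions: the 2 x 2 counts in `pairs_tab` are hypothetical and are not the rearrests data, and McNemar's test is named here as one reasonable choice rather than as the analysis prescribed by the package.

```r
# ASSUMPTION: hypothetical counts of matched pairs (invented values).
# Rows: outcome for the pair member tried in adult court;
# columns: outcome for the pair member tried in juvenile court.
pairs_tab <- matrix(
  c(20, 10,
    30, 40),
  nrow = 2,
  dimnames = list(adult    = c("rearrest", "no rearrest"),
                  juvenile = c("rearrest", "no rearrest"))
)

# McNemar's test uses only the discordant pairs (the off-diagonal cells)
res <- mcnemar.test(pairs_tab)
print(res)
```

With the package installed, mcnemar.test(rearrests) would be the corresponding call, provided rearrests is stored as a 2 x 2 table of pairs.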
The respiratory status of patients recruited for a randomised clinical multicenter trial.
A data frame with 555 observations on the following 7 variables.
centre the study center, a factor with levels 1 and 2.
treatment the treatment arm, a factor with levels placebo and treatment.
sex a factor with levels female and male.
status the respiratory status (response variable), a factor with levels poor and good.
month the month, each patient was examined at months 0, 1, 2, 3 and 4.
subject the patient ID, a factor with levels 1 to 111.
In each of two centres, eligible patients were randomly assigned to active treatment or placebo. During the treatment, the respiratory status (categorised poor or good) was determined at each of four monthly visits. The trial recruited 111 participants (54 in the active group, 57 in the placebo group) and there were no missing data for either the responses or the covariates. The question of interest is to assess whether the treatment is effective and to estimate its effect.
Note that the data are in long form, i.e., repeated measurements are stored as additional rows in the data frame.
C. S. Davis (1991), Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials. Statistics in Medicine, 10, 1959–1980.
data("respiratory", package = "HSAUR")
mosaicplot(xtabs( ~ treatment + month + status, data = respiratory))
Lecture room width estimated by students in two different units.
A data frame with 113 observations on the following 2 variables.
unit a factor with levels feet and metres.
width the estimated width of the lecture room.
Shortly after metric units of length were officially introduced in Australia, each of a group of 44 students was asked to guess, to the nearest metre, the width of the lecture hall in which they were sitting. Another group of 69 students in the same room was asked to guess the width in feet, to the nearest foot. The data were collected by Professor T. Lewis and are taken from Hand et al. (1994). The main question is whether estimation in feet and in metres gives different results.
D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway and E. Ostrowski (1994). A Handbook of Small Datasets, Chapman and Hall/CRC, London.
data("roomwidth", package = "HSAUR")
convert <- ifelse(roomwidth$unit == "feet", 1, 3.28)
boxplot(I(width * convert) ~ unit, data = roomwidth)
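To compare the two groups, both sets of guesses first have to be put on a common scale. The sketch below converts metres to feet with the same factor of 3.28 and then runs a Welch two-sample t-test. The choice of test is an assumption made for illustration, and the data frame `toy` is hypothetical, not the roomwidth data.

```r
# Hypothetical guesses for illustration only (NOT the roomwidth data).
toy <- data.frame(
  unit  = factor(c("feet", "feet", "feet", "metres", "metres", "metres")),
  width = c(40, 45, 50, 12, 15, 18)
)

# Put both groups on the feet scale: 1 metre is about 3.28 feet.
convert <- ifelse(toy$unit == "feet", 1, 3.28)
toy$width_ft <- toy$width * convert

# Welch two-sample t-test (does not assume equal variances).
res <- t.test(width_ft ~ unit, data = toy)
```

With the package installed, the same two steps could be applied to `roomwidth` in place of `toy`.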
Data on sex differences in the age of onset of schizophrenia.
A data frame with 251 observations on the following 2 variables.
gender a factor with levels female and male
A sex difference in the age of onset of schizophrenia was noted by Kraepelin (1919). Subsequent epidemiological studies of the disorder have consistently shown an earlier onset in men than in women. One model that has been suggested to explain this observed difference is known as the subtype model, which postulates two types of schizophrenia, one characterised by early onset, typical symptoms and poor premorbid competence, and the other by late onset, atypical symptoms and good premorbid competence. The early onset type is assumed to be largely a disorder of men and the late onset type largely a disorder of women.
E. Kraepelin (1919), Dementia Praecox and Paraphrenia. Livingstone, Edinburgh.
data("schizophrenia", package = "HSAUR")
boxplot(age ~ gender, data = schizophrenia)
Thought disorder and early onset of schizophrenia.
A data frame with 220 observations on the following 4 variables.
subject the patient ID, a factor with levels 1 to 44.
onset the time of onset of the disease, a factor with levels < 20 yrs and > 20 yrs.
disorder whether thought disorder was absent or present, the response variable.
The data were collected in a follow-up study of women patients with schizophrenia. The binary response recorded at 0, 2, 6, 8 and 10 months after hospitalisation was thought disorder (absent or present). The single covariate is the factor indicating whether a patient had suffered early or late onset of her condition (age of onset less than 20 years or age of onset 20 years or above). The question of interest is whether the course of the illness differs between patients with early and late onset.
Davis (2002), Statistical Methods for the Analysis of Repeated Measurements, Springer, New York.
data("schizophrenia2", package = "HSAUR")
mosaicplot(xtabs( ~ onset + month + disorder, data = schizophrenia2))
Data from a sociological study in which the number of days absent from school is the response variable.
A data frame with 154 observations on the following 5 variables.
race race of the child, a factor with levels aboriginal and non-aboriginal.
sex the sex of the child, a factor with levels female and male.
school the school type, a factor with levels F0 (primary), F1 (first), F2 (second) and F3 (third form).
learner the learning ability of the child, a factor with levels average and slow.
absent number of days absent from school.
The data arise from a sociological study of Australian Aboriginal and white children reported by Quine (1975).
In this study, children of both sexes from four age groups (final grade in primary schools and first, second and third form in secondary school) and from two cultural groups were used. The children in each age group were classified as slow or average learners. The response variable was the number of days absent from school during the school year. (Children who had suffered a serious illness during the year were excluded.)
S. Quine (1975), Achievement Orientation of Aboriginal and White Adolescents. Doctoral Dissertation, Australian National University, Canberra.
data("schooldays", package = "HSAUR")
plot.design(schooldays)
Measurements made on Egyptian skulls from five epochs.
A data frame with 150 observations on the following 5 variables.
epoch the epoch the skull was assigned to, a factor with levels c4000BC, c3300BC, c1850BC, c200BC,
and cAD150, where the years are only given approximately, of course.
bh basibregmatic heights of the skull.
bl basialveolar length of the skull.
The question is whether the measurements change over time. Non-constant measurements of the skulls over time would indicate interbreeding with immigrant populations.
D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway and E. Ostrowski (1994). A Handbook of Small Datasets, Chapman and Hall/CRC, London.
data("skulls", package = "HSAUR")
means <- tapply(1:nrow(skulls), skulls$epoch, function(i)
    apply(skulls[i,colnames(skulls)[-1]], 2, mean))
means <- matrix(unlist(means), nrow = length(means), byrow = TRUE)
colnames(means) <- colnames(skulls)[-1]
rownames(means) <- levels(skulls$epoch)
Data from a meta-analysis on nicotine gum and smoking cessation
A data frame with 26 observations (studies) on the following 4 variables.
qt the number of treated subjects who stopped smoking.
tt the total number of treated subjects.
qc the number of subjects who stopped smoking without being treated.
tc the total number of subjects not being treated.
Cigarette smoking is the leading cause of preventable death in the United States and kills more Americans than AIDS, alcohol, illegal drug use, car accidents, fires, murders and suicides combined. It has been estimated that 430,000 Americans die from smoking every year. Fighting tobacco use is, consequently, one of the major public health goals of our time and there are now many programs available designed to help smokers quit. One of the major aids used in these programs is nicotine chewing gum, which acts as a substitute oral activity and provides a source of nicotine that reduces the withdrawal symptoms experienced when smoking is stopped. But separate randomized clinical trials of nicotine gum have been largely inconclusive, leading Silagy (2003) to consider combining the results of studies found from an extensive literature search. The results of these trials in terms of numbers of people in the treatment arm and the control arm who stopped smoking for at least 6 months after treatment are given here.
C. Silagy (2003), Nicotine replacement therapy for smoking cessation (Cochrane Review). The Cochrane Library, 4, John Wiley & Sons, Chichester.
data("smoking", package = "HSAUR")
boxplot(smoking$qt/smoking$tt,
        smoking$qc/smoking$tc,
        names = c("Treated", "Control"), ylab = "Percent Quitters")
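Combining the results of several trials can be sketched with per-study odds ratios and an inverse-variance weighted average on the log scale. This fixed-effect approach is an assumption chosen for illustration, since the text does not say which pooling method is used. The function `pooled_odds_ratio` and the counts below are hypothetical; only the argument names follow the variables qt, tt, qc and tc described above.

```r
# Odds ratio of quitting (treated vs. control) for each study, plus a
# fixed-effect (inverse-variance) pooled estimate on the log scale.
pooled_odds_ratio <- function(qt, tt, qc, tc) {
  log_or <- log((qt / (tt - qt)) / (qc / (tc - qc)))
  se     <- sqrt(1 / qt + 1 / (tt - qt) + 1 / qc + 1 / (tc - qc))
  w      <- 1 / se^2
  list(or = exp(log_or), pooled = exp(sum(w * log_or) / sum(w)))
}

# Hypothetical counts for two studies (NOT taken from the smoking data).
res <- pooled_odds_ratio(qt = c(20, 30), tt = c(100, 100),
                         qc = c(10, 20), tc = c(100, 100))
```

With the package installed, `with(smoking, pooled_odds_ratio(qt, tt, qc, tc))` would apply the same calculation to the 26 studies, provided no study has a zero cell.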
Students were administered two parallel forms of a test after a random assignment to three different treatments.
A data frame with 35 observations on the following 3 variables.
treatment a factor with levels AA, C, and NC.
The data arise from a large study of risk taking (Timm, 2002). Students were randomly assigned to three different treatments labelled AA, C and NC. Students were administered two parallel forms of a test called low and high. The aim is to carry out a test of the equality of the bivariate means of each treatment population.
N. H. Timm (2002), Applied Multivariate Analysis. Springer, New York.
data("students", package = "HSAUR")
layout(matrix(1:2, ncol = 2))
boxplot(low ~ treatment, data = students, ylab = "low")
boxplot(high ~ treatment, data = students, ylab = "high")
Data from a study carried out to investigate the causes of jeering or baiting behaviour by a crowd when a person is threatening to commit suicide by jumping from a high building.
L. Mann (1981), The baiting crowd in episodes of threatened suicide. Journal of Personality and Social Psychology, 41, 703–709.
data("suicides", package = "HSAUR")
mosaicplot(suicides)
Meta-analysis of studies comparing two different toothpastes.
A data frame with 9 observations on the following 7 variables.
nA number of subjects using toothpaste A.
meanA mean DMFS index of subjects using toothpaste A.
sdA standard deviation of DMFS index of subjects using toothpaste A.
nB number of subjects using toothpaste B.
meanB mean DMFS index of subjects using toothpaste B.
sdB standard deviation of DMFS index of subjects using toothpaste B.
The data are the results of nine randomised trials comparing two different toothpastes for the prevention of caries development. The outcome in each trial was the change, from baseline, in the decayed, missing (due to caries) and filled surface dental index (DMFS).
B. S. Everitt and A. Pickles (2000), Statistical Aspects of the Design and Analysis of Clinical Trials, Imperial College Press, London.
data("toothpaste", package = "HSAUR")
toothpaste
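Each trial reports a sample size, mean and standard deviation per toothpaste, which is enough to compute a mean difference and its standard error trial by trial. The sketch below shows that arithmetic. The helper `mean_difference` and the numbers passed to it are hypothetical and are not values from the toothpaste data.

```r
# Difference in mean DMFS change (A minus B) and its standard error,
# computed from per-group summary statistics.
mean_difference <- function(nA, meanA, sdA, nB, meanB, sdB) {
  diff <- meanA - meanB
  se   <- sqrt(sdA^2 / nA + sdB^2 / nB)
  data.frame(diff = diff, se = se,
             lower = diff - 1.96 * se, upper = diff + 1.96 * se)
}

# Hypothetical summary statistics for one trial (NOT from the data set).
res <- mean_difference(nA = 100, meanA = 5, sdA = 4,
                       nB = 100, meanB = 4, sdB = 3)
```

Because the function is vectorised, `with(toothpaste, mean_difference(nA, meanA, sdA, nB, meanB, sdB))` would return one row per trial when the package is installed.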
Voting results for 15 congressmen from New Jersey.
Romesburg (1984) gives a set of data that shows the number of times 15 congressmen from New Jersey voted differently in the House of Representatives on 19 environmental bills. Abstentions are not recorded.
H. C. Romesburg (1984), Cluster Analysis for Researchers. Lifetime Learning Publications, Belmont, Canada.
data("voting", package = "HSAUR")
require("MASS")
voting_mds <- isoMDS(voting)
plot(voting_mds$points[,1], voting_mds$points[,2],
     type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2",
     xlim = range(voting_mds$points[,1])*1.2)
text(voting_mds$points[,1], voting_mds$points[,2],
     labels = colnames(voting))
voting_sh <- Shepard(voting[lower.tri(voting)], voting_mds$points)
The mortality and drinking water hardness for 61 cities in England and Wales.
A data frame with 61 observations on the following 4 variables.
location a factor with levels North and South indicating whether the town is at least as far north as Derby.
mortality averaged annual mortality per 100,000 male inhabitants.
hardness calcium concentration (in parts per million).
The data were collected in an investigation of environmental causes of disease. They show the annual mortality per 100,000 for males, averaged over the years 1958-1964, and the calcium concentration (in parts per million) in the drinking water for 61 large towns in England and Wales. The higher the calcium concentration, the harder the water. Towns at least as far north as Derby are identified in the table. Several questions might be of interest here, including whether mortality and water hardness are related, and whether either or both variables differ between northern and southern towns.
D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway and E. Ostrowski (1994). A Handbook of Small Datasets, Chapman and Hall/CRC, London.
data("water", package = "HSAUR")
plot(mortality ~ hardness, data = water)
Percentage incidence of the 13 characteristics of water voles in 14 areas.
A dissimilarity matrix for the following 14 variables, i.e., areas: Surrey, Shropshire, Yorkshire, Perthshire, Aberdeen, Elean Gamhna, Alps, Yugoslavia, Germany, Norway, Pyrenees I, Pyrenees II, North Spain, and South Spain.
Corbet et al. (1970) report a study of water voles (genus Arvicola) in which the aim was to compare British populations of these animals with those in Europe, to investigate whether more than one species might be present in Britain. The original data consisted of observations of the presence or absence of 13 characteristics in about 300 water vole skulls arising from six British populations and eight populations from the rest of Europe. The data are the percentage incidence of the 13 characteristics in each of the 14 samples of water vole skulls.
G. B. Corbet, J. Cummins, S. R. Hedges, W. J. Krzanowski (1970), The taxonomic structure of British water voles, genus Arvicola. Journal of Zoology, 61, 301–316.
data("watervoles", package = "HSAUR")
watervoles
Measurements of root mean square bending moment by two different mooring methods.
A data frame with 18 observations on the following 2 variables.
method1 Root mean square bending moment in Newton metres, mooring method 1
method2 Root mean square bending moment in Newton metres, mooring method 2
In a design study for a device to generate electricity from wave power at sea, experiments were carried out on scale models in a wave tank to establish how the choice of mooring method for the system affected the bending stress produced in part of the device. The wave tank could simulate a wide range of sea states and the model system was subjected to the same sample of sea states with each of two mooring methods, one of which was considerably cheaper than the other. The question of interest is whether bending stress differs for the two mooring methods.
D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway and E. Ostrowski (1994). A Handbook of Small Datasets, Chapman and Hall/CRC, London.
data("waves", package = "HSAUR")
plot(method1 ~ method2, data = waves)
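Because both mooring methods were tried on the same sea states, the observations are paired, and a natural first analysis works with the within-pair differences. The sketch below computes a paired t statistic by hand and checks it against `t.test()`. The choice of test is an assumption made for illustration, and the values in `toy` are hypothetical, not the waves data.

```r
# Hypothetical bending moments for illustration only (NOT the waves data).
toy <- data.frame(method1 = c(2.2, 1.8, 3.1, 2.7),
                  method2 = c(1.9, 1.6, 3.0, 2.2))

# A paired analysis reduces to a one-sample problem on the differences.
d <- toy$method1 - toy$method2
t_stat <- mean(d) / (sd(d) / sqrt(length(d)))

# The same statistic from base R's paired t-test.
res <- t.test(toy$method1, toy$method2, paired = TRUE)
```

With the package installed, `t.test(waves$method1, waves$method2, paired = TRUE)` would run the same test on the 18 sea states.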
The data arise from an experiment to study the gain in weight of rats fed on four different diets, distinguished by amount of protein (low and high) and by source of protein (beef and cereal).
A data frame with 40 observations on the following 3 variables.
source source of protein given, a factor with levels Beef and Cereal.
type amount of protein given, a factor with levels High and Low.
Ten rats are randomized to each of the four treatments. The question of interest is how diet affects weight gain.
D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway and E. Ostrowski (1994). A Handbook of Small Datasets, Chapman and Hall/CRC, London.
data("weightgain", package = "HSAUR")
# assumes the response column is named weightgain
interaction.plot(weightgain$type, weightgain$source,
                 weightgain$weightgain)
Data from a survey from 1974 / 1975 asking both female and male responders about their opinion on the statement: Women should take care of running their homes and leave running the country up to men.
A data frame with 42 observations on the following 4 variables.
sex a factor with levels Male and Female.
agree number of subjects in agreement with the statement.
disagree number of subjects in disagreement with the statement.
The data are from Haberman (1973) and also given in Collett (2003). The question here is whether the responses of men and women differ.
S. J. Haberman (1973), The analysis of residuals in cross-classified tables. Biometrics, 29, 205–220.
D. Collett (2003), Modelling Binary Data. Chapman and Hall / CRC, London. 2nd edition.
data("womensrole", package = "HSAUR")
summary(subset(womensrole, sex == "Female"))
summary(subset(womensrole, sex == "Male"))