Title: | Efficient Effect Size Computation |
---|---|
Description: | A collection of functions to compute the standardized effect sizes for experiments (Cohen d, Hedges g, Cliff delta, Vargha-Delaney A). The computation algorithms have been optimized to allow efficient computation even with very large data sets. |
Authors: | Marco Torchiano [aut, cre] |
Maintainer: | Marco Torchiano |
License: | GPL-2 |
Version: | 0.8.0 |
Built: | 2025-02-24 05:03:12 UTC |
Source: | https://github.com/mtorchiano/effsize |
This package contains functions to compute effect sizes based on mean differences (Cohen's d and Hedges' g), dominance matrices (Cliff's Delta), and stochastic superiority (Vargha-Delaney A).
The computation (especially for Cliff's Delta) is carried out with highly efficient algorithms.
The main functions are cohen.d, cliff.delta, and VD.A.
Change history
Fixed a bug in cohen.d when PAIRED=TRUE; now the PAIRED parameter has no effect and is kept only for compatibility. It may be removed in a future code clean-up.
Implemented a new algorithm with improved memory and time complexity: the time complexity is now T = O(n1*log(n2)) instead of the previous T = O(n1*n2), and the memory complexity is now M = O(n1 + n2) instead of the previous M = O(n1*n2). In practice, the computation is now feasible in reasonable time.
Code clean-up and optimization using vectorized binary partitioning.
Added Vargha and Delaney A and fixed minor bugs with cohen.d.
Modified the Vargha and Delaney A computation to minimize accuracy errors.
Fixed bug in cliff.delta.
Fixed bug in cohen.d.formula.
Fixed minor issue detected by check.
Changed the magnitude field of effsize objects to a factor value.
Implemented paired computation and CI computation with non-central t-distributions for cohen.d.
Added ability to specify factor vector and data vector for 'cliff.delta' function (thanks to Joses W. Ho).
na.rm in cohen.d removes all incomplete pairs when paired.
Fixed bug in cohen.d when na.rm=TRUE; minor changes in the documentation (thanks to P.Thomas).
Fixed a bug related to paired cohen.d with NAs.
Minor documentation changes
Refactored tests using the testthat package.
Fixed a bug in cliff.delta returning inconsistent results when the dominance matrix is returned. Fixed an issue concerning the CI.
Fixed bug in cohen.d when using the noncentral parameter for negative effect sizes.
Fixed minor bugs in cliff.delta and cohen.d.
Fixed bugs in cohen.d: the order of factors is now observed and CIs are computed correctly.
Fixed bugs in cohen.d: possible endless loop; cleaned code.
Fixed bugs in cliff.delta when values are factors.
Fixed bugs in cohen.d for paired data.
Fixed bugs in cohen.d for CI of paired data.
Fixed bugs in cohen.d for non-pooled SD, plus a few pull requests on documentation.
Fixed bug in cohen.d: wrong correct type check.
Fixed tests to be compatible with the upcoming R 4.0, which sets stringsAsFactors to FALSE by default.
Added non-central CI estimation for single sample cohen.d, fixed a bug related to the order of data, and added a subject parameter for paired cohen.d.
Marco Torchiano http://softeng.polito.it/torchiano/
Computes the Cliff's Delta effect size for ordinal variables with the related confidence interval using efficient algorithms.
cliff.delta(d, ...)
## S3 method for class 'formula'
cliff.delta(formula, data=list(), conf.level=.95, use.unbiased=TRUE, use.normal=FALSE, return.dm=FALSE, ...)
## Default S3 method:
cliff.delta(d, f, conf.level=.95, use.unbiased=TRUE, use.normal=FALSE, return.dm=FALSE, ...)
Argument | Description |
---|---|
d | a numeric vector giving either the data values (if f is a factor) or the treatment group values (if f is a numeric vector) |
f | either a factor with two levels or a numeric vector of values (see Details) |
conf.level | confidence level of the confidence interval |
use.unbiased | a logical indicating whether to compute the delta's variance using the "unbiased" estimate formula or the "consistent" estimate |
use.normal | a logical indicating whether to use the normal or the Student-t distribution for the confidence interval estimation |
return.dm | a logical indicating whether to return the dominance matrix. Warning: the explicit computation of the dominance matrix uses a sub-optimal algorithm, both in terms of memory and time |
formula | a formula of the form y ~ f, where y is a numeric variable giving the data values and f is a factor with two levels giving the corresponding groups |
data | an optional matrix or data frame containing the variables in the formula |
... | further arguments to be passed to or from methods |
Uses the original formula reported in (Cliff 1996).
If the dominance matrix is required (i.e. return.dm=TRUE), the full matrix is computed using the naive algorithm.
Otherwise, if treatment and control are factors, the optimized linear-complexity algorithm is used; otherwise the RLE algorithm (with complexity n log n) is used.
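The difference between the naive and an optimized strategy can be illustrated in base R. The sketch below is a hypothetical illustration, not the package's code: cliff_delta_naive builds the full dominance matrix, while cliff_delta_sorted sorts the second sample once and uses binary search (findInterval) to count, for each value of the first sample, how many values lie below and above it. This sort-and-search approach is a swapped-in technique chosen for clarity; it is neither the RLE nor the factor-based algorithm mentioned above.

```r
# Naive version: materializes the full n1 x n2 dominance matrix,
# so it needs O(n1 * n2) time and memory.
cliff_delta_naive <- function(x, y) {
  dm <- sign(outer(x, y, FUN = "-"))  # +1 if x_i > y_j, -1 if x_i < y_j, 0 on ties
  mean(dm)
}

# Sorted version (hypothetical): sorts y once, then uses binary search
# to count the y_j values below and above each x_i.
# Roughly O((n1 + n2) log n2) time and O(n1 + n2) memory.
cliff_delta_sorted <- function(x, y) {
  ys <- sort(y)
  n2 <- length(ys)
  below <- findInterval(x, ys, left.open = TRUE)  # count of y_j <  x_i
  above <- n2 - findInterval(x, ys)               # count of y_j >  x_i
  (sum(below) - sum(above)) / (length(x) * n2)
}

# Data from Hogarty and Kromrey (1999), as in the cliff.delta example
treatment <- c(10, 10, 20, 20, 20, 30, 30, 30, 40, 50)
control   <- c(10, 20, 30, 40, 40, 50)
cliff_delta_naive(treatment, control)   # -0.25
cliff_delta_sorted(treatment, control)  # -0.25
```

Both functions give the same estimate; only the sorted one avoids allocating the n1 x n2 matrix, which is what makes large samples tractable.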
A list of class effsize containing the following components:
Component | Description |
---|---|
estimate | the Cliff's delta estimate |
conf.int | the confidence interval of the delta |
var | the estimated variance of the delta |
conf.level | the confidence level used to compute the confidence interval |
dm | the dominance matrix used for computation, only if return.dm=TRUE |
magnitude | a qualitative assessment of the magnitude of the effect size |
method | the method used for computing the effect size, always Cliff's Delta |
variance.estimation | the method used to estimate the delta variance, either unbiased or consistent (see use.unbiased) |
CI.distribution | the distribution used to compute the confidence interval, either normal or Student-t (see use.normal) |
The magnitude is assessed using the thresholds provided in (Romano 2006), i.e. |d| < 0.147 "negligible", |d| < 0.33 "small", |d| < 0.474 "medium", otherwise "large".
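A minimal sketch of this classification in base R, assuming the Romano thresholds above and returning a factor; cliff_magnitude is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: map Cliff's delta estimates to magnitude labels.
# right = FALSE gives half-open intervals [a, b), so |d| < 0.147 is
# "negligible", 0.147 <= |d| < 0.33 is "small", and so on.
cliff_magnitude <- function(delta) {
  cut(abs(delta),
      breaks = c(0, 0.147, 0.33, 0.474, Inf),
      labels = c("negligible", "small", "medium", "large"),
      right = FALSE, include.lowest = TRUE)
}

as.character(cliff_magnitude(c(-0.25, 0.1, 0.4, 0.9)))
# gives: small, negligible, medium, large
```

Using abs() makes the label independent of the direction of the effect, and returning a factor keeps the labels ordered from negligible to large.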
Marco Torchiano http://softeng.polito.it/torchiano/
Norman Cliff (1996). Ordinal methods for behavioral data analysis. Routledge.
J. Romano, J. D. Kromrey, J. Coraggio, J. Skowronek, Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen's d for evaluating group differences on the NSSE and other surveys?, in: Annual meeting of the Florida Association of Institutional Research, 2006.
K. Y. Hogarty and J. D. Kromrey (1999). Using SAS to Calculate Tests of Cliff's Delta. Proceedings of the Twenty-Fourth Annual SAS User Group International Conference, Miami Beach, Florida, p 238. Available at: http://www2.sas.com/proceedings/sugi24/Posters/p238-24.pdf
## Example data from Hogarty and Kromrey (1999)
library(effsize)
treatment <- c(10,10,20,20,20,30,30,30,40,50)
control <- c(10,20,30,40,40,50)
res <- cliff.delta(treatment, control, return.dm=TRUE)
print(res)
print(res$dm)
Computes the Cohen's d and Hedges' g effect size statistics.
cohen.d(d, ...)
## S3 method for class 'formula'
cohen.d(formula, data=list(), ...)
## Default S3 method:
cohen.d(d, f, pooled=TRUE, paired=FALSE, na.rm=FALSE, mu=0, hedges.correction=FALSE, conf.level=0.95, noncentral=FALSE, within=TRUE, subject=NA, ...)
Argument | Description |
---|---|
d | a numeric vector giving either the data values (if f is a factor) or the treatment group values (if f is a numeric vector) |
f | either a factor with two levels or a numeric vector of values; if NA, a single sample effect size is computed |
formula | a formula of the form y ~ f, where y is a numeric variable giving the data values and f is a factor with two levels giving the corresponding groups. If using a paired computation (paired=TRUE), the subject IDs can be indicated in the formula (see subject). A single sample effect size can be specified with the form ~x |
data | an optional matrix or data frame containing the variables in the formula |
pooled | a logical indicating whether to compute the pooled standard deviation or the whole sample standard deviation. If |
hedges.correction | a logical indicating whether to apply the Hedges correction |
conf.level | confidence level of the confidence interval |
noncentral | a logical indicating whether to use non-central t distributions for computing the confidence interval |
paired | a logical indicating whether to consider the values as paired; a warning is issued if |
within | indicates whether to compute the effect size using the within-subject variation, taking into account the correlation between the pre and post samples |
subject | an array indicating the id of the subject for a paired computation; when the formula interface is used, it can be indicated in the formula by adding |
mu | a numeric value indicating the reference mean for a single sample effect size |
na.rm | a logical indicating whether NA values should be removed before computation; when paired, all incomplete pairs are removed |
... | further arguments to be passed to or from methods |
When f in the default version is a factor or a character vector, it must have two values, which identify the two groups to be compared. Otherwise (e.g. when f is numeric), it is considered as a sample to be compared to d.
In the formula version, f is expected to be a factor; if that is not the case, it is coerced to a factor and a warning is issued.
The function computes the value of Cohen's d statistic (Cohen 1988).
If required (hedges.correction==TRUE), the Hedges g statistic is computed instead (Hedges and Olkin, 1985).
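For two independent groups, the usual definitions can be sketched in base R as below. This assumes the pooled standard deviation with n1 + n2 - 2 degrees of freedom and the common small-sample approximation J = 1 - 3/(4(n1 + n2) - 9) for the Hedges correction; cohen_d_sketch is a hypothetical name, and cohen.d may differ in details (for example in how pooled=FALSE is handled).

```r
# Hypothetical sketch: Cohen's d for two independent samples, with an
# optional approximate Hedges small-sample correction.
cohen_d_sketch <- function(treatment, control, hedges.correction = FALSE) {
  n1 <- length(treatment)
  n2 <- length(control)
  df <- n1 + n2 - 2
  # pooled within-groups standard deviation
  s_pooled <- sqrt(((n1 - 1) * var(treatment) + (n2 - 1) * var(control)) / df)
  d <- (mean(treatment) - mean(control)) / s_pooled
  if (hedges.correction) {
    d <- d * (1 - 3 / (4 * df - 1))  # approximate correction factor J
  }
  d
}

treatment <- c(5, 6, 7, 8, 9)
control   <- c(3, 4, 5, 6, 7)
cohen_d_sketch(treatment, control)                            # about 1.2649
cohen_d_sketch(treatment, control, hedges.correction = TRUE)  # about 1.1425
```

Here both groups have variance 2.5 and the means differ by 2, so d = 2/sqrt(2.5); with df = 8 the correction factor is 28/31, shrinking the estimate slightly as expected for small samples.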
When paired is set, the effect size is computed using the approach suggested in (Gibbons et al. 1993). In particular, a correction that takes into account the correlation between the two samples is applied (see Borenstein et al., 2009).
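The correlation correction can be illustrated with the relation given by Borenstein et al. (2009) between the standard deviation of the paired differences and the within-groups standard deviation, S_within = S_diff / sqrt(2(1 - r)). The sketch below applies that formula in base R; paired_d_sketch is a hypothetical name, and the code is not taken from the package, so it may differ from what cohen.d computes with paired=TRUE and within=TRUE.

```r
# Hypothetical sketch of a paired effect size that uses the within-groups
# SD recovered from the SD of the differences (Borenstein et al., 2009).
paired_d_sketch <- function(pre, post) {
  diffs <- post - pre
  r <- cor(pre, post)                        # correlation between the two samples
  s_within <- sd(diffs) / sqrt(2 * (1 - r))  # rescale SD of differences to within-groups SD
  mean(diffs) / s_within
}

pre  <- c(1, 2, 3, 4, 5)
post <- c(3, 2, 5, 4, 6)
paired_d_sketch(pre, post)  # about 0.632
```

In this example both samples have the same standard deviation and r = 0.8, so S_within equals sd(pre); dividing by sd(diffs) alone would instead give 1, overstating the effect because the positive correlation shrinks the differences.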
It is possible to perform a single sample effect size estimation either by using a formula ~x or by passing f=NA.
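In the single sample case the statistic reduces to the distance of the sample mean from the reference mean mu, in units of the sample standard deviation. A minimal sketch, assuming this standard definition (one_sample_d_sketch is a hypothetical name):

```r
# Hypothetical sketch: single sample standardized effect size
# relative to a reference mean mu.
one_sample_d_sketch <- function(x, mu = 0) {
  (mean(x) - mu) / sd(x)
}

x <- c(4, 5, 6, 7, 8)
one_sample_d_sketch(x, mu = 5)  # (6 - 5) / sd(x), about 0.632
```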
The computation of the CI requires non-central Student-t distributions, which are used when noncentral==TRUE; otherwise a central distribution is used.
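One common way to build a CI from the non-central t distribution (described by Cumming and Finch, 2001) is to find the non-centrality parameters for which the observed t statistic falls at the upper and lower alpha/2 quantiles, and then rescale them to the d metric. The sketch below assumes two independent groups, where t = d * sqrt(n1 * n2 / (n1 + n2)) with n1 + n2 - 2 degrees of freedom; noncentral_d_ci is a hypothetical name, and cohen.d may implement the search differently.

```r
# Hypothetical sketch: CI for Cohen's d by inverting the non-central t CDF.
noncentral_d_ci <- function(d, n1, n2, conf.level = 0.95) {
  alpha <- 1 - conf.level
  df <- n1 + n2 - 2
  t_per_d <- sqrt(n1 * n2 / (n1 + n2))  # converts between the d and t metrics
  t_obs <- d * t_per_d
  # pt(t_obs, df, ncp) decreases as ncp grows, so each bound is a single root
  find_ncp <- function(p) {
    uniroot(function(ncp) pt(t_obs, df, ncp) - p,
            interval = c(t_obs - 5, t_obs + 5),
            extendInt = "downX", tol = 1e-10)$root
  }
  lower_ncp <- find_ncp(1 - alpha / 2)  # observed t sits in the upper tail
  upper_ncp <- find_ncp(alpha / 2)      # observed t sits in the lower tail
  c(lower = lower_ncp / t_per_d, upper = upper_ncp / t_per_d)
}

noncentral_d_ci(d = 0.5, n1 = 20, n2 = 20)
```

A quick sanity check: with d = 0 and n1 = n2 = 10 the interval is symmetric, about plus or minus 0.877, because at t = 0 the non-central t CDF reduces to a normal one. For d different from zero the interval is asymmetric, which is the reason for preferring it over a central approximation.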
The effect size magnitude is also classified using the thresholds defined in Cohen (1992), i.e. |d| < 0.2 "negligible", |d| < 0.5 "small", |d| < 0.8 "medium", otherwise "large".
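A minimal sketch of this classification, assuming the Cohen thresholds above; cohen_magnitude is a hypothetical helper that returns character labels (the package itself stores the magnitude as a factor).

```r
# Hypothetical helper: classify |d| with the Cohen (1992) thresholds.
cohen_magnitude <- function(d) {
  labels <- c("negligible", "small", "medium", "large")
  # findInterval gives 0 for |d| < 0.2, 1 for [0.2, 0.5), 2 for [0.5, 0.8), 3 otherwise
  labels[findInterval(abs(d), c(0.2, 0.5, 0.8)) + 1]
}

cohen_magnitude(c(0.1, -0.35, 0.65, 1.2))
# gives: negligible, small, medium, large
```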
The variance of d is computed using the conversion formula reported on page 238 of Cooper et al. (2009):
A list of class effsize containing the following components:
Component | Description |
---|---|
estimate | the statistic estimate |
conf.int | the confidence interval of the statistic |
sd | the within-groups standard deviation |
conf.level | the confidence level used to compute the confidence interval |
magnitude | a qualitative assessment of the magnitude of the effect size |
method | the method used for computing the effect size, either Cohen's d or Hedges' g |
Marco Torchiano http://softeng.polito.it/torchiano/
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Cooper, Hedges, and Valentine (2009). The Handbook of Research Synthesis and Meta-Analysis.
David C. Howell (2011). Confidence Intervals on Effect Size. Available at: https://www.uvm.edu/~statdhtx/methods8/Supplements/MISC/Confidence%20Intervals%20on%20Effect%20Size.pdf
Cumming, G.; Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 633-649.
Gibbons, R. D., Hedeker, D. R., & Davis, J. M. (1993). Estimation of effect size from a series of experiments involving paired comparisons. Journal of Educational Statistics, 18, 271-279.
M. Borenstein, L. V. Hedges, J. P. T. Higgins and H. R. Rothstein (2009) Introduction to Meta-Analysis. John Wiley & Sons.
cliff.delta, VD.A, print.effsize
library(effsize)
treatment <- rnorm(100, mean=10)
control <- rnorm(100, mean=12)
d <- c(treatment, control)
f <- rep(c("Treatment","Control"), each=100)
## compute Cohen's d
## treatment and control
cohen.d(treatment, control)
## data and factor
cohen.d(d, f)
## formula interface
cohen.d(d ~ f)
## compute Hedges' g
cohen.d(d, f, hedges.correction=TRUE)
Prints the results of an effect size computation
## S3 method for class 'effsize'
print(x, ...)
Argument | Description |
---|---|
x | the effect size result |
... | further parameters (currently ignored) |
Shows the estimate value and, when available, the confidence interval.
This is still a work in progress.
Marco Torchiano http://softeng.polito.it/torchiano/
See the main function cliff.delta.
Computes the Vargha and Delaney A effect size measure.
VD.A(d, ...)
## S3 method for class 'formula'
VD.A(formula, data=list(), ...)
## Default S3 method:
VD.A(d, f, ...)
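Argument | Description |
---|---|
d | a numeric vector giving either the data values (if f is a factor) or the treatment group values (if f is a numeric vector) |
f | either a factor with two levels or a numeric vector of values |
formula | a formula of the form y ~ f, where y is a numeric variable giving the data values and f is a factor with two levels giving the corresponding groups |
data | an optional matrix or data frame containing the variables in the formula |
... | further arguments to be passed to or from methods |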
The function computes the Vargha and Delaney A effect size measure (Vargha and Delaney, 2000).
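The A measure is the probability that a value drawn from the first group exceeds one drawn from the second, counting ties as one half. A minimal base-R sketch, assuming the rank-sum form A = (R1/n1 - (n1 + 1)/2)/n2 from Vargha and Delaney (2000), where R1 is the sum of the mid-ranks of the first group in the pooled sample; vd_a_sketch is a hypothetical name, not the package code.

```r
# Hypothetical sketch: Vargha and Delaney A from pooled mid-ranks.
vd_a_sketch <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  r <- rank(c(x, y))         # average ranks, so ties count as one half
  r1 <- sum(r[seq_len(n1)])  # rank sum of the first group
  (r1 / n1 - (n1 + 1) / 2) / n2
}

# Data from Hogarty and Kromrey (1999), as in the cliff.delta example
treatment <- c(10, 10, 20, 20, 20, 30, 30, 30, 40, 50)
control   <- c(10, 20, 30, 40, 40, 50)
vd_a_sketch(treatment, control)  # 0.375
```

Since A counts ties as one half while Cliff's delta counts them as zero, the two are linked by A = (1 + delta)/2; here delta = -0.25 gives A = 0.375.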
A list of class effsize containing the following components:
Component | Description |
---|---|
estimate | the A statistic estimate |
magnitude | a qualitative assessment of the magnitude of the effect size |
method | the method used, i.e. Vargha and Delaney A |
Marco Torchiano http://softeng.polito.it/torchiano/
A. Vargha and H. D. Delaney. "A critique and improvement of the CL common language effect size statistics of McGraw and Wong." Journal of Educational and Behavioral Statistics, 25(2):101-132, 2000
cliff.delta, cohen.d, print.effsize
library(effsize)
treatment <- rnorm(100, mean=10)
control <- rnorm(100, mean=12)
d <- c(treatment, control)
f <- rep(c("Treatment","Control"), each=100)
## compute Vargha and Delaney A
## treatment and control
VD.A(treatment, control)
## data and factor
VD.A(d, f)
## formula interface
VD.A(d ~ f)