Title: | Useful Wrappers Around Commonly Used Functions |
---|---|
Description: | The main functionalities of 'wrappedtools' are: adding backticks to variable names; rounding to desired precision with special case for p-values; selecting columns based on pattern and storing their position, name, and backticked name; computing and formatting of descriptive statistics (e.g. mean±SD), comparing groups and creating publication-ready tables with descriptive statistics and p-values; creating specialized plots for correlation matrices. Functions were mainly written for my own daily work or teaching, but may be of use to others as well. |
Authors: | Andreas Busjahn [cre, aut] |
Maintainer: | Andreas Busjahn <[email protected]> |
License: | GPL-3 |
Version: | 0.9.6 |
Built: | 2025-03-09 06:13:25 UTC |
Source: | https://github.com/abusjahn/wrappedtools |
bt
adds leading and trailing backticks to make illegal variable names
usable. Optionally removes them.
bt(x, remove = FALSE)
x |
Names to add backtick to. |
remove |
Option to remove existing backticks, default=FALSE. |
Character vector with backticks added.
bt('name 1')
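The effect of adding and removing backticks can be sketched in base R. The helper name `add_backticks` is hypothetical and only illustrates the idea; it is not the package's implementation.

```r
# Hypothetical helper illustrating the idea; not the code behind bt().
add_backticks <- function(x, remove = FALSE) {
  if (remove) {
    # strip every backtick from the names
    return(gsub("`", "", x, fixed = TRUE))
  }
  # wrap each name in a leading and a trailing backtick
  paste0("`", x, "`")
}

quoted <- add_backticks("name 1")
unquoted <- add_backticks(quoted, remove = TRUE)
```

Backticked names are useful when a column name containing spaces or symbols has to be pasted into a formula string.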
cat_desc_stats
computes absolute and relative frequencies for
categorical data with a number of formatting options.
cat_desc_stats( source = NULL, separator = " ", return_level = TRUE, ndigit = 0, groupvar = NULL, singleline = FALSE, percent = TRUE, prettynum = FALSE, .german = FALSE, quelle = NULL )
source |
Data for computation. Previously "quelle". |
separator |
delimiter between results per level, preset as ' '. |
return_level |
Should levels be reported? |
ndigit |
Digits for rounding of relative frequencies. |
groupvar |
Optional grouping factor. |
singleline |
Put all group levels in a single line? |
percent |
Logical, add percent-symbol after relative frequencies? |
prettynum |
logical, apply prettyNum to results? |
.german |
logical, should "." and "," be used as bigmark and decimal? Sets prettynum to TRUE. |
quelle |
deprecated, retained for compatibility, use 'source' instead. |
Structure depends on parameter return_level: if FALSE, then a tibble with descriptives, otherwise a list with two tibbles with levels of factor and descriptives. If parameter singleline is FALSE (default), results for each factor level are reported in a separate line, otherwise they are pasted. Number of columns for result tibbles is one or number of levels of the additional grouping variable.
cat_desc_stats(mtcars$gear)
cat_desc_stats(mtcars$gear, return_level = FALSE)
cat_desc_stats(mtcars$gear, groupvar = mtcars$am)
cat_desc_stats(mtcars$gear, groupvar = mtcars$am, singleline = TRUE)
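The underlying computation, absolute counts plus relative frequencies per level, can be sketched with base R as follows. The pasted "n (p%)" layout is an assumption chosen for illustration and may differ from the formatting cat_desc_stats produces.

```r
# Absolute and relative frequencies for each level of a categorical variable.
counts <- table(mtcars$gear)
percent <- round(100 * prop.table(counts), digits = 0)

# Paste "n (p%)" per level; this layout is illustrative only.
freq_text <- paste0(counts, " (", percent, "%)")
names(freq_text) <- names(counts)

# The same counts split by a grouping variable (one column per group).
grouped_counts <- table(gear = mtcars$gear, am = mtcars$am)
```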
cat_desc_table
computes absolute and relative frequencies for
categorical data with a number of formatting options.
cat_desc_table( data, desc_vars, round_desc = 2, singleline = FALSE, spacer = " ", indentor = "" )
data |
name of data set (tibble/data.frame) to analyze. |
desc_vars |
vector of column names for dependent variables. |
round_desc |
number of significant digits for rounding of descriptive stats. |
singleline |
Put all group levels in a single line? |
spacer |
Text element to indent levels and fill empty cells, defaults to " ". |
indentor |
Optional text to indent factor levels |
A tibble with variable names and descriptive statistics.
cat_desc_table(data = mtcars, desc_vars = c("gear", "cyl", "carb"))
cat_desc_table(data = mtcars, desc_vars = c("gear", "cyl", "carb"), singleline = TRUE)
cn
lists column names, by default for variable rawdata.
cn(data = rawdata)
data |
Data structure to read column names from. |
Character vector with column names.
cn(mtcars)
ColSeeker
looks up colnames (by default for tibble rawdata)
based on type and parts of names, using regular expressions.
Be warned that special characters such as [ or ( need to be escaped or replaced by .
Exclusion rules may be specified as well.
ColSeeker( data = rawdata, namepattern = ".", varclass = NULL, exclude = NULL, excludeclass = NULL, casesensitive = TRUE, returnclass = FALSE )
data |
tibble or data.frame, where columns are to be found; by default rawdata |
namepattern |
Vector of pattern to look for. |
varclass |
Vector, only columns of defined class(es) are returned |
exclude |
Vector of pattern to exclude from found names. |
excludeclass |
Vector, exclude columns of specified class(es) |
casesensitive |
Logical if case is respected in matching (default TRUE: a<>A) |
returnclass |
Logical if classes should be included in output |
A list with index, names, backticked names, and count; optionally the classes as well
ColSeeker(data = mtcars, namepattern = c("^c", "g"))
ColSeeker(data = mtcars, namepattern = c("^c", "g"), exclude = "r")
assign("rawdata", mtcars)
ColSeeker(namepattern = c("^c", "g"), varclass = "numeric")
num_int_data <- data.frame(num1 = rnorm(10), num2 = runif(10), int1 = 1:10, int2 = 11:20)
ColSeeker(num_int_data, varclass = "numeric") # integers are not found
ColSeeker(num_int_data, varclass = c("numeric", "integer"))
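A rough base-R equivalent of the lookup, assuming patterns are combined with "or" and the first class of each column is compared, might look like this. It is a sketch of the idea, not the code behind ColSeeker; the last line shows why a literal bracket must be escaped in a pattern.

```r
# Pattern-based column lookup with an additional class filter.
patterns <- c("^c", "g")

# Combine all patterns into a single regular expression.
hits <- grep(paste(patterns, collapse = "|"), colnames(mtcars))

# Keep only columns whose (first) class is "numeric".
is_numeric_class <- vapply(
  mtcars[hits],
  function(col) class(col)[1] == "numeric",
  logical(1)
)
found <- list(
  index = hits[is_numeric_class],
  names = colnames(mtcars)[hits[is_numeric_class]]
)

# A literal "[" has to be escaped, otherwise it opens a character class.
bracket_hit <- grepl("\\[", "Biomarker 1 [units]")
```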
compare_n_numvars
Some names were changed in August 2022 to reflect the update of the function to handle ordinal data using non-parametric equivalents.
compare_n_numvars( .data = rawdata, dep_vars, indep_var, gaussian, round_desc = 2, range = FALSE, rangesep = " ", pretext = FALSE, mark = FALSE, round_p = 3, add_n = FALSE )
.data |
name of dataset (tibble/data.frame) to analyze, defaults to rawdata. |
dep_vars |
vector of column names. |
indep_var |
name of grouping variable. |
gaussian |
Logical specifying whether dep_vars are treated as normally distributed or ordinal (comparison tests are chosen accordingly). |
round_desc |
number of significant digits for rounding of descriptive stats. |
range |
include min/max? |
rangesep |
text between statistics and range or other elements. |
pretext, mark |
for function formatP. |
round_p |
level for rounding p-value. |
add_n |
add n to descriptive statistics? |
A list with elements "results": tibble with descriptive statistics, p-value from ANOVA/Kruskal-Wallis test, p-values for pairwise comparisons, significance indicators, and descriptives pasted with significance. "raw": nested list with output from all underlying analyses.
# Usually, only the result table is relevant:
compare_n_numvars(
  .data = mtcars, dep_vars = c("wt", "mpg", "hp"),
  indep_var = "cyl", gaussian = TRUE
)$results
# For a report, result columns may be filtered as needed:
compare_n_numvars(
  .data = mtcars, dep_vars = c("wt", "mpg", "hp"),
  indep_var = "cyl", gaussian = FALSE
)$results |>
  dplyr::select(Variable, `cyl 4 fn`:`cyl 8 fn`, multivar_p)
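The tests named in the return value (ANOVA or Kruskal-Wallis, followed by pairwise comparisons) can be run directly in base R for a single variable. This is a minimal sketch of the statistical steps under the assumption that pairwise.t.test with FDR adjustment is an acceptable stand-in for the post-hoc step; the package adds descriptives and formatting on top.

```r
# Overall and pairwise group comparisons for one dependent variable.
groups <- factor(mtcars$cyl)

# Parametric overall test (ANOVA).
p_anova <- summary(aov(mpg ~ groups, data = mtcars))[[1]][["Pr(>F)"]][1]

# Non-parametric overall test (Kruskal-Wallis).
p_kruskal <- kruskal.test(mpg ~ groups, data = mtcars)$p.value

# Pairwise post-hoc comparisons with FDR adjustment.
pairwise_p <- pairwise.t.test(mtcars$mpg, groups, p.adjust.method = "fdr")$p.value
```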
compare_n_qualvars
compares columns of factors across more than 2 groups, including post-hoc pairwise tests.
compare_n_qualvars( data, dep_vars, indep_var, round_p = 3, round_desc = 2, pretext = FALSE, mark = FALSE, singleline = FALSE, spacer = " ", linebreak = "\n", prettynum = FALSE )
data |
name of data set (tibble/data.frame) to analyze. |
dep_vars |
vector of column names. |
indep_var |
name of grouping variable. |
round_p |
level for rounding p-value. |
round_desc |
number of significant digits for rounding of descriptive stats |
pretext |
for function formatP |
mark |
for function formatP |
singleline |
Put all group levels in a single line? |
spacer |
Text element to indent levels, defaults to " ". |
linebreak |
place holder for newline. |
prettynum |
Apply prettyNum to results? |
A tibble with variable names, descriptive statistics, and p-value of fisher.test and pairwise_fisher_test, number of rows is number of dep_vars.
# Separate lines for each factor level:
compare_n_qualvars(data = mtcars, dep_vars = c("am", "cyl", "carb"), indep_var = "gear", spacer = " ")
# All levels in one row but with linebreaks:
compare_n_qualvars(data = mtcars, dep_vars = c("am", "cyl", "carb"), indep_var = "gear", singleline = TRUE)
# All levels in one row, separated by ";":
compare_n_qualvars(data = mtcars, dep_vars = c("am", "cyl", "carb"), indep_var = "gear", singleline = TRUE, linebreak = "; ")
compare2numvars
computes either t_var_test or wilcox.test,
depending on parameter gaussian. Descriptive statistics, depending on distribution,
are reported as well.
compare2numvars( data, dep_vars, indep_var, gaussian, round_p = 3, round_desc = 2, range = FALSE, rangesep = " ", pretext = FALSE, mark = FALSE, n = FALSE, add_n = FALSE )
data |
name of dataset (tibble/data.frame) to analyze. |
dep_vars |
vector of column names for dependent variables. |
indep_var |
name of grouping variable, has to translate to 2 groups. If more levels are encountered, an error is produced. |
gaussian |
logical specifying normal or ordinal values. |
round_p |
level for rounding p-value. |
round_desc |
number of significant digits for rounding of descriptive stats. |
range |
include min/max? |
rangesep |
text between statistics and range or other elements. |
pretext |
for function formatP. |
mark |
for function formatP. |
n |
create columns for n per group? |
add_n |
add n to descriptive statistics? |
A tibble with variable names, descriptive statistics, and p-value, number of rows is number of dep_vars.
# Assuming Normal distribution:
compare2numvars(data = mtcars, dep_vars = c("wt", "mpg", "qsec"), indep_var = "am", gaussian = TRUE)
# Ordinal scale:
compare2numvars(data = mtcars, dep_vars = c("wt", "mpg", "qsec"), indep_var = "am", gaussian = FALSE)
# If the grouping variable has more than 2 levels, consider fct_lump:
mtcars |>
  dplyr::mutate(gear = factor(gear) |> forcats::fct_lump_n(n = 1)) |>
  compare2numvars(dep_vars = "wt", indep_var = "gear", gaussian = TRUE)
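The switch between a parametric and a rank-based test can be sketched as below. The helper `compare_two_groups` is hypothetical, and the sketch uses the Welch t.test from base R as a stand-in for the package's t_var_test.

```r
# Hypothetical helper: choose the two-group test from the gaussian flag.
compare_two_groups <- function(values, groups, gaussian) {
  if (gaussian) {
    # Welch t-test (stand-in for t_var_test)
    t.test(values ~ groups)$p.value
  } else {
    # Wilcoxon rank-sum test; exact = FALSE avoids warnings about ties
    wilcox.test(values ~ groups, exact = FALSE)$p.value
  }
}

p_t <- compare_two_groups(mtcars$wt, mtcars$am, gaussian = TRUE)
p_w <- compare_two_groups(mtcars$wt, mtcars$am, gaussian = FALSE)
```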
compare2qualvars
computes fisher.test with simulated p-value and
descriptive statistics for a group of categorical dependent variables.
compare2qualvars( data, dep_vars, indep_var, round_p = 3, round_desc = 2, pretext = FALSE, mark = FALSE, singleline = FALSE, spacer = " ", linebreak = "\n", p_subgroups = FALSE )
data |
name of data set (tibble/data.frame) to analyze. |
dep_vars |
vector of column names for dependent variables. |
indep_var |
name of grouping variable, has to translate to 2 groups. |
round_p |
level for rounding p-value. |
round_desc |
number of significant digits for rounding of descriptive stats. |
pretext |
for function formatP. |
mark |
for function formatP. |
singleline |
Put all group levels in a single line? |
spacer |
Text element to indent levels and fill empty cells, defaults to " ". |
linebreak |
place holder for newline. |
p_subgroups |
test subgroups by recoding other levels into other, default is not to do this. |
A tibble with variable names, descriptive statistics, and p-value, number of rows is number of dep_vars.
compare2qualvars(data = mtcars, dep_vars = c("gear", "cyl", "carb"), indep_var = "am", spacer = " ")
compare2qualvars(data = mtcars, dep_vars = c("gear", "cyl", "carb"), indep_var = "am", spacer = " ", singleline = TRUE)
compare2qualvars(data = mtcars, dep_vars = c("gear", "cyl", "carb"), indep_var = "am", spacer = " ", p_subgroups = TRUE)
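For a single dependent variable, a Fisher test with simulated p-value looks like this in base R. The seed and the number of replicates B are arbitrary choices for this sketch, not values taken from the package.

```r
# Fisher test with Monte Carlo p-value for one categorical variable.
set.seed(2024) # simulated p-values vary between runs without a seed
tab <- table(gear = mtcars$gear, am = mtcars$am)
fisher_sim <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)
p_sim <- fisher_sim$p.value
```

Note that simulation is only used for tables larger than 2 x 2; for 2 x 2 tables fisher.test always computes the exact p-value.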
cortestR
computes correlations and their significance level
based on cor.test. Coefficients and p-values may be combined or
reported separately.
cortestR( cordata, method = "pearson", digits = 3, digits_p = 3, sign_symbol = TRUE, split = FALSE, space = "" )
cordata |
data frame or matrix with rawdata. |
method |
as in cor.test. |
digits |
rounding level for estimate. |
digits_p |
rounding level for p value. |
sign_symbol |
If true, use significance indicator instead of p-value. |
split |
logical, report correlation and p combined (default) or split in list. |
space |
character to fill empty upper triangle. |
Depending on parameters split and sign_symbol, either a single data frame with coefficient and p-values or significance symbols or a list with two data frames.
# with defaults
cortestR(mtcars[, c("wt", "mpg", "qsec")], split = FALSE, sign_symbol = TRUE)
# separate coefficients and p-values
cortestR(mtcars[, c("wt", "mpg", "qsec")], split = TRUE, sign_symbol = FALSE)
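How a lower-triangle table of coefficients and p-values can be assembled from repeated cor.test calls is sketched below. The cell layout "r (p=...)" is an assumption for illustration and not necessarily what cortestR returns.

```r
# Fill the lower triangle of a matrix with results from cor.test.
vars <- c("wt", "mpg", "qsec")
n <- length(vars)
cor_out <- matrix("", nrow = n, ncol = n, dimnames = list(vars, vars))

for (i in 2:n) {
  for (j in 1:(i - 1)) {
    ct <- cor.test(mtcars[[vars[i]]], mtcars[[vars[j]]], method = "pearson")
    # combine rounded coefficient and p-value in one cell
    cor_out[i, j] <- paste0(round(ct$estimate, 3), " (p=", signif(ct$p.value, 2), ")")
  }
}
```

The upper triangle stays empty, which corresponds to the role of the space argument.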
detect_outliers
computes IQR and finds outliers. It gives the same results as geom_boxplot
and thus differs slightly from boxplot.stats.
detect_outliers(x, coef = 1.5)
x |
numeric vector. |
coef |
coefficient for boxplot.stats, defaults to 1.5. |
A list with elements positions and outliers as numeric vectors.
detect_outliers(rnorm(100))
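The IQR rule itself can be sketched with quantile(); the helper name `find_outliers` is hypothetical. Using quantile-based quartiles rather than the hinges from fivenum is what separates this approach from boxplot.stats, although the exact quantile type used by the package is an assumption here (R's default, type 7).

```r
# Hypothetical helper: flag values outside quartile -/+ coef * IQR.
find_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  lower <- q[[1]] - coef * iqr
  upper <- q[[2]] + coef * iqr
  positions <- which(x < lower | x > upper)
  list(positions = positions, outliers = x[positions])
}

res <- find_outliers(c(1, 2, 3, 4, 5, 100))
```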
eGFR
computes eGFR according to different rules (see references).
eGFR(data, age_var = "age", sex_var = "sex", crea_var = NULL, cys_var = NULL)
data |
name of data set (tibble/data.frame) to analyze. |
age_var |
name of column with patient age in years, default=age. |
sex_var |
name of column with sex, assumed as female and male. |
crea_var |
name of column with creatinine in mg/dl. If not available, leave as NULL. |
cys_var |
name of column with cystatin C in mg/l. If not available, leave as NULL. |
A list with 3 elements:
eGFR_crea
eGFR_cystatin
eGFR_creatinine_cystatin
https://www.kidney.org/content/ckd-epi-creatinine-cystatin-equation-2021
https://www.kidney.org/content/ckd-epi-creatinine-equation-2021
https://www.kidney.org/content/ckd-epi-cystatin-c-equation-2012
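For orientation, the creatinine-only variant can be sketched as below. The coefficients are those of the 2021 CKD-EPI creatinine equation as commonly published; the function name `ckd_epi_crea_2021` is hypothetical, and the package's eGFR() may differ in details such as input checks or the coding of sex.

```r
# Sketch of the 2021 CKD-EPI creatinine equation (creatinine in mg/dl).
# Assumption: sex is coded as "female" / "male".
ckd_epi_crea_2021 <- function(crea_mg_dl, age, sex) {
  female <- sex == "female"
  kappa <- ifelse(female, 0.7, 0.9)
  alpha <- ifelse(female, -0.241, -0.302)
  ratio <- crea_mg_dl / kappa
  142 * pmin(ratio, 1)^alpha * pmax(ratio, 1)^(-1.200) *
    0.9938^age * ifelse(female, 1.012, 1)
}

egfr_example <- ckd_epi_crea_2021(
  crea_mg_dl = c(0.9, 0.7),
  age = c(50, 50),
  sex = c("male", "female")
)
```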
A dataset containing physiological data, biomarkers, and categorical data.
faketrial
A tibble with 300 rows and 24 variables:
Sex of animal, factor with levels 'female', 'male'
Factor with levels 'young','middle','old'
Factor with levels 'sham', 'OP'
Heart rate
Systolic and diastolic blood pressure
Pseudo-medications, factors with levels 'y','n'
Biomarkers with log-normal distribution
factor yes/no, systolic blood pressure >= 120?
Function ColSeeker extends this by adding class-checks.
FindVars
looks up colnames (by default for data-frame rawdata)
based on parts of names, using regular expressions. Be warned that
special characters such as [ or ( need to be escaped or replaced by .
Exclusion rules may be specified as well.
New function ColSeeker() extends this by adding class-checks.
FindVars( varnames, allnames = colnames(rawdata), exact = FALSE, exclude = NA, casesensitive = TRUE, fixed = FALSE, return_symbols = FALSE )
varnames |
Vector of pattern to look for. |
allnames |
Vector of values to detect pattern in; by default: colnames(rawdata). |
exact |
Partial matching or exact only (adding ^ and $)? |
exclude |
Vector of pattern to exclude from found names. |
casesensitive |
Logical if case is respected in matching (default TRUE: a<>A) |
fixed |
Logical, match as is, argument is passed to |
return_symbols |
Should names be reported as symbols additionally? (Default FALSE) |
A list with index, names, backticked names, and symbols
FindVars(varnames = c("^c", "g"), allnames = colnames(mtcars))
FindVars(varnames = c("^c", "g"), allnames = colnames(mtcars), exclude = "r")
flex2rmd
takes a flextable and returns a markdown table if not in an interactive session
flex2rmd(ft)
ft |
a flextable |
either a markdown table or the flextable
formatP
simplifies p-values by rounding to the maximum of p or a
predefined level. Optionally < or = can be added, as well as
symbols according to significance level.
formatP( pIn, ndigits = 3, textout = TRUE, pretext = FALSE, mark = FALSE, german_num = FALSE, add.surprisal = FALSE, sprecision = 1 )
pIn |
A numeric vector or matrix with p-values. |
ndigits |
Number of digits (default=3). |
textout |
Cast output to character (default=TRUE)? |
pretext |
Should = or < be added before p (default=FALSE)? |
mark |
Should significance level be added after p (default=FALSE)? |
german_num |
change dot (default) to comma? |
add.surprisal |
Add surprisal aka Shannon information to p-value (default=FALSE)? |
sprecision |
Rounding level for surprisal (default=1). |
vector or matrix (depending on type of pIn) with type character (default) or numeric, depending on parameter textout
formatP(0.012345)
formatP(0.012345, add.surprisal = TRUE)
formatP(0.012345, ndigits = 4)
formatP(0.000122345, ndigits = 3, pretext = TRUE)
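The rounding rule, taking the maximum of p and a predefined level, can be sketched as follows. The helper `format_p` is hypothetical, and the exact spacing of the "<" or "=" prefix is an assumption for this sketch; formatP may format its output differently.

```r
# Hypothetical helper: p-values below 10^-ndigits are reported at that floor.
format_p <- function(p, ndigits = 3, pretext = FALSE) {
  floor_p <- 10^(-ndigits)
  below <- p < floor_p
  # round max(p, floor) to a fixed number of decimals
  txt <- formatC(pmax(p, floor_p), format = "f", digits = ndigits)
  if (pretext) {
    txt <- paste(ifelse(below, "<", "="), txt)
  }
  txt
}

p_plain <- format_p(0.012345)
p_small <- format_p(0.000122345, ndigits = 3, pretext = TRUE)
```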
ggcormat
makes the same correlation matrix as cortestR
and graphically represents it in a plot
ggcormat( cor_mat, p_mat = NULL, method = "Correlation", title = "", maxpoint = 2.1, textsize = 5, axistextsize = 2, titlesize = 3, breaklabels = NULL, lower_only = TRUE, .low = "blue3", .high = "red2", .legendtitle = NULL )
cor_mat |
correlation matrix as produced by cor. |
p_mat |
Optional matrix of p-values; if provided, this is used to define size of dots rather than absolute correlation. |
method |
text specifying type of correlation. |
title |
plot title. |
maxpoint |
maximum for scale_size_manual, may need adjustment depending on plotsize. |
textsize |
for theme text. |
axistextsize |
relative text size for axes. |
titlesize |
as you already guessed, relative text size for title. |
breaklabels |
currently not used, intended for str_wrap. |
lower_only |
should only lower triangle be plotted? |
.low |
Color for heatmap. |
.high |
Color for heatmap. |
.legendtitle |
Optional name for color legend. |
A ggplot object, allowing further styling.
coeff_pvalues <- cortestR(mtcars[, c("wt", "mpg", "qsec", "hp")], split = TRUE, sign_symbol = FALSE)
# focus on coefficients:
ggcormat(cor_mat = coeff_pvalues$corout, maxpoint = 5)
# size taken from p-value:
ggcormat(cor_mat = coeff_pvalues$corout, p_mat = coeff_pvalues$pout, maxpoint = 5)
glmCI
computes and formats CIs for glm.
glmCI(model, min = .01, max = 100, cisep = '\U000022ef', ndigit=2)
model |
Output from glm. |
min, max |
Lower and upper limits for CIs, useful for extremely wide CIs. |
cisep |
Separator between CI values. |
ndigit |
rounding level. |
A list with coefficient, CIs, and pasted coef([CIs]).
glm_out <- glm(am ~ mpg, family = binomial, data = mtcars)
glmCI(glm_out)
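A comparable result can be put together with base R. This sketch assumes that exponentiated coefficients (odds ratios) with Wald intervals from confint.default are wanted, which may not match the interval type glmCI uses; the pasted layout is likewise illustrative.

```r
# Odds ratios with Wald confidence intervals, limited to a plausible range.
glm_out <- glm(am ~ mpg, family = binomial, data = mtcars)
odds_ratio <- exp(coef(glm_out))
ci <- exp(confint.default(glm_out)) # Wald intervals on the OR scale

# Truncate extremely wide intervals, mirroring the min / max arguments.
ci_limited <- pmin(pmax(ci, 0.01), 100)

formatted <- paste0(
  round(odds_ratio, 2), " (",
  round(ci_limited[, 1], 2), "\u22ef", round(ci_limited[, 2], 2), ")"
)
```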
ksnormal
is a convenience function around ks.test, testing against
Normal distribution.
If fewer than 2 values are provided, NA is returned.
ksnormal(x, lillie = TRUE)
x |
Vector of data to test. |
lillie |
Logical, should the Lilliefors test be used? Defaults to TRUE |
p.value from ks.test.
# original ks.test:
ks.test(
  x = mtcars$wt, pnorm,
  mean = mean(mtcars$wt, na.rm = TRUE),
  sd = sd(mtcars$wt, na.rm = TRUE)
)
# wrapped version:
ksnormal(x = mtcars$wt, lillie = FALSE)
label_outliers
adds a text_repel layer to an existing ggplot object. It is intended to be used with boxplots or beeswarm plots. Faceting will result in separate computations for outliers.
It requires the ggrepel package.
label_outliers( plotbase, labelvar = NULL, coef = 1.5, nudge_x = 0, nudge_y = 0, color = "darkred", size = 3, hjust = 0, face = "bold" )
plotbase |
ggplot object to add labels to. |
labelvar |
variable to use as label. If NULL, rownames or rownumbers are used. |
coef |
coefficient for boxplot.stats, defaults to 1.5. |
nudge_x |
nudge in x direction, defaults to 0. |
nudge_y |
nudge in y direction, defaults to 0. |
color |
color of labels, defaults to darkred. |
size |
size of labels, defaults to 3. |
hjust |
horizontal justification of labels, defaults to 0. |
face |
font face of labels, defaults to bold. |
A ggplot object, allowing further styling.
logrange_1
returns a vector for log-labels at 0.1, 1, 10, 100, 1000 ...
logrange_1 logrange_5 logrange_123456789 logrange_12357 logrange_15
logrange_1: an object of class numeric of length 41.
logrange_5: an object of class numeric of length 738.
logrange_123456789: an object of class numeric of length 369.
logrange_12357: an object of class numeric of length 205.
logrange_15: an object of class numeric of length 82.
numeric vector
logrange_5: vector for log-labels at 1.0, 1.5, 2.0, 2.5 ... 10, 15, 20, 25 ...
logrange_123456789: vector for log-labels at 1, 2, 3 ... 9, 10, 20, 30 ... 90, 100 ...
logrange_12357: vector for log-labels at 1, 2, 3, 5, 7, 10, 20, 30, 50, 70 ...
logrange_15: vector for log-labels at 1, 5, 10, 50 ...
ggplot2::ggplot(mtcars) +
  ggplot2::aes(wt, mpg) +
  ggplot2::geom_point() +
  ggplot2::scale_y_log10(breaks = logrange_5)
ggplot2::ggplot(mtcars) +
  ggplot2::aes(wt, mpg) +
  ggplot2::geom_point() +
  ggplot2::scale_y_log10(breaks = logrange_123456789)
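Break vectors of this kind can be generated by multiplying a set of leading digits with powers of ten. The exponent range -20 to 20 is an assumption inferred from the documented lengths (41 values for logrange_1); the `_sketch` names are hypothetical.

```r
# Build log-scale break vectors from leading digits and powers of ten.
exponents <- -20:20
logrange_1_sketch <- 10^exponents

# outer() gives every combination of leading digit and power of ten.
logrange_15_sketch <- sort(as.vector(outer(c(1, 5), 10^exponents)))
logrange_12357_sketch <- sort(as.vector(outer(c(1, 2, 3, 5, 7), 10^exponents)))
```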
markSign
returns the symbol associated with a significance level.
markSign(SignIn, plabel = c("n.s.", "+", "*", "**", "***"))
SignIn |
A single p-value. |
plabel |
A translation table, predefined with the usual symbols. |
factor with label as defined in plabel.
markSign(0.012)
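The mapping from p-value to symbol can be sketched with cut(). The thresholds 0.001, 0.01, 0.05, and 0.1 are the conventional ones and are an assumption here; they are not stated in this help page.

```r
# Translate a p-value into a significance symbol (assumed thresholds).
sig <- cut(
  0.012,
  breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
  labels = c("***", "**", "*", "+", "n.s."),
  include.lowest = TRUE
)
```

As with markSign, the result is a factor whose levels are the labels.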
meansd
computes mean and SD and pastes them together with the ± symbol.
meansd( x, roundDig = 2, drop0 = FALSE, groupvar = NULL, range = FALSE, rangesep = " ", add_n = FALSE, .german = FALSE )
x |
Data for computation. |
roundDig |
Number of relevant digits for roundR. |
drop0 |
Should trailing zeros be dropped? |
groupvar |
Optional grouping variable for subgroups. |
range |
Should min and max be included in output? |
rangesep |
How should min/max be separated from mean ± SD? |
add_n |
Should n be included in output? |
.german |
logical, should "." and "," be used as bigmark and decimal? |
character vector with mean ± SD, rounded to desired precision
# basic usage of meansd
meansd(x = mtcars$wt)
# with additional options
meansd(x = mtcars$wt, groupvar = mtcars$am, add_n = TRUE)
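In base R, the same kind of string can be built by hand. This sketch rounds to a fixed number of decimals with round(), whereas meansd rounds via roundR to a number of relevant digits, so results may differ for other data.

```r
# Paste mean and SD with the plus-minus sign ("\u00b1").
x <- mtcars$wt
mean_sd_text <- paste0(round(mean(x), 2), " \u00b1 ", round(sd(x), 2))

# The same per subgroup, using tapply with a grouping variable.
by_group <- tapply(
  x, mtcars$am,
  function(v) paste0(round(mean(v), 2), " \u00b1 ", round(sd(v), 2))
)
```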
meanse
computes SEM based on Standard Deviation/square root(n)
meanse(x, mult = 1, roundDig = 2, drop0 = FALSE)
x |
Data for computation. |
mult |
multiplier for SEM, default 1, can be set to e.g. 2 or 1.96 to create confidence intervals |
roundDig |
Number of relevant digits for roundR. |
drop0 |
Should trailing zeros be dropped? |
character vector with mean ± SEM, rounded to desired precision
# basic usage of meanse
meanse(x = mtcars$wt)
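The SEM and the effect of the multiplier can be sketched directly; the variable names below are for illustration only.

```r
# Standard error of the mean: SD divided by the square root of n.
x <- mtcars$wt
sem <- sd(x) / sqrt(length(x))

# With a multiplier of 1.96, mean -/+ mult * SEM approximates a 95% CI.
ci95 <- mean(x) + c(-1, 1) * 1.96 * sem
```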
median_cl_boot
computes lower and upper confidence limits for the
estimated median, based on bootstrapping.
median_cl_boot(x, conf = 0.95, type = "basic", nrepl = 10^3)
x |
Data for computation. |
conf |
confidence interval with default 95%. |
type |
type for function boot.ci. |
nrepl |
number of bootstrap replications, defaults to 1000. |
A tibble with one row and three columns: Median, CIlow, CIhigh.
# basic usage of median_cl_boot
median_cl_boot(x = mtcars$wt)
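The bootstrap idea can be illustrated without additional packages. Note that this sketch uses a plain percentile interval from resampled medians, not the "basic" interval from boot.ci that the function defaults to, so the limits will not be identical.

```r
# Percentile bootstrap for the median (swapped-in, simpler technique).
set.seed(42) # resampling is random; fix the seed for reproducibility
x <- mtcars$wt
boot_medians <- replicate(1000, median(sample(x, replace = TRUE)))
limits <- quantile(boot_medians, probs = c(0.025, 0.975))
boot_ci <- c(Median = median(x), CIlow = limits[[1]], CIhigh = limits[[2]])
```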
median_cl_boot_gg
computes lower and upper confidence limits for the
estimated median, based on bootstrapping, using default settings.
median_cl_boot_gg(x)
x |
Data for computation. |
A tibble with one row and three columns: y, ymin, ymax.
# basic usage of median_cl_boot_gg
median_cl_boot_gg(x = mtcars$wt)
median_quart
computes median and quartiles and pastes them together.
median_quart( x, nround = NULL, probs = c(0.25, 0.5, 0.75), qtype = 8, roundDig = 2, drop0 = FALSE, groupvar = NULL, range = FALSE, rangesep = " ", rangearrow = " -> ", prettynum = FALSE, .german = FALSE, add_n = FALSE )
x |
Data for computation. |
nround |
Number of digits for fixed round. |
probs |
Quantiles to compute. |
qtype |
Type of quantiles. |
roundDig |
Number of relevant digits for roundR. |
drop0 |
Should trailing zeros be dropped? |
groupvar |
Optional grouping variable for subgroups. |
range |
Should min and max be included in output? |
rangesep |
How should min/max be separated from median and quartiles? |
rangearrow |
What is put between min -> max? |
prettynum |
logical, apply prettyNum to results? |
.german |
logical, should "." and "," be used as bigmark and decimal? |
add_n |
Should n be included in output? |
character vector with median [1stQuartile/3rdQuartile], rounded to desired precision
# basic usage of median_quart
median_quart(x = mtcars$wt)
# with additional options
median_quart(x = mtcars$wt, groupvar = mtcars$am, add_n = TRUE)
data(faketrial)
median_quart(x = faketrial$`Biomarker 1 [units]`, groupvar = faketrial$Treatment)
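The quantile computation behind the defaults (probs and qtype = 8) can be reproduced with quantile(). The pasted string follows the "median [1stQuartile/3rdQuartile]" pattern from the return value; the fixed rounding with round() is a simplification of this sketch.

```r
# Median and quartiles with quantile type 8, pasted into one string.
q <- quantile(mtcars$wt, probs = c(0.25, 0.5, 0.75), type = 8)
median_quart_text <- paste0(
  round(q[[2]], 2), " [", round(q[[1]], 2), "/", round(q[[3]], 2), "]"
)
```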
medianse
is based on mad/square root(n)
medianse(x)
x |
Data for computation. |
numeric vector with SE Median.
# basic usage of medianse
medianse(x = mtcars$wt)
pairwise_fisher_test
calculates pairwise comparisons between
group levels with corrections for multiple testing.
pairwise_fisher_test( dep_var, indep_var, adjmethod = "fdr", plevel = 0.05, symbols = letters[-1], ref = FALSE )
dep_var |
dependent variable, containing the data. |
indep_var |
independent variable, should be factor or coercible. |
adjmethod |
method for adjusting p values (see p.adjust). |
plevel |
threshold for significance. |
symbols |
predefined as b,c, d...; provides footnotes to mark group differences, e.g. b means different from group 2 |
ref |
is the 1st subgroup the reference (like in Dunnett test)? |
A list with elements "methods" (character), "p.value" (matrix), "plevel" (numeric), and "sign_colwise" (vector of length number of levels - 1)
# All pairwise comparisons
pairwise_fisher_test(dep_var = mtcars$cyl, indep_var = mtcars$gear)
# Only comparison against reference gear=3
pairwise_fisher_test(dep_var = mtcars$cyl, indep_var = mtcars$gear, ref = TRUE)
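The principle of pairwise testing with adjustment for multiple comparisons can be sketched in base R as follows. This is an illustration of the approach, not the package's code; the naming of the result vector is an assumption.

```r
# Fisher test for every pair of group levels, then FDR adjustment.
groups <- factor(mtcars$gear)
pairs <- combn(levels(groups), 2, simplify = FALSE)

raw_p <- vapply(pairs, function(pair) {
  keep <- groups %in% pair
  # restrict data to the two levels of the current pair
  fisher.test(table(mtcars$cyl[keep], droplevels(groups[keep])))$p.value
}, numeric(1))

adj_p <- p.adjust(raw_p, method = "fdr")
names(adj_p) <- vapply(pairs, paste, character(1), collapse = " vs ")
```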
pairwise_ordcat_test
calculates pairwise comparisons for ordinal
categories between all group levels with corrections for multiple testing.
pairwise_ordcat_test( dep_var, indep_var, adjmethod = "fdr", plevel = 0.05, symbols = letters[-1], ref = FALSE, cmh = TRUE )
dep_var |
dependent variable, containing the data |
indep_var |
independent variable, should be factor |
adjmethod |
method for adjusting p values (see p.adjust) |
plevel |
threshold for significance |
symbols |
predefined as b, c, d...; provides footnotes to mark group differences, e.g. b means different from group 2. |
ref |
is the 1st subgroup the reference (like in Dunnett test) |
cmh |
Should Cochran-Mantel-Haenszel test (coin::cmh_test) be used for testing? If false, the linear-by-linear association test (coin::lbl_test) is applied. |
A list with elements "methods" (character), "p.value" (matrix), "plevel" (numeric), and "sign_colwise" (vector of length number of levels - 1)
# All pairwise comparisons
mtcars2 <- dplyr::mutate(mtcars, cyl = factor(cyl, ordered = TRUE))
pairwise_ordcat_test(dep_var = mtcars2$cyl, indep_var = mtcars2$gear)
# Only comparison against reference gear=3
pairwise_ordcat_test(dep_var = mtcars2$cyl, indep_var = mtcars2$gear, ref = TRUE)
pairwise_t_test
calculates pairwise comparisons between group levels
with corrections for multiple testing, based on pairwise.t.test.
pairwise_t_test( dep_var, indep_var, adjmethod = "fdr", plevel = 0.05, symbols = letters[-1] )
dep_var |
dependent variable, containing the data |
indep_var |
independent variable, should be factor |
adjmethod |
method for adjusting p values (see p.adjust) |
plevel |
threshold for significance |
symbols |
predefined as b, c, d...; provides footnotes to mark group differences, e.g. b means different from group 2. |
A list with method output of pairwise.t.test, matrix of p-values, and character vector with significance indicators.
pairwise_t_test(dep_var = mtcars$wt, indep_var = mtcars$cyl)
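For orientation, this is the underlying base-R call that the wrapper is described as building on, together with the shape of its p-value matrix. It is a sketch of stats::pairwise.t.test only; how the wrapper derives its significance indicators from this matrix is not shown here.

```r
# Base-R function that pairwise_t_test is described as being based on.
res <- pairwise.t.test(x = mtcars$wt, g = mtcars$cyl, p.adjust.method = "fdr")

# Lower-triangular matrix of adjusted p-values:
# rows are group levels 2..k, columns are group levels 1..(k-1).
p_matrix <- res$p.value
print(p_matrix)

# Row/column positions of pairs below the threshold (plevel = 0.05 in the usage).
print(which(p_matrix < 0.05, arr.ind = TRUE))
```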
pairwise_wilcox_test
calculates pairwise comparisons on ordinal data
between all group levels with corrections for multiple testing based on
wilcox_test from package 'coin'.
pairwise_wilcox_test( dep_var, indep_var, strat_var = NA, adjmethod = "fdr", distr = "exact", plevel = 0.05, symbols = letters[-1], sep = "" )
dep_var |
dependent variable, containing the data. |
indep_var |
independent variable, should be factor. |
strat_var |
optional factor for stratification. |
adjmethod |
method for adjusting p values (see p.adjust) |
distr |
Computation of p-values, see wilcox_test. |
plevel |
threshold for significance. |
symbols |
predefined as b, c, d...; provides footnotes to mark group differences, e.g. b means different from group 2. |
sep |
text between statistics and range or other elements. |
A list with matrix of adjusted p-values and character vector with significance indicators.
pairwise_wilcox_test(dep_var = mtcars$wt, indep_var = mtcars$cyl)
pdf_kable
formats tibbles/df's for markdown
pdf_kable( .input, width1 = 6, twidth = 14, tposition = "left", innercaption = NULL, caption = "", foot = NULL, escape = TRUE )
.input |
table to print |
width1 |
Width of 1st column, default 6. |
twidth |
Default 14 |
tposition |
Default left |
innercaption |
subheader |
caption |
header |
foot |
footnote |
escape |
see kable |
A character vector of the table source code.
plot_LB
plots a Lineweaver-Burk diagram and computes the linear model
plot_LB( data, substrate, velocity, group = NULL, title = "Lineweaver-Burk-Plot", xlab = "1/substrate", ylab = "1/velocity" )
data |
data structure with columns for model data |
substrate |
colname for substrate concentration |
velocity |
colname for reaction velocity |
group |
colname for optional grouping factor |
title |
title of the plot |
xlab |
label of the abscissa |
ylab |
label of the ordinate |
MMdata <- data.frame(subst = c(2.00, 1.00, 0.50, 0.25),
                     velo = c(0.2253, 0.1795, 0.1380, 0.1000))
plot_LB(data = MMdata, substrate = 'subst', velocity = 'velo')
MMdata <- data.frame(subst = rep(c(2.00, 1.00, 0.50, 0.25), 2),
                     velo = c(0.2253, 0.1795, 0.1380, 0.1000,
                              0.4731333, 0.4089333, 0.3473000, 0.2546667),
                     condition = rep(c('C1', 'C2'), each = 4))
plot_LB(data = MMdata, substrate = 'subst', velocity = 'velo', group = 'condition')
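The linear model behind a Lineweaver-Burk diagram can be sketched in base R: regress 1/velocity on 1/substrate and back-transform intercept and slope. This illustrates the textbook linearisation with the first example data set; the object names are made up here, and the structure of what plot_LB itself returns is not assumed.

```r
# Sketch: Lineweaver-Burk linearisation, 1/v = (Km/Vmax) * (1/s) + 1/Vmax.
MMdata <- data.frame(
  subst = c(2.00, 1.00, 0.50, 0.25),
  velo  = c(0.2253, 0.1795, 0.1380, 0.1000)
)

# Linear model on the reciprocals.
lb_fit <- lm(I(1 / velo) ~ I(1 / subst), data = MMdata)
intercept <- unname(coef(lb_fit)[1])
slope     <- unname(coef(lb_fit)[2])

# Back-transform to the kinetic parameters.
Vmax <- 1 / intercept
Km   <- slope / intercept
print(c(Vmax = Vmax, Km = Km))
```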
plot_MM
creates a Michaelis-Menten-type enzyme kinetics plot and also returns the fitted model.
plot_MM( data, substrate, velocity, group = NULL, title = "Michaelis-Menten", xlab = "substrate", ylab = "velocity" )
data |
data structure with columns for model data |
substrate |
colname for substrate concentration |
velocity |
colname for reaction velocity |
group |
colname for optional grouping factor |
title |
title of the plot |
xlab |
label for x-axis |
ylab |
label for y-axis |
a list with elements "MMfit" and "MMplot"
MMdata <- data.frame(subst = c(2.00, 1.00, 0.50, 0.25),
                     velo = c(0.2253, 0.1795, 0.1380, 0.1000))
plot_MM(data = MMdata, substrate = 'subst', velocity = 'velo')
MMdata <- data.frame(subst = rep(c(2.00, 1.00, 0.50, 0.25), 2),
                     velo = c(0.2253, 0.1795, 0.1380, 0.1000,
                              0.4731333, 0.4089333, 0.3473000, 0.2546667),
                     condition = rep(c('C1', 'C2'), each = 4))
plot_MM(data = MMdata, substrate = 'subst', velocity = 'velo', group = 'condition')
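A Michaelis-Menten curve, v = Vmax * s / (Km + s), can be fitted directly with nonlinear least squares in base R. The sketch below uses stats::nls with simple starting values chosen for this example; whether plot_MM fits the model the same way is an assumption, and the object names are illustrative.

```r
# Sketch: nonlinear least-squares fit of v = Vmax * s / (Km + s).
MMdata <- data.frame(
  subst = c(2.00, 1.00, 0.50, 0.25),
  velo  = c(0.2253, 0.1795, 0.1380, 0.1000)
)

mm_fit <- nls(
  velo ~ Vmax * subst / (Km + subst),
  data  = MMdata,
  # Crude starting values: largest observed velocity, mean substrate level.
  start = list(Vmax = max(MMdata$velo), Km = mean(MMdata$subst))
)
mm_par <- coef(mm_fit)
print(mm_par)

# Fitted curve on a fine grid, e.g. as input for a plot.
grid <- data.frame(subst = seq(0, 2, length.out = 50))
grid$velo_hat <- predict(mm_fit, newdata = grid)
```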
print_kable
formats and prints tibbles/df's in markdown with splitting
into sub-tables with repeated caption and header.
The package flextable is a more powerful alternative.
print_kable(t, nrows = 30, caption = "", ncols = 100, ...)
t |
table to print. |
nrows |
number of rows (30) before splitting. |
caption |
header. |
ncols |
number of columns (100) before splitting. |
... |
Further arguments passed to kable. |
No return value, called for side effects.
## Not run:
print_kable(mtcars, caption = "test")
## End(Not run)
roundR
takes a vector or matrix of numbers and returns rounded values
with selected precision and various formatting options.
roundR( roundin, level = 2, smooth = FALSE, textout = TRUE, drop0 = FALSE, .german = FALSE, .bigmark = FALSE )
roundin |
A vector or matrix of numbers. |
level |
A number specifying number of relevant digits to keep. |
smooth |
A logical specifying if you want rounding before the dot (e.g. 12345 to 12300). |
textout |
A logical if output is converted to text. |
drop0 |
A logical if trailing zeros should be dropped. |
.german |
A logical if numbers should be reported in German format. |
.bigmark |
A logical if big.mark is to be shown, mark itself depends on parameter .german. |
vector of type character (default) or numeric, depending on parameter textout.
roundR(1.23456, level = 3)
roundR(1.23456, level = 3, .german = TRUE)
roundR(1234.56, level = 2, smooth = TRUE)
se_median
is based on mad / sqrt(n), i.e. the median absolute deviation divided by the square root of the number of observations.
Deprecated: please use medianse, which is the same function under a more consistent name.
se_median(x)
x |
Data for computation. |
numeric vector with SE Median.
# basic usage of se_median
## Not run:
se_median(x = mtcars$wt)
## End(Not run)
SEM
computes standard error of mean.
SEM(x)
x |
Data for computation. |
numeric vector with SEM.
SEM(x = mtcars$wt)
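The standard error of the mean is the standard deviation divided by the square root of n. A base-R sketch for comparison, where the helper name `sem_sketch` is hypothetical and the removal of missing values is an assumption:

```r
# Hypothetical helper: standard error of the mean as sd / sqrt(n).
# Assumption: missing values are removed before n is counted.
sem_sketch <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

sem_wt <- sem_sketch(mtcars$wt)
print(sem_wt)
```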
surprisal
takes p-values and returns s, a value representing the
number of consecutive heads on a fair coin that would be as surprising
as the p-value.
surprisal(p, precision = 1)
p |
a vector of p-values |
precision |
rounding level with default 1 |
a character vector of s-values
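The coin-toss reading corresponds to the s-value s = -log2(p): a run of s heads on a fair coin has probability 0.5^s = p. The sketch below computes this in base R; treating `precision` as the number of decimal places is an assumption, and the exact text formatting returned by surprisal may differ.

```r
# Sketch: s-values as -log2(p), rounded to one decimal and returned as text.
p <- c(0.5, 0.125, 0.05)
s <- -log2(p)

# Assumption: precision = 1 means one decimal place.
s_text <- formatC(round(s, 1), format = "f", digits = 1)
print(s_text)
```

For example, p = 0.125 is as surprising as three heads in a row, because 0.5^3 = 0.125.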
t_var_test
tests for equal variance based on var.test
and calls t.test, setting the option var.equal accordingly.
t_var_test(data, formula, cutoff = 0.05)
data |
Tibble or data.frame. |
formula |
Formula object with dependent and independent variable. |
cutoff |
Significance threshold for the test of equal variances. |
A list from t.test
t_var_test(mtcars, wt ~ am)
# may be used in pipes:
mtcars |> t_var_test(wt ~ am)
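The two-step logic can be written out in base R: test the variances first, then choose between the pooled and the Welch t-test. This is a sketch under the assumption that a var.test p-value above `cutoff` is taken as "equal variances"; the variable names are illustrative.

```r
# Sketch: choose var.equal for t.test() from the result of var.test().
cutoff <- 0.05

# Step 1: F test for equality of variances between the two groups.
p_var <- var.test(wt ~ am, data = mtcars)$p.value

# Step 2: assume equal variances only if the F test is not significant.
equal_var <- p_var > cutoff

# Step 3: pooled t-test if equal_var is TRUE, Welch t-test otherwise.
res <- t.test(wt ~ am, data = mtcars, var.equal = equal_var)
print(res$method)
print(res$p.value)
```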
tab.search
searches for pattern within a data-frame or tibble,
returning column(s) and row(s)
tab.search(searchdata = rawdata, pattern, find.all = T, names.only = FALSE)
searchdata |
table to search in, predefined as rawdata |
pattern |
regex, for exact matches add ^findme$ |
find.all |
return all row indices or only the 1st per column, default=TRUE |
names.only |
return only vector of colnames rather than list with names and rows, default=FALSE |
A list with numeric vectors for each column giving row numbers of matched elements
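The kind of result described here, row numbers of matches per column, can be illustrated with base R. The data frame and object names below are invented for the example, and this is not the package implementation.

```r
# Sketch: find a regex pattern in every column and report matching rows per column.
df <- data.frame(
  id   = c("a1", "b2", "c3"),
  note = c("findme", "other", "findme too"),
  stringsAsFactors = FALSE
)

# grep() returns the row indices of matching elements for each column.
hits <- lapply(df, function(col) grep("findme", as.character(col)))

# Keep only columns with at least one match.
hits <- hits[lengths(hits) > 0]
print(hits)

# Anchoring the pattern with ^ and $ restricts the search to exact matches.
exact <- grep("^findme$", df$note)
print(exact)
```

The names of `hits` correspond to what a names-only search would report.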
var_coeff
computes relative variability as standard deviation / mean * 100.
var_coeff(x)
x |
Data for computation. |
numeric vector with coefficient of variation.
var_coeff(x = mtcars$wt)
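The formula standard deviation / mean * 100 is easy to verify in base R. In this sketch the helper name `cv_sketch` is hypothetical and the handling of missing values is an assumption.

```r
# Hypothetical helper: coefficient of variation in percent, sd / mean * 100.
# Assumption: missing values are removed first.
cv_sketch <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / mean(x) * 100
}

cv_wt <- cv_sketch(mtcars$wt)
print(cv_wt)
```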
WINratio
computes the ratio of wins and losses for any number
of comparison rules.
WINratio(data, groupvar, testvars, rules, idvar = NULL, p_digits = 3)
data |
name of data set (tibble/data.frame) to analyze. |
groupvar |
name of grouping variable, has to translate to 2 groups. |
testvars |
names of variables for sequential rules. |
rules |
list of rules (minimal cut-offs) for sequential comparison, negative if reduction is success, positive if increase is beneficial, must not be 0. |
idvar |
name of identifier variable. If NULL, rownumber is used. |
p_digits |
level for rounding p-value. |
A list with elements:
WINratio=vector with WINratio and CIs,
WINodds=odds ratio of wins and losses, taking ties into account,
p.value=p.value from prop.test,
WINratioCI=character with merged WINratio, CI, and p
testdata= tibble with testdata from cross-join.