Fitting Weighted Quantile Sum regression models
Description
Fits Weighted Quantile Sum (WQS) regression (Carrico et al. (2014) doi:10.1007/s13253-014-0180-3), a random subset implementation of WQS (Curtin et al. (2019) doi:10.1080/03610918.2019.1577971), a repeated holdout validation WQS (Tanner et al. (2019) doi:10.1016/j.mex.2019.11.008) and a WQS with 2 indices (Renzetti et al. (2023) doi:10.3389/fpubh.2023.1289579) for continuous, binomial, multinomial, Poisson, quasi-Poisson and negative binomial outcomes.
Usage
gwqs(formula, data, na.action, weights, mix_name, stratified, rh = 1, b = 100,
  b1_pos = TRUE, bint_cont_pos = NULL, bint_cat_pos = NULL, b_constr = FALSE,
  zero_infl = FALSE, q = 4, validation = 0.6, validation_rows = NULL,
  family = gaussian, signal = c("t2", "t3", "one", "abst", "expt"), rs = FALSE,
  n_vars = NULL, zilink = c("logit", "probit", "cloglog", "cauchit", "log"),
  seed = NULL, wp = NULL, wn = NULL, plan_strategy = "sequential", lambda = 0,
  optim.method = c("BFGS", "Nelder-Mead", "CG", "SANN"),
  control = list(trace = FALSE, maxit = 2000, reltol = 1e-9), b1_constr = NULL, ...)

gwqs_multinom(formula, data, na.action, weights, mix_name, stratified, rh = 1, b = 100,
  b1_pos = c(TRUE, TRUE), b_constr = FALSE, q = 4, validation = 0.6,
  validation_rows = NULL, signal = c("t2", "t3", "one", "abst", "expt"), rs = FALSE,
  n_vars = NULL, zilink = c("logit", "probit", "cloglog", "cauchit", "log"),
  seed = NULL, wp = NULL, wn = NULL, plan_strategy = "sequential", lambda = 0,
  optim.method = c("BFGS", "Nelder-Mead", "CG", "SANN"),
  control = list(trace = FALSE, maxit = 2000, reltol = 1e-9), b1_constr = NULL, ...)

gwqsrh(formula, data, na.action, weights, mix_name, stratified, rh = 1, b = 100,
  b1_pos = TRUE, bint_cont_pos = NULL, bint_cat_pos = NULL, b_constr = FALSE,
  zero_infl = FALSE, q = 4, validation = 0.6, validation_rows = NULL,
  family = gaussian, signal = c("t2", "t3", "one", "abst", "expt"), rs = FALSE,
  n_vars = NULL, zilink = c("logit", "probit", "cloglog", "cauchit", "log"),
  seed = NULL, wp = NULL, wn = NULL, plan_strategy = "sequential", lambda = 0,
  optim.method = c("BFGS", "Nelder-Mead", "CG", "SANN"),
  control = list(trace = FALSE, maxit = 2000, reltol = 1e-9), ...)
Arguments
formula |
An object of class |
data |
The |
na.action |
|
weights |
An optional term containing the name of the variable in the dataset representing the weights
to be used in the fitting process. Should be |
mix_name |
A character vector listing the variables contributing to a mixture effect. |
stratified |
The character name of the variable by which to stratify.
It has to be a |
rh |
Number of repeated holdout validations. |
b |
Number of bootstrap samples used in parameter estimation. No bootstrap will be performed if b = 1. |
b1_pos |
A logical value that determines whether weights are derived from models where the beta
values were positive ( |
bint_cont_pos |
A logical value that determines whether weights are derived from models where the
beta parameter of the interaction term between the WQS index and a continuous variable were
positive ( |
bint_cat_pos |
A logical value or a vector of logical values that determines whether weights are
derived from models where the slopes of the WQS index for each level (other than the reference one)
of the interacting categorical variable were positive ( |
b_constr |
A logical value that determines whether to apply positive (if |
zero_infl |
A logical value ( |
q |
An |
validation |
Percentage of the dataset to be used to validate the model. If
|
validation_rows |
A list of a single (if rh = 1) or multiple vectors containing the rows to be considered in the validation step. When validation_rows = NULL (default) the function randomly chooses the observations to be considered in the validation step. |
family |
A character value that allows to decide for the glm: |
signal |
Character identifying the signal function to be used when the average weights
are estimated. It can take values from |
rs |
A logical value. If |
n_vars |
The number of mixture components to be included at each random subset step.
If |
zilink |
Character specification of link function in the binary zero-inflation model
(you can choose among |
seed |
An |
wp, wn |
An optional set of starting weights for the positive ( |
plan_strategy |
A character value that allows to choose the evaluation strategies for the
|
lambda |
The value of the penalization term used to shrink towards 0 the weights that are not truly associated with the outcome (see the "Details" and the vignette for further information). |
optim.method |
A character identifying the method to be used by the |
control |
The control list of optimization parameters. See |
b1_constr |
The argument is deprecated, use 'b_constr' instead. |
... |
Additional arguments to be passed to the function |
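As a rough illustration of what the q, validation, validation_rows and rh arguments describe, the base R sketch below scores one simulated variable into quartiles and draws the validation rows for three repeated holdouts. The simulated vector x, the 0 to q - 1 scoring and the use of sample() are assumptions made for this example; they are not taken from the package source.

```r
set.seed(1)

# Simulated exposure for 200 subjects (illustrative data, not wqs_data)
n <- 200
x <- rnorm(n)

# Score the variable into q quantiles, coded 0, 1, ..., q - 1 (assumed coding)
q <- 4
breaks <- unique(quantile(x, probs = seq(0, 1, by = 1 / q), na.rm = TRUE))
x_q <- cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE) - 1

# One vector of validation row indices per repeated holdout:
# validation = 0.6 keeps 60% of the rows for validation, 40% for training
rh <- 3
validation <- 0.6
validation_rows <- lapply(seq_len(rh), function(i) {
  sample(seq_len(n), size = round(n * validation))
})

table(x_q)
lengths(validation_rows)
```

A list built this way has the shape the validation_rows description asks for: a single vector when rh = 1, several vectors otherwise.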
Details
gWQS uses the glm function in the stats package to fit the linear, logistic, Poisson and
quasi-Poisson regressions, while the glm.nb function from the MASS package is used to fit the
negative binomial regression. The nlm function from the stats package is used to optimize the
log-likelihood of the multinomial regression. The optim optimization function is used to
estimate the weights at each bootstrap step.
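To make the bootstrap step more concrete, here is a minimal sketch of estimating constrained weights with optim on a single bootstrap sample. Everything in it is an assumption for illustration: the simulated data, the least-squares objective, and the softmax reparameterisation used to keep the weights non-negative and summing to 1. The package may formulate the constraint and the objective differently.

```r
set.seed(2016)

# Simulated data: three mixture components already scored into quartiles (0..3)
n <- 500
Xq <- matrix(sample(0:3, n * 3, replace = TRUE), ncol = 3)
w_true <- c(0.7, 0.2, 0.1)
y <- 1 + 0.5 * drop(Xq %*% w_true) + rnorm(n, sd = 0.5)

# One bootstrap sample of the rows
idx <- sample(seq_len(n), n, replace = TRUE)
Xb <- Xq[idx, ]
yb <- y[idx]

# Softmax maps free parameters to weights in [0, 1] that sum to 1
# (an illustrative choice, not necessarily what gWQS does internally)
softmax <- function(theta) exp(theta) / sum(exp(theta))

# Residual sum of squares for par = (b0, b1, theta_1, theta_2, theta_3)
objfn <- function(par) {
  w <- softmax(par[-(1:2)])
  sum((yb - par[1] - par[2] * drop(Xb %*% w))^2)
}

fit <- optim(par = rep(0, 5), fn = objfn, method = "BFGS",
             control = list(maxit = 2000, reltol = 1e-9))

w_hat <- softmax(fit$par[-(1:2)])  # estimated weights for this bootstrap sample
fit$convergence                    # optim reports 0 on successful convergence
```

Repeating this over b bootstrap samples would give one weight vector per sample, which is the kind of per-iteration output the bres element describes.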
The seed argument specifies a fixed seed through the set.seed function.
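The effect of fixing a seed can be checked in base R: resetting the same seed before a random draw reproduces the draw exactly. The sketch below uses sample() as a stand-in for the random steps of the fit (bootstrap resampling, validation split); that correspondence is an assumption of this example.

```r
# Same seed, same random draw
set.seed(2016)
first_draw <- sample(seq_len(100), 5)

set.seed(2016)
second_draw <- sample(seq_len(100), 5)

identical(first_draw, second_draw)
```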
The rs argument selects the methodology: the bootstrap implementation (WQSBS) or the random
subset implementation (WQSRS) of the WQS. The first method draws b bootstrap samples to
estimate the weights, while the second creates b randomly selected subsets of the total
predictor set. For further details please see the vignette ("How to use gWQS package") and
the references below.
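The difference between the two resampling schemes can be sketched in base R: the bootstrap resamples rows with replacement, while the random subset approach draws a subset of the mixture variables at each step. The variable names, n_vars = 3 and b = 3 below are hypothetical values chosen for the example.

```r
set.seed(1)

n <- 10                               # number of subjects (illustrative)
mix_name <- paste0("chem", 1:9)       # hypothetical mixture component names
b <- 3                                # number of resampling steps
n_vars <- 3                           # components drawn per random subset step

# WQSBS-style: b bootstrap samples of the rows, drawn with replacement
boot_rows <- lapply(seq_len(b), function(i) {
  sample(seq_len(n), n, replace = TRUE)
})

# WQSRS-style: b random subsets of the predictor set, drawn without replacement
var_subsets <- lapply(seq_len(b), function(i) {
  sample(mix_name, n_vars)
})

boot_rows
var_subsets
```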
Value
gwqs returns the results of the WQS regression as well as many other objects and datasets.
fit |
The object that summarizes the output of the WQS model, reflecting a
linear, logistic, multinomial, Poisson, quasi-Poisson or negative binomial regression
depending on how the |
final_weights |
|
conv |
Indicates whether the solver has converged (0) or not (1 or 2). |
bres |
Matrix of estimated weights, mixture effect parameter estimates and the associated standard errors, statistics and p-values estimated for each bootstrap iteration. |
wqs |
Vector containing the wqs index for each subject. |
pwqs |
Vector containing the positive wqs index for each subject. |
nwqs |
Vector containing the negative wqs index for each subject. |
qi |
List of the cutoffs used to divide the mixture variables into quantiles. |
bindex |
List of vectors containing the |
y_wqs_df |
|
family |
The family specified. |
call |
The matched call. |
formula |
The formula supplied. |
mix_name |
The vector of variable names used to identify the elements in the mixture. |
q |
The method used to rank variables included in the mixture. |
n_levels |
The number of levels of the dependent variable when a multinomial regression is run. |
zero_infl |
If a zero-inflated model was run ( |
zilink |
The chosen link function when a zero-inflated model was run. |
dwqs |
A logical value whether two indices were included ( |
levelnames |
The name of each level when a multinomial regression is run. |
data |
The data used in the WQS analysis. |
objfn_values |
The vector of the b values of the objective function corresponding to the optimal values. |
optim_messages |
The vector of character strings giving any additional information returned by the optimizer, or NULL. |
gwqslist |
List of the output from the |
coefmat |
Matrix containing the parameter estimates from each repeated holdout WQS model. |
wmat |
Matrix containing the weight estimates from each repeated holdout WQS model. |
rh |
The number of repeated holdouts performed. |
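One way to work with the repeated holdout output is to summarise each component's weight across holdouts. The sketch below does this on a simulated matrix; it assumes a layout of one row per holdout and one column per mixture component, which should be checked against the actual wmat returned by a fit. The component names are hypothetical.

```r
set.seed(1)

# Simulated stand-in for wmat: 5 holdouts x 3 components, rows summing to 1
rh <- 5
wmat_sim <- matrix(runif(rh * 3), nrow = rh,
                   dimnames = list(NULL, c("chemA", "chemB", "chemC")))
wmat_sim <- wmat_sim / rowSums(wmat_sim)

# Mean, median and 2.5th / 97.5th percentiles of each weight across holdouts
summary_w <- t(apply(wmat_sim, 2, function(w) {
  c(mean = mean(w), quantile(w, probs = c(0.025, 0.5, 0.975)))
}))

summary_w
```

The same pattern could be applied to coefmat to summarise the parameter estimates across holdouts.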
Author(s)
Stefano Renzetti, Paul Curtin, Allan C Just, Ghalib Bello, Chris Gennings
References
Carrico C, Gennings C, Wheeler D, Factor-Litvak P. Characterization of a weighted quantile sum
regression for highly correlated data in a risk analysis setting. J Biol Agricul Environ Stat.
2014:1-21. ISSN: 1085-7117. doi:10.1007/s13253-014-0180-3.
Curtin P, Kellogg J, Cech N, Gennings C (2021). A random subset implementation of weighted quantile
sum (WQSRS) regression for analysis of high-dimensional mixtures, Communications in Statistics -
Simulation and Computation, 50:4, 1119-1134. doi:10.1080/03610918.2019.1577971.
Tanner EM, Bornehag CG, Gennings C. Repeated holdout validation for weighted quantile sum regression.
MethodsX. 2019 Nov 22;6:2855-2860. doi:10.1016/j.mex.2019.11.008. PMID: 31871919; PMCID: PMC6911906.
Renzetti S, Gennings C and Calza S (2023) A weighted quantile sum regression with penalized weights
and two indices. Front Public Health 11:1151821. doi:10.3389/fpubh.2023.1151821.
See Also
glm, glm.nb, multinom, zeroinfl.
Examples
# we save the names of the mixture variables in the variable "toxic_chems"
toxic_chems = names(wqs_data)[1:34]

# To run a linear model and save the results in the variable "results". This linear model
# (family = gaussian) will rank/standardize variables in quartiles (q = 4), perform a
# 40/60 split of the data for training/validation (validation = 0.6), and estimate weights
# over 2 bootstrap samples (b = 2; in practical applications at least 100 bootstraps
# should be used). Weights will be derived from mixture effect parameters that are positive
# (b1_pos = TRUE). A unique seed was specified (seed = 2016) so this model will be
# reproducible, and plots describing the variable weights and linear relationship will be
# generated as output (plots = TRUE). In the end, tables describing the weight values and
# the model parameters with the respective statistics are generated in the plots window
# (tables = TRUE):
results = gwqs(yLBX ~ wqs, mix_name = toxic_chems, data = wqs_data, q = 4,
               validation = 0.6, b = 2, b1_pos = TRUE, b_constr = FALSE,
               family = gaussian, seed = 2016)

# to test the significance of the covariates
summary(results)