AdaPTGMM Package

The AdaPTGMM method is a flexible multiple testing algorithm that uses covariates to estimate the local false discovery rate with finite-sample FDR control.
#devtools::install_github("patrickrchao/AdaPTGMM")
library(AdaPTGMM)
First, we generate some data.
# Generate data
generate_data <- function(n, testing){
  x <- rnorm(n)
  expit <- function(x) 1/(1+exp(-x))
  pi1 <- expit(2*(x-1))
  H <- 1 * (runif(n) < pi1)
  theta <- H * rlogis(n, location = 1/2, scale = 1)
  z <- rnorm(n, mean = theta)
  x <- data.frame(x = x)
  if(testing == "one_sided"){
    pvals <- pnorm(z, lower.tail = FALSE)
  } else if(testing == "two_sided"){
    pvals <- 2*pnorm(abs(z), lower.tail = FALSE)
  }
  return(list(x = x, z = z, pvals = pvals))
}
# Two sided testing
data <- generate_data(3000, "two_sided")
x <- data$x
pvals <- data$pvals
z <- data$z
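As an optional sanity check (not part of the package workflow), we can glance at the simulated p-values with base R; under the mixture above, non-null hypotheses show up as a spike of small p-values on top of a uniform null component.
# Quick look at the simulated p-values (base R only)
summary(pvals)
mean(pvals < 0.05)   # rough share of small p-values
hist(pvals, breaks = 50, main = "Simulated two-sided p-values", xlab = "p-value")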
The AdaPTGMM package performs automatic model selection over the number of mixture components and a list of candidate models (or featurizations). The candidate models are specified through the beta_formulas input. Here we consider a spline basis with 2-4 degrees of freedom. The candidate numbers of mixture components are specified through the nclasses input. Here we consider 3 and 4 components. Unlike standard adjustment functions such as p.adjust, the user may specify a vector of alpha levels in alphas. Here we choose 0.01, 0.05, 0.1, and 0.2.
library("splines")
formulas <- paste("splines::ns(x, df = ", c(2,3,4), ")")
nclasses <- c(3,4)
alphas <- c(0.01,0.05,0.1,0.2)
res <- adapt_gmm(x = x, pvals = pvals, alphas = alphas,
                 beta_formulas = formulas, nclasses = nclasses)
#> Model selection starting. Shrink the set of candidate models or reduce niter_fit/niter_ms/nfits if it is too time-consuming.
#> alpha = 0.2: FDPhat 0.1987, Number of Rej. 229
#> alpha = 0.1: FDPhat 0.0988, Number of Rej. 162
#> alpha = 0.05: FDPhat 0.0474, Number of Rej. 116
#> alpha = 0.01: FDPhat 0.0091, Number of Rej. 55
#> Complete.
The arguments of the model are in res$args. The model selection chose a spline with 2 degrees of freedom and 4 Gaussian components.
# Selected beta formula from model selection
print(paste("Selected formula:", format(res$args$beta_formula), collapse=" "))
#> [1] "Selected formula: class ~ splines::ns(x, df = 2)"
# Selected number of classes from model selection
print(paste("Selected number of classes:", res$args$nclasses, collapse=" "))
#> [1] "Selected number of classes: 4"
Instead, we may choose to model the \(z_i\)'s. Since we are now inputting \(z_i\)'s and using a two-sided test, we need to specify that the testing procedure is two sided with testing="two_sided".
zres <- adapt_gmm(x = x, z = z, alphas = alphas, testing = "two_sided",
                  beta_formulas = formulas, nclasses = nclasses)
#> Model selection starting. Shrink the set of candidate models or reduce niter_fit/niter_ms/nfits if it is too time-consuming.
#> alpha = 0.2: FDPhat 0.1985, Number of Rej. 204
#> alpha = 0.1: FDPhat 0.0987, Number of Rej. 157
#> alpha = 0.05: FDPhat 0.05, Number of Rej. 130
#> alpha = 0.01: FDPhat 0.0098, Number of Rej. 51
#> Complete.
We may also look at the fitted means and variances of the Gaussian components.
# Fitted Gaussian means
print(paste(c("Means:", sapply(zres$params$mu,function(x) round(x,2))), collapse=" "))
#> [1] "Means: -0.42 0.07 0.21 0.52"
# Fitted Gaussian variances
print(paste(c("Variances:", sapply(zres$params$var,function(x) round(x,2))), collapse=" "))
#> [1] "Variances: 0 0 0 4.49"
The masking function in AdaPTGMM is governed by three parameters, \(\alpha_m\), \(\lambda\), and \(\zeta\); see our paper for more details. These three variables must satisfy \[0<\alpha_m\le \lambda \le \alpha\cdot\zeta + \lambda \le 1.\] The value of \(\zeta\) controls the minimum possible number of rejections. If the analyst expects a small number of rejections, e.g. fewer than 20, we recommend a large \(\zeta\), for example 5-20.
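As a worked check of this constraint, the following illustrative helper (not part of the package) computes \(\nu = \alpha\cdot\zeta + \lambda\) for the largest level \(\alpha = 0.2\) and the custom masking parameters used in the examples below.
# Illustrative check of 0 < alpha_m <= lambda <= alpha*zeta + lambda <= 1 (helper is not from the package)
check_masking <- function(alpha, alpha_m, lambda, zeta){
  nu <- alpha * zeta + lambda
  c(nu = nu, valid = (0 < alpha_m) && (alpha_m <= lambda) && (nu <= 1))
}
check_masking(alpha = 0.2, alpha_m = 0.2, lambda = 0.4, zeta = 3)  # nu = 1: valid, but lower power (see below)
check_masking(alpha = 0.2, alpha_m = 0.2, lambda = 0.3, zeta = 3)  # nu = 0.9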
If the user does not specify any values for \(\alpha_m\), \(\lambda\), and \(\zeta\), AdaPTGMM will choose default parameters, adapting to the total number of hypotheses. See our paper for more details.
We may input custom masking parameters. For the sake of example, we now work with one-sided p-values. In this dataset, we have some null p-values close to 1, and masking p-values near 0 and near 1 to the same value will lead to inflated FDP estimates and lower power.
res1 <- adapt_gmm(x = x, z = z, alphas = alphas, testing = "one_sided",
                  beta_formulas = formulas, nclasses = nclasses,
                  alpha_m = 0.2, lambda = 0.4, zeta = 3)
#> Model selection starting. Shrink the set of candidate models or reduce niter_fit/niter_ms/nfits if it is too time-consuming.
#> alpha = 0.2: FDPhat 0.1971, Number of Rej. 93
#> alpha = 0.1: FDPhat 0.0976, Number of Rej. 41
#> alpha = 0.05: FDPhat 0.0256, Number of Rej. 13
#> alpha = 0.01: FDPhat 0.0256, Number of Rej. 0
#> Complete.
res2 <- adapt_gmm(x = x, z = z, alphas = alphas, testing = "one_sided",
                  beta_formulas = formulas, nclasses = nclasses,
                  alpha_m = 0.2, lambda = 0.3, zeta = 3)
#> Model selection starting. Shrink the set of candidate models or reduce niter_fit/niter_ms/nfits if it is too time-consuming.
#> alpha = 0.2: FDPhat 0.1996, Number of Rej. 182
#> alpha = 0.1: FDPhat 0.099, Number of Rej. 128
#> alpha = 0.05: FDPhat 0.0495, Number of Rej. 101
#> alpha = 0.01: FDPhat 0.0072, Number of Rej. 46
#> Complete.
We see that using a masking function with \(\nu= \alpha\cdot\zeta + \lambda=1\) results in much lower power. For this reason, we recommend \(\alpha_m=\lambda\) and \(\nu=0.9\).
AdaPTGMM models the mixture proportions with a black-box classification model. We provide implementations of multinomial logistic regression, GAM, glmnet, and a neural network with a single hidden layer. The user may specify the model type with model_type="nnet", "mgcv", "glmnet", or "neural", respectively.
We may use a multinomial logistic regression model from nnet::multinom.
res_multinom <- adapt_gmm(x = x, z = z, alphas = alphas, testing = "one_sided",
                          beta_formulas = formulas, model_type = "nnet", nclasses = nclasses)
#> Model selection starting. Shrink the set of candidate models or reduce niter_fit/niter_ms/nfits if it is too time-consuming.
#> alpha = 0.2: FDPhat 0.1989, Number of Rej. 186
#> alpha = 0.1: FDPhat 0.0977, Number of Rej. 133
#> alpha = 0.05: FDPhat 0.0476, Number of Rej. 84
#> alpha = 0.01: FDPhat 0.0102, Number of Rej. 0
#> Complete.
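The other built-in model types listed above can be requested in the same way; as a rough sketch, the single-hidden-layer neural network is selected with model_type = "neural". Reusing the spline formulas here is an illustrative assumption rather than a package requirement, since a neural network can capture non-linearity on its own.
# Sketch: single-hidden-layer neural network for the mixture proportions
# (reusing the spline formulas is an illustrative assumption)
res_neural <- adapt_gmm(x = x, z = z, alphas = alphas, testing = "one_sided",
                        beta_formulas = formulas, model_type = "neural",
                        nclasses = nclasses)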
Alternatively, the user may construct their own beta model. AdaPTGMM accepts the input custom_beta_model, which must be a function that takes a formula, a dataset, and (optional) initialization parameters. Below is a default template for writing a custom beta model. There are four key things the function must do: fit the model using the per-observation weights in data$weights, compute the fitted probabilities for the data, compute new model parameters for initialization (or leave them NULL), and report the model's degrees of freedom for model selection.
custom_beta <- function(formula, data, initialization){

  ########################################################################
  # EDIT THIS SECTION to fit any desired model
  # Formula, data, and weights for each observation in data$weights
  # Compute:
  # 1. The fitted probabilities for the data
  # 2. New model parameters for initialization (or leave as null if undesired)
  # 3. Degrees of freedom for model (for model selection)
  model <- fit_model(formula, data, params = initialization, weights = data$weights)

  fitted_probabilities <- fitted(model)
  new_weights <- model$weights
  df <- model$df
  ########################################################################

  output <- list()
  output$fitted_prob <- fitted_probabilities
  output$model_weights <- new_weights
  output$df <- df

  return(output)
}
Here is an example that uses a multinomial logistic regression model as the custom beta model. To use a custom beta model, set model_type="custom" and pass the custom beta function as custom_beta_model.
custom_multinom <- function(formula, data, initialization){

  ########################################################################
  # EDIT THIS SECTION to fit any desired model
  # Formula, data, and weights for each observation in data$weights
  # Compute:
  # 1. The fitted probabilities for the data
  # 2. New model parameters for initialization (or leave as null if undesired)
  # 3. Degrees of freedom for model (for model selection)
  if(is.null(initialization)){
    model <- nnet::multinom(formula = formula, data = data, weights = weights, trace = FALSE)
  } else{
    model <- nnet::multinom(formula, data, weights = weights, trace = FALSE, Wts = initialization)
  }

  fitted_probabilities <- fitted(model)
  new_weights <- model$wts
  df <- sum(model$edf)
  ########################################################################

  output <- list()
  output$fitted_prob <- fitted_probabilities
  output$model_weights <- new_weights
  output$df <- df

  return(output)
}
res_custom <- adapt_gmm(x = x, z = z, alphas = alphas, testing = "one_sided",
                        beta_formulas = formulas, nclasses = nclasses,
                        model_type = "custom", custom_beta_model = custom_multinom)
#> Model selection starting. Shrink the set of candidate models or reduce niter_fit/niter_ms/nfits if it is too time-consuming.
#> alpha = 0.2: FDPhat 0.1989, Number of Rej. 186
#> alpha = 0.1: FDPhat 0.0977, Number of Rej. 133
#> alpha = 0.05: FDPhat 0.0476, Number of Rej. 84
#> alpha = 0.01: FDPhat 0.0102, Number of Rej. 0
#> Complete.
We see that we obtain the same results as the default implementation of multinomial logistic regression.
If speed is a core concern, we recommend decreasing the number of iterations used in fitting the model and number of EM iterations, as well as using a simple \(\beta\) model.
Overall Recommendations:
- Set nclasses=3.
- Set beta_formulas to be a spline with 3 degrees of freedom.
- Decrease nfits. By default nfits=20; we set it to 5 here.
- Decrease niter_fit. By default niter_fit=5; we set it to 2 here.
- Decrease niter_ms. By default niter_ms=10; we set it to 5 here.

Here is one example:
fast_res <- adapt_gmm(x = x, z = z, alphas = alphas, testing = "one_sided",
                      beta_formulas = "splines::ns(x, df=3)", model_type = "nnet", nclasses = 4,
                      nfits = 5, niter_fit = 2, niter_ms = 5, intercept_model = FALSE)
#> Model selection starting. Shrink the set of candidate models or reduce niter_fit/niter_ms/nfits if it is too time-consuming.
#> alpha = 0.2: FDPhat 0.2, Number of Rej. 165
#> alpha = 0.1: FDPhat 0.1, Number of Rej. 110
#> alpha = 0.05: FDPhat 0.0461, Number of Rej. 76
#> alpha = 0.01: FDPhat 0.01, Number of Rej. 50
#> Complete.
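To quantify the speedup on a particular machine, the default-settings call and the reduced-settings call can simply be wrapped in system.time(); actual timings vary by hardware, so none are reported here.
# Rough timing comparison of default versus reduced settings (results depend on hardware)
t_default <- system.time(
  adapt_gmm(x = x, z = z, alphas = alphas, testing = "one_sided",
            beta_formulas = formulas, nclasses = nclasses)
)
t_fast <- system.time(
  adapt_gmm(x = x, z = z, alphas = alphas, testing = "one_sided",
            beta_formulas = "splines::ns(x, df=3)", model_type = "nnet", nclasses = 4,
            nfits = 5, niter_fit = 2, niter_ms = 5, intercept_model = FALSE)
)
c(default = t_default["elapsed"], fast = t_fast["elapsed"])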