Filtering and merging latent factors

We recommend running Maui with a large number of latent factors (e.g. 100), even when the latent space is expected to be of lower dimension. This makes it more likely that the interesting latent factors are captured; the uninteresting ones can be dropped later, before downstream analysis. Maui comes with some functionality to that end.

Dropping unexplanatory latent factors

An unsupervised way to drop latent factors with low explanatory power is to fit linear models predicting the input x from the latent factors z. The Maui Utilities have a function which does this. For each latent factor, a linear model is fit which predicts all input features from that factor alone, and its R-squared is computed. Factors with an R-squared score below some threshold are dropped.
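To make the procedure concrete, here is a minimal NumPy sketch of the idea. It is not Maui's own implementation: the function names, the default threshold and the choice to average the R-squared over input features are all assumptions made for illustration.

```python
import numpy as np


def r2_per_factor(z, x):
    """Mean R-squared over input features when each latent factor alone predicts x.

    z: (n_samples, n_factors) latent factors
    x: (n_samples, n_features) input data
    Averaging over features is an assumption of this sketch.
    """
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)              # centre each input feature
    ss_tot = (xc ** 2).sum(axis=0)       # total sum of squares per feature
    usable = ss_tot > 0                  # constant features carry no variance
    scores = np.zeros(z.shape[1])
    for j in range(z.shape[1]):
        zc = z[:, j] - z[:, j].mean()
        denom = (zc ** 2).sum()
        if denom == 0 or not usable.any():
            continue                     # a constant factor explains nothing
        slopes = zc @ xc / denom         # least-squares slope for every feature
        resid = xc - np.outer(zc, slopes)
        ss_res = (resid ** 2).sum(axis=0)
        scores[j] = (1.0 - ss_res[usable] / ss_tot[usable]).mean()
    return scores


def filter_factors_by_r2_sketch(z, x, threshold=0.02):
    """Keep the factors whose score reaches the (hypothetical) threshold."""
    scores = r2_per_factor(z, x)
    keep = scores >= threshold
    return np.asarray(z, dtype=float)[:, keep], keep
```

Calling `filter_factors_by_r2_sketch(z, x)` returns the reduced factor matrix together with a boolean mask of the factors that were kept. A sensible threshold depends on the data and on the number of input features.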

The functionality is also available directly on a trained Maui model (see The Maui Class), which exposes a method that drops unexplanatory factors in place:

Merging similar latent factors

Sometimes, running Maui with a large number of latent factors can produce latent factors which are similar to one another. For instance, a heatmap of latent factor values may look like this:

_images/colinearity.png

Heatmap of latent factors shows many latent factors are very similar.

The latent factors may be clustered and merged to produce a more succinct, even lower-dimensional representation of the data, without losing much information.

_images/colinearity-merged.png

Heatmap of latent factors after they have been merged by similarity values.
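The clustering-and-merging idea can be sketched with SciPy's hierarchical clustering. This is an illustration under stated assumptions, not Maui's own code: the function name, the average linkage, the distance threshold and the use of the mean to merge a cluster are all hypothetical choices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def merge_similar_factors_sketch(z, threshold=0.2, metric="correlation",
                                 merge_fn=np.mean):
    """Cluster the columns of z by a distance metric and merge each cluster.

    z: (n_samples, n_factors) latent factors, with at least two factors and
       no constant columns (correlation distance is undefined for those).
    Returns the merged matrix and the cluster label of every original factor.
    """
    z = np.asarray(z, dtype=float)
    # Distances are computed between factors (columns), hence the transpose.
    distances = pdist(z.T, metric=metric)
    tree = linkage(distances, method="average")
    # Factors closer than `threshold` in the tree end up in the same cluster.
    labels = fcluster(tree, t=threshold, criterion="distance")
    merged = np.column_stack(
        [merge_fn(z[:, labels == c], axis=1) for c in np.unique(labels)]
    )
    return merged, labels
```

With `metric="correlation"` the distance is one minus the Pearson correlation, so strongly anti-correlated factors count as far apart and are not merged. Any other metric accepted by `scipy.spatial.distance.pdist` can be passed instead.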

Maui Utilities provides functionality to merge latent factors based on arbitrary distance metrics:

Functionality for the base case, where factors are merged by correlation, is provided in the Maui model class:

Supervised filtering of latent factors

In the case of patient data, latent factors may be assessed for usefulness based on how predictive they are of patient survival. Maui includes functionality to do this in its utilities.
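As a rough illustration of what "predictive of survival" can mean, the sketch below scores each latent factor with Harrell's concordance index and keeps the factors whose index is far enough from 0.5 (chance level). The concordance index is a simple stand-in chosen for this example, and Maui's own utility may use a different survival model. The function names and the `min_effect` cut-off are hypothetical.

```python
import numpy as np


def concordance_index(risk, duration, observed):
    """Harrell's C: share of comparable pairs in which the sample with the
    higher risk score has the shorter survival time (score ties count half)."""
    risk = np.asarray(risk, dtype=float)
    duration = np.asarray(duration, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    concordant = 0.0
    comparable = 0
    for i in np.flatnonzero(observed):
        # Sample i had an event, so it can be compared with every sample
        # that was still being followed after that time.
        later = duration > duration[i]
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())
    return concordant / comparable if comparable else 0.5


def select_survival_factors_sketch(z, duration, observed, min_effect=0.1):
    """Keep factors whose concordance index differs from 0.5 by min_effect."""
    z = np.asarray(z, dtype=float)
    c = np.array([
        concordance_index(z[:, j], duration, observed)
        for j in range(z.shape[1])
    ])
    keep = np.abs(c - 0.5) >= min_effect
    return z[:, keep], c
```

An index well above 0.5 means high factor values go with shorter survival, and an index well below 0.5 means the opposite; both directions are informative, which is why the absolute difference is used. A fixed cut-off is a crude criterion, and a significance test would be more principled for real data.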

For a more comprehensive example, check out our vignette.