
Interoperability with {randomForestExplainer} by changing my_forest$call to be more {ranger}-like? #10

@mikoontz

Hello again! I hope it's okay to keep documenting my experience working with your excellent package!

This is not a high-priority item, but one thing I've come across while teasing out some inference from my {ranger} model built with {spatialRF} is how the resulting object works with the {randomForestExplainer} package, which is designed to work with {ranger} objects (https://github.com/ModelOriented/randomForestExplainer).

I ran into an error while trying to learn more about non-multiplicative interactions in my random forest by following this tutorial (https://cran.rstudio.com/web/packages/randomForestExplainer/vignettes/randomForestExplainer.html#variable-interactions).

The troublesome function from {randomForestExplainer} is plot_predict_interaction(), which gives me this error:

Error in if (as.character(forest$call[[2]])[3] == ".") { : missing value where TRUE/FALSE needed

I'll swap in the mtcars dataset for iris in the help file example for plot_predict_interaction(), which now looks like:

forest_ranger <- ranger::ranger(cyl ~ ., data = mtcars)
randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")

The $call component of the random forest generated by {ranger} for this model looks like:

> forest_ranger$call
ranger::ranger(cyl ~ ., data = mtcars)

Using {spatialRF} to build the ranger model looks like this (sort of a silly example, since the data aren't spatial, but I'll use the non-spatial approach following your very helpful tutorial):

forest_ranger <- spatialRF::rf(dependent.variable.name = "cyl", 
                               predictor.variable.names = c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"), 
                               data = mtcars)

Calling randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp") on this model then gives the error above.

The $call component of the random forest generated by {spatialRF} looks like:

> forest_ranger$call
ranger::ranger(data = data, dependent.variable.name = dependent.variable.name, 
    num.trees = num.trees, mtry = mtry, importance = importance, 
    write.forest = write.forest, probability = probability, min.node.size = min.node.size, 
    max.depth = max.depth, replace = replace, sample.fraction = sample.fraction, 
    case.weights = case.weights, class.weights = class.weights, 
    splitrule = splitrule, num.random.splits = num.random.splits, 
    alpha = alpha, minprop = minprop, split.select.weights = split.select.weights, 
    always.split.variables = always.split.variables, respect.unordered.factors = respect.unordered.factors, 
    scale.permutation.importance = scale.permutation.importance, 
    local.importance = local.importance, regularization.factor = regularization.factor, 
    regularization.usedepth = regularization.usedepth, keep.inbag = keep.inbag, 
    inbag = inbag, holdout = holdout, quantreg = quantreg, oob.error = oob.error, 
    num.threads = num.threads, save.memory = save.memory, verbose = verbose, 
    seed = seed, classification = classification)

Pretty different!

The line in {randomForestExplainer} that causes this is here: https://github.com/ModelOriented/randomForestExplainer/blob/630c4fe9f7ddcc0a9a586dc4c4fc1822e9d30776/R/min_depth_interactions.R#L363
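To see why the check at that line fails, here is a minimal base-R sketch that rebuilds abbreviated copies of the two $call components printed above with quote(), so no model needs to be fitted. Positional indexing with [[2]] returns the first argument of the call, whatever its name: the formula for the {ranger} call, but the symbol data (from data = data) for the {spatialRF} call. The helper dot_check() is hypothetical and only mimics the shape of the condition in {randomForestExplainer}.

```r
# Abbreviated copies of the two $call components, built with quote()
# so that nothing is evaluated or fitted.
ranger_call <- quote(ranger::ranger(cyl ~ ., data = mtcars))
spatialrf_call <- quote(ranger::ranger(data = data,
                                       dependent.variable.name = dependent.variable.name))

# [[2]] is simply the first argument of the call.
as.character(ranger_call[[2]])     # "~" "cyl" "."
as.character(spatialrf_call[[2]])  # "data"

# Hypothetical helper mimicking the shape of the failing condition.
# Indexing past the end of a length-1 character vector gives NA,
# NA == "." is NA, and if() cannot branch on NA.
dot_check <- function(cl) {
  tryCatch(
    if (as.character(cl[[2]])[3] == ".") "formula with ." else "explicit formula",
    error = function(e) conditionMessage(e)
  )
}
dot_check(ranger_call)     # "formula with ."
dot_check(spatialrf_call)  # "missing value where TRUE/FALSE needed"
```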

I was able to work around this by overwriting the $call component in the random forest generated by {spatialRF} like so:

forest_ranger$call <- str2lang(paste0("ranger::ranger(cyl ~ ", paste(c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"), collapse = " + "), ")"))

For posterity: this essentially builds the model formula by putting together the dependent and independent variable names. To make it work with {randomForestExplainer}, you also have to wrap the formula in the package name and function call (ranger::ranger()), so that the formula ends up as the second element of the call, which is the element forest$call[[2]] picks out. The str2lang() function was suggested by R as the right way to create an object of class call after I tried wrapping the character string in as.call(), which didn't work.
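A sketch of an alternative that avoids pasting strings, using only base R: stats::reformulate() builds the formula from the variable names, and as.call() does work once it is given a list (the function first, then its arguments) instead of a character string. The formula again lands at position [[2]] of the call. One difference from the str2lang() version is that this inserts a formula object rather than an unevaluated `~` call; both give the same as.character() result for the check in {randomForestExplainer}, but this sketch has not been tested against every function in that package.

```r
predictors <- c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")

# Build cyl ~ mpg + disp + ... directly from the variable names.
model_formula <- stats::reformulate(predictors, response = "cyl")

# as.call() takes a list: the function, then its arguments.
# Putting the formula first places it at position [[2]] of the call.
new_call <- as.call(list(quote(ranger::ranger), model_formula))

# With the fitted {spatialRF} model, the call would then be replaced with:
# forest_ranger$call <- new_call
```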

That lets me run randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp"), which produces:

[Image: output of plot_predict_interaction() for mpg and hp]

So anyway, I'm interested in your opinion about this. Is the way {randomForestExplainer} gets the names of the independent variables too fragile (in which case I can open an issue on their package)? Or is this better solved by changing how {spatialRF} stores the call as a component of the {ranger} object? Maybe both?
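On the fragility question, one less position-dependent lookup could look like the sketch below. The helper predictor_names() is hypothetical: it is not part of {randomForestExplainer}, {ranger}, or {spatialRF}. It assumes the fitted object carries forest$forest$independent.variable.names (which {ranger} keeps in the stored forest when write.forest = TRUE) and, when that is missing, searches the call's arguments for a formula by its shape rather than by position, letting stats::terms() expand a `.` against the data.

```r
# Hypothetical helper: not part of {randomForestExplainer}, {ranger}, or {spatialRF}.
predictor_names <- function(forest, data) {
  # Assumption: {ranger} keeps the predictor names in the stored forest
  # (write.forest = TRUE); use them when present.
  stored <- forest$forest$independent.variable.names
  if (!is.null(stored)) {
    return(stored)
  }

  # Otherwise look for a formula among the call's arguments,
  # identified by class or by a leading `~`, not by position.
  args <- as.list(forest$call)[-1]
  is_formula <- vapply(args, function(a) {
    inherits(a, "formula") || (is.call(a) && identical(a[[1]], as.name("~")))
  }, logical(1))
  if (!any(is_formula)) {
    stop("could not recover predictor names from forest$call")
  }

  # terms() expands `.` using the columns of `data`.
  f <- stats::as.formula(args[[which(is_formula)[1]]])
  labels(stats::terms(f, data = data))
}
```

For the {spatialRF} model above, the stored-names branch would apply as long as the forest was written; the formula branch covers models fitted as ranger::ranger(cyl ~ ., data = mtcars).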
