Hello again! I hope it's okay to keep documenting my experience working with your excellent package!
This is not a high priority item, but one thing I've come across in teasing out some inference from my {ranger} model built using {spatialRF} is how the resulting object can work with the {randomForestExplainer} package which is meant to operate on {ranger} objects (https://github.com/ModelOriented/randomForestExplainer).
I ran into an error while trying to learn more about non-multiplicative interactions in my random forest by following this tutorial (https://cran.rstudio.com/web/packages/randomForestExplainer/vignettes/randomForestExplainer.html#variable-interactions).
The troublesome function from {randomForestExplainer} is plot_predict_interaction(), and I get this error:
Error in if (as.character(forest$call[[2]])[3] == ".") { : missing value where TRUE/FALSE needed
I'll swap in the mtcars dataset for iris in the help file example for plot_predict_interaction(), which now looks like:
forest_ranger <- ranger::ranger(cyl ~ ., data = mtcars)
randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")
The $call component of the random forest generated by {ranger} for this model looks like:
> forest_ranger$call
ranger::ranger(cyl ~ ., data = mtcars)
Using {spatialRF} to build the ranger model looks like this (sort of a silly example, since the data aren't spatial, but I'll use the non-spatial approach following your very helpful tutorial):
forest_ranger <- spatialRF::rf(dependent.variable.name = "cyl",
predictor.variable.names = c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"),
data = mtcars)
This then gives the error above when calling randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp").
The $call component of the random forest generated by {spatialRF} looks like:
> forest_ranger$call
ranger::ranger(data = data, dependent.variable.name = dependent.variable.name,
num.trees = num.trees, mtry = mtry, importance = importance,
write.forest = write.forest, probability = probability, min.node.size = min.node.size,
max.depth = max.depth, replace = replace, sample.fraction = sample.fraction,
case.weights = case.weights, class.weights = class.weights,
splitrule = splitrule, num.random.splits = num.random.splits,
alpha = alpha, minprop = minprop, split.select.weights = split.select.weights,
always.split.variables = always.split.variables, respect.unordered.factors = respect.unordered.factors,
scale.permutation.importance = scale.permutation.importance,
local.importance = local.importance, regularization.factor = regularization.factor,
regularization.usedepth = regularization.usedepth, keep.inbag = keep.inbag,
inbag = inbag, holdout = holdout, quantreg = quantreg, oob.error = oob.error,
num.threads = num.threads, save.memory = save.memory, verbose = verbose,
seed = seed, classification = classification)
Pretty different!
The line in {randomForestExplainer} that causes this is here: https://github.com/ModelOriented/randomForestExplainer/blob/630c4fe9f7ddcc0a9a586dc4c4fc1822e9d30776/R/min_depth_interactions.R#L363
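Judging only from the condition quoted in the error message, the check appears to assume that the second element of $call is a formula. Here is a minimal base-R sketch of what seems to happen with each of the two calls. The calls are only quoted, never evaluated, so neither {ranger} nor {spatialRF} needs to be installed, the {spatialRF} call is shortened to its first two arguments, and the variable names are just for illustration:

```r
# Quoted (unevaluated) versions of the two calls shown above.
ranger_call <- quote(ranger::ranger(cyl ~ ., data = mtcars))
spatialrf_call <- quote(ranger::ranger(data = data,
                                       dependent.variable.name = dependent.variable.name))

# With a formula as the first argument, element 2 is the formula itself.
formula_parts <- as.character(ranger_call[[2]])    # "~", "cyl", "."
formula_check <- formula_parts[3] == "."           # TRUE

# With named arguments only, element 2 is just the symbol `data`.
symbol_parts <- as.character(spatialrf_call[[2]])  # "data"
symbol_check <- symbol_parts[3] == "."             # NA: there is no third element

# if (NA) is an error in R, which matches the kind of failure reported above.
result <- tryCatch(
  if (symbol_check) "dot formula" else "explicit formula",
  error = function(e) conditionMessage(e)
)
print(result)
```

So the failure seems to come from the shape of the stored call rather than from the fitted forest itself.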
I was able to work around this by overwriting the $call component in the random forest generated by {spatialRF} like so:
forest_ranger$call <- str2lang(paste0("ranger::ranger(cyl ~ ", paste(c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"), collapse = " + "), ")"))
For posterity, this essentially builds the formula for the random forest model by pasting together the dependent and independent variable names. To make it work with {randomForestExplainer}, you also have to wrap the formula in the package name and function call (ranger::ranger()). R suggested str2lang() as the right way to create an object of class call after I tried wrapping the character string in just as.call(), which didn't work.
After this change, randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp") runs without error and produces the interaction plot.
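The same workaround can be wrapped in a small function so the variable names don't have to be typed twice. This is only a sketch: formula_call() is a hypothetical helper, not something provided by {spatialRF} or {ranger}, and it assumes syntactic column names (names with spaces or other special characters would need backticks) and R 3.6.0 or later for str2lang().

```r
# Hypothetical helper (not part of {spatialRF} or {ranger}): build a
# formula-style call from the response name and the predictor names.
formula_call <- function(response, predictors) {
  rhs <- paste(predictors, collapse = " + ")
  str2lang(paste0("ranger::ranger(", response, " ~ ", rhs, ")"))
}

predictors <- c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
patched_call <- formula_call("cyl", predictors)
print(patched_call)

# The second element is now a formula, so the response and the predictors
# can be read back from it.
call_vars <- all.vars(patched_call[[2]])
third_part <- as.character(patched_call[[2]])[3]
```

Assigning the result with forest_ranger$call <- formula_call("cyl", predictors) should then be equivalent to the one-liner above.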
So anyway, I'm interested in your opinion about this. Is the approach {randomForestExplainer} uses to get the names of the independent variables too fragile (in which case I can open an issue on their package)? Or is this issue better served by changing how {spatialRF} stores the call as a component of the {ranger} object? Maybe both?
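To make the "too fragile" question concrete, here is a rough sketch of a more defensive lookup. Everything in it is an assumption on my part: predictor_names() is a hypothetical helper, not code from either package, and it assumes that a fitted ranger object keeps its predictor names in forest$forest$independent.variable.names, which is worth checking against the installed version. The objects below are plain lists standing in for fitted models.

```r
# Hypothetical helper: find predictor names without assuming that
# forest$call[[2]] is a formula.
predictor_names <- function(forest, data) {
  # Assumption: a fitted ranger object stores the names here.
  stored <- forest$forest$independent.variable.names
  if (!is.null(stored)) return(stored)

  # Fall back to the call, but only if its first argument is a two-sided formula.
  first_arg <- forest$call[[2]]
  if (is.call(first_arg) && length(first_arg) == 3 &&
      identical(first_arg[[1]], as.name("~"))) {
    parts <- as.character(first_arg)
    if (parts[3] == ".") return(setdiff(names(data), parts[2]))
    return(all.vars(first_arg[[3]]))
  }
  stop("could not determine predictor names from this object")
}

# Mock objects (plain lists, not real forests) covering both branches.
mock_stored <- list(forest = list(independent.variable.names = c("mpg", "hp")),
                    call = quote(ranger::ranger(data = data)))
mock_formula <- list(call = quote(ranger::ranger(cyl ~ ., data = mtcars)))

names_from_stored <- predictor_names(mock_stored, mtcars)
names_from_formula <- predictor_names(mock_formula, mtcars)
```

A lookup along these lines would not depend on how the wrapping package happened to construct the call.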