scikit-survival 0.21.0 released | Sebastian Pölsterl


Today marks the release of scikit-survival 0.21.0.
This release features some exciting new additions and significant performance improvements:

  • Pointwise confidence intervals for the Kaplan-Meier estimator.
  • Early stopping in GradientBoostingSurvivalAnalysis.
  • Improved performance of fitting SurvivalTree and RandomSurvivalForest.
  • Reduced memory footprint of concordance_index_censored.

Pointwise Confidence Intervals for the Kaplan-Meier Estimator

kaplan_meier_estimator()
can now estimate pointwise confidence intervals by specifying the conf_type parameter.

import matplotlib.pyplot as plt
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.nonparametric import kaplan_meier_estimator

_, y = load_veterans_lung_cancer()

time, survival_prob, conf_int = kaplan_meier_estimator(
    y["Status"], y["Survival_in_days"], conf_type="log-log"
)
plt.step(time, survival_prob, where="post")
plt.fill_between(time, conf_int[0], conf_int[1], alpha=0.25, step="post")
plt.ylim(0, 1)
plt.ylabel(r"est. probability of survival $\hat{S}(t)$")
plt.xlabel("time $t$")
Kaplan-Meier curve with pointwise confidence intervals.

Early Stopping in GradientBoostingSurvivalAnalysis

Early stopping allows us to determine when the model is sufficiently complex.
This is usually achieved by repeatedly evaluating the model on held-out data.
For GradientBoostingSurvivalAnalysis,
the easiest way to achieve this is by setting n_iter_no_change and
optionally validation_fraction (defaults to 0.1).

from sksurv.datasets import load_whas500
from sksurv.ensemble import GradientBoostingSurvivalAnalysis

X, y = load_whas500()

model = GradientBoostingSurvivalAnalysis(
    n_estimators=1000, max_depth=2, subsample=0.8, n_iter_no_change=10, random_state=0,
)

model.fit(X, y)
print(model.n_estimators_)

In this example, model.n_estimators_ indicates that fitting stopped after 73 iterations,
instead of the maximum of 1000 iterations.

Alternatively, one can provide a custom callback function to the
fit
method. If the callback returns True, training is stopped.

model = GradientBoostingSurvivalAnalysis(
    n_estimators=1000, max_depth=2, subsample=0.8, random_state=0,
)

def early_stopping_monitor(iteration, model, args):
    """Stop training if there was no improvement in the last 10 iterations"""
    start = max(0, iteration - 10)
    end = iteration + 1
    oob_improvement = model.oob_improvement_[start:end]
    return all(oob_improvement < 0)

model.fit(X, y, monitor=early_stopping_monitor)
print(model.n_estimators_)

In the example above, early stopping is determined by checking
the last 10 entries of the oob_improvement_ attribute.
It contains the improvement in loss on the out-of-bag samples
relative to the previous iteration.
This requires setting subsample to a value smaller than 1, here 0.8.
Using this approach, training stopped after 114 iterations.

Improved Performance of SurvivalTree and RandomSurvivalForest

Another exciting feature of scikit-survival 0.21.0 is due to a re-write of
the training routine of SurvivalTree.
This results in roughly 3x faster training times.

Runtime comparison of fitting SurvivalTree.

The plot above compares the time required to fit a single SurvivalTree on data with
25 features and varying numbers of samples.
The performance difference becomes notable for data with 1000 samples and above.

Note that this improvement also speeds up fitting
RandomSurvivalForest
and ExtraSurvivalTrees.

Improved concordance index

Another performance improvement is due to Christine Poerschke,
who significantly reduced the memory footprint of
concordance_index_censored().
With scikit-survival 0.21.0, memory usage scales linearly, instead of quadratically, in the number of samples, making performance evaluation on large datasets much more manageable.

For a full list of changes in scikit-survival 0.21.0, please see the
release notes.

Install

Pre-built packages are available for Linux, macOS (Intel), and Windows, either

via pip:

pip install scikit-survival

or via conda:

 conda install -c sebp scikit-survival
