Kaplan–Meier survival curves

It is useful to know how long someone will live or something will last. The probability that a specific subject will survive beyond any specified time is equivalent to the fraction of a group of similar subjects studied who do survive beyond that time and is known as the estimate of the survival function. Many methods for estimating survival functions have been proposed, but the most popular is that described by Edward Kaplan and Paul Meier in 1958.¹ The major advantage of the Kaplan-Meier method is that it makes complete use of even the incomplete information available about the studied group. Incomplete information is common in survival analysis because of right-censoring, in which a subject is lost to follow-up before the event of interest occurs; the name reflects time moving along the survival curve from left to right. For example, in a study of bacterial contamination in apheresis platelets, a unit may be transfused without causing symptoms before bacterial growth is observed. Traditional survival analysis handled censored data either by excluding a censored subject entirely or by analyzing subjects within fixed intervals of time, leading to awkward assumptions and biases. Kaplan and Meier's insight was to reevaluate the data every time the event of interest occurred, rather than at arbitrary intervals. This makes maximum use of all the information at any time point, even from individuals who will go on to be censored later. The method is not limited to death as an endpoint but is widely used to calculate other time-to-event information such as relapse-free survival or event-free survival in cancer studies, time on the shelf in studies of inventory, time to failure for mechanical systems, and time to event in behavioral and ecologic studies. Even time can be replaced by a surrogate, such as miles traveled in the life of a tire.

THE KAPLAN-MEIER/PRODUCT-LIMIT METHOD
Kaplan-Meier estimation follows all subjects from their individual starting times, which are brought together at t₀ (time zero). The probability of being alive at t₀ is, therefore, one. All individuals are followed forward from t₀ until the time the first event of interest occurs to any individual, for example, a death, at time t₁. At this point the estimated probability of survival beyond t₁ is the number of individuals who survived beyond t₁ divided by the number who are known to have survived up to t₁. This fraction is multiplied by the probability of survival at the last time point, in this case the probability at t₀, and the resulting product is the new estimate of the survival function. This process is repeated at t₂ and every subsequent event point, so the estimate of the survival function becomes smaller with time.² If any individuals are lost to follow-up (censored) in the interval from one event time to another, they are left out of the calculation in that and every subsequent multiplication step. The estimate of the survival function can be plotted as the familiar-looking step curve (see example in Fig. 1). If an individual is lost to follow-up, a censor mark (small vertical line) may be added to indicate the time when that individual's observation stopped. The finer the time resolution in the data and the larger the data set, the smoother the curve becomes. The variance at each point in the curve can be calculated, usually according to a formula proposed by Greenwood,³ and confidence limits can be plotted on either side.
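The multiplication steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than a substitute for statistical software: the function name, the input layout (one follow-up time and one event flag per subject), and the example data are all invented for this sketch. It also assumes the common convention that a subject censored at exactly an event time is still counted as at risk at that time, and it uses Greenwood's formula for the standard error.

```python
from collections import Counter
from math import sqrt


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time of each subject, measured from that subject's t0
    events -- True if the event was observed at that time, False if censored

    Returns one row per distinct event time:
    (time, number at risk, number of events, survival estimate, Greenwood SE).
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0                # probability of surviving beyond t0
    greenwood_sum = 0.0
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:                     # the estimate changes only at event times
            survival *= (at_risk - d) / at_risk
            if at_risk > d:
                greenwood_sum += d / (at_risk * (at_risk - d))
                se = survival * sqrt(greenwood_sum)
            else:
                se = float("nan")  # Greenwood's formula is undefined once survival is 0
            rows.append((t, at_risk, d, survival, se))
        at_risk -= leaving[t]     # assumption: censored at t still counted at risk at t
    return rows


# Hypothetical data: eight subjects, four observed events, four censored.
example_times = [2, 3, 3, 5, 6, 8, 9, 12]
example_events = [True, False, True, True, False, True, False, False]

for time, n, d, s, se in kaplan_meier(example_times, example_events):
    print(f"t={time:>2}  at risk={n}  events={d}  S(t)={s:.3f}  SE={se:.3f}")
```

With these invented data the estimate steps down to about 0.875, 0.75, 0.6, and 0.4 at times 2, 3, 5, and 8. The censored subjects never cause a step themselves; they only shrink the denominator used at later event times.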
Differences in survival curves between two groups can be examined statistically. Such analysis may compare differences in how long each group survived, or what fraction of each group survived to a certain time. Examples of the former are frequent in cancer trials: "Treatment with drug A increased median progression-free survival by x months." Examples of the latter are frequent in vaccine trials: "Disease-free survival was increased by x% in the first year after administration." Since an estimate of the survival function is unique to its group, parametric statistical tests (which make assumptions about the shape of the underlying distribution) are inappropriate where flexible and data-driven alternatives are available. The most commonly used statistical test for comparing Kaplan-Meier curves is the log-rank (or logrank) test, a nonparametric test that can make use of right-censored data. The use of high-quality statistical software, with the help of a statistician fluent in it, is strongly recommended.
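For readers who want to see what the log-rank test computes, the sketch below compares observed and expected event counts in one group at every event time, in the same spirit as the Kaplan-Meier recalculation. It is an illustrative two-group version only; the function name and input layout are assumptions made for this example, and real analyses should rely on established statistical software.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    Returns (chi_square, p_value), with the statistic referred to a
    chi-square distribution on 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still under observation at t, by group.
        n_a = sum(1 for s, _, g in data if s >= t and g == 0)
        n_b = sum(1 for s, _, g in data if s >= t and g == 1)
        # Events occurring exactly at t, by group.
        d_a = sum(1 for s, e, g in data if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in data if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n          # expected under "no difference"
        if n > 1:                          # hypergeometric variance term
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("not enough information to compare the groups")
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = erfc(sqrt(chi_square / 2))   # upper tail of chi-square, 1 df
    return chi_square, p_value
```

Censored subjects contribute to the at-risk counts up to the time they leave the study, which is how the test makes use of right-censored data.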
Multivariable survival models can also be constructed, and the most commonly used is the method proposed by Sir David Cox. An important feature of Cox models is the proportional hazards assumption. Briefly, this assumes that all individuals share some underlying (baseline) risk of the outcome, which may itself change over time, and that each covariate under study multiplies that risk by a factor that does not change over time. Several methods exist to test the proportional hazards assumption and therefore the validity of a Cox model.
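In the notation usually found in textbooks (the symbols here are introduced for illustration and are not taken from the text above), the Cox model writes the hazard for a subject with covariates x_1, ..., x_p as a shared baseline hazard multiplied by a covariate-dependent factor. The second line shows why the hazards are called proportional: the ratio between any two subjects a and b does not depend on time.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\right)

\frac{h(t \mid x^{(a)})}{h(t \mid x^{(b)})}
  = \exp\!\left(\sum_{j=1}^{p} \beta_j \left(x^{(a)}_j - x^{(b)}_j\right)\right)
```

For a single binary covariate, such as treatment versus control, this ratio reduces to \exp(\beta), the hazard ratio reported in clinical trials.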
Even with good statistical technique, bias and ambiguity in the data are major problems in survival analysis. Survival in disease might start from the time of diagnosis, but what do you do with people who die and are diagnosed after death? Survival with treatment might start at the beginning of therapy: What do you do with patients who die after the decision to start therapy but before the start of therapy, or who die from biologically unrelated causes (so-called "competing risks")? A more detailed discussion of issues in survival analysis may be found in leading epidemiology texts.²

A HISTORIC EXAMPLE FROM TRANSFUSION
Schinaia and colleagues reviewed the survival of patients with transfusion-transmitted human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS) in Italy in an article in TRANSFUSION in 1993.⁴ Specifically, they were interested in the survival of these patients after they developed transfusion-transmitted AIDS and whether it differed from that of patients who acquired HIV by other routes. They did not find significant differences in outcome by route of transmission; however, they did find differences by year of diagnosis. Figure 1 shows the results of their analysis of the effect of year of diagnosis of AIDS, dividing the population into those who developed AIDS before 1987, the year that the first drug therapy became available, and those who developed AIDS after that time. The survival curves begin to diverge at about 3 months, and by 30 months three times as many patients were alive in the later-diagnosed group.

A MORE RECENT EXAMPLE
Stanworth and his colleagues performed the Transfusion of Prophylactic Platelets Study comparing prophylactic platelets with no prophylactic platelets in adults with hematologic malignancy and published their results in the New England Journal of Medicine in 2013.⁴ They enrolled 600 patients with hematologic malignancies and without evidence of World Health Organization (WHO) Grade II or greater bleeding and randomized them to receive either an adult dose of prophylactic platelets when their first-morning platelet count was 10 × 10⁹/L or less or no prophylactic platelets. Patients were then followed through a course of chemotherapy or stem cell transplant for the development of new WHO Grade II or greater bleeding. Figure 2 shows the results of that clinical trial. The Kaplan-Meier curves are essentially superimposable over the first 5 days, diverge over Days 6 to 10, and then show an approximately 7% lower overall rate of new WHO Grade II bleeding in the prophylaxis group that persisted until the end of 30 days. The primary outcome of the trial was that the hazard of bleeding was 1.3 times greater for patients who did not receive prophylactic platelets (a hazard ratio of 1.3), which was statistically significant, with both confidence limits greater than 1 and a probability less than 0.05.

Volume 60, April 2020 TRANSFUSION 671

SUMMARY
Kaplan-Meier curves are an efficient and effective visual way to present information on time-to-event situations. From a user's point of view, the most important aspect is to understand clearly the population being described.