Survival analysis of
Skin Cutaneous Melanoma[SKCM]
A case study of The Cancer Genome Atlas,TCGA
作者
Kili
发布于
2024-08-14
摘要
癌症基因组图谱计划(The Cancer Genome,Atlas,TCGA)是美国的国家肿瘤研究所(National Cancer Institute,NCI)与国家人类基因研究所(National Human Genome Research Institute,NHGRI)共同开展的一个广泛而综合的合作计划。其目标是获取、刻画并分析人类癌症中大规模、多种变异的分子特征,且为癌症研究迅速地提供数据。选取发病于皮肤的皮肤黑色素瘤样本[SKCM],即Skin Cutaneous Melanoma作为生存分析对象.分析不同样本特征下的死亡率差别.
Melanoma is a cancer in a type of skin cells called melanocytes. Melanocyes are the cells that produce melanin, which colors the skin. When exposed to sun, these cells make more melanin, causing the skin to darken or tan. Melanoma can occur anywhere on the body and risk factors include fair complexion, family history of melanoma, and being exposed to natural or artificial sunlight over long periods of time. Melanoma is most often discovered because it has metastasized, or spread, to another organ, such as the lymph nodes. In many cases, the primary skin melanoma site is never found. Because of this challenge, TCGA is studying primarily metastatic cases (in contrast to other cancers selected for study, where metastatic cases are excluded). For 2011, it was estimated that there were 70,230 new cases of melanoma and 8,790 deaths from the disease
```{r}termplot(result, se = T, terms =1, ylabs ="Log hazard")```
代码
```{r}cox_fit <- result# Predictions will be for all levels of sex and ph.ecog, but only at the median# age.new_dat <-expand.grid(age_at_diagnosis =median(skcm$age_at_diagnosis),tumor_status =levels(skcm$tumor_status),clark_level_at_diagnosis =levels(skcm$clark_level_at_diagnosis), submitted_tumor_site.1 =levels(skcm$submitted_tumor_site.1)) %>%# strata is our key to join back to the fitted values.mutate(strata =as.factor(row_number()))# Create fitted survival curves at the covariate presets.fit_curves <-survfit(cox_fit, newdata = new_dat, data = skcm)# `surv_summary()` is like `summary()` except that it includes risk table info,# confidence interval attributes, and pivots the strata longer.surv_summary <-surv_summary(fit_curves) %>%# The cases are labeled "strata", but `survsummary()` doesn't label what the# strata are! Get it from new_dat.inner_join(new_dat, by ="strata")# Now use ggplot() just like normal.median_line <- surv_summary %>%filter(surv >= .5) %>%group_by(tumor_status, clark_level_at_diagnosis, submitted_tumor_site.1) %>%summarize(.groups ="drop", max_t =max(time))surv_summary %>%ggplot(aes(x = time, y = surv)) +geom_line(aes(color = clark_level_at_diagnosis)) +geom_ribbon(aes(ymin = lower, ymax = upper, color = clark_level_at_diagnosis, fill = clark_level_at_diagnosis),alpha =0.4 ) +geom_segment(data = median_line,aes(x =0, xend = max_t, y = .5, yend = .5),linetype =2, color ="gray40" ) +geom_segment(data = median_line,aes(x = max_t, xend = max_t, y =0, yend = .5, color = clark_level_at_diagnosis),linetype =2 ) +facet_wrap(facets =vars(tumor_status, submitted_tumor_site.1)) +scale_y_continuous(labels =percent_format(1)) +theme_light() +theme(legend.position ="top") +labs(X ="days", y ="Survival Probability", color =NULL, fill =NULL,title ="Cox fitted model",subtitle ="Age held at median." )```
gender 分1,2 submitted_tumor_site.1分1,2,3,4
4 模型验证
4.1 Assessing Goodness of Fit Using Residuals
4.1.1 Martingale and Deviance Residuals
代码
```{r}smoothSEcurve <-function(yy, xx) {# use after a call to "plot"# fit a lowess curve and 95% confidence interval curve# make list of x values xx.list <-min(xx) + ((0:100) /100) * (max(xx) -min(xx))# Then fit loess function through the points (xx, yy)# at the listed values yy.xx <-predict(loess(yy ~ xx),se = T,newdata =data.frame(xx = xx.list) )lines(yy.xx$fit ~ xx.list, lwd =2)lines(yy.xx$fit -qt(0.975, yy.xx$df) * yy.xx$se.fit ~ xx.list, lty =2)lines(yy.xx$fit +qt(0.975, yy.xx$df) * yy.xx$se.fit ~ xx.list, lty =2)}skcm1 <-select(skcm, age_at_diagnosis, tumor_status, clark_level_at_diagnosis, submitted_tumor_site.1)rr.final <-residuals(cox_fit, type ="martingale")par(mfrow =c(1, 2))plot(rr.final ~ skcm1$age_at_diagnosis)smoothSEcurve(rr.final, skcm1$age_at_diagnosis)title("Martingale residuals\nversus age")plot(rr.final ~ skcm1$tumor_status)title("Martingale residuals\nversus tumor_status")```
```{r}skcm2 <- skcm |>mutate(months =floor(days /30))d <-datadist(skcm2)options(datadist ="d")f <-cph(formula =Surv(months, status) ~ age_at_diagnosis + tumor_status + clark_level_at_diagnosis + submitted_tumor_site.1, data = skcm2, surv = T)med <-Quantile(f)surv <-Survival(f) # This would also work if f was from cphplot(nomogram(f, fun =function(x) med(lp = x), funlabel ="Median Survival Time"))nom <-nomogram(f,fun =list(function(x) surv(6, x),function(x) surv(12, x),function(x) surv(24, x),function(x) surv(36, x),function(x) surv(72, x) ),funlabel =c("6-Month Survival Probability","12-month Survival Probability","24-month Survival Probability","36-month Survival Probability","72-month Survival Probability" ))plot(nom, xfrac = .7)```
5.3 calibration plot
代码
```{r}#| fig-cap: Calibration curve for 2 year survival. X-axis shows the nomogram predicted probability, while the Y-axis is actual 2-year survival as estimated by the Kaplan-Meier method.d <-datadist(skcm)options(datadist ="d")f <-cph(formula =Surv(days, status) ~ age_at_diagnosis + tumor_status + clark_level_at_diagnosis + submitted_tumor_site.1, data = skcm, x =TRUE, y =TRUE, surv =TRUE, time.inc =365)pa <-requireNamespace("polspline")if (pa) { cal <-calibrate(f, u =365, B =20) # cmethod='hare'plot(cal)}```
Using Cox survival estimates at 365 Days
Calibration curve for 2 year survival. X-axis shows the nomogram predicted probability, while the Y-axis is actual 2-year survival as estimated by the Kaplan-Meier method.