I follow r-bloggers regularly. This post about survival analysis popped up recently. It is a very concise and readable intro to the topic – dealing mostly with theory, and then just touching on the commands in R used to implement survival models. There are several references as well if you want to chase that particular rabbit down that particular hole.
Breast Cancer Survival in the SEER Data
SEER, a program of the National Cancer Institute, records and tracks cancer cases in 18 mostly urban areas across the country, covering roughly 28% of the U.S. population. They have been doing this since 1973, and the data are available for the asking.
Since my prior publication in the Journal of Insurance Medicine on the impact of micrometastases on breast cancer survival, I have been waiting for the SEER data to age enough to determine if immunohistochemically detected tumor cells in the lymph nodes (so-called “isolated tumor cells” or ITCs) actually affect prognosis.
Recently I looked into the data and was pleased to find that there were 1,379 cases with ITCs among all stage I or II cases (AJCC 6th edition) with no nodal ‘macro’ metastases (N0) and no distant metastases (M0). There were another 22,731 who had been tested and were negative for ITCs. Additionally, from 2004 forward there were 36,530 who had not had the testing done.
I used Cox models to evaluate the possible risks of these ITCs. In each of them I used restricted cubic splines for age, and included sex and T stage as covariates. The findings were pretty surprising. When the women who had not had testing were included, both positive and negative ITC tests appeared protective relative to no testing (HR 0.67 and 0.72, respectively).
Since this could have been due to ‘informative missingness’ (meaning the test was not done because of good prognosis or some other beneficial factor not captured by the other covariates), I tried another fit with only the women who had the test done. This did not change the overall picture: the group with a positive test had an HR of 0.94 compared to the group with negative testing, a statistically nonsignificant difference (p=0.68).
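For readers who want to see the shape of the two fits described above, here is a minimal sketch in R. It is not my actual analysis code: the data are simulated with no built-in ITC effect, every column name (age, sex, tstage, itc, time, status) is a hypothetical stand-in for the SEER fields, and ns() from the splines package is used as one way to get a natural (restricted) cubic spline basis for age. The number of spline degrees of freedom is also just an assumption.

```r
library(survival)  # coxph(), Surv()
library(splines)   # ns(): natural (restricted) cubic spline basis

set.seed(1)
n <- 3000

# Simulated stand-in for the SEER extract; all names and values are hypothetical.
d <- data.frame(
  age    = round(runif(n, 30, 85)),
  sex    = factor(sample(c("Female", "Male"), n, replace = TRUE,
                         prob = c(0.95, 0.05))),
  tstage = factor(sample(c("T1", "T2", "T3"), n, replace = TRUE,
                         prob = c(0.65, 0.30, 0.05))),
  itc    = factor(sample(c("Not tested", "Negative", "Positive"), n,
                         replace = TRUE, prob = c(0.60, 0.30, 0.10)),
                  levels = c("Not tested", "Negative", "Positive"))
)

# Exponential event times whose hazard depends on age only, plus random censoring.
rate       <- 0.02 * exp(0.04 * (d$age - 60))
event_time <- rexp(n, rate)
cens_time  <- runif(n, 1, 10)
d$time     <- pmin(event_time, cens_time)
d$status   <- as.integer(event_time <= cens_time)  # 1 = death, 0 = censored

# Fit 1: everyone, with "Not tested" as the reference level of itc.
fit_all <- coxph(Surv(time, status) ~ ns(age, df = 3) + sex + tstage + itc,
                 data = d)

# Fit 2: tested women only, so "Negative" becomes the reference level.
tested     <- droplevels(subset(d, itc != "Not tested"))
fit_tested <- coxph(Surv(time, status) ~ ns(age, df = 3) + sex + tstage + itc,
                    data = tested)

# Hazard ratios for the ITC terms, and the p-value for positive vs. negative.
hr_all    <- exp(coef(fit_all))[c("itcNegative", "itcPositive")]
hr_tested <- exp(coef(fit_tested))["itcPositive"]
p_tested  <- summary(fit_tested)$coefficients["itcPositive", "Pr(>|z|)"]

print(hr_all)
print(hr_tested)
print(p_tested)
```

Because the simulated hazard ignores ITC status, the hazard ratios here should hover around 1. The point is only the model structure: the same formula fit twice, once on the full cohort and once on the tested subset, with the reference level of the ITC factor changing between the two.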
One obvious factor missing from this analysis is treatment. It is quite likely that the women with ITCs were treated more aggressively than their counterparts who had no testing. Nonetheless, the results here imply that, within the current milieu of testing and treatment, women with ITCs do just as well as women without them, and better than those who were never tested.
You can view my R-code here and my SEER*Stat query here (on my Google drive site – you may not be able to navigate here if behind a firewall). If I can expand this out a bit more it may be the basis for a future JIM submission.
It Begins
Today begins my stint as a consultant. It took some time, a lot of anguish, and some sacrifice, but I am really looking forward to the challenge. I am fortunate to have met many great folks in the life insurance industry, and now I will (hopefully) have the opportunity to work with even more of them.
My intent for this site is to post fairly frequently on topics related to insurtech, mortality studies, survival analysis, R and RStudio, and whatever else may come around.
One of my first priorities for the site will be to develop a list of resources for those who want to learn more about mortality analysis. I will post it here once it is ready.