Mortality Analysis – The Next Level

 

I just returned from Seattle, where I was attending the AAIM Triennial Conference. For the 3rd time, I taught the Basic Mortality Methodology Course. It went very well and was well received by the group. Further, there were a couple of excellent sessions on analytics and big data applied to life insurance.

During this conference I was asked by several attendees and colleagues how they might further their study and understanding of mortality analysis. Some even broadened that to ask about data analysis in general, because it is becoming a more prominent skill in the world of life insurance.

So, in this post I will cite some of my favorite resources. Some are aimed at delivering a high-level understanding of analytics, while others are more technical, teaching you how to actually perform the analyses.

A lot of the books I mention are available as a free pdf or as a ‘pay what you want’ download. This is a testament to the open nature of much of the community that uses R and other types of free software. If you can, please consider paying something so that these fantastic resources continue to do their excellent work.

First, a bit of self-promotion. This is a link to 3 videos which I made for a seminar called “Mortality Analysis with Modern Tools”. They contain a refresher on basic, ‘classical’ mortality analysis, a tutorial on the use of SEER*Stat to gather cancer survival data, and a brief introduction to R, the free statistical programming language, and RStudio, a software tool which makes using R much easier.

The password is AAIM2016.
Books:
Life Expectancy in Court: This book is a very clear, concise and easy-to-understand look at the use of actuarial analysis in the determination of life expectancy in a legal setting. It relies only on pencil-and-paper methods which are easily translated to spreadsheet tools.
Intro to Statistical Learning (this site has the pdf as well as links to slides and videos): This is a methodological textbook, and a toned-down version of the very academic text “Elements of Statistical Learning”. It contains tons of great examples with R code and detailed vignettes. There is a set of free videos by the authors and a robust community of folks who have attempted and completed the exercises at the back of each chapter.
Reckoning with Risk: This one is more for a lay audience – great for underwriters but also very clarifying for anyone else. The author really brings home the difference between common measures of test performance like specificity and sensitivity and the more important measures of real-world use like positive predictive value. Definitely a must-read if you break out in hives when someone shows you a 2×2 table.
R for Data Science: A bit more technical here – this one is not about statistics, but rather the manipulation, cleaning and display of data, which are so integral to any analytic endeavor. I highly recommend the ‘tidy’ approach as outlined in this text.
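To make the point from Reckoning with Risk concrete, here is a minimal sketch in Python (not taken from the book) of how positive predictive value follows from sensitivity, specificity and prevalence via Bayes' rule. The function name and the numbers (90% sensitivity, 90% specificity, 1% prevalence) are hypothetical, chosen only for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that someone with a positive test truly has the condition."""
    # Share of the whole population that is diseased AND tests positive
    true_pos = sensitivity * prevalence
    # Share of the whole population that is healthy AND tests positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, for illustration only
ppv = positive_predictive_value(sensitivity=0.90, specificity=0.90, prevalence=0.01)
print(f"PPV = {ppv:.1%}")
```

With these made-up inputs, only about 8% of positive results are true positives. The test looks good on paper, but because the condition is rare, false positives from the large healthy group swamp the true positives.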
Courses:
Coursera Data Science: This is a series of 10 online courses which are available for a nominal fee. They provide an excellent introduction to the use of R as well as a fairly detailed look at the statistics underlying typical analytic methods like linear and logistic regression. I would say that 7 or 8 out of the 10 are fairly easy – though some have time-consuming homework. The others are pretty challenging. I leave it up to you to decide which ones. The authors include several biostatisticians from the Johns Hopkins School of Public Health.
Chromebook Data Science: I have not taken these, but I plan on checking out at least a couple of them. The series is given by Jeff Leek – one of the aforementioned JHU biostatisticians – and focuses on the use of a simple web-enabled laptop to perform serious data analysis by making the most of web-based resources like AWS.
Regression Modeling Strategies: This is a book, but also a short (one-week) course offered at Vanderbilt University by the eminent statistician Frank Harrell. This is the one for you if you feel like you might actually know something and are ready to find out otherwise. I took this one a few years ago – it was excellent, but also very challenging. Dr. Harrell also has a great blog and his book is an excellent resource.
There are so many more of these that it would be folly for me to attempt to review them all.
Websites:
R-bloggers: A repository of web log posts from around the internet which deal with R and various other statistical issues. Their top-ten list is very useful, but you can also discover other nuggets of gold on any given visit.
KDnuggets: Speaking of nuggets, this site offers a wide array of articles relevant to data analysis. Its scope is broader than R-bloggers, so you may need to dig a little deeper to find what you want. But FYI, they have much more comprehensive reviews of courses and books than I am offering here.
Personal Activity:
I can’t emphasize this enough. The best way to gain and retain mortality analysis skills is to actually practice those skills on real-world data. So go ahead – analyze your department’s workflow, analyze the mortality risks you find in a relevant article, assign yourself the task of updating your company’s underwriting manual on a topic that interests you, or write a mortality abstract for publication. But find something that takes the task out of the theoretical realm and into the real world. You will find that your knowledge of the topic does the same thing.

Book Review 2: Predictive Analytics

Another book review. Well, I had to give some talks recently – one on predictive analytics and another on genetics – and so I read The Gene by Dr. Siddhartha Mukherjee, which I review here, and Predictive Analytics by Eric Siegel.

I really admire how both of these authors can cover what are really very technical topics in plain language. Where Dr. Mukherjee’s style emphasizes scientific rigor, inspiration and wonder, Mr. Siegel tends more toward the astounding or the entertaining. He covers multiple relevant topics and uses copious examples to illustrate predictive analytics and the insights it can provide. He goes into especially great detail with IBM Watson’s performance on the game show Jeopardy! There is quite a lot to unpack in that example, from language processing to the appropriate selection of candidate answers to the arrival at a final probability of being correct.

Additional technical topics include linear and logistic regression, decision trees and ensembles.

Overall this is an excellent book and would be very valuable for those who need to use the results of predictive analyses in their business. It will not, however, enable you to perform these analyses yourself – it is just not that kind of text.

Book Review: The Gene

Because of some recent plane travel and a conference at a golf resort (I don’t play golf), I had some down time in which to catch up with my reading. I had been working on The Gene: An Intimate History, by Dr. Siddhartha Mukherjee. It is a 600+ page epic, which means it does not travel well in its native, paper form. Actually, that is the worst thing I can say about it.

The book takes a deep, wide look at the history of genetics from the very beginning of man’s study of heredity (yes, there are pea plants) to the damaging social effects of mid-century eugenics. What I found eye-opening here was the widespread enthusiasm in America for eugenics by forced sterilization at the time. This was illustrated by the famous Buck v. Bell case, where none other than the luminary judge Oliver Wendell Holmes ruled that the forced sterilization of the “weak minded” was permissible and did not violate the 14th Amendment.

This was one of a few chapters in the book that dealt with the wider social impact of heredity and genetics. Mostly, though, the book is concerned with the history of scientific discoveries. Considerable time is spent with Mendel, Darwin, Crick, Watson, Franklin, and Berg as well as modern-day innovators like Venter, Collins and Doudna (developer of the CRISPR-Cas9 technology).

The overall tone of the book is one of cautious optimism, but it does not fail to point out significant concerns about the under-regulated use of CRISPR on human fetal tissue or germ cells. The author also points out that genomic science is still in its infancy, and that the complex interplay of multiple genes in disease is as yet poorly understood.

Overall, I found this to be a riveting look at nearly all aspects of genetics. It is, perhaps, a difficult read for those without a background in science or medicine – but only for a few particularly dense pages.