The latest issue of JAMA arrived in my snail-mail box this week, and it contained an editorial about machine learning in healthcare. This was, no doubt, a follow-up to a recent issue of JAMA that contained two articles about deep learning algorithms that performed well in the detection of diabetic retinopathy and metastatic cancer.
This editorial does an excellent job of balancing the exciting potential of such algorithms with realistic expectations. Further, it succinctly encapsulates the various tiers of algorithmic complexity. Perhaps most interestingly, the editorial contains a figure which maps out many familiar and not-so-familiar algorithms onto a chart of human involvement vs. data volume.

One of the cited examples is a study of coronary risk scoring algorithms developed by examining the electronic records of primary care patients in England. The authors used several techniques (random forest, logistic regression, gradient boosting, and neural networks) and compared performance to a classic risk score from the American Heart Association. They found that all of the machine learning algorithms outperformed the risk score. The intriguing part to me was that logistic regression performed nearly as well as neural networks, and quite a bit better than random forest. Two reasons why that is interesting: 1) I am a nerd, 2) neural nets and random forest are notoriously ‘black box’, while logistic regression is very clear about how the risk prediction is affected by the input variables (so-called ‘glass box’).
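To make the ‘glass box’ point concrete, here is a minimal sketch of that kind of comparison. This is not the study’s actual pipeline; it uses synthetic data (via scikit-learn’s `make_classification`) rather than the English EHR cohort, and only two of the four techniques, but it shows both the head-to-head AUC comparison and the part random forests can’t give you directly: coefficients you can read off as risk relationships.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patient cohort: 10 predictors, binary outcome
# (e.g. cardiovascular event / no event).
X, y = make_classification(n_samples=5000, n_features=10, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 'Black box': random forest.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# 'Glass box': logistic regression, with predictors standardized so the
# coefficients are directly comparable to one another.
lr = make_pipeline(StandardScaler(), LogisticRegression())
lr.fit(X_train, y_train)

# Head-to-head discrimination, measured by area under the ROC curve.
for name, model in [("random forest", rf), ("logistic regression", lr)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")

# The 'glass box' payoff: each coefficient is the change in log-odds of the
# outcome per one-standard-deviation increase in that predictor.
coefs = lr.named_steps["logisticregression"].coef_.ravel()
for i, c in enumerate(coefs):
    print(f"predictor {i}: log-odds per SD = {c:+.2f} "
          f"(odds ratio {np.exp(c):.2f})")
```

The last loop is the whole argument in miniature: you can hand a clinician an odds ratio per predictor and they can sanity-check it against what they know, whereas explaining a forest’s prediction requires bolting on a separate interpretation tool.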
The bottom line is this: it is possible to find models that perform extremely well while still maintaining clarity.
You can find the full article here.