I googled "correlation causation 2020", and one of the top links
here is an article "Artificial intelligence can spot when correlation means causation" on the University College London website, which is a non-specialist piece about a February 2020 paper (the paper itself, "Integrating Overlapping Datasets Using Bivariate Causal Discovery", is
here). I'm only guessing this is the article mentioned in the New Scientist April 2020 issue, though the author of the New Scientist article (
here, mostly behind a paywall) has the same name as one of the authors of the paper, which would be a remarkable correlation if uncaused.
It seems to have something to do with causes being earlier, and so less complex, than their effects, but I don't understand it. Quoting much of the non-specialist piece:
A new artificial intelligence (AI) has allowed researchers at UCL and Babylon Health, for the first time, to demonstrate a useful and reliable way of sifting through masses of correlating data to spot when correlation means causation.
By fusing old, overlapping and incomplete datasets, this new method, inspired by quantum cryptography, paves the way for researchers to glean the results of medical trials that would otherwise be too expensive, difficult or unethical to run. The research is being published at the prestigious and peer-reviewed Association for the Advancement of Artificial Intelligence (AAAI) conference in New York.
[...]
Dr Ciarán Lee, Senior Research Scientist at Babylon and Honorary Senior Research Associate at UCL Physics & Astronomy, explained: "Scientists have it hammered into them that correlation does not mean causation; ice-cream sales don't cause sunburn despite rates of both shooting up during the summer. To find the exact cause of sunburn we whittle down or control as many variables as possible. Then when our datasets show that a change in sun exposure matches a change in sunburn, we can be confident the sun exposure was the causative variable. The problem is the real world is rarely neat and tidy and it can be really hard to control all the variables and work out which is causative."
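To make the confounder point concrete for myself I wrote a little numpy toy (my own, nothing from the paper): ice-cream sales and sunburn correlate only because sun exposure drives both, and regressing out sun exposure makes the correlation go away.

```python
# Toy illustration (mine, not the paper's): ice-cream sales and sunburn are
# both driven by sun exposure, so they correlate even though neither causes
# the other; "controlling for" the confounder makes the correlation vanish.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

sun = rng.normal(size=n)                    # sun exposure: the confounder
ice_cream = 2.0 * sun + rng.normal(size=n)  # sales driven by sun, plus noise
sunburn = 1.5 * sun + rng.normal(size=n)    # sunburn driven by sun, plus noise

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def remove_linear_effect(y, x):
    # Regress out x from y (a crude way of controlling for x).
    slope = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
    return y - slope * x

print("raw corr(ice cream, sunburn):   ", round(corr(ice_cream, sunburn), 3))
print("corr after controlling for sun: ",
      round(corr(remove_linear_effect(ice_cream, sun),
                 remove_linear_effect(sunburn, sun)), 3))
```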
Scientists started looking for other ways to help spot causative variables. A theory born from physics suggests that everything becomes more disordered and complicated with time, so a cause should be less disordered and complex than its effect. Dr Lee said: "If you take your dataset and give each of the variables a complexity rating you can work backwards and spot which one is the cause. But that just helps for that one dataset - we wanted to see if there was a way of combining datasets, ones with gaps or where researchers were asking different questions to what they're interested in now. That could be a game-changer."
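If I squint, the "complexity rating" idea sounds like the additive-noise style of bivariate causal discovery: fit each direction with the same model class and prefer the direction whose residuals look like structureless noise. That is only my guess at the flavour, not what the paper actually does, but here is a toy version:

```python
# Toy "which direction is simpler?" heuristic, NOT the paper's method:
# fit Y from X and X from Y with the same model class; in the true causal
# direction the residuals look like plain noise that carries no information
# about the input, in the reverse direction they don't.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

x = rng.uniform(-3, 3, size=n)        # cause
y = x + x**3 + rng.normal(size=n)     # effect = nonlinear function of cause + noise

def residual_structure(inp, out, degree=5):
    """Fit out ~ poly(inp); return how strongly the residual size depends on inp."""
    inp_s = (inp - inp.mean()) / inp.std()   # standardise for a well-conditioned fit
    out_s = (out - out.mean()) / out.std()
    coeffs = np.polyfit(inp_s, out_s, degree)
    resid = out_s - np.polyval(coeffs, inp_s)
    # ~0 if the residuals are homogeneous noise; larger if their magnitude
    # varies systematically with the input.
    return abs(np.corrcoef(resid**2, inp_s**2)[0, 1])

forward = residual_structure(x, y)    # score for X -> Y
backward = residual_structure(y, x)   # score for Y -> X
print(f"residual structure, X->Y: {forward:.3f}")
print(f"residual structure, Y->X: {backward:.3f}")
print("inferred direction:", "X -> Y" if forward < backward else "Y -> X")
```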
Dr Lee was inspired by quantum cryptography. The strange laws of quantum physics mean that two users can send a message and then use a mathematical formula to prove whether someone else is eavesdropping on their conversation. Dr Lee realised that datasets could work in a similar way, by treating a potential causative variable from another dataset as the eavesdropper. "If one dataset shows us that obesity causes heart disease, and another shows vitamin D causes obesity, we can use a mathematical formula to prove whether vitamin D causes obesity or not. This is what our AI is doing."
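The "combine overlapping datasets with a formula" part I can only half-guess at. The paper apparently uses entropy-based bounds inspired by quantum cryptography; the closest elementary thing I know is that two datasets sharing a variable already constrain a correlation you never measured jointly, just from the full correlation matrix having to be positive semidefinite. This is definitely not the paper's criterion, but it shows the kind of thing a cross-dataset formula can do:

```python
# Toy flavour of "combining overlapping datasets", NOT the paper's criterion.
# Dataset A records (vitamin_d, obesity); dataset B records (obesity,
# heart_disease); the pair (vitamin_d, heart_disease) is never measured
# together, yet positive semidefiniteness of the 3x3 correlation matrix
# still pins down the range that unmeasured correlation must lie in.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Ground truth (for the simulation only): vitamin_d -> obesity -> heart_disease
vitamin_d = rng.normal(size=n)
obesity = 0.6 * vitamin_d + rng.normal(size=n)
heart_disease = 0.7 * obesity + rng.normal(size=n)

# Two overlapping datasets, each missing one variable
r_vo = np.corrcoef(vitamin_d[: n // 2], obesity[: n // 2])[0, 1]      # dataset A
r_oh = np.corrcoef(obesity[n // 2:], heart_disease[n // 2:])[0, 1]    # dataset B

# Bounds on the unmeasured corr(vitamin_d, heart_disease) implied by
# requiring the full correlation matrix to be positive semidefinite
slack = np.sqrt((1 - r_vo**2) * (1 - r_oh**2))
lo, hi = r_vo * r_oh - slack, r_vo * r_oh + slack
print(f"measured: corr(vit D, obesity) = {r_vo:.2f}, corr(obesity, heart) = {r_oh:.2f}")
print(f"unmeasured corr(vit D, heart) must lie in [{lo:.2f}, {hi:.2f}]")
print(f"value implied by a pure chain through obesity: {r_vo * r_oh:.2f}")
```

In this toy the chain vitamin D → obesity → heart disease makes the implied value exactly the product of the two measured correlations; I assume the paper's entropy bounds play some analogous consistency-check role, which is what I'd like to have explained.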