Why machine learning is a red flag when it comes to chronic illness research

Groups like Bruce Patterson / IncellDX and Amatica Health are promising to use machine learning to analyze patient data.

That would be fine if hundreds of thousands of patients were providing data. It is a problem if only tens of patients are providing data, because data inefficiency is a known weakness of machine learning: models usually need a very large number of examples before they learn anything reliable.
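To make the small-sample problem concrete, here is a minimal sketch in Python. It is not any group's actual method. The numbers (20 patients, 2,000 patients, 200 biomarkers) are made up for illustration, and every value is pure random noise, so any "signal" the code finds is spurious by construction.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_correlation(n_patients, n_markers, rng):
    """Strongest |correlation| between a random outcome and random markers.

    Both the outcome and the markers are independent noise (hypothetical
    data), so the true correlation of every marker is zero.
    """
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    best = 0.0
    for _ in range(n_markers):
        marker = [rng.gauss(0, 1) for _ in range(n_patients)]
        best = max(best, abs(pearson(marker, outcome)))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
small = best_spurious_correlation(n_patients=20, n_markers=200, rng=rng)
large = best_spurious_correlation(n_patients=2000, n_markers=200, rng=rng)

print(f"best 'biomarker' with 20 patients:   |r| = {small:.2f}")
print(f"best 'biomarker' with 2000 patients: |r| = {large:.2f}")
```

With only 20 patients, the best-looking marker shows a far stronger correlation than with 2,000 patients, even though none of the markers carries any real information. The more things you measure on a small group, the easier it is to find a pattern that is not there.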

Noam Brown, an OpenAI researcher, explains data efficiency.

“I think there’s general agreement that one of the major issues with AI today is that it’s very data inefficient.

It requires a huge number of training examples to be able to train. Look at an AI that plays Go: it needs millions of games of Go to learn how to play the game well, whereas a human can pick it up in, like...

I don’t know, how many games does a human Go player, a Go grandmaster, play in their lifetime? Probably, you know, in the thousands or tens of thousands, I guess.

So that’s one issue: overcoming this challenge of data efficiency. And this is particularly important if we want to deploy AI systems in real-world settings.”

Actual machine learning in practice

There are many healthcare problems where you can get massive amounts of data. For example, doctors order X-rays and MRI scans to look for cancer and other health problems. Machine learning and deep learning techniques have been tried for analyzing these medical images.

Both IBM and Google have stopped their efforts in this space, despite having a lot of machine learning expertise. IBM was a pioneer in chess-playing computers (Deep Blue, if you’re old enough to remember it) and in crushing humans at Jeopardy! Google’s DeepMind unit was crushing humans in Go, StarCraft, etc.

Maybe the future will be different. (I personally think that computers will eventually get there.) However, the sailing has not been smooth so far.

Techniques other than machine learning

I wouldn’t consider IncellDX’s approach to be ‘deep learning’ or ‘machine learning’, even though their paper cites other academic papers where deep learning is used.

Let’s just say this: there are old statistical analysis techniques that you can simply rebrand as machine learning.

Basically, they’re doing statistical analysis, and many ML (machine learning) researchers would not consider what they’re doing to be ML. Calling it machine learning is misleading, and that’s why patients should be careful.




*Note: IncellDX has nothing to do with incels (incel = involuntarily celibate; an individual who believes that society is conspiring against them to prevent them from having sex).