Extreme Imbalanced Data — The Worst Data Scientist Nightmare

And the Accuracy Trap

Carla Martins
Published in CodeX
3 min read · Jun 17, 2022


Photo by Luke Chesser on Unsplash

Data is imbalanced when one class of the target variable occurs much less frequently than the other(s). A common example is cancer detection: if we have 10,000 lab results and only 1% of them are positive for cancer, our data is extremely…
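To make the "accuracy trap" from the subtitle concrete, here is a minimal sketch built on the numbers above (10,000 lab results, 1% positive). The labels are synthetic and the always-negative "model" is a hypothetical baseline chosen for illustration, not a method taken from this article. It shows how a classifier that never detects a single cancer case can still report a very high accuracy.

```python
# Synthetic labels matching the example: 10,000 lab results, 1% positive.
# These values are assumptions for illustration, not real data.
n_total = 10_000
n_positive = n_total // 100  # 100 positive results

y_true = [1] * n_positive + [0] * (n_total - n_positive)

# Hypothetical baseline "model": always predict negative (0).
y_pred = [0] * n_total


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model actually found."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)

print(f"accuracy: {acc:.2%}")  # accuracy: 99.00%
print(f"recall:   {rec:.2%}")  # recall:   0.00%
```

The baseline is right on all 9,900 negative results and wrong on all 100 positive ones, so accuracy alone hides the fact that every cancer case was missed. A metric that looks at the minority class, such as recall, exposes the problem.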

