Friday, May 17, 2013

TL;DR: Bangladesh is not racist.

This is a response to Max Fisher’s blog post in the Worldviews section of the Washington Post website. Fisher used data from the World Values Survey to show the most and least racist countries of the world through a map visualization. However, the data Fisher used shows Bangladesh as the second most racist country in the world. Being from Bangladesh, that hit me a bit. So I went to Fisher’s source and had a look at the data. Here’s the short version if you don’t want to read along: Bangladesh isn't racist.

Fisher picked the question that asked respondents whether they would not want someone of a different race as a neighbor. (In short, Yes = racist, No = not racist. Very simple. /s) Now, the World Values Survey collected data in five waves, or sets, which happened at different times. Each time, data was collected from a different set of countries. There is data from 100 countries in total, with each wave covering between 23 and 66 countries. Only 5 countries were surveyed in every wave, while 36 countries were surveyed only once. That means, for comparison purposes, we have 64 countries with more than one data point, i.e. countries that were surveyed in at least two different time periods.
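For anyone who wants to redo this kind of count, here is a minimal sketch of how I would separate countries surveyed once from those surveyed more than once. The record layout (a list of country and wave pairs), the function names and the sample records are my own assumptions for illustration, not the actual World Values Survey file format.

```python
from collections import defaultdict


def waves_per_country(records):
    """Map each country to the set of waves in which it was surveyed."""
    waves = defaultdict(set)
    for country, wave in records:
        waves[country].add(wave)  # a set ignores duplicate rows
    return dict(waves)


def split_by_coverage(records):
    """Return (countries surveyed once, countries surveyed more than once)."""
    waves = waves_per_country(records)
    single = sorted(c for c, w in waves.items() if len(w) == 1)
    multiple = sorted(c for c, w in waves.items() if len(w) > 1)
    return single, multiple


# Invented records, purely for illustration: (country, wave number).
records = [("A", 1), ("A", 2), ("B", 3), ("C", 2), ("C", 5), ("C", 5)]
single, multiple = split_by_coverage(records)
print(single, multiple)
```

Only the countries in the second list can be compared across time, which is why the 64 countries above matter for everything that follows.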

Now, if we take only the latest data for each of the 100 countries, we end up with these top countries that rate really high on the racist scale:

So far, so good. However, this is just one data point: only the latest one for each country. Among these top 10 countries, 5 had more than one data point. Looking at those countries, we find that Bangladesh had a significant jump from 1996 to 2002 (below). In 6 years, it went from 17% to 72%. I find that highly unlikely. In those 6 years, there was no major social, political or even economic change in the country that would make 3 out of every 4 people racist.
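The "latest data point only" ranking described above can be sketched roughly like this. The tuple layout and function names are assumptions of mine, and apart from the two Bangladesh values quoted in this post, the rows are invented placeholders rather than real survey figures.

```python
def latest_per_country(records):
    """Keep, for each country, the (year, share) of its most recent survey."""
    latest = {}
    for country, year, share in records:
        if country not in latest or year > latest[country][0]:
            latest[country] = (year, share)
    return latest


def top_n(records, n):
    """Rank countries by their latest share, highest first, and keep n."""
    latest = latest_per_country(records)
    ranked = sorted(latest.items(), key=lambda item: item[1][1], reverse=True)
    return [(country, year, share) for country, (year, share) in ranked[:n]]


# (country, survey year, % answering "yes").
records = [
    ("Bangladesh", 1996, 17),  # value quoted in this post
    ("Bangladesh", 2002, 72),  # value quoted in this post
    ("Country X", 1995, 30),   # invented placeholder
    ("Country X", 2006, 25),   # invented placeholder
    ("Country Y", 2005, 40),   # invented placeholder
]
ranking = top_n(records, 2)
print(ranking)
```

Note how the 1996 value for Bangladesh never enters the ranking. A ranking built this way cannot tell a real change from a data problem in the most recent wave.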


Digging further, we check whether any other country outside the top 10 had such a severe increase. If we plot the latest year’s value on the x-axis and the earliest year’s value on the y-axis for all 64 countries with multiple data points, we get a graph like the following. Only some of the countries are labeled, with the country name and the difference between the earliest and latest year.

The graph speaks for itself. That point on the far right should not be there. I would definitely like to assume that the data for Bangladesh has some mistake in it.
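The numbers behind such a graph could be computed along these lines. This is a sketch under my own assumptions (record layout, function names), not the exact steps I used, and every row other than the two Bangladesh values is an invented placeholder.

```python
def earliest_and_latest(records):
    """For countries with 2+ survey years, return (earliest, latest, change)."""
    by_country = {}
    for country, year, share in records:
        by_country.setdefault(country, []).append((year, share))

    result = {}
    for country, points in by_country.items():
        if len({year for year, _ in points}) < 2:
            continue  # a single data point cannot show a change
        points.sort()  # sorts by year first
        earliest, latest = points[0][1], points[-1][1]
        result[country] = (earliest, latest, latest - earliest)
    return result


def largest_changes(records):
    """Order countries by the size of the change, biggest swing first."""
    changes = earliest_and_latest(records)
    return sorted(changes.items(), key=lambda item: abs(item[1][2]), reverse=True)


# (country, survey year, % answering "yes").
records = [
    ("Bangladesh", 1996, 17),  # value quoted in this post
    ("Bangladesh", 2002, 72),  # value quoted in this post
    ("Country X", 1995, 30),   # invented placeholder
    ("Country X", 2006, 25),   # invented placeholder
    ("Country Y", 2005, 40),   # invented placeholder, surveyed once
]
for country, (earliest, latest, change) in largest_changes(records):
    print(country, earliest, latest, change)
```

Each (latest, earliest) pair is one point on the plot, and sorting by the absolute change is a quick way to surface a swing like Bangladesh's 55 percentage points before trusting the ranking.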

While working on this, I found a very detailed explanation of how the Bangladesh data was messed up. Go through this post by Ashirul Amin to see how the answer keys were reversed in the 2002 survey. Ashirul was very thorough in his investigation and found the root cause (with scans of the questionnaire!). Kudos to him!

I enjoyed viewing Fisher's visualization and reading his blog post. This was a good initiative and a somewhat insightful one. I just wish the data had been cleaner, or at least that someone had tried to explain the data and the extreme values before publishing it. I guess this is just another example of how unsanitized data can mess up your analysis and headline findings.

PS: Here's the data if you want to play around with it.