Far too many infographics are more about graphics than info. Visually interesting data is often prioritized over visually accurate data. As an extension to my earlier thoughts about the growth of infographics, I wanted to offer a little more scrutiny on how data points are being shown.
“Facts are stubborn things, but statistics are more pliable.”
~ Mark Twain
Let’s take journalism as an analogy. Journalists hold themselves and their peers to a renowned standard of integrity and ethics. With this in mind, imagine you’re reading a recap of a soccer match. In it, a player is quoted talking about the goalkeeper, saying…
“I thought he did a tremendous job for us out there. We were down 3-0 early and he shook it off. He was perfect for us the rest of the match and we were able to battle back for a huge win.”
But later you heard an audio recording of his actual words, which were…
“I thought he did a great job for us out there. We were down 2-0 early and he shook it off. He was solid for us the rest of the match and we were able to battle back for a big win.”
Though it sounds more dramatic, that first version is just wrong. He didn’t say “tremendous.” He said “great,” which isn’t as strong. And “perfect” is clearly an adjective of higher degree than “solid.” The deficit was 2-0, not 3-0. The basic message is the same in both versions, but one is true and one isn’t. This just isn’t done in their profession. But data visualizers, unintentionally or otherwise, do this all the time. Here’s how.
Nonlinear data scaling
One of the most common blunders is distorting a data point’s relative magnitude by using a visual device that does not scale the way the designer thinks it does. Take a look at the example below.
All this needs to do is illustrate the percentages in true fashion. But it doesn’t do that. This graphic makes it appear as though calls to “receive product or service” are seven times as frequent as “file complaint” calls (its slice is about seven times larger). But that’s not the case at all. According to the data, the “receive product or service” calls are only about three times as frequent.
Not that bad, you say? Then have a look at this sham.
Wow, having any kind of accent at all seems to be the kiss of death when it comes to credibility. This old trick resets the baseline value as something other than zero. Here it’s 6.3. Combined with the nonlinear scaling from above, this shenanigan really takes things out of whack.
Using proportional devices and showing true distance from zero shows us a much more accurate visual interpretation of reality.
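To see how much a shifted baseline alone can exaggerate a difference, here is a minimal Python sketch. The 6.3 baseline comes from the graphic above; the two data values (7.0 and 7.7) and the function name `drawn_ratio` are hypothetical, chosen only for illustration.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b
    when both bars start at `baseline` instead of zero."""
    return (b - baseline) / (a - baseline)

# Hypothetical values, not taken from the original graphic.
low, high = 7.0, 7.7

true_ratio = drawn_ratio(low, high)            # bars measured from zero
truncated_ratio = drawn_ratio(low, high, 6.3)  # bars measured from 6.3

print(f"ratio with zero baseline: {true_ratio:.2f}")
print(f"ratio with 6.3 baseline:  {truncated_ratio:.2f}")
```

With these made-up numbers, a 10% difference in the data is drawn as a bar twice as long.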
Now it seems that callers who check their order status or file a complaint aren’t so insignificant anymore. And those accents? Not quite the detriment they once seemed.
The distortion factor is my calculation based on Edward Tufte’s “lie factor” as he describes in his book, The Visual Display of Quantitative Information, 2nd edition. A factor of 3.0 means the visual representation of the data appears three times larger (or smaller) than the real data itself.
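As a rough sketch of how such a factor could be computed (an assumption about the arithmetic, not necessarily the exact calculation used for the figures here), divide the ratio the graphic shows by the ratio the data supports. The function name `distortion_factor` is hypothetical.

```python
def distortion_factor(visual_ratio, data_ratio):
    """How many times larger (or smaller) the drawn comparison is than the data.

    Assumption: both ratios compare the same two data points,
    larger over smaller.
    """
    factor = visual_ratio / data_ratio
    # Report understatement symmetrically, so 1/3 becomes 3.0.
    return factor if factor >= 1 else 1 / factor

# Pie chart example from earlier: the slice looks about 7x larger,
# while the data says about 3x.
print(round(distortion_factor(7, 3), 2))
```

A result of 1.0 would mean the graphic shows the data at its true proportion.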
Even Jess3 is not immune to data distortion. Here’s their display of the ILO convention average score for policy and practice (by continent). It serves as an example of what happens when data is pinned to a circle’s diameter instead of its area.
This graphic is comparing law to reality. The inner circle is the policy score, the outer circle is the practice score. The greater the size disparity between them, the more the law doesn’t reflect reality. Looking at Africa gives us a quick accuracy gauge. If the inner policy circle represents one unit, how many of those do you think it takes to fill up the practice circle? I’ll save you the arithmetic — it’s about 14. Way more than 3.7. The way it’s presented here makes the policy and practice look impossibly distant.
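The arithmetic is simple: a circle’s area grows with the square of its diameter. Here is a minimal Python sketch, assuming the 3.7 figure is the ratio between the two scores and that the graphic maps each score directly to diameter. The function name is hypothetical.

```python
import math

def circle_area(diameter):
    """Area of a circle with the given diameter."""
    return math.pi * (diameter / 2) ** 2

# Assumed: practice score is about 3.7 times the policy score,
# and each score is drawn as a circle's diameter.
policy_diameter = 1.0
practice_diameter = 3.7

apparent_ratio = circle_area(practice_diameter) / circle_area(policy_diameter)
print(round(apparent_ratio, 1))
```

Pi and the factor of one half cancel out, so the apparent ratio is just the diameter ratio squared.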
The Americas don’t fare all that much better (2.3 and 5.8). Here’s how the real data compares with the distortion in both bar and circular graph form.
The large (5.8) circle consumes about 6,300 pixels per data unit. Applying that same scale to the smaller (2.3) circle results in a much different, and truer, result.
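To size circles by area at a fixed scale, work backward from area to radius. This is a minimal Python sketch using the approximate 6,300 pixels per data unit quoted above; `radius_for_value` is a hypothetical helper, not part of any charting library.

```python
import math

PIXELS_PER_UNIT = 6300  # approximate figure quoted above

def radius_for_value(value, pixels_per_unit=PIXELS_PER_UNIT):
    """Radius (in pixels) of a circle whose AREA is proportional to value."""
    area = value * pixels_per_unit
    return math.sqrt(area / math.pi)

r_large = radius_for_value(5.8)
r_small = radius_for_value(2.3)

for value, radius in ((5.8, r_large), (2.3, r_small)):
    print(f"value {value}: radius {radius:.1f}px")
```

The radii differ by the square root of the data ratio, so the areas differ by exactly the data ratio.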
Reflection not invention
Data should be reflected, not invented. And by reflected, I don’t mean from a fun house mirror. In this last example from GOOD.is, the sizes of the pie charts correctly reflect the values they represent with no distortion.
The circles grow in area at approximately 3,500 pixels per $1 billion. The image still scores high on visual appeal. But it stays virtuous. I can clearly see that, compared to other forms of energy, the federal government is heavily subsidizing fossil fuels to keep them cheap for us to consume.
If you’re creating infographics, show the info as it is. Be pedantic. Care about accuracy. If something has a value of 5.1, show it as 5.1. Labeling it is an ancillary detail compared to the power of the data point’s visual manifestation. We process shapes before text. The data points should never be thought of as approximations — 43.7 doesn’t mean “a lot bigger than” 4.1. It’s 10.66 times the size of 4.1. Nothing more, nothing less.
Certainly not all areas of visual communication need to practice such a purist approach. But when a designer is adding a presentation layer to a set of data, are they not obligated to show the data with precision? Should the standard really be any less than that of journalism?







Feedback
5 insightful responses to How to Distort Data
Thanks for writing this, Aaron. Great post. Just last week this link was sent around as an example of great infographics: http://www.behance.net/gallery/Information-graphics-in-context/924345
But to me, it puts the data so far in the background that there’s very little “info” left.
From a theory perspective, I think the best way to think about this is Tufte’s concept of data-ink ratio:
A large share of ink on a graphic should present data-information, the ink changing as the data change. Data-ink is the non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented.
More on that here: http://www.informationweek.com/news/software/bi/showArticle.jhtml?articleID=49400920
All the examples you showed suffer from a very low data-ink ratio. I hope we can somehow get back to Tufte’s ultimate wish: “Above all else show the data.”
Thanks for the comment, Rian.
Though he coined the term in the pre-web era, Tufte’s data-ink ratio is a good measure of unnecessary “chartjunk.” I do believe there are ways to let the data be the star of the show, but still contain them in visually engaging and energetic designs.
The growing problem I see is when data is given a layer of design & art to make a possibly boring story very sensationalized. It’s akin to news media puffing up a bland story to get you to watch their broadcast. But in many cases, the story doesn’t need sensationalizing; a compelling story already exists in the data. It’s the designer’s duty to reflect that story without altering its magnitude.
But unfortunately it’s been revealed that beautiful visualizations (and not necessarily accurate, honest, or even meaningful ones) perform very well in terms of clicks and distribution. That’s a leading factor driving this malpractice, in my opinion.
Hey Aaron,
Good stuff here, thanks for posting it. I’m currently working on an infographic for my master’s thesis – it maps P2P car sharing platforms around the world. You can see it here (http://p2pcarsharing.us.com/infographic-the-world-of-p2p-car-sharing/). I’m in the process of converting the circles to represent their values based on area instead of diameter (as they do now). Do you think it’s ever appropriate to set zero to a different value so there’s more visual difference? In particular for the visualization in the top right – ‘passenger vehicle ownership by country’. The values are relatively similar, so when the circles represent the values by area it’s hard to see any difference.
What are your thoughts on using circles with no fill (just outline)? Does that somehow imply that the viewer should compare the diameter as opposed to the area contained within? Just a thought.
thanks again, mark