Data Visualization: The Wild West of Complexity
By Avolyn Fisher
For many people, looking at numbers and data can be overwhelming. For a data scientist, visualization is a key way of communicating data to an audience that may not be as comfortable with numbers or raw data. Visualization can also be used in the analysis process as a way to see what your data is trying to tell you. Depending on how comfortable you are with data, visualization, and statistics, information can be either fun or frustrating.
Why is data visualization so important?
We recently attended a presentation by Elijah Meeks from Netflix, given at the University of San Francisco as part of their Masters in Analytics speaker series. His talk covered a topic that is of growing importance but has yet to be written about extensively from an authoritative perspective. As Elijah pointed out, we don't have a clearly defined rule book on complex data visualization. At times, it can seem as though the world of data visualization is the wild west, so we wanted to share some key takeaways and best practices to help you improve your complex data visualization skills.
"Visualizations act as a campfire around which we gather to tell stories." - Al Shalloway
What are some of the key considerations a data scientist should think about when deciding how to visualize their data?
Ease of Understanding
Above all, your data visualization should be easily understood by your audience. Complexity for the sake of complexity is not good and is generally frowned upon within the data science community. What will your audience take away from your message? When most of your audience doesn't agree, doesn't know, or is too embarrassed to even attempt to understand your visualization, it will make them extremely uncomfortable and your message will be lost.
Precision vs. Context
In general, the more complex your data visualization becomes, the more likely you are to lose precision. In some cases this is OK, as long as the added complexity enhances your overall message. Additional complexity often allows us to add context that a simple data visualization cannot provide. So ask yourself: is the loss of precision worth the additional context you can show your audience? For example, a Sankey Diagram can show the flow of energy resources, as in this visualization from the UK Department of Energy and Climate Change. This diagram tells a much bigger story than a simple bar chart depicting energy sources. We can see not only where our energy is coming from but also where it is later being used.
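To make the bar chart versus Sankey Diagram comparison concrete, here is a minimal Python sketch of the data behind each. The flow names and amounts are invented for illustration and are not the UK energy figures, and the helper `totals_by` is a hypothetical name. The point is only that a bar chart of sources needs one total per source, while a Sankey Diagram keeps every source-to-use link.

```python
from collections import defaultdict

# Hypothetical flows (source, use, amount) in arbitrary units.
# These numbers are invented for illustration; they are not real energy data.
flows = [
    ("Gas", "Electricity", 30),
    ("Gas", "Heating", 40),
    ("Coal", "Electricity", 20),
    ("Wind", "Electricity", 10),
]


def totals_by(flows, index):
    """Sum flow amounts grouped by the source (index 0) or the use (index 1)."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow[index]] += flow[2]
    return dict(totals)


# A bar chart of energy sources only needs these per-source totals.
source_totals = totals_by(flows, 0)

# A Sankey Diagram draws every link, so the per-use totals remain visible too.
use_totals = totals_by(flows, 1)

print(source_totals)
print(use_totals)
```

Once the links are collapsed into `source_totals`, the information about where each source ends up is gone. That lost detail is the context the more complex diagram preserves.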
"The greatest value of a picture is when it forces us to notice what we never expected to see." - John Tukey
Convey Your Expertise
In addition to supplying context, complexity allows a visualization to carry the expertise that the data scientist or area expert has but that a simple visualization doesn't display. In the Sankey Diagram example, the energy expert would know the full story behind how energy is being used, but that expertise would not come across in a simple bar chart. One way to tell whether your visualization is conveying your expertise is whether it requires a long description or explanation to go along with it. If it does, your visualization might not be as complex as it should be to tell the full story.
Draw Them In
Ultimately, you want to draw the audience into your graphic. Odds are that if the person looking at your data visualization isn't drawn into it, they won't use it. This relates to the theory behind the Attention Economy, a part of Economic Theory that treats people's attention as a scarce commodity and shapes the approach we take in communicating information. One way to draw your audience in is to delight them. Animating your visualizations, making them interactive, or adding gifs are just a few ways to delight through data visualization.
Often your initial designs will be conservative. Focus on the initial ask with the data. Once you've satisfied the ask, then you can be more creative and add some unasked, but valuable information or explore additional questions you come across in your analysis.
Additional Dos and Don'ts
Do:
- Annotate whenever you can - adding labels when possible or data dictionaries as necessary will often aid in the understanding of your data visualization.
- Push boundaries but have a frame of reference - start with what is needed from a basic data visualization level and then offer additional complexity as an optional view until you gauge your audience. Offering multiple visuals for comparison will help you tell if a visualization is good or bad.
- Expect failure - figure out what caused your message to fail and teach others data literacy through visualization.
Don't:
- Overwhelm your audience - focus on main points; sometimes less color is more.
- Use angles or sweeping lines if your data has a numerical focus.
- Ignore proximity - your audience will subconsciously draw conclusions based on the proximity of data points or an order within a chart.
We hope you found this content useful. If you want to hear more, we encourage you to check out Elijah's full hour-long talk here.