2017: The Year of Data Literacy

By Avolyn Fisher

"Data is for everyone" - IBM Watson Analytics Blog

Researchers at the University of California, San Diego estimate that each day our brains receive the equivalent of 34 GB of data, enough to overload even a powerful laptop within a week. This data includes media across many channels: not just our phones and computers, but also television, radio, text messages, video games, and more. It amounts to roughly 100,000 words of information each day (both written and audio). Over a five-day period, that is about the same as reading the Lord of the Rings trilogy, which contains 481,103 words.
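For the curious, here is the back-of-the-envelope arithmetic behind that comparison, sketched in Python. The figures are the ones quoted above; the variable names are only illustrative.

```python
# Figures quoted in the post
GB_PER_DAY = 34            # estimated data received per day
WORDS_PER_DAY = 100_000    # estimated words consumed per day
TRILOGY_WORDS = 481_103    # word count quoted for the Lord of the Rings trilogy

days = 5
words_consumed = WORDS_PER_DAY * days            # words taken in over five days
ratio = words_consumed / TRILOGY_WORDS           # how many "trilogies" that is
days_to_match = TRILOGY_WORDS / WORDS_PER_DAY    # days needed to match the trilogy
gb_per_week = GB_PER_DAY * 7                     # data received over one week

print(f"Words in {days} days: {words_consumed:,}")
print(f"Trilogies read in {days} days: {ratio:.2f}")
print(f"Days to match the trilogy: {days_to_match:.2f}")
print(f"Data per week: {gb_per_week} GB")
```

So five days of consumption slightly exceeds the trilogy's word count; the comparison is approximate, not exact.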

But what exactly is data? A buzzword that's been popularized over the last few years? Your phone plan? Simply put, data is information. It can be either qualitative (descriptive) or quantitative (numerical).

It's safe to say that data is everywhere. No matter who you are, where you live, or what you do for a living, you interact with and consume data on a daily basis.

For that reason, many people believe that the future isn't going to belong to those who know how to code; it will belong to those who are data literate. Many people will tell you that they're not comfortable working with numbers. I've heard this numerous times within the last year. But in the words of IBM Watson Analytics, data is for everyone.

Data literacy is defined as "the ability to read, create, and communicate data as information," or, put more simply, healthy skepticism. It's the same skepticism that helps you spot fake news articles, misleading headlines, and questionable political statements, or judge whether a retail price is actually a good deal. Data literacy has also been defined as "the ability to see the big picture: competency in finding, manipulating, managing, and interpreting data, including not just numbers but also text and images."

How can you boost your data literacy in 2017?

 

Sharpen your ability to handle BASIC math and statistical concepts

You don't need to go back to school and get a PhD in mathematics or an actuarial degree. Simple division and multiplication will go a long way, and beyond that, a basic understanding of simple statistical concepts is enough. Douglas Hofstadter coined the term innumeracy, which is parallel to the term illiteracy. However, under Hofstadter's definition, someone who is innumerate can still do math but is unable to reason with numbers. Innumeracy shows up most often when dealing with percentages, averages, and changes. Being data literate is therefore the opposite of being innumerate.
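To see why percentages and averages trip people up, here is a small Python sketch. The price and salary figures are invented purely for illustration (they are not from any study), and `percent_change` is a helper defined here, not a library function.

```python
from statistics import mean, median

def percent_change(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100

# A 50% increase followed by a 50% decrease does NOT return to the start.
price = 100.0               # made-up starting price
raised = price * 1.5        # up 50%
lowered = raised * 0.5      # then down 50%
net_change = percent_change(price, lowered)
print(f"Net change after +50% then -50%: {net_change:.1f}%")

# One extreme value can pull the mean far away from the "typical" value.
salaries = [40_000, 42_000, 45_000, 48_000, 500_000]   # made-up salaries
avg_salary = mean(salaries)
mid_salary = median(salaries)
print(f"Mean salary:   {avg_salary:,.0f}")
print(f"Median salary: {mid_salary:,.0f}")
```

Both halves use nothing more than multiplication and division, yet the results are easy to get wrong by intuition alone: the price ends up 25% lower, and the mean salary is three times the median.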

Check for sources and understand context

Any piece of information or data that you consume should be scrutinized. Consider yourself a member of data TSA, sifting through information the way TSA sifts through your luggage at the airport. Who wrote and delivered the book, article, or podcast you're reading or listening to? What sources do they reference, and what potential biases could those sources have? What methods were used to gather the data? Was the sample appropriate for the conclusions drawn from it? One common sampling issue appears in psychological studies, which are often conducted on undergraduate students currently studying psychology. According to an article published in the New York Times by Anand Giridharadas, "A randomly selected American undergraduate is 4,000 times likelier to be the subject of a psychological study than a random non-westerner."

Even in the workplace, pulling reliable data is still a challenge because we haven't figured out how to get the most out of the data we have. Additionally, when we review other people's results, we often can't apply them to our own situations because the context and variables differ; this is why understanding the context of data is so important.

Ask questions and look for answers

Curiosity is your friend. While it might have killed the cat, it's most likely going to be what saves you from falling victim to false advertising, empty campaign promises, or misleading news. Many times in the world of data science, a new project can't even begin until you've identified the question you want to answer. But questions aren't just important in the beginning; after you've gotten your results, it's often appropriate to ask, "So what now?" or "What else is the data trying to tell me?"

Beware of potential bias

We can't help it; we're all biased in some way. It would be impossible to be perfectly objective and unbiased, for the simple reason that much of our bias operates subconsciously.

One of the key skills of an effective data analyst is the ability to be an effective storyteller. It's often included in data-related job descriptions and listed as a necessary skill within the field. However, storytelling often invites confirmation bias to sneak in. ScienceDaily defines confirmation bias as "a tendency to search for or interpret information in a way that confirms one's preconceptions, leading to statistical errors." Storytelling often requires a certain amount of logical gymnastics to piece your story puzzle together. In doing so, we may build in connections that are not there, including links that exist only in our heads, and if we're not careful that is the start of the slippery slope toward confirmation bias.

Other forms of cognitive bias include:

  • Anchoring: The tendency to rely too heavily on one piece of information when making decisions, usually the first piece of information we acquire on a subject.
  • Framing Effect: Drawing different conclusions from the same information depending on how the information is presented.
  • Gambler's Fallacy: The tendency to think that future probabilities are altered by past events, when in reality they are unchanged.

There are too many forms of bias to name here: more than 100 cognitive biases have been identified within the category of decision-making, belief, and behavioral biases alone.
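The Gambler's Fallacy from the list above is easy to test for yourself. The following is a minimal simulation sketch in Python, assuming a fair coin and independent flips; the function name `heads_rate_after_streak` and the seed value are made up for this illustration.

```python
import random

def heads_rate_after_streak(n_flips, streak_len, seed=0):
    """Share of heads on flips that immediately follow a run of heads.

    Simulates n_flips fair, independent coin flips and looks only at
    the flips that come right after streak_len heads in a row.
    """
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    flips = [rng.random() < 0.5 for _ in range(n_flips)]   # True = heads
    followers = [
        flips[i]
        for i in range(streak_len, n_flips)
        if all(flips[i - streak_len:i])   # previous streak_len flips were all heads
    ]
    return sum(followers) / len(followers)

rate = heads_rate_after_streak(n_flips=200_000, streak_len=3)
print(f"Share of heads right after three heads in a row: {rate:.3f}")
```

If past flips really changed future probabilities, this share would drift away from one half. In a simulation like this it stays close to 0.5, because the coin has no memory of the streak.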

Be willing to experiment

Many of us remember learning the principles of scientific experimentation when we were in school. This is both good and bad. It's good because you should be somewhat familiar with what this looks like: how to construct a hypothesis, how to design a basic experiment, how to select a sample of data, and how to interpret your results. What's bad is that the examples you most likely ran across in school were nice and neat. In the real world, data is messy, and the results won't be so tidy (they weren't pre-designed for easy teacher grading).

 

So, you think you're ready to start thinking like a data scientist now? Harvard Business Review published a fun exercise on how to think like a data scientist; you can find it here.

Interested in further exploring the topic of data literacy? Nearly everything you could ever want to know can be found via a Google search. If you'd like something a little more structured or verified, you can explore MOOC sites such as Coursera, edX, and Udacity, which run online courses on many data-related topics. Another great resource for some of the basic concepts discussed in this post is Khan Academy.

Total data consumed by reading this post: 355,250 bytes

 
