A Short Primer on Data Visualization

Introduction

Data visualizations shape the way that we see and interact with the world around us. Tableau defines data visualization as “the graphical representation of information and data.” These graphical representations can take the form of charts, plots, or infographics, with the goal of communicating complex data relationships in a way that is easy to understand. Data visualizations can reveal relationships that statistics and models may miss, such as unusual distributions of data, patterns and clusterings, gaps, and outliers (Unwin). Many people find that looking at a data visualization is more effective than looking at raw data, because trends and patterns are much easier to see. In this way, data visualizations help others gain information about, and a better understanding of, a given trend or relationship in data.

Over time, data visualizations have changed dramatically due to the growing complexity of data itself along with advancements in technology. According to Insight Software, a big turning point occurred in the 20th century, when computers gave statisticians the ability to collect and store data in larger volumes than ever before, as well as the ability to create data visualizations. They also note that the last three decades have seen the field of data visualization explode into hundreds of focus areas, and that concepts like big data, artificial intelligence, and machine learning have expanded the possibilities for data visualization (Insight Software). As modern data visualization tools continue to be developed, it is important to understand the basics of data visualization. In this blog post, we will discuss the most common data visualization types and a few best practices for creating data visualizations.

Bar charts

One of the most common types of data visualizations is known as a bar chart or bar graph. A bar chart can be defined as a visual tool that uses bars to compare data that belong to separate categories (Branson). Bar charts are commonly used to compare the size of each category, which is represented by the height of its bar. To create a bar graph, you first create a frequency distribution table, which pairs each category with its corresponding value. A frequency distribution table is just a table that shows how often a value or category appears in a data set. Next, you draw the two axes: the horizontal axis represents the categories, and the vertical axis represents the values. Finally, you draw a bar for each category. Below is an example of a bar graph from Branson representing the different neighborhood-related issues that survey respondents are concerned about.
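Separately from the Branson figure, the steps described above can be sketched in Python. This is only a minimal illustration: the survey answers are invented, the function names `frequency_table` and `draw_bar_chart` are our own, and the plotting step assumes the matplotlib library is installed.

```python
from collections import Counter


def frequency_table(responses):
    """Count how often each category appears, most frequent first."""
    return Counter(responses).most_common()


def draw_bar_chart(table, title="Neighborhood concerns (made-up data)"):
    """Draw one bar per category; the bar height encodes the count."""
    # matplotlib is imported here so the counting step works without it.
    import matplotlib.pyplot as plt

    categories = [category for category, _ in table]
    counts = [count for _, count in table]

    fig, ax = plt.subplots()
    ax.bar(categories, counts)
    ax.set_xlabel("Issue")                  # horizontal axis: categories
    ax.set_ylabel("Number of respondents")  # vertical axis: values
    ax.set_title(title)
    plt.show()


# Hypothetical survey answers, invented for illustration only.
responses = ["traffic", "noise", "traffic", "litter", "traffic", "noise"]
table = frequency_table(responses)
print(table)  # [('traffic', 3), ('noise', 2), ('litter', 1)]
```

Calling `draw_bar_chart(table)` would then draw the chart itself, with the tallest bar belonging to the most frequently mentioned issue.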

Line graphs

Another common type of data visualization is known as a line graph or line chart. A line graph connects a series of data points using a line, and is often used to help identify trends in data. Typically, the horizontal axis represents a sequential progression of values, and the vertical axis tells you the values for a selected metric across that progression (Tableau). Line graphs are commonly used to show data over time. A line graph can have either one line or multiple lines, which allow you to compare multiple categories within the same field. Below is an example of a line graph with multiple lines from Yale Library.
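Apart from the Yale Library figure, a multi-line graph can be sketched in Python as follows. The records are made-up checkout counts, the names `build_series` and `draw_line_chart` are hypothetical, and the plotting step assumes matplotlib is installed. The key idea is that each category becomes its own line, with its points ordered along the horizontal axis before they are connected.

```python
from collections import defaultdict


def build_series(records):
    """Group (category, year, value) records into one line per category.

    Points are sorted by year so they are connected in sequence.
    """
    grouped = defaultdict(list)
    for category, year, value in records:
        grouped[category].append((year, value))
    return {category: sorted(points) for category, points in grouped.items()}


def draw_line_chart(series):
    """Plot one labeled line per category on shared axes."""
    import matplotlib.pyplot as plt  # assumed to be installed

    fig, ax = plt.subplots()
    for category, points in series.items():
        years = [year for year, _ in points]
        values = [value for _, value in points]
        ax.plot(years, values, marker="o", label=category)
    ax.set_xlabel("Year")        # sequential progression
    ax.set_ylabel("Checkouts")   # metric measured across that progression
    ax.legend()
    plt.show()


# Hypothetical records, deliberately out of order.
records = [
    ("print", 2021, 120), ("print", 2019, 150), ("print", 2020, 135),
    ("ebook", 2020, 110), ("ebook", 2019, 80), ("ebook", 2021, 140),
]
series = build_series(records)
```

Calling `draw_line_chart(series)` would draw two lines, making it easy to compare how the two invented categories change over the same years.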

Pie charts

Pie charts are used to organize qualitative data that usually represent part-whole relationships. Imagine a pie divided into slices where each slice represents a different category. This graphical representation displays a set of data within a circular or “pie” shaped graph. Within this pie, data is divided into sections, also known as slices, “such that the area of each slice is proportional to the fraction of the total it represents” (Wilke). Because this straightforward representation allows comparison among slices, pie charts are often used to help viewers visually understand a dataset. Businesses and schools commonly use pie charts to display multi-categorical data and allow viewers to understand the data without much analysis. Below is an example of a pie chart from “Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures” showing the number of Rhode Island citizens in each county, with each slice drawn in the correct proportion.
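To make the proportionality rule concrete, here is a small Python sketch that is separate from the figure in Wilke's book. For a circle of fixed radius, a slice's area is proportional to its angle, so each category receives 360 degrees times its share of the total. The county names and counts are invented placeholders (not real Rhode Island figures), `slice_angles` and `draw_pie_chart` are our own hypothetical names, and the drawing step assumes matplotlib is installed.

```python
def slice_angles(counts):
    """Return the angle in degrees of each slice: 360 * value / total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {name: 360 * value / total for name, value in counts.items()}


def draw_pie_chart(counts):
    """Draw the pie; matplotlib computes the same proportions internally."""
    import matplotlib.pyplot as plt  # assumed to be installed

    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.0f%%")
    plt.show()


# Placeholder data, invented for illustration only.
counts = {"County A": 50, "County B": 30, "County C": 20}
angles = slice_angles(counts)
print(angles)  # {'County A': 180.0, 'County B': 108.0, 'County C': 72.0}
```

A category holding half of the total therefore fills half of the circle, which is what lets viewers read part-whole relationships at a glance.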

Scatter plots

A scatter plot displays the relationship between two different variables. Scatter plots usually contain two axes that each represent a different variable. Each data point is placed according to its values for the two variables, so the overall pattern of points shows how the variables are correlated. These plots are commonly used in science, business, and education to display trends and correlations between variables. The visual display of multiple data points allows viewers to discover gaps, clusters, and outliers within a dataset, which can make spotting trends and making predictions easier. Scatter plots are most commonly used for datasets with many points, to see how the points relate to each other.
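As a rough sketch of how a scatter plot and a correlation fit together, the Python example below plots invented pairs of values and also computes the Pearson correlation coefficient, one common way to summarize the linear relationship that the plot shows visually. The data, the variable names, and the functions `pearson_r` and `draw_scatter_plot` are all our own assumptions, and the plotting step assumes matplotlib is installed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def draw_scatter_plot(xs, ys, x_label, y_label):
    """Place one point per (x, y) pair, one variable on each axis."""
    import matplotlib.pyplot as plt  # assumed to be installed

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    plt.show()


# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 79]
r = pearson_r(hours_studied, exam_scores)
```

Here `r` comes out close to 1, matching the upward-sloping cloud of points that `draw_scatter_plot(hours_studied, exam_scores, "Hours studied", "Exam score")` would show. A single number like this cannot reveal gaps, clusters, or outliers, which is why the plot itself remains valuable.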

Matching the visualization type with the data

There are many different types of data visualizations, and it is important to make sure that your visualization effectively communicates relationships found in the data. One of the best practices when creating a data visualization is to match the visualization type with the data. This essentially means using a visualization that accurately and effectively shows the dataset. Doing so helps eliminate confusion and ensures that the target audience arrives at the intended conclusion. Specific visualizations are designed for specific kinds of data. IBM gives an example of this, explaining that scatter plots display the relationship between variables well, while line graphs display data over time well.

Keep it simple and clear

When it comes to data visualization, simplicity and clarity are two characteristics that will allow viewers to understand even the most complex datasets. Using clean, legible fonts and well-chosen colors enhances comprehension among viewers. Removing distractions and clutter from the visualization also provides clarity, allowing the essential data to be highlighted. According to Harvard Digital Accessibility, “The important thing is to not overwhelm people with information. If you try to show too much, people might lose the most important pieces” (Harvard). By keeping data visualizations simple and clear, data becomes more accessible and impactful.

Using proper scales and labeling

The use of proper scales and labeling when visualizing data is crucial for accurate data representation and interpretation. It is important to select scales that are appropriate for the range and distribution of the data. Using consistent scales across similar data visualizations makes visual comparisons easier. Axes should be properly labeled with descriptive titles and the correct units of measurement. Legends may be necessary for certain graphs to give context to symbols, colors, and lines. It is also important to avoid overcrowding labels so that the data stays readable and concise.
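One way to keep scales consistent across similar charts is to compute a single axis range from all of the datasets and apply it to every panel. The Python sketch below illustrates that idea under our own assumptions: the monthly numbers are invented, the names `shared_limits` and `draw_side_by_side` are hypothetical, the 5% padding is an arbitrary choice, and the plotting step assumes matplotlib is installed.

```python
def shared_limits(datasets, padding=0.05):
    """Return one (low, high) axis range that covers every dataset."""
    values = [value for data in datasets for value in data]
    if not values:
        raise ValueError("need at least one data point")
    low, high = min(values), max(values)
    margin = (high - low) * padding  # small gap so points do not touch the frame
    return low - margin, high + margin


def draw_side_by_side(datasets, titles, unit_label="Checkouts per month"):
    """Draw one panel per dataset, all using the same vertical scale."""
    import matplotlib.pyplot as plt  # assumed to be installed

    low, high = shared_limits(datasets)
    fig, axes = plt.subplots(1, len(datasets), sharey=True, squeeze=False)
    for ax, data, title in zip(axes[0], datasets, titles):
        ax.plot(range(1, len(data) + 1), data)
        ax.set_ylim(low, high)       # identical scale in every panel
        ax.set_xlabel("Month")       # descriptive axis title
        ax.set_title(title)
    axes[0][0].set_ylabel(unit_label)  # label includes the unit
    plt.show()


# Hypothetical monthly counts for two branches, invented for illustration.
branch_a = [10, 14, 20]
branch_b = [0, 60, 100]
limits = shared_limits([branch_a, branch_b], padding=0.0)
print(limits)
```

Without the shared range, each panel would stretch to fit its own data, and the two invented branches could look similar even though one has far higher values.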

Conclusion

In conclusion, data visualizations are important for interpreting and understanding complex data-driven relationships. By turning raw numbers and information into eye-catching graphs, charts, and plots, viewers are able to add meaning and context to data that might otherwise have been difficult to interpret. This transformation of raw data into visualizations also allows viewers to discover trends, patterns, and anomalies that could be overlooked within a dataset. The development and evolution of data visualization has made large and complex datasets more accessible by making them easier for audiences to understand. As technology advances, data visualization capabilities also grow, with a positive effect on accessibility. Ultimately, by using best practices for data visualization and mastering common types of visualization, we can effectively communicate key factors within a dataset as well as foster better data practices and greater data literacy.

References

For a list of sources cited throughout this post, please go here.

This blog post was authored by Reagan Bourne (Senior Data Fellow) and Keirsi Birch (Data Fellow) at FSU Libraries.
