Maybe you’re on Twitter one day and search ‘#Statistics’ to look up some information for your Introductory Statistics course. Before you know it, you scroll through and see several tweets that are also marked with ‘#BigData’, and you’re left with more questions than you had when you started your search. Maybe you try to search for “big data” on Google, see the definition from Oxford, and are then left with even more questions:
How large is “extremely large”?
What kind of patterns, trends, and interactions are we talking about?
What isn’t big data?
Big data as a term has become synonymous with the growth of digital data and the glut of information available to researchers and the public. Furthermore, there is growing interest in both the public and private sectors in using large datasets to gain insight into market trends and to improve decision making. However, the exact definition of big data is sometimes unclear and can vary widely depending on whom you ask. Businesses, nonprofit organizations, government agencies, and academic researchers each view big data in a different context and with different goals for its use. (University of Wisconsin Data Science, n.d.)
Above: a Google Trends graph that shows the number of searches for the term “Big Data” from 2007 to 2017
In this blog post, we aim to provide clarity and insight into the origins and definitions of big data. We will also discuss the potential benefits and challenges surrounding big data. In doing so, we will provide some examples linking big data to applications or data that you may interact with on a daily basis.
Love Data Week is a week-long international event that raises awareness of the importance of research data management, library-based research data services, and more. Beyond raising awareness of these topics, Love Data Week also aims to build a community in which individuals can get engaged by participating in the series of events held throughout the week.
This year, Love Data Week runs from Feb. 8 to Feb. 12. The theme of the virtual event is “Delivering a Better Future,” and participants will have the opportunity to share how they are using data to invest in a better future.
With this year’s Love Data Week fast approaching, check out the Meet Your Data Librarians Podcast from last year’s event to learn more about some of the contributors to the celebration!
So, what is a Census Research Data Center? The Center for Economic Studies defines Census Research Data Centers (RDCs) as U.S. Census Bureau facilities, staffed by a Census Bureau employee, which meet all physical and computer security requirements for access to restricted-use data. At RDCs, qualified researchers with approved projects receive restricted access to selected non-public Census Bureau data files.
Where do college graduates work? Visualization based on 2012 Census data.
To understand the true value of doing research with non-public data from the RDC, it’s important to note the difference between micro data and macro data; the latter is often referred to as aggregate data. When most of us use datasets for research or analysis, we’re looking at summary figures. For example, if you extract Census data for analysis, you’re typically looking at some sort of summary or aggregation for a specific geographic unit. These geographic units range from states, counties, and cities down to much smaller units such as census tracts and block groups. Regardless of the unit of analysis, the data itself is a summarization of the individual survey responses of participants in that specific area.
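To make the distinction concrete, here is a minimal Python sketch of how micro data rolls up into aggregate data. Everything in it is an assumption made for illustration: the records, the field names (county, tract, income), and the aggregate helper are hypothetical and do not reflect the layout of any actual Census Bureau file.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical micro data: one record per survey respondent (made-up values).
microdata = [
    {"county": "A", "tract": "A-01", "income": 40000},
    {"county": "A", "tract": "A-01", "income": 60000},
    {"county": "A", "tract": "A-02", "income": 50000},
    {"county": "B", "tract": "B-01", "income": 30000},
    {"county": "B", "tract": "B-01", "income": 70000},
]


def aggregate(records, unit, field):
    """Summarize individual records into one row per geographic unit."""
    groups = defaultdict(list)
    for record in records:
        groups[record[unit]].append(record[field])
    return {
        key: {"respondents": len(values), "mean": mean(values)}
        for key, values in groups.items()
    }


# Aggregate (macro) data: the same responses summarized at two geographic levels.
by_county = aggregate(microdata, "county", "income")
by_tract = aggregate(microdata, "tract", "income")

for name, summary in by_county.items():
    print(name, summary)
```

In this made-up example both counties end up with the same mean income even though the individual responses behind them are spread quite differently. That kind of detail is visible only in the micro data, which is part of why access to individual-level records matters to researchers.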
Data science is all the rage lately. Harvard Business Review even named data scientist the sexiest job of the 21st century. Even though the term is rapidly gaining mind share, many are still confused about what data science actually is. When you cut through the hype, the core of data science is pretty simple: it’s the study of data. What kind of data is being studied, how it is being studied, and what the individual data scientist is looking for all depend on the specific case. Data science is just another field of study using digital methods, putting it firmly under the umbrella of Digital Scholarship.