Maybe you’re on Twitter one day and search ‘#Statistics’ to look up some information for your Introductory Statistics course. Before you know it, you scroll through and see several tweets that are also marked with ‘#BigData’, and you’re left with more questions than you had when you started your search. Maybe you try to search for “big data” on Google, see the definition from Oxford, and are then left with even more questions:
How large is “extremely large”?
What kind of patterns, trends, and interactions are we talking about?
What isn’t big data?
Big data as a term has become synonymous with the growth of digital data and the glut of information available to researchers and the public. Furthermore, there is growing interest in both the public and private sectors in using large datasets to provide insight into market trends and to improve decision making. However, the exact definition of big data is sometimes unclear and can vary widely depending on whom you ask. Businesses, nonprofit organizations, government agencies, and academic researchers each view big data in a different context and with different goals for its use (University of Wisconsin Data Science, n.d.).
Above: a Google Trends graph that shows the number of searches for the term “Big Data” from 2007 to 2017
In this blog post, we aim to provide clarity and insight into the origins and definitions of big data. We will also discuss the potential benefits and challenges surrounding big data. In doing so, we will provide some examples linking big data to applications or data that you may interact with on a daily basis.
Welcome to the third post in the Get Data Lit! blog series. This post will focus on my experience working as a STEM Research Data Services Associate with FSU Libraries during the 2020-2021 school year. In this role, I assisted with outreach and education on STEM research data services for students, groups, and organizations at Florida State University.
My name is Paxton Welton and I will be graduating with a bachelor’s degree in Finance this semester. One question that you might have right from the start: why is a finance major working in a STEM-focused role?
When applying for jobs prior to this academic year, I knew I wanted a role that would challenge me and allow me to develop new skills. I believed that being the Research Data Services Assistant would provide the level of challenge and opportunity that I was looking for. By and large, my experience provided me with just that. I faced a major learning curve when I first started this role. While I had a grasp of the basics of data literacy and research data services, I quickly realized I did not know nearly enough to speak properly to student groups about these topics. During the first few weeks of the fall semester, I spent a significant portion of my time building a stronger understanding of data and of everything FSU STEM Libraries had to offer its students with regard to research data. By reading countless articles about data literacy and engaging in weekly discussions with my supervisor Dr. Nick Ruhs, the STEM Data & Research Librarian, I became confident in my working knowledge of these topics.
As the STEM Research Data Services Assistant, one of my main responsibilities was conducting targeted outreach to different student organizations across campus. When I first started this process, I reached out specifically to STEM-focused groups. I initiated conversations via email with registered student organizations (RSOs) to introduce them to the research data services FSU Libraries offers. In several cases, we were invited to meet with and/or present synchronously to these groups. This gave us a chance to share more in-depth information about our services and just how valuable they are to students. It also gave students a chance to ask us any questions they had. Getting the chance to interact directly with students and help them find the right resources to feel more prepared for their future was by far my favorite part of this role.
I also had the opportunity to contribute to data-related events hosted by FSU STEM Libraries. Two examples include Love Data Week in February and the Virtual FSU Libraries Data Services Quest in March. My involvement in these events allowed me to see the entire process of creating programming for students. I was able to sit in on brainstorming meetings, give my input on the marketing materials, and create content for the events.
One of my main focuses throughout this year has been to develop and create the blog series you are reading right now: Get Data Lit! The focus of this blog series was data literacy and its applicability to students’ educational experiences. As such, I had the chance to put into practice the new data literacy skills I learned in this role. I also had the opportunity to connect data literacy to real-world practice and explain the importance of critically evaluating data. Doing so made me realize just how important learning data skills is for my future career and education.
One thing that proved to be a common theme throughout all the work I was doing is that data is powerful and knowing how to work with it is even more powerful. From a career in law to a career in fashion, you are going to be working with data in some form. Learning how to critically evaluate data is going to give you the skills you need to stand out in the future.
By taking on a job in a discipline that I knew very little about, I was able to challenge myself and make the most out of this past year. From getting to work on student programming events to developing a blog series, I was constantly challenged and learning something new.
Love Data Week is a week-long international event that raises awareness of the importance of research data management, library-based research data services, and more. Beyond raising awareness of these topics, Love Data Week also aims to build a community in which individuals can get engaged as they participate in the series of events held throughout the week.
This year, Love Data Week will run from Feb. 8 to Feb. 12. The theme of this year’s virtual event is “Delivering a Better Future,” and participants will have the opportunity to share how they are using data to invest in a better future.
While this year’s Love Data Week is fast approaching, check out the Meet Your Data Librarians Podcast from last year’s event to learn more about some of the contributors to the celebration!
“Data is the sword of the 21st century. Those who wield it, the samurai.” - Jonathan Rosenberg
Data is all around us and we often interact with it in ways we don’t even realize. From using an app to mobile order our coffee to reviewing a chart provided in an article, data surrounds us and has become intertwined with our lives. However, with the increasing amount of data available at our fingertips, it can be difficult to understand its meaning, accuracy, and relevance to our lives. This is the reason we decided to start this new blog series, Get Data Lit! We realize that data can be difficult to decipher and want to give you the tools to better navigate the data you are faced with every day.
Margaret Bell, undergraduate student and data analyst for FSU Libraries, provided insight into her experience working in data assessment.
As a senior undergraduate student at Florida State University, I’ve become very aware of the different opportunities to be pursued both on and off campus. This awareness, however, took me years to develop – and had I not had a job on campus, I’m sure it would have taken a lot longer. With so many people to compete with for on-campus jobs, I remember being afraid that I would graduate with zero professional experience to put on my résumé – something that seemed a little too risky, especially considering that I had no idea what I wanted to do after graduation. Although I’m still unsure of my path at this time, I was fortunate enough to secure a position in Strozier’s assessment department by the end of my sophomore year. Members of the assessment department are responsible for collecting and analyzing data related to FSU Libraries (among many other things), so as a double major in Psychology and Editing, Writing & Media, I certainly hadn’t foreseen “Data Analyst” being my first job title.
After a period of training and adjusting to my schedule, I quickly came to see the benefits of working in Strozier. This job has been an opportunity to learn more about the resources that FSU Libraries offers students, faculty, and staff. Beyond offering a physical space for learning and studying, the libraries have also compiled an invaluable online resource full of useful information. Working in assessment and having to update the assessment Facts & Figures page has allowed me plenty of time to become very familiar with the Libraries’ website – something I recommend that all students do.
As this was my first time having a regular part-time job, I came in with a few worries, mostly that I would have a difficult time juggling work with classes and other extracurriculars. However, I was pleased to discover an emphasis on school coming first. This allowed me to comfortably work around my other responsibilities while also supplementing my FSU experience with exposure to a professional environment. For that reason, plus the availability of many different job positions, I would absolutely advise job-seeking students to consider working for FSU Libraries.
Academic and professional enrichment aside, working in the library has added so much to my time at FSU just in terms of the wonderful people I’ve met. The assessment team – including my amazing boss Kirsten Kinsley, mentor Elizabeth Yuu (a recent graduate with a Master’s in Biostatistics who also happens to be my idol), and awesome undergraduate peers Rachael Straley and Jake Tompkins – has made the latter half of my college experience better than I ever could’ve asked for. So if there’s one thing I’d recommend to future students, it’s to not take the library for granted.
My name is Rachel Smart and I’m a graduate assistant for Digital Research and Scholarship. I was adopted by DRS in mid-March when the Goldstein Library was reamed of its collection. It was devastating for the 2% of the campus who knew of its existence. Bitterness aside, I’m very grateful for the opportunity I’ve been given by the DRS staff who warmly welcomed me to their basement lair; here I’m being swiftly enthralled by the Open Access battle cry. The collaborative atmosphere and constant stream of projects never fail to hold my interest. Which leads me to Data Carpentry…
In May of this year, I met with Micah Vandegrift (boss and King of Extroverts) regarding my progress and the future direction of my work with DRS. He presented me with the task of running a data workshop here in our newly renovated space. Having never organized something of this scale before, I was caught off guard. However, I understood the importance of and need for data literacy and management trainings here on campus, and I was excited by the prospect of contributing to the establishment of a Data Carpentry presence here at FSU. Micah was kind enough to supply me with a pair of floaties before dropping me into the deep end. He initiated first contact with Deb Paul from iDigBio, a certified Data Carpentry instructor here on campus, and I joined the conversation from there.
It took a few weeks of phone calls and emails before we had a committed instructor line-up, and we were able to apply for a self-organized Data Carpentry workshop in April. Instructors Matthew Collins, Sergio Marconi, and Henry Senyondo from the University of Florida taught the introduction to R, R visualizations, and SQL portions of the workshop. I was informed that you aren’t a true academic librarian until you’ve had to wrestle with a Travel Authorization form, and I completed them for three different people, so I feel thoroughly showered in bureaucratic splendor. However, the most obstructive item on my multipart to-do list of 34+ tasks was finding the money to pay for food. DRS has an event budget with which we paid the self-hosting fee and our instructors’ travel expenses, but we were not allowed to use it for food. This delayed the scheduling process, and if it weren’t for the generous assistance from iDigBio, we would have had some very hungry and far fewer attendees. If I were blessed with three magical freebies for the next potential Data Carpentry event, I would use the first to transform our current event budget into food-friendly money, and I would save the other two in case anything went wrong (e.g., a vendor never receiving an order). This may seem overly cautious, but just ask anyone who has had to organize anything. We are perfectly capable of completing these tasks on our own or with a team, but some freebies for the tasks that fall beyond our control would come in handy.
The event ran smoothly and all 20 registered attendees showed up. As busy as I was in the background during the event, attendees came up to me and let me know how well the workshop was going. There were also comments indicating we could do things a little differently during the lessons. Most of the issues that sprang up during the event involved troubleshooting software errors and discrepancies in the instructions for some of the lessons. For example, the SQLite instructions were written using the desktop version of the program and not the browser plugin everyone was using. The screen we used to display the lessons and programming demos was the largest we could find, but it was still difficult for some people to see. However, adjustments were made and attendees were able to continue participating.
The most rewarding element of the experience for me was the resulting discussion among participants, both during planned collaboration in lessons and during unplanned collaboration over breaks and long lunch periods. The majority of our participants have backgrounds in the Biological Sciences, but as individuals they had different approaches to solving problems. These approaches frequently led participants to discuss how their various backgrounds and research shaped their relationship with the tools and concepts they were learning at Data Carpentry. On both days of the event, participants came together in our conference room for lunch and rehashed what they had learned so far. They launched into engaging discussions with one another and with DRS staff about the nature of our work and how we can work together on future initiatives. This opportunity to freely exchange ideas sparked creative ideas relating to the Data Carpentry workshops themselves. On the second day, more participants brought their own project data to work with in workshop exercises.
The future of Data Carpentry here at FSU looks bright, though whether I will be there for the next workshop is unknown. Thank you, Deb Paul, Micah Vandegrift, Emily Darrow, Kelly Grove, and Carolyn Moritz for helping me put this workshop together, and thank you to everyone who participated or contributed in any way.
I spend a considerable portion of my time convincing researchers of the benefits associated with publishing their data online in open repositories. Reproducibility of research and the prospect of others using their original data sets to advance scholarship in their own field or another are my usual selling points. Academics produce vast amounts of data that has value well beyond the scope of their original project. Likewise, government agencies produce endless amounts of data and information as they conduct their day-to-day business. There are obvious products that have mounds of useful information in them, like the U.S. Census or the American Community Survey. Governments rely on information in all sorts of formats to perform countless tasks on a day-to-day basis. For example, many local governments rely on spatial data about their infrastructure (roads, sewers, power lines) to set maintenance schedules or to select an ideal space for new residential development.
So, what is a Census Research Data Center? The Center for Economic Studies defines Census Research Data Centers (RDCs) as U.S. Census Bureau facilities, staffed by a Census Bureau employee, which meet all physical and computer security requirements for access to restricted-use data. At RDCs, qualified researchers with approved projects receive restricted access to selected non-public Census Bureau data files.
To understand the true value of doing research with non-public data from the RDC, it’s important to note the difference between micro data and macro data, the latter of which is often referred to as aggregate data. When most of us use datasets for research or analysis, we’re looking at summary figures. For example, if you extract Census data for analysis, you’re typically looking at some sort of summary or aggregation for a specific geographic unit. These geographic units range from states, counties, and cities down to much smaller units such as census tracts and block groups. Regardless of the unit of analysis, the data itself is a summarization of individual survey responses from participants in that specific area.
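To make the distinction concrete, here is a minimal Python sketch of how individual-level records get collapsed into the kind of summary figures most of us work with. Everything in it is an assumption made for illustration: the tract labels, the income values, and the `aggregate_by_tract` helper are invented and are not real Census data or a Census Bureau tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical micro data: one row per individual survey response.
# Tract labels and incomes are invented for illustration only.
microdata = [
    {"tract": "A", "income": 30000},
    {"tract": "A", "income": 50000},
    {"tract": "B", "income": 40000},
    {"tract": "B", "income": 60000},
    {"tract": "B", "income": 80000},
]


def aggregate_by_tract(records):
    """Collapse individual responses into one summary row per tract."""
    incomes = defaultdict(list)
    for row in records:
        incomes[row["tract"]].append(row["income"])
    return {
        tract: {"respondents": len(values), "mean_income": mean(values)}
        for tract, values in incomes.items()
    }


# Aggregate (macro) data: one summary row per geographic unit.
aggregate = aggregate_by_tract(microdata)
for tract, summary in sorted(aggregate.items()):
    print(tract, summary)
```

Once the rows are summarized, the individual responses can no longer be recovered from the aggregate table, which is why the two kinds of data support different research questions.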
Data science is all the rage lately. Harvard Business Review even named it the sexiest job of the 21st century. Even though the term is rapidly gaining mind share, many are still confused about what data science actually is. When you cut through the hype, the core of data science is actually pretty simple: it’s the study of data. What kind of data is being studied, how it is being studied, and what the individual data scientist is looking for all depend on the specific case. Data science is just another field of study using digital methods, putting it firmly under the umbrella of Digital Scholarship.