data science

STEM Data Fellow Spotlight: Diego Bustamante

For Love Data Week 2022, we are highlighting our FSU STEM Libraries Data Fellows! These posts, written by the fellows themselves, tell their stories of how they became interested in data-related work and their experience as a data fellow to this point. Today’s post is contributed by Diego Bustamante.

Prior to my role as a Data Fellow, my idea of data was shaped by my previous work with quantitative data collected from laboratory experiments. For example, when I worked as a Research Assistant, I recorded quantitative data for chemistry experiments, such as mass, temperature, and volume. I then conducted statistical analysis on the data in order to draw conclusions from each experiment. I personally enjoy collecting and analyzing data, especially because it can lead to many scientific and technological advancements!

While searching for jobs in FSU’s NoleNetwork in summer 2021, one job title that immediately caught my attention was “FSU STEM Libraries Data Fellow.” The job description was unique among the jobs offered on campus. As a data fellow, I was offered the opportunity to develop professional skills in areas such as data reference, co-hosting programming language workshops, and writing and publishing blog posts. I felt it was a great opportunity and a good fit with my previous experience and skills, so I decided to apply. Thankfully, I was selected as one of the inaugural data fellows, leading to a journey of professional and personal development that has thus far surpassed my initial expectations.

One of my first tasks in the program was meeting with different librarians at FSU Libraries. In these meetings I was able to learn about different methods and applications for data analysis in a variety of disciplines. For example, I learned that the Digital Humanities Librarian uses text-mining software to find specific words in books published in the 1800s. She used the data drawn from the software to analyze certain traits of a story by counting the number of times a character participates in a particular type of interaction. This experience helped me realize that qualitative data sets can support conclusions about a study much as quantitative data can.
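To make the idea of counting words in a text a little more concrete, here is a minimal Python sketch. It is not the software the librarian uses; the function name `count_terms` and the sample sentence are made up purely for illustration, and a real text-mining tool would handle far more (punctuation, spelling variants, whole books).

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each term in text."""
    # Split the text into lowercase words (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Report counts under the spelling the caller asked for.
    return {term: counts[term.lower()] for term in terms}

# Hypothetical sample text, invented for this example.
sample = "Emma spoke to Harriet. Harriet smiled, and Emma spoke again."
result = count_terms(sample, ["Emma", "Harriet", "spoke"])
print(result)
```

Counting how often a name appears next to a verb like "spoke" is a very rough stand-in for counting a character's interactions, but it shows how text can be turned into numbers that can then be analyzed.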

Another concept that I have become familiar with while working as a Data Fellow is open data. We discussed this concept during a workshop where we talked about the potential benefits of making research data openly accessible to the wider research community. Initially, I was hesitant regarding the concept of open data, because I saw academic research as a “race” to find a solution to a given problem. However, further discussion of how researchers are compensated for sharing their data made me realize that it is possible to benefit from open data on a personal and global level. 

Currently, I am still learning about the many different types of data, along with their definitions, applications, and importance. I am also working on developing an open-source Canvas module on MATLAB, in which I explain the basics of the math-based programming language in a student-friendly manner. I look forward to sharing more about this work in the future!

How do the Pros do Data Analysis?

By: Diego Bustamante and William-Elijah Clark

INTRODUCTION

As technology continues to evolve, the infrastructure needed to run this technology gets more and more sophisticated. Processes and tasks carried out by personal computers, smartphones, and appliances are increasingly automated and run with minimal input from the user. This is made possible through code that is developed with one or more computer programming languages. However, with the increase in the quantity of software and programming applications, the demand for programmers and the number of languages they are required to learn have increased. Furthermore, many employers now require skills in data analysis and computer programming as prerequisites for job applications. In this blog post, we will discuss the most in-demand languages in the market and give a brief explanation of each. (Grand Canyon University 2020; Jiidee 2020; Meinke 2020; University of California – Berkeley, n.d.)

What is ‘Big Data’ Anyway?

By: Diego Bustamante and William-Elijah Clark

Maybe you’re on Twitter one day and search ‘#Statistics’ to look up some information for your Introductory Statistics course. Before you know it, you scroll through and see several tweets that are also marked with ‘#BigData’, and you’re left with more questions than you had when you started your search. Maybe you try to search for “big data” on Google, see the definition from Oxford, and are then left with even more questions: 

  • How large is “extremely large?”
  • What kind of patterns, trends, and interactions are we talking about?
  • What isn’t big data?

Big data as a term has become synonymous with the growth of digital data and the glut of information available to researchers and the public. Furthermore, there is growing interest from both the public and private sectors in using large datasets to provide insight into market trends and to improve decision making. However, the exact definition of big data is sometimes unclear and can vary widely depending on who you ask. Businesses, nonprofit organizations, government agencies, and academic researchers each view big data in a different context and with different goals for its use. (University of Wisconsin Data Science, n.d.)

Above: a Google Trends graph that shows the number of searches for the term “Big Data” from 2007 to 2017

In this blog post, we aim to provide clarity and insight into the origins and definitions of big data. We will also discuss the potential benefits and challenges surrounding big data. In doing so, we will provide some examples linking big data to applications or data that you may interact with on a daily basis.

What is a Census Research Data Center and Why Should You Care?

This semester, FSU became the newest consortial member of Atlanta’s Census Research Data Center. Because the membership is funded primarily by the College of Social Sciences and the Office of Research, the Florida State community can now use Census micro-data without paying lab fees, which can run upwards of $15,000 per project. There are currently 18 Census Research Data Centers in the United States, and outside of North Carolina’s Research Triangle, the only one located in the southeastern United States is at the Federal Reserve Bank of Atlanta.

So, what is a Census Research Data Center? The Center for Economic Studies defines Census Research Data Centers (RDCs) as U.S. Census Bureau facilities, staffed by a Census Bureau employee, which meet all physical and computer security requirements for access to restricted-use data. At RDCs, qualified researchers with approved projects receive restricted access to selected non-public Census Bureau data files.

Where do college graduates work? Visualization based on 2012 Census data.

To understand the true value of doing research with non-public data from the RDC, it’s important to note the difference between micro data and macro data (the latter is often referred to as aggregate data). When most of us use datasets for research or analysis, we’re looking at summary figures. For example, if you extract Census data for analysis, you’re typically looking at some sort of summary or aggregation for a specific geographic unit. These geographic units range from states, counties, and cities to much smaller units such as census tracts and block groups. Regardless of the unit of analysis, the data itself is a summarization of individual survey responses for participants in that specific area.
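The difference between micro data and aggregate data can be sketched in a few lines of Python. Everything in this example is invented for illustration: the records, the county labels, the income figures, and the helper name `aggregate_by_unit` are assumptions, not actual Census data or Census Bureau tooling.

```python
from collections import defaultdict

# Hypothetical micro data: one record per individual survey response.
responses = [
    {"county": "Leon", "income": 42000},
    {"county": "Leon", "income": 58000},
    {"county": "Gadsden", "income": 35000},
    {"county": "Gadsden", "income": 45000},
    {"county": "Gadsden", "income": 40000},
]

def aggregate_by_unit(records, unit, field):
    """Summarize individual records into one row per geographic unit."""
    groups = defaultdict(list)
    for record in records:
        groups[record[unit]].append(record[field])
    # Each unit keeps only a respondent count and a mean, not the individuals.
    return {
        name: {"respondents": len(values), "mean": sum(values) / len(values)}
        for name, values in groups.items()
    }

summary = aggregate_by_unit(responses, "county", "income")
print(summary)
```

The `summary` dictionary is the kind of aggregate table most of us work with, while the `responses` list stands in for the individual-level records that are only available in a restricted setting such as an RDC. Once the data has been summarized, the individual responses cannot be recovered from the summary alone.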

Building Data Sets with FSU’s Digital Library

Data science is all the rage lately. Harvard Business Review even named it the sexiest job of the 21st century. Even though the term is rapidly gaining mind share, many are still confused about what data science actually is. When you cut through the hype, the core of data science is actually pretty simple: it’s the study of data. What kind of data is being studied, how it is being studied, and what the individual data scientist is looking for all depend on the specific case. Data science is just another field of study using digital methods, putting it firmly under the umbrella of Digital Scholarship.
