When I was in elementary school, I remember Googling various football statistics, running down to my parents, and telling them, for example, “Ben Roethlisberger had 4,328 passing yards in 2009!” I played football for eight years from elementary school to high school, and I was good at working with numbers. I found that sports analytics was a great combination of the two. In high school, I entered a sports analytics competition, where my project was to determine what would happen if onside kicks in football were replaced with a 4th down and 15, and I absolutely loved it. Now, I’m fascinated with data science as a whole: being able to make a computer do something that we could never imagine doing as humans is an amazing feeling for me.
Since the sports analytics competition, I’ve been doing anything and everything I could related to data science. Some of the research I’m currently working on includes sports team values, Kickstarter data, and sportswashing (for example, Qatar holding the World Cup amidst some controversial political issues). I also had a job this year working for a company called Scouting Heroes, where I logged basic statistics for the FSU football team. (More information on what the data I collected was for can be found at https://simplebet.io/nfl.html.) I’ve also worked on creating data visualizations based on football data. For example, this past summer I created over 20 graphs that can be found at https://twitter.com/a_graph_a_day.
In one of my classes, one of my (now) coworkers, William-Elijah Clark, posted the opening for the STEM Libraries Data Fellowship in the class’s GroupMe, and I was eager to apply. Something I’m super excited for with this Data Fellowship is that I really want to translate my skills into some real-world experience. Instead of simply creating graphs or finding statistics on my own, I want to have a tangible impact with regard to data. I hope to be able to help students out with their needs or be able to have my data analysis translate into a decision being made that affects people. In a way, it would signify that my hard work on data analysis is paying off.
One of the projects that I’m super interested in working on as a Data Fellow is the use of Jupyter Books to assist users in learning more about how to code and analyze data as a whole. By offering interactive code blocks and giving users the opportunity to run code on their own, they may be more willing to learn about the data analysis techniques used. Furthermore, I hope that by implementing sports analytics examples, specifically football, people who are interested in sports may be more willing to learn how to use data analysis techniques with respect to sports.
As a whole, I’m very excited to learn more about data analysis techniques here at FSU Libraries, as well as to apply my skills to tangibly help others at Florida State.
This blog post was written by Sahil Chugani, STEM Data Fellow at FSU Libraries.
Prior to my experience at Florida State University, I took a few research classes in high school. In these classes, I had assignments where I would have to collect and analyze data as part of a research project. These experiences sparked my interest in data science, and from that point forward I always knew that I was interested in data-related research. Furthermore, I have always been interested in a few different subjects, including computer science, biology, and mathematics. I never realized that I would be able to combine my interests before starting this data fellowship.
When I first found this fellowship during the summer of 2022, I felt that I was at an academic crossroads. I was unsure of what I wanted to study and my career goals. However, I was extremely interested in this opportunity, because it was unlike anything I had ever really known about. I thought that this position would be a great learning opportunity for me, and would hopefully allow me to utilize my data skills and pursue some of my interests. So far, this fellowship has gone above and beyond what I was hoping for.
As I am still at the beginning of my academic career, I had not had the opportunity to obtain much experience using my data skills before this fellowship. For this reason, I am so grateful to be participating in this fellowship. I have already learned so many different things in my few months here. One of my first assignments was to meet with many of the different librarians at FSU Libraries. I really enjoyed this task, because I liked hearing about all of the different paths they took to reach this career. It introduced me to a lot of different projects and areas of expertise in the library that I had never known about, such as the Health Data Sciences Initiative and open science.
Another concept that I have recently learned a lot about is the importance of critically evaluating data. Working on a blog post about this topic has been a great learning experience for me. It has introduced me to so many ideas that I had never known about. Specifically, I have learned about machine learning algorithms for data science. As a student currently pursuing a computer science degree with a minor in data analytics, this topic was extremely interesting to me, and is something that I am excited to explore further.
As I take more classes related to my major, I am excited to apply the skills I learn towards this fellowship. In the future I hope to teach workshops about Unix, C#, SQL, and many more. I am looking forward to continuing my work with the FSU Libraries.
This blog post was written by Reagan Bourne, STEM Data Fellow at FSU Libraries.
Love Data Week is coming back to FSU in 2023! Love Data Week, or LDW, is an international event where individuals and groups are encouraged to host and participate in activities related to any and all data. It takes place each year during the week of Valentine’s Day, and focuses on helping people learn about the best data management practices and methods for interpreting data. LDW was started in 2015 and is headed by the Inter-university Consortium for Political and Social Research at the University of Michigan. For those who are looking to learn more about data or are interested in statistics, this is an excellent opportunity to ask questions and get started!
Events
Because looking at raw data can sometimes be boring, we’re looking to spice things up this year by including two new activities! We’ll be right inside the entrance of Dirac from 12:00-2:00 PM on Thursday and Strozier from 12:00-2:00 PM on Friday! First, we’re going to be doing an Adopt-a-Dataset activity, where participants will be able to “adopt” one of the openly available datasets we have displayed. Your task will then be to determine what conclusions can be drawn from the data, and you’ll receive a Dum-Dum for your work! After that, we’ll have a jar of Smarties at the table, with a list of numbers from a normal distribution on hand. From there, you’ll have to guess the number of Smarties in the jar, and the person with the closest guess will win them all! In addition to the tabling events, our Research Data Management Librarian, Dr. Nick Ruhs, will be giving a workshop on Data Analysis with Microsoft Excel on Valentine’s Day (February 14) from 3:00-4:30 PM. If you are or will be using Excel for your projects or research and are looking to enhance your skills, this will be a great workshop to attend!
Blog Posts
In addition to the wonderful events that are occurring during Love Data Week, we will be publishing two blog posts introducing the two new Data Fellows at FSU, Reagan Bourne and Sahil Chugani. In those posts, you’ll learn all about what inspired them to become data fellows and how they became passionate about data analysis and management techniques.
Contact/Resources
For more information about any data questions/concerns you may have, you can either check out https://www.icpsr.umich.edu/web/pages/ or contact Dr. Nick Ruhs, our resident Research Data Management Librarian, at nruhs@fsu.edu. Furthermore, if you ever need any assistance with any data question you may have, you can check out the walk-up hours for our STEM Data Fellows!
This blog post was written by Sahil Chugani (STEM Data Fellow) from FSU Libraries.
Data literacy is the combination of a few unique skill sets: statistical literacy, information literacy, and technical proficiency. It also involves being able to visualize, critically evaluate, determine the accuracy and reliability of, and understand data sets. There are many reasons why it is important to be data literate, especially in recent years with the advent of the internet and social media. Data literacy is also crucial to many different industries and research areas. It is important to interpret the data that you are collecting to make sure that the results are accurate and to be able to understand that data so that you can create useful visualizations for others.
There are a variety of concepts to keep in mind when critically evaluating data. For example, you need to consider the methods that were used to collect the data and whether those methods are ethical. Furthermore, when evaluating how the data is presented, you need to consider whether that representation or visualization is the most accurate way to portray the data. Another particular topic of concern is bias. There are different points at which biases can be introduced, such as when data is collected, when it is analyzed, and when it is shared with the public. Also, if you are critically evaluating your own data, it is important to check that there are no biases within your own work. In this post we will be discussing the critical evaluation of data through the lens of data collection, data presentation and visualization, and data ethics.
Data Collection
In the context of data collection, several different collection methods can be used for research. Some of these methodologies, such as focus groups, surveys, and participant interviews, are familiar to the public at large. However, there are other specific data collection processes that many people outside of certain academic disciplines may not be aware of, such as web scraping/text mining, phlebotomy procedures for blood tests, observational behavior recording for time series data, and many more.
Consequently, not only is recording the data itself of importance for experimental duplication purposes, but it can also be important for interdisciplinary work. Some fields of research may have different research data collection methods that researchers in other fields may not be aware of, even across seemingly similar disciplines. For example, accounting and finance may seem similar but can have drastically different ways of interpreting monetary data. The way accountants and financial analysts calculate when a company is at a net zero (i.e., a break-even) between revenues and costs is different. Even within the same field of research, transparency with how data is collected is important for peer review – whether it be for ethics accountability or determining methodological flaws within research. An incomplete set of data can make it difficult or impossible to know whether or not the data was collected in a way to prevent bias, and further make it impossible to know if the data is accurate and/or precise.
Failing to document data and data collection methods can also create problems reproducing or using the data for further research, particularly if things such as question types, experiment conditions, and units of measure are not properly documented. For example, while the hypothetical idea of cold fusion (nuclear fusion performed at room temperature) would be a low-cost energy solution, the experimental methods and data were not recorded. As a result, the concept of cold fusion is now widely looked at with skepticism because none of the data was recorded! A less extreme case where incomplete data may cause research problems is that the way that a survey is constructed can bias responses. Therefore, documenting how a survey was written can be helpful in evaluating why a research study came to a specific conclusion, as well as testing whether or not changing questions or even question order would change results.
Furthermore, data cleaning (the process in which incorrectly formatted data, corrupted data, etc. are reformatted or fixed so that they can be used in analysis) can also contribute to statistical bias(es) through choices such as eliminating outliers, accidentally losing a variable, or deciding how to categorize your data. Therefore, documenting how you clean your data is also a critical component of research: explaining which outliers you decided to keep or remove, and why, can help you and other researchers down the road. It is also important to consider the order in which questions are asked and the way questions are worded when conducting surveys. While it might seem counterintuitive at first, the way that questions are ordered and worded can impact the percentages of people that respond in a certain way, whether or not potential participants qualify for research projects, and even the numeric values of the data itself.
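To make the point about documenting outlier decisions more concrete, here is a minimal Python sketch. It assumes a common rule of thumb (dropping values more than 1.5 interquartile ranges outside the quartiles), which is only one of many possible cleaning rules and not one prescribed in this post. The function name, the sample readings, and the structure of the log are all hypothetical.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Split values into kept and removed using the k * IQR rule (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if v < low or v > high]
    # Record the rule, the bounds, and what was dropped so others can review it.
    log = {
        "rule": f"{k} * IQR",
        "lower_bound": low,
        "upper_bound": high,
        "removed": removed,
    }
    return kept, log

# Made-up measurements for illustration only.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 42.0]
kept, cleaning_log = remove_outliers_iqr(readings)
print(cleaning_log)
```

Saving something like `cleaning_log` alongside the cleaned data lets a later reader see exactly which values were dropped and under what rule, and rerun the analysis with a different choice.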
Data Presentation and Visualization
Most have probably heard the phrase “label your axes” at some point, even before college. It is often mentioned in K-12 education, with the premise being that someone will not know what your graph(s) are depicting without them. While this is indeed correct, labeled axes constitute only one of many different components of data presentation and visualization.
Figure 1: Axes that are labeled!
A good place to start when exploring how data visualizations can best be implemented is The Data Visualisation Catalogue. While the site was originally established with graphic designers in mind, Severino Ribecca himself stated “I felt it would also be beneficial to…anyone in a field that requires the use of data visualisation.” (Ribecca n.d.) Indeed, almost anyone who uses data eventually has to consider how to communicate it visually to an outside audience, or even to the general public outside of the realm of academia. A nifty feature of The Data Visualisation Catalogue is that there is a way to filter recommended data visualization types by what concept you are trying to demonstrate.
One consideration when looking at a data visualization is whether the data is represented in a way that is appropriate for that specific data type. While it might not seem like the data presentation would differ between data types, certain visualizations will serve to more accurately and sufficiently depict different types of data. For instance, data related to time and Geographic Information Systems mapping produce distinct data types. While they can be combined and represented in the same graphic (e.g., how has the land of a certain area changed over time?), they both have their own distinct issues to consider to make sure that you are not creating misleading graphics. Namely, one cannot make a map with time data alone, and a map would be hard to make with a line graph that is meant to show trends in time.
Furthermore, the scales and units that are utilized in a data representation are also important considerations! Using our previous example, we can note that the visual scales of a map are different from the visual scales of time series data. For instance, you can get drastically different data visualizations if you transform data from a linear scale to a logarithmic scale (i.e., a scale that plots each value according to the exponent to which a base, such as 10, must be raised to produce it). This can be useful when the data you are working with spans such a wide range that it is hard to see everything in an efficient way. For example, a logarithmic time scale, where millions of years are condensed into smaller numbers that are easier to conceptualize, leads to graphs where you can see things like different geological eras.
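As a rough illustration of the logarithmic transformation described above, the short Python sketch below converts a handful of time spans into their positions on a base-10 logarithmic axis. The values are illustrative round numbers chosen for this example, not data from any study.

```python
import math

# Illustrative time spans in years, covering several orders of magnitude.
ages_in_years = [1_000, 1_000_000, 66_000_000, 4_500_000_000]

# On a base-10 logarithmic scale, each value sits at its exponent:
# 1,000 = 10^3 is plotted at 3, 1,000,000 = 10^6 at 6, and so on.
log_positions = [math.log10(age) for age in ages_in_years]

for age, position in zip(ages_in_years, log_positions):
    print(f"{age:>13,} years -> position {position:.2f} on a log10 axis")
```

On a linear axis the largest value here is 4.5 million times the smallest, so the small values would be squeezed together near zero. On the logarithmic axis all four points fall between 3 and 10.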
On a more human scale, while logarithmic scales could be used to misrepresent data, a far more common tactic for misrepresenting data involves a truncated or broken axis on a graph (Figures 2a and 2b). A truncated axis does not start at zero on the y-axis, while a broken axis skips a large number of units. This tactic appears in some graphics that news outlets use, whether intentionally or not. Other signs of misrepresented data include plotting two graphs that are not on the same scale, or zooming in on the scale to make a trend look far larger than it truly is.
Figures 2a and 2b: Graphical Examples of a graph with a broken axis and a graph with a truncated axis, respectively
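The effect of a truncated axis can be worked out with simple arithmetic. The Python sketch below uses two made-up values and a hypothetical helper function to compare how tall two bars appear relative to each other when the y-axis starts at zero versus when it starts just below the smaller value.

```python
def drawn_height_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for b versus a when the y-axis begins at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Illustrative values only: B is 4% larger than A.
value_a, value_b = 50.0, 52.0

honest = drawn_height_ratio(value_a, value_b)            # y-axis starts at 0
truncated = drawn_height_ratio(value_a, value_b, 48.0)   # y-axis starts at 48

print(f"Axis from 0:  bar B looks {honest:.2f}x as tall as bar A")
print(f"Axis from 48: bar B looks {truncated:.2f}x as tall as bar A")
```

With the axis starting at 48, a 4% difference is drawn as a bar twice as tall, which is why it is worth checking where an axis begins before reading a trend from a graph.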
While there are many examples of distinctly misleading graphs, there are also many graphs that accurately portray the data, but use an incompatible or inaccessible color palette. In particular, many color palettes used in data visualizations can be inaccessible to those with vision impairments such as red-green and blue-yellow color blindness. Utilizing distinct color-blind friendly palettes can help to make visualizations more accessible. Furthermore, adding alt-text descriptions of what the graph is showing enhances the ability of screen readers and other tools utilized by those with low vision and blindness to interpret the visualization. Being hard to see or aesthetically displeasing does not by itself make a graph misleading, which is an important distinction to make (although the two are not mutually exclusive!).
Figure 3: A “Painbow” Graph
Data ethics
When examining a dataset, it is also important to consider whether there are any biases present that may affect interpretation of the data. Two common categories of biases are cognitive biases and statistical/algorithmic biases. Cognitive biases involve individuals interpreting the results of a study to best fit a specific narrative. This may involve a data producer deleting data that does not fit the conclusion that they are trying to prove. At the same time, a data producer may also add data that is not accurate in an attempt to strengthen their claims. Furthermore, studies may be designed to collect data that only represents a small subset of a population, while claiming to be representative of the entire population.
Similar to cognitive biases, statistical/algorithmic biases arise when your sample poorly describes your population. This kind of bias is significantly mitigated (if not outright eliminated) when your data collection methods are not themselves biased. It is particularly noticeable when examining artificial intelligence (AI) algorithms. These algorithms are often trained with unequal datasets, which then leads to skewed results when performing data analysis with said algorithms. Therefore, when examining data that is output by an algorithm, one should consider whether the algorithm has been trained with accurate and equal data sets. An industry where statistical and algorithmic biases are extremely important to consider is the healthcare industry. For example, many hospitals use artificial intelligence to sort through patient data, which helps doctors determine who needs immediate emergency attention. While there are many benefits to such algorithms, there have been issues in the past because of them. In certain instances, if a patient has pre-existing medical conditions that affect their health, the algorithm will not be able to take that into account. In addition, many algorithms that are commonly used in healthcare systems are biased with respect to race and gender. As mentioned in “Algorithmic Bias in Health Care Exacerbates Social Inequities — How to Prevent It” written by Katherine Igoe, “algorithms in health care technology don’t simply reflect back social inequities but may ultimately exacerbate them.” Igoe also mentions that certain prediction algorithms used for detecting heart disease in the medical industry were biased in their design. For example, the “Framingham Heart Study cardiovascular risk score” worked very well for Caucasian patients, but not for African American patients. This is due to the fact that around 80% of the collected data used for this algorithm was from Caucasian patients. Utilizing such an unequal dataset to train the algorithm can lead to unequal care and treatment in medical practices (Igoe). This example is just one of the many examples of bias due to algorithm design.
Companies such as Amazon have also faced huge problems relating to algorithm bias. A few years ago, Amazon tried to utilize an algorithm that used artificial intelligence to hire new employees. However, it turned out that this algorithm was biased against women. This is because the algorithm was trained on resumes that were submitted during a time period where the number of male applicants was significantly higher than the number of female applicants. This ultimately caused the algorithm to be trained to favor men over women.
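One simple first check for the kind of unequal training data described above is to count how much of a dataset each group contributes. The Python sketch below is a minimal illustration under stated assumptions: the records, the group labels, and the 30% threshold are all hypothetical, and a real audit would involve far more than a head count.

```python
from collections import Counter

def group_shares(records, key):
    """Return each group's share of the records as a fraction of the total."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

# Hypothetical training records: 8 from group A and 2 from group B.
training_records = [{"group": "A"}] * 8 + [{"group": "B"}] * 2

shares = group_shares(training_records, "group")

# The 30% cutoff is an arbitrary, assumed threshold for flagging a group.
underrepresented = [group for group, share in shares.items() if share < 0.3]

print(shares)
print("Possibly underrepresented:", underrepresented)
```

A skewed count does not prove that a model is biased, and a balanced count does not prove that it is fair, but it is an inexpensive question to ask before trusting an algorithm's output.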
Conclusion
Critical evaluation of data is an extremely important skill set for any student or professional to have. Knowing the importance of checking the reliability, accuracy, and the bias in any data set is necessary when reading or working with data. Some questions to keep in mind are: is the collection method clear and documented? Is the data visualization appropriate for the dataset and for what the author is trying to represent? Is the data biased in the collection or visualization stages? It is important to evaluate data to ensure that we are using quality and accurate data to make sound decisions and conclusions.
Alessandro, Brian d’, Cathy O’Neil, and Tom LaGatta. “Conscientious Classification: A Data Scientist’s Guide to Discrimination-Aware Classification.” Cornell University arXiv, July 21, 2019. https://arxiv.org/abs/1907.09013.
Edwards, D. Brent. 2019. “Best Practices from Best Methods? Big Data and the Limitations of Impact Evaluation in the Global Governance of Education.” 39: 69–85. https://doi.org/10.1108/S1479-367920190000038005.
Lavalle, Ana, Alejandro Mate, and Juan Trujillo. “An Approach to Automatically Detect and Visualize Bias in Data Analytics.” CEUR, Proceedings of the 22nd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with EDBT/ICDT 2020 Joint Conference (EDBT/ICDT 2020), 2572 (March 30, 2020). https://rua.ua.es/dspace/bitstream/10045/104029/1/2020_Lavalle_etal_DOLAP.pdf.
Loo, Mark van der, and Edwin de Jonge. Statistical Data Cleaning with Applications in R. Hoboken, NJ: John Wiley & Sons, Inc., 2018.
Mark, Melvin, Kristen Eysell, and Bernadette Campbell. “The Ethics of Data Collection and Analysis.” New Directions for Evaluation 1999, no. 82 (November 5, 2004): 47–56. https://doi.org/10.1002/ev.1136.
Oldendick, Robert W. 2008. “Question Order Effects.” In Encyclopedia of Survey Research Methods, edited by Paul J. Lavrakas, 664–65. Thousand Oaks, CA: Sage Publications, Inc. https://dx.doi.org/10.4135/9781412963947.n428.
Paradise, Elise, Bridget O’Brien, Laura Nimmon, Glen Bandiera, and Maria Athina (Tina) Martimianakis. 2016. “Design: Selection of Data Collection Methods.” Journal of Graduate Medical Education 8 (2): 263–64. https://doi.org/10.4300/JGME-D-16-00098.1.
Pennsylvania State University. n.d. “4.3 – Statistical Biases.” STAT 509 Design and Analysis of Clinical Trials. Accessed November 17, 2022. https://online.stat.psu.edu/stat509/lesson/4/4.3.
Ribecca, Severino. n.d. “About the Data Visualisation Catalogue.” The Data Visualisation Catalogue. Accessed November 22, 2022. https://datavizcatalogue.com/about.html.
Ribecca, Severino. n.d. “What Do You Want to Show?” The Data Visualisation Catalogue. Accessed November 22, 2022. https://datavizcatalogue.com/search.html.
Ribecca, Severino. n.d. “Data Viz Catalogue.” The Data Visualisation Catalogue. Accessed April 14, 2022. https://datavizcatalogue.com/.
Silva, Selena, and Martin Kenney. “How Computing Platforms and Algorithms Can Potentially Either Reinforce or Identify and Address Ethnic Biases.” ACM Digital Library, October 24, 2019. https://dl.acm.org/doi/fullHtml/10.1145/3318157.
Villar, Ana. “Response Bias.” In Encyclopedia of Survey Research Methods, edited by Lavrakas, Paul J., 752-53. Thousand Oaks, CA: Sage Publications, Inc., 2008. https://dx.doi.org/10.4135/9781412963947.n486.
For Love Data Week 2022, we are highlighting our FSU STEM Libraries Data Fellows! These posts, written by the fellows themselves, tell their stories of how they became interested in data-related work and their experience as a data fellow to this point. Today’s post is contributed by Diego Bustamante.
Prior to my role as a Data Fellow, my idea of what data is was defined by my previous work with quantitative data collected from laboratory experiments. For example, when I worked as a Research Assistant I recorded quantitative data for chemistry experiments, like mass, temperature, volume, etc. I then conducted statistical analysis on the data in order to draw conclusions from each experiment. I personally enjoy collecting and analyzing data, especially because it can lead to many scientific and technological advancements!
While searching for jobs in FSU’s NoleNetwork in summer 2021, one job title that immediately caught my attention was “FSU STEM Libraries Data Fellow.” The job description was unique amongst other jobs offered on campus. As a data fellow, I was offered the opportunity to develop several professional skills in data reference, co-hosting programming language workshops, writing and publishing blog posts, and many more. I felt like it was a great opportunity and a good fit with my previous experience and skills, and so I decided to apply. Thankfully, I was selected as one of the inaugural data fellows, leading to a journey of professional and personal development that has thus far surpassed my initial expectations.
One of my first tasks in the program was meeting with different librarians at FSU Libraries. In these meetings I was able to learn about different methods and applications for data analysis in a variety of disciplines. For example, I learned that the Digital Humanities Librarian uses text-mining software to find specific words in books published in the 1800s. She used the data drawn from the software to analyze certain traits of the story by counting the number of times a character participates in a given type of interaction. This experience helped me realize that qualitative data sets can be used to draw similar conclusions about a study as quantitative data.
Another concept that I have become familiar with while working as a Data Fellow is open data. We discussed this concept during a workshop where we talked about the potential benefits of making research data openly accessible to the wider research community. Initially, I was hesitant regarding the concept of open data, because I saw academic research as a “race” to find a solution to a given problem. However, further discussion of how researchers are compensated for sharing their data made me realize that it is possible to benefit from open data on a personal and global level.
Currently, I am still learning about the many different types of data, their definitions and applications, and their importance. I am also working on developing an open-source Canvas module on MATLAB where I explain the basics of the math-based programming language in a student-friendly manner. I look forward to sharing more about this work in the future!
For Love Data Week 2022, we are highlighting our FSU STEM Libraries Data Fellows! These posts, written by the fellows themselves, tell their stories of how they became interested in data-related work and their experience as a data fellow to this point. Today’s post is contributed by William-Elijah Clark.
It’s hard to say exactly when I first got interested in data. After all, my mother was a statistician, so I’ve been surrounded by data since I was in elementary school, from Arkansas Department of Health public health and mortality statistics to Disney World focus groups and market research. Personally, I started liking statistics when I took UCF’s equivalent to QMB 3200 and Econometrics. This experience extended into being a research assistant at UCF, and even into conducting and monitoring surveys at Universal Orlando Resort! Through my Econometrics course and from additional professional development opportunities at Universal, I was also able to gain experience with R (although I didn’t learn it to the extent that I would call myself a professional data analyst or a data scientist).
Due to the COVID-19 pandemic and subsequent lockdowns in Orlando back in 2020, I decided to go back to school here at Florida State University for Statistics, especially considering that FSU has a SAS coding certificate! Overall, I came to Florida State University with over two years of professional survey experience between academia and hospitality industry work.
I spent time in 2020 taking calculus courses and statistics electives here at FSU to hone my data analysis skills further. I then saw an opportunity to apply for an FSU Libraries data fellowship beginning in Fall 2021. I decided to apply, as this position would give me the opportunity to utilize some of the skills I obtained from my previous positions and coursework at UCF and FSU, and hopefully develop some new skills to further myself in my goals of becoming a data analyst (and hopefully even an econometrician).
So far in my fellowship here at FSU Libraries, I have had the opportunity to gain some experience with MATLAB and SQL through the Data @ Your Desk workshops at Dirac, as well as some experience writing surveys in Qualtrics (as opposed to just conducting and monitoring surveys). I’ve also had the opportunity to learn more about citation management, library research, and data management. I’ve even been able to explain concepts for MS Excel to a patron via the online “Ask a Data Librarian” feature on the FSU Libraries website. This all said, I’m looking forward to applying some of my previous R coding and statistical analysis skills to some survey data for FSU Libraries this semester.
It’s once again time for Love Data Week! LDW is a yearly, international outreach event taking place the week of Valentine’s Day (February 14-18 this year). The week is focused on promoting good data stewardship and best practices around working with and interpreting data. LDW was started in 2015 and is currently celebrated by academic libraries and data organizations around the world. While every institution celebrates in their own way, common activities include data workshops, social media outreach, and more!
Each year, a theme is chosen around which organizations can plan their Love Data Week activities. For 2022, the theme is “Data is for everyone.” This year, we are shining a light on the “people-side” of data, and on how different folks use and interact with data. Data often means something different to everyone, and how someone interacts with data varies based on their chosen discipline, research project, life experiences, and their own beliefs and values. There are also often inherent biases that exist in data collection, analysis, and interpretation, which can affect one’s own impression of a dataset. Despite these differences, the ability to critically evaluate data and interact with it is a universal skill that is crucial for everyone.
As technology continues to evolve, the infrastructure needed to run this technology gets more and more sophisticated. Processes and tasks carried out by personal computers, smartphones, and appliances are increasingly automated and run with minimal input from the user. This is made possible through code that is developed with one or more computer programming languages. However, with the increase in the quantity of software and programming applications, the demand for programmers and the number of languages they are required to learn has increased. Furthermore, many employers now require skills in data analysis and computer programming as prerequisites for job applications. In this blog post, we will discuss the most in demand languages in the market and give a brief explanation of each. (Grand Canyon University 2020; Jiidee 2020; Meinke 2020; University of California – Berkeley, n.d.)
Maybe you’re on Twitter one day and search ‘#Statistics’ to look up some information for your Introductory Statistics course. Before you know it, you scroll through and see several tweets that are also marked with ‘#BigData’, and you’re left with more questions than you had when you started your search. Maybe you try to search for “big data” on Google, see the definition from Oxford, and are then left with even more questions:
How large is “extremely large”?
What kind of patterns, trends, and interactions are we talking about?
What isn’t big data?
Big data as a term has become synonymous with the growth of digital data and the glut of information available to researchers and the public. Furthermore, there is a growing interest in both the public and private sectors in using large datasets to provide insight into market trends and to improve decision making. However, the exact definition of big data is sometimes unclear and can vary widely depending on who you ask. Businesses, nonprofit organizations, government agencies, and academic researchers each view big data in a different context and with different goals for its use. (University of Wisconsin Data Science, n.d.)
Above: a Google Trends graph that shows the number of searches for the term “Big Data” from 2007 to 2017
In this blog post, we aim to provide clarity and insight into the origins and definitions of big data. We will also discuss the potential benefits and challenges surrounding big data. In doing so, we will provide some examples linking big data to applications or data that you may interact with on a daily basis.
Welcome to the third post in the Get Data Lit! blog series. This post will focus on my experience working as a STEM Research Data Services Associate with FSU Libraries during the 2020-2021 school year. In this role, I assisted with outreach and education on STEM research data services for students, groups, and organizations at Florida State University.
My name is Paxton Welton, and I will be graduating with a bachelor’s degree in Finance this semester. One question that you might have right from the start: why is a finance major working in a STEM-focused role?
When applying for jobs prior to this academic year, I knew I wanted a role that would challenge me and allow me to develop new skills. I believed that being the Research Data Services Assistant would provide me with the level of challenge and opportunity that I was looking for. By and large, my experience provided just that. I faced a major learning curve when I first started this role. While I had a grasp of the basics of data literacy and research data services, I quickly realized I did not know nearly enough to be able to properly speak to student groups about these topics. During the first few weeks of the fall semester, I spent a significant portion of my time getting a stronger understanding of data and everything FSU STEM Libraries had to offer its students with regard to research data. By reading countless articles about data literacy and engaging in weekly discussions with my supervisor, Dr. Nick Ruhs, the STEM Data & Research Librarian, I became confident in my working knowledge of these topics.
As the STEM Research Data Services Assistant, one of my main responsibilities was conducting targeted outreach to different student organizations across campus. When I first started this process, I reached out specifically to STEM-focused groups. This involved initiating conversations via email with registered student organizations (RSOs) to introduce them to the research data services FSU Libraries offers. In several cases, we were invited to meet and/or present synchronously to these groups. This gave us a chance to share more in-depth information about our services and just how valuable they are to students. It also gave students a chance to ask us any questions they had. Getting the chance to directly interact with students and help them find the right resources to feel more prepared for their future was by far my favorite part of this role.
I also had the opportunity to contribute to data-related events hosted by FSU STEM Libraries. Two examples include Love Data Week in February and the Virtual FSU Libraries Data Services Quest in March. My involvement in these events allowed me to see the entire process of creating programming for students. I was able to sit in on brainstorming meetings, give my input on the marketing materials, and create content for the events.
One of my main focuses throughout this year has been to develop and create the blog series you are reading right now: Get Data Lit! The focus of this blog series was data literacy and its applicability to students’ educational experiences. As such, I had the chance to put into practice the new data literacy skills I learned in this role. I also had the opportunity to connect data literacy to real-world practice and explain the importance of critically evaluating data. Doing so made me realize just how important learning data skills is for my future career and education.
One thing that proved to be a common theme throughout all the work I was doing is that data is powerful and knowing how to work with it is even more powerful. From a career in law to a career in fashion, you are going to be working with data in some form. Learning how to critically evaluate data is going to give you the skills you need to stand out in the future.
By taking on a job in a discipline that I knew very little about, I was able to challenge myself and make the most out of this past year. From getting to work on student programming events to developing a blog series, I was constantly challenged and learning something new.