2023 Research Data Access & Preservation (RDAP) Virtual Summit Reflections

The Research Data Access and Preservation (RDAP) Summit is an annual conference on the management, access, and preservation of research data that brings together professionals and students from fields such as library science, data management, and research data services. As a graduate assistant, I was lucky enough to have FSU Libraries sponsor my registration for the virtual conference, allowing me to attend the RDAP Summit for the first time in my professional career. The 2023 summit offered attendees a wide range of sessions, workshops, and networking opportunities.

As a virtual conference, the 2023 RDAP Summit was hosted on Whova, a comprehensive digital platform that enabled attendees to network, access conference materials, and attend presentations seamlessly. The platform provided attendees with the most up-to-date information about the conference, including schedules, speaker profiles, and session descriptions.

One feature that stood out to me and made networking seamless was the Whova community board. The community board allowed attendees to connect with other professionals and students in their field based on profile information pulled in from LinkedIn, which connected smoothly to the Whova platform. Attendees could post questions, comments, and ideas, as well as view and respond to others' posts under discussion threads on specific topics ranging from the personal to the professional. The community board was a great way for attendees to exchange ideas, establish new professional relationships, and keep up to date on the latest developments in the research data field. It also included a thread with several listings for employment opportunities as an information professional. This thread was perhaps the busiest: countless positions were posted by other attendees, whom you could engage with on a more personal level than would typically be possible. Whova also let attendees create their own virtual business cards, making it easy to exchange contact information; attendees could share their cards with one another and save other attendees' information to their contact lists.

Another feature that made attending the conference presentations seamless was the ability to create a personalized schedule. I was able to select the sessions and workshops I wanted to attend before the conference even began, ensuring that I did not miss any important sessions. Since all of the presentations were hosted on Whova itself, rather than an external service such as Zoom, the schedule provided an immediate access point to each presentation. Because the platform was directly tied to the conference panels, conference materials such as presentations and posters were readily available and easy to locate.

The conference's emphasis on community building was made abundantly clear by the many opportunities to network and even share your own scholarly work. This extended to the conference presentations, which highlighted the latest trends, challenges, and opportunities in research data access and preservation. The continuing need for open communication and collaboration between academic libraries nationwide, grounded in the shared values that shape open science and data today, was abundantly evident.

One presentation that demonstrated these collectivized efforts was the first session that I attended, which focused on teaching and outreach. Ruby MacDougall, who serves as an analyst for Ithaka S+R, discussed how the infrastructure to support digital research is unevenly distributed, as the connecting links between steps in the research workflow are often weak or missing. Ruby described how data librarians from a range of institutions are working to create stronger ties to humanities researchers and identify strategies for helping humanists navigate the digital infrastructure.

For some context, Ithaka S+R is a nonprofit research organization that helps academic institutions and cultural organizations sustainably navigate the digital age. They offer a wide range of research, advisory, and consulting services to help institutions make informed decisions that enhance their missions, workflows, and user experiences. The organization conducts research on key issues facing universities and colleges, such as the impact of technology on teaching and learning, student success, and faculty development. They also work with institutions to develop strategic plans and make data-informed decisions that align with their goals and values.

This presentation was also significant and meaningful because Nicholas Ruhs, the Research Data Management Librarian for FSU Libraries (who also serves as my supervisor), is currently representing Florida State University among the other academic institutions active in the study. At this juncture, the study is in the preliminary phase of creating an inventory of university data services by reviewing the web content of various departments and offices across campus to see what services exist and where, in order to create a map of all of the data services on campus. On the surface, it may appear that all of the mechanisms necessary to support digital research with proper data management at the university level are in place, but the connecting links between steps in the research workflow are often weak or missing. Mapping out these services will allow FSU Libraries and libraries at other institutions to better coordinate their efforts to address the research and scholarly needs of their students and faculty.

RDAP also made a significant effort to diversify its presentations while keeping them organized and efficient. The posters portion of the summit was an opportunity for researchers and practitioners in the research data field to showcase their work in a static, asynchronous format. The poster format gave presenters an effective way to communicate complex ideas and research findings clearly and concisely, and it offered attendees a chance to engage with presenters and ask questions about their work, or to view the posters at their own discretion. Because the poster presentations had their own section, conference attendees could visit them at any time and even start a conversation with the presenters. Even now, after the conference has ended, I can still access these posters in a digital collection.

One of my biggest takeaways from the poster presentations was again the emphasis on collaboration and community-building in the research data field. Many posters showcased partnerships between academic institutions, libraries, and other organizations to develop and implement data management plans and policies. Others highlighted the importance of building networks and communities of practice to support data sharing and reuse. The diversity of research and practice in the field of research data was also on display with the posters covering a wide range of topics, from data management and preservation to data sharing and reuse, as well as the ethical and social implications of research data. For example, one poster presented a framework for ethical data sharing in the social sciences, while another addressed the challenges of incorporating Indigenous perspectives and knowledge into data management and preservation practices.

Furthermore, one of the most discussed topics at the conference was the new NIH and OSTP guidelines on data management and sharing. The guidelines present both opportunities and challenges for researchers, institutions, and stakeholders in the research community. The policy changes aim to improve the transparency, reproducibility, and efficiency of research by requiring grant applicants to include data management plans and make their research data publicly available. One of the main challenges of compliance is the need for researchers to have the skills and resources to manage and share their research data effectively. This can involve issues such as data formatting, storage, documentation, and curation, as well as ethical and legal considerations related to data sharing and privacy. To address these obstacles, universities and other research institutions are developing research data management support services and infrastructure to help researchers manage their data throughout the research lifecycle. These can include data management planning tools, data repositories, data curation services, and training and support for researchers on best practices. Researchers must also ensure that data sharing protects the privacy and confidentiality of research participants and respects intellectual property rights.

While NIH and OSTP have issued guidelines and policies to address these issues, not everything has been made clear, as the policies are still quite new. NIH and OSTP are responding to inquiries arising from these policy changes by providing guidance and support to researchers and institutions. NIH has launched a website on data management and sharing, which provides resources and guidance on data management planning, data repositories, and data sharing policies. OSTP has also issued a public access policy memo that outlines the key principles and expectations for data management and sharing across federal agencies. However, as one presenter pointed out, specific questions arise, and exceptions listed in the new policy mandates are not always clear, or may even conflict with policies already in place. Additionally, not all of the relevant information is available within the policy itself. Abigail Goben, associate professor and data management librarian at the University of Illinois Chicago, discussed the rabbit hole she went down searching for the information her researchers needed on patent protection and open data sharing. She ultimately relied on guidance from the prepared remarks of Director Taunton Paine during a September 2022 NIH training webinar, and followed up with an email directly to the Sharing Policies & Data Management and Sharing Policy Implementation Team to get the proper information. However, their response provided potentially conflicting guidance, as well as information not listed on the sharing.nih.gov website.

Overall, conferences such as these open the door to connecting with and hearing about the experiences of others in the profession. In so doing, we continue the spread of information and ideas, some of which are not always readily accessible to those who need them. Attending the 2023 Research Data Access and Preservation (RDAP) Summit was an amazing opportunity for professionals and students interested in the management, access, and preservation of research data. Discussions addressing the research and scholarly needs of students and faculty highlighted the need for open communication and collaboration between academic libraries nationwide. The presentations were diverse, efficient, and organized, and the posters gave attendees an opportunity to engage with presenters and ask questions about their work. The RDAP Summit 2023 was a great success, and I highly recommend it to anyone interested in research data management in the coming years.

Evaluating Data Through a Critical and Ethical Lens

Introduction

Data literacy is the combination of a few distinct skill sets: statistical literacy, information literacy, and technical proficiency. It also involves being able to visualize, critically evaluate, determine the accuracy and reliability of, and understand data sets. There are many reasons why it is important to be data literate, especially in recent years with the growth of the internet and social media. Data literacy is also crucial to many different industries and research areas. It is important to interpret the data you are collecting to make sure that the results are accurate, and to understand that data well enough to create useful visualizations for others.

There are a variety of concepts to keep in mind when critically evaluating data. For example, you need to consider the methods used to collect the data and whether those methods are ethical. When evaluating how the data is presented, you need to consider whether that representation or visualization is the most accurate way to portray the data. Another particular topic of concern is bias. Biases can be introduced at different points: when data is collected, when it is analyzed, and when it is shared with the public. If you are critically evaluating your own data, it is likewise important to check that there are no biases in your own work. In this post, we will discuss the critical evaluation of data through the lenses of data collection, data presentation and visualization, and data ethics.

Data Collection

In the context of data collection, several different collection methods can be used for research. Some of these methodologies, such as focus groups, surveys, and participant interviews, are familiar to the public at large. However, there are other specific data collection processes that many people outside of certain academic disciplines may not be aware of, such as web scraping/text mining, phlebotomy procedures for blood tests, observational behavior recording for time series data, and many more.

Consequently, recording the data itself is important not only for experimental duplication but also for interdisciplinary work. Some fields of research may have data collection methods that researchers in other fields are not aware of, even across seemingly similar disciplines. For example, accounting and finance may seem similar but can have drastically different ways of interpreting monetary data: accountants and financial analysts calculate a company's break-even point between revenues and costs differently. Even within the same field of research, transparency about how data is collected is important for peer review, whether for ethics accountability or for identifying methodological flaws. An incomplete set of data can make it difficult or impossible to know whether the data was collected in a way that prevents bias, and further make it impossible to know whether the data is accurate and/or precise.
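As a concrete illustration of the break-even idea mentioned above, here is a minimal Python sketch of the standard unit break-even formula (fixed costs divided by the per-unit contribution margin). The cost and price figures are hypothetical, and this shows only one of the ways such a figure can be computed; it does not model the accountant-versus-analyst differences the paragraph alludes to.

```python
def break_even_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Units that must be sold for revenue to cover fixed + variable costs."""
    contribution_margin = price_per_unit - variable_cost_per_unit
    return fixed_costs / contribution_margin

# Hypothetical numbers: $50,000 in fixed costs, $25 sale price,
# $15 variable cost per unit -> $10 contribution margin per unit.
print(break_even_units(50_000, 25, 15))  # 5000.0 units
```

Even in a toy calculation like this, documenting which costs count as "fixed" versus "variable" matters: two analysts who categorize the same expense differently will report different break-even points from the same underlying data.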

Failing to document data and data collection methods can also create problems in reproducing the data or using it for further research, particularly if things such as question types, experiment conditions, and units of measure are not properly documented. For example, while the hypothetical idea of cold fusion (nuclear fusion performed at room temperature) would be a low-cost energy solution, the experimental methods and data behind the original claims were not properly recorded, and the concept is now widely viewed with skepticism. A less extreme case where incomplete documentation may cause research problems is that the way a survey is constructed can bias responses. Documenting how a survey was written can therefore help in evaluating why a research study came to a specific conclusion, as well as in testing whether changing the questions, or even the question order, would change the results.

Furthermore, data cleaning, the process by which incorrectly formatted data, corrupted data, and the like are reformatted or fixed so that they can be used in analysis, can also introduce statistical bias through choices such as eliminating outliers, accidentally losing a variable, or how you decide to categorize your data. Documenting how you clean your data is therefore a critical component of research: explaining which outliers you decided to keep or remove, and why, can help you and other researchers down the road. It is also important to consider the order in which questions are asked and the way questions are worded when conducting surveys. While it might seem counterintuitive at first, the ordering and wording of questions can affect the percentage of people who respond in a certain way, whether potential participants qualify for research projects, and even the numeric values of the data itself.
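As a rough sketch of what documented cleaning can look like, the Python example below removes outliers with the common 1.5 × IQR rule and logs exactly which values were removed and under what rule. The dataset and the choice of rule are illustrative assumptions, not a prescription; the point is that the removal decision is recorded alongside the cleaned data.

```python
def quartiles(values):
    """Return (Q1, Q3) using simple linear interpolation between sorted values."""
    s = sorted(values)
    def q(p):
        idx = p * (len(s) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)
    return q(0.25), q(0.75)

def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]; return (kept, log)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lo <= v <= hi]
    dropped = [v for v in values if v < lo or v > hi]
    # The log records the rule, the bounds, and every removed point, so other
    # researchers can evaluate (or reverse) the decision later.
    log = {"rule": f"{k} * IQR", "bounds": (lo, hi), "removed": dropped}
    return kept, log

data = [12, 14, 13, 15, 14, 13, 98]   # 98 is an obvious outlier
cleaned, cleaning_log = remove_outliers(data)
print(cleaned)                # the 98 is gone
print(cleaning_log["removed"])
```

Keeping a log like this next to the cleaned dataset is a lightweight way to make the cleaning step itself reproducible and reviewable.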

Data Presentation and Visualization

Most people have probably heard the phrase "label your axes" at some point, even before college. It is often repeated in K-12 education, with the premise that no one will know what your graph is depicting without labels. While this is correct, labeled axes are only one of many components of data presentation and visualization.

Figure 1: Axes that are labeled!

A good place to start on the ways that data visualizations can best be implemented is The Data Visualisation Catalogue. While the site was originally established with graphic designers in mind, Severino Ribecca himself stated, "I felt it would also be beneficial to…anyone in a field that requires the use of data visualisation." (Ribecca n.d.) Almost anyone who uses data has to consider how to communicate it visually to an outside audience, or even to the general public outside of academia. A nifty feature of The Data Visualisation Catalogue is that recommended visualization types can be filtered by the concept you are trying to demonstrate.

One consideration when looking at a data visualization is whether the data is represented in a way appropriate for that specific data type. While it might not seem like the presentation would differ between data types, certain visualizations depict certain types of data more accurately and sufficiently. For instance, time series data and Geographic Information Systems (GIS) mapping data are distinct data types. While they can be combined in the same graphic (e.g., how has the land in a certain area changed over time?), each has its own issues to consider to avoid creating misleading graphics. Namely, one cannot make a map from time data alone, and a line graph meant to show trends over time would make a poor map.

Furthermore, the scales and units used in a data representation are also important considerations. Continuing our previous example, the visual scales of a map differ from the visual scales of time series data. You can also get drastically different visualizations if you transform data from a linear scale to a logarithmic scale (i.e., a scale that plots data based on the exponent needed to recover each number). This is useful when the data you are working with spans such a large range that it is hard to see everything efficiently. For example, a logarithmic scale of time, where millions of years are condensed into smaller numbers that are easier to conceptualize, leads to graphs where you can see things like different geological eras.
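The log-scale idea above can be sketched numerically. The time spans below are illustrative placeholders spanning many orders of magnitude (not a real geological dataset); the point is that a log10 transform compresses values that differ by a factor of hundreds of thousands into a narrow, comparable range.

```python
import math

# Hypothetical time spans in years, from billions down to thousands.
spans_in_years = [4_500_000_000, 540_000_000, 66_000_000, 2_600_000, 11_700]

# On a linear scale, the smallest value is invisible next to the largest:
linear_ratio = max(spans_in_years) / min(spans_in_years)

# On a log10 scale, each value becomes roughly its exponent,
# compressing the whole range into single digits:
log_values = [math.log10(v) for v in spans_in_years]

print(round(linear_ratio))                 # hundreds of thousands to one
print([round(v, 1) for v in log_values])   # all between about 4 and 10
```

The same transformation is what a plotting library applies when you switch an axis from linear to logarithmic; the data is unchanged, but the visual spacing is determined by the exponents rather than the raw values.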

On a more human scale, while logarithmic scales can be used to misrepresent data, a far more common tactic involves a truncated or broken axis on a graph (Figures 2a and 2b): a truncated axis deliberately does not start at zero on the y-axis, while a broken axis subtly skips a large number of units. This tactic appears in some graphics that news outlets use, whether intentionally or not. Other hallmarks of misrepresented data include plotting two graphs that are not on the same scale, or zooming the scale in to make a trend look far larger than it truly is.


Figures 2a and 2b: Graphical Examples of a graph with a broken axis and a graph with a truncated axis, respectively
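The distortion from a truncated axis can be quantified with a little arithmetic. In the hypothetical example below, a bar's apparent height is its distance from wherever the axis starts rather than from zero, so moving the axis start from 0 to 99 turns a 4% real difference into an apparent five-to-one difference.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of two bar heights as drawn when the y-axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

a, b = 100, 104   # a 4% real difference between the two values

honest = apparent_ratio(a, b, axis_start=0)      # bars look ~1.04x apart
truncated = apparent_ratio(a, b, axis_start=99)  # bars look 5x apart!

print(round(honest, 2), round(truncated, 1))
```

This is why the convention for bar charts in particular is to start the y-axis at zero: bar charts encode value as length, and truncation breaks that encoding.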

While there are many examples of distinctly misleading graphs, there are also many graphs that accurately portray the data but use an incompatible or inaccessible color palette. Many palettes used in data visualizations are inaccessible to those with vision impairments such as green-red or blue-yellow color blindness. Using distinct, color-blind-friendly palettes helps make visualizations more accessible, and adding alt-text descriptions of what a graph shows enhances the ability of screen readers and other tools used by people with low vision or blindness to interpret the visualization. That said, being hard to see or aesthetically displeasing does not by itself make a graph misleading; this is an important distinction to make (although the two are not mutually exclusive!).


Figure 3: A “Painbow” Graph
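Choosing an accessible palette can be as simple as drawing series colors from a set designed for color vision deficiency. The sketch below uses the widely cited Okabe-Ito palette of eight colors; the helper function and its cap at eight series are illustrative choices, not a library API.

```python
# The Okabe-Ito palette: eight colors chosen to remain distinguishable
# under common forms of color vision deficiency.
OKABE_ITO = {
    "orange":         "#E69F00",
    "sky_blue":       "#56B4E9",
    "bluish_green":   "#009E73",
    "yellow":         "#F0E442",
    "blue":           "#0072B2",
    "vermillion":     "#D55E00",
    "reddish_purple": "#CC79A7",
    "black":          "#000000",
}

def palette(n):
    """Return the first n Okabe-Ito colors for a chart with n data series."""
    colors = list(OKABE_ITO.values())
    if n > len(colors):
        # Beyond eight series, color alone stops being a reliable channel;
        # consider line styles, markers, or direct labels instead.
        raise ValueError("Okabe-Ito defines only 8 distinct colors")
    return colors[:n]

print(palette(3))
```

Pairing a palette like this with alt text and redundant encodings (markers, patterns, direct labels) covers readers that color alone would exclude.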

Data ethics

When examining a dataset, it is also important to consider whether any biases present may affect interpretation of the data. Two common categories are cognitive biases and statistical/algorithmic biases. Cognitive biases involve individuals interpreting the results of a study to fit a specific narrative. A data producer might delete data that does not fit the conclusion they are trying to prove, or add inaccurate data in an attempt to strengthen their claims. Furthermore, studies may be designed to collect data that represents only a small subset of a population while claiming to be representative of the entire population.

In contrast to cognitive biases, statistical/algorithmic bias arises when a sample poorly describes the population it is meant to represent; it can be significantly mitigated (if not outright eliminated) by unbiased data collection methods. This is particularly noticeable in artificial intelligence (AI) algorithms, which are often trained on unbalanced datasets and consequently produce skewed results. Therefore, when examining data output by an algorithm, one should consider whether the algorithm was trained on accurate and balanced data sets. One industry where statistical and algorithmic biases are extremely important to consider is healthcare. For example, many hospitals use artificial intelligence to sort through patient data, which helps doctors determine who needs immediate emergency attention. While such algorithms have many benefits, they have also caused problems. In certain instances, if a patient has pre-existing medical conditions that affect their health, the algorithm cannot take that into account. In addition, many algorithms commonly used in healthcare systems are racially and gender biased. As mentioned in "Algorithmic Bias in Health Care Exacerbates Social Inequities — How to Prevent It" by Katherine Igoe, "algorithms in health care technology don't simply reflect back social inequities but may ultimately exacerbate them." Igoe also notes that certain prediction algorithms used for detecting heart disease were biased in their design. For example, the Framingham Heart Study cardiovascular risk score worked well for Caucasian patients but not for African American patients, because around 80% of the data used to build the algorithm came from Caucasian patients. Using such an unbalanced dataset to train an algorithm can lead to unequal care and treatment in medical practice (Igoe). This is just one of many examples of bias arising from algorithm design.
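A first step toward catching this kind of problem is simply auditing the group composition of a training set before fitting anything. The sketch below uses hypothetical group labels with an 80/20 split, similar to the skew described above, and a made-up minimum-share threshold; it is a pre-training sanity check, not a full fairness analysis.

```python
from collections import Counter

def group_shares(labels):
    """Fraction of the dataset belonging to each group."""
    counts = Counter(labels)
    total = len(labels)
    return {g: c / total for g, c in counts.items()}

# Hypothetical training data: 80 records from group A, 20 from group B.
training_groups = ["A"] * 80 + ["B"] * 20
shares = group_shares(training_groups)
print(shares)  # {'A': 0.8, 'B': 0.2}

# A simple audit rule: flag any group falling under a chosen minimum share.
MIN_SHARE = 0.3  # illustrative threshold, not a standard
underrepresented = [g for g, s in shares.items() if s < MIN_SHARE]
print(underrepresented)  # ['B']
```

An audit like this does not fix bias on its own, but it makes the imbalance visible and documentable before a model is trained, so the decision to rebalance, collect more data, or caveat the results is deliberate rather than accidental.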

Companies such as Amazon have also faced major problems related to algorithmic bias. A few years ago, Amazon tried to use an artificial intelligence algorithm to screen job applicants. However, the algorithm turned out to be biased against women: it had been trained on resumes submitted during a period when male applicants significantly outnumbered female applicants, which ultimately taught the algorithm to favor men over women.

Conclusion

Critical evaluation of data is an extremely important skill for any student or professional. Knowing to check the reliability, accuracy, and bias of any data set is necessary when reading or working with data. Some questions to keep in mind: Is the collection method clear and documented? Is the visualization appropriate for the dataset and for what the author is trying to represent? Was bias introduced at the collection or visualization stage? Evaluating data ensures that we use quality, accurate data to make sound decisions and draw sound conclusions.

Works Cited

This blog post was written by William-Elijah Clark (Senior STEM Data Fellow) and Reagan Bourne (STEM Data Fellow) from FSU Libraries.

Summer Tutoring Opens Today

Join us this summer for help with numerous core chemistry, math, and physics classes.

Our free service does not require appointments! Simply drop in anytime you need assistance and our tutors will be happy to help. All tutoring during summer 2021 will happen online through Zoom, and you can find more information about the service via our Online Tutoring page.

Our summer hours are Monday, Tuesday, and Wednesday from 8pm to 11pm.

For questions or to request additional information, please email lib-tutoring@fsu.edu.

My Experience as a STEM Research Data Services Assistant

By: Paxton Welton

Welcome to the third post in the Get Data Lit! blog series. This post focuses on my experience working as the STEM Research Data Services Assistant with FSU Libraries during the 2020-2021 school year. In this role, I assisted with outreach and education around STEM research data services to students, groups, and organizations at Florida State University.

My name is Paxton Welton, and I will be graduating with a bachelor's degree in finance this semester. One question you might have right from the start: why is a finance major working in a STEM-focused role?

When applying for jobs prior to this academic year, I knew I wanted a role that would challenge me and allow me to develop new skills. I believed that being the Research Data Services Assistant would provide me the appropriate level of challenge and opportunity that I was looking for. By and large, I believe that my experience provided me with just that. There was a major learning curve that I faced when I first started this role. While I had a grasp of the basics of data literacy and research data services, I quickly realized I did not know nearly enough to be able to properly speak to student groups about these topics. During the first few weeks of the fall semester, I spent a significant portion of my time getting a stronger understanding of data and everything FSU STEM Libraries had to offer to its students in regards to research data. By reading countless articles about data literacy and engaging in weekly discussions with my supervisor Dr. Nick Ruhs, the STEM Data & Research Librarian, I became confident in my working knowledge on these topics. 

As the STEM Research Data Services Assistant, one of my main responsibilities was conducting targeted outreach to different student organizations across campus. When I first started this process, I reached out specifically to STEM-focused groups, initiating conversations via email with registered student organizations (RSOs) to introduce them to the research data services FSU Libraries offers. In several cases, we were invited to meet and/or present synchronously to these groups. This gave us a chance to share more in-depth information about our services and how valuable they are to students, and it gave students a chance to ask us questions. Getting to interact directly with students and help them find the right resources to feel more prepared for their futures was by far my favorite part of this role.

I also had the opportunity to contribute to data-related events hosted by FSU STEM Libraries. Two examples include Love Data Week in February and the Virtual FSU Libraries Data Services Quest in March. My involvement in these events allowed me to see the entire process of creating programming for students. I was able to sit in on brainstorming meetings, give my input on the marketing materials, and create content for the events.

One of my main focuses throughout this year has been developing and creating the blog series you are reading right now: Get Data Lit! The focus of this series is data literacy and its applicability to students' educational experiences. As such, I had the chance to put into practice the new data literacy skills I learned in this role. I also had the opportunity to connect data literacy to real-world practice and explain the importance of critically evaluating data. Doing so made me realize just how important data skills are for my future career and education.

One common theme throughout all of my work was that data is powerful, and knowing how to work with it is even more powerful. From a career in law to a career in fashion, you are going to be working with data in some form. Learning how to critically evaluate data will give you the skills you need to stand out in the future.

By taking on a job in a discipline that I knew very little about, I was able to challenge myself and make the most out of this past year. From getting to work on student programming events to developing a blog series, I was constantly challenged and learning something new. 

7 LinkedIn Learning Skills to Master This Summer

Hi everyone, this is Courtney again, the STEM Libraries GA, along with Emily McClellan, the STEM Libraries Outreach Associate, to talk about ways we can continue our learning and professional development throughout what promises to be a unique semester. It's often said that we should try to control how we react to the things we can't control. While that's a lot easier said than done, we wanted to share some opportunities that you may find helpful for continuing to learn and grow throughout the summer. While the world is constantly shifting and changing around us, finding stability can be hard. If you're looking for a professional goal you can achieve this summer, try a LinkedIn Learning training to keep you grounded and focused as we continue to work from home.

Continue reading 7 LinkedIn Learning Skills to Master This Summer

Introducing our Newest Librarians

Continuing the series, here are two more of our new librarians.

Renaine Julian – Data Research Librarian

Hi folks. My name is Renaine and I'm the Data Research Librarian at FSU. I'm a three-time FSU alum, and I couldn't be happier to be back on campus! Before starting my current position, I worked for the Libraries for about five years, first as a student worker and later as a staff member, before heading over to the statewide library consortium, the Florida Virtual Campus.

The Data Research Librarian is a new position and I’m responsible for creating a new suite of services for students and faculty related to quantitative data as well as the management of research data. That being said, I can help you find data as well as figure out what to do with it once you have your hands on something useful. If you’re creating large datasets for your research, you’ll need a plan for managing that information and, in many cases, making it available to others. I’m working with other folks in the Libraries and around campus to develop data management consulting services to assist you in planning to keep your research intact, findable and usable.

I’m also the subject specialist for Economics, Geography, and Urban and Regional Planning. My research interests include: data management, data visualization, open data, emerging technologies and digital libraries. I work in the Scholars Commons which is located in Strozier’s basement. Please come by and say hello.

Contact Renaine – rjulian at fsu.edu

[Editor's note – photo coming soon! That's how new Stacey is!]

Hello! My name is Stacey Mantooth and I am a new addition to the library staff at Dirac Science Library. Before joining Florida State University, I earned my MSLS at the University of North Carolina at Chapel Hill and worked at the EPA Library at Research Triangle Park in North Carolina. While I’ve lived in several states around the Southeast and Midwest, this is my first time living in Florida, and I’m excited to see what Tallahassee has to offer.

As the liaison to the Chemistry and Biochemistry and Earth, Ocean, and Atmospheric Science departments, I help students and faculty with research activities like finding journal articles, writing literature reviews, patent searching, or managing data. I also help make decisions about what materials the Libraries buy or keep for these subjects. In addition to my regular library and liaison work, I’m interested in doing research on the information needs of STEM faculty and students on campus. Studying which information researchers need, knowing how they go about getting it, and understanding how they view the research process could lead to improved University services and greater STEM success.

Contact Stacey – smantooth at fsu.edu