2023 Research Data Access & Preservation (RDAP) Virtual Summit Reflections

The Research Data Access and Preservation (RDAP) Summit is an annual conference on the management, access, and preservation of research data. It brings together professionals and students from fields such as library science and data management, including research data specialists. As a graduate assistant, I was lucky enough to have FSU Libraries sponsor my registration for the virtual conference, allowing me to attend the RDAP Summit for the first time in my professional career. The 2023 summit offered attendees a wide range of sessions, workshops, and networking opportunities.

As a virtual conference, the RDAP 2023 Summit was hosted on Whova, a comprehensive digital platform that enabled attendees to network, access conference materials, and attend presentations seamlessly. The platform provided attendees with up-to-date information about the conference, including schedules, speakers’ profiles, and session descriptions.

One of the features that really stood out to me, and which made networking seamless, was the Whova community board. The community board allowed attendees to connect with other professionals and students in their field based on demographic information provided by LinkedIn, which integrated with the Whova platform. Attendees could post questions, comments, and ideas, as well as view and respond to others’ posts under discussion threads with specific topics ranging from personal to professional. The community board was a great way for attendees to exchange ideas, establish new professional relationships, and keep up to date on the latest developments in the research data field. This also included a thread with several listings for employment opportunities for information professionals. This thread was perhaps the busiest, as many positions were posted by other attendees, with whom you could interact on a more personal level than would typically be possible. Whova’s platform also allowed attendees to create their own virtual business cards, making it easy to exchange contact information. Attendees could share their business cards with one another and save other attendees’ information to their contacts lists.

Another feature that made attending the conference presentations seamless was the ability to create a personalized schedule. I was able to select the sessions and workshops I wanted to attend before the conference even began, ensuring that I did not miss any important sessions. Since all of the presentations were hosted on Whova, rather than on an external service such as Zoom, the schedule provided an immediate access point to presentations. Because the platform was directly connected to the conference panels, conference materials such as presentations and posters were readily available and easy to locate.

The aspects of community building were made abundantly clear by the different opportunities to network or even share your own scholarly work. This also included the conference presentations, which highlighted the latest trends, challenges, and opportunities in research data access and preservation. The continuing need for open communication and collaboration between academic libraries nationwide through similar values that shape the world of open science and data today was abundantly evident.

One presentation that demonstrated these collectivized efforts was the first session that I attended, which focused on teaching and outreach. Ruby MacDougall, who serves as an analyst for Ithaka S+R, discussed how the infrastructure to support digital research is unevenly distributed, as the connecting links between steps in the research workflow are often weak or missing. Ruby described how data librarians from a range of institutions are working to create stronger ties to humanities researchers and identify strategies for helping humanists navigate the digital infrastructure.

For some context, Ithaka S+R is a nonprofit research organization that helps academic institutions and cultural organizations sustainably navigate the digital age. They offer a wide range of research, advisory, and consulting services to help institutions make informed decisions that enhance their missions, workflows, and user experiences. The organization conducts research on key issues facing universities and colleges, such as the impact of technology on teaching and learning, student success, and faculty development. They also work with institutions to develop strategic plans and make data-informed decisions that align with their goals and values.

This presentation was also significant and meaningful because Nicholas Ruhs, the Research Data Management Librarian for FSU Libraries (who also serves as my supervisor), is currently participating in the study, representing Florida State University alongside the other academic institutions involved. At this juncture, the study is in the preliminary phase of creating an inventory of university data services. This involves reviewing the web content of various departments and offices across campus to see what services exist and where, in order to create a map of all of the data services on campus. On the surface, it may appear that all of the necessary mechanisms for supporting digital research with proper data management at a university level are in place, but the connecting links between steps in the research workflow are often weak or missing. Mapping out these services will allow FSU Libraries and libraries at other institutions to better coordinate their efforts at addressing the research and scholarly needs of their students and faculty.

Speaking of accessibility, RDAP made a significant effort to diversify its presentations while keeping them organized and efficient. The posters portion of the RDAP Summit was an opportunity for researchers and practitioners in the research data field to showcase their work in a static, asynchronous format. The poster format gave presenters an effective way to communicate complex ideas and research findings clearly and concisely, and it gave attendees a chance to engage with presenters and ask questions about their work, or to view the posters at their own convenience. Because the poster presentations had their own section, conference attendees could visit them at any time and even start a conversation with the presenters. Even now, after the conference has ended, I can still access these posters, as they exist in a digital collection.

One of my biggest takeaways from the poster presentations was again the emphasis on collaboration and community-building in the research data field. Many posters showcased partnerships between academic institutions, libraries, and other organizations to develop and implement data management plans and policies. Others highlighted the importance of building networks and communities of practice to support data sharing and reuse. The diversity of research and practice in the field of research data was also on display with the posters covering a wide range of topics, from data management and preservation to data sharing and reuse, as well as the ethical and social implications of research data. For example, one poster presented a framework for ethical data sharing in the social sciences, while another addressed the challenges of incorporating Indigenous perspectives and knowledge into data management and preservation practices.

Furthermore, one of the most discussed topics at the conference was the new NIH and OSTP guidelines on data management and sharing. The guidelines  present both opportunities and challenges for researchers, institutions, and stakeholders in the research community. The policy changes aim to improve the transparency, reproducibility, and efficiency of research by requiring grant applicants to include data-management plans and make their research data publicly available. One of the main challenges of compliance is the need for researchers to have the necessary skills and resources to manage and share their research data effectively. This can involve issues such as data formatting, storage, documentation, and curation, as well as ethical and legal considerations related to data sharing and privacy. To address these ongoing obstacles, universities and other research institutions are responding by developing Research Data Management support services and infrastructure to help researchers manage their data throughout the research lifecycle. These can include data-management planning tools, data repositories, data curation services, and training and support for researchers on data management and sharing best practices. Researchers must ensure that data sharing is done in a way that protects the privacy and confidentiality of research participants and respects intellectual property rights. 

While NIH and OSTP have issued guidelines and policies to address these issues, not everything has been made clear, as the policies are still quite recent. NIH and OSTP are responding to questions arising from these policy changes by providing guidance and support to researchers and institutions. NIH has launched a website on data management and sharing, which provides resources and guidance on data management planning, data repositories, and data sharing policies. OSTP has also issued a public access policy memo that outlines the key principles and expectations for data management and sharing across federal agencies. However, as one of the presenters pointed out, specific questions still arise, and the exceptions listed in the new policy mandates may not always be clear or may even come into direct conflict with other policies already implemented. Additionally, not all of the relevant information is available within the policy itself. Abigail Goben, associate professor and data management librarian at the University of Illinois Chicago, discussed the rabbit hole she went down searching for the information her researchers needed relating to patent protection and open data sharing. She ultimately relied on guidance from the prepared remarks that Director Taunton Paine gave during an NIH training webinar in September 2022, and followed up with an email directly to the Sharing Policies & Data Management and Sharing Policy Implementation Team in order to get the proper information. However, their response provided potentially conflicting guidance as well as information not listed or available on the sharing.nih.gov website.

Overall, conferences such as these open the door to connect and hear about the experiences of others in the profession. In so doing we continue the spread of information and ideas, some of which are not always readily or easily accessible to those who need it. Attending the Research Data Access and Preservation (RDAP) Summit 2023 was an amazing opportunity for professionals and students interested in the management, access, and preservation of research data. Discussions that address the research and scholarly needs of students and faculty highlighted the need for open communication and collaboration between academic libraries nationwide. The presentations were diverse, efficient, and organized, and the posters provided an opportunity for attendees to engage with presenters and ask questions about their work. The RDAP Summit 2023 was a great success, and I highly recommend it to anyone interested in research data management in the coming years.

A Primer on Machine Learning and Artificial Intelligence

Introduction 

Artificial intelligence is a very broad topic that includes machine learning and deep learning. These terms are often used interchangeably, on the assumption that they all mean the same thing. However, while the terms are related, specific characteristics differentiate them. Deep learning is a subfield of machine learning, which is in turn a subfield of artificial intelligence. Artificial intelligence involves developing computers that are capable of mimicking human cognitive functions and carrying out specific tasks. Machine learning uses algorithms to recognize patterns and trends in previous data, and then uses this information to make predictions in real-world applications. A central goal of artificial intelligence is to allow computers to work independently, without the need for humans to instruct and interact with them. There is a large variety of applications for artificial intelligence and machine learning, spanning essentially every industry. Artificial intelligence is widely used in the manufacturing, banking, and healthcare industries. In this blog post, we will go deeper into the definitions of artificial intelligence and machine learning, and their practical applications.

What is Artificial Intelligence?

There are many different ways to define artificial intelligence, and the definition has changed drastically over the years. Alan Turing, who is often referred to as the father of modern computer science, created a test known as the Turing Test in an attempt to answer the question “can machines think?” In this test, a human has to differentiate between a computer’s response to a question and another human’s response to the same question (IBM). Furthermore, in “Artificial Intelligence: A Modern Approach”, Stuart Russell and Peter Norvig discuss a human approach vs. a rational approach to artificial intelligence. They describe four different goals to pursue when designing artificial intelligence: systems that think like humans, systems that act like humans, systems that think rationally, and systems that act rationally. Each goal has its own advantages and disadvantages, and all of them are pursued today. An overall definition that fits these different goals is that artificial intelligence allows machines to learn from previous experiences and information and to perform human-like tasks (SAS).

Along with the general definition described above, artificial intelligence can also be divided into weak and strong artificial intelligence. Weak artificial intelligence, also known as narrow artificial intelligence, is programmed and trained for one task. Narrow artificial intelligence cannot mimic a human as a whole, only certain aspects, and it has very specific applications. For example, narrow artificial intelligence is used in Amazon Alexa, Google Home, personalized advertisements on social media, recommended songs on Spotify, and much more.

Strong artificial intelligence, also known as artificial general intelligence, focuses on creating a machine that can perform any cognitive task that a human can: in other words, a machine that can mimic a human. Three abilities are critical to building an artificial general intelligence machine. The first is the ability to generalize knowledge (that is, to take knowledge from one area and apply it to an issue or task in another). The second is the ability to make predictions based on prior knowledge and experiences, and the third is the ability to adapt to changes (Forbes). Notably, artificial general intelligence raises many ethical questions, and it can be argued that it is impossible to make a “strong” artificial intelligence.

Overall, artificial intelligence can be used to add intelligence to preexisting technologies. It can perform tasks reliably, with much less error than a human, and faster than a human. Artificial intelligence can also adapt through progressive learning. In the future, artificial intelligence may have even more of an impact on our everyday lives, and we can learn so much from it. 

Real-Life Use Cases for Artificial Intelligence

Daily Tech Use

Depending on how much tech you interface with, you may be thinking: “Artificial Intelligence isn’t used for anything I do or use. Why would I need to know where AI is used?” To answer the question quickly…artificial intelligence is currently embedded in a lot of daily tasks that most people (possibly even you!) use.

Whether you’re trying to find something via Google, trying to decide on what you’d like to watch on Netflix, or trying to discover niche music genres on Spotify, all of these sites use AI algorithms to infer what you’re probably interested in (University of York n.d.). For example, if you’re a STEM major who happens to search for the phrase “R Programming” often enough, Google will eventually pick up that you are most likely not looking for the history of how the letter R came to exist. Likewise, if you’re a linguistics major looking for how the modern letter R came to exist, you will most likely not get search results related to the R programming language. Of course, this isn’t the only situation where two people will get radically different search results. In fact, the effect of Google’s algorithmic presentation of information based on what you typically look for has a name: “filter bubbles”. The term was coined over a decade ago by political activist Eli Pariser. He demonstrated this phenomenon in a 2011 TED Talk with two different people searching for “Egypt” around the same time. While the conversation was predominantly about how filter bubbles impact politics and activism, it should be noted that filter bubbles would not exist without artificial intelligence behind them. That said, being aware of how AI algorithms can influence what you see is an important aspect of civic engagement. This concept may become even more pertinent as newer chatbots present further issues, such as giving false information when asked certain questions. Thus, understanding how AI is implemented is important for everyone.

For a less ominous use of modern AI, consider handwriting recognition software. Even with written English, a touch-screen interface can be combined with AI, image processing, and computer vision to convert handwriting into text-compatible notes. This can be extremely useful for transferring text data from one computer to another. While you could take a photo of your notes for someone else to look at, a photo has limited use for finding words within the text after the fact, since you cannot search for a keyword in text that is only saved as an image. Further, a computer that can convert handwriting to typed text also allows someone to use a search engine without typing. This use of AI extends beyond the English language. Handwriting recognition has been researched for several different languages, including non-Western languages such as simplified Chinese, Arabic, Thai, and more. As a consequence, handwriting recognition AI can bypass the need to type (a skill that is separate from writing and is even less common). Further, converting handwritten text to computer text formats in these languages can feed translation AIs. While tools such as Google Translate may not be the most reliable, they can serve in a pinch in situations such as a hospital ER.

AI in Economics and Finance

Economics and finance also embrace technology to carry out their work. For example, technology is particularly relevant to detecting credit card and insurance fraud. There are well-established ways to use mathematics and statistics to determine whether someone’s financial accounts have been compromised. However, the conundrum of modern finance and economics is that transactions happen far faster than humans can keep up with. An AI algorithm can calculate the probability that a financial transaction was fraudulent far faster than a human could. Therefore, as long as the humans behind the algorithm have given their AI formulas to work with, faster processing speed is of great assistance in preventing modern-day fraud.

Likewise, AI is already the cornerstone of the modern foreign exchange market (also known as FOREX). While the concept of foreign exchange has existed since Antiquity, there are some additional considerations in contemporary times. Specifically, modern currencies are traded in significantly larger amounts and at faster speeds than anything before. In fact, modern FOREX is so large and so fast that a human being cannot efficiently or consistently make profits without AI tools! This is predominantly due to the majority of FOREX transactions being carried out by AI bots instead of humans.  A study commissioned by JPMorgan in 2020 determined that about 60% of all FOREX transactions were made by AI rather than humans! This is not to say that human involvement in FOREX is non-existent. Instead, the human role of a FOREX trader is no longer in the realm of physically placing trades, but in examining formulas and creating better and better code that a FOREX AI bot will operate with. Essentially, AI frees up time for human financiers to make analytical decisions as opposed to physically waiting or physically making trades…if so inclined. It should be noted that these applications of AI are still new, and often come with the risk of sudden price shifts wiping out short-term profits. 

AI in Healthcare

Artificial Intelligence also has applications in healthcare. It might be odd to think about how AI would impact something as physical as your own body, but there are already several cases where it can be used. 

For example, AI can be used to detect lethal drug interactions and make vaccines from scratch. For the former, researchers at Pennsylvania State University used AI to study what prescription drug combinations could cause liver damage. In the case of the latter, in 2019 researchers at Flinders University in Australia developed the first flu vaccine that was completely designed by artificial intelligence. Previously developed vaccines had been partially designed by AI, setting a precedent for the first 100% AI-made vaccine. Furthermore, AI is used in physical machines developed for medical purposes, namely robot-assisted surgery. While most robotic surgical systems are not 100% AI-driven, the very first instance of a surgical robot doing surgery by itself was back in 2006 (United Press International 2006)! This isn’t a commonplace practice at the moment, but robot-assisted surgery with human intervention is. Hence, it is worth considering whether medical science should automate surgery altogether or use AI surgical robots as collaborative machines.

What is Machine Learning?

Machine learning is a subset of AI specializing in taking data and improving the accuracy of predictions using that data. For example, if the temperature increased by one degree Fahrenheit every day, a machine learning algorithm could use that data to predict that the temperature would keep increasing by one degree per day. This is arguably the simplest form of machine learning, called linear regression (as there is a linear relationship between the number of days and the temperature). However, machine learning can encompass a number of different ideas and models, even including items such as weather forecasts. 
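The temperature example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the starting temperature of 70 degrees and the seven days of data are made up, and the helper names `fit_line` and `predict` are hypothetical rather than taken from any library.

```python
# Hypothetical data: the temperature rises exactly 1 degree F per day,
# starting from an assumed 70 degrees F on day 0.
days = [0, 1, 2, 3, 4, 5, 6]
temps_f = [70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0]


def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(days, temps_f)


def predict(day):
    """Use the fitted line to estimate the temperature on a given day."""
    return slope * day + intercept


# Estimate the temperature for the next day (day 7), which is not in the data.
print(predict(7))
```

With real weather data the points would not fall exactly on a line, so the fitted line would only approximate the trend, which is one reason actual forecasts rely on far richer models.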

Machine learning is used in many ways throughout our everyday lives, such as for Spotify/YouTube recommendations, stock market predictions, and advertisements. With more data being readily available every day, the potential applications of ML will only continue to increase. Creative destruction, in economics, is the concept that with new and better technology, some jobs may be lost in the short run. However, in the long run, productivity will increase, new jobs will be created, and living standards will increase. With AI potentially taking over some jobs such as customer service jobs, and some of those jobs being replaced by jobs requiring the coding of AI tools, creative destruction is taking place and will only continue to do so. Therefore, with ML taking over a large portion of the Internet today, it is fundamental to obtain an in-depth understanding of what it does. 

Machine learning can generally work in two ways: supervised and unsupervised learning. With supervised learning, a computer is trained with labeled data and can then use that data to make new predictions. For example, if we wanted to train a computer to recognize a picture of an apple, we would first need to input a large number of pictures containing apples and pictures that do not have apples. Then, we would appropriately label them. The computer would then take this data, make a model out of it, and predict whether or not something is an apple from a new picture. Unsupervised learning is generally used to cluster or group segments of data. For example, Spotify could use this type of ML algorithm to group listeners into certain categories. One potential grouping of the listeners could be hip-hop and rap, enabling Spotify to suggest hip-hop artists to rap listeners and vice versa. 

Figure 1: Supervised vs. Unsupervised Learning (Yan et al. 2018)
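To make the contrast concrete, here is a rough Python sketch of both approaches. Everything in it is an assumption made for illustration: the two "apple" features (redness and roundness), the listener features (share of plays by genre), and the choice of a nearest-centroid classifier and a basic k-means loop. It does not describe how Spotify or any real image classifier works.

```python
import math


def centroid(points):
    """Average position of a list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


# --- Supervised: every training example comes with a label. ---
# Hypothetical features: (redness, roundness), each on a 0-1 scale.
labeled = [
    ((0.9, 0.8), "apple"),
    ((0.8, 0.9), "apple"),
    ((0.2, 0.3), "not apple"),
    ((0.1, 0.2), "not apple"),
]


def train_nearest_centroid(examples):
    """Build a model that stores one average point per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def classify(model, features):
    """Predict the label whose average point is closest to the new example."""
    return min(model, key=lambda label: math.dist(model[label], features))


model = train_nearest_centroid(labeled)
print(classify(model, (0.85, 0.75)))

# --- Unsupervised: no labels, the algorithm only groups similar points. ---
# Hypothetical features: (share of hip-hop plays, share of classical plays).
listeners = [(0.9, 0.1), (0.8, 0.0), (0.85, 0.05),
             (0.1, 0.9), (0.0, 0.8), (0.05, 0.95)]


def k_means(points, centers, iterations=10):
    """Basic k-means: assign points to the nearest center, then move centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(centers[i], point))
            clusters[nearest].append(point)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


groups = k_means(listeners, centers=[listeners[0], listeners[3]])
print(groups)
```

The key difference is visible in the inputs: the supervised part needs the "apple" and "not apple" labels up front, while the clustering part is given only the listener features and discovers the groups on its own.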

One way a computer can build a model is through iterative training, in which the model repeatedly makes guesses, checks them against the data, and adjusts itself. (This is distinct from reinforcement learning, a separate approach in which a program learns by trial and error from rewards rather than from labeled examples.) Going back to the apple example, the computer could start out by making random guesses on which pictures have apples and which do not. Then, the model would check the guesses against the data, and if the guesses were off, the model would change to adapt. Each pass through the dataset (each time the model goes through the dataset and guesses which pictures have apples) is called an epoch. Over tens or hundreds of epochs, the model gets better and better. Ideally, a good model would be able to guess which pictures contain apples with close to 100% accuracy.
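The guess, check, and adjust loop described above can be sketched as follows. This is a minimal illustration, assuming a perceptron-style update rule (one simple choice among many), made-up apple features, and weights that start at zero rather than at random values so that the run is reproducible. All names are hypothetical.

```python
# Hypothetical labeled data: (redness, roundness) -> 1 for apple, 0 otherwise.
data = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.2, 0.3), 0),
    ((0.1, 0.2), 0),
]

weights = [0.0, 0.0]   # assumed starting point; real models often start random
bias = 0.0
learning_rate = 0.1


def guess(features):
    """Predict 1 (apple) if the weighted score is positive, else 0."""
    score = weights[0] * features[0] + weights[1] * features[1] + bias
    return 1 if score > 0 else 0


def accuracy():
    """Fraction of the training examples the current model gets right."""
    return sum(guess(f) == label for f, label in data) / len(data)


# Each pass over the whole dataset is one epoch.
for epoch in range(1, 51):
    mistakes = 0
    for features, label in data:
        error = label - guess(features)   # 0 if correct, +1 or -1 if wrong
        if error != 0:
            mistakes += 1
            # Nudge the model toward the right answer for this example.
            weights[0] += learning_rate * error * features[0]
            weights[1] += learning_rate * error * features[1]
            bias += learning_rate * error
    if mistakes == 0:
        break   # a full epoch with no wrong guesses: stop training

print(f"stopped after epoch {epoch}, accuracy {accuracy():.0%}")
```

On a dataset this small and cleanly separated the loop stops quickly. Real image data is far messier, which is why training typically runs for many more epochs and rarely reaches perfect accuracy.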

Use Cases for ML: Sports Analytics

One example of machine learning in the real world is the rushing yards over expectation (RYOE) metric in the NFL (National Football League). To calculate RYOE, developers first estimate the expected rushing yards on a play given a few factors, such as the speed of defenders and the number of blockers in the area. Then, given the actual rushing yards that occurred, RYOE is calculated as (actual yards) - (expected yards). Using new data and machine learning models based on this metric, teams can better determine whether rushing yards are the products of running backs themselves or of offensive linemen and schemes. This also allows for quantitative comparisons of the value of passing plays versus running plays, and subsequently of where teams should invest personnel resources. Thus, with the introduction of new data and machine learning models applied to that data, we are able to make a cohesive argument to finally answer the question: do running backs really matter?
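As a small worked example of the RYOE arithmetic, the sketch below averages the metric per runner. The carries, runner names, and expected-yards values are all made up. In practice the expected yards would come from a trained model using inputs like defender speed and blocker counts, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical carries: (runner, actual_yards, expected_yards).
# The expected_yards values stand in for the output of a trained model.
carries = [
    ("Runner A", 7, 4.5),
    ("Runner A", 2, 3.0),
    ("Runner B", 4, 4.0),
    ("Runner B", 1, 3.5),
]


def ryoe(actual_yards, expected_yards):
    """Rushing yards over expectation for a single carry."""
    return actual_yards - expected_yards


def average_ryoe_by_runner(rows):
    """Average RYOE across all of each runner's carries."""
    per_runner = defaultdict(list)
    for runner, actual, expected in rows:
        per_runner[runner].append(ryoe(actual, expected))
    return {runner: sum(values) / len(values)
            for runner, values in per_runner.items()}


averages = average_ryoe_by_runner(carries)
for runner, value in averages.items():
    print(f"{runner}: {value:+.2f} yards over expectation per carry")
```

A positive average suggests a runner is gaining more than the situation alone would predict, while a value near zero suggests the yards are mostly explained by blocking and scheme.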

Another use of machine learning is in sports betting. Betting companies and bettors can train a machine learning model on historical data such as player ratings, injury history, and various other metrics. By plugging in the current values of those metrics, the model is able to predict, for example, who will win a game and by how many points. Betting companies can use such predictions to set betting lines for games, and if bettors’ own models disagree with those lines, the bettors may believe that their model is better and use it to bet on the game.

Furthermore, machine learning can be used to analyze game-time decisions in sports such as baseball and basketball. By looking at player performance in the past and seeing how they perform compared to other players in specific situations, such as in the rain or sun, teams can utilize machine learning to predict how players will perform in the future. Given this data, they can put their players in the best possible position to succeed.

Conclusion

In essence, Artificial Intelligence and Machine Learning are deeply interrelated concepts, especially given that Machine Learning is a subset of the broader AI field. Further, both broader AI and more specific Machine Learning techniques have applications ranging from entertainment such as sports and music, to daily living tasks such as handwriting recognition and home assistant devices, to critical infrastructure such as finance and medicine. This leads one to ask where artificial intelligence is not implemented yet. While it can be hard to say when tech experts in academia and the private sector cannot come to a consensus, one thing is certain: AI and Machine Learning carry at least some importance in everyone’s lives in one way or another, whether directly or indirectly.

This also leads to further discussions, such as “is the importance of these technologies overstated or understated?”, as the exact magnitude to which artificial intelligence and machine learning will impact society is still unknown. With the introduction of machine learning chatbots such as ChatGPT, it can be challenging to ascertain how useful they will be in the long run. While ChatGPT can answer questions from “Where was Abraham Lincoln killed?” to “Code a website for me”, it fails to answer some simple logical questions from time to time. Although the tool has been trained on an astounding three billion words, it’s far from perfect at this time. However, as time goes on, ChatGPT and similar tools will be trained on even more data, computers will become even faster, and the applications and accuracy will only increase, leaving us to wonder if future applications will be indistinguishable from humans. Similar to our previous example of robotic surgeons, only time will tell if AI and ML-powered chatbots will require extensive assistance from humans or if they will be capable of being autonomous in the future. While we cannot answer this question at this time, nor do we encourage a specific stance on artificial intelligence and machine learning, we can say that it is a topic to keep an eye on.

Works Cited

For a list of references, please use this link: http://bit.ly/3GBKGof

This blog post was written by William-Elijah Clark (Senior STEM Data Fellow), Sahil Chugani (STEM Data Fellow) and Reagan Bourne (STEM Data Fellow) from FSU Libraries.

STEM Data Fellow Spotlight: Sahil Chugani

When I was in elementary school, I remember Googling various football statistics, running down to my parents, and telling them, for example, “Ben Roethlisberger had 4,328 passing yards in 2009!” I played football for eight years from elementary school to high school, and I was good at working with numbers. I found that sports analytics was a great combination of the two. In high school, I entered a sports analytics competition, where my project was to determine what would happen if onside kicks in football were replaced with a 4th down and 15, and I absolutely loved it. Now, I’m fascinated with data science as a whole: being able to make a computer do something that we could never imagine doing as humans is an amazing feeling for me.

Since the sports analytics competition, I’ve been doing anything and everything I could related to data science. Some of the research I’m currently working on includes sports team values, Kickstarter data, and sportswashing (for example, Qatar holding the World Cup amidst some controversial political issues). I also had a job this year working for a company called Scouting Heroes, where I logged basic statistics for the FSU football team. (More information on what the data I collected was for can be found at https://simplebet.io/nfl.html.) I’ve also worked on creating data visualizations based on football data. For example, this past summer I created over 20 graphs that can be found at https://twitter.com/a_graph_a_day .

In one of my classes, one of my (now) coworkers, William-Elijah Clark, posted the opening for the STEM Libraries Data Fellowship in the class’s GroupMe, and I was eager to apply. Something I’m super excited about with this Data Fellowship is the chance to translate my skills into some real-world experience. Instead of simply creating graphs or finding statistics on my own, I want to have a tangible impact with regard to data. I hope to be able to help students out with their needs or to have my data analysis translate into a decision being made that affects people. In a way, it would signify that my hard work on data analysis is paying off.

One of the projects that I’m super interested in working on as a Data Fellow is the use of Jupyter Books to assist users in learning more about how to code and analyze data as a whole. By offering interactive code blocks and giving users the opportunity to run code on their own, Jupyter Books may make users more willing to learn about the data analysis techniques used. Furthermore, I hope that by implementing sports analytics examples, specifically football, people who are interested in sports may be more willing to learn how to use data analysis techniques with respect to sports.

As a whole, I’m very excited to learn more about data analysis techniques here at FSU Libraries, as well as to apply my skills to tangibly help others at Florida State.

This blog post was written by Sahil Chugani, STEM Data Fellow at FSU Libraries.

STEM Data Fellow Spotlight: Reagan Bourne

Prior to my experience at Florida State University, I took a few research classes in high school. In these classes, I had assignments where I would have to collect and analyze data as part of a research project. These experiences sparked my interest in data science, and from that point forward I always knew that I was interested in data-related research. Furthermore, I have always been interested in a few different subjects, including computer science, biology, and mathematics. I never realized that I would be able to combine my interests before starting this data fellowship.

When I first found this fellowship during the summer of 2022, I felt that I was at an academic crossroads. I was unsure of what I wanted to study and of my career goals. However, I was extremely interested in this opportunity, because it was unlike anything I had ever really known about. I thought that this position would be a great learning opportunity for me, and would hopefully allow me to utilize my data skills and pursue some of my interests. So far, this fellowship has gone above and beyond what I was hoping for.

As I am still at the beginning of my academic career, I did not have the opportunity to obtain much experience using my data skills before this fellowship. For this reason, I am so grateful to be participating in it. I have already learned so many different things in my few months here. One of my first assignments was to meet with many of the different librarians at FSU Libraries. I really enjoyed this task, because I liked hearing about all of the different paths they took before finding this career. It introduced me to a lot of different projects and areas of expertise in the library that I had never known about, such as the Health Data Sciences Initiative and open science.

Another concept that I have recently learned a lot about is the importance of critically evaluating data. Working on a blog post about this topic has been a great learning experience for me. It has introduced me to so many ideas that I had never known about.  Specifically, I have learned about machine learning algorithms for data science. As a student currently pursuing a computer science degree with a minor in data analytics, this topic was extremely interesting to me, and is something that I am excited to explore further. 

As I take more classes related to my major, I am excited to apply the skills I learn towards this fellowship. In the future I hope to teach workshops about Unix, C#, SQL, and more. I am looking forward to continuing my work with FSU Libraries.

This blog post was written by Reagan Bourne, STEM Data Fellow at FSU Libraries.

FSU Libraries celebrates Love Data Week 2023!

Introduction

Love Data Week is coming back to FSU in 2023! Love Data Week, or LDW, is an international event where individuals and groups are encouraged to host and participate in activities related to any and all data. It occurs every year during the week of Valentine’s Day, and focuses on helping people learn about the best data management practices and methods for interpreting data. LDW was started in 2015 and is headed by the Inter-university Consortium for Political and Social Research at the University of Michigan. For those who are looking to learn more about data or are interested in statistics, this is an excellent opportunity to ask questions and get started!

Events

Because looking at raw data can sometimes be boring, we’re looking to spice things up this year by including two new activities! We’ll be right inside the entrance of Dirac from 12:00-2:00 PM on Thursday and Strozier from 12:00-2:00 PM on Friday! First, we’re going to be doing an Adopt-a-Dataset activity, where participants will be able to “adopt” one of the openly available datasets we have displayed. Your task will then be to determine what conclusions can be drawn from the data, and you’ll receive a Dum-Dum for your work! After that, we’ll have a jar of Smarties at the table, with a list of numbers from a normal distribution on hand. From there, you’ll have to guess the number of Smarties in the jar, and the person with the closest guess will win them all! In addition to the tabling events, our Research Data Management Librarian, Dr. Nick Ruhs, will be giving a workshop on Data Analysis with Microsoft Excel on Valentine’s Day (February 14) from 3:00-4:30. If you are or will be using Excel for your projects or research and are looking to enhance your skills, this will be a great workshop to attend!

Blog Posts

In addition to the wonderful events that are occurring during Love Data Week, we will be publishing two blog posts introducing the two new Data Fellows at FSU, Reagan Bourne and Sahil Chugani. In those posts, you’ll learn all about what inspired them to become data fellows and how they became passionate about data analysis and management techniques.

Contact/Resources

If you have any data questions or concerns, you can either check out https://www.icpsr.umich.edu/web/pages/ or contact Dr. Nick Ruhs, our resident Research Data Management Librarian, at nruhs@fsu.edu. Furthermore, if you ever need assistance with a data question, you can check out the walk-up hours for our STEM Data Fellows!

This blog post was written by Sahil Chugani (STEM Data Fellow) from FSU Libraries.

My Experience Attending the Midwest Data Librarian Symposium

The Midwest Data Librarian Symposium (MDLS) is an annual conference aimed at providing Midwestern librarians, as well as others across the United States, the chance to network and discuss several industry issues and topics related to research data management. This year the event was co-hosted by the University of Cincinnati, The Ohio State University, and Miami University, and was also held virtually through Zoom conference calls and presentations. With free registration for all participants, MDLS focuses on the goal of providing low-cost networking and educational opportunities for established professionals and developing librarians of the future. Relatively new to the environment of Research Data Management, I was eager to represent FSU and the entire state of Florida at the Symposium, being the only participant in attendance from the state. While I could not travel to participate in the in-person programming, the free registration allowed me, like many others, to actively engage with the virtual conference presentations and events over Zoom.

Whether it was a Zoom scavenger hunt or a presentation on a less talked about subject, like “Making Infographics More Accessible”, I found that with each opportunity to engage I was able to learn something new, including many things that I could bring back and put into practice in my own work. The presentations also left me with a lot to contemplate and consider, opening my eyes to information and concepts I had yet to broach or discover through my own work, like Digital Curation and Data Management for filmmakers and documentaries. For example, in the growing industry of filmmaking there are often limited resources, especially for independent filmmakers, to effectively meet the costs of preserving their data. With barriers like large file sizes, time constraints, and the threat of file corruption or loss of data, documentaries have a much more indirect path to successfully serving as critical sources of historical and cultural documentation.

The vulnerability of data collected in documentaries further illustrates the broader importance of taking serious measures to securely store raw data, especially given its potential relevance to guide other research. Additionally, metadata’s pertinence in other research frameworks encapsulates the expansive benefits of open science and universal accessibility. Pressures of academic viability, publishing, and performance can drive researchers’ hesitancy to relinquish ownership and control of data. This exemplifies the utility of, and demand for, stronger avenues to motivate the open sharing of data even when it is imperfect or incomplete. Procedurally, share-upon-request protocols have been imperfect, to say the least, as the decision to distribute the data is left at the mercy of the Principal Investigator of the original research, who may have internal or external factors that motivate, dissuade, or even obstruct their ability to share the data in a timely or consistent manner.

While there were a variety of different topics covered during the conference, several presentations were based around the new National Institutes of Health (NIH) Data Management and Sharing (DMS) policy that will come into effect at the beginning of 2023. More specifically, there were discussions about the effects of this new policy on data management and sharing, as well as how to prepare and instruct those in need of support to navigate through these changes at a university level. For one of the main presentations on this topic, the authors conducted semi-structured interviews at their university to survey the research data service needs of their constituents, as well as to gauge and collect their perspectives in relation to the new governmental regulations being put into place. These interviews produced a myriad of noteworthy and interesting observations. Perhaps the most surprising theme to emerge was that many of the researchers and professors were unaware of or unworried about the policy changes, believing that they’d be able to adapt their research practices and proposals when the new year began. Others wondered how strictly the new policies would be enforced, especially given loose criteria for what might qualify submissions as exceptions and given that these aspects of proposals are not tied to scoring, which leaves little to motivate researchers to put more effort into adopting practices that promote open science. The additional need to recognize and remove protected health information further supports the importance of collaboration when it comes to properly following research assurances and protocols, as well as properly maintaining and storing data.

These interviews revealed that many students and faculty across the country were uninformed and/or ill-equipped to seamlessly handle the transitional phase that will take place in the coming months to comply with the new NIH DMS policy. Perhaps an even larger overarching takeaway is that the general level of information literacy is relatively low relative to student needs and the expectations that students must meet in order to perform adequately in their field. Adjustments are necessary to overcome the deficiencies in standard coursework, which often operates on a foundational assumption that students will come into their academic institutions already having research skills and a working knowledge of information systems, catalogs, and databases. In most cases an established base of information literacy is required to locate library resources for these purposes, or even to know that they exist. Libraries, as well as universities more broadly, must make an effort to publicly promote their services and resources more widely, while also making them more accessible, to effectively address this dilemma. Without additional infrastructure to develop these skills, students face a much larger barrier in overcoming the limitations embedded in the university academic framework. Taking levels of privilege into account, with respect to access to both technology and experience, must also play a part in the organization of their practicum.

As always, each institution has its own individual needs and priorities and is equipped with different resources to develop the systems necessary to provide its student body with enough support to navigate through all academic challenges. Conferences typically follow the shared academic code of free exchange on which open science is based. Just look at the public accessibility of the research guides that most universities produce and publish, and one can truly get a sense of the collaborative instruction that academic libraries strive to achieve. The symposium offers an opportunity that amplifies this ideal, allowing different institutions to come together to cooperate and exchange ideas through dialogue with like-minded individuals trying to reach mutual goals.

Preparing for the Midwest Data Librarian Symposium, my impression was that I’d simply be attending lectures where I’d experience most of the learning. However, in addition to some of the networking events and opportunities, the interconnectedness and interactive components of the entire conference made attending the symposium a much more well-balanced exchange of ideas and information. Moreover, MDLS hosted a Slack channel to further promote ongoing discussions and networking, and archived notes for each presentation and event that all participants could access and contribute to. In addition, many of the presentations that were longer than the five-minute rapid-fire “Lightning Talk” featured aspects of involvement from the audience, whether it was through discussion questions, breakout room consultations, or Jamboard collaborations to exchange ideas on different subjects. The integration of technology was applied seamlessly and improved the overall quality of engagement within the presentations and the symposium as a whole. Attending this symposium gave me the chance to consider and discuss countless ideas to bring into practice with my own work. I am grateful for opportunities like these and experiences that enrich professionals at all stages in their careers with an academic environment of common interests and goals.

Author Bio: Liam Wirsansky is a second-year MSI student at Florida State University and the STEM Libraries Graduate Assistant at FSU’s Dirac Library. He currently serves as the President and Artistic Director of White Mouse Theatre Productions at FSU and acts as the Director of Research and Development for the Rosenstrasse Foundation. Liam loves the academic outlet that research has provided him as well as the opportunity to educate and assist students in the development of their information literacy skills.

If you have any questions regarding the Midwest Data Librarian Symposium (MDLS), please contact the organizers at mwdatalibsym@gmail.com.

Some Helpful Resources That Were Shared at the Symposium:

When Social Movements Collide: Open Access for Climate Justice

You’ve heard of climate change, but how familiar are you with the term climate justice? It’s the topic of the week since it’s the theme of International Open Access Week 2022, an occasion for challenging each other to raise awareness and take action on climate justice through the open and interdisciplinary sharing of data and resources. 

With hurricanes, heat waves, and forest fires appearing more regularly in our news cycle, you’ve undoubtedly noticed that the discussion of our changing climate is becoming a bigger part of our lives. As we better understand the enormous threat that climate change poses to our planet, we more desperately than ever need to also have a grasp on climate justice–the aspiration to have all people, regardless of personal or community characteristics, treated fairly when it comes to protection, risks, policies, and decision-making around the impacts of climate change. In other words, when it comes to our environment and the changes happening globally, we must strive to consider everyone, understand how they’ll be impacted differently, and make decisions fairly.

While the term may be new to some, in reality calls for climate justice have been ongoing for decades. In fact, climate justice was born out of the environmental justice movement and is related to other calls to treat people more equitably such as movements for racial or social justice. Why is this so important? We know from past catastrophes that people’s level of vulnerability can vary widely based on their personal circumstances or their community’s demographics. This is one aspect of climate change where data and Open Access become very important; we need the open sharing of knowledge in order to address this important social and environmental issue and ensure justice for all. But, who has access?  

Free, immediate, online access to the results of scholarly research on climate change and how various demographics and geographies are impacted would be a powerful tool to aid and equip the communities most at risk. Removing barriers to accessing climate research would also enable faster communication and better engagement of both the general public and policymakers on related societal issues. Instead of data being individually owned and only available to those who can afford to access it, the general public would have the right to use scientific research results as needed. The best examples of this have been projects attempting to map overburdened, at-risk communities by incorporating a wide range of data, going beyond looking at risk from a one-dimensional geographical perspective.

For example, check out the Climate and Economic Justice Screening Tool. Start by putting in the zip code of your hometown, and use it to have a look at the environmental and economic conditions of various communities. Then try exploring the area around FSU to familiarize yourself with the communities nearby and see how their issues compare to those in your hometown. 

Such tools are a great visual way to represent the combination of so much data. Use them as inspiration for starting conversations about climate change and/or justice. Climate justice demands cross-disciplinary collaboration, so campus forums like the Open Scholars Project could also serve as incubators for the climate action needed in our region and beyond. Through open information exchange and collaboration, we can create resources for understanding the needs of communities as well as non-human environments by evaluating their vulnerability to the impacts of climate change. Join together with your neighbors, campus groups, or local organizations to consider how best to take action to improve the resilience of communities where you live, study, work, or play. Whether that means volunteering, marching, donating, or joining, we need everyone’s contribution to make our communities more just and resilient in the face of climate change.


For more information about how the FSU Libraries supports open access, please visit our Research and Publishing web page here.


Author Bio: Mila S. Turner is the Social Science Data & Research Librarian at FSU Libraries and a broadly trained environmental sociologist. Her research spans diverse areas including how social inequalities intersect with environmental justice, racial equity, and natural disasters. Her thought leadership has been featured in The Hill, World War Zero, Quad Magazine, and more. 

Who Has Access? The New OSTP Memo’s Rippling Effects on Publicly Funded Research

The White House Office of Science and Technology Policy (OSTP) made groundbreaking progress at the end of August when they released a memorandum that updated their policy guidance to specify that data and results coming from taxpayer-supported research must be made immediately available and accessible to the public at no cost. OSTP also issued directions for agencies to update their public access policies and data sharing plans as soon as possible to make publications and the research they host publicly accessible, without an embargo or cost and in machine-readable formats to enable their full use and reuse.  

So what does this truly mean for students and researchers?

For many students, OSTP and the memorandums released prior to the latest one (which many are calling the Nelson Memo, as it was issued by Dr. Alondra Nelson, currently the acting director of the OSTP) are mostly a foreign subject. What is OSTP and why does it matter? As a graduate student myself, I was surprised to learn about the strides taken by the government agency leading up to the release of this memorandum, and about the historical struggle to achieve an open science framework that works for the masses and aims to combat the discrimination and structural inequalities inherent in the funding and publishing disadvantages experienced by researchers from underserved backgrounds and minority groups, as well as early-career researchers.

For many students at universities, it is easy to take the access we have to library resources, journals, and repositories for granted, especially when they meet our immediate needs. But looking at the world around us and the integration of advancing technology into everyday life and society, it is clear we live in a data-driven world, which puts the availability and accessibility of information at a premium. Metadata, or data that describes other data, has become one of the most important concepts in the field of information, as it allows researchers to organize the data from their research or from other projects in a way that is meaningful and often cross-disciplinary in its application. This means that data can have unintended benefits and relevance to other researchers to inform their own work, assuming that they are able to access that data. With the Nelson Memo, access to publicly funded research has been defined and recognized as a right of the public.

Until now there have been clear barriers set in place to promote the interests of academic journals and publishing, and while some of these will still exist even after all of the federal grant-making agencies release their plans for new policy implementation, this advancement toward open access establishes a clear standard moving forward. It sets the United States apart in this respect as a global leader of change in the field of open science. Prior to the Nelson memorandum’s release, Plan S served as the global standard for open access policy guidance. It mandated that access to publications that have been produced through research grants must be immediately open and fully accessible without being monetized in any form, setting the stage for the standard that OSTP wanted to mirror and build upon.

“cOAlition S”, a consortium of national research agencies and funders from twelve European countries developed around the implementation of Plan S, has come out in support of the newest memorandum and OSTP, calling the guidance “fully aligned with the open access policies of many forward looking universities and research agencies who have implemented Plan S” and acknowledging its connection with the recent UNESCO Recommendation on Open Science, which was adopted by the General Conference of UNESCO at its 41st session last November. Plan S recognizes that we have the necessary elements and collective ability to produce digital content as well as public goods that can be shared to help shape the vision of a large connected community that makes up one body, rather than smaller disjointed organs that mirror each other because they cannot see what the other does. All of that is to say that these paywalls to accessing research act as hurdles that deny the very nature of science as a tool to better understand and help humanity as a whole.

Globally, we saw the power of open science at work in combating the COVID-19 pandemic and bringing the scientific community together, as commercial journals and governments were forced to alter their typical subscription-based structure in favor of providing temporary open access to COVID-19- and monkeypox-related research data. This allowed for the development of a vaccine and ensured that the general public had the most credible data-driven information to inform their health-based choices and medical practice. Countries across the globe spend billions of dollars on research and experimental development. The United States is no different, with estimates from the National Science Foundation (NSF) totalling nearly $667 billion for the year 2019 alone and continuing to grow in each of the following years. The expectation would be that the government funding the research would have ownership of the data collected and analyzed; however, under the current copyright agreement structure, publicly funded research is often turned over to commercial journals.

One of the largest concerns catalyzed by the newest memo is understanding how the policy changes will affect the viability of the current subscription model, considering the important role journals play in supporting research, such as through peer review. Publishers were more circumspect about the changes, expressing some skepticism about how the shift to full open access would be funded. To alleviate this issue, researchers can now use research grants and funds to support the publication components of the new policies put forth by OSTP. On the other side of the argument, students stand to benefit from open access journals in terms of the wider exposure that their research will receive, with entry points to view such articles increasing exponentially. In addition, libraries across the country suffer under the subscription-based journal model and are not in a position to subscribe to every single research journal that exists. FSU Libraries subscribes to several journals and databases to provide access for its students, but an increase in publicly funded and published research can only add to the body of available research, data, and information that student communities here and at other universities will have access to. Looking to the future, this relationship with academic journals and publishing must continue to evolve and change.

Ideally, community-owned and managed public knowledge infrastructure seems to be the long-term solution, but how do we get there? Creative Commons, a non-profit organization and international network devoted to open access and broadening the scope of educational as well as creative works to be made available for others to build upon and share with legal protections, believes we must work on the progression of “open licensing to ensure open re-use rights”. I believe that if we want to move beyond access and towards improved sharing of the information and data we collect, produce, and use, we must begin following these steps and supporting organizations, like Creative Commons or the Subcommittee on Open Science, as well as continue to expand who contributes to new knowledge. Most importantly we must stay informed with the latest policy updates and changes, guiding researchers to success from different backgrounds and at all levels of experience.

Committed to the development of open science, Florida State University Libraries is devoted to the free exchange and access of information on a global scale for the good of people everywhere. This change in policy not only reinforces our mission, but also prioritizes the need for comprehensive support and resources to support the students and research that our institution hosts. We are thrilled to continue to work alongside our researchers, offering a wide array of different services and workshops to navigate through these policy changes, as they openly share and provide increased access to their work. We will continue to develop upon this foundation and explore more ways we can champion open science at Florida State University and beyond. 

For more information about how the FSU Libraries supports open access, please visit our Research and Publishing web page here.

For more specific details or information on the Nelson Memo, please see the White House OSTP announcement, here.

Author Bio: Liam Wirsansky is a second-year MSI student at Florida State University and the STEM Libraries Graduate Assistant at FSU’s Dirac Library. He currently serves as the President and Artistic Director of White Mouse Theatre Productions at FSU and acts as the Director of Research and Development for the Rosenstrasse Foundation. Liam loves the academic outlet that research has provided him as well as the opportunity to educate and assist students in the development of their information literacy skills.

References

Ambrose, M. (2022, September 1). US moves to make federally funded research free upon publication. Physics Today. Retrieved from https://physicstoday.scitation.org/do/10.1063/PT.6.2.20220901a/full/

Anderson, R. (2022, August 28). A new OSTP memo: Some initial observations and questions. The Scholarly Kitchen. Retrieved from https://scholarlykitchen.sspnet.org/2022/08/29/a-new-ostp-memo-some-initial-observations-and-questions/

Elder, A., & O’Donnell, M. (2022, September 7). New White House OSTP memo requires federally funded research be immediately open. Iowa State University Libraries. Retrieved from https://www.lib.iastate.edu/news/new-white-house-ostp-memo-requires-federally-funded-research-be-immediately-open-%C2%A0

Green, C. (2022, August 30). A big win for Open access: United States mandates all publicly funded research be freely available with no embargo. Creative Commons. Retrieved from https://creativecommons.org/2022/08/26/a-big-win-for-open-access/

Plan S. (2022, August 26). cOAlition S welcomes the updated Open Access policy guidance from the White House Office of Science Technology and Policy. Retrieved from https://www.coalition-s.org/coalition-s-welcomes-the-updated-open-access-policy-guidance-from-the-white-house-office-of-science-technology-and-policy/

SPARC. (2022, August 25). Fact sheet: White House OSTP memo on ensuring free, immediate, and equitable access to federally funded research. Retrieved from https://sparcopen.org/our-work/2022-updated-ostp-policy-guidance/fact-sheet-white-house-ostp-memo-on-ensuring-free-immediate-and-equitable-access-to-federally-funded-research/

Stebbins, M. (2013, February 22). Expanding public access to the results of federally funded research. National Archives and Records Administration. Retrieved from https://obamawhitehouse.archives.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research

Thurston, A. (2022, September 7). Will new white house open access rules impact researchers? The Brink – Pioneering Research for Boston University. Retrieved from https://www.bu.edu/articles/2022/impact-of-new-white-house-open-access-rules-on-researchers/

UNESCO. (2021, November 24). UNESCO recommendation on Open science. Retrieved from https://en.unesco.org/science-sustainable-future/open-science/recommendation

STEM Data Fellow Spotlight: Diego Bustamante

For Love Data Week 2022, we are highlighting our FSU STEM Libraries Data Fellows! These posts, written by the fellows themselves, tell their stories of how they became interested in data-related work and their experience as a data fellow to this point. Today’s post is contributed by Diego Bustamante.

Prior to my role as a Data Fellow, my understanding of data was shaped by my previous work with quantitative data collected from laboratory experiments. For example, when I worked as a Research Assistant, I recorded quantitative data for chemistry experiments, such as mass, temperature, and volume. I then conducted statistical analysis on the data in order to draw conclusions from each experiment. I personally enjoy collecting and analyzing data, especially because it can lead to many scientific and technological advancements!

While I was searching for jobs on FSU’s NoleNetwork in summer 2021, one job title immediately caught my attention: “FSU STEM Libraries Data Fellow.” The job description was unique among the jobs offered on campus. As a data fellow, I was offered the opportunity to develop several professional skills, including providing data reference, co-hosting programming language workshops, and writing and publishing blog posts, among many others. I felt that it was a great opportunity and a good fit with my previous experience and skills, so I decided to apply. Thankfully, I was selected as one of the inaugural data fellows, leading to a journey of professional and personal development that has thus far surpassed my initial expectations.

One of my first tasks in the program was meeting with different librarians at FSU Libraries. In these meetings I was able to learn about different methods and applications for data analysis in a variety of disciplines. For example, I learned that the Digital Humanities Librarian uses text-mining software to find specific words in books published in the 1800s. She used the data drawn from the software to analyze certain traits of a story by counting the number of times a character participates in a particular type of interaction. This experience helped me realize that qualitative data sets can be used to draw conclusions about a study, much as quantitative data can.

Another concept that I have become familiar with while working as a Data Fellow is open data. We discussed this concept during a workshop on the potential benefits of making research data openly accessible to the wider research community. Initially, I was hesitant about the concept of open data, because I saw academic research as a “race” to find a solution to a given problem. However, further discussion of how researchers are compensated for sharing their data made me realize that it is possible to benefit from open data on a personal and global level.

Currently, I am still learning about the many different types of data and about its definitions, applications, and importance. I am also working on developing an open-source Canvas module on MATLAB where I explain the basics of the math-based programming language in a student-friendly manner. I look forward to sharing more about this work in the future!

Love Data Week: Data is for Everyone

By: Dr. Nick Ruhs

INTRODUCTION

It’s once again time for Love Data Week! LDW is a yearly, international outreach event taking place the week of Valentine’s Day (February 14-18 this year). The week is focused on promoting good data stewardship and best practices around working with and interpreting data. LDW was started in 2015 and is currently celebrated by academic libraries and data organizations around the world. While every institution celebrates in its own way, common activities include data workshops, social media outreach, and more!

Each year, a theme is chosen around which organizations can plan their Love Data Week activities. For 2022, the theme is “Data is for everyone.” This year, we are shining a light on the “people-side” of data, and on how different folks use and interact with data. Data often means something different to everyone, and how someone interacts with data varies based on their chosen discipline, research project, life experiences, and their own beliefs and values. There are also often inherent biases in data collection, analysis, and interpretation, which can affect one’s own impression of a dataset. Despite these differences, the ability to critically evaluate data and interact with it is a universal skill that is crucial for everyone.
