2018 FSU Great Give

FSU’s Great Give is a 36-hour online giving campaign in support of academic programs, scholarships and student activities at Florida State University. Florida State supporters can make gifts from 9 a.m. on Thursday, March 22 until the campaign ends at 9 p.m. on Friday, March 23.

This year, the FSU Libraries are focusing on two very important funds.

1.  Support the Heritage Fund: Your gift to the Heritage Museum will be used to take care of the Museum, open it more often and with longer hours, enhance and update its exhibits, and upgrade the space, including improving the lighting and creating ways to safely display valuable objects. Learn More: http://fla.st/2prGvTS

2.  Textbooks for C.A.R.E. Students: The Textbook Fund will purchase textbooks that students can borrow throughout the semester at no cost to them. Students who are already receiving financial aid or are on scholarship may still be eligible for this fund. Learn More: http://fla.st/2FNXrPf

To learn more about our funds and how you can help, please visit the links above. Remember, giving starts at 9 a.m. on Thursday, March 22!

2017 Florida Book Awards Winners Announced

With its twelfth annual competition now complete, the Florida Book Awards has announced winners for books published in 2017. More than 200 eligible publications were submitted across the eleven categories of competition.

Coordinated by the Florida State University Libraries, the Florida Book Awards is the nation’s most comprehensive state book awards program. It was established in 2006 to celebrate the best Florida literature. Authors must be full-time Florida residents, except in the Florida nonfiction and visual arts categories, where the subject matter must focus on Florida.

Setting the standard for future cash prizes, the "Gwen P. Reichert Gold Medal for Children's Literature," now in its third year, is awarded to Brandon resident Rob Sanders for Rodzilla (Simon and Schuster). This $1,000 cash award is given in memory of Gwen P. Reichert and serves as a lasting tribute to her accomplishments as a rare book collector, nurturer of authors, and educator of children. Also awarded were the Richard E. Rice Gold Medal Award for Visual Arts to Jared Beck and Pamela Miner for River and Road (University of Florida Press) and the Phillip and Dana Zimmerman Gold Medal for Florida Nonfiction to Arlo Haskell for The Jews of Key West (Sand Paper Press). These two category winners each receive a $500 cash award.

The winning authors from across the state will be honored at the Abitz Family Dinner, the annual awards banquet, which will take place in Tallahassee on April 12th at the Mission San Luis. The public is invited to attend. More information will be available on the Florida Book Awards website.

Florida Book Awards 2017 Winners by Category

GWEN P. REICHERT GOLD MEDAL AWARD FOR YOUNGER CHILDREN’S LITERATURE: Rob Sanders

RICHARD E. RICE GOLD MEDAL AWARD FOR VISUAL ARTS: Jared Beck and Pamela Miner

PHILLIP AND DANA ZIMMERMAN GOLD MEDAL FOR FLORIDA NONFICTION: Arlo Haskell

 

YOUNGER CHILDREN’S LITERATURE

Gold: Rob Sanders (Brandon), Rodzilla (Simon and Schuster)

Silver: Carrie Clickar (Gainesville), Dumpling Dreams (Simon and Schuster)

Bronze: Marianne Berkes (Orange City), Baby on Board: How Animals Carry Their Young

 

OLDER CHILDREN’S LITERATURE

Gold: Ed Masessa (Florida), Wandmaker’s Apprentice (Scholastic)

Silver: R.M. Romero (Miami Beach), The Dollmaker of Krakow (Penguin Random House)

Bronze: Rodman Philbrick (Florida Keys), Who Killed Darius Drake? (Scholastic)

 

COOKING

Gold: Norman Van Aken (Miami), Norman Van Aken's Florida Kitchen (University of Florida Press)

 

FLORIDA NONFICTION

Gold: Arlo Haskell (Key West), The Jews of Key West (Sand Paper Press)

Silver: Frank Cassell (Sarasota), Suncoast Empire (Pineapple Press)

Bronze: Julio Capó Jr., Welcome to Fairyland (UNC Press)

 

GENERAL FICTION

Gold: Laura Lee Smith (St. Augustine), The Ice House (Grove Press)

Silver: Elizabeth Sims (Bradenton), Crimes in a Second Language (Spruce Park Press)

Bronze: Randy Wayne White (Sanibel), Mangrove Lightning (G. P. Putnam's Sons)

 

GENERAL NONFICTION

Gold: Edwidge Danticat (Miami), The Art of Death (Graywolf Press)

Silver: D. Bruce Means (Tallahassee), Diamonds in the Rough (Tall Timbers Press)

Bronze: Kristine Harper (Tallahassee), Make It Rain: State Control of the Atmosphere in Twentieth-Century America (University of Chicago Press)

 

POETRY

Gold: Kaveh Akbar (Tallahassee), Calling a Wolf a Wolf (Alice James Books)

Silver: Terry Ann Thaxton (Winter Springs), Mud Song (Truman State University Press)

Bronze: Michael Hettich (Miami Shores), The Frozen Harbor (Red Dragonfly Press)

 

POPULAR FICTION

Gold: Patrick Gussin (Longboat Key), Come Home (Oceanview Publishing)

Silver: Robert Macomber (Pine Island), An Honorable War (Pineapple Press)

Bronze: Ward Larsen (Sarasota), Assassin's Code (Forge Books)

 

SPANISH LANGUAGE

Gold: Pedro Medina León (Coral Gables), Varsovia (Sudaquia Editores)

Silver: Carlos García Pandiello (Miami), Jaspora (Aduana Vieja Editorial)

 

VISUAL ARTS

Gold: Jared Beck (Naples) and Pamela Miner (Fort Myers), River and Road (University of Florida Press)

 

YOUNG ADULT

Gold: Jenny Torres Sanchez (Orlando), Because of the Sun (Delacorte Press)

Submissions for the 2017 awards were read by juries of three members, each nominated from across the state by co-sponsoring organizations. Jurors are authorized to select up to three medalists (including one gold winner, one silver runner-up and one bronze medalist) in each of the eleven categories; jurors are also authorized to make no selections in a given year.

The Florida State University Libraries coordinate the Florida Book Awards with assistance from co-sponsors including the Florida Center for the Book; the State Library and Archives of Florida; the Florida Historical Society; the Florida Humanities Council; the Florida Literary Arts Coalition; the Florida Library Association; the Florida Association for Media in Education; the Center for Literature and Theatre @ Miami Dade College; the Florida Chapter of the Mystery Writers of America; Friends of FSU Libraries; the Florida Writers Association; the Florida Literacy Coalition; and “Just Read, Florida!”

Learn more about the Florida Book Awards at floridabookawards.lib.fsu.edu.

The Learning Curve: A Digital Pedagogy Internship in Review

Digital Pedagogy is difficult to define. Among other things, it is an idea, a philosophy, and a way of thinking about instructional tools. There is no manual on how to "do" Digital Pedagogy. There are no instructions to follow, and it is still an emerging idea and field, which makes it all the more experimental. This semester, FSU Libraries decided to take on a new Digital Pedagogy initiative. This is where I came in, and like the instructors and schools that have adopted Digital Pedagogy initiatives before us, I quickly learned how difficult these new projects can be to implement.

I started my internship with FSU's Office of Digital Research and Scholarship (DRS) in August, which was also my first semester as an Information Science graduate student. Truthfully, there are few times in my life when I have entered a new position with absolutely zero expectations (because I genuinely had no idea what I was getting myself into), but beginning my internship with DRS' new Digital Pedagogy initiative was one of those times.

If it is not yet clear: yes, Digital Pedagogy is really broad. While I was familiar with both terms separately, "Digital Pedagogy" was new to me. So, upon receiving the call that I would be working on this new initiative, I immediately began my Google search. I sifted through articles about using technology to enhance education and through philosophies holding that Digital Pedagogy means more than simply using technology in classrooms: it means using technology to expand critical thinking and provide opportunities for growth and development. It was a broad topic, but I was certain that my role in the internship would be more focused, so I entered FSU's DRS Commons with confidence and just a few nerves.

On Day 1, I met Micah, one of the creators of this new initiative. Sitting in the DRS Commons, he told me that my role would be to create a project dealing with Digital Pedagogy. Like Digital Pedagogy itself, there were no constraints, rules, or requirements. Needless to say, this ambiguity was mildly alarming to my Type A personality.

Putting aside my sudden anxiety, I turned to Lindsey, the Distance Librarian, who was also pioneering the initiative. Lindsey sent me articles and research on Digital Pedagogy, which eventually led me down the Rabbit Hole of Research, where I discovered Dr. L. Dee Fink's 2003 paper "A Self-Directed Guide to Designing Courses for Significant Learning." The guide is designed to help college-level instructors design their courses so that students leave with more than a grade: they leave with knowledge and passion that extend beyond the semester.

It is no secret that many instructors do not consider pedagogy or educational theory when designing their courses, and this guide was created to help instructors think about those areas. Yet I wondered: how many instructors have read the guide? How many even knew it existed? Thus, the Canvas module "Designing Courses for Significant Learning" was born.

The note-taking process became extensive

The idea behind the module was that it would be a resource for instructors and for the emerging Center for Advancement of Teaching on campus. For weeks, I drafted the module, picked apart the guide to decide which areas to keep and which to discard, learned how to create items in the Canvas LMS, and put hours of work into making supplementary materials that did not exist in the original guide. After several weeks, the project felt complete. I included pre-tests to help learners navigate the module and created navigation tools so that the material did not have to be read or completed in a linear fashion, as learning itself is rarely linear. I included graphs, videos, and personal writing assignments for users to work through alongside the module. In the end, I was certain this would be a great resource and, perhaps, might help bridge the gap in bringing educational theory to higher education instructors through Digital Pedagogy.

(For reference, here are some visuals of things I created. The first I made using Canva as a short reference for instructors to take "on the fly," or as a quick reminder after they work through the module. The second and third are screenshots of something I assumed would be very easy but that ended up being the most frustrating part of creating the module: figuring out how to edit the navigation without knowing HTML. Thanks to basic HTML YouTube videos and far more time spent playing with the system than I care to admit, I managed to somewhat break through the LMS barrier, and a non-linear course structure actually became a possibility.)

 

If you are anticipating an impending "but" or "however," here it is. I created the module with guidance from Micah and Lindsey for the Center for Advancement of Teaching and instructors on campus, but I left out the obvious piece of the puzzle: meeting with the director of the Center for Advancement of Teaching. So, when this meeting finally happened in November, it should have been no surprise when she told me, regretfully, that the project did not align with the current work of the Center. In other words, the weeks of work, hair-pulling, and stress of beginning a new graduate program and internship while continuing my full-time job suddenly felt utterly useless. I told her it was fine. It wasn't, really.

After some deep reflection and head-to-desk moments of frustration, however, I have come to the conclusion that the overall experience was a positive one. I learned a great deal as a first-semester Information Science graduate student taking on a library internship. Primarily, I discovered how self-directed library work can be. Though I had guidance from Micah and Lindsey, the project was ultimately mine to decide on, create, and implement. I also learned that, frankly, no matter how much momentum and excitement a project begins with, those factors do not guarantee that it will be successful or go exactly as planned.

Like Digital Pedagogy itself, creating a project from scratch for an initiative that has not yet been established in the library is difficult. There is no how-to guide for working through problems or getting everyone on board with an idea. It takes time, energy, and flexibility. When one idea falls through, rather than dwell on the failures of the past, there is no option but to pick up the pieces and keep moving forward. If I had more time with the library, I would venture to do just that, but as the semester comes to a close, I am ultimately grateful for the opportunity to experience the many facets of what it means to work in an academic library.

Open Access Week 2017

There is a serious, systemic problem in scholarly publishing that disadvantages academic authors, their institutions, the global research community, and the general public. The problem stems from the subscription-based model of scholarly publishing, whereby publishers place academic journal articles behind paywalls so that anyone who can’t pay can’t read them.

Open Access (OA) is a movement based on the principle that this situation is fundamentally unethical, and that the fruits of academic endeavor should be freely available to everyone. OA archiving and publishing are the two main strategies for accomplishing this goal, and they promise to benefit both the global research community and individual authors, moving published research into the open and thereby broadening its readership and generating more citations. OA is also fast becoming a requirement for recipients of research funding, as many public and private funding agencies have enacted public access policies to make the results of funded research accessible to all.

Open Access Week, Oct. 23-29, is an opportunity for the global research community to learn more about this important movement and the many ongoing efforts to make it the new norm in research and scholarship. To celebrate the occasion, FSU Libraries is hosting a number of workshops related to openness in research and education, and we hope you’ll join us to learn more about OA and how it can benefit you as a student, teacher, or researcher:

Open Educational Resources (OER) are free to access, reuse, revise, remix, and redistribute. This workshop will cover the benefits of using OER, resources for finding and evaluating OER, and considerations for sharing OER-based courseware and assignments with the world. This workshop will also provide a brief introduction to Creative Commons (CC) licenses and their role in the creation of Open Educational Resources (OER).

Interested in Open Access (OA) publishing, but concerned about the growing problem of “predatory” publishers? What are the benefits of OA publishing, and what tools and strategies can you use to evaluate the quality of OA journals? What about options for funding (or obtaining waivers) to cover OA article processing charges? This workshop will provide answers to these questions and more.

Wondering how to find the best conferences and publication venues in your discipline? What about building your scholarly profile and communicating the impact of your research in ways that will resonate with a broader audience? And, once you’ve got your work out there, what can you do to assess and quantify the impact of your research? This workshop will cover a range of tools and strategies that early-career researchers can use to accomplish these objectives and more.

In addition, we'd like to take this opportunity to highlight some important ways that the Libraries support the FSU community in taking action to advance openness in research and education:

So, what can you do to advance the cause of OA through your own research and teaching?

For more information, see our research guides on Open Access Publishing and the Open Textbook Movement, or contact Devin Soper, Scholarly Communications Librarian at FSU Libraries' Office of Digital Research & Scholarship. And don't forget to follow the conversation on Twitter! #OAweekFSU

Bringing Data Carpentry to FSU

My name is Rachel Smart, and I'm a graduate assistant for Digital Research and Scholarship. I was adopted by DRS in mid-March, when the Goldstein Library was reamed of its collection. It was devastating for the 2% of campus who knew of its existence. Bitterness aside, I'm very grateful for the opportunity I've been given by the DRS staff, who warmly welcomed me to their basement lair; here I'm being swiftly enthralled by the Open Access battle cry. The collaborative atmosphere and constant stream of projects never fail to hold my interest. Which leads me to Data Carpentry…

In May of this year, I met with Micah Vandegrift (boss and King of Extroverts) regarding my progress and the future direction of my work with DRS. He presented me with the task of running a data workshop here in our newly renovated space. Having never organized something of this scale before, I was caught off guard. However, I understood the importance of and need for data literacy and management training here on campus, and I was excited by the prospect of contributing to the establishment of a Data Carpentry presence at FSU. Micah was kind enough to supply me with a pair of floaties before dropping me into the deep end: he initiated first contact with Deb Paul from iDigBio, a certified Data Carpentry instructor here on campus, and I joined the conversation from there.

It took a few weeks of phone calls and emails before we had a committed instructor line-up and were able to apply for a self-organized Data Carpentry workshop in April. Instructors Matthew Collins, Sergio Marconi, and Henry Senyondo from the University of Florida taught the introduction to R, R visualizations, and SQL portions of the workshop. I was informed that you aren't a true academic librarian until you've had to wrestle with a Travel Authorization form, and I completed these forms for three different people, so I feel thoroughly showered in bureaucratic splendor. However, the most obstructive item on my multipart to-do list of 34+ tasks was finding the money to pay for food. DRS has an event budget, with which we paid the self-hosting fee and our instructors' traveling expenses, but we were not allowed to use it for food. This delayed the scheduling process, and if it weren't for the generous assistance of iDigBio, we would have had some very hungry and far fewer attendees. If I were granted three magical freebies for the next potential Data Carpentry event, I would use the first to transform our current event budget into food-friendly money, and I would save the other two in case anything went wrong (e.g., a vendor never receiving an order). This may seem overly cautious, but just ask anyone who has had to organize anything. We are perfectly capable of completing these tasks on our own or with a team, but some freebies for the tasks that fall beyond our control would come in handy.

The event ran smoothly, and we had full attendance from the 20 registered attendees. As busy as I was in the background during the event, attendees came up to me to let me know how well the workshop was going. There were also comments suggesting we could do a few things differently during the lessons. I think most of the issues that sprang up during the event involved troubleshooting software errors and resolving discrepancies in the instructions for some of the lessons; for example, the SQLite instructions were written for the desktop version of the program rather than the browser plugin everyone was using. The screen we used to display the lessons and programming demos was the largest we could find, but it was still difficult for some people to see. However, adjustments were made, and attendees were able to keep participating.

The most rewarding element of the experience for me was the discussion among participants, both the planned collaboration during lessons and the unplanned collaboration during breaks and long lunch periods. The majority of our participants had backgrounds in the biological sciences, but as individuals they had different approaches to solving problems. These differences frequently resulted in discussions about how participants' various backgrounds and research shaped their relationship with the tools and concepts they were learning at Data Carpentry. On both days of the event, participants came together in our conference room for lunch and rehashed what they had learned so far. They launched into engaging discussions with one another and with DRS staff about the nature of our work and how we could work together on future initiatives. This free exchange sparked creative ideas about the Data Carpentry workshops themselves; on the second day, more participants brought their own project data to work with in the workshop exercises.

The future of Data Carpentry at FSU looks bright, though whether I will be there for the next workshop is unknown. Thank you, Deb Paul, Micah Vandegrift, Emily Darrow, Kelly Grove, and Carolyn Moritz, for helping me put this workshop together, and thank you to everyone who participated or contributed in any way.

Spring 2017: A User Experience Internship In Review


It's my final semester in the iSchool program, and I made it. I had a long journey from the start, including a brief hiatus, and yet I returned to finish with a passion; I even received the F. William Summers Award to prove my academic success. But perfect GPA aside, I'm most proud of my personal and professional development while interning remotely for the Office of Digital Research and Scholarship. The highlight was visiting FSU for the first time this semester and working in the office for a full week. Through meetings, workshops, and events, I learned even more and enjoyed interacting with the team in person. It was a fun and informative visit, and one I'd recommend to any remote intern, if possible.


A beautiful Tallahassee day at Strozier Library.

My Spring semester objective was to learn more about user experience (UX) and apply it by compiling a report for the office's website redesign. To prepare for the process, I spent half of the semester reading journal articles, checking out books, and using online sources such as LibUX, Usability.gov, and Lynda.com via FSU. The other half of the semester, I applied the UX principles I had learned, consulting with the office on how to redesign its current website. With this project, I gained a foundation in UX and demonstrated the process through quantitative research, user personas, and visual design. It will be exciting to see which recommendations are used and how they will impact existing and new users.


Hitting the books on web and UX design to Depeche Mode.

Overall, the yearlong internship was somewhat unconventional since I worked remotely, but I was still able to understand the parts that make up the whole of digital scholarship. At this point, I better understand how technology is changing research support and the research process as well. Although my time as an intern has ended, I'm looking forward to seeing what more the Office of DRS has to offer in the future, the new website included. I am grateful to have been introduced to and involved with such a supportive and innovative community at FSU.

Thank you, Micah Vandegrift, for your leadership and mentorship, and the entire DRS team, for sharing your time and knowledge. With your guidance, I made it! 🎓

#Textbookbroke FSU

To celebrate Open Education Week (March 27-31), a team from University Libraries partnered with the Student Government Association to bring the #textbookbroke campaign to FSU. #Textbookbroke is a national campaign aimed at informing students about Open Textbooks, Open Educational Resources, and alternatives to traditional textbooks. It is also aimed at empowering students to provide feedback on their course materials and encourage their instructors to explore more affordable alternatives.

To that end, we organized two well-attended tabling events at Strozier and Dirac, with the goal of engaging as many students as possible over the course of each event. We created an engagement display board where students could share the most they had ever spent on textbooks in a single semester, and we also encouraged students to complete a short survey on how the textbook affordability problem has affected them.


Over the course of the events, we spoke with hundreds of students from a variety of disciplinary backgrounds and at different stages of their educational careers. 316 students contributed to the engagement board, and 350 submitted responses to the student survey. Overall, the data from the engagement board suggests that $407.32 is the average maximum amount spent by students on textbooks in a single term across all disciplines. Some of the more striking findings from the survey include the following:

  • 93% of students would use an online textbook if it was free
  • 97% of students feel that a $30 print textbook would reduce financial strain
  • 72% of students have decided not to purchase a required textbook due to high cost
  • 11% of students have decided not to take a course due to the cost of the textbook

These findings not only underline the impact of the textbook affordability problem on FSU students, but also suggest that the vast majority of our students would support broader adoption of OERs and Open Textbooks at FSU. We believe that students can play a key role in promoting such broader adoption by becoming advocates for OER on campus, and we hope that our many conversations with students during #textbookbrokeFSU will inspire them to take action to that end. At the same time, FSU Libraries is doing its part to support FSU instructors in adopting more open, affordable course materials through an Alternative Textbook Grants program that launched in late 2016.

This is an exciting time for open education at FSU, and our team is looking forward to continuing to advocate for change in this space, providing both students and instructors with the information and resources they need to make a difference! For more information about the open education movement and related initiatives at FSU, see our research guide on OER, or contact Devin Soper, Scholarly Communications Librarian at FSU Libraries’ Office of Digital Research & Scholarship.

 

 

Using R on Early English Books Online

In order to follow along with this post you will need:

  1. Basic knowledge of the Text Encoding Initiative guidelines for marking up texts.
  2. Understanding of the structure of XML and the basics of XPath.
  3. Some experience with Regular Expressions is helpful, but not necessary.
  4. A willingness to learn R!

A few months ago, I started working through Matt Jockers’ Text Analysis with R for Students of Literature. I wanted to improve my text analysis skills, especially since I knew we would be acquiring the EEBO-TCP phase II texts, which contain text data for thousands of early modern English texts (if you are an FSU student or faculty member and you want access to these files, email me). To start, I decided to do some analysis on Holinshed’s Chronicles, which are famous for their impact on Shakespeare’s history plays. While I have been able to create a few basic analyses and visualizations with this data, I’m still learning and expanding my understanding of R. If you ever want to work through some of the ins-and-outs (or would prefer an in-person consultation on R), you should attend the Percolator from 3-5 on Wednesdays in Strozier or email me to schedule a consultation. We will also be holding a text analysis workshop from 10-11 on April 14.

I am going to be working from two of the EEBO TCP phase I texts, since these are currently open access. You can download the entire corpus for phase one in SGML format: https://umich.app.box.com/s/nfdp6hz228qtbl2hwhhb. I’ve used a stylesheet generated by the TEI council to transform the files into TEI P5-compliant XML files. You can get the example files on my GitHub page (along with the finalized code). Alternately, you can get all of the P5-compliant TEI files directly from the Text Creation Partnership Github.

If you want to follow along with this blog post, do the following:

Step 1. Get your texts. Go to my GitHub page and download holinshed-v1.xml and holinshed-v2.xml. Put them in a directory that you can easily find (I have mine on my desktop in a directory called “holinshed” within another directory called “eebo_r”).

Step 2. Download R and R Studio, as outlined in our Text Analysis libguide.

Step 3. Set Working Directory. Open RStudio and type setwd(""), where the path to the folder you created is contained within the quotes. On a Mac, your path will likely look something like this:

setwd("~/Desktop/eebo_r")

And on Windows it will look something like:

setwd("C:/Users/scstanley/Desktop/eebo_r")

(Note that you shouldn't use a single "\" character in Windows file paths, even though that is the Windows standard. Backslashes are escape characters in R, so use forward slashes or doubled backslashes instead.)

You can either type this into the script pane or into the console. My script pane is on the top left, but yours may be somewhere else within your RStudio environment. To run a line from the script pane, hit Ctrl+Enter (Cmd+Enter on a Mac). Note: I am using the script pane to edit my code and hitting Ctrl+Enter to have it run in the console. If you just want to run your code in the console without saving it as a script, you can type directly into the console.

Step 4. Install the XML and text mining packages. Go to Tools > Install Packages and type "XML" (all uppercase) into the Packages text field. Click "Install." Do the same with "tm" (all lowercase). You could also enter install.packages("XML") and install.packages("tm") into your console with the same effect.

Step 5. Now that you have the XML and text mining package installed, you should call them into the session:

library(XML)
library(tm)

Again, hit ctrl+enter. 

Now you’re ready to get started working with R!

Remember from the beginning of this post that I created a directory within my working directory (“~/Desktop/eebo_r”) to store the files I want to analyze in. I called this directory “holinshed”. I am going to create an object called `directory` that references that filepath. To do this, I’m going to use an assignment operator (`<-`). This gets used quite frequently in R to assign some more complex or verbose object another name. In this case, we will say:

directory <- "holinshed"

Now, we want to get all of the files within that directory: 

files <- dir(path=directory, pattern=".*xml")

This line of code creates another object called "files" that looks inside the directory we stored in the "directory" object and returns the names of all the files there that match the pattern ".*xml" (that is, the XML files).
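If you want to be sure you only match filenames that actually end in ".xml", a slightly stricter variation (my own tweak, not part of the original walkthrough) anchors the pattern to the end of the filename:

# "\\.xml$" matches a literal ".xml" only at the very end of the filename
files <- dir(path=directory, pattern="\\.xml$")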

This is where things can get a little confusing if you don’t understand XML and XPath. For a basic overview, you can take a detour to my presentation on TEI from the Discover DH workshop series, which contains an overview of XML.

What you will need to know for this exercise is that XML structures are perfectly nested and hierarchical, and you can navigate up and down that hierarchy using a XPath. If XML is like a tree, XPath is your way of moving up and down branches to twigs, jumping to other branches, or going back to the trunk.

For the purposes of this assignment, I am interested in specific divisions within Holinshed’s Chronicles—specifically, the ones that are labelled “chapter” and “section” by the encoders of the EEBO-TCP texts. The way that I would navigate from the root of the document to these two types of divisions is with the following XPath:

/tei:TEI//tei:div[@type='chapter'] | /tei:TEI//tei:div[@type='section']

(find me all the divisions with a value of “chapter” on the type attribute AND find me all the divisions with the value of “section” on the type attribute.)

Out of the box, R cannot parse XPath, but the XML package that you installed at the beginning will allow you to select only those pieces from your documents.

Now we need to get the XML content out of the two files in our "holinshed" directory. To do this, we will need to create a for loop. To start, create an empty list.

documents.l <- list()

This gives us a place to store the objects each time the for loop finishes and goes back to the beginning. Without the empty list, the content would just keep overwriting itself, so at the end you would only have the last object. For example, I made the mistake of not creating an empty list when writing my for loop, and I kept getting only the divisions from the second volume of Holinshed's Chronicles, since the second volume was overwriting the first.

Our for loop is now going to take every file in the “holinshed” directory and do the same thing to it. We begin a for loop like this:

for(i in 1:length(files)){
   # the rest of the code goes here
}

This basically says for every object in 1 to however long the “files” object is (in this case “2”), do the following. Also, note that the pound sign indicates that that line is a comment and that it shouldn’t be processed as R code.
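If for loops are new to you, here is a throwaway example (mine, not part of the original walkthrough) that just prints each filename on its own pass through the loop:

# prints "holinshed-v1.xml" on the first pass and "holinshed-v2.xml" on the second
for (i in 1:length(files)) {
  print(files[i])
}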

Now, within this for loop, we are going to specify what should be done to each file. We are going to create a document object using `xmlTreeParse` for each object within the “holinshed” directory.

document <- xmlTreeParse(file.path(directory, files[i]), useInternalNodes = TRUE) 

(If you find it hard to read long code on one line, you can put carriage returns. Just make sure that the returns happen at a logical place (like after a comma), and that the second line is indented. Spacing and indentation do matter in R. Unfortunately, WordPress isn’t allowing me to provide an example, but you can see how that would look in practice in the example R file provided in my eebo_r GitHub repository.)

The [i] in "files[i]" is where the numeric index is stored on each loop. So the first loop will use files[1] and the second files[2] (which correspond to holinshed-v1.xml and holinshed-v2.xml). If we had more than two XML files in this directory, the for loop would apply to all of them as well.

Next, you will use the empty list that you created. Define the entry of documents.l that corresponds to files[1] or files[2] (holinshed-v1.xml and holinshed-v2.xml, respectively) as the node set returned by the XPath we created above. In other words, create a list of all of the divisions with a value on @type of "chapter" or "section" within each document.

documents.l[[files[i]]] <- getNodeSet(document, "/tei:TEI//tei:div[@type='chapter'] | /tei:TEI//tei:div[@type='section']", namespaces = c(tei="http://www.tei-c.org/ns/1.0"))

Ignore namespaces for now. They are important to understanding XML, but as long as you don’t have documents that contain multiple XML languages, you won’t need to worry as much about it. I can discuss the function and importance of namespaces in another post.

So, in the end, your full for loop will look like this:

for(i in 1:length(files)){
   document <- xmlTreeParse(file.path(directory, files[i]), useInternalNodes = TRUE)
   documents.l[[files[i]]] <- getNodeSet(document, "/tei:TEI//tei:div[@type='chapter'] | /tei:TEI//tei:div[@type='section']", 
        namespaces = c(tei="http://www.tei-c.org/ns/1.0"))
}

If you want to run multiple lines of code, you can highlight the entire for loop, and hit “ctrl+enter.” Alternately, you can put your cursor at the beginning of the for loop in the script pane, and click “option+command+E” on a mac, or go to the menu and click “code > run region > run from line to end” to run from that line to the end of the script. This is also useful if you ever save an R script and want to come back to it later, and start from where you left off. This way you don’t need to go back and run each line individually.

Now you should have a list with two items. Each item on this list is a node set (which is a specialized type of list). Rather than having documents.l being two nested lists, I want to convert each document into its own list. I did it with the following code. See if you can figure out what exactly is happening here:

holinshed1.l <- documents.l[[1]] 
holinshed2.l <- documents.l[[2]]

Now that I have two separate lists for each document, I want to concatenate them into a single, list of divisions. In R, you use `c` to concatenate objects:

both.documents <- c(holinshed1.l, holinshed2.l)

Now, if you check `length(both.documents)`, you should get 359. Your console will look like this:

> length(both.documents)
359

Basically, what this means is that there are a total of 359 divisions in both documents that have a value on type of either “chapter” or “section.”

Now, you are going to want to return all of the paragraphs that are children of these two divisions.* To do this, we are going to need to create another for loop. This time, instead of creating an empty list, we will create an empty vector. I’m going to call this vector paras.lower.

paras.lower <- vector()

I’m going to give you the full code for selecting the contents (text, basically) of all of the paragraphs, and then explain it point-by-point after.

for(i in 1:length(both.documents)){
   paras <- xmlElementsByTagName(both.documents[[i]], "p")
   paras.words.v <- paste(sapply(paras, xmlValue), collapse = " ")
   paras.lower[[i]] <- tolower(paras.words.v)
}

This says for every object in 1 to the length of “both.documents” (which we determined was equivalent to 359 divisions), do the following:

Create an object called “paras” which will select all of the children of the node set “both.documents” with the tag name of “p.” On each loop, do this for one division within both.documents.

Now create another object (this time a vector), that essentially takes the content of paras (the text within all the <p> elements, stripping the nested tags) and collapses it into a vector.

Now take the vector you’ve created (all of the words from each paragraph within each division) and make the characters all lowercase.

This process may seem slightly confusing at first, especially if you are unfamiliar with what each piece is doing. If you are ever confused, you can type ?term into the console, and you will find the documentation for that specific aspect of R. So, for example, if you typed ?sapply, you’d see that sapply applies a given function over a list or vector (so essentially the same thing happens to multiple objects within a vector or list, without you needing to explicitly state what happens to each item).
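For instance, here is a toy example of sapply() on its own (my own illustration, not drawn from the Holinshed data):

# applies nchar() to every element of the vector and simplifies the result
# to a named numeric vector: early = 5, modern = 6, text = 4
sapply(c("early", "modern", "text"), nchar)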

Now that you have your character vector with the content of all of the paragraphs, you can start cleaning the text. The one problem is that paras.lower contains 359 separate elements that need to be combined into one. You can do this by using the paste() function we used in the last few lines.

holinshed.all <- paste(paras.lower, collapse=" ", sep="\n") 

Now, if we ask for the length of holinshed.all, we see that it returns 1, instead of 359.

Now, we are going to use the tm package that we installed at the beginning. This package can facilitate a lot of types of analysis that we won’t cover in this post. We are going to simply use it to easily remove stopwords from our texts. Stopwords are commonly-occurring words that we may not want to include in our analysis, such as “the”, “a”, “when”, etc.
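If you are curious what those stopwords actually are, you can peek at the built-in list before removing anything (a quick aside of my own, assuming the tm package is already loaded):

# show the first few entries of tm's built-in English stopword list
head(stopwords("english"))
# and count how many words will be stripped from the corpus
length(stopwords("english"))

With that in mind, the next step is to remove these words from our Holinshed text.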

To do this, you are first going to create a corpus from your holinshed.all vector:

holinshed.corpus <- Corpus(VectorSource(holinshed.all))

Now you will remove stopwords from this corpus. You can use the following code to remove all English stopwords:

holinshed.corpus <- tm_map(holinshed.corpus, removeWords, stopwords("english"))

However, with a corpus this big, R will run very slow (it will likely take upwards of 10 minutes to remove all the stopwords from your corpus). If you want to let it run and take a break here, feel free to do so. However, if you are impatient and would prefer to continue on right now, I have a premade text corpus in my R GitHub repository, which you can use instead of following the next step.

If you do want to remove the stopwords by yourself, run the above code, grab yourself a cup of coffee, work on some other writing projects for a bit, take a nap—whatever suits you best. Once the stopwords are removed, you will see a “>” once again in your console, and you can then type in

writeCorpus(holinshed.corpus, filenames = "holinshed.txt")

This will create a file that has all of the content of the paragraphs within the <div>s with the type value of “chapter” or “section” minus the stopwords.

**Impatient people who didn’t want to wait for the stopwords to get removed can start up again here**

Now that you have a text file with all of the relevant words from Holinshed’s Chronicles (holinshed.txt), we are going to analyze the frequencies of words within the corpus.

We are going to use the scan() function to get all of the characters in the Holinshed corpus.

holinshed <- scan("holinshed.txt", what="character", sep="\n")

This line of R will create an object called “holinshed” which contains all of the character data within holinshed.txt (the corpus you just created).

You will once again need to use the “paste” function to collapse all of the lines into one (as the line of code above separated the documents on each new line).

holinshed <- paste(holinshed, collapse=" ")

Now you will split this very long line of characters at the word level:

holinshed.words <- strsplit(holinshed, "\\W") 

This splits the string in holinshed at every non-word character ("\\W"), which effectively breaks it into words. If you attempt to show the first 10 items within holinshed.words (`holinshed.words[1:10]`), you will notice that it gives you a truncated version of the whole document, followed by 9 NULLs. This is because strsplit converts your vector into a list and then treats the whole document as the first item on that list. Using unlist(), we can create another character vector:

holinshed.words <- unlist(holinshed.words)

Now, if you enter `holinshed.words[1:10]`, you will see that it returns the first 10 words… but not quite. You will notice that there are a number of blank entries, which are represented by quote marks with no content. In order to remove these, we can say:

holinshed.words <- holinshed.words[which(holinshed.words!="")]

Now, if you enter holinshed.words[1:10], it will display the first 10 words:

[1] "read"     "earth"    "hath"     "beene"    "diuided"  "thrée"  
[7] "parts"    "euen"     "sithens"  "generall" 

In order to get the frequencies of the words within our corpus, we will need to create a table of holinshed.words. In R, this is incredibly simple:

holinshed.frequencies <- table(holinshed.words) 

Now, if you enter length(holinshed.frequencies), R will return 37086. This means that there are 37,086 unique strings (words) within Holinshed's Chronicles. However, if you look at the first ten entries in this table (`holinshed.frequencies[1:10]`), you will see that they are not words at all! Instead, the table has also counted numbers. Since I don't care about numbers (you might, but you aren't writing this exercise, are you?), I'm going to remove all of the numbers from my table. I determined that the actual alphabetic words start at position 895, so all you need to do is redefine holinshed.frequencies as running from position 895 to the end of the table.

holinshed.frequencies <- holinshed.frequencies[895:37086]
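Hard-coding position 895 works for this particular corpus, but if you would rather not count positions by hand, an alternative sketch (my variation, not in the original post) keeps only the entries whose names contain at least one letter:

# keep only table entries whose name contains an alphabetic character,
# dropping the purely numeric "words"
holinshed.frequencies <- holinshed.frequencies[grepl("[[:alpha:]]", names(holinshed.frequencies))]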

Now you can sort this frequency table so that the first values of the table are the most frequent words in the corpus:

holinshed.frequencies.sort <- sort(holinshed.frequencies, decreasing = TRUE)

Now, enter `holinshed.frequencies.sort[1:10]` to return a list of the ten most frequently used words in our Holinshed corpus.

If you want a graphic representation of this list, you can plot the top twenty words (or 15 or 10):

plot(holinshed.frequencies.sort[1:20])

This graph should show up in the right pane of your RStudio environment (unless you have it configured in a different way), and will show you a visual representation of the raw frequencies of words within our corpus.
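If the default plot is hard to read, a barplot of the same sorted table with rotated labels is another option (an optional variation of mine, not part of the original post):

# las = 2 rotates the word labels so they are easier to read along the axis
barplot(holinshed.frequencies.sort[1:20], las = 2)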

Try it on your own!

  1. We analyzed the top 20 words for the two combined volumes of Holinshed's Chronicles, but what would our top 20 words look like if we analyzed each text individually?
  2. If you look closely at the XML, you will notice that our original XPath (/tei:TEI//tei:div[@type='chapter'] | /tei:TEI//tei:div[@type='section']) excludes a lot of content from the Chronicles. Specifically, it ignores any division without those type attributes. Further, using `xmlElementsByTagName` only selects the direct children of the node set, which excludes paragraphs that occur within divisions nested within chapters or sections (see, for example, `<div type="part">`, which occurs occasionally within `<div type="chapter">` in volume I). Write code that selects the contents of all paragraphs.
  3. Words in the top 20 list like "doo," "haue," and "hir" would presumably be picked up by a stopwords list if they had been spelled like their modern English equivalents. How could you get rid of a few of these nonstandard stopwords? (One possible starting point is sketched just after this list.)
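For exercise 3, one possible starting point (a rough sketch using a made-up spelling list and a hypothetical output filename, and assuming holinshed.corpus is still in your session) is to pass your own vector of nonstandard spellings to removeWords:

# hypothetical list of early modern spellings to treat as extra stopwords
early.modern.stopwords <- c("doo", "haue", "hir")

# strip them from the corpus, then write the cleaned corpus out again
holinshed.corpus <- tm_map(holinshed.corpus, removeWords, early.modern.stopwords)
writeCorpus(holinshed.corpus, filenames = "holinshed-custom.txt")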

Check back to my eebo_r GitHub page for additional R exercises and tutorials using the EEBO-TCP corpus! And if you have any questions about this post or want to learn more about R, schedule a consultation with me.

Notes

* I specifically don’t say that you are looking for all the paragraphs within these divisions, because the code we are about to use only selects children, not descendants. Understanding the difference between these requires some knowledge of XPath and the structure of XML documents.

 

Open Education Week 2017

Open Education Week, March 27-31, is an opportunity to celebrate and raise awareness about the abundance of free and open educational resources (OER) available to teachers and learners around the world. OER are written by experts and often peer-reviewed, just like their commercial equivalents, but they are published under open copyright licenses so that they can be downloaded, distributed, and adapted for free. Many excellent examples of OER are available through online portals such as OpenStax College, the Open Textbook Library, OER Commons, BCcampus, and MERLOT.

To celebrate the growth of OER and the exciting opportunities they present, educational institutions from all over the world are coming together during Open Education Week to showcase what they are doing to make education more open, free, and available to everyone.

To mark the occasion at FSU, University Libraries and the Student Government Association are partnering to bring the #textbookbroke campaign to FSU. #Textbookbroke is a national campaign aimed at informing students about open textbooks, OER, and other low-cost alternatives to traditional textbooks. It is also aimed at empowering students to provide feedback on their course materials and encourage their instructors to explore more affordable alternatives. Stop by our event tables at Strozier Library on March 28th and Dirac Library on March 29th to share how much you spent on textbooks this term and learn about textbook affordability initiatives at FSU!

In addition, FSU Libraries will also announce the successful applicants for its Alternative Textbook Grants program, which was launched in late 2016 to support FSU instructors who are interested in adopting or remixing open textbooks and educational resources to replace commercial course materials. Based on the applications we have received thus far, participating instructors could save FSU students up to $100,000 by the spring of 2018!

For more information about the open education movement and related initiatives at FSU, see our research guide on OER, or contact Devin Soper, Scholarly Communications Librarian at FSU Libraries’ Office of Digital Research & Scholarship. And don’t forget to follow the conversation on Twitter! #textbookbrokeFSU

Automagical Repository Harvesting

Over the last couple of years, FSU Libraries dedicated librarians and staff to the in-house development of an institutional repository platform that is open-source, flexible, and modular. I was recently hired as the full-time repository specialist for the Office of Digital Research and Scholarship, and I quickly realized the strategic importance of the institutional repository concept: its purposes, benefits, and potential future impact intersect with the key issues surrounding libraries, technology, scholarly communications, and digital scholarship today.

One of my early tasks focused on automating metadata harvesting from other repositories. Figuring out a time- and cost-efficient way to track and deposit new publications is a key challenge in the field of scholarly communication today. Aside from the amount of time manual processing takes per scholarly object, such a framework lends itself to human error and, as a result, to decreased discoverability, accessibility, and validity of scholarship for researchers, which can be at odds with the overall goals and purposes of an institutional repository. Publicly accessible APIs provided by public repositories offer the chance to eliminate or greatly reduce the time it takes to process a deposit and the risk that bibliographic information will be transferred inaccurately from one system to another.

In response to this challenge, I have developed two tools to increase the efficiency of repository ingest. PMC Grabber is a PHP-based tool that uses PubMed Central’s APIs to programmatically search the PubMed Central database, pull metadata from the database, and transform the metadata for ingestion into FSU’s institutional repository. With this framework, the Libraries can run constructed searches every six or twelve months and stay on top of new publications from FSU researchers posted in PubMed without a hassle. While the tool does not fully automate the ingestion workflow from harvest to deposit, it significantly mitigates the time-intensive task of manually discovering and creating ingest records for individual articles.

PMC Grabber Workflow Diagram showing distinct steps, database table layout, and outcomes.
SQLite database management menu after using PMC Grabber.
SQLite database embargo table populated after a search using PMC Grabber.
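PMC Grabber itself is written in PHP, but to give a flavor of what a programmatic PubMed Central search can look like, here is a rough R sketch against NCBI's public E-utilities. The search term, field syntax, and result handling are illustrative assumptions of mine, not the tool's actual queries:

library(jsonlite)

# hypothetical affiliation search; the real tool builds its own constructed searches
term <- URLencode('"Florida State University"[Affiliation]', reserved = TRUE)
url  <- paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
               "?db=pmc&retmode=json&retmax=100&term=", term)

# esearch returns the IDs of matching PMC records, which a harvesting tool
# could then fetch, transform, and stage for repository ingest
results <- fromJSON(url)
results$esearchresult$idlist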

The other tool, codenamed WOS (Web of Science) Grabber, combines a workflow built from several different tools and applications with the core concept behind PMC Grabber. The goal is to capture all FSU-affiliated publications appearing in Web of Science with minimal participation necessary on the part of authors. Using a combination of Web of Science searches, Zotero, SHERPA/RoMEO API calls in Google Sheets, and OpenRefine, thousands of publications can be identified and staged for ingest. The end result of the workflow is a set of publications that can be filtered into different sub-sets of articles: (1) those that can be deposited into an institutional repository as publisher versions with no author intervention; (2) those that can be deposited as accepted manuscripts/final drafts; and (3) those that only allow pre-print versions to be deposited into institutional repositories. Using WOS Grabber, I was able to quickly and easily identify over 2,000 articles published in 2016 and affiliated with FSU. Five hundred of these articles (a good 25% of all Web of Science-indexed scholarship from FSU!) were open access and were immediately added to our ingestion queue, and a little more than 1,500 of the articles were identified as allowing final-draft deposit into a repository.

Overall, my involvement with these projects has been positive and signals a promising future for repository managers looking to leverage emerging technologies and centralized repositories. My experience suggests that, through the use of new tools and technologies, what is still being described as an unmanageable goal is quickly becoming feasible for institutional repositories. Libraries with sufficient resources (in terms of skilled personnel and funding) should continue to push the envelope in this area and discover different ways to improve repository workflow efficiency and, ultimately, user access to scholarship. If my experience is any indication, an investment in and a focus on this kind of work will have great returns for everyone involved.