During the Fall ’21 and Spring ’22 semesters, I served as a Graduate Research Assistant with the Office of Digital Research and Scholarship (DRS) at FSU Libraries. Collaborating with Matthew Hunter, the Digital Scholarship Librarian, I worked to increase FSU Libraries’ support of research services that utilize 3D scanning and modeling, 3D printing, and extended reality technologies. Working on various immersive scholarship- and digital humanities-based projects, including a self-curated exhibition, has made this one of the most memorable experiences of my graduate student career!
DH Currents: The Black Women’s Suffrage Digital Collection
Land Acknowledgement: Florida State University is located on land that is the ancestral and traditional territory of the Apalachee Nation, the Muscogee Creek Nation, the Miccosukee Tribe of Florida, and the Seminole Tribe of Florida. We pay respect to their Elders past and present and extend that respect to their descendants and to all Indigenous people. We recognize that this land remains scarred by the histories and ongoing legacies of colonial violence, dispossession, and removal. In spite of all of this and with tremendous resilience, these Indigenous nations have remained deeply connected to this territory, to their families, to their communities, and to their cultural ways of life. We recognize the ongoing relationships of care that these Indigenous Nations maintain with this land and we extend our gratitude as we live and work as respectful guests upon their territory. We encourage you to learn about and amplify the contemporary work of the Indigenous nations whose land you are on and to endeavor to support Indigenous sovereignty in all the ways that you can.
“DH Currents” is a blog series conceived in the summer of 2020 by the members of the Office of Digital Research and Scholarship at Florida State University Libraries. The goal of the series is to identify and highlight digital scholarship projects that take up anti-racist and decolonial causes as part of their methodologies, content, and intentions. These initiatives foreground the principles of inclusion, truth-telling, and dismantling the often oppressive practices of academic and cultural heritage preservation work. Each post will include a project description in addition to input gathered from its principal investigators, maintainers, and other participants. This series aspires both to provide a platform to share information about these scholars’ important contributions to the field of digital scholarship and to spark dialogue on ways academic libraries and other memory institutions can engage in this urgent, necessary labor.
It All Starts Here: Digital Scholarship @ FSU
This semester I set to the task of conducting an environmental scan of digital scholarship at FSU, focusing specifically on projects, faculty, and researchers incorporating various kinds of audio-visual media, tools, and platforms into their work. This project, building off my previous research in digital humanities initiatives using audio-visual media outside the University and the growing interest in such projects in the DH field at large, attempts to identify new horizons and domains for DRS to explore.
The goals of this undertaking lie somewhere between generating a possible blueprint for preservation and access to such projects (a goal traditionally sought by archives or media labs) and making new connections for FSU’s Office of Digital Research and Scholarship (DRS) which is a goal aligned with this emerging entity in academic libraries we are calling digital scholarship centers (Lippincott, et al 2014). Over the course of the semester, I’ve spoken with ethnomusicologists, new media artists, choreographers, digital humanities scholars, GIS experts, digital archivists, and web developers (just to name a few) with the hopes of finding common threads to weave into a shared infrastructure of AV media-focused resources for library collaborations. Although daunting, the value of such an environmental scan has been concisely articulated by E. Leigh Bonds:
I was less interested in labeling [the research of faculty at Ohio State University] than I was in learning what researchers were doing or wanted to do, and what support they needed to do it. Ultimately, I viewed the environmental scan as the first step towards coordinating a community of researchers (2018).
Bonds’ mission of “coordinating a community” is especially apt considering the wide array of scholarship happening at Florida State University. Despite differences in disciplines, approaches, and aims, the use of digital technologies in working with AV media has become a ubiquitous necessity that requires distinct but often overlapping tools and skill-sets. The digital scholarship center, as noted by Christina Kamposiori, operating under a “hub and spoke” organizational model, can effectively serve as a networking node and site of scholarly intersections and cross-pollination (2017).
Such an arrangement, eclipsing traditional conceptions of the library as simply a book repository or service center, better positions library faculty and staff to exercise their knowledge and expertise as technologist partners in scholarly projects working with digital AV content while also enhancing the research ecosystem through developing shared resources. This setup, while dependent on many complex factors, is attainable if the digital scholarship center can effectively track the pulse of its community of researchers, identifying their areas of interest, needs, and prospective directions. For DRS, some observations drawn from my environmental scan seem like a good place to begin.
One genre of support DRS and other library units working with digital media can begin to cultivate is providing documentation, preservation, and data management frameworks for digital projects whose final form exists outside traditional “deliverables” of academic scholarship (i.e. print-based publications, and the like). These can be “new media” objects like e-publications and websites, or more complex outputs like performances and/or artworks incorporating many different layers of digital technologies. The work of Tim Glenn, Professor in the School of Dance, is a great example of this kind of intricate digital scholarship which blends choreographic craft and technical execution to create captivating performances. One piece in particular, Triptych (2012), relies on the coordinated interaction between dancers’ bodies, cameras, projectors, and pre-edited video to create what Glenn calls “a total theater experience.”
The amount of digital data and infrastructure that goes into such a project is a bit staggering when we consider the lattice of capture and projection video signals, theater AV technology, lighting control signals, and creating the video documentation of the performance space itself. Glenn’s website is a testament to his own stellar efforts to capture and document these features of the work, but as many archivists and conservators will attest, this level of artist-provided documentation is often not the case (Rinehart & Ippolito, Chapters 1-2, 2014). With this kind of complex digital scholarship, DRS can develop models along a spectrum, either directly with researchers on developing documentation plans and schemas from the ground-up (see examples of such work from The Daniel Langlois Foundation and Matters in Media Art) or serving as a conduit for depositing these digital objects into FSU’s scholarship repository, DigiNole, to ensure their long-term accessibility.
Of course, the other side of the coin is the maintenance, compatibility, and sustainability of such platforms and repositories at the University. DigiNole, built on the Islandora open-source software framework, is the crown jewel of FSU’s digital collections. It serves as the access point to the digital collections of FSU Libraries as well as the University’s research repository and green OA platform for works created by faculty, staff, and students. An incredibly valuable and integral part of the library’s mission, DigiNole has the advantage of being built on an extensible, open-source platform that can be expanded to accommodate a wide variety of digital objects (not to mention that it is also maintained by talented and dedicated librarians, developers, and administrators).
As such, DigiNole can play an equally integral role in data management and documentation projects as a repository of complex, multifaceted digital objects. The challenge will be normalizing data into formats that retain the necessary information or “essence” of the original data while also ensuring compatibility with the Islandora framework. Based on my conversation with FSU’s Digital Archivist, Krystal Thomas, another, more long-term goal for enhancing the library’s digital preservation infrastructure will be implementing a local instance of Archivematica, an open-source software framework specifically designed to address the unique challenges of long-term digital preservation of complex media. Another step the University can potentially take in strengthening this infrastructure across campus is to seek out a trusted data repository certification. For those of us working in digital scholarship centers, these kinds of aspirations will always be moving targets, as is the nature of the technological landscape. But having the strongest possible grasp on the local needs and conditions of the scholarly community we work with will allow both librarians and administration to channel resources and energy into initiatives that have the highest and most palpable impacts and benefits.
Ultimately, the kind of infrastructure DRS or any other academic unit wishes to build should be in response to the needs of its scholars and foster solutions that have cross-disciplinary applications and implications. Whether generating data management plans, developing scholarly interfaces, or building out our homegrown digital repositories, an R1 institution like Florida State University needs systems that account for the wide variety of scholarship happening both on campus and at its many satellite and auxiliary facilities. Looking towards the future, we can glimpse the kind of fruitful digital scholarship happening at FSU in the work of undergraduates like Suzanne Raybuck. Her contributions to Kris Harper and Ron Doel’s Exploring Greenland project and her fascinating personal research on the construction of digital narratives in video games represent promising digital scholarship that bridges archival, humanities, and pedagogical research. Hopefully DRS and its partner organizations can keep pace with such advancements and continue to improve their services and scope of partnerships.
Acknowledgments
An enormous thank you to the entire staff of FSU’s Office of Digital Research and Scholarship for allowing me the space to pursue this research over the past year, namely Sarah Stanley, Micah Vandegrift, Matt Hunter, Devin Soper, Rachel Smart, and Associate Dean Jean Phillips. Thanks to Professor Tim Glenn and Assistant Professor Hannah Schwadron in the School of Dance, Assistant Professors Rob Duarte and Clint Sleeper in the College of Fine Arts, Assistant Professor Sarah Eyerly in the College of Music, doctoral candidate Mark Sciuchetti in the Department of Geography, Krystal Thomas, Digital Archivist at Special Collections & Archives, and Presidential/UROP Scholar Suzanne Raybuck for your time, contributions, and conversations that helped shape this research.
WORKS CITED
Bonds, E. L. (2018) “First Things First: Conducting an Environmental Scan.” dh+lib, “Features.” Retrieved from: http://acrl.ala.org/dh/2018/01/31/first-things-first-conducting-an-environmental-scan/
Kamposiori, C. (2017) The role of Research Libraries in the creation, archiving, curation, and preservation of tools for the Digital Humanities. Research Libraries UK. Retrieved from http://www.rluk.ac.uk/news/rluk-report-the-role-of-research-libraries-in-the-creation-archiving-curation-and-preservation-of-tools-for-the-digital-humanities/
Lippincott, J., Hemmasi, H., & Lewis, V. (2014) “Trends in Digital Scholarship Centers.” EDUCAUSE Review. Retrieved from https://er.educause.edu/articles/2014/6/trends-in-digital-scholarship-centers
Rinehart, R. & Ippolito, J. (2014) Re-Collection: Art, New Media, and Social Memory. Cambridge, MA: The MIT Press.
Using R on Early English Books Online
In order to follow along with this post you will need:
- Basic knowledge of the Text Encoding Initiative guidelines for marking up texts.
- Understanding of the structure of XML and the basics of XPath.
- Some experience with Regular Expressions is helpful, but not necessary.
- A willingness to learn R!
A few months ago, I started working through Matt Jockers’ Text Analysis with R for Students of Literature. I wanted to improve my text analysis skills, especially since I knew we would be acquiring the EEBO-TCP phase II texts, which contain text data for thousands of early modern English texts (if you are an FSU student or faculty member and you want access to these files, email me). To start, I decided to do some analysis on Holinshed’s Chronicles, which are famous for their impact on Shakespeare’s history plays. While I have been able to create a few basic analyses and visualizations with this data, I’m still learning and expanding my understanding of R. If you ever want to work through some of the ins-and-outs (or would prefer an in-person consultation on R), you should attend the Percolator from 3-5 on Wednesdays in Strozier or email me to schedule a consultation. We will also be holding a text analysis workshop from 10-11 on April 14.
I am going to be working from two of the EEBO TCP phase I texts, since these are currently open access. You can download the entire corpus for phase one in SGML format: https://umich.app.box.com/s/nfdp6hz228qtbl2hwhhb. I’ve used a stylesheet generated by the TEI council to transform the files into TEI P5-compliant XML files. You can get the example files on my GitHub page (along with the finalized code). Alternately, you can get all of the P5-compliant TEI files directly from the Text Creation Partnership Github.
If you want to follow along with this blog post, do the following:
Step 1. Get your texts. Go to my GitHub page and download holinshed-v1.xml and holinshed-v2.xml. Put them in a directory that you can easily find (I have mine on my desktop in a directory called “holinshed” within another directory called “eebo_r”).
Step 2. Download R and R Studio, as outlined in our Text Analysis libguide.
Step 3. Set Working Directory. Open RStudio and type setwd(""), with the path to the folder you created inside the quotes. On a Mac, your path will likely look something like this:
setwd("~/Desktop/eebo_r")
And on Windows it will look something like:
setwd("C:/Users/scstanley/Desktop/eebo_r")
(Note that you shouldn’t use single “\” characters in Windows filepaths, even though that is the Windows convention. The backslash is an escape character in R, so use forward slashes or doubled backslashes instead.)
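A quick sketch of the escaping behavior (the path shown is just the example directory from this post):

```r
# The backslash is R's escape character: "\\" in source is ONE character.
nchar("\\")                                    # 1
# A Windows path can therefore be written either way:
p1 <- "C:/Users/scstanley/Desktop/eebo_r"      # forward slashes
p2 <- "C:\\Users\\scstanley\\Desktop\\eebo_r"  # doubled backslashes
# file.path() joins path components with "/" on every platform:
file.path("~", "Desktop", "eebo_r")            # "~/Desktop/eebo_r"
```

Both `p1` and `p2` denote the same file on disk, even though the source code looks different.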
You can either type this into the script pane or into the console. My script pane is on the top-left, but yours may be somewhere else within your RStudio environment. To run the current line from the script pane, hit ctrl+enter. Note: I am using the script pane to edit my code and hitting ctrl+enter to run it in the console. If you just want to run your code in the console without saving it as a script, you can type directly into the console.
Step 4. Install the XML and Text Mining packages. Go to Tools > Install Packages and type “XML” (all uppercase) into the Packages text field. Click “Install.” Do the same with “tm” (all lowercase). You could also enter install.packages("XML") and install.packages("tm") into your console with the same effect.
Step 5. Now that you have the XML and text mining package installed, you should call them into the session:
library(XML)
library(tm)
Again, hit ctrl+enter.
Now you’re ready to get started working with R!
Remember from the beginning of this post that I created a directory within my working directory (“~/Desktop/eebo_r”) to store the files I want to analyze. I called this directory “holinshed”. I am going to create an object called `directory` that references that filepath. To do this, I’m going to use the assignment operator (`<-`), which is used throughout R to give a name to some more complex or verbose object. In this case, we will say:
directory <- "holinshed"
Now, we want to get all of the files within that directory:
files <- dir(path=directory, pattern=".*xml")
This line of code creates another object called “files” that looks inside the directory we stored in “directory” and collects all of the filenames that end in “.xml” (all of the XML files).
This is where things can get a little confusing if you don’t understand XML and XPath. For a basic overview, you can take a detour to my presentation on TEI from the Discover DH workshop series, which contains an overview of XML.
What you will need to know for this exercise is that XML structures are perfectly nested and hierarchical, and you can navigate up and down that hierarchy using a XPath. If XML is like a tree, XPath is your way of moving up and down branches to twigs, jumping to other branches, or going back to the trunk.
For the purposes of this assignment, I am interested in specific divisions within Holinshed’s Chronicles—specifically, the ones that are labelled “chapter” and “section” by the encoders of the EEBO-TCP texts. The way that I would navigate from the root of the document to these two types of divisions is with the following XPath:
/tei:TEI//tei:div[@type='chapter'] | /tei:TEI//tei:div[@type='section']
(find me all the divisions with a value of “chapter” on the type attribute AND find me all the divisions with the value of “section” on the type attribute.)
Out of the box, R cannot parse XPath, but the XML package that you installed at the beginning will allow you to select only those pieces from your documents.
Now we need to get the XML content out of the two files in our “holinshed” directory. To do this, we will need to create a for loop. To start, create an empty list.
documents.list <- list()
This gives us a place to store the objects when the for loop finishes one pass and goes back to the beginning. Without the empty list, the content will keep overwriting itself, so at the end you will have only the last object. For example, I made the mistake of not creating an empty list while writing my for loop, and I kept getting only the divisions from the second volume of Holinshed’s Chronicles, since the second volume was overwriting the first.
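The difference is easy to see with a toy loop (numbers standing in for the parsed documents):

```r
# Without a list, each pass overwrites the previous result:
x <- NULL
for (i in 1:2) {
  x <- i * 10
}
x                # 20 -- only the second pass survives

# Accumulating into a list keeps every result:
results <- list()
for (i in 1:2) {
  results[[i]] <- i * 10
}
length(results)  # 2
```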
Our for loop is now going to take every file in the “holinshed” directory and do the same thing to it. We begin a for loop like this:
for(i in 1:length(files)){
  # the rest of the code goes here
}
This basically says for every object in 1 to however long the “files” object is (in this case “2”), do the following. Also, note that the pound sign indicates that that line is a comment and that it shouldn’t be processed as R code.
Now, within this for loop, we are going to specify what should be done to each file. We are going to create a document object using `xmlTreeParse` for each object within the “holinshed” directory.
document <- xmlTreeParse(file.path(directory, files[i]), useInternalNodes = TRUE)
(If you find it hard to read long code on one line, you can insert line breaks. Just make sure that the breaks happen at a logical place (like after a comma), where the expression is still syntactically incomplete so R knows to keep reading, and indent the continuation line. Unfortunately, WordPress isn’t allowing me to provide an example, but you can see how that would look in practice in the example R file provided in my eebo_r GitHub repository.)
The [i] in “files[i]” is where the loop counter is substituted on each pass. So the first loop will use files[1] and the second files[2] (which correspond to holinshed-v1.xml and holinshed-v2.xml). If we had more than two XML files in this directory, the for loop would apply to all of those as well.
Next, you will use the empty list that you have created. Assign to the entry of documents.list that corresponds to files[1] or files[2] (holinshed-v1.xml and holinshed-v2.xml, respectively) the node set matched by the XPath we created above. In other words, store a list of all of the divisions with a value on @type of “chapter” or “section” within each document.
documents.list[[files[i]]] <- getNodeSet(document, "/tei:TEI//tei:div[@type='chapter'] | /tei:TEI//tei:div[@type='section']", namespaces = c(tei="http://www.tei-c.org/ns/1.0"))
Ignore namespaces for now. They are important to understanding XML, but as long as you don’t have documents that contain multiple XML languages, you won’t need to worry as much about it. I can discuss the function and importance of namespaces in another post.
So, in the end, your full for loop will look like this:
for(i in 1:length(files)){
  document <- xmlTreeParse(file.path(directory, files[i]),
    useInternalNodes = TRUE)
  documents.list[[files[i]]] <- getNodeSet(document,
    "/tei:TEI//tei:div[@type='chapter'] | /tei:TEI//tei:div[@type='section']",
    namespaces = c(tei="http://www.tei-c.org/ns/1.0"))
}
If you want to run multiple lines of code, you can highlight the entire for loop, and hit “ctrl+enter.” Alternately, you can put your cursor at the beginning of the for loop in the script pane, and click “option+command+E” on a mac, or go to the menu and click “code > run region > run from line to end” to run from that line to the end of the script. This is also useful if you ever save an R script and want to come back to it later, and start from where you left off. This way you don’t need to go back and run each line individually.
Now you should have a list with two items. Each item on this list is a node set (which is a specialized type of list). Rather than having documents.list be two nested lists, I want to convert each document into its own list. I did it with the following code. See if you can figure out what exactly is happening here:
holinshed1.l <- documents.list[[1]]
holinshed2.l <- documents.list[[2]]
Now that I have two separate lists for each document, I want to concatenate them into a single list of divisions. In R, you use `c` to concatenate objects:
both.documents <- c(holinshed1.l, holinshed2.l)
Now, if you check `length(both.documents)`, you should get 359. Your console will look like this:
> length(both.documents)
[1] 359
Basically, what this means is that there are a total of 359 divisions in both documents that have a value on type of either “chapter” or “section.”
Now, you are going to want to return all of the paragraphs that are children of these two divisions.* To do this, we are going to need to create another for loop. This time, instead of creating an empty list, we will create an empty vector. I’m going to call this vector paras.lower.
paras.lower <- vector()
I’m going to give you the full code for selecting the contents (text, basically) of all of the paragraphs, and then explain it point-by-point after.
for(i in 1:length(both.documents)){
  paras <- xmlElementsByTagName(both.documents[[i]], "p")
  paras.words.v <- paste(sapply(paras, xmlValue), collapse = " ")
  paras.lower[[i]] <- tolower(paras.words.v)
}
This says for every object in 1 to the length of “both.documents” (which we determined was equivalent to 359 divisions), do the following:
Create an object called “paras” which will select all of the children of the node set “both.documents” with the tag name of “p.” On each loop, do this for one division within both.documents.
Now create another object (this time a vector), that essentially takes the content of paras (the text within all the <p> elements, stripping the nested tags) and collapses it into a vector.
Now take the vector you’ve created (all of the words from each paragraph within each division) and make the characters all lowercase.
This process may seem slightly confusing at first, especially if you are unfamiliar with what each piece is doing. If you are ever confused, you can type ?term into the console, and you will find the documentation for that specific aspect of R. So, for example, if you typed ?sapply, you’d see that sapply applies a given function over a list or vector (so essentially the same thing happens to multiple objects within a vector or list, without you needing to explicitly state what happens to each item).
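For instance, here is sapply on a toy vector (nchar counts the characters in a string):

```r
words <- c("read", "earth", "hath")
# Apply nchar to every element; sapply simplifies the result to a
# named vector instead of a list:
lengths <- sapply(words, nchar)
lengths   # read: 4, earth: 5, hath: 4
```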
Now that you have your character vector with the content of all of the paragraphs, you can start cleaning the text. The one problem is that paras.lower contains many separate strings that need to be combined into one. You can do this by using the paste() function we used in the last few lines.
holinshed.all <- paste(paras.lower, collapse=" ", sep="\n")
Now, if we ask for the length of holinshed.all, we see that it returns 1, instead of 359.
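You can see why with a toy vector: collapse fuses all of the elements into a single string, so the result has length 1.

```r
divisions <- c("first diuision text", "second diuision text")
length(divisions)                            # 2
combined <- paste(divisions, collapse = " ")
length(combined)                             # 1
combined   # "first diuision text second diuision text"
```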
Now, we are going to use the tm package that we installed at the beginning. This package can facilitate a lot of types of analysis that we won’t cover in this post. We are going to simply use it to easily remove stopwords from our texts. Stopwords are commonly-occurring words that we may not want to include in our analysis, such as “the”, “a”, “when”, etc.
To do this, you are first going to create a corpus from your holinshed.all vector:
holinshed.corpus <- Corpus(VectorSource(holinshed.all))
Now you will remove stopwords from this corpus. You can use the following code to remove all English stopwords:
holinshed.corpus <- tm_map(holinshed.corpus, removeWords, stopwords("english"))
However, with a corpus this big, R will run very slowly (it will likely take upwards of 10 minutes to remove all the stopwords from your corpus). If you want to let it run and take a break here, feel free to do so. However, if you are impatient and would prefer to continue on right now, I have a premade text corpus in my R GitHub repository, which you can use instead of following the next step.
If you do want to remove the stopwords by yourself, run the above code, grab yourself a cup of coffee, work on some other writing projects for a bit, take a nap—whatever suits you best. Once the stopwords are removed, you will see a “>” once again in your console, and you can then type in
writeCorpus(holinshed.corpus, filenames ="holinshed.txt")
This will create a file that has all of the content of the paragraphs within the <div>s with the type value of “chapter” or “section” minus the stopwords.
**Impatient people who didn’t want to wait for the stopwords to get removed can start up again here**
Now that you have a text file with all of the relevant words from Holinshed’s Chronicles (holinshed.txt), we are going to analyze the frequencies of words within the corpus.
We are going to use the scan() function to get all of the characters in the Holinshed corpus.
holinshed <- scan("holinshed.txt", what="character", sep="\n")
This line of R will create an object called “holinshed” which contains all of the character data within holinshed.txt (the corpus you just created).
You will once again need to use the “paste” function to collapse all of the lines into one (as the line of code above separated the documents on each new line).
holinshed <- paste(holinshed, collapse=" ")
Now you will split this very long line of characters at the word level:
holinshed.words <- strsplit(holinshed, "\\W")
This splits the holinshed string at every non-word character (“\\W” matches anything that isn’t a letter, digit, or underscore). If you attempt to show the first 10 items within holinshed.words (`holinshed.words[1:10]`), you will notice that it gives you a truncated version of the whole document, and then 9 NULLs. This is because strsplit converts your vector into a list, and then treats the whole document like the first item on that list. Using unlist(), we can create another character vector:
holinshed.words <- unlist(holinshed.words)
Now, if you enter `holinshed.words[1:10]`, you will see that it returns the first 10 words… but not quite. You will notice that there are a number of blank entries, which are represented by quote marks with no content. In order to remove these, we can say:
holinshed.words <- holinshed.words[which(holinshed.words!="")]
Now, if you enter holinshed.words[1:10], it will display the first 10 words:
[1] "read" "earth" "hath" "beene" "diuided" "thrée" [7] "parts" "euen" "sithens" "generall"
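The strsplit / unlist / filter sequence is easier to see end-to-end on a toy string:

```r
s <- "read, earth  hath"
words <- unlist(strsplit(s, "\\W"))
words                        # "read" "" "earth" "" "hath"
# Empty strings appear wherever two non-word characters sit side by
# side (", " or a double space); drop them the same way we did above:
words <- words[which(words != "")]
words                        # "read" "earth" "hath"
```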
In order to get the frequencies of the words within our corpus, we will need to create a table of holinshed.words. In R, this is incredibly simple:
holinshed.frequencies <- table(holinshed.words)
Now, if you enter length(holinshed.frequencies), R will return 37086. This means that there are 37,086 unique strings (words) within Holinshed’s Chronicles. However, if you look at the first ten words in this table (`holinshed.frequencies[1:10]`), you will see that they are not words at all! Instead, the table has also returned numbers. Since I don’t care about numbers (you might, but you aren’t writing this exercise, are you?), I’m going to remove all of the numbers from my table. I determined that we start getting actual alphabetic words at position 895. So all you need to do is redefine holinshed.frequencies as being from position 895 to the end of the document.
holinshed.frequencies <- holinshed.frequencies[895:37086]
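Hard-coding position 895 works for this particular table but is fragile if the corpus changes. A more general sketch (assuming you want to keep only entries containing at least one letter) filters on the table’s names with grepl; the toy table below stands in for holinshed.frequencies:

```r
freqs <- table(c("1577", "read", "read", "42", "earth"))
# Keep only entries whose name contains at least one lowercase letter:
freqs <- freqs[grepl("[a-z]", names(freqs))]
names(freqs)   # "earth" "read" -- the numeric entries are gone
```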
Now you can sort this frequency table so that the first values of the table are the most frequent words in the corpus:
holinshed.frequencies.sort <- sort(holinshed.frequencies, decreasing = TRUE)
Now enter `holinshed.frequencies.sort[1:10]` to return a list of the ten most frequently used words in our Holinshed corpus.
If you want a graphic representation of this list, you can plot the top twenty words (or 15 or 10):
plot(holinshed.frequencies.sort[1:20])
This graph should show up in the right pane of your RStudio environment (unless you have it configured in a different way), and will show you a visual representation of the raw frequencies of words within our corpus.
Try it on your own!
- We analyzed the top 20 words for the two combined volumes of Holinshed’s Chronicles, but what would our top 20 words look like if we analyzed each text individually?
- If you look closely at the XML, you will notice that our original XPath (/tei:TEI//tei:div[@type='chapter'] | /tei:TEI//tei:div[@type='section']) excludes a lot of content from the Chronicles. Specifically, it ignores any division without those type attributes. Further, using `xmlElementsByTagName` only selects the direct children of the node set, which excludes paragraphs that occur within divisions nested within chapters or sections (see, for example, `<div type="part">`, which occurs occasionally within `<div type="chapter">` in volume I). Write code that selects the contents of all paragraphs.
- Words in the top 20 list like “doo,” “haue,” and “hir” would presumably be picked up by a stopwords list, if they had been spelled like their modern English equivalents. How could you get rid of a few of these nonstandard stopwords?
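As a hint for the last exercise, here is a base-R sketch that filters a word vector directly (the tm route would instead pass extra words to removeWords alongside stopwords("english")); the spellings listed are just examples:

```r
# Custom stopword list of early modern spellings (illustrative only):
early.modern.stops <- c("doo", "haue", "hir")
words <- c("they", "doo", "haue", "hir", "booke")
# Drop any word that appears in our custom stopword list:
words <- words[!(words %in% early.modern.stops)]
words   # "they" "booke"
```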
Check back to my eebo_r GitHub page for additional R exercises and tutorials using the EEBO-TCP corpus! And if you have any questions about this post or want to learn more about R, schedule a consultation with me.
Notes
* I specifically don’t say that you are looking for all the paragraphs within these divisions, because the code we are about to use only selects children, not descendants. Understanding the difference between these requires some knowledge of XPath and the structure of XML documents.
Invisible Work, Fungible Labor
With the approaching Symposium on Invisible Work in the Digital Humanities, I’ve been thinking increasingly about my transition from graduate work in a “traditional academic department” to working in a library. As a graduate student, I was aware that my work was rendered invisible because it was often not treated as work at all. Indeed, until very recently, graduate assistants at private universities were not treated as employees. And graduate students are often ineligible to become PIs on grants or to receive other opportunities that would allow them to advance in the field. Central to the idea that graduate students don’t “do real work” is the notion that their labor and research is somehow secondary to or derivative of the “real work” done by faculty. Even in the digital humanities, graduate labor is figured as research assistantships, project management positions, and coordination.
The issue of “centrality” in a research project (especially a funded research project in which there are “principal investigators”) is a problem for DH researchers in libraries as well as for graduate students. As a recent article in Digital Humanities Quarterly entitled “Student Labour and Training” points out, graduate student research outputs often come in less academically viable formats (like blog posts and social media). The authors note that students’ “lack of involvement in the dissemination of project outcomes […] prevents both students and the academic field as a whole from seeing student research as tantamount to faculty research.” Arguably, the traditional outputs of conference papers and single- or co-authored publications allow students more room to diverge from the PI’s stated goals for the project. The idea that students could be writing and generating scholarly products that expand upon, rather than simply feed into, a faculty member’s stated goals is somewhat jarring in the current academic landscape. To many, graduate students are apprentices rather than budding practitioners in their own right.
As I moved into the realm of practitioner (in the sense that I was considered a valid employee under the FLSA and NLRA), I began to realize that, while some issues of labor disappeared, the issue of centrality to research remained. I have had the good fortune to work in a library that is open to exploring digital scholarship, and has indeed encouraged my efforts in the digital humanities. Yet there is a still-persistent underlying question about the utility of some of the work I have done: “How are you serving the existing needs of the scholarly community?” Often, especially when new initiatives have been proposed, the immediate question has been “Have you done a climate survey?” or “What are the preexisting needs of the campus community?” My reaction to this sentiment has been similar to Dot Porter’s reaction to the OCLC report “Does Every Research Library Need a Digital Humanities Center?”:
It is galling for these professionals to be told, as they are in the OCLC report, that “the best decision is to observe what the DH academics are already doing and then set out to address gaps,” and “What are the DH research practices at your institution, and what is an appropriate role for the library? What are the needs and desires of scholars, and which might your library address?” and especially “DH researchers don’t expect librarians to know everything about DH, and librarians should not presume to know best [my italics].” What if the librarians are the DH researchers? What if we do, in fact, know best? Not because we are brilliant, and not because we are presumptuous, but because we have been digital humanists for a while ourselves so we know what it entails?
I understand the impulse among librarians to take their cues from researchers in more “traditional” academic departments, especially considering that library and information science is considered a social science, where climate surveys, environmental scans, and other such methodologies are common. However, in the context of digital humanities, librarianship and information science as disciplines have greatly influenced the types of intellectual work being done in the field. To artificially remove this influence from the equation is a disservice both to librarians and to potential collaborators.
Part of this problem comes back to the issue of “centrality” I mentioned with graduate work. Acting as if the library’s (or a librarian’s) goals should be derived from the goals of faculty limits the potential impact of scholarship from librarians, either by limiting the media or venue through which it can be disseminated or by limiting the findings it is allowed to make. And it’s not just the idea that librarians should be in service to faculty; it’s the idea that libraries (as organizations) generate priorities based on faculty priorities, which then filter seamlessly down to the librarians doing on-the-ground work. When talking about the complexities of librarians’ work (or service), Trevor Muñoz points out the significance of the venue of publication for the first major special issue on digital humanities librarianship: “Attending critically to this context means noting that this very welcome special issue on digital humanities and libraries was published in a journal devoted to library administration” (emphasis in original). I would also point out the significance of framing digital humanities as, primarily, a discussion for library administrators. It is that, in part. But the framing also contributes to the idea of DH in libraries as a top-down issue, rather than one that is pursued in exploratory ways by librarians and feeds up into wider library (and, yes, university) goals.
Even the promotional materials for the Invisible Work Symposium betray some of the underlying sentiment about the role that libraries play in the wider university community. From the announcement:
Imagine, for example, a typical project between a professor of history and a university digital scholarship center. Is the digital scholarship center simply providing a service, or are they considered an equal partner in the work? […] Similarly, the digital scholarship center might be thinking about recycling the resulting code for use in other projects, contributing to broader digital scholarly efforts, and so on.
In this scenario, the labor of the “digital scholarship center” is always collectivized and always working with the intention of feeding into broader efforts. The assumption is that there is always one mission for a group of library staff, and that this mission is univalent and universally agreed upon. I think this view minimizes the role that individual librarians actually play in research projects. This is not to say that libraries don’t have unified (and often stated) goals. Libraries frequently use strategic initiatives to promote specific areas, focus collection development and digitization around specific subjects, and play to the strengths of their employees and the wider university community. However, I’d posit that this is no different from how departments look for candidates in key areas or conduct cluster hires for faculty positions.
I think the main problem is that flattening the various perspectives and individual research interests of librarians exacerbates perceptions of library staff as “in service.” By acting as if librarians prioritize research solely on the basis of administrative-level or department-wide mandates, we are essentially saying that the work of librarians is fungible: “Anyone who can do this prescribed work in a procedural manner is qualified to do this job.” In treating the laborers who build and sustain infrastructure, design metadata schemas, and preserve and provide access to research as essentially fungible, we are treating library spaces as neutral and failing to acknowledge the rhetorical and political impact of universities as sites of knowledge production. Pushing back against this notion is especially critical at a time when administrators see libraries as primarily empty student space, and when outsiders ask “Why do you need libraries/librarians when you have Google?”
Since so many of the methods from the digital humanities are the intellectual descendants of research done in library and information science, it makes sense that librarians would own their intellectual contributions to DH work. In order to give librarians the institutional power to assert their ownership of their research, it is essential for us to acknowledge that library employees’ research agendas are not simply derivative of wider library goals (generated in some sort of nondescript aether of environmental scans). Rather, the opposite is the case: the research interests of individual employees are essential to shaping the type of work that is done at an institutional level.
Getting Started: Four Tools to LEAD Your Research
Digital research and scholarship is an exciting, fast-developing field, and there are just as many new and exciting tools to choose from. LEAD (Locate, Enhance, Aggregate and Demonstrate) your research to success by using the platforms outlined below! Continue reading Getting Started: Four Tools to LEAD Your Research
Discover DH: An Introduction to Digital Humanities Theories and Methods
For budding digital humanists, it can often be difficult to know what you need to learn. On top of writing for courses, exams, presentations, and learning the traditional work of your field, you now need to learn a series of unfamiliar methods and terms (many of them opaque acronyms: RDF, TEI, JSON). Even knowing where to ask for help is a challenge, since DH resources are frequently scattered across campus.
If you’re attuned to channels of communication in the digital humanities, you’ve probably seen a lot of learning opportunities this summer: DHSI in Victoria, HILT in Indiana, the DH conference (in Kraków this year). All of these are excellent places to immerse yourself in the field of digital humanities and to learn about the great work current scholars in the field are doing. There’s only one problem: these conferences and training events are prohibitively expensive. Even with scholarships and waived tuition, it can be very difficult to get yourself across the country (or the globe!) to learn about DH, especially if you’re in school.
This is why the Office of Digital Research and Scholarship is offering a 10-week workshop series on topics in the digital humanities. These classes are designed with busy students and scholars in mind. We will be offering two sessions of each weekly workshop, with one session in Strozier Library and another in a different building on campus. The workshops are divided into “hack” and “yack” sessions: hands-on sessions focused on learning a new tool or DH skill, and discussion-based sessions, respectively.
We’ll be offering sessions on the following topics:
- Getting Started in the Digital Humanities
- Markdown and GitHub
- Managing Digital Projects
- Text Analysis and Visualization
- Copyright and Digital Projects
- Introduction to Text Encoding
- Digital Tools in the Classroom
- Network Visualization
- Mapping
- Publishing in the Digital Humanities
More details about the individual sessions and scheduling are at the Digital Research and Scholarship website. You can also register for individual workshops on our calendar.
Come join us in exploring this exciting new area!
Anno Discipuli: A Digital Scholarship Internship in Review
*Micah’s Note: This is a guest post from Matt Hunter, who interned with me this past academic year. Matt’s enthusiasm and knowledge were an incredible asset to DRS this year, as we established our office, hosted events, and kicked off our program of digital humanities support. His time and efforts provided great momentum, and we’ll be the lesser without him. Good luck Matt!
Over the past academic year, my last in the MLIS program at FSU’s iSchool, I worked as an intern in the Office of Digital Research and Scholarship (DRS). I joined the DRS in the summer of 2015 after connecting with classmate Camille Thomas, who worked under Micah at the time. Camille spoke of some cool vaguely-digital projects underway down in the Stygian depths of Strozier Library’s basement, and I asked if I couldn’t join in on the fun.
My introduction to digital work in libraries was a weekly “R&D” group meeting where some nascent DRS staff brainstormed media-rich mapping projects to build a portfolio of tools and services to support. Although the project ended up falling through a couple weeks later, it was still cool to play around with HistoryPin and the library’s digital collections. Looking back now, that brainstorm session couldn’t have been more ideal; we covered a variety of digital scholarship topics in one meeting – content development, tool analysis and selection, copyright considerations, to name just a few – and even had to deal with what happens when a project fails! After two semesters working on projects, reviewing tools, writing proposals, and learning more about modern librarianship than I could have imagined, I still think back to that first discussion as a distinct turning point in my library career.
As a new office in a new area of librarianship, DRS is somewhat unique and unfettered by tradition and long-established departmental rules or guidelines. This freedom allowed me to blaze my own path in projects, and to be part of a creative support network spreading across campus. I jumped onto as many projects as I feasibly could, and covered a pretty wide spread. Over the year, I worked on things ranging from metadata entry, to drafting a digital humanities strategic plan for FSU Libraries, to digital publishing. Here are some of the highlights.
Internship, Fall 2015:
This semester was dedicated to learning how librarians work within the university research process, and to understanding the state of digital scholarship as a discipline.
- Il Secolo metadata – The very first week working for Micah and DRS, I was assigned some “proper intern work” – cleaning up metadata records for a 19th-Century Italian newspaper housed in the FSU Digital Library. This was my first introduction to digital library systems, and despite Micah’s efforts to relegate me to Data Entry Intern, I used the experience to explore the guts of Islandora.
- Digital Bibliography project – This started out as a simple faculty request for the management and display of a bibliography online. After researching possibilities, I developed a (semi-)functional prototype, which can be viewed here. The faculty member liked it enough to pitch it to DRS’s upcoming Project Enhancement Network and Incubator (PEN and Inc.) program for full development support!
- Digital Scholarship Symposium!
- I learned the ropes of Voyant, a text-analysis tool, and assisted Abby Scheel, Humanities Librarian, with her workshop. We played with different visualizations from the North American Slave Narratives.
- I was also introduced to Tableau and CartoDB, providing cool entry points into mapping that I now have in my quiver of tools.
- I took the initiative to develop a Digital Humanities Strategic Plan, outlining how FSU Libraries might develop services and infrastructure to support non-STEM researchers doing digital scholarship. After interviewing a handful of faculty who were already heavily involved with digital humanities research at FSU about what they would like to see, I drafted and presented this plan, which is under consideration and has been (I’m told) used as the framework for future support plans! This was a huge experience for me.
- Acronym/abbreviation research: I had briefly heard of some of the alphabet soup of topics DRS worked with on a daily basis, but going from recognizing OA, TEI, DataViz, GIS, CC, Digischol, DH, TaDiRAH, DiRT, R, and all the names of tools and processes that are thrown around, to being able to talk about them knowledgeably, took a goodly chunk of my first few months. There is a serious reliance on jargon in the office, and it took a while to really be able to keep up! Micah also likes to coin business-y sounding new buzzwords for fun, which is weird – “coordinize”? “re:mergence”? I fought him on this a few times. It was a good time.
- Scalar, Zotero, Slack – Among the dozens of other tools I was exposed to, I spent the most time with these three: Scalar and Zotero because of the bibliography project I was working on, and Slack as a great way to communicate with the team outside of my in-person hours, which really helped me feel connected to and a part of the office. Being able to send emoji in “work” correspondence was also pretty cool.
- Signed on for Internship Part 2! As December rapidly approached, I realized I wasn’t where I wanted to be with the completion of some of the projects I had spearheaded, and I asked the team if they wouldn’t mind if I stuck around for just a little bit longer so I could wrap things up as best I could. This set the stage for Internship, Part Deux.
Internship, Part 2: The Internship Strikes Back
- Percolator – When Sarah Stanley joined the team in November, she brought a ton of cool ideas to the table, one of which was having an open space and time for anyone to come discuss digital projects. I loved the idea because it put the library center stage as a valuable part of the scholarship process (something that I’ve come to feel very strongly about being a Good Thing To Do). We started hosting the Percolator every Wednesday 3p-5p, and this is where I spent most of my intern time. These meetings were absolutely amazing introductions to new ideas and tools. For example, I:
- learned how to use TEI to encode a 19th Century cookbook with elements that could be referenced and queried in corpus searches;
- looked at ArcGIS projects to see socioeconomic divides in food availability between greengrocers and fast-food shops.
- Publish or Perish: Conversations on Academic Publishing and the Institute on Copyright in Higher Education – in February, I took a few days off from my real job to attend these fantastic events hosted by the library on the topics of publishing and copyright. I can’t say nearly enough about how amazing the speakers were or how much I learned from taking part in the conversations that happened those two days, but I am incredibly grateful I got the chance to attend and learn so much.
- Proposed and presented at THATCamp Florida! – This was one of the cooler experiences I had in my internship year, for a bunch of different reasons. The biggest is that this “un-conference” was really the first time I felt like I knew enough about digital scholarship to talk knowledgeably about it without feeling like I was faking it. My proposed session was supposed to be a conversation about funding, research, and libraries, but it ended up with me just talking about how things work at FSU (as I understand them). This was all well and good, and still a fantastic experience, but my main takeaways from this conference came from outside my own session.
- Proposed (and was accepted to present) a paper at the 2016 Keystone Digital Humanities conference with Sarah! – Because my traditional academic field (Classics/Latin) values papers and conferences so highly, I feel like this acceptance (and the super awesome paper we’re going to present) is the encapsulation of my accomplishments in the internship. Talking about funding and digital humanities at a conference with notable and highly respected players in the field is beyond anything I could have expected when I joined the team last summer, and I think it’s rather fitting that it’ll be a year after my original R&D meeting. I am beyond excited about this opportunity, and I can’t wait until June!