a sign of the times in academic publishing

Catching up on end-of-the-year email, I came across the following notice in the UC Berkeley Department of History’s Fall 2012 newsletter (pdf):

Geoffrey Koziol’s new book was published by Brepols: The Politics of Memory and Identity in Carolingian Royal Diplomas (2012). Thanks to subventions from the History Department and UC Berkeley’s Committee on Research, the price is a moderate $100, which may seem like a lot, but European academic presses are increasingly pricing books at $200, beyond the ability of even mid-sized college libraries to afford them. It is becoming very difficult to publish innovative scholarship of any length and complexity. Flexible sources of funding are sorely needed.

Incidentally, although I eventually found myself specializing in American history, Koziol’s undergraduate survey course on medieval Europe played a big part in my decision to major in history. You can find out more about his book here.

what would it take for historians to be able to share archival material?

Recently, a friend of mine asked if I had any thoughts on why historians tend not to do much sharing of archival materials – that is, of materials that they’ve collected in the course of their research. I said I didn’t really know why, but I could speculate, and since speculation is one of the reasons blogs exist, I thought it would be worth writing up a post about it. The conversation also got me thinking in a more positive direction: let’s say historians do start sharing more archival material, what forms could that sharing take? What kind of infrastructure would they need? Is it something we could start building now?

But first, what do we mean by sharing archival material? Let’s say you’re a historian and you’re on a research trip. You request material and some of it turns out to be relevant to your research, some not so much. (And some of it is just too interesting to pass up.) You take notes, maybe even make some full transcriptions, but there are almost always going to be some materials that you decide you want to copy. Maybe you want to be able to see just how the document was laid out, maybe you want exact wording but don’t have time to transcribe it, or maybe you simply don’t have enough time to read the documents during your visit, but you can take lots of photographs quickly. Whatever the reason, odds are you’re going to come home and find yourself with lots of copies of archival material from the trip. This is the kind of material we were talking about sharing.

A second preliminary point: historians do share. Maybe not everyone, maybe not all the time, and almost certainly not everything, but I don’t want to give the impression that historians solely collect and hoard documents and then guard their hoards. However, I think much of the sharing that goes on stops short of sharing actual (copies of) material. You’ll see historians talk to each other about what they’ve found; give each other advice about what to expect when working at a particular place or on a particular collection; or even publish articles in historical journals discussing where to find sources for various topics or, conversely, what kind of topics could be researched using particular collections. All of this certainly counts as sharing, but it is sharing of information about archival material, and it may not extend to the material itself. That said, there is still a tradition of formally publishing selected primary sources, whether in journals or as edited book collections. These publications may include both archival material (in the sense that archivists give to the word “archives”) and previously published material.

I am deep in the realms of speculation here, but I suspect that when historians do share archival material – outside of formal publication – it tends to be stuff they are not actively using. This could be stuff they’re done with, or it could be “incidental finds”: stuff they’ve collected that turns out not to fit in with their research, but which they know may be relevant to another researcher (“I was looking through the papers of so-and-so and came across these letters, thought you’d be interested so I’m passing them along”). Sharing those kinds of finds is, not so incidentally, one of the reasons I went into the archives/library fields: I love playing matchmaker between sources and researchers.

These kinds of sharing – whether of information, materials, or published research – show the scholarly community at its best, so why don’t more historians do more sharing of archival materials (assuming that it is accurate to say that many don’t)?

Here are my guesses:

1. It hasn’t become standard practice, so it’s not something that occurs to everyone while they’re doing research. That may be a tautological explanation, but I really think this is something that could be self-reinforcing: if more historians were already sharing material, then you’d probably see more sharing. There’d be more models for it.

2. Worries about being “scooped.” Releasing their raw materials, so to speak, might make it possible for someone else to use the material they collected and then publish first. Depending on context, this might be a real concern, but in other cases the two historians might end up taking very different interpretative approaches: priority in publishing isn’t quite as important in history as in some other fields. Also, this shouldn’t really be a big concern once the historian who collected the material has published.

3. This is closely related to point 2: historians still generally get the most credit for traditional publications. This seems to be changing, but the incentives have long been weighted towards publishing and disseminating finished research, rather than the materials on which research could be based.

4. A “do your own research” ethic. Maybe I’m being uncharitable here, but I think many people who are more than willing to talk about material they’ve found could still be reluctant to share the copies they’ve made themselves, especially if it took a lot of time, effort, and money to collect them. I suspect people are more willing to share when they’ve built up trust with their colleagues and when there’s some reciprocity involved. This also ties in to the point about credit and incentives.

5. Permissions/rights. In my experience researching the 19th and 20th century US, it’s pretty uncommon to come across truly unrestricted archival materials. In the days when I primarily requested photocopies, the vast majority of those copies arrived with stamps on them saying that they were for personal research use only and that further permission would be required if I wanted to use them for any other purpose. Even when taking digital photos myself, there’s usually an agreement somewhere that puts similar restrictions on those images. Furthermore, copyright in unpublished materials can be a really complicated area, especially if it’s not something you’ve been trained to navigate. The physical owner of a letter, for instance, might not have the right to publish that letter, much less grant permission to do so to someone else.

6. Lack of infrastructure. Let’s say you have material to share and you have the right to share it (or are just willing to take risks): how are you going to do that? You could e-mail a few files or send out paper copies in an envelope – if there even is a paper form to your records – but what if you have a hundred or more files/images/pages? And how are you going to handle the descriptive context and content that goes with the material? You usually need citation and location information, at the very least, if you’re going to authenticate the materials as being legitimate copies of the originals. You should have this information, if your intent was to collect things in a way that would make it possible to cite them later, but it’s still something to watch out for.

I think that last point is really key: once you’ve gotten past all of the other objections, there’s still the problem of coming up with an effective way to share material that the average historian could actually carry out without too much trouble. Not everyone has the time/background/resources to just go out and build their own digital repository/collection/archive (I’m sidestepping the terminology question here).

What are the possibilities? I can think of a few:

1. Personal networks. I guess you could call this peer-to-peer sharing, if you like putting everything into technology terms. This is basically scholars sharing material with each other at an individual level. This can be done through the mail or in person – I assume that for most of the history of history, when scholars shared material this is how they did it – or through e-mail or other file transfer methods.

Advantages: It’s pretty simple and doesn’t really require historians to do anything they don’t already know how to do, unless they’re trying some complicated file transfer method. It also happens to be a method that historians are already using.

Challenges: It’s not public, for one thing. So it’s not quite open sharing. (For some I’m sure that’s a feature, not a bug.) It also might not scale very well as the volume of material that gets transferred grows. Plus there’s a potential problem of losing track of essential metadata when sending around batches of image files: you have to be careful not to end up with directories full of filenames like DSCG1128 with no clear indication of what archives and what collections those files are supposed to be linked to. That last issue is something everyone has to face when managing image and text collections, but coordinating among many different individuals is likely to be more difficult than coordinating among institutions or groups.
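To make the metadata problem a little more concrete, here is a minimal sketch of one way to keep citation information attached to a batch of photos: a small manifest file that travels with the images. Everything in it is an assumption for illustration (the field names, the `SourceRecord` class, the helper functions, the example archive); it is not an established standard or anyone’s actual workflow.

```python
import csv
from dataclasses import dataclass, asdict, fields


@dataclass
class SourceRecord:
    """One row of a hypothetical manifest: which original a photo copies."""
    filename: str    # e.g. the camera's name for the image
    archive: str     # holding institution
    collection: str  # collection or record group
    location: str    # box/folder or other call number


def write_manifest(records, out):
    """Write the records as CSV to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(SourceRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def undocumented(filenames, records):
    """Return the filenames that have no manifest entry, sorted."""
    known = {record.filename for record in records}
    return sorted(name for name in filenames if name not in known)
```

Before sending a batch to a colleague, something like `undocumented(list_of_image_names, records)` would flag any photo that is about to leave without its citation information.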

2. Historian-hosted websites: Historians could set up their own websites to host the material they want to share.

Advantages: This could be open to any visitor, though of course the site owner could also employ password protection. It also would maintain the connection between the historian who collected the material and the material. If the historian were to change affiliation, as often happens in the academic world, the site could “move” with them fairly easily (in the sense of being updated to reflect the new affiliation).

Challenges: It requires historians to know how to host a site and manage an image and/or text collection, or at least to have access to someone with that knowledge. (Note: I’m not saying these are bad skills to have, just that you can’t assume many historians have them right now.) This actually might not be too difficult, depending on the platform being used. I didn’t have to know much about the internal workings of WordPress to be able to set up this blog, but finding a pre-packaged archival system that’s easy for a regular user to set up and maintain is a bit trickier. WordPress is comparatively simple.

Also, this could lead to material being distributed across dozens of personal sites, which could make it difficult to find things. As with option 1, coordinating among lots of individuals can be difficult. And what if two or more historians have materials they copied out of the same collection? Ideally, that would get linked up.
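As a rough illustration of what “linked up” could mean, the sketch below groups copies from several contributors by the archive and collection they came from, then reports the collections that more than one person has copied from. The input shape, the function names, and the matching on lowercased strings are all assumptions made for the example; real matching across sites would need something sturdier than normalized names.

```python
from collections import defaultdict


def group_by_collection(manifests):
    """Index copies by (archive, collection).

    manifests: dict mapping a contributor's name to a list of
    (archive, collection, filename) tuples (a hypothetical format).
    """
    index = defaultdict(list)
    for contributor, rows in manifests.items():
        for archive, collection, filename in rows:
            # Naive normalization so trivially different spellings still match.
            key = (archive.strip().lower(), collection.strip().lower())
            index[key].append((contributor, filename))
    return dict(index)


def shared_collections(index):
    """Keep only collections copied by more than one contributor."""
    return {
        key: entries
        for key, entries in index.items()
        if len({contributor for contributor, _ in entries}) > 1
    }
```

Even a simple index like this depends on everyone recording the archive and collection names in the first place, which is the same metadata discipline that option 1 calls for.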

3. Institutional hosting based on the researcher’s affiliation: The researcher’s home institution supports and hosts the materials.

Advantages: As in the historian-hosted model, in this model the materials could be placed on the open web. Ideally, institutional support would mean that the institution’s archivists, librarians and IT staff would all collaborate, reducing the burden on any one individual. Institutions might be able to work the archival materials into existing infrastructure, such as a digital repository if they have one up and running.

Challenges: As mentioned above, academics often change affiliation. What happens to the material then? Does it become part of the institution’s holdings or will it be transferred? Or will one copy go with the historian and one stay with the institution? And will the new institution want to host material that’s been/being hosted elsewhere?

Another issue that could come up is the difference between records that the historian produces – such as notes, drafts, teaching materials, and other personal papers – and those that the historian collects – such as archival and other source material, much of which will be copies of materials held at other institutions. The historian’s home institution might be very interested in keeping the (or “their”) historian’s personal papers while at the same time being reluctant to keep copies of source materials taken from elsewhere.

There would also still be a need for coordination to make it possible for researchers to search across different institutions’ holdings. This is essentially the same problem the historian-hosted model would face, but at least there would be fewer institutional sites and many institutions already have a history of sharing metadata.

One additional note: the Valley of the Shadow project, which I think has been both successful and influential, might fit this model. William G. Thomas III and Edward Ayers have since moved to other institutions, but the site remains at the University of Virginia.

4. Archival institution hosting: in this model, the institution that holds the original also makes the digital copy available.

Advantages: lots of archives already have ongoing digitization projects. As holders of the originals, they are in the best position to authenticate the material they put on the web. They are also in the best position to maintain the links between individual items and their archival context – that is, where the items fit in within the larger context of the collection and perhaps of the institution as a whole. Duplication of copies among researchers shouldn’t be a problem, as the originals (or maybe we should call them “original copies”?) will be available at the archives’ site.

Challenges: Historians’ choices of what to copy and archivists’ choices of what to copy are likely to diverge quite often. Historians are probably most interested in individual items or ranges of material within collections. This can make perfect sense in the context of a research program, but to an outside observer it might look rather haphazard and partial and may not make the best focus for a digitization project. Archivists have to be concerned with their own institutional priorities and in many archives historians may not even be the primary users. That said, there are surely many opportunities to collaborate on projects and I’m sure that historians will find many archives’ own digitization projects useful for their research.

As for the kind of sharing I’ve been talking about in this post, there are some archives that employ “scan-on-demand” policies in which material is scanned as it’s requested. I don’t know how many of these scans get posted to the open web – in some cases, the scanning simply makes it less costly to produce additional copies in the future – but it could be one way to facilitate sharing among historians. I think some archives are also experimenting with programs where historians can take digital photographs in the course of their own research and then have the option of giving the archives a copy of those photos (or some subset of them) to then be put on the web. But I’m not sure if that’s actually happening, or I’ve just read about it as a proposal.

5. Some other kind of consortial or centralized hosting. Could this be something like an arXiv for collected archival material? In theory, it would be possible to create something like that, but getting it off the ground could be difficult, as it would have to find a home somewhere. Maybe this is a possibility that the Digital Public Library of America could look into. Many public libraries have history rooms, after all.

Those are the five main models I could come up with off the top of my head. I think you can probably find actual examples of the first four, although I’m not sure I’ve come across a personal website hosting copies of archival material. My takeaways from this exercise:

  1. Outside of mailing packages from place to place, we’re really talking about digitized or digital materials here and the web remains the most open way to share them.
  2. To make this kind of sharing more open and more routine, historians need to have relatively accessible ways to transfer their material into a system for sharing.
  3. In the near future, I think we’ll see the first and third model most often. That is, historians will continue to share with colleagues and peers at an individual level, while larger-scale sharing will come mostly in the form of projects. Projects make it possible to pool resources and seem to align best with scholarly incentives.
  4. As for model 4, I think archives-driven projects will continue to be much more common within archives than historian-driven projects, for obvious reasons. However, the boundaries between models 3 and 4 are pretty artificial, as archives and research institutions already do a lot of collaboration – not to mention the fact that many universities have archives and special collections on campus. So in some ways the boundaries are an artifact of the way I’ve set up the post.
  5. There need to be ways to share and combine metadata so that people can search and browse across sites and collections. This is already true and people are working on it.
  6. There’s no escaping permissions and rights questions. Another point already in effect.
  7. A lot of what I’ve written here applies to any type of researcher who uses archives. I’ve focused on historians because that’s the context of my original conversation but I don’t mean to exclude other researcher groups from the larger discussion.
  8. There are a lot of issues I haven’t even gotten into, such as bulk access to archival material.
  9. Sometimes you just have to see the originals for yourself. Nothing wrong with that, especially if you like visiting archives and you’ve got time and support.