Tag Archives: electronic records management

Awkward Adventures in Digital Forensics

So, this happened at work yesterday:

Awkward Seal meets Digital Forensics

Yep, that happened.

I should probably back up:

Libraries and archives have long been familiar with all manner of ways to handle, preserve, provide access to, and generally “deal with” paper- (and film-) based materials (letters, diaries, newspapers, photographs, microfilm, etc.). You know, the stuff you can hold IN YOUR HANDS and see what it is. To a reasonable extent, that even goes for stuff you can’t identify just by looking at it (audio/video tapes?).

And then there’s all this “new” digital stuff. I say “new” in quotation marks because, hey, it’s really not THAT new. But it’s a lot newer than, say, paper, and it’s new enough that for many years, archivists have been sort of…shall we just say, not dealing with it quite to the extent that one might have hoped?

Digital stuff — floppy disks, CDs, DVDs, USB flash drives, hard drives, etc. (not to mention your online life, like webmail and social media) — actually takes a lot more coddling than the paper stuff. Did you ever go up to your grandmother’s attic or your father’s garage and stumble onto a box of neat paper stuff from like 50+ years ago? And you rummaged through it, awed by all the neat things you either never saw before or had completely forgotten about?  Who hasn’t done that, right?

Well, if in 50 years, you stumble onto a box of today’s records, you might be out of luck because there’s a good chance those records will be stored on some type of digital media. Yep, imagine you just found a box of CDs, or better yet floppy disks. Imagine a box of floppy disks in 50 years. You have enough trouble finding the drive you need to read those NOW, am I right?

USB floppy disk drives are about $15 on Amazon – if you have floppies, get one and start your migration now, while you still can!

OK, so digital media present a variety of challenges to archivists. They’re actually pretty fragile (keep away from light, heat, and in some cases magnets); they depend on technology/hardware to be read (not just your eyes or a magnifying glass); and they can’t survive by accident like a box of papers could. And those are just some of the problems of keeping the data “alive.” Not to mention figuring out how to arrange and describe the files or to provide access to them.

(Here’s a tip: Writing the equivalent of “oh there’s also 1 floppy disk” somewhere in your finding aid probably isn’t going to be super helpful. What’s on it? Do you even know? Can you trust the label, if there even is one? And if it’s on floppy disk, how are you going to let patrons use it? Do you have a floppy disk drive available? And how are you going to make sure that nobody accidentally overwrites the data? Oh, and what if the floppy disk spontaneously stops working at some point, or already has? Who hasn’t experienced that? And no comments from those of you too young to even remember floppy disks! Man, those transparent neon ones were the worst for failing at inopportune times, probably due to light damage, I know now!)

OK so there are all these…problems. And a lot of archives have been sort of sweeping this problem under the rug for a while now. Well, the research about how to deal with these problems seems to have been growing rather exponentially over the past several years, and so a lot of us are finally getting our digital act together and attempting to figure out what to do…including the archives where I work.

My co-worker Toni (as the preservation archivist) and I (as the digital initiatives archivist) have been charged with learning how to handle our collections’ digital preservation needs. We’ve been attending “digital preservation” and “electronic records” workshops (SAA’s Digital Forensics for Archivists 2-day workshop was fantastic); reading up on all sorts of things (I highly recommend OCLC’s Demystifying Born Digital Reports as a starting point for anyone interested in this topic; they’re simple & to the point, but great); and downloading & experimenting (on test data sets/disks only) with free & trial software (such as FTK Imager). We have learned about using write-blockers and creating disk images to capture the entire contents of a piece of media without inadvertently changing it or missing anything.
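A side note for the code-inclined: one common way to check that a disk image hasn’t changed since it was made is to record a checksum (a fingerprint computed from every byte) and recompute it later. Here’s a minimal sketch in Python, assuming the image is just an ordinary file. The function names are made up for illustration, and this is not taken from FTK Imager or from our own workflow.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    # Open read-only in binary mode so the image itself is never modified
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """True if the file still matches the digest recorded at imaging time."""
    return sha256_of_file(path) == recorded_digest
```

The idea would be to call sha256_of_file on the image (say, a hypothetical floppy.img) right after creating it, write the result down somewhere safe, and run is_unchanged whenever you want reassurance. If even one byte differs, the digest won’t match.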

Which brings us to what happened yesterday—and another lesson in digital stuff (and this lesson is for everyone, not just archivists).

So we were experimenting with FTK Imager yesterday afternoon, and we popped in a floppy disk I had brought from home. It had a blank adhesive label on it (on which I later wrote my name once I discovered the contents), and we had used Windows Explorer to drag/drop two boring Microsoft Office documents onto it so we were sure there would be something to image.

Here’s what the contents of that floppy disk looked like to Microsoft Windows (2 files):

Floppy disk contents viewed in Windows Explorer

Then, we used FTK Imager to create a disk image, capturing ALLLLLLLL of the contents of that disk, including remnants of any deleted files that were never overwritten. That’s right, I said deleted files.

So when we looked at the disk contents in FTK Imager, here’s what we saw (and that’s about the time my jaw dropped and I started with the nervous “omigod-blast-from-the-past-in-a-bad-way” laughter as Toni looked over my shoulder probably wondering if I had gone mad):

Floppy disk contents viewed in FTK Imager

Um yeah, that’s more than the 2 files I was expecting. Apparently, this was a disk that I DID use…in 2002…and still had lying around. I recognized (and was immediately mortified by the presence of) a diary entry from an ex-boyfriend, and I wasn’t thrilled about what those chat logs from AOL Instant Messenger (hey, remember that?) might contain, either. I also recognized other, innocuous MS Office documents: Excel files containing lists of all my classes & grades, Word documents with translations for Latin class (such as the copy of Tacitus’s Annales you can see selected in the image; notice that you can see the hex as well as the text in the window underneath), and other things that looked like school stuff. (We actually exported and opened some of the files I deemed definitely-not-embarrassing. Oh, and I have since, in the privacy of my own home, looked at that diary entry and the chat logs. All totally harmless, but who doesn’t have things from sophomore year of college that they’d rather not revisit in front of co-workers?)

We were able to learn some things during this experiment, some of which actually pertained to what we were trying to do, but the most salient of these lessons (for me at least) was this:

The IT folks are not just making things up when they tell you that your files are not really gone simply because you hit delete and you cannot “see” them in your operating system anymore. The data is still there unless it is overwritten.

All you did was delete the pointer to that data, cluing your drive in that it can reuse that space if it wants to. If you tear the index pages out of the back of a book, does the content of the book cease to exist? Nope. Sort of like that. If you are interested in a technical explanation of what’s going on when you delete files and why they’re not really gone, I highly recommend this blog post: How-To Geek Explains: Why Deleted Files Can Be Recovered and How You Can Prevent It.
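If you like seeing this sort of thing spelled out, here is a toy model of the idea in Python. It is only a sketch: the ToyDisk class and the file names are invented for illustration, and real file systems are far more complicated. The point is just that “delete” removes the pointer from the index, while the data sits in its block until something else gets written there.

```python
class ToyDisk:
    """A toy disk: raw blocks plus a directory index that points at them."""

    def __init__(self, num_blocks=8):
        self.blocks = [b""] * num_blocks  # the raw storage
        self.index = {}                   # file name -> block number

    def write(self, name, data):
        # Use the first block that no directory entry points at
        used = set(self.index.values())
        free = next(i for i in range(len(self.blocks)) if i not in used)
        self.blocks[free] = data
        self.index[name] = free

    def delete(self, name):
        # Only the pointer goes away; the block's contents are untouched
        del self.index[name]

    def listing(self):
        # What the operating system shows you
        return sorted(self.index)

    def raw_scan(self):
        # What a forensic tool sees: every non-empty block, indexed or not
        return [b for b in self.blocks if b]


disk = ToyDisk()
disk.write("diary.doc", b"dear diary")
disk.write("grades.xls", b"A, A-, B+")
disk.delete("diary.doc")
print(disk.listing())    # ['grades.xls']
print(disk.raw_scan())   # [b'dear diary', b'A, A-, B+']

disk.write("new.doc", b"meeting notes")  # reuses the freed block
print(disk.raw_scan())   # [b'meeting notes', b'A, A-, B+']
```

After the delete, the listing shows one file but the raw scan still turns up the diary. Only when the new file lands in the freed block is the old data actually gone.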

But the bottom line is that when you delete a file, it’s not really gone. I knew this. I KNEW this. But knowing it on the level of “I read it in a book and I’ve heard knowledgeable people say it also,” and knowing it on the level of “omigod I just saw the proof” are not the same. (This must be why they make you do lab experiments in chem class…)

And omigod I just saw the proof. And that was WAY. TOO. EASY.

So. HTG (How-To Geek) suggests some ways to actually truly erase data if/when you need to. But personally, if I had something I wanted to never see the light of…well, a screen…again EVER, then I would only be satisfied with the physical destruction of the media (better copy anything you actually DO want onto a new drive first though). So, to conclude, for your viewing enjoyment, here are some YouTube videos of people physically destroying data on:

…hard drives (you’re going to need a hammer to bust up the platters inside)…

…floppy disks (some of the videos just crinkled them but I wouldn’t trust anything that doesn’t involve cutting up that magnetic disk)…

…and CDs (oh there are tons for this one—who hasn’t tried the microwave one? the melting one is fun—and of course there’s always just breaking it—but one guy even claims to have 101 ways)…

OK, that’s enough fun for now. Hopefully I was able to turn this slightly embarrassing work story into a teachable moment! And yes, I have taken that disk home with me and it will be getting destroyed…

Carry on, folks, and listen to your IT guys!

Preserving Our Cultural Heritage Conference at Indiana University, presentations, part 1

Last weekend, I attended the Preserving Our Cultural Heritage conference for grad students/new archivists at Indiana University in Bloomington. There were many interesting papers and presentations, and I would like to touch briefly on each of the ones I attended. I’m not necessarily going to recreate or even attempt to “summarize” the presentations, but I’ll tell a little bit about what my “take away” from each presentation was.

*****

Stacie Williams (Simmons College) presented “The Rainbow Connection and the Archives: Using Digital Preservation to Link the Jim Henson Company’s Past, Present, and Future.” This presentation centered on digitization activities at the Jim Henson Company Archives (located in Queens, NY), which is a private corporate archives. Documents being digitized included early sketches of muppets, most of which are signed and dated with the artist’s name – which can help with questions of intellectual property. These images can also be used by conservators who need references to the construction of existing muppets, so they can clean and maintain them. 

I learned some interesting things about Jim Henson – such as that he began making muppets for advertising purposes. I also had not given much thought to how muppets are constructed or what they are made of – but there are materials in this archive that cover all of that, as well as the evolution of different processes of doing so.

*****

Jason Groth (IU-Bloomington) read his paper “Migration Thinking: Dietrich Schuller, Albrecht Hafner, and the Inception of the Digital Mass Storage System for Sound Archives.” His topic centered around an important debate in audio preservation: whether preserving the object (e.g., the tape) or the actual content (i.e., the sounds) is paramount.

This is a big problem for archivists in general these days – with audio, video, and electronic files of all types. If you focus your effort more on the object itself, then you are stuck being technologically dependent on old equipment. (For instance, if the sound only exists on an eight-track tape, you need an eight-track player. Or, if you have a file on a floppy disk, you need a floppy disk drive and the software and a computer capable of running the software.) If you migrate to a newer technology, you save the content (hopefully, assuming you did it right and didn’t lose any quality in the process!), but you are still setting yourself up to be technologically dependent, just on a newer technology. You’ll be doing the same thing again in a few years, probably. It’s an endless cycle…

*****

Dorothy Chalk (IU-Bloomington) gave a presentation “Preserving Growth, Preserving Decay: Born-Digital Materials That Will Not Sit Still.” I didn’t know what to think with a title like that! But after she got going, it made sense. She focused mainly on a born-digital poem called “Agrippa” by William Gibson. The poem was distributed on floppy disks that were meant to self-destruct (overwrite themselves) upon being viewed once. The disks were distributed in a book whose ink was also supposed to fade over time – even more so than regular books and on purpose! It was a very odd and interesting idea. Despite Gibson’s attempts to create something that would disappear almost immediately, bootleg copies of the poem made it to the web, and it has grown from there.

In Chalk’s opinion, libraries ought to be documenting this web following as well, which I suppose is a good idea in theory, but I think it might set an unrealistic precedent for libraries collecting and documenting web communities related to other works. (Or perhaps she’s right, and this is a perfectly reasonable expectation but only seems unattainable now because we don’t have any processes set up for actually doing it!)

*****

Eric Holt (University Archivist at Indiana State U.) gave a presentation and demonstration entitled: “Open-Source Electronic Recordkeeping: A Review of Alfresco Enterprise Content Management System.” Alfresco is a piece of open-source electronic recordkeeping software that is certified against the Department of Defense 5015.2 recordkeeping standard but is not proprietary and is less expensive to implement than some other systems. This system had some neat features, and it looked pretty easy to use based on the demonstration.

One thing Mr. Holt said during his presentation that really struck home with me was that he has an easier time providing people with information from the 1960s and 1970s than with more recent information (say, from the last 10 years). This is so true. Now that so many things are in electronic format, they are in such danger of disappearing. It’s too easy for people to click “delete” on items that are old or seem unimportant (or to not store things on the server like they’re supposed to and then lose data in an individual hard drive crash).

Is the answer to insist that people keep printing everything out so that their files can eventually make it into the archives? Well, ye… I mean, no, of course not.* We’re going to have to keep working hard to find better ways of preserving (really preserving – so that they are still accessible in 10 years) electronic records. Why? Well, because (a) that’s how many things come to the archives these days (if they make it at all – see aforementioned hard drive crash scenario!), and (b) some records only really exist properly in a digital environment (e.g., interactive web sites – or heck, any web site with links for that matter; Flash animations; even moderately fancy PowerPoints).

* My initial near-slip of saying “yes” to the printing everything out bit is due to my own personal preservation activities. I still print everything out. I just feel safer that way. Anything I want to have a copy of in 10 (or 50) years, I print it out. This also applies to stuff that falls into the “omigod I would be really screwed if I lost that” category – like tax-related documents!

Okay… I think that’s enough for today. I’ll pick up with the other 5 presentations later.