Lost Doctor Who episode recovered - but how will we 'recover' websites in the future?

 by Martin Belam, 16 January 2004

The BBC announced yesterday that they had recovered one of the missing episodes of Doctor Who from the 1960s - "Day of Armageddon" from the William Hartnell story known as "The Daleks' Master Plan". It has been handed to the restoration team for "treatment", which hopefully means VidFIRE & DVD release. There now remain 108 Doctor Who episodes that are lost. The recovery rate is low enough (erm... well, there is this one, and "The Lion" from "The Crusade" in the last 5 years) to suspect that every find will be the last.

Many other people have devoted many hours to writing much better essays about how the BBC came to junk a load of its archived recordings during the late 1960s and early 1970s. Suffice it to say that at the time they couldn't see any commercial value in retaining black & white recordings with no overseas sales value. There was no video or DVD retail market, plus the BBC was operating at a time when in the UK there were three television channels, none of which operated on a 24-hour basis. [Just pause and think for a minute about that concept]. With those few broadcast hours to fill it wasn't necessary to repeat old shows - "television nostalgia" was a couple of decades away.

The thing that nags at the back of my brain is that, on the whole, much the same thing is happening on the internet. Of course we have the Internet Archive & Wayback Machine, and sometimes the Google cache, to recover lost sites, but they are like the accidental tourists of archiving the web. It seems to me that the scientific community has generally been good at archiving its material, but that commercial and public service websites alike have been pretty slack in preserving or future-proofing their internet material.

The Wayback Machine has a few snapshots of how the BBC site looked, and for ages we had on the live servers a site for an Andrew Neil programme from back in 1997 (I think - I'm happy to be corrected) which was apparently the first BBC programme support site. Much to our amusement now, it pretty much looked like the average personal website page from the time, and had named credits at the bottom of the page for the coding and development. Last time I looked it wasn't live anymore, which I thought was a shame.

And I know this problem first-hand. One of the presentations I have done internally at the BBC is the history of how the search service developed across the site. There are a couple of iterations of search on the site which I could not find preserved. To make the presentation complete I had to mock up how I remembered they looked using the elements available to me. The BBC has a Treasure Hunt amnesty for discarded broadcast material, which netted the recovered Doctor Who episode. Perhaps we should have one for anyone who has preserved screenshots of the web site...

I remember in the olden days, one of the things you could lose your trusted user status for was taking a site down without telling ops so it could be backed up - as much for legal reasons as anything else.

So technically all that stuff should still be available on CD/DAT/DLT/punched card somewhere - it's just a question of finding who did it, where they put it, and whether there is still a way of reading what they put it onto...

There are still a couple of very old sites on the server, for example http://www.bbc.co.uk/election97/. You just have to know where to look.

These days information and archives aren't the problem - they'd love to archive things more regularly. The problem tends to be the production teams, who have a nasty habit of deleting things without telling anyone.

But you knew that already.

Funnily enough, when BBC News switched to designing for 800x600, I sent in an e-mail suggesting that they update the archive to use the new design (assuming their content management system made this possible). They didn't, however, and so you can still see old stories the way they would have looked at the time.

This page shows a fairly early design:
http://news.bbc.co.uk/1/hi/uk/politics/42626.stm (though two server side includes fail).

Curiously, this page of an earlier date shows a newer design:

From what I understand, their content management system doesn't easily allow them to move articles into the new design - although they are working on moving older articles over to the new system as and when they can.

