Welcome

Photos of Larryblakeley
http://www.royblakeley.name/larry_blakeley/larryblakeley_photos_jpeg.htm

(Contact Info: larry at larryblakeley dot com)


I manage this Web site and the following Web sites: Leslie (Blakeley) Adkins - my oldest daughter

Lori Ann Blakeley (June 20, 1985 - May 4, 2005) - my middle daughter

Evan Blakeley - my youngest child

Way back when
http://www.newscientist.com/opinion/opinterview.jsp?id=ns23701
Photo by Anne Hamersky

Brewster Kahle


Imagine if your very first Web pages or some furious, ill-written late-night postings came back to haunt you years later. Well, now they can. The Wayback Machine gives you access to the Internet Archive, which has taken an almost-complete snapshot of the World Wide Web every 60 days since 1996 - that's about 2 billion pages. This archive is now a vast record, storing pages others have censored, deleted or simply forgotten to maintain.

Paul Marks talked to Wayback's inventor, Brewster Kahle.
 

Why on earth are you doing this?

Websites are like shifting sands. The average life of a Web page is 100 days. After that either it's changed or it disappears. So our intellectual society is built on sand. You can't hold people accountable if, say, the promises posted on the Web by politicians are not available after the election. And key academic papers can become unavailable if a researcher leaves a university and their website is deleted. We've found that many websites of publicly funded projects disappear within a year. So as taxpayers we are investing in research projects, but we're not investing in a Web library that organises them and gives future generations access. Our Wayback Machine is the first attempt to do this.

But a lot of the stuff on the Web is unreliable. What use are such pages to posterity?

The whole point of comprehensive library collections is that you can't tell in advance what will be important. The Web is the people's medium, it's not elitist. Anyone can publish there, so you've got the good, the bad, the ugly, the profane. It's just us, that's the amazing thing. For instance, a lot of libraries are now used for genealogical work. What would you give for a video clip of your great-grandmother? I'd give a lot. I may not watch it very often, but I'd love some way of knowing who she was.

After 9/11, weren't some sites ordered off the archive - such as those with US nuclear power plant information?

Yes, there were things that had to be taken off, but by and large I feel that we have the basic ingredients of a great digital library. It applies at other levels too, because there are a lot of personal Web pages, and they might be linked to a picture of your wife - well, in a few years it may be your ex-wife. If there are any requests from the original authors to not have things in the archive, we remove them.

How can you be sure that you're not missing something important?

I guarantee that in the future researchers will curse us for having missed something absolutely critical. But only people using the archive can tell us about mistakes in what we collect. There is a cheaper alternative concept, called "dark archiving," which means that we should just archive things and not give people access to them. But preservation without access is dangerous - there's no way of reviewing what's in there.

What about pay sites?

We don't record pay sites or sites that have passwords.

But if you don't, you can end up not getting both sides of a story. Is the world of information splitting into two - the free and the paid-for?

Maybe. But there are already archives of commercial information. It's like a traditional library: you pay for access or you schlep down to a building to look at it. It's the old world, it's tired. Our archive is the people's medium, the wired way, and you can use it wherever you are. How many subscribers does LexisNexis have? How many people use Google? Which would you rather publish in?

While you have preprints of some scientific papers, you don't have the subscriber-only journal sites with the final, refereed versions?

We haven't really dealt with the academic world. They are in many ways handling their own stuff well enough. I praise the print publishers for the way they've preserved their output; compared with the music and video people, they rock. But publishing houses don't last forever, and their interest in maintaining materials that aren't beneficial commercially is limited. That's what the public funding of libraries is for. The public library system in the US gets $25 billion a year. That's a lot of money - $5 to $6 billion of that goes to publishers, paying for books. We could use a little of that money to do a much better job of trying to put the classics - the best works of humankind - within reach of every child at home via my archive or something like it.

How big is the archive now?

Over 100 terabytes. As plain text in book form, that'd be over 3000 miles of shelf space. And we're adding 10 terabytes a month. It costs $40,000 a month just to buy new storage. Next year it will cost half that for the same amount of storage but by then there will be twice as much to record, or more.
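The arithmetic in this answer can be sketched as a rough projection. The starting figures (10 terabytes and $40,000 a month) come from the interview; treating "half that" and "twice as much" as an exact yearly halving of price and doubling of intake is a simplifying assumption, and the function name is hypothetical.

```python
def monthly_storage_cost(years_from_now, tb_per_month=10.0, cost_per_month=40_000.0):
    """Estimate monthly spend on new storage, in dollars.

    Assumption (not from the source): the price per terabyte halves every
    year while the monthly intake doubles every year.
    """
    price_per_tb = (cost_per_month / tb_per_month) * 0.5 ** years_from_now
    intake_tb = tb_per_month * 2 ** years_from_now
    return price_per_tb * intake_tb


for year in range(3):
    print(year, monthly_storage_cost(year))
```

Under these assumptions the two trends cancel, so the monthly bill stays flat rather than falling. If intake more than doubles ("or more"), the bill rises.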

What does a Wayback Machine look like?

It's 150-odd standard PC cases, with four drives in each, standing on end on racks. So they look a bit like a bookshelf - which is deliberate.

And where is the Wayback archive - physically?

It's now in three places, two in the San Francisco Bay Area and one at the new Library of Alexandria in Egypt. If you ask someone, "What do you know about the Great Library of Alexandria?" they mostly say, "Isn't that the one that fried?" So don't just have one copy. Take special care of collections that are really important to the definition of cultures.

Will there be more Wayback Machines?

We won't be the only player in town. We'd like to be part of a web of libraries and archives that all interoperate: you go to "the library" and are automatically forwarded to the right "place." I was in Glasgow recently, at the International Federation of Library Associations meeting, and from what the librarians were saying, I think the question is not whether we are going to save our digital heritage, but how. And, to me, that's about access. Say the British Library collects the UK's Web pages, do they just make them available inside their library? Right now, people look for stuff on the Web, and if it's not on the Web, it doesn't exist for them. So trying to get the best works we have to offer on the Web is important from a library perspective.

And will there be subject collections as well as national ones?

We've already put together a collection of 11 September coverage on the Web and TV. I think it's very important to see things through each other's eyes, across national divides. In the US, there was a wake-up call on 11 September when people started looking at magazines and newspapers in Europe, and Arab papers to try to get other perspectives. And these other people were trying to figure out if we were going to bomb them.

I'm very encouraged by what we saw in the user logs. People were looking to see whether Palestinians were really cheering in the street, as reported by CNN. Well, there were a few, but they weren't at all representative. I don't think that you want to see that through your own media, you want to see it through theirs.

Where did the idea for a universal archive come from?

Technologists have promised the digital library for decades. In 1945, Vannevar Bush, who was technology adviser to several US presidents, wrote an article in The Atlantic magazine outlining how computers might one day augment libraries. Then in 1960, a young graduate called Ted Nelson got sidetracked from his master's degree in sociology at Harvard into writing text-retrieval software. He published his ideas, and coined the term "hypertext" in 1965. So in many ways the digital library is long overdue.

Where did you come in?

I got involved with computers really early. A friend in high school built a computer in transistor logic, before the days of integrated circuits. It was a great activity for a reclusive kind of kid from the New York 'burbs of Westchester County. So I went on to do computer science at MIT, under Marvin Minsky and Danny Hillis. I got interested in encryption, and digital libraries.

After graduating in 1982, I helped Danny start a company called Thinking Machines, which built fast parallel computers. We built one of the first search engines, which indexed every word in hundreds of newspapers and magazines for the Dow Jones News Service.

Were they well received?

After we built these big computers, I really thought the sun would come up a different colour. I thought the world would now be enlightened because we could get out all this amazing information. But it turned out that most information was still being passed around on paper. So I invented a system called WAIS - the Wide Area Information Server - which was the first Internet publishing system. It faded pretty fast, because the Gopher information index and then the World Wide Web and the Mosaic browser came along, and they were just much more usable and better systems.

Archiving the Web is an enormous task. How did you raise the cash?

I started a commercial website-cataloguing company at the same time, called Alexa Internet. Alexa is a free service bundled into the browsers - it's the "what's related?" feature in Netscape, for instance. And baked into Alexa's charter was a contract to donate all the pages it looks at to the archive with a six-month time delay. Alexa is now owned by Amazon.com.

So I'm helping fund the archive. Many others are helping - private organisations like the Markle Foundation, and conventional archivers like the Smithsonian and the Library of Congress. And we got a grant of about $1 million from the National Science Foundation, over four years.

And the Library of Congress will want things to be permanent, right? But in 1986, the BBC Domesday project gathered a huge amount of data about Britain and stored it using a laser disc format whose players are practically extinct. Can you make your archive portable to new technologies?

In a way, it's good to have some loss to motivate people, and for it to happen so quickly. We've moved the archive on to new formats twice now. We started on digital tape, but we found that unreliable, slow, and expensive. We recorded 1996, 1997 and 1998 on tape. By 1999 we were just using hard drives, and now we're using a new generation of hard drives.

How do you make your data portable?

We use a very, very simple file format. We add a minimum of "meta-data" - that's information about each Web page or file like the date, the server it came from, and the file-type and its size. Then comes that server's response to our request in the hypertext-transfer protocol HTTP, which is plain text, and then the file. We developed the format back in 1996 and we've never had to change it, not once.
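The layout described here (a little metadata, then the server's HTTP response, then the file) can be illustrated with a minimal sketch. This is not the Archive's actual format: the field order, the space-separated header line, and the function names `build_record` and `parse_record` are all assumptions made for illustration.

```python
from datetime import datetime


def build_record(url, server_ip, content_type, http_response, fetched_at):
    """Build one record: a metadata line, then the raw HTTP response.

    Hypothetical layout. Metadata fields must not contain spaces, and
    http_response is the status line, headers, and body exactly as received.
    """
    header = " ".join([
        url,
        server_ip,
        fetched_at.strftime("%Y%m%d%H%M%S"),  # capture date
        content_type,                          # file type
        str(len(http_response)),               # size in bytes
    ])
    return header.encode("ascii") + b"\n" + http_response + b"\n"


def parse_record(record):
    """Split a record back into its metadata fields and payload."""
    header_line, rest = record.split(b"\n", 1)
    url, server_ip, stamp, content_type, size = header_line.decode("ascii").split(" ")
    return {
        "url": url,
        "server_ip": server_ip,
        "fetched_at": stamp,
        "content_type": content_type,
        "size": int(size),
        "payload": rest[: int(size)],  # HTTP response including the file
    }


response = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
record = build_record("http://example.org/", "192.0.2.1", "text/html",
                      response, datetime(1996, 5, 1, 12, 0, 0))
print(parse_record(record)["fetched_at"])
```

Storing the byte length in the metadata line lets a reader skip from one record to the next without interpreting the payload, which is one reason such a simple format can survive unchanged.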

And the Web pages themselves are in the hypertext markup language HTML, which is just plain text, really. As a last resort you could print it out and read between the pointy brackets, but we can easily preserve browser programs that display it properly. I think we are safe with image formats like JPEG and PDF, too. But there are already issues with the PostScript format, which was used a lot for passing scientific papers around in the mid-1990s. Not many home users have a program that can display it. So we have to think about things like converting it on request to HTML.

And what about the reliability of your physical storage?

The disc drives do have a certain decay rate. So as larger drives come on the market, we copy to new ones. But we keep the old ones so we can go back to them if necessary. We started using discs when they hit 16 gigabytes. Now we're buying 160-gigabyte drives, and 200-gigabyte ones are coming out any day. Those engineers just keep going!

Was there a single "eureka moment"?

Well, AltaVista was the first Internet search engine that tried to be a complete index of all the pages. But what really got me was that they threw away the original pages. That grated no end.

But I've been interested in libraries for ages. After the second Thinking Machines design, I went to library school. I didn't finish, but I did read library-use studies - what do people go to libraries for? I have an 8-year-old son, and I'd like him to have a different upbringing than I had. I'd like him to be able to ask questions, and have the best that humankind has to offer within reach. As Raj Reddy, who heads the computing department at Carnegie Mellon University, said in 1997, we should aim for "universal access to all human knowledge." I'm in a position to be able to try to help make that come true. So that's what makes me spring out of bed and say "let's get there!"