Welcome
Photos of Larryblakeley
http://www.royblakeley.name/larry_blakeley/larryblakeley_photos_jpeg.htm
(Contact Info: larry at larryblakeley dot com)
I manage this Web site and the following Web sites:
Leslie (Blakeley) Adkins - my oldest daughter
Lori Ann Blakeley (June 20, 1985 - May 4, 2005) - my middle daughter
Evan Blakeley - my youngest child
Way back when
Imagine if your very first Web pages or some furious, ill-written late-night
postings came back to haunt you years later. Well, now they can. The Wayback
Machine gives you access to the Internet Archive, which has taken an
almost-complete snapshot of the World Wide Web every 60 days since 1996 - that's
about 2 billion pages. This archive is now a vast record, storing pages others
have censored, deleted or simply forgotten to maintain.
Paul Marks talked to Wayback's inventor, Brewster Kahle.
Why on earth are you doing this?
Websites are like shifting sands. The average life of a Web page is 100 days.
After that either it's changed or it disappears. So our intellectual society is
built on sand. You can't hold people accountable if, say, the promises posted on
the Web by politicians are not available after the election. And key academic
papers can become unavailable if a researcher leaves a university and their
website is deleted. We've found that many websites of publicly funded projects
disappear within a year. So as taxpayers we are investing in research projects,
but we're not investing in a Web library that organises them and gives future
generations access. Our Wayback Machine is the first attempt to do this.
But a lot of the stuff on the Web is unreliable. What use are such pages to
posterity?
The whole point of comprehensive library collections is that you can't tell in
advance what will be important. The Web is the people's medium, it's not
elitist. Anyone can publish there, so you've got the good, the bad, the ugly,
the profane. It's just us, that's the amazing thing. For instance, a lot of
libraries are now used for genealogical work. What would you give for a video
clip of your great-grandmother? I'd give a lot. I may not watch it very often,
but I'd love some way of knowing who she was.
After 9/11, weren't some sites ordered off the archive - such as those with US
nuclear power plant information?
Yes, there were things that had to be taken off, but by and large I feel that we
have the basic ingredients of a great digital library. It applies at other
levels too, because there are a lot of personal Web pages, and they might be
linked to a picture of your wife - well, in a few years it may be your ex-wife.
If there are any requests from the original authors to not have things in the
archive, we remove them.
How can you be sure that you're not missing something important?
I guarantee that in the future researchers will curse us for having missed
something absolutely critical. But only people using the archive can tell us
about mistakes in what we collect. There is a cheaper alternative concept,
called "dark archiving," which means that we should just archive things and not
give people access to them. But preservation without access is dangerous -
there's no way of reviewing what's in there.
What about pay sites?
We don't record pay sites or sites that have passwords.
But if you don't, you can end up not getting both sides of a story. Is the world
of information splitting into two - the free and the paid-for?
Maybe. But there are already archives of commercial information. It's like a
traditional library: you pay for access or you schlep down to a building to look
at it. It's the old world, it's tired. Our archive is the people's medium, the
wired way, and you can use it wherever you are. How many subscribers does
LexisNexis have? How many people use Google? Which would you rather publish in?
While you have preprints of some scientific papers, you don't have the
subscriber-only journal sites with the final, refereed versions?
We haven't really dealt with the academic world. They are in many ways handling
their own stuff well enough. I praise the print publishers for the way they've
preserved their output; compared with the music and video people, they rock. But
publishing houses don't last forever, and their interest in maintaining
materials that aren't beneficial commercially is limited. That's what the public
funding of libraries is for. The public library system in the US gets $25
billion a year. That's a lot of money - $5 to $6 billion of that goes to
publishers, paying for books. We could use a little of that money to do a lot
better job of trying to put the classics - the best works of humankind - within
reach of every child at home via my archive or something like it.
How big is the archive now?
Over 100 terabytes. As plain text in book form, that'd be over 3000 miles of
shelf space. And we're adding 10 terabytes a month. It costs $40,000 a month
just to buy new storage. Next year it will cost half that for the same amount of
storage but by then there will be twice as much to record, or more.
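
The arithmetic behind that answer can be sketched in a few lines of Python. This is only an illustration built from the figures quoted above (10 terabytes and $40,000 a month). The function name, and the assumption that the price halves exactly while the volume doubles exactly, are made up for the example.

```python
def monthly_storage_cost(terabytes_per_month, dollars_per_terabyte):
    """Cost of buying one month's worth of new storage."""
    return terabytes_per_month * dollars_per_terabyte


# Figures quoted in the interview: 10 TB added per month for $40,000.
price_per_tb = 40_000 / 10  # implied price in dollars per terabyte

this_year = monthly_storage_cost(10, price_per_tb)

# Assumption for illustration: price halves exactly, volume doubles exactly.
next_year = monthly_storage_cost(10 * 2, price_per_tb / 2)

if __name__ == "__main__":
    print(f"implied price: ${price_per_tb:,.0f} per TB")
    print(f"this year: ${this_year:,.0f} a month")
    print(f"next year: ${next_year:,.0f} a month")
```

Under those assumptions the monthly bill stays flat, and it rises if the volume more than doubles, so falling drive prices alone do not make the archive cheaper to run.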
What does a Wayback Machine look like?
It's 150-odd standard PC cases, with four drives in each, standing on end on
racks. So they look a bit like a bookshelf - which is deliberate.
And where is the Wayback archive - physically?
It's now in three places, two in the San Francisco Bay Area and one at the new
Library of Alexandria in Egypt. If you ask someone, "What do you know about the
Great Library of Alexandria?" they mostly say, "Isn't that the one that fried?"
So don't just have one copy. Take special care of collections that are really
important to the definition of cultures.
Will there be more Wayback Machines?
We won't be the only player in town: we'd like to be part of a web of libraries
and archives that all interoperate: you go to "the library" and are
automatically forwarded to the right "place." I was in Glasgow recently, at the
International Federation of Library Associations meeting, and from what the
librarians were saying, I think the question is not whether we are going to save
our digital heritage, but how. And, to me, that's about access. Say the British
Library collects the UK's Web pages, do they just make them available inside
their library? Right now, people look for stuff on the Web, and if it's not on
the Web, it doesn't exist for them. So trying to get the best works we have to
offer on the Web is important from a library perspective.
And will there be subject collections as well as national ones?
We've already put together a collection of 11 September coverage on the Web and
TV. I think it's very important to see things through each other's eyes, across
national divides. In the US, there was a wake-up call on 11 September when
people started looking at magazines and newspapers in Europe, and at Arab papers,
to try to get other perspectives. And these other people were trying to figure out
if we were going to bomb them.
I'm very encouraged by what we saw in the user logs. People were looking to see
whether Palestinians were really cheering in the street, as reported by CNN.
Well, there were a few, but they weren't at all representative. I don't think
that you want to see that through your own media, you want to see it through
theirs.
Where did the idea for a universal archive come from?
Technologists have promised the digital library for decades. In 1945, Vannevar
Bush, who was technology adviser to several US presidents, wrote an article in
The Atlantic magazine outlining how computers might one day augment libraries.
Then in 1960, a young graduate called Ted Nelson got sidetracked from his
master's degree in sociology at Harvard into writing text-retrieval software. He
published his ideas, and coined the term "hypertext" in 1965. So in many ways
the digital library is long overdue.
Where did you come in?
I got involved with computers really early. A friend in high school built a
computer in transistor logic, before the days of integrated circuits. It was a
great activity for a reclusive kind of kid from the New York 'burbs of
Westchester county. So I went on to do computer science at MIT, under Marvin
Minsky and Danny Hillis. I got interested in encryption, and digital libraries.
After graduating in 1982, I helped Danny start a company called Thinking
Machines, which built fast parallel computers. We built one of the first search
engines, which indexed every word in hundreds of newspapers and magazines for
the Dow Jones News Service.
Were they well received?
After we built these big computers, I really thought the Sun would come up a
different colour. I thought the world would now be enlightened because we could
get out all this amazing information. But it turned out that most information
was still being passed around on paper. So I invented a system called WAIS - the
Wide Area Information Server - which was the first Internet publishing system.
It faded pretty fast, because the Gopher information index and then the World
Wide Web and the Mosaic browser came along, and they were just much more usable
and better systems.
Archiving the Web is an enormous task. How did you raise the cash?
I started a commercial website-cataloguing company at the same time, called
Alexa Internet. Alexa is a free service bundled into the browsers - it's the
"what's related?" feature in Netscape, for instance. And baked into Alexa's
charter was a contract to donate all the pages it looks at to the archive with a
six-month time delay. Alexa is now owned by Amazon.com.
So I'm helping fund the archive. Many others are helping - private organisations
like the Markle Foundation, and conventional archivers like the Smithsonian and
the Library of Congress. And we got a grant of about $1 million from the
National Science Foundation, over four years.
And the Library of Congress will want things to be permanent, right? But in
1986, the BBC Domesday project gathered a huge amount of data about Britain and
stored it using a laser disc format whose players are practically extinct. Can
you make your archive portable to new technologies?
In a way, it's good to have some loss to motivate people, and for it to happen
so quickly. We've moved the archive on to new formats twice now. We started on
digital tape, but we found that unreliable, slow, and expensive. We recorded
1996, 1997 and 1998 on tape. By 1999 we were just using hard drives, and now
we're using a new generation of hard drives.
How do you make your data portable?
We use a very, very simple file format. We add a minimum of "meta-data" - that's
information about each Web page or file like the date, the server it came from,
and the file-type and its size. Then comes that server's response to our request
in the hypertext-transfer protocol HTTP, which is plain text, and then the file.
We developed the format back in 1996 and we've never had to change it, not once.
And the Web pages themselves are in the hypertext markup language HTML, which is
just plain text, really. As a last resort you could print it out and read between
the pointy brackets, but we can easily preserve browser programs that display it
properly. I think we are safe with image formats like JPEG and PDF, too. But
there are already issues with the PostScript format, which was used a lot for
passing scientific papers around in the mid-1990s. Not many home users have a
program that can display it. So we have to think about things like converting it
on request to HTML.
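
The record layout described in that answer (a little metadata, then the server's HTTP response, then the file) can be sketched roughly as follows. This is a minimal illustration in Python, not the archive's actual file format: the function names, the space-separated metadata line, the field order and the timestamp style are all assumptions made for the example.

```python
from datetime import datetime, timezone


def pack_record(url, server_ip, fetched_at, content_type, http_response, body):
    """Return one record: a plain-text metadata line, then the raw HTTP response and file."""
    payload = http_response + body
    fields = [url, server_ip, fetched_at.strftime("%Y%m%d%H%M%S"),
              content_type, str(len(payload))]
    # Assumed restriction: fields are space-separated, so they cannot contain spaces.
    if any(" " in f or "\n" in f for f in fields):
        raise ValueError("metadata fields must not contain spaces or newlines")
    return " ".join(fields).encode("ascii") + b"\n" + payload + b"\n"


def unpack_records(blob):
    """Yield (metadata dict, HTTP response headers, file bytes) for each record in blob."""
    pos = 0
    while pos < len(blob):
        eol = blob.index(b"\n", pos)
        url, server_ip, stamp, content_type, size = blob[pos:eol].decode("ascii").split(" ")
        start, end = eol + 1, eol + 1 + int(size)
        # The HTTP response headers end at the first blank line.
        headers, sep, body = blob[start:end].partition(b"\r\n\r\n")
        meta = {"url": url, "server": server_ip, "date": stamp,
                "type": content_type, "size": int(size)}
        yield meta, headers + sep, body
        pos = end + 1  # skip the newline that closes the record


if __name__ == "__main__":
    when = datetime(1996, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
    response = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
    page = b"<html><body>Hello</body></html>"
    blob = pack_record("http://example.com/", "192.0.2.1", when, "text/html", response, page)
    for meta, headers, body in unpack_records(blob):
        print(meta, body)
```

Because the size is stored in the metadata line, a reader can skip from record to record without parsing the content, and everything except the file itself stays readable as plain text.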
And what about the reliability of your physical storage?
The disc drives do have a certain decay rate. So as larger drives come on the
market, we copy to new ones. But we keep the old ones so we can go back to them
if necessary. We started using discs when they hit 16 gigabytes. Now we're
buying 160-gigabyte drives, and 200-gigabyte ones are coming out any day. Those
engineers just keep going!
Was there a single "eureka moment"?
Well, AltaVista was the first Internet search engine that tried to be a complete
index of all the pages. But what really got me was that they threw away the
original pages. That grated no end.
But I've been interested in libraries for ages. After the second Thinking
Machines design, I went to library school. I didn't finish, but I did read
library-use studies - what do people go to libraries for? I have an 8-year-old
son, and I'd like him to have a different upbringing than I had. I'd like him to
be able to ask questions, and have the best that humankind has to offer within
reach. As Raj Reddy, who heads the computing department at Carnegie Mellon
University, said in 1997, we should aim for "universal access to all human
knowledge." I'm in a position to be able to try to help make that come true. So
that's what makes me spring out of bed and say "let's get there!"