
Sharon Bertsch McGrayne: “The Theory That Would Not Die” | Talks at Google

>>Male Presenter: Hello everyone. And welcome
to Sharon Bertsch McGrayne’s talk on “The Theory That Would Not Die.” When I got the
announcement of the book and I asked for a copy of it, I said, “This looks like a very
Googley book.” Bayes’ theorem is used all over the place and when I asked you guys if
you’re interested, it was either the first or the second most popular book that I’ve
ever put up. So, it looks like we’ve had a very good turnout
given that everyone here probably knows more about Bayes’ theorem than I do. And I know
a fair bit. I will turn it right over to our speaker. Welcome. [applause]

>>Sharon Bertsch McGrayne: Thank you. Well,
thank you very much for inviting me and thank you all for coming. I wanna start right out
with some truth in advertising and tell you that I’m not a computer scientist. I’m not
a statistician, a scientist, or a mathematician. I come to you from newspaper reporting and
from science writing. However, I became intrigued with Bayes rule seven or eight years ago when
I could Google the word “Bayesian” and get fewer than a hundred thousand hits. And last
week, when I went on and I Googled “Bayesian,” I got eleven million hits. So, today I wanna talk to you about how you
all are real revolutionaries. You’re participants in a remarkable–almost overnight–revolution
about a very fundamental scientific issue–how you deal with evidence, how you deal with
data, how you evaluate the evidence and measure the uncertainties involved, update it as new
knowledge arises and then hopefully, change minds in light of the new data. Now, usually when I talk about Bayes’ Rule,
I start with a long list of examples of where Bayes is used and I do not think I need to
do that with this crowd. Google often uses naive Bayes and other Bayesian methods and
recently, Google’s Bayesian driverless car got headlines all over the world. And when I wrote a short piece about it for
Scientific American, it was one of the most popular articles in that issue. So, it was
well-known. But there are a couple of examples that I’d like to bring up today that you might
not know so much about. [pause] The first is the Air France jet Flight 447 that took
off two years ago last month from Rio de Janeiro bound for Paris overnight. It went into a high-altitude, intense electrical storm and disappeared without a trace with 228 people aboard. The world’s most high-tech naval search ever lasted almost two years without success. They were searching
for the wreckage and for these two black boxes, which as you can see are actually red and
white. They’re the size of shoe boxes and they were
searching in mountainous terrain two and a half miles down under the ocean in an area
the size of Switzerland. They had no success. Last winter, the French government hired some
of the same people who appear in “The Theory That Wouldn’t Die” who developed Bayesian
Naval search methods. And their software calculated the most probable
sites for the wreckage, which was found in April after an undersea search of one week.
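Bayesian search theory of this kind works by spreading a prior probability of the wreck's location over a grid of cells and, after every unsuccessful pass, using Bayes' rule to shift belief away from the cells already searched. Here is a minimal sketch in Python; the region names, priors, and detection probability are invented for illustration, not taken from the actual analysis:

```python
# Toy Bayesian search: candidate regions with prior probabilities of
# containing the wreck, and a detection probability for a search pass.
priors = {"ridge": 0.5, "plain": 0.3, "canyon": 0.2}  # hypothetical numbers
p_detect = 0.8  # chance a pass finds the wreck IF it is in the searched cell

def search_fails(beliefs, searched):
    """Bayes' rule update after an unsuccessful search of one cell."""
    posterior = {}
    for cell, p in beliefs.items():
        # P(no find | wreck in cell) is (1 - p_detect) for the searched
        # cell and 1 for every cell we did not look in.
        miss = (1 - p_detect) if cell == searched else 1.0
        posterior[cell] = p * miss
    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}

# Search the most probable cell first; finding nothing shifts the
# probability mass toward the cells not yet searched.
beliefs = search_fails(priors, "ridge")
print({cell: round(p, 2) for cell, p in beliefs.items()})
# → {'ridge': 0.17, 'plain': 0.5, 'canyon': 0.33}
```

Iterating this update, always searching the currently most probable cell, is the basic loop of Bayesian search theory.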
OK? Now, about a month ago, the agency in charge of British archaeological sites announced
that a wonderful Bayesian program had shown them that most of what they’d known about
the Neolithic era in Britain was, in their words, bollocks. [laughter] Neolithic people built these strange hilltops
surrounded by concentric circles of ditches. And they had not built them gradually over
the ages. They built and abandoned them, often within the space of one generation. And most
of them had been built during a building spree roughly five thousand, five hundred years
ago. Now, the remarkable thing to me about these
two examples is that the British and the French governments were saying how wonderful Bayes’
had worked. And as we’re gonna see today, a lot of people didn’t even dare mention the
word ‘Bayes’ for decades in the 20th Century. So, to understand why you all are such revolutionaries
we’re gonna have to go back to the beginning. And given the time constraints, I’m gonna
race through the beginning until we get to the Second World War and when I’ll slow down
some. But I hope we’re gonna see two big patterns emerging. First, Bayes becomes an extreme example of
the gap between academia and the real world. And second, military super-secrecy during
the Second World War and the Cold War had a profound effect on Bayes. Now, Bayes rule
of course, is named for the Reverend Thomas Bayes, a wealthy Presbyterian minister and
amateur mathematician who lived in a chic resort outside of London in the early 1700s. We know very little about him. I’m not gonna
show you his portrait because it’s indubitably of someone else. We don’t know his birthdate.
Wikipedia has his death date wrong. But we do have something personal about Bayes. And
that is his handwriting, written with goose quills, in Latin. OK? And we know another thing about him, excuse
me. This, this handwriting comes from the Institute and Faculty of Actuaries in London.
And we do know something else about Bayes and that is that he discovered his theorem
during the 1740s when Europe was racked by a religious controversy. The issue is not unknown today. It was whether
or not we can use evidence about the natural world around us to make rational conclusions
about the existence of God. The question, as Bayes’ generation framed it, was not about God the creator, but about God the cause, God the primary cause,
the first cause. We do not know that Thomas Bayes wanted to
prove the existence of God, but we do know that Bayes tried to deal with the issue of
cause and effect, mathematically. And in so doing, of course, he produced a simple one-line theorem by which we modify our initial beliefs. And he actually called it a guess. He used the word “guess” and he said, if nothing else works, start with 50-50 odds, and modify this guess with objective new information and get a new and improved
belief, which in turn carries with it a commitment to update that belief each time a new piece
of information arrives. But Bayes didn’t believe in his theorem enough
to publish it. And he dies ten or fifteen years later with it filed away in his notebook.
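That updating cycle, start with even odds, fold in each new piece of evidence, get an improved belief, is the whole rule. A toy sketch in Python, with likelihoods chosen purely to illustrate the guess sharpening:

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """One application of Bayes' rule: the improved belief in a
    hypothesis after seeing a new piece of evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Start, as Bayes suggested, with 50-50 odds ...
belief = 0.5
# ... and update it as each new observation arrives.  Here every clue
# is twice as likely if the hypothesis is true (made-up numbers).
for _ in range(3):
    belief = update(belief, p_evidence_if_true=0.8, p_evidence_if_false=0.4)
print(round(belief, 3))  # → 0.889
```

Each pass takes the previous output as the new prior, which is exactly the commitment to keep updating that the speaker describes.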
Now, going through Bayes’ papers, a friend of his, Richard Price, who was another Presbyterian
minister and an amateur mathematician, decides that the theorem will help prove the existence
of God the cause. Now, unlike Bayes, Richard Price was famous
in his day. This is a royal society portrait done by a famous painter, Benjamin West. He
later becomes a famous supporter of the American Revolution, friends of our Founding Fathers
and a founder of the insurance industry. And he spends the next two years, off and
on, editing Bayes’ theorem. Gets it published. Unfortunately, it was in a journal read primarily by the British gentry, not by continental mathematicians, and it sank out of view and was neglected. But certainly by today’s standards, Richard Price would be
considered Thomas Bayes’ co-author. If, however, there were justice in this world,
Bayes’ rule should be named for someone else entirely. And that is the great French mathematician,
Pierre-Simon Laplace, who’s better known today for the Laplace transform. Now, as a young
man of 25, Laplace discovers the rule independently of Bayes in 1774 and calls it the “Probability
of Causes”. Now, Laplace, also unlike Bayes, was the quintessential
scientific researcher. He mathematized every science known to his day and he spends the
next 20 years, off and on, in the midst of this enormous career, developing what we call
Bayes’ rule into the form that’s used today and actually used it. But when Laplace dies in 1827, the Western
world begins an almost manic fad of collecting precise and objective facts. There were clubs
that collected them. Even women could do it. Some of the famous numbers were the chest
sizes of Scottish soldiers, the number of Prussian officers who were killed by kicking
horses, [laughter] the number of victims of cholera. And with lots of these precise numbers at
their disposal, any up to date statistician rejected Bayes’ rule. They preferred to judge
the probability of an event by the frequency that it occurred–nine times out of ten, three
out of four, and so on. And eventually they will become known as the Frequentists. And the Frequentists become the great opponent
of Bayes’ rule up until quite recently because for them, modern science requires both objectivity
and precise answers. And Bayes, of course, calls for a measure of belief and approximations
and the Frequentists called that quote “subjectivity run amuck,” “ignorance coined into science.” By the 1920s, they were calling it, saying
that Bayes “smacked of astrology, of alchemy.” And another said, “We use Bayes formula with
a sigh as the only thing available under the circumstances.” Now, the surprising thing
that I discovered in all of this is that the theorists and the philosophers denounced Bayes’
rule as subjective. People who had to deal with real world emergencies,
who had to make one-time decisions based on scanty data, they kept right on using Bayes’
rule because they had to make do with what they had. So, for example, Bayes’ rule helped
free Dreyfus from the treason trials in the 1890s in France. Artillery officers in France, in Russia, and
the US, tested their ammunition and aimed their fire using Bayesian tables in both World
Wars. The Bell telephone system survived the 1907 financial panic and the US insurance
industry started our first and only–for many years–social insurance, workers compensation
insurance, almost overnight at the time of the First World War despite having few facts
about the safety of particular industries or particular businesses. Now, every good book needs a villain. And
the villain of our piece is Ronald Fisher. Despite Bayes’ usefulness, Ronald Fisher started
attacking Bayes in the 1920s and ’30s and theoreticians’ attitudes about Bayes changed
from tepid toleration to outright hostility. One of the reasons is that Ronald Fisher was
a giant. He was a superb geneticist. He founded modern statistics for scientific work, randomization
methods, sampling theory, experimental design methods–these are all some of Fisher’s great
achievements. But unfortunately for Bayes, the thing that Fisher hated most was Bayes’
rule. He was a fervent eugenicist with an explosive
temper and a remarkable inability to understand other people. For example, if Fisher got bored
at a meeting, he might pull out his false teeth and clean them in public. [laughter] He interpreted scientific and statistical questions as personal attacks. And his life became, a colleague said, “a sequence
of scientific fights, often several at a time at scientific meetings and in scientific papers
and he hated Bayes’ rule the most.” He didn’t need Bayes. He didn’t work with great amounts of uncertainty.
He was a eugenicist and he filled his house with cats and dogs and thousands of mice for
cross-breeding experiments and he could trace their genealogy back for generations, precisely.
His experiments were repeatable and they produced precise answers. And Fisher calls Bayes’ approximation and
measures of belief “an impenetrable jungle, perhaps the only mistake to which the mathematical
world has so deeply committed itself. It’s founded on an error and must be wholly rejected.”
And he kept up this very personal fight against Bayes for 40 years, into the 1950s when a
lone Bayesian at NIH was showing that smoking cigarettes caused lung cancer. Now, Fisher was a chain smoker. He even swam while smoking. [laughter] And he was a paid consultant to the tobacco
industry. And he proposed, believe it or not, not that smoking caused lung cancer, but that
lung cancer probably caused smoking. OK? So, Fisher’s stature and his utter inability
to discuss Bayes’ rationally delayed its development for decades. And I think we have to say that
Fisher is an example of how a destructive personality can affect a field, particularly
a small field. Now, I wanna switch gears a bit and dwell on the personal history of Alan
Turing. First, because he’s a hero of mine and second
because it illustrates how Bayes’ worked as a pencil and paper method, as one of the earliest
computer techniques, and as an illustration of the effect of government secrecy. This,
of course, is Alan Turing. [pause] Now by the time the Second World War began, as we’ve
seen, Bayes was virtually taboo as far as sophisticated statisticians were concerned. Fortunately, Turing was not a statistician.
He was a mathematician. And besides fathering the modern computer, computer science, software,
artificial intelligence, the Turing machine, the Turing test, he will father the modern
Bayesian revival. Now, it’s also important to remember that during the Second World War,
England was cut off from the agricultural produce and supplies of the continent, particularly
of France and could feed only one in three of its residents. And it depended on convoys
of unarmed merchant marine ships bringing in each year 30 million tons of food and supplies
from North and South America and from Africa. Now, Hitler said he thought that the U-boats
and their attacks on these convoys would win the war. And Churchill said later, “the only thing I
was really scared of during the war were the U-boats” because of the attacks on the supply
lines. And in fact, German U-boats did sink almost 3000 Allied ships and killed more than
50,000 merchant seamen. Now, the German Navy ordered the U-boats around
the Atlantic Ocean via radio messages that were encrypted with word scrambling machines
called ‘Enigmas’. The German government bought the Enigmas during the ’20s and distributed
them to all of its different agencies. So, the Navy had one set of codes and could
develop their own complexities and their own security controls. The Army had another. The
foreign service, the Italians, the Spanish nationalists and so on. The German railways
had one. And the most complex of all was that operated by the German Navy. And this is a German naval Enigma. They’re hard to sort out in pictures, but this one comes from Frode Weierud’s website called ‘CryptoCellar’. Now, an Enigma machine, as you can see, looks like a complex, sturdy typewriter. It had wiring coming out of the very bottom. It had wheels. This one has four wheels, but
they started with three and added more later. It had starting places. It had code books.
And it had many other features that could be changed within hours if necessary, and
that could produce millions upon millions of permutations. And no one, German or British, thought that
the British could ever read those messages. So, this is one of the Enigma machines
that Alan Turing will use Bayes’ to conquer. Now, he’d been working on the Enigma all summer by himself, but on September 4th, 1939, the day after war is declared, Turing is ordered to go to Bletchley Park, which is the British super-secret center for decryption
efforts just north of London. When he arrives at Bletchley Park he was 27. He looked 16.
He was handsome and athletic. His mother sent proper business suits for him to wear to work. He preferred shabby sports coats. He had a
five o’clock shadow. Sometimes, his fingernails were dirty. And he will spend the next six
years working on coding and decryption efforts. When he arrives at Bletchley Park, no one
is working on the all-important Navy code, the one that will control whether or
not supplies can reach Great Britain. Turing, however, liked to work alone. And
after a few weeks, he announced that no one was doing anything about it and said, “I could have it to myself.” And he goes up into the loft of one of the stable buildings at Bletchley
Park and stays there during lunches and breaks and the women on the staff organize a pulley
to take up baskets of food for him for lunch and tea. And if you don’t mind, I’ll read just
a bit from “The Theory That Wouldn’t Die.” [reads from book] “Late one night, soon after joining Bletchley
Park, Turing invented a manual method for reducing the number of tests that would be
needed to determine the settings for those wheels. It was a highly labor intensive Bayesian
system that he nicknamed ‘Banburismus,’ for the nearby town of Banbury where needed supplies
were printed. He wrote later during the war, ‘I was not sure it would work in practice.’ But if it did, it would let him guess a stretch of letters in an Enigma machine, hedge his bets, measure his belief in their validity by using Bayesian methods to assess their probabilities, and add more clues as they arrived. If it worked, it would identify settings for
two of Enigma’s wheels and reduce the number of wheel settings to be tested from 336 to
as few as 18.” And at a time, obviously, when every hour
counted, the difference could save sailors’ lives. “So Turing and his slowly growing staff begin
to comb intelligence reports to collect what they called ‘cribs’, which were the German
words that they predicted would occur in the original uncoded German message. The first
cribs came primarily from German weather reports because they were standardized and repeated
often. Weather for the night, situation Eastern channel.
And as one blessed fool radioed each night, ‘Beacons lit as ordered.’ Reports from British
meteorologists about weather in the English Channel provided more clues, more hunches.
And in a fundamental breakthrough, Turing realized that he couldn’t systematize his
hunches or compare his hypotheses’ probabilities without a unit of measurement. And he named his unit the ‘ban’, for Banburismus.
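In modern terms, the ban is a logarithmic unit of evidence: log base 10 of a likelihood ratio, so odds of 10 to 1 equal one ban, and a deciban is a tenth of that. On this scale, independent clues add instead of multiplying. A small sketch, with made-up likelihood ratios:

```python
import math

def decibans(likelihood_ratio):
    """Weight of evidence in decibans: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)

# Odds of 10:1 in favour of a guess = 1 ban = 10 decibans.
assert decibans(10) == 10.0

# Independent clues add in deciban units instead of multiplying as odds.
clues = [2.0, 1.5, 3.0]  # likelihood ratios from separate hunches (made up)
total_db = sum(decibans(r) for r in clues)
combined_odds = 10 ** (total_db / 10)  # back to one odds ratio: 2 * 1.5 * 3
print(round(total_db, 1), round(combined_odds, 1))  # → 9.5 9.0
```

A running tally like this is all a clerk needs to combine many weak hunches, declaring near-certainty once the accumulated odds pass a threshold such as the 50-to-1 figure mentioned in the reading.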
And he defined it as quote ‘about the smallest change of weight in evidence that is directly
perceptible to human intuition.’ End quote. One Ban represented odds of ten to one in
favor of a guess. But Turing normally dealt with much smaller quantities, deciBans and
even centiBans. The Ban was basically the same as the bit,
the measure of information that Claude Shannon discovered by using Bayes’ rule at roughly
the same time at Bell Telephone Laboratories. Turing’s measure of belief, the Ban, and its
supporting mathematical framework had been called his greatest intellectual contribution
to Britain’s defense. Using Bayes’ rule and Bans, Turing began calculating
credibility values for various kinds of hunches and compiling reference tables of Bans for
technicians to use. It was a statistic-based technique, produced no absolute certainties,
but when the odds of a hypothesis added up to 50 to one, cryptanalysts could be close
to certain that they were right.” Now, Turing was obviously developing a home
grown Bayesian system. No one knows where he got it, whether he discovered it and developed
it on his own, with his assistant Jack Good, or whether he knew about the rudiments of
it from Harold Jeffreys, the lone defender of Bayes’ at Cambridge University during the pre-war period, who used Bayes’ for earthquake and tsunami research to find out the epicenter
of the earthquakes. Now, within a year and a half of the war starting,
by June of 1941, Turing could read the U-boat messages within an hour of arrival at Bletchley
Park. And the British could reroute the convoys around the U-boats and for almost a month
that summer no convoy, no ship is attacked by a U-boat. By the fall of 1941, however, Banburismus
is critically short of typists and junior clerks, otherwise known as “girl power”. Turing
and the other decoders wrote a personal letter to Churchill, delivered it to Downing Street.
One of them delivered it to Downing Street. And he responded immediately. Among the help
that was attempted, Ian Fleming, of James Bond fame, plans an elaborate raid to capture
code books for Turing. Fortunately, it’s so elaborate, it’s better suited to a novel than to real life, and it was cancelled. [laughter] The Navy collected code books for Turing from
sinking German ships, and two young men lost their lives doing it. But breaking Enigma–. If we could actually have that one
back. I think. [pause] Whoop. Not there. Never mind. But eventually breaking the Enigma codes becomes
routine, like a factory at Bletchley Park. However, [pause] let’s go back to Turing if
we can. There we go. Thank you. But shortly after Germany attacks Russia in June of 1941,
the German army started using a vastly more complex code and word scrambling machine called
the ‘Lorenz’. These were ultra-secret codes and the Supreme
Command in Berlin relies on these new Lorenz codes to communicate high-level strategy to
high-level army commanders around Europe. And some of the messages were so important
that Hitler himself signed them. [reads from book] “And a group of Britain’s leading mathematicians
begin a year of desperate search. They used Bayes’ rule, logic, statistics, Boolean algebra,
and electronics. And they also began work on designing and building the first of
ten Colossi, the world’s first large scale electronic computers. Turing invented a highly Bayesian method known
as ‘Turingery’, or ‘Turingismus’. It was a paper and pencil method again. The first step
was to make a guess and assume, as Bayes had suggested, that it had a 50-50 chance
of being correct. Add more clues, some good and some bad. And as one of the decoders described it ‘with
patience, luck, a lot of rubbing out and a lot of cycling back and forth, the plain text
appeared.’ Now, the engineer who built the Colossi, Thomas
Flowers, had strict orders to have the second model of the Colossus ready by June 1, 1944.
He was not told why, but his team worked, he said, so hard until they thought their
eyeballs would drop out. They get it ready by June 1. And on June 5,
there’s a message that they receive from Hitler that he’s sending to his army commander in
Normandy, General Erwin Rommel. And Hitler tells Rommel, if there is an invasion of Normandy,
it will be a diversionary tactic and do nothing for five days. The real invasion will occur later, somewhere
else. The message is decrypted at Bletchley Park. A courier takes it to General Eisenhower
where he and his staff are determining when to launch the invasion of Normandy. A courier gives the piece of paper to Eisenhower; we know this from Thomas Flowers, by the way. Eisenhower reads it. He cannot tell anything, even to his top
staff about Bletchley Park and the decoding efforts. He gives the paper back to the courier
and he turns to his staff and says, “We go tomorrow morning.” June 6, 1944. And later, Eisenhower says
that the decoding at Bletchley Park, Bletchley Park shortened the war in Europe by two years.
Now, a few days after Germany’s surrender in May of 1945, Churchill makes
a surprising and still shocking move. Everything that showed that decoding had helped
to win the Second World War was to be ultra-secret. No one could reveal anything about what they’d
done at Bletchley Park. And Turing, of course, could not be mentioned. And the Colossi were to be destroyed, cut into unidentifiable pieces, [crash] Bombing. [laughter] except for the two latest ones,
the most complex ones, which Britain apparently used during the Cold War to decode Soviet
messages. But I think [sound of truck backing up] we have to think that without Churchill’s
orders, it might have been Great Britain that became the leader of the 20th Century computer
revolution. Now, after the war, Turing was working on
computers and other projects. No one knew what he had done at Bletchley Park, when two
spies flee from Britain [door slam] to Moscow. They had been spying for the Soviet Union
all during the war. And they fled in 1950. One was named Guy Burgess and he was an openly
homosexual graduate of Cambridge University. And US Intelligence told Britain that the
spies had been tipped off by another homosexual graduate of Cambridge, an art historian named
Anthony Blunt. And the government panicked and thought that it was probably a homosexual
spy ring of graduates from Cambridge. [laughter] The number of arrests for homosexual activity
spikes. And the day after Queen Elizabeth II is crowned, Alan Turing is arrested. He
is arrested for homosexual activity in the privacy of his home with a consenting adult.
He is found guilty. He’s sentenced to either prison or chemical castration. He chooses the estrogen injections. Over the
next year, he grows breasts. And the day after the 10th Anniversary of the Normandy Invasion
that he helped make possible, Alan Turing commits suicide. Anthony Blunt is later Knighted
and it’s 55 years before the British government, the Prime Minister in this case, apologizes
for its treatment of Turing. Yeah, thank you. Now, after the Second World War, Bayes’ wartime successes were totally classified. And so Bayes’ rule emerged from the Second World War even more suspect than before. The anti-Bayesians still focused on
Thomas Bayes’ starting guess, this outrageously subjective prior. And without any public proof that their method
worked, the Bayesians were stymied. When Jack Good, who had been Turing’s statistical assistant
during the war, gave a talk at the Royal Statistical Society about the theory of the method, no
mention of course of the application, the next speaker’s opening words were “after that
nonsense.” During Senator McCarthy’s witch hunt against
communists in the US Federal government, a Bayesian at the National Bureau of Standards
was called, only half-jokingly, “un-American and undermining the United States’ government.” At Harvard Business School, professors had
developed the very Bayesian decision trees that MBAs use. But they were called ‘Socialists’
and ‘so-called scientists’. And a Swiss visitor to Berkeley’s very Frequentist department,
statistics department in the 1950s, realized that it was “kind of dangerous” to apply these
Bayesian methods. Now, the Cold War military, of course, continued
to use and develop Bayes’ rule. So, they knew it worked, but it was secret. For example, in the 1950s it wrestled with the problem of how you predict the probability of something happening if it’s never happened before. There had never been an accidental H-bomb
explosion. So, the Frequentists said “you couldn’t predict the probability of its ever
happening in the future.” So, a post-doc at Rand Corporation, Albert Madansky, used Bayes’
to warn that Curtis LeMay’s strategic air command–I think you know Curtis LeMay from
the movie ‘Dr. Strangelove’ –that his strategic air command could have produced at least 19
H-bomb accidents a year. And the Kennedy Administration eventually
added safeguards. But there were other secret Cold War projects, too. The National Security
Agency cryptographers used Bayes’ to decode Soviet messages. And an advisor to the National Security Agency
and to the Institute for Defense Analyses used it election nights to predict the winners
of congressional and presidential elections for 20 years, but refused to let anyone say
that he was using Bayes’, apparently, to keep his connection to Bayesian cryptography totally
secret. And the US Navy used Bayes secretly to search for a missing hydrogen bomb in Spain,
for a nuclear submarine, Scorpion, which sank without a trace. And then there’s a classified story, told for the first time in “The Theory That Wouldn’t Die,” of how Bayes’ actually caught a Russian submarine in the Mediterranean and convinced the Navy. Now, as a result, during the years of the
Cold War, Bayes becomes a real flesh and blood story about a small group of maybe a hundred
or more believers struggling for legitimacy and acceptance. And for many years, they concentrated
on theory, trying to make probability and Bayes a respectable branch of mathematics. But many Bayesians of that generation remember
the exact moment when Bayes’ overarching logic descends on them and they talk about it like
an epiphany, where they’re converted. To them, Frequentism looked like a series of ad hoc
techniques, whereas Bayes’ theorem had what Einstein had called “the cosmic religious
feeling.” The reason, of course, was that it was concerned
with a very fundamental scientific issue. As David Spiegelhalter told me, “It was basic. A huge swath of scientists say you can’t use probability to express your lack of knowledge or to describe one-time events that don’t have any frequency to them.” And many scientists find this rather disturbing
because it’s not a process of discovery. It’s more a process of interpretation. So, both
sides were proselytizing their methods as the one and only way to approach statistics.
It was, one statistician told me, a “food fight, a devastating food fight.” And it was one that didn’t subside until late
in the 20th Century. Both sides used these religious terms. When a Bayesian was appointed Chair of an English statistics department, Frequentists called him ‘a Jehovah’s Witness
elected Pope’. [laughter] He, in turn, when asked how to encourage Bayes’, replied tartly, “Attend funerals.” The Frequentists retorted in kind that if Bayesians
would only do what Thomas Bayes had done and publish after they were dead, we’d all be
saved a lot of trouble. [pause] The extraordinary fact though about Bayes’
during the Cold War is that although the military was using Bayes’ and the civilian Bayesians
were under attack, there were very few visible civilian applications. For example, it was
an MIT physicist who used Bayesian methods to do the first nuclear power plant safety
study 20 years after the industry started. He did it in 1973. And he predicted what actually
would happen at Three Mile Island. But he had the big, bad Bayes word hidden in the
appendix of Volume Three of the multi-volume Rasmussen Report. And the only big public
application of Bayes was one using the words in the Federalist Papers as data. Now, the Federalist Papers were a series of
essays that appeared in New York State newspapers to convince New York State voters to vote
for the Constitution. And some of them were anonymous. And Fred Mosteller of Harvard and
David Wallace of the University of Chicago launched a massive Bayesian study and concluded
and convinced everyone that all twelve of the anonymous Federalist Papers were written
by James Madison. But they also came to what they called an
“awesome statistical conclusion.” And that was, they said, that Thomas Bayes’ beginning guess, his controversial, hated subjective prior, was irrelevant if you have a lot of data to
update it with.’ The problem was that Mosteller had to organize an army of about a hundred
Harvard students to punch the data into MIT’s computer center and no one else was willing
to undertake such a mammoth organizational problem. By the 1980s though, another factor was working
against Bayes, too. And that was the computer revolution was flooding the modern world with
enormous amounts of data and with a lot of unknowns. And Laplace’s method had involved
integration of functions and it was hopelessly complex. So, it was beginning to look, even to many
Bayesians, as though Bayes was an old-fashioned, 18th Century theory crying for a computer
and software. But many academic statisticians thought computers were a copout. They’d started
out as abstract mathematicians. Most were Frequentists. They focused on small data sets, relatively
small data sets, with few unknowns. They didn’t need computers. Bayesians themselves during
this period also didn’t realize that the key to making Bayes useful in the workplace was
not more theory, but computational ease. Theorist Dennis Lindley had been programming his own computer since 1965 and actually regarded Bayes’ as ideal for computers. He wrote me, “I
consider it a major mistake of my professional life not to have appreciated the need for
computing rather than mathematical analysis. I should have seen that Bayes enabled one
to compute numerical answers.” It was a particularly poignant case, that of the Canadian mathematician
Keith Hastings, who published in 1970 what should have been a real breakthrough paper
on what’s now called the Metropolis-Hastings algorithm, or simply the Metropolis algorithm. He used Markov chain Monte Carlo sampling
techniques. Published it. Got no reaction at all. A year later, he drops out of research
and goes to teach at the University of British Columbia. And it was not until 20 years
after his work, when he was fully retired, that he realized the importance of what he
had done. And Hastings told me with some anguish in
his voice that his work was ignored because quote “a lot of statisticians were not oriented
toward computing. Statisticians took these theoretical courses, cranked out theoretical
papers, and some of them wanted exact answers, not estimates.” So, as a result, during the 1980s as computers
were pouring out this fascinating new data about pulsars and plate tectonics and evolutionary
biology and pollution in the environment, it often was not analyzed by statisticians.
It was analyzed by computer scientists, by engineers, physicists and biologists. And it would be imaging that would force the
issue because by the late ’70s, early 1980s, industrial automation, the military, and medical
diagnostics were producing blurry images from their ultrasound machines, PET scans, MRIs,
electron micrographs, telescopes, military aircraft, and infrared sensors. And there was Bobby Hunt, who in 1977 finally
suggested that Bayes could be used for image restoration. He had done it while working
on strategic weapons programs and digital image processing at Sandia and Los Alamos
National Labs in New Mexico. During this period, others were introducing
iterations and Monte Carlo. And in 1984, if I can get the next slide. Alan
Gelfand and Adrian Smith–. [pause] Sorry. It’s just a picture of Adrian Smith and Alan
Gelfand. Thank you. [pause] They decide to start something new. And Gelfand, reading
around, discovers the iterations and Gibbs sampling. Adrian Smith on the left, at the University
of Nottingham at the time. Alan Gelfand on the right, at the University of Connecticut at
the time. And the minute Gelfand saw the papers on iteration and Gibbs sampling,
he said all the pieces fell together. Bayes, Gibbs sampling, Monte Carlo, Markov chains,
iterations. And they wrote their watershed synthesis,
now called MCMC, for Markov chain Monte Carlo, very, very fast ’cause they were scared other
people would put the pieces together, too. But they also wrote it very carefully. They
used the word ‘Bayes’ only five times in 12 pages. And Gelfand told me later, “There was always
some concern about using the ‘B’ word, a natural defensiveness on the part of Bayesians in
terms of rocking the boat. We were always an oppressed minority trying to get some recognition.
And even if we thought we were doing it the right way, we were only a small component
of the statistical community and we didn’t have much outreach into the scientific community.” But Bayesians thought the paper was an epiphany
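What Gelfand and Smith saw, that alternating draws from each unknown’s conditional distribution lets a computer walk toward the joint posterior, can be sketched in a few lines (a toy normal model with made-up data, not their actual 1990 example):

```python
import random

random.seed(1)

# Toy data: 100 noisy measurements with true mean 5.0, true sd 2.0
data = [random.gauss(5.0, 2.0) for _ in range(100)]
n = len(data)

mu, sigma2 = 0.0, 1.0          # arbitrary starting guesses
mus, sigma2s = [], []

for step in range(5000):
    # 1) Draw mu from its conditional: Normal(mean of data, sigma2 / n)
    xbar = sum(data) / n
    mu = random.gauss(xbar, (sigma2 / n) ** 0.5)

    # 2) Draw sigma2 from its conditional: Inverse-Gamma(n/2, sum of squares / 2),
    #    sampled here as 1 / Gamma(shape = n/2, scale = 2 / sum of squares)
    ss = sum((x - mu) ** 2 for x in data)
    sigma2 = 1.0 / random.gammavariate(n / 2.0, 2.0 / ss)

    if step >= 1000:            # discard burn-in draws
        mus.append(mu)
        sigma2s.append(sigma2)

print(sum(mus) / len(mus))          # posterior mean of mu, near 5
print(sum(sigma2s) / len(sigma2s))  # posterior mean of sigma2, near 4
```

Each pass conditions on the latest draw of the other unknown; after a burn-in period the retained draws behave like samples from the posterior, which is exactly the computational ease the theory had been waiting for.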
and the next ten years is, what I call, a frenzy of research, solving, calculating problems
that for two and a half centuries had only been dreams. Gelfand, of course, says that
they were lucky because the relatively inexpensive powerful workstations became available at
the same time. Then Smith’s student, David Spiegelhalter,
came out with his BUGS software, BUGS standing for Bayesian inference Using Gibbs Sampling.
It was off the shelf software. He comes out with the first one in 1991 and it’s BUGS that
causes the biggest single jump in Bayesian popularity. And it sends Bayes out into the scientific
and technological world, where outsiders from computer science, from physics, from artificial
intelligence, refresh it, broaden it, secularize it, de-politicize it and it gets adopted almost
overnight. It was a modern paradigm shift for a very pragmatic age. It happened overnight, not because people
changed their minds about a philosophy of science, but because finally it worked. The
battle between Bayesians and Frequentists subsided. Researchers could adopt the
method they thought fit their needs best. Prominent Frequentists moderated their positions.
Bradley Efron, a National Medal of Science recipient who had written a classic defense
of Frequentism, recently said, “I’ve always been a Bayesian.” [laughter] And someone else, who once called Bayes “the
crack-cocaine of statistics [laughter] –seductive, addictive, and ultimately destructive” began
recruiting Bayesian interns for Google. Thank you. [applause] Now I’d be happy to try to answer some questions,
but it would be very kind of you if you’d use the microphone so people could hear the
questions. And I wouldn’t have to maul them, summarizing them.>>MALE AUDIENCE MEMBER #1: So what was the
Neolithic discovery?>>Sharon Bertsch McGrayne: It was that the–.
It had happened all of a sudden, within a short time period five thousand five hundred
years ago. And that instead of being used for rites over the ages and built over
the ages, that they’d actually been built and abandoned in this very short time period. And they’re so excited by this Bayesian mega-analysis
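The updating at the heart of such a mega-analysis can be shown in miniature (an invented toy with made-up numbers, not the actual Neolithic model): each noisy date estimate updates a Gaussian belief, the interval narrows as estimates accumulate, and the vague starting prior is quickly swamped by the data.

```python
# Start from a deliberately vague prior over the true date (in years ago)
prior_mean, prior_var = 6000.0, 500.0 ** 2

def update(mean, var, obs, obs_var):
    """Combine a Gaussian prior with one Gaussian measurement;
    the posterior precision is the sum of the two precisions."""
    precision = 1.0 / var + 1.0 / obs_var
    new_var = 1.0 / precision
    new_mean = new_var * (mean / var + obs / obs_var)
    return new_mean, new_var

# Five hypothetical date estimates, each with a 100-year standard error
for obs in [5480.0, 5520.0, 5490.0, 5510.0, 5500.0]:
    prior_mean, prior_var = update(prior_mean, prior_var, obs, 100.0 ** 2)

print(round(prior_mean))        # about 5504: the data swamp the 6000 prior
print(round(prior_var ** 0.5))  # about 45: far tighter than the 500-year prior
```

The posterior ends up centered on the measurements, not the prior, and its uncertainty shrinks with every estimate folded in, which is the sense in which a cluster of dates can pin an event to a short window.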
that they’ve done that they’re gonna try another of the early Anglo-Saxon period in Great Britain
next. Yes.>>MALE AUDIENCE MEMBER #2: OK. So–.>>Sharon Bertsch McGrayne: Now remember I’m
not a mathematician. [laughter]>>MALE AUDIENCE MEMBER #2: So, my training
actually is mathematics and we have our controversies: we have what’s called the axiom of choice
which is about the behavior of infinite sets and the continuum hypothesis and all these
other sort of questioned axioms. And if you assume not the axiom of choice,
you get one mathematics. If you assume it, you get a different mathematics. But mathematicians
will all agree that these are just models. These are abstract ideas. It’s not that this
axiom is true or not true ’cause it’s formalisms. In the real world, you can’t even substantiate
an infinite set anyway. How come with the statistical community, ’cause in the math
community it was like, whether it’s useful. It’s like, you can extend math this way or
that way. Which one is more useful as a model? Why weren’t they able to say, “OK, Bayesianism
might not make sense in some platonic way.” Or it might, there’s selecting the priors,
which is more of an art than a science. But just based on the fact that it’s useful, why
weren’t people able to go with it? [pause]>>Sharon Bertsch McGrayne: It’s very odd because
a whole class of people, physicists, did use Bayes’. And they used it even before the Second
World War. Fermi used to do computations if he couldn’t sleep at night. He’d do Bayesian
computations. [laughter] And then he’d come in the next morning and
announce how the experiments of the day were probably gonna come out. Well, we’re not all
Fermis, but–. And during the war, Los Alamos used them. And after the war, the physicists actually
tried to get statisticians to, to use them. There were NATO conferences and so on. So,
it’s very puzzling. There were articles that told statisticians you can do it manually.
There were non-statisticians. There were historians in the ’60s using computers. I think they were trapped. They were so defensive.
They were such a small group and they were under attack. That’s not a good way to be,
mentally. You can’t burst out if you’re under– if your fortress is under attack. That’s the
only thing I’ve been able to conclude.>>MALE AUDIENCE MEMBER #3: So, you mentioned
in your book, but not in the talk, Lindley’s paradox and this research at Princeton
about psychokinetic random number generators: according to a Frequentist analysis, it was
statistically significant that people had random number generating abilities, but Bayesians
said that this is silly and the same study proves that it doesn’t exist. Would you go into that more? Like, the study
was cooked, then. Bad. Garbage in, garbage out. But was this real data? [pause]>>Sharon Bertsch McGrayne: Was it real data?
Well, I guess he had his machines recording it, but I think the statisticians don’t regard
it as real data at all. The same question came up just recently of–come on,
what’s it called–ESP. And a Frequentist analysis said that ESP could work. It was something routed in the last year,
I think. So, it’s a long-standing problem, but Dennis Lindley I think cooked that guy’s
data and said it was garbage. Garbage in and garbage out. [laughter] Yes, there’s some papers about it and they’re
in the bibliography.>>MALE AUDIENCE MEMBER #4: To follow up on
that, there’s an article on this very subject in the latest issue of the Skeptical Inquirer.>>Sharon Bertsch McGrayne: Good. Good.>>MALE AUDIENCE MEMBER #4: I’ll try to find
my copy and bring it in. I just wanna remark about history. I’ve been in computing since
1964 and when I was at Cornell, there was a tremendous, there was actually–. Oh, my
phone’s going crazy. There were actual random walk problems in
the textbook, the programming textbook, and the social scientists were just generating
reports this big. I don’t exaggerate. I hauled them around, of computer analysis, factans,
and xtabs and all kinds of crazy stuff. So, I think there’s this whole background
of — that was going on underneath this controversy that was preparing us for it.>>Sharon Bertsch McGrayne: Howard Raiffa gives
his–. I was in a totally different kind of meeting and someone asked what I was working
on and I said Bayes’ rule. And he said, “Oh, Bayes’ rule? Howard Raiffa taught Bayes’ rule
at Harvard Business School in the 1960s.” And he dug out of his notes this reference,
too. It’s 60 or 80 pages on Bayes, Markov chains and so on. So, you’re absolutely right.
So it’s such a puzzle why–. [pause] I really think it must have been the fortress mentality
that — it’s just hard to break out of that. If there aren’t any more questions, I would
be happy to sign anyone’s books if they want me to.>>Male Presenter: Thank you very much. [applause]
