>>Male Presenter: Hello everyone. And welcome

to Sharon Bertsch McGrayne’s talk on “The Theory That Would Not Die.” When I got the

announcement of the book and I asked for a copy of it, I said, “This looks like a very

Googley book.” Bayes’ theorem is used all over the place and when I asked you guys if

you’re interested, it was either the first or the second most popular book that I’ve

ever put up. So, it looks like we’ve had a very good turnout

given that everyone here probably knows more about Bayes’ theorem than I do. And I know

a fair bit. I will turn it right over to our speaker. Welcome. [applause]
>>Sharon Bertsch McGrayne: Thank you. Well,

thank you very much for inviting me and thank you all for coming. I wanna start right out

with some truth in advertising and tell you that I’m not a computer scientist. I’m not

a statistician, a scientist, or a mathematician. I come to you from newspaper reporting and

from science writing. However, I became intrigued with Bayes rule seven or eight years ago when

I could Google the word “Bayesian” and get fewer than a hundred thousand hits. And last

week, when I went on and I Googled “Bayesian,” I got eleven million hits. So, today I wanna talk to you about how you

all are real revolutionaries. You’re participants in a remarkable–almost overnight–revolution

about a very fundamental scientific issue–how you deal with evidence, how you deal with

data, how you evaluate the evidence and measure the uncertainties involved, update it as new

knowledge arises and then hopefully, change minds in light of the new data. Now, usually when I talk about Bayes’ Rule,

I start with a long list of examples of where Bayes is used and I do not think I need to

do that with this crowd. Google often uses naive Bayes and other Bayesian methods and

recently, Google’s Bayesian driverless car got headlines all over the world. And when I wrote a short piece about it for

Scientific American, it was one of the most popular articles in that issue. So, it was

well-known. But there are a couple of examples that I’d like to bring up today that you might

not know so much about. [pause] The first is the Air France jet Flight 447 that took

off two years ago last month from Rio de Janeiro bound for Paris overnight. It went into a high-altitude, intense electrical

storm and disappeared without a trace with 228 people aboard. The world’s most high-tech

naval search ever lasted almost two years without success. They were searching

for the wreckage and for these two black boxes, which as you can see are actually red and

white. They’re the size of shoe boxes and they were

searching in mountainous terrain two and a half miles down under the ocean in an area

the size of Switzerland. They had no success. Last winter, the French government hired some

of the same people who appear in “The Theory That Wouldn’t Die” who developed Bayesian

Naval search methods. And their software calculated the most probable

sites for the wreckage, which was found in April after an undersea search of one week.
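
A sketch of the Bayesian search idea behind that result: each unsuccessful search of a cell lowers that cell’s probability and raises everyone else’s, via Bayes’ rule. The grid, prior, and detection probability below are invented for illustration, not the actual AF 447 analysis.

```python
# Bayesian search sketch: a prior over grid cells is renormalized after
# each failed search of a cell. All numbers are illustrative.

def update_after_failed_search(prior, searched_cell, p_detect):
    """Posterior over cells given the wreck was NOT found in searched_cell.

    P(cell | miss) is proportional to P(miss | cell) * P(cell), where
    P(miss | cell=searched) = 1 - p_detect and P(miss | other cell) = 1.
    """
    posterior = dict(prior)
    posterior[searched_cell] *= (1 - p_detect)
    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}

# Hypothetical prior: cell A looks most promising before any search.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}

# Search cell A with an 80% chance of detecting the wreck if it is there.
posterior = update_after_failed_search(prior, "A", p_detect=0.8)

# After the miss, belief shifts toward the unsearched cells.
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))  # prints: B 0.5
```

Repeating this update after every sortie is what lets the software keep pointing at the currently most probable site.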

OK? Now, about a month ago, the agency in charge of British archaeological sites announced

that a wonderful Bayesian program had shown them that most of what they’d known about

the Neolithic era in Britain was, in their words, bollocks. [laughter] Neolithic people built these strange hilltop

enclosures surrounded by concentric circles of ditches. And they had not built them gradually over

the ages. They built and abandoned them, often within the space of one generation. And most

of them had been built during a building spree roughly five thousand, five hundred years

ago. Now, the remarkable thing to me about these

two examples is that the British and the French governments were saying how wonderfully Bayes’

had worked. And as we’re gonna see today, a lot of people didn’t even dare mention the

word ‘Bayes’ for decades in the 20th Century. So, to understand why you all are such revolutionaries

we’re gonna have to go back to the beginning. And given the time constraints, I’m gonna

race through the beginning until we get to the Second World War, when I’ll slow down

some. But I hope we’re gonna see two big patterns emerging. First, Bayes becomes an extreme example of

the gap between academia and the real world. And second, military super-secrecy during

the Second World War and the Cold War had a profound effect on Bayes. Now, Bayes’ rule,

of course, is named for the Reverend Thomas Bayes, a wealthy Presbyterian minister and

amateur mathematician who lived in a chic resort outside of London in the early 1700s. We know very little about him. I’m not gonna

show you his portrait because it’s indubitably of someone else. We don’t know his birthdate.

Wikipedia has his death date wrong. But we do have something personal about Bayes. And

that is his handwriting, written with a goose quill, in Latin. OK? And we know another thing about him, excuse

me. This, this handwriting comes from the Institute and Faculty of Actuaries in London.

And we do know something else about Bayes and that is that he discovered his theorem

during the 1740s when Europe was racked by a religious controversy. The issue is not unknown today. It was whether

or not we can use evidence about the natural world around us to make rational conclusions

about the existence of God. The debate, in Bayes’ generation, was not about God the creator,

but about God the cause, God the primary cause,

the first cause. We do not know that Thomas Bayes wanted to

prove the existence of God, but we do know that Bayes tried to deal with the issue of

cause and effect, mathematically. And in so doing, of course, he produced a simple one-line

theorem for modifying our initial beliefs. And he actually called it a guess. He used

the word “guess” and he said if nothing else works, start with 50-50 odds that it

works and modify this guess with objective new information and get a new and improved

belief, which in turn carries with it a commitment to update that belief each time a new piece

of information arrives. But Bayes didn’t believe in his theorem enough

to publish it. And he dies ten or fifteen years later with it filed away in his notebook.
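
In modern notation, the cycle she describes (a 50-50 starting guess, updated by each new piece of evidence, with each posterior becoming the next prior) can be sketched in a few lines; the likelihoods here are invented for illustration:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) via Bayes' rule."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Start, as Bayes suggested, with 50-50 odds.
belief = 0.5

# Each observation is assumed twice as likely if the hypothesis is true
# (hypothetical likelihoods); the posterior becomes the next prior.
for _ in range(3):
    belief = bayes_update(belief, 0.8, 0.4)

print(round(belief, 3))  # prints: 0.889
```

Three pieces of modestly favorable evidence have moved the initial guess from even odds to roughly 8-to-1 in favor.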

Now, going through Bayes’ papers, a friend of his, Richard Price, who was another Presbyterian

minister and an amateur mathematician, decides that the theorem will help prove the existence

of God the cause. Now, unlike Bayes, Richard Price was famous

in his day. This is a royal society portrait done by a famous painter, Benjamin West. He

later becomes a famous supporter of the American Revolution, a friend of our Founding Fathers,

and a founder of the insurance industry. And he spends the next two years, off and

on, editing Bayes’ theorem, and gets it published. Unfortunately, it appeared in a journal read primarily by

the British gentry and not by continental mathematicians, and it sank out

of view and was neglected. But certainly by today’s standards, Richard Price would be

considered Thomas Bayes’ co-author. If, however, there were justice in this world,

Bayes’ rule should be named for someone else entirely. And that is the great French mathematician,

Pierre-Simon Laplace, who’s better known today for the Laplace transform. Now, as a young

man of 25, Laplace discovers the rule independently of Bayes in 1774 and calls it the “Probability

of Causes”. Now, Laplace, also unlike Bayes, was the quintessential

scientific researcher. He mathematized every science known to his day and he spends the

next 20 years, off and on, in the midst of this enormous career, developing what we call

Bayes’ rule into the form that’s used today and actually used it. But when Laplace dies in 1827, the Western

world begins an almost manic fad of collecting precise and objective facts. There were clubs

that collected them. Even women could do it. Some of the famous numbers were the chest

sizes of Scottish soldiers, the number of Prussian officers who were killed by kicking

horses, [laughter] the number of victims of cholera. And with lots of these precise numbers at

their disposal, any up-to-date statistician rejected Bayes’ rule. They preferred to judge

the probability of an event by the frequency that it occurred–nine times out of ten, three

out of four, and so on. And eventually they will become known as the Frequentists. And the Frequentists become the great opponents

of Bayes’ rule up until quite recently because for them, modern science requires both objectivity

and precise answers. And Bayes, of course, calls for a measure of belief and approximations

and the Frequentists called that quote “subjectivity run amuck,” “ignorance coined into science.” By the 1920s, they were calling it, saying

that Bayes “smacked of astrology, of alchemy.” And another said, “We use Bayes formula with

a sigh as the only thing available under the circumstances.” Now, the surprising thing

that I discovered in all of this is that the theorists and the philosophers denounced Bayes’

rule as subjective. People who had to deal with real world emergencies,

who had to make one-time decisions based on scanty data, they kept right on using Bayes’

rule because they had to make do with what they had. So, for example, Bayes’ rule helped

free Dreyfus from the treason trials in the 1890s in France. Artillery officers in France, in Russia, and

the US, tested their ammunition and aimed their fire using Bayesian tables in both World

Wars. The Bell telephone system survived the 1907 financial panic and the US insurance

industry started our first and only–for many years–social insurance, workers compensation

insurance, almost overnight at the time of the First World War despite having few facts

about the safety of particular industries or particular businesses. Now, every good book needs a villain. And

the villain of our piece is Ronald Fisher. Despite Bayes’ usefulness, Ronald Fisher started

attacking Bayes in the 1920s and ’30s and theoreticians’ attitudes about Bayes changed

from tepid toleration to outright hostility. One of the reasons is that Ronald Fisher was

a giant. He was a superb geneticist. He founded modern statistics for scientific work, randomization

methods, sampling theory, experimental design methods–these are all some of Fisher’s great

achievements. But unfortunately for Bayes, the thing that Fisher hated most was Bayes’

rule. He was a fervent eugenicist with an explosive

temper and a remarkable inability to understand other people. For example, if Fisher got bored

at a meeting, he might pull out his false teeth and clean them in public. [laughter] He interpreted scientific and statistical

questions as personal attacks. And his life became, a colleague said, “a sequence

of scientific fights, often several at a time at scientific meetings and in scientific papers

and he hated Bayes’ rule the most.” He didn’t need Bayes. He didn’t work with great amounts of uncertainty.

He was a eugenicist and he filled his house with cats and dogs and thousands of mice for

cross-breeding experiments and he could trace their genealogy back for generations, precisely.

His experiments were repeatable and they produced precise answers. And Fisher calls Bayes’ approximations and

measures of belief “an impenetrable jungle, perhaps the only mistake to which the mathematical

world has so deeply committed itself. It’s founded on an error and must be wholly rejected.”

And he kept up this very personal fight against Bayes for 40 years, into the 1950s when a

lone Bayesian at NIH was showing that smoking cigarettes caused lung cancer. Now, Fisher was a chain smoker. He even swam

while smoking. [laughter] And he was a paid consultant to the tobacco

industry. And he proposed, believe it or not, not that smoking caused lung cancer, but that

lung cancer probably caused smoking. OK? So, Fisher’s stature and his utter inability

to discuss Bayes’ rationally delayed its development for decades. And I think we have to say that

Fisher is an example of how a destructive personality can affect a field, particularly

a small field. Now, I wanna switch gears a bit and dwell on the personal history of Alan

Turing. First, because he’s a hero of mine and second

because it illustrates how Bayes’ worked as a pencil and paper method, as one of the earliest

computer techniques, and as an illustration of the effect of government secrecy. This,

of course, is Alan Turing. [pause] Now by the time the Second World War began, as we’ve

seen, Bayes was virtually taboo as far as sophisticated statisticians were concerned. Fortunately, Turing was not a statistician.

He was a mathematician. And besides fathering the modern computer, computer science, software,

artificial intelligence, the Turing machine, the Turing test, he will father the modern

Bayesian revival. Now, it’s also important to remember that during the Second World War,

England was cut off from the agricultural produce and supplies of the continent, particularly

of France and could feed only one in three of its residents. And it depended on convoys

of unarmed merchant marine ships bringing in each year 30 million tons of food and supplies

from North and South America and from Africa. Now, Hitler said he thought that the U-boats

and their attacks on these convoys would win the war. And Churchill said later “the only thing I

was really scared of during the war were the U-boats” because of the attacks on the supply

lines. And in fact, German U-boats did sink almost 3000 Allied ships and killed more than

50,000 merchant seamen. Now, the German Navy ordered the U-boats around

the Atlantic Ocean via radio messages that were encrypted with word scrambling machines

called ‘Enigmas’. The German government bought the Enigmas during the ’20s and distributed

them to all of its different agencies. So, the Navy had one set of codes and could

develop their own complexities and their own security controls. The Army had another. The

foreign service, the Italians, the Spanish nationalists and so on. The German railways

had one. And the most complex of all was that operated by the German Navy. And this is a German naval Enigma. They’re

hard to sort out in pictures, but this one comes from Frode Weierud’s website called

‘CryptoCellar’. Now, an Enigma machine, as you can see, looks like a complex, sturdy

typewriter. It had wiring coming out of the very bottom. It had wheels. This one has four wheels, but

they started with three and added more later. It had starting places. It had code books.

And it had many other features that could be changed within hours if necessary, and

that could produce millions upon millions of permutations. And no one, German or British, thought that

the British could ever read those messages. So, this is one of the Enigma machines

that Alan Turing will use Bayes’ to conquer. Now, on September 4th, 1939, Turing goes to–.

He’s been working on the Enigma all summer by himself. But he’s ordered the day after the war is

declared to go to Bletchley Park, which is the British super-secret center for decryption

efforts just north of London. When he arrives at Bletchley Park he was 27. He looked 16.

He was handsome and athletic. His mother sent proper business suits for him to wear to work. He preferred shabby sports coats. He had a

five o’clock shadow. Sometimes, his fingernails were dirty. And he will spend the next six

years working on coding and decryption efforts. When he arrives at Bletchley Park, no one

is working on the all-important Navy code, the one that will control whether or

not supplies can reach Great Britain. Turing, however, liked to work alone. And

after a few weeks, he announced that no one was doing anything about it and “I could have

it to myself.” And he goes up into the loft of one of the stable buildings at Bletchley

Park and stays there during lunches and breaks and the women on the staff organize a pulley

to take up baskets of food for him for lunch and tea. And if you don’t mind, I’ll read just

a bit from “The Theory That Wouldn’t Die.” [reads from book] “Late one night, soon after joining Bletchley

Park, Turing invented a manual method for reducing the number of tests that would be

needed to determine the settings for those wheels. It was a highly labor intensive Bayesian

system that he nicknamed ‘Banburismus,’ for the nearby town of Banbury where needed supplies

were printed. He wrote later during the war that he was not

sure it would work in practice, but if it did it would let him guess a stretch of letters

in an Enigma machine, hedge his bets, measure his belief in their validity by using Bayesian

methods to assess their probabilities and add more clues as they arrived.’ If it worked, it would identify settings for

two of Enigma’s wheels and reduce the number of wheel settings to be tested from 336 to

as few as 18.” And at a time, obviously, when every hour

counted, the difference could save sailors’ lives. “So Turing and his slowly growing staff begin

to comb intelligence reports to collect what they called ‘cribs’, which were the German

words that they predicted would occur in the original uncoded German message. The first

cribs came primarily from German weather reports because they were standardized and repeated

often. Weather for the night, situation Eastern channel.

And as one blessed fool radioed each night, ‘Beacons lit as ordered.’ Reports from British

meteorologists about weather in the English Channel provided more clues, more hunches.

And in a fundamental breakthrough, Turing realized that he couldn’t systematize his

hunches or compare his hypotheses’ probabilities without a unit of measurement. And he named his unit the ‘ban’, for Banburismus.
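
In modern terms, a ban is the base-10 logarithm of an odds ratio: one ban is a tenfold shift in the odds, a deciban a tenth of that, so the weights of independent clues can simply be added. A sketch with invented likelihood ratios, not Bletchley’s actual tables:

```python
import math

def decibans(likelihood_ratio):
    """Weight of evidence in decibans: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)

# One ban = odds of ten to one in favor of a guess.
assert decibans(10) == 10.0  # 10 decibans = 1 ban

# Hypothetical clues, each a likelihood ratio for the current guess;
# in log units their weights simply add up.
clues = [2.0, 1.5, 3.0]
total_db = sum(decibans(lr) for lr in clues)

# Convert accumulated decibans back to odds in favor of the guess.
odds = 10 ** (total_db / 10)
print(round(total_db, 1), round(odds, 1))  # prints: 9.5 9.0
```

Working in additive log units is what made the method practical for clerks with pencil and paper: scoring a clue meant looking up a weight in a table and adding it to a running total.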

And he defined it as quote ‘about the smallest change of weight in evidence that is directly

perceptible to human intuition.’ End quote. One Ban represented odds of ten to one in

favor of a guess. But Turing normally dealt with much smaller quantities, deciBans and

even centiBans. The Ban was basically the same as the bit,

the measure of information that Claude Shannon discovered by using Bayes’ rule at roughly

the same time at Bell Telephone Laboratories. Turing’s measure of belief, the Ban, and its

supporting mathematical framework had been called his greatest intellectual contribution

to Britain’s defense. Using Bayes’ rule and Bans, Turing began calculating

credibility values for various kinds of hunches and compiling reference tables of Bans for

technicians to use. It was a statistics-based technique that produced no absolute certainties,

but when the odds of a hypothesis added up to 50 to one, cryptanalysts could be close

to certain that they were right.” Now, Turing was obviously developing a home

grown Bayesian system. No one knows where he got it, whether he discovered it and developed

it on his own, with his assistant Jack Good, or whether he knew about the rudiments of

it from the lone defender of Bayes’ at Cambridge University during the pre-war period, Harold

Jeffreys, who used Bayes’ for earthquake and tsunami research to find the epicenters

of the earthquakes. Now, within a year and a half of the war starting,

by June of 1941, Turing could read the U-boat messages within an hour of arrival at Bletchley

Park. And the British could reroute the convoys around the U-boats and for almost a month

that summer no convoy, no ship is attacked by a U-boat. By the fall, however, the fall of 1941, Banburismus

is critically short of typists and junior clerks, otherwise known as “girl power”. Turing

and the other decoders wrote a personal letter to Churchill; one of them delivered it to Downing Street.

And Churchill responded immediately. Among the help

that was attempted, Ian Fleming, of James Bond fame, plans an elaborate raid to capture

code books for Turing. Fortunately, it’s so elaborate, it’s better for a novel than–.

And it was cancelled. [laughter] The Navy collected code books for Turing from

sinking German ships, and two young men lost their lives collecting those code books.

But breaking Enigma–. If we could actually have that one

back. I think. [pause] Whoop. Not there. Never mind. But eventually breaking the Enigma codes becomes

routine, like a factory at Bletchley Park. However, [pause] let’s go back to Turing if

we can. There we go. Thank you. But shortly after Germany attacks Russia in June of 1941,

the German army started using a vastly more complex code and word scrambling machine called

the ‘Lorenz’. These were ultra-secret codes and the Supreme

Command in Berlin relies on these new Lorenz codes to communicate high-level strategy to

high-level army commanders around Europe. And some of the messages were so important

that Hitler himself signed them. [reads from book] “And a group of Britain’s leading mathematicians

begin a year of desperate search. They used Bayes’ rule, logic, statistics, Boolean algebra,

and electronics. And they also began work on designing and building the first of

ten Colossi, the world’s first large scale electronic computers. Turing invented a highly Bayesian method known

as ‘Turingery’, or ‘Turingismus’. It was a paper and pencil method again. The first step

was to make a guess and assume, as Bayes had suggested, that it had a 50-50 chance

of being correct. Add more clues, some good and some bad. And as one of the decoders described it ‘with

patience, luck, a lot of rubbing out and a lot of cycling back and forth, the plain text

appeared.’ Now, the engineer who built the Colossi, Thomas

Flowers, had strict orders to have the second model of the Colossus ready by June 1, 1944.

He was not told why, but his team worked, he said, so hard until they thought their

eyeballs would drop out. They get it ready by June 1. And on June 5,

there’s a message that they receive from Hitler that he’s sending to his army commander in

Normandy named General Erwin Rommel. And Hitler tells Rommel, if there is an invasion of Normandy,

it will be a diversionary tactic and do nothing for five days. The real invasion will occur later, somewhere

else. The message is decrypted at Bletchley Park. A courier takes it to General Eisenhower

where he and his staff are determining when to launch the invasion of Normandy. A courier

gives the piece of paper to Eisenhower. We know this from Thomas Flowers, by the way.

He gives the paper to Eisenhower, who reads it. He cannot tell anything, even to his top

staff about Bletchley Park and the decoding efforts. He gives the paper back to the courier

and he turns to his staff and says, “We go tomorrow morning.” June 6, 1944. And later, Eisenhower says

that the decoding at Bletchley Park shortened the war in Europe by two years.

Now, a few days after Germany’s surrender in May of 1945, Churchill makes

a surprising and still shocking move. Everything that showed that decoding had helped

to win the Second World War was to be ultra-secret. No one could reveal anything about what they’d

done at Bletchley Park. And Turing, of course, could not be mentioned. And the Colossi were

to be destroyed, cut into unidentifiable pieces,

except for the last two most complex ones. [crash] Bombing. [laughter] Except for the two most–, the latest ones,

the most complex ones, which Britain apparently used during the Cold War to decode Soviet

messages. But I think [sound of truck backing up] we have to think that without Churchill’s

orders, it might have been Great Britain that became the leader of the 20th Century computer

revolution. Now, after the war, Turing was working on

computers and other projects. No one knew what he had done at Bletchley Park, when two

spies flee from Britain [door slam] to Moscow. They had been spying for the Soviet Union

all during the war. And they fled in 1950. One was named Guy Burgess and he was an openly

homosexual graduate of Cambridge University. And US Intelligence told Britain that the

spies had been tipped off by another homosexual graduate of Cambridge, an art historian named

Anthony Blunt. And the government panicked and thought that it was probably a homosexual

spy ring of graduates from Cambridge. [laughter] The number of arrests for homosexual activity

spikes. And the day after Queen Elizabeth II is crowned, Alan Turing is arrested. He

is arrested for homosexual activity in the privacy of his home with a consenting adult.

He is found guilty. He’s sentenced to either prison or chemical castration. He chooses the estrogen injections. Over the

next year, he grows breasts. And the day after the 10th Anniversary of the Normandy Invasion

that he helped make possible, Alan Turing commits suicide. Anthony Blunt is later knighted

and it’s 55 years before the British government, the Prime Minister in this case, apologizes

for its treatment of Turing. Now, after the Second

World War, Bayes’ wartime successes were totally classified. And Bayes’ rule emerged from

the Second World War, even more suspect than before. The anti-Bayesians still focused on

Thomas Bayes’ starting guess, this outrageously subjective prior. And without any public proof that their method

worked, the Bayesians were stymied. When Jack Good, who had been Turing’s statistical assistant

during the war, gave a talk at the Royal Statistical Society about the theory of the method, no

mention of course of the application, the next speaker’s opening words were “after that

nonsense.” During Senator McCarthy’s witch hunt against

communists in the US Federal government, a Bayesian at the National Bureau of Standards

was called, only half-jokingly, “un-American and undermining the United States’ government.” At Harvard Business School, professors had

developed the very Bayesian decision trees that MBAs use. But they were called ‘Socialists’

and ‘so-called scientists’. And a Swiss visitor to Berkeley’s very Frequentist

statistics department in the 1950s realized that it was “kind of dangerous” to apply these

Bayesian methods. Now, the Cold War military, of course, continued

to use and develop Bayes’ rule. So, they knew it worked, but it was secret. For example,

in the 1950s they wrestled with the problem of how you predict the probability of something’s

happening if it’s never happened before. There had never been an accidental H-bomb

explosion. So, the Frequentists said “you couldn’t predict the probability of its ever

happening in the future.” So, a post-doc at Rand Corporation, Albert Madansky, used Bayes’

to warn that Curtis LeMay’s Strategic Air Command–I think you know Curtis LeMay from

the movie ‘Dr. Strangelove’–that his Strategic Air Command could have produced at least 19

H-bomb accidents a year. And the Kennedy Administration eventually

added safeguards. But there were other secret Cold War projects, too. The National Security

Agency cryptographers used Bayes’ to decode Soviet messages. And an advisor to the National Security Agency

and to the Institute for Defense Analyses used it on election nights to predict the winners

of congressional and presidential elections for 20 years, but refused to let anyone say

that he was using Bayes’, apparently, to keep his connection to Bayesian cryptography totally

secret. And the US Navy used Bayes secretly to search for a missing hydrogen bomb in Spain,

for a nuclear submarine, Scorpion, which sank without a trace. And then, in a classified

story that’s told for the first time in “The Theory That Wouldn’t Die,” Bayes’ actually

caught a Russian submarine in the Mediterranean and convinced the Navy that it worked. Now, as a result, during the years of the

Cold War, Bayes becomes a real flesh and blood story about a small group of maybe a hundred

or more believers struggling for legitimacy and acceptance. And for many years, they concentrated

on theory, trying to make probability and Bayes a respectable branch of mathematics. But many Bayesians of that generation remember

the exact moment when Bayes’ overarching logic descends on them and they talk about it like

an epiphany, where they’re converted. To them, Frequentism looked like a series of ad hoc

techniques, whereas Bayes’ theorem had what Einstein had called “the cosmic religious

feeling.” The reason, of course, was that it was concerned

with a very fundamental scientific issue. As David Spiegelhalter told me, “It was basic.

A huge swathe of scientists say you can’t use probability to express your lack of knowledge

or to describe one-time events that don’t have any frequency to them.” And many scientists find this rather disturbing

because it’s not a process of discovery. It’s more a process of interpretation. So, both

sides were proselytizing their methods as the one and only way to approach statistics.

It was, one statistician told me, a “food fight, a devastating food fight.” And it was one that didn’t subside until late

in the 20th Century. Both sides used these religious terms. When a Bayesian was appointed

Chair of an English statistics department, Frequentists called him ‘a Jehovah’s Witness

elected Pope’. [laughter] He, in turn, when asked how to encourage Bayes’,

replied tartly, “Attend funerals.” The Frequentists retorted in kind that if Bayesians

would only do what Thomas Bayes had done and publish after they were dead, we’d all be

saved a lot of trouble. [pause] The extraordinary fact though about Bayes’

during the Cold War is that although the military was using Bayes’ and the civilian Bayesians

were under attack, there were very few visible civilian applications. For example, it was

an MIT physicist who used Bayesian methods to do the first nuclear power plant safety

study 20 years after the industry started. He did it in 1973. And he predicted what actually

would happen at Three Mile Island. But he had the big, bad Bayes word hidden in the

appendix of Volume Three of the multi-volume Rasmussen Report. And the only big public

application of Bayes was one using the words in the Federalist Papers as data. Now, the Federalist Papers were a series of

essays that appeared in New York State newspapers to convince New York State voters to vote

for the Constitution. And some of them were anonymous. And Fred Mosteller of Harvard and

David Wallace of the University of Chicago launched a massive Bayesian study and concluded

and convinced everyone that all twelve of the anonymous Federalist Papers were written

by James Madison. But they also came to what they called an

“awesome statistical conclusion.” And that was, they said, that Thomas Bayes’ beginning

guess, his controversial, hated subjective prior, was irrelevant if you have a lot of data to

update it with.’ The problem was that Mosteller had to organize an army of about a hundred

Harvard students to punch the data into MIT’s computer center and no one else was willing

to undertake such a mammoth organizational problem. By the 1980s though, another factor was working

against Bayes, too. And that was that the computer revolution was flooding the modern world with

enormous amounts of data and with a lot of unknowns. And Laplace’s method had involved

integration of functions and it was hopelessly complex. So, it was beginning to look, even to many

Bayesians, as though Bayes was an old-fashioned, 18th Century theory crying for a computer

and software. But many academic statisticians thought computers were a copout. They’d started

out as abstract mathematicians. Most were Frequentists. They focused on small data sets, relatively

small data sets, with few unknowns. They didn’t need computers. Bayesians themselves during

this period also didn’t realize that the key to making Bayes useful in the workplace was

not more theory, but computational ease. Theorist Dennis Lindley had been programming

his own computer since 1965 and actually regarded Bayes as ideal for computers. He wrote me, “I

consider it a major mistake of my professional life not to have appreciated the need for

computing rather than mathematical analysis. I should have seen that Bayes enabled one

to compute numerical answers.” There was the particularly poignant case of the Canadian mathematician

Keith Hastings, who published in 1970 what should have been a real breakthrough paper,

what’s now called the Metropolis-Hastings algorithm, or simply the Metropolis algorithm. He used Markov chain Monte Carlo sampling

techniques. Published it. Got no reaction at all. A year later, drops out of research

and goes to teach at the University of British Columbia. And it was not until 20 years after

his work, when he was fully retired, that he realized the importance of what he

had done. And Hastings told me with some anguish in

his voice that his work was ignored because quote “a lot of statisticians were not oriented

toward computing. Statisticians took these theoretical courses, cranked out theoretical

papers, and some of them wanted exact answers, not estimates.” So, as a result, during the 1980s, as computers

were pouring out this fascinating new data about pulsars and plate tectonics and evolutionary

biology and pollution in the environment, it was often not analyzed by statisticians.

It was analyzed by computer scientists, by engineers, physicists and biologists. And it would be imaging that would force the

issue because by the late ’70s, early 1980s, industrial automation, the military, and medical

diagnostics were producing blurry images from ultrasound machines, PET scans, MRIs,

electron micrographs, telescopes, military aircraft, and infrared sensors. And there was Bobby Hunt, who in 1977 finally

suggested that Bayes could be used for image restoration. He had done it while working

on strategic weapons programs and digital image processing at Sandia and Los Alamos

National Labs in New Mexico. During this period, others were introducing

iterations and Monte Carlo. And in 1984, if I can get the next slide. Alan

Gelfand and Adrian Smith–. [pause] Sorry. It’s just a picture of Adrian Smith and Alan

Gelfand. Thank you. [pause] They decide to start something new. And Gelfand, reading

around, discovers the iterations and Gibbs sampling. Adrian Smith on the left, at the University

of Nottingham at the time. Alan Gelfand on the right, at the University of Connecticut at

the time. And the minute Gelfand saw the papers on iteration and Gibbs sampling,

he said, all the pieces fell together: Bayes, Gibbs sampling, Monte Carlo, Markov chains,

iterations. And they wrote their watershed synthesis,

now called MCMC, for Markov chain Monte Carlo, very, very fast ’cause they were scared other

people would put the pieces together, too. But they also wrote it very carefully. They

used the word ‘Bayes’ only five times in 12 pages. And Gelfand told me later, “There was always

some concern about using the ‘B’ word, a natural defensiveness on the part of Bayesians in

terms of rocking the boat. We were always an oppressed minority trying to get some recognition.

And even if we thought we were doing it the right way, we were only a small component

of the statistical community and we didn’t have much outreach into the scientific community.” But Bayesians thought the paper was an epiphany

and the next ten years saw what I call a frenzy of research, solving and calculating problems

that for two and a half centuries had only been dreams. Gelfand, of course, says that

they were lucky because the relatively inexpensive powerful workstations became available at

the same time. Then Smith’s student, David Spiegelhalter,

came out with his BUGS software, BUGS standing for Bayesian inference Using Gibbs Sampling.
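Gibbs sampling, the technique BUGS is named for, sidesteps Laplace-style integration by drawing each unknown in turn from its conditional distribution given the current values of the others. Here is a minimal sketch for a toy target, a bivariate normal with known correlation; the function name, the correlation value, and the choice of Python are illustrative, not from the talk:

```python
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Each full conditional is itself normal:
        x | y ~ N(rho * y, 1 - rho**2)
        y | x ~ N(rho * x, 1 - rho**2)
    so we alternate exact draws from the two conditionals.
    """
    rng = random.Random(seed)
    sd = (1 - rho ** 2) ** 0.5  # conditional standard deviation
    x = y = 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)  # draw x given the current y
        y = rng.gauss(rho * x, sd)  # draw y given the new x
        samples.append((x, y))
    return samples

draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
xs = [x for x, _ in draws[2000:]]  # discard burn-in before summarizing
mean_x = sum(xs) / len(xs)
```

After the burn-in period, the pairs behave like draws from the joint distribution, so means, variances, and correlations can be estimated by simple averaging, with no integrals required.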

It was off-the-shelf software. He comes out with the first one in 1991 and it’s BUGS that

causes the biggest single jump in Bayesian popularity. And it sends Bayes out into this scientific

and the technological world, where outsiders from computer science, from physics, artificial

intelligence, refresh it, broaden it, secularize it, de-politicize it and it gets adopted almost

overnight. It was a modern paradigm shift for a very pragmatic age. It happened overnight, not because people

changed their minds about a philosophy of science, but because finally it worked. The

battle between Bayesians and Frequentists subsided. Researchers could adopt the

method they thought fit their needs best. Prominent Frequentists moderated their positions.

Bradley Efron, a National Medal of Science recipient who had written a classic defense

of Frequentism, recently said, “I’ve always been a Bayesian.” [laughter] And someone else, who once called Bayes “the

crack-cocaine of statistics [laughter] –seductive, addictive, and ultimately destructive” began

recruiting Bayesian interns for Google. Thank you. [applause] Now I’d be happy to try to answer some questions,

but it would be very kind of you if you’d use the microphone so people could hear the

questions. And I wouldn’t have to maul them, summarizing them.>>MALE AUDIENCE MEMBER #1: So what was the

Neolithic discovery?>>Sharon Bertsch McGrayne: It was that the–.

It had happened all of a sudden, within a short time period five thousand five hundred

years ago. And that instead of being used for rites over the ages and built over the

ages, they’d actually been built and abandoned in this very short time period. And they’re so excited by this Bayesian mega-analysis

that they’ve done that they’re gonna try another one, on the early Anglo-Saxon period in Great Britain

next. Yes.>>MALE AUDIENCE MEMBER #2: OK. So–.>>Sharon Bertsch McGrayne: Now remember I’m

not a mathematician. [laughter]>>MALE AUDIENCE MEMBER #2: So, my training

actually is mathematics and we have, our controversies are, we have what’s called the axiom of choice

which is about the behavior of infinite sets and the continuum hypothesis and all these

other sort of questioned axioms. And if you assume not the axiom of choice,

you get one mathematics. If you assume it, you get a different mathematics. But mathematicians

will all agree that these are just models. These are abstract ideas. It’s not that this

axiom is true or not true ’cause it’s formalisms. In the real world, you can’t even substantiate

an infinite set anyway. How come not with the statistical community? ’Cause in the math

community it was like, whether it’s useful. It’s like, you can extend math this way or

that way. Which one is more useful as a model? Why weren’t they able to say, “OK, Bayesianism

might not make sense in some platonic way.” Or it might, there’s selecting the priors,

which is more of an art than a science. But just based on the fact that it’s useful, why

weren’t people able to go with it? [pause]>>Sharon Bertsch McGrayne: It’s very odd because

a whole class of people, physicists, did use Bayes. And they used it even before the Second

World War. Fermi used to do computations if he couldn’t sleep at night. He’d do Bayesian

computations. [laughter] And then he’d come in the next morning and

announce how the experiments of the day were probably gonna come out. Well, we’re not all

Fermis, but–. And during the war, Los Alamos used them. And after the war, the physicists actually

tried to get statisticians to use them. There were NATO conferences and so on. So,

it’s very puzzling. There were articles that told statisticians you can do it manually.

There were non-statisticians. There were historians in the ’60s using computers. I think they were trapped. They were so defensive.

They were such a small group and they were under attack. That’s not a good way to be,

mentally. You can’t burst out if you’re under– if your fortress is under attack. That’s the

only thing I’ve been able to conclude.>>MALE AUDIENCE MEMBER #3: So, you mentioned

in your book, but not in the talk, Lindley’s paradox and this research at Princeton

about psychokinetic random number generators: according to a Frequentist analysis, the results

were statistically significant, suggesting that people had random-number-generating abilities, but Bayesians

said that this is silly and the same study proves that it doesn’t exist. Would you go into that more? Like, the study

was cooked, then. Bad. Garbage in, garbage out. But was this real data? [pause]>>Sharon Bertsch McGrayne: Was it real data?

Well, I guess he had his machines recording it, but I think the statisticians don’t

regard it as real data at all. The same question came up just recently of–come on,

what’s it called–ESP. A Frequentist analysis said that ESP could work. It was refuted in the last year,

I think. So, it’s a long-standing problem, but Dennis Lindley, I think, cooked that guy’s

data and said it was garbage. Garbage in and garbage out. [laughter] Yes, there’s some papers about it and they’re

in the bibliography.>>MALE AUDIENCE MEMBER #4: To follow up on

that, there’s an article on this very subject in the latest issue of the Skeptical Inquirer.>>Sharon Bertsch McGrayne: Good. Good.>>MALE AUDIENCE MEMBER #4: I’ll try to find

my copy and bring it in. I just wanna remark about history. I’ve been in computing since

1964 and when I was at Cornell, there was a tremendous, there was actually–. Oh, my

phone’s going crazy. There were actual random walk problems in

the textbook, the programming textbook, and the social scientists were just generating

reports this big. I don’t exaggerate; I hauled them around. Reports of computer analysis: factans,

and xtabs and all kinds of crazy stuff. So, I think there’s this whole background

that was going on underneath this controversy that was preparing us for it.>>Sharon Bertsch McGrayne: Howard Raiffa gives

his–. I was in a totally different kind of meeting and someone asked what I was working

on and I said Bayes’ rule. And he said, “Oh, Bayes’ rule? Howard Raiffa taught Bayes’ rule

at Harvard Business School in the 1960s.” And he dug out of his notes this reference,

too. It’s 60 or 80 pages on Bayes, Markov chains and so on. So, you’re absolutely right.

So it’s such a puzzle why–. [pause] I really think it must have been the fortress mentality

that — it’s just hard to break out of that. If there aren’t any more questions, I would

be happy to sign anyone’s books if they want me to.>>Male Presenter: Thank you very much. [applause]