Denise Herzing: “Dolphin Communication: Cracking the Code” | Talks at Google

THAD STARNER: She’s the founder and director of The Wild Dolphin
Project and has spent, I think, 32 years now
observing dolphins in the wild. She and I have a
research project, joint between Georgia Tech
and The Wild Dolphin Project where we’re trying to really
look at two-way communication with wild dolphins. She’s going to give us an
overview today of her work. And towards the end, I’m going
to show you some of the stuff that you all can
get into, especially if you have some background
in machine learning. So thank you very much
for coming, Denise. Shall we? [APPLAUSE] DENISE HERZING: It’s
great to be here. I’m glad you’re all
here and listening. Like Thad said, I’m
going to share with you a little bit about our
general dolphin work, and how we work in the field
and how we analyze data. And then we’re going to talk
about our joint projects and see if we can
intrigue any of you into the machine learning
world of dolphin sounds. Every 20 years,
“National Geographic” covers our work in the field. And a couple of years ago,
they did another story about thinking like a dolphin. And our work was highlighted
very much because Thad and I had been working
on trying to crack the code of dolphin
communication in a couple of different ways. Now, the reason we really think
dolphins have some potential, they’re kind of like
primates in the water. On the upper-left,
you see an example of encephalization
quotient, which is the physical measure of
intelligence, in some ways. It’s a brain-to-body ratio. Humans have an EQ
of about seven. And dolphins are just
second to humans, even above the great apes. So we know they have
the physical structure for intelligence. They use tools in the wild. They’ve been known to understand
artificially-presented languages, comprehending
both syntax and semantics. And they recognize
themselves in mirrors, which is no small feat, apparently. But one of the big
questions still is, do they have language? Do any non-human
animals have language? My particular perspective
is that we really haven’t looked close enough. If you’re a biologist, you
tend to suspect that animals are doing a lot of
complex communication, we just haven’t cracked
the code because we haven’t had the tools. Now, collaboration
with Thad is based on two different strategies. The first is decoding the
dolphin’s natural sounds. So like he was
mentioning, we have a 32-year underwater acoustic
and behavioral data set, where we want to try to
correlate sounds and behaviors and look at some of the
really detailed categories of their sounds using an
anthropological framework. So knowing their society,
the individuals, because it’s a small resident group. And the second way is to
interface with technology directly, creating some
kind of two-way interface where we can start exploring,
perhaps, a mutual language and maybe learning back
and forth from both these strategies. So first, I’m going to
tell you a little bit about the specific
dolphins that we work with and how we’re approaching the
dolphin signals themselves. A little bit about how
we analyze these signals, some of the challenges
working underwater with dolphin communication. Keeping in mind that we
use computers all the time as biologists, and it’s
a really good marriage and it’s a really
necessary tool. So dolphins have a lot
of different senses. Of course, they’re
great acousticians. But they also have pretty
good vision, especially as a species that
live in clear water. They have cross-modal abilities
between vision and sound. They have taste. They do not have smell. And they have touch. In the water,
tissue and water have about the same
acoustic impedance, so they can actually feel sound. So this becomes another
way they communicate. This is a summary of the history
of dolphin communication, using one of Gary
Larson’s slides, showing that primarily
we’ve worked in captivity. We’ve tried to
look at acoustics. And believe it or
not, we’ve been looking for English,
which is a little scary, a little antiquated. So here, you have the
slide, and the scientists are hearing, “habla espanol?” and other Spanish phrases. And they have no idea
what they’re hearing, but they’re writing it down. [LAUGHTER] So we’ve progressed, at least. We want to look and see what the
dolphins themselves are doing, and how they might structure
a communication system for a complex aquatic society. So we basically correlate
video with sound. [VIDEO PLAYBACK] – Hi, I’m Cindy Rogers, and
I’m the research assistant at the Wild Dolphin Project. – Hi, and I’m Michelle Green. I’m a graduate student here. [END PLAYBACK] DENISE HERZING: It’s funny. I was thinking, when I
was in graduate school, one of my teachers
told me there were only a couple of people
in the world who could read a spectrogram
for human speech, right? This is how old I am. Now, of course, computers do it. But we use the basic
technique with dolphins. We take our video, we run the
audio track that correlates– [DOLPHIN SOUNDS] –with the behavior. This is some dolphin
aggression with a bite. [DOLPHIN SOUNDS] So it’s a busy place. There’s a lot of sounds. There’s a lot of
dolphins making sounds. We don’t always know who’s
making what sound, which is another issue. Our project is in the Bahamas. So here you see the
coast of Florida. Across the Gulf Stream are
these shallow sandbanks to the Bahamas and the
dolphins live there. It’s great underwater
visibility, which is why I chose to work there. Because I wanted to see what
these animals were doing underwater, not just
looking at the surface, which is the case in
most parts of the world. My nonprofit had a boat
donated in 1992, which is a great ocean-going vessel. So we live there for four
to five months a year. We do our work there, we
sleep there, we cook there, we analyze data out there. We go where the
dolphins are, basically. My main tool is underwater
video with a hydrophone, an underwater
microphone, of course, simply to correlate
sound and behavior and try to get as much
data with that equipment. Our project is
pretty noninvasive. We don’t grab or
tag the animals. We have about 300 individuals
tracked over the decades. Actually, now we’re on
our fourth generation. So we track that, as well. Now the species is actually
a quite convenient species to work with. They’re Atlantic
spotted dolphins and they don’t have any
spots when they’re born. So here you see a
mother, fully grown. She’s 35 years old here. And we use classic
ID techniques. We look at dorsal fin nicks and
notches, which most people use from surface work. But because we
work underwater, we can also look at
full body marks. Because there are
spots, we track their spots, which are kind of
like constellations of stars. So we do this ID
work every summer because they gain
spots with age. So we actually know all our
dolphins and, for most of them, which years they were born. We kind of take
the broad overview, the anthropological framework. So we sex our dolphins because,
again, we’re in the water. And if you want a
little take home tidbit to tell your kids
or your significant others, you can tell them you now know
that to sex a female dolphin, you look at the mammary
slits on the underside. Because of course,
they nurse their young. We know their
reproductive status. They’re pregnant
for about a year. They’ll have three to four
years in between having calves. We know the females
associate with each other by reproductive status. So if some are pregnant,
they’ll hang out with other pregnant dolphins. And the males have
lifelong friendships that are formed when
they’re juveniles. And their job is not only
try to mate with females, but to protect the group
and try to scout out sharks and that sort of thing. So we understand the
society a fair amount, so have a lot of metadata around
our acoustic data, as well. And we do know some things
about dolphin sounds. We know they make
whistles, specifically signature whistles, which
are actually names. [DOLPHIN SOUNDS] So these are long distance
communication signals that go three to five miles
in the water, for example. They make burst
pulses, which are– [DOLPHINS CHIRPING] –very funny sounding sounds. Very hard to categorize. But they’re very social. This is probably the most
common type of dolphin sound and the least studied. [DOLPHINS CLICKING] We have echolocation clicks,
which of course we know a lot about from Navy work. And then we have compressed
echolocation clicks which we call “buzzes,” and these are
social sounds, unlike clicks for navigation. [DOLPHINS BUZZING] And these buzzes are used
in courtship and discipline. So we have some
basic correlations about what sounds they
make with gross behaviors. But what we’d really like
to do is look at the greater complexity potential. Now, dolphin research
is always about 20, 25 years behind
terrestrial research. For example, we’ve
known since 1992 that vervet monkeys use
different kinds of alarm calls to communicate that
different kinds of predators are coming. So very specific, there’s an eagle
coming or a leopard coming, so their kin can take
the appropriate action. Some fantastic work
with prairie dogs that really shows the complexity
of the kind of information that can be encoded in
a very short alarm call, including things like a human
with a yellow shirt on walking across a field with a gun,
or not a gun, et cetera, et cetera. So there’s a lot of things
these animals are doing. And you can imagine a scenario
where a dolphin would be pretty handy if you could communicate
whether the hammerhead shark was coming towards
you, or a tiger shark, or a great white shark. It might really determine
if you survive or not. Again, we suspect there’s
a lot of information there that we don’t know about. The big question with
dolphin sounds, at least– and all animal communication,
for that matter– are the signals
referential, meaning they refer to something,
they label something, like a name or word for an
object, or are they graded? Do the sounds run
into each other? Do they just show things like
increasing intensity of sound? If I’m going to talk
really loud or really fast, that would be kind of a graded
system versus a discrete word or referential signal. So this is one really big
debate in animal communication. And dolphins have
both types of signals. We know signature whistles,
for example, their names, are technically a
referential signal. They can call each
other and they can broadcast their own name. But is it language? And this is where we really fall
short in our tools to look at, is there more complexity
in the order, structure, potential grammar,
of their sounds? Now, dolphin communication
is also very complicated for the researcher because
they’re really good at what they do, acoustically. They have a pretty neat
system for creating sound and receiving sound. Probably the big message here
is they’re very directional, so they send out very
high frequency signals, up to 240 kilohertz. High frequency out this
way, low frequencies drop off to the side. And they receive the sound
through their lower jaw, through a fatty organ. They don’t have external
ears like we do. Same thing, they receive
sound directionally. And to make it more
complicated for the researcher, if you want to get
a head-on signal and get the full high
frequency, the dolphins can internally steer
their sound 20 degrees off and you would have no physical
signal that they’re doing that. So it’s a little challenging
to get a head-on signal. So when you’re trying to
localize a vocalizer– and you see on the
right side of the screen here a four hydrophone array
that helps us triangulate who’s making sounds. And this technology,
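Triangulation from an array like this rests on time difference of arrival (TDOA): the same sound reaches each hydrophone at a slightly different moment, and those offsets pin down the vocalizer. A toy sketch in Python, with made-up 2-D array geometry rather than the project’s real equipment:

```python
import numpy as np

SPEED_OF_SOUND = 1500.0  # m/s in seawater, roughly

# Hypothetical 2-D positions (metres) for a four-hydrophone array.
hydrophones = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def tdoas(source):
    """Time difference of arrival at each hydrophone, relative to #0."""
    dist = np.linalg.norm(hydrophones - source, axis=1)
    return (dist - dist[0]) / SPEED_OF_SOUND

def localize(measured, extent=6.0, step=0.05):
    """Brute-force grid search for the position whose predicted TDOAs
    best match the measured ones (fine for a toy example)."""
    best, best_err = None, np.inf
    for x in np.arange(-extent, extent, step):
        for y in np.arange(-extent, extent, step):
            err = np.sum((tdoas(np.array([x, y])) - measured) ** 2)
            if err < best_err:
                best, best_err = (x, y), err
    return best

# Simulate a dolphin vocalizing at (3, 4) and recover its position.
estimate = localize(tdoas(np.array([3.0, 4.0])))
print(estimate)  # close to (3.0, 4.0)
```

A real system would cross-correlate the channels to measure the TDOAs and solve in three dimensions; the grid search here just keeps the geometry visible.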
actually, we’re just starting to incorporate
with some new equipment. So lots of times
we don’t even know who’s making the sound
in these big groups, so very challenging
for the data. Again, collecting ultrasounds
has been pretty recent, at least in our work. On the lower left
you see an example of the spectrogram, the whistle
that you’ve been hearing. And the full picture
is the ultrasound above about 110 kilohertz. And hopefully, you
can see on the far right there are
some harmonics that now start showing up
in this high frequency sound spectrogram. You see places where
the clicks go broadband. We always knew that, we just
didn’t have the equipment to record it. And you see some places where
the narrow-band signal wouldn’t even be recognized. There’s no information
on the narrow-band, but there’s information
in the high-band. Now, historically, we’ve used
a lot of different techniques, and whistles have been
fairly well studied. Primarily, and I hate to say it,
but they’re easiest to measure, right? We can measure contours, we’re
pretty good at visual pattern recognition. Here, you see examples of
four signature whistles from different dolphins. And you can tell
they’re quite different. You can imagine that dolphins
can tell that, as well. And we can measure
their basic parameters. Early on, we also
used neural nets. If we had enough samples to
train a computer to tell us if two signature whistles– the
red and the green, as you see– are really different
or the same. If the computer had problems
separating them or not separating them. And that would give us
a quantitative index of how different they were. For example, between a mother
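As a rough illustration of such a quantitative index (not the neural-net method itself), one can extract each whistle’s frequency contour, the loudest bin per time frame, and compare contours directly. Everything below is synthetic toy data:

```python
import numpy as np

N_BINS, N_FRAMES = 64, 50  # toy spectrogram size

def toy_spectrogram(contour_bins):
    """Fake spectrogram: low background noise plus one strong
    frequency component per time frame."""
    spec = np.random.default_rng(0).uniform(0.0, 0.1, (N_BINS, N_FRAMES))
    spec[contour_bins, np.arange(N_FRAMES)] = 1.0
    return spec

def extract_contour(spec):
    """The whistle's contour: loudest frequency bin in each frame."""
    return spec.argmax(axis=0)

def difference_index(c1, c2):
    """Mean absolute contour difference, scaled into 0..1."""
    return float(np.mean(np.abs(c1 - c2)) / N_BINS)

rising = np.linspace(10, 60, N_FRAMES).astype(int)   # an upsweep whistle
falling = np.linspace(60, 10, N_FRAMES).astype(int)  # a downsweep whistle

c_up = extract_contour(toy_spectrogram(rising))
c_down = extract_contour(toy_spectrogram(falling))
print(difference_index(c_up, c_up))    # 0.0, identical whistles
print(difference_index(c_up, c_down))  # large, clearly different
```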
and a calf or two juveniles. So these are tools we have
used, very custom program tools. Now, these are signature
whistles from a mom and a calf. Gemini is the mom. [DOLPHIN SOUNDS] And Geo is her son. [DOLPHIN SOUNDS] And what we discovered when we
were doing our neural network is that some of these
signals were very messy. They were just sloppy
and we couldn’t track the contours of the whistle. So we had to throw them
out of our data sets for the neural network, which
was not really OK because we really wanted to measure
all the whistles of all the mother-calf pairs,
that sort of thing. Now, with some of
these new programs, it’ll actually extract these
patterns in a different way and allow us to
code these signals. So neural nets were
great for a while, but they didn’t get us very far. So the process we
really go through is, because we have underwater
video and we have sound, we use a program
called Observer, which is a behavioral
coding software for behavioral biologists. And we basically
take a time line and we will code
our body postures. For example, you might see a
head-to-head, a little peck rubbing, another head-to-head. Courtship, a little
courtship behavior, and another reunion,
or peck-to-peck. So we’ll code that
on a timeline. And then simultaneously,
we’ll take our sounds– and you saw a little bit
of this in the video– and we’ll walk through
our vocalizations. And we’ll code them
in very generic terms. So here, we might see a scream. Here, we have a
signature whistle. Now we have a buzz. Then we have some other
really messy kinds of sounds. We don’t know how
to categorize those. One of our main impetuses for
connecting with Thad’s group was to really help us get at
these categories of sound, specifically burst pulses, to
help categorize things that we human beings aren’t very
good at, apparently. Again, match this with
behavior, and then look for some structure and
order to really take a look at, are there really any
grammatical rules? Is there structure? Is anything akin to
language or partially akin to human phonemes, for example? Here are three spectrograms. Two are humans and
one is a dolphin. Take a quick guess
which is which. You guys are experts. I bet you’re all going
to get this right. The top one is not a dolphin. That’s a human speaking,
as is the bottom one. So the middle one are
dolphin burst pulse sounds. [DOLPHIN SOUNDS] And you can see they look
a little bit like phonemes. We’re not saying they are. But these are the
structures we’re used to looking at
and analyzing, right? I know my computer can
recognize my voice. But again, is it language? We don’t know. This is one way to
start looking at it. It’s not the only
way because you still have to understand the function
and meaning of “words,” if they’re there. But we really want to take
a look at it now that we have some potential tools. For example, one
thing that we started with, with the Georgia
Tech work, was just asking, is a whistle just one unit? That’s the way we’ve
always approached it as dolphin researchers. But in fact, could it
be parts of whistles that are recombined? And of course, this
is critical if you’re looking at language, right? We recombine phonemes
to make different words in different orders. So some of these tools
are being applied. Through Thad, there is
some dynamic time warping that’s been applied
with other researchers. But we’re pretty excited
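Dynamic time warping compares two whistle contours by stretching them non-linearly in time, so the same shape produced faster or slower still scores as similar. A minimal sketch with made-up contours:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D frequency
    contours, via the classic dynamic-programming recurrence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

slow = [10, 10, 12, 12, 15, 15, 20, 20]  # an upsweep, produced slowly
fast = [10, 12, 15, 20]                   # the same upsweep, twice as fast
other = [20, 15, 12, 10]                  # a downsweep
print(dtw_distance(slow, fast))   # 0.0: the contours align perfectly
print(dtw_distance(slow, other))  # larger: the shapes differ
```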
about these tools. I’m going to let Thad jump in
here and talk about the tool. THAD STARNER: So this is
a tool called [? Ohura, ?] which is something we’ve
made at Georgia Tech. This is Daniel Kohlsdorf’s
PhD dissertation, for those of you who are
into machine learning, you can go look it up in Georgia
Tech’s dissertation database. In particular, we have
the spectrogram up here, we have about 40
features here that are learned as being
distinct in the database. Basically looking
at convolutions to see where those features
are in the spectrogram. From those we try to look
for repeating time series, and that’s these patterns here. So you take in raw audio,
look for the areas that have dolphin
vocalization in them, try to find these features,
and then, from those features, look to see how they recur
in time and get a motif. Then you look to see where
all those motifs are in time and make sure they look
reasonable on the system. From that, we then
take this code book and try to explain as much
of the audio as we can. So you’re just trying to
basically reduce this thing to a series of strings. That data reduction really
helps us move much more quickly in trying to find
repeated patterns. And from these repeated
motifs, and labeling them, we start to find things
that, it turns out, are regular expressions. So we just look at how
these things reoccur. And to our surprise,
regular expressions turn out to be one of the most
predictive things for the behaviors that Denise
has observed in the wild. What we did is took
her 2012 field season, took the first half of
it, trained up a system. So this is unsupervised
learning where we got the motifs and got the regular expressions. And we tried bag of– for the machine learning
people now for a second– tried bag of words,
bag of grammars, and bag of regular expressions. It turns out, the
most predictive thing for the behaviors were bag
of regular expressions. So what we looked at
is the second half of her 2012 database and
looked at the behavior she tagged visually. So she actually saw
foraging behavior, she actually saw aggression,
mother-calf reunion, or play. Given that, we then
say OK, what sort of motifs, what sort
of regular expressions are affiliated with
these different types of visual behavior? Can we actually see a
correlation with them acoustically? The answer is yes. And the correlation
within class is very high. Correlation across
class is very low. This is excellent news
for us because it now starts telling us
that yes, there is some stuff here
that is discriminative in the acoustic environment. Now the question is, how can
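The bag-of-regular-expressions idea amounts to counting, per session, how often each mined pattern fires on the symbol string, and using those counts as the feature vector. A sketch with invented patterns and toy session strings:

```python
import re

# Invented regular expressions over code-book symbol strings; stand-ins
# for the motif patterns mined automatically from the data.
patterns = [r"AB+", r"C+A", r"BC"]

def bag_of_regex(symbols):
    """Feature vector: how many times each pattern matches the string."""
    return [len(re.findall(p, symbols)) for p in patterns]

# Toy 'sessions' with the behavior tag the observer assigned visually.
sessions = {"ABBABBA": "play", "CCACCA": "foraging"}
for symbols, behavior in sessions.items():
    print(behavior, bag_of_regex(symbols))
```

A classifier (even a nearest-centroid one) over these count vectors is then enough to test whether within-class sessions look alike acoustically and across-class sessions differ.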
you do this on a bigger scale? Doing this sort of
stuff takes days of computer power for a
relatively small data set. [DOLPHIN SCREECHES] DENISE HERZING: So these
are bottlenose dolphins now very coordinated and
synchronized in the sounds. And I’ll let you just
hear it for a second. [DOLPHIN SCREECHES] Two dyads of male dolphins
basically competing. And you can hear them
synchronizing their sounds and see them orient
in a second here. [DOLPHIN SCREECHES] What are they saying? So what we basically
want the computer to do is to help us automate
categorizing and coding these sounds, as in the
video section you just saw, so we’re not manually
going through it all the time, and build a larger database. THAD STARNER: So the system
that we showed before from Daniel Kohlsdorf’s
dissertation is quite power hungry. It takes a lot of computations. It’s looking at a lot of the
spectrogram at the same time. And so it’s something that has
some problems being parallelized, though we can do it– certainly
can do a lot with bigger iron out there. But we’re hoping now to make
some things a little faster. So that’s what we’ll
be showing you next. So I’ll let you, again, actually
listen to this for a second. [DOLPHIN SOUNDS] DENISE HERZING: These are
spotted dolphins, now. This is aggression,
typical spotted aggression. [DOLPHIN SOUNDS] THAD STARNER: And now this is a
new tool, where we’re actually doing something similar
to the Shazam algorithm, for those who are
familiar with that. We’re looking at
the spectrogram as if it’s a computer
vision problem, looking for areas of interest. And then we’re starting to look
at these patterns through time. Now the problem
is before, we were looking at the distribution
of these things and how they change. Now we’re looking for something
very simple, which is just the majority of the
regions of interest, are they increasing in frequency
or decreasing in frequency? And that’s what
this is down here. And it turns out
that’s sufficient. Josh got an initial
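That rising-or-falling cue can be sketched in two steps: pick local-maximum points of interest in the spectrogram, Shazam-style, then test whether their frequencies mostly climb or drop over time. Toy data again:

```python
import numpy as np

def interest_points(spec, thresh=0.5):
    """Local maxima above a threshold, treating the spectrogram as an
    image and its peaks as Shazam-style points of interest."""
    pts = []
    for t in range(1, spec.shape[1] - 1):
        for f in range(1, spec.shape[0] - 1):
            patch = spec[f - 1:f + 2, t - 1:t + 2]
            if spec[f, t] >= thresh and spec[f, t] == patch.max():
                pts.append((t, f))
    return pts

def trend(points):
    """Do the regions of interest mostly rise or fall in frequency?"""
    points = sorted(points)  # order by time
    diffs = [q[1] - p[1] for p, q in zip(points, points[1:])]
    return "up" if sum(diffs) > 0 else "down"

# Toy upsweep: one strong component per frame, climbing in frequency.
spec = np.zeros((32, 10))
for t in range(10):
    spec[3 + 2 * t, t] = 1.0
print(trend(interest_points(spec)))  # "up"
```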
annotation of what’s going on. So you can see here
that it’s labeling. The red dots are the features. This is the general,
main feature we’re using to cluster on. You see at the top we have
JE, then ED, EJD, EJE. And later on, you can see
the EJE as a combined thing. So at the top is basically
the individual motifs. The ones that are
combined together, those are starting to be the
regular expressions we’re seeing. And this is brand new stuff. This just came out this month. We’re trying to get this so
that Denise can sit there and say, OK, I’m going
to load in this database. And in, say, minutes,
as opposed to days, get back some initial
results and be able to figure out where to
throw more attention, more computational power on. Hopefully, we’re
going to continue to develop this and make this a
little bit more sophisticated. But it allows her to really
tune the parameters a bit and see where our best
bets are for future work. DENISE HERZING:
For example, you’re going to hear
synchronized squawks. [DOLPHINS SQUAWKING] These are adult dolphins. Now you’re going
to hear juveniles. [DOLPHINS SQUAWKING] They’re trying to
synchronize their squawks. This is something
they learn over time. And we’re hoping these
tools will help us pull out of our data the specific
categories in order of sound, so we can differentiate and look
at the process, for example, of development of
vocal use in dolphins. Now, the second way we’ve been
interfacing as researchers is to look at,
can we incorporate some underwater computer
technology, real time, to try to look at
dolphin communication? And we ask the question, is it
possible to directly, real-time communicate with
another species? Well, in the
Bahamas, we actually have two species of dolphins. Bottlenose dolphins
you see here. They are about three feet
larger than the spotteds. But they interact
on a regular basis. They forge together sometimes. They babysit each other’s young,
inter-species babysitting. There’s some dominance
dynamics between the males of both species, but they’ll
form temporary alliances when we get another
intruder in, like a shark or an unknown
bottlenose, for example. So they’re already
communicating with each other to a certain extent. We’ve been trying to look
at that to pick apart how much they understand,
perhaps, of each other’s sounds and/or behavior. But if we look at
the rest of nature, other species are actually
decoding each other already. We just aren’t in that circle. So we have sentinel
warning calls that are used between species
with birds and monkeys. Now, with cetaceans–
with whales and dolphins– we have a couple
interesting examples where they’re not just
listening and taking advantage of knowing an alarm call
from another species. When they get together
with each other, they actually create a sort of–
it’s not a mutual language, but they use mutual signals. So it appears to be easier,
at least for whale and dolphin species, to agree on some sounds
they use when they’re together versus their own
communication system, which is pretty interesting. I always thought maybe the way
to go is to just decode them. But if you’re trying
to communicate with another species, maybe
you have to spend the time. And maybe it’s not
culturally possible to learn the intricacies of a non-human
animal communication system. So that’s the
approach we’re taking. What started happening in the
Bahamas, we work non-invasively and we try to do
behavioral observations, but the dolphins are
often interactive with us because we’re right
in the water with them, we’re close, we know them. And they started doing things
like mimicking our postures. [LAUGHTER] Goofy things like, oh, gosh. One time we had– I’ll never forget this– a swimmer who was
trying to do a dolphin kick, you know the
dolphin kick in the water? And then the dolphin was
behind the swimmer going– [LAUGHTER] Just spazzing out because he
was not a good dolphin kicker. So they have some
funny sense of mimicry because that’s what they do
with each other all the time. And sometimes they would,
what I would call try to acoustically mimic us, too. And we kept thinking, God, maybe
we should really create a tool. I mean, it’s a pretty unique
situation in the world. These dolphins have
time, they’re safe, they’re pretty friendly, and
they seem interested in humans. Now, we have plenty of other
examples, scientifically, of interfaces that have been
created with other species. One of the best
known, of course, is the work with Kanzi
by Sue Savage-Rumbaugh, who, over many decades and
many different types of work, designed a keyboard so
Kanzi could interface. And they could go
out in the woods and talk about things and
share ideas and thoughts. Probably the most
technically advanced dolphin-human interface actually
happened at the Epcot Center in Orlando, Florida. It was actually an
underwater keyboard that was created so dolphins
and humans could both use it. You basically either stuck
your hand, if you were human, or your beak, if you were a
dolphin, into a hole which broke an infrared beam which
triggered a word in English or a whistle in dolphin. And then the dolphin could go
and say, let’s go to the reef and get a fish, or the
human could do that. But it turns out we humans
were really slow in the water, so the dolphins would make
a command for the humans and go over and they’d wait
and they’d wait for the humans to get there. [LAUGHTER] So part of the
problem is they’re so fast, acoustically and
physically, in the water. So they had a pretty good
results, no big breakthroughs, but it was really
the first example. And there have been,
historically, other people that have developed
keyboards for dolphins, but this is really the first
true underwater keyboard. So we worked with them
for a while on our ideas. And our original
vision was, well, we’re out there on the boat,
we have computers on the boat. Let’s put a keyboard
under the water. We’ll have the diver
work with the dolphins, explore the keyboard together. Well, we found
out pretty quickly that the dolphins have a lot
better things to do in the wild than sit at the boat
and press a keyboard. A fish goes by,
they’re out of there. So we quickly created
a portable keyboard we could push through the
water, because we’re often swimming hard to keep up with
them to see what they’re doing. And it was a visual keyboard
with a little acoustic sound that was correlated
with a specific signal. I think this should play. This is the whistle for scarf. [WHISTLE SOUND] There it is. So the idea was, we labeled four
simple objects with whistles that they could mimic
but that were outside of their normal repertoire. Because we didn’t
want to say something inappropriate because we
didn’t know their language. [LAUGHTER] Like, Your mother does
something with blowfish. I don’t know. Or pufferfish. They do, actually. They get high chewing on
pufferfish, did you know that? [LAUGHTER] THAD STARNER: Really? DENISE HERZING: They do. Yeah. There was some great footage on
the internet not too long ago. Yeah, they pass
around the pufferfish and they have the–
you know the fugu, the drug from the liver
of the pufferfish? So every animal has– THAD STARNER: So poison. They get high off the poison. Really? DENISE HERZING: Yeah,
just enough to get high. [LAUGHTER] THAD STARNER: Things
you never tell me. DENISE HERZING: Sorry,
I just thought of it. I was trying to make a joke. Anyway, sorry about that. We labeled these
four objects– scarf, rope– things that we, as
humans, bring in the water and they like to play
with and drag around. Sargassum, which is a seaweed
they play around with. And then a bow ride, which
is really high motivation because they like bow rides. So we had this system. It was pretty archaic,
this was in the late ’90s. And they were interested,
but they couldn’t really trigger the system
with their sound and we couldn’t do any real
time sound recognition, so it was really
not good enough. So we said, let’s just wait. We’ll find some computer
geniuses to help us out. So Thad actually came
to my lab originally to try to decode the
dolphin natural sounds using some of his tools. THAD STARNER: And then I
made the mistake of saying, oh, we can make your
keyboard for you. It’ll only take a semester. Five years later– [LAUGHTER] –we actually have a
system that we field. We’re doing this
every summer now. Our most modern one is a stereo,
192-kilohertz sampling system. We can produce these
whistles in the water. We can also listen
for them, as well. We’re finally in the stage
where we have GPUs in this thing so that we can do
convolutional filters and start picking up stuff at
any frequency they do it at. One of the biggest problems with
this box is, believe it or not, the speaker. It turns out it’s very hard to
get enough power in the water to actually communicate. Whenever you see
the Navy do this, they have a hydrophone in the
water with a big amplifier hooked up to a big generator and
they’re pushing 200 watts in. We can’t exactly put 200 watts
into a chest mounted box. So we’ve been making
our own custom speakers. Yes, I am a speaker
manufacturer. You don’t want my speakers,
though, because they really do not have very good response. But they’re very good
at certain narrow bands. And we can send
out these things. While the dolphins can
communicate at three miles, we’re working on our
signals to work at 60 feet. So now that we’re
actually successful, we can actually get
these things in the water with enough computational
power and enough sound power to actually do two-way
communication work. And I’ll let Denise
tell you what we’re trying to do with it. DENISE HERZING: So the system,
basically, as you see it– there’s the box with the
computer, two hydrophones to receive the sound,
speakers to play the sound, and then the
operator has a keypad with the pre-programmed
sounds in it. Here you see pictures of
us getting our boxes on to go in the water. We actually practice in
the water with the boxes without the dolphins
because we want to be ready for how
they try to mess us up when we’re trying to
do things with them. And the whole system
here is designed on modeling that
communication system. Because the only thing that
has worked with other species– primates, Irene Pepperberg’s
work with birds– is the idea that you want
to show the species how to use the system. It’s like how your kids are
exposed to human communication as they’re growing up, right? They see you going to the
refrigerator for milk, you talk about milk,
you offer them milk. It’s the same idea. You don’t throw the species
in there and go, OK, perform. Do what we want you to do. So if my colleague, Adam,
wants to get the scarf from me in the water, he has
to use this keyboard. He has to ask for scarf, he
has to come over and get it, and then we exchange it. So we practice that. We usually put a third
person in to be a dolphin and try to mess us up and
ask for things we don’t have and that sort of thing. So we learn how to
respond appropriately. And then when the
dolphins come around, we have small time frames– I call them windows
of opportunity– where they aren’t doing
their normal behavior, they’re there to play,
they’re ready to engage. And we just try to tap
into those moments. And we can be in the water
an hour, two hours sometimes with them, where we’re
exchanging these toys and playing sound and
having them see the system. So the idea is diver A and
diver B each have a box with a computer. Diver A can play
the scarf whistle so that diver B is going
to hear it as a word. We wear these
bone-conducting earphones. So diver B would
hear it as a scarf. The dolphin would
hear it as a whistle. Or diver B could play a
sound, a sargassum whistle. Same thing. Diver A hears it
as an English word so they’re sure what went out. The dolphin will hear it
as a different whistle. And then the hope
is that the dolphins will mimic the whistle. And then diver A
and diver B would be notified in the earphone
that that whistle was made, sargassum or scarf. And then the appropriate toy
would be given to the dolphin and we’d swim off
into the sunset together, playing our toys. So it’s a simple
system, but dolphins can mimic sounds really well. But to get them to understand
the function of a sound is a whole other level. So it’s all about exposure,
repeating the sound to them, showing them how it’s used,
so that if they want the toy, they could ask us for the toy. For example, in the system we
have our own signature whistles made, so we have names. So I have a Denise
whistle, for example. We also have some of
the dolphin’s names in the computer. Because we have a very
small subset of dolphins that work with us
on a regular basis. So we have their signature
whistles in the computer so if they come
up, we can actually greet them with
their name, which is what they do with each other. And then we can engage
in this behavior. So it’s a potential tool. This is actually some
underwater video of the work. [WATER SLOSHING] DENISE HERZING: So there’s
two people in the water, both with boxes. I’ve got the yellow scarf. THAD STARNER: They
can’t hear you when it– DENISE HERZING: Yeah, sorry. I’ve got the yellow scarf. [DOLPHIN SOUNDS] We’ve just labeled
it for the dolphins. [DOLPHIN SOUNDS] I’ve just asked for the
scarf using the whistle. Then he gives me the scarf. Now I’m going to offer
it to the dolphins. [DOLPHIN SOUNDS] I’m going to label
it as I offer it. [DOLPHIN SOUNDS] And then we try to
get the scarf back. [LAUGHTER] Which we don’t always. THAD STARNER: She
loses a lot of scarves. DENISE HERZING: But
it’s interesting. Just last year– we try to
ask for the scarf back, too. And every once in a
while, they’ll drop it and so we’re hopeful that
they’re maybe understanding that I’ve asked for the scarf,
not sargassum or the rope, et cetera. So this year, we’re going
to have the new computers. The great thing about
this next summer is that we finally have data
for what the dolphins have been trying to do with us. For example, they’ve been trying
to mimic these whistles in ways that the computer wasn’t
set up to recognize. For example, they’ll put little
additions to our whistle on it. Or they’re doing some
high frequency stuff that the computer’s
not recognizing. But now we’re going to– THAD STARNER: One of
our biggest problems was that we were sampling
at 22 kilohertz at first. Because normally the whistles are
between 7 and 10 kilohertz. But it turns out that when
we went back and looked, there was some
evidence of mimicry at double the frequency. And we just simply didn’t have
a computer board fast enough to deal with it. And it was not until this
year that we actually have a GPU-based system
where we can actually look for the same pattern no
matter what frequency it’s at. We know that they can produce
whistles much higher than we were looking at before. But now we should
be able to cover the whole range of whistles. Not the whole range
of burst pulse, but the whole range
of whistles, and be able to process
that in real time. DENISE HERZING: And to know
what they’re trying to do is really helpful. So now we can react
more in human time. So the whole idea
of this system is to empower the dolphins to
communicate with us, so we’re not just giving them
orders or commands, and to see how far it’ll go. And the hope is that,
with the decoding software of their natural sounds, we
can eventually figure out some motifs or patterns. We can loop back into
the two-way system and see if we can use
more of their signals in the system than our
artificially-created whistles. So breaking the
communication barrier of humans and other animals– it’s a pretty
interesting concept. We haven’t really tried
it on a grand scale. Can machine learning help? I’m pretty convinced,
after working with dolphins and these sounds
for 30-plus years, that you guys are our only hope. We really need
these power tools. And we need the
expertise that you’ve all done in your various fields for
just recognizing human speech, in many ways. And I really think that’s
going to help us really crack the code with a lot
of other animals, not just dolphins. But the tools will
really be applicable, I think, across disciplines. THAD STARNER: And that’s
one of the things that– those of you who do machine
learning, part of the problem is just getting good data sets. Well, Denise has 32 years of it. We’re slowly getting it to the
stage where it’s cleaned enough that machine learning techniques
can really be used on it. Really understanding
it well enough that we can figure out
when there is dolphin vocalization and when there is not. We really think that these
machine learning techniques can help, which we’ve
done a first pass at, which is really
showing some promise. However, one of the big
problems is simply resources. We need more smart people
to look at this data set and to try different methods. The method that we’re
using, those of you who are familiar with the
machine learning approaches here, it’s sort of a
different take on it. And so what we really want to
do is get a little team together and see if we can get
some progress here in the next couple of years. DENISE HERZING: Yeah. So we’re trying to put
working groups together. If you want to help
support our regular work, please go to our website, We have memberships, we
have the chat society. THAD STARNER: And there’s
also some small projects out there at Interspeech,
for those of you who go to that conference. Some specialize in
animal-computer interaction. There is also an animal-computer
interaction conference that’s now part of ACM
that has a yearly meeting. It’s now starting to get a small
coterie of researchers that are not just looking at
marine mammals, but also birds and primates and all
sorts of animals, seeing if we can come upon some
common tools and common methods to go forward and see
if we really can crack this code of what
these animals are doing with their communication. DENISE HERZING: Great. THAD STARNER: So
thank you very much. I’ll run around
with a microphone and take any questions. Thank you. [APPLAUSE] AUDIENCE: Thank you, that
was really interesting. Do you have any idea what
the dolphins think of you? Are they there for play? What are they maybe
trying to communicate? Do you have any idea? THAD STARNER: It’s a
really unique situation because they have, I
won’t say an easy life, but they have a
well-resourced life. So they have fish, they don’t
have a lot of predators, there’s not a lot
of human impact. So they have time to
play with each other and apparently humans,
when we’re there. I don’t know. I don’t know what
they think of us. I think they think we’re
funny in the water. I think they think
we provide bow rides and some
interesting entertainment. But I really think intelligence
seeks intelligence. And I think they
really recognize another species of
interest, like they do with the bottlenose who are
out there and other species. THAD STARNER: I think
they make fun of me. DENISE HERZING: Well,
that’s for sure. THAD STARNER: I am not
a very strong swimmer. One day I was out
there with Denise and got some water in my mask,
so I was doing mask clearing. I was just floundering
around and they all ignored her and
came and circled me and just all stood the same
way I’m standing in the water going, are you dying? Can we watch? [LAUGHTER] DENISE HERZING: I’m surprised
they didn’t drag you back to the boat. We’ve actually had them
help a struggling swimmer before, like escort
them back to the boat because they recognize that. THAD STARNER: There’s
definitely a sense of play. DENISE HERZING: Yeah. So we’ll see. I’d like to know. AUDIENCE: I want
to thank you very much for the wonderful talk,
a very interesting project. I have a question for you
and also a question for Thad. You said your research
does not require attaching any physical
things to the dolphins. So when you acquire
the data, you are always at some close
proximity to the dolphins. So how do you know that the data
that you have is all they’re saying? And also a question to Thad. The approach that
you used to cluster all of these [INAUDIBLE]
expressions, one idea would be to just run
it on a human voice and see whether it
can find out patterns in what people are saying. THAD STARNER: Want to go first? DENISE HERZING: Yeah, sure. AUDIENCE: Thank you. DENISE HERZING: Yeah. We’re in the water,
we’re in close proximity. It’s pretty hard to
build a dolphin blind. And we’ve tried sticking
a camera over a boat. But we feel that if they
are habituated to us and we’re not grabbing
them and poking them, they basically start
behaving and we try to record their behavior. Now, it’s most likely that we’re
not recording everything they say, absolutely. And we’ve toyed
with doing things like putting passive
acoustic devices down on the sea floor, which
is only 20, 40 feet deep, just to record when
we’re not there. But they move around a lot
so it’s a little challenging. We’ve thought about
building some kind of drone. I’d love to do an
underwater remora that could attach to them and
just go with them when we can’t and record sounds. You can do that kind
of stuff now, right? So that’d be pretty cool. So yeah I’m sure
there’s things they’re saying that we haven’t heard. THAD STARNER: And the
sound is directional, too. They can actually select one
person, like in this audience, and the other people
can’t hear it. So they can be very directional
with their high frequency stuff. So, yes, we’re missing stuff. We know we’re missing stuff. Hopefully, this two-way work– and you can hear
them in the water, too, because you get
lower frequency stuff. So you kind of know when
they’re aiming at you. But yeah, it’s a problem. We’re just hoping to
get enough data that we can deal with that issue. Now, as far as running it
on human language, we have. We ran it on sign
language, actually, on a data set that I made
as part of my master’s thesis back in the mid-’90s. It actually finds half those
signs from the video features. We’ve done it more recently with
Kinect data for sign language, which does the same thing. We’re doing it on
synthetic speech. We’re doing it on MIT’s
OpenCourseWare lectures, the TIMIT data set, OCR. A lot of things where we
know what the answer is. And it performs really well. We have an exercise
database too, where it’s just
exercise routines and it discovers all
six of the exercises. And the automatic
recognizers get 97% accuracy. So the algorithms are doing
well on these well-constrained problems. But this is the
aspirational work. Can we actually make
it work on something we don’t know the answer to? Can it work well enough to
help Denise get somewhere on this data set? Because we don’t to be perfect. We don’t even need
to be that good, we just0 have to accelerate
what she’s doing. That’s a hard thing. AUDIENCE: So if I understood
the correlation that was being done between
alphabet and patterns, or features
[? being ?] structured, it sounded like we were trying
to take the English language approach and equate pieces
of sounds to the alphabet. I’m wondering whether
there’s any room to take the Chinese
language philosophy and take a character to a
whole phrase or a meaning or a sentence and use that. Especially when it comes to– could there be portions
of the language– and this is part of the second question–
that are actually singing? Nothing to do with
communication, just plain singing, cooing,
comfort noises, things like that. Or just singing in rhythmic
tones, hypnotic tones. That’s the second question. THAD STARNER: So
hopefully this algorithm would get some of
that, both the issues. There’s a lot of work on zebra
finch, that sort of thing, as far as just bird song. When we’re doing
human speech, we get things like
coughs and sneezes and just the microphone
against the person’s lapel. And a lot of that stuff
is actually useful, because you can get
rid of the noise. You characterize the types
of noise, you get rid of it. Anybody who does
speech recognition tells you that’s important. As far as how much of
these individual units mean, is it a composition or
is it wholly on their own? Don’t know. DENISE HERZING: But regarding
that whistle question, so your question about
Mandarin Chinese, where you have the different
tonal inflections. So that would be something
that the system could pick up, as far as if it
was a combination, the tail end went up,
or went down, et cetera. Now, decades ago, people thought
about this issue and they looked at whistle languages
in humans in the mountainous areas– for example, the
Pyrenees and a few other areas in the world– to look at how much content
goes between those mountains when one guy is whistling
to another and coming home. Some of it ended up
being contextual. They knew it was Joe over
here and Fred was whistling. And they knew it was meal
time so he was probably going to go get some bread
and get some food for dinner, whatever. That has been speculated on. But part of the trick here
is that whistles are a really small part of their
communication and these other kinds of sounds, which
are the really hard ones to categorize–
the burst pulses, that are more like
phonemes, really– they are the ones
that we’re really lacking and understanding
of their function, their categories, and all that. So that’s part of
why we started, because people have really
gone through the whistle stuff a lot. THAD STARNER: And
the thing is the– Con Slobodchikoff’s
work on prairie dogs is really quite astonishing. In 200 milliseconds,
0.2 seconds, these prairie dogs are
encoding what type of predator it is. You can actually
watch this yourself. If it’s a hawk and
they give a long call, everybody goes in the burrow. If it’s a snake,
they’ll sit there and stand up and look at it. So Con has shown
that they can encode, in this 0.2 seconds, the shape
and size of the object that’s impinging upon their territory. And they do this in
these higher harmonics. If these prairie dogs can do
it, what can these dolphins do? One of the grant
proposals we put together is trying to take all the stuff
that he painfully put together over 20 years, all these
experiments and all this data he’s collected. Can we actually take
a look at his data, where we know what the
answer is and rediscover it using our techniques? We haven’t gotten there yet. That was part of
a grant proposal that’s being
evaluated right now. But we’re hoping
to not only just do Denise’s work, but a lot
of these other animal vocalizations specialists, see
if we can get their databases and see if we can
accelerate stuff for them. And for the ones we
know the answers to, see if it actually
pulls them out, as well. AUDIENCE: You mentioned,
in the “Dolphin Diaries,” that very occasionally
Atlantic spotted dolphins allow you to babysit their calves. You also mentioned it’s
not really a great idea. [LAUGHTER] DENISE HERZING: Oh, that
they’ll give us the– AUDIENCE: Yeah. So I don’t know if any
incident has happened and what you plan to
do if there is a shark attack in that situation? DENISE HERZING: What
he’s referring to is– I describe in my book,
because I’ve grown up with a lot of these dolphins. They’re grandmothers now, but
I knew them when they were born and I know their
grandkids and all that. And so occasionally,
they’ll come to the boat. And we’ll be anchored and we’ll
be in the water with them. And the moms will leave
their calves with us and they’ll take off. [LAUGHTER] You go, that might not be
a very smart thing to do. But they figure, well,
I got to go forage, so take care of
Junior for a while. And we’re like, yeah, right. What are we going to
do if a shark shows up? Block him? But they don’t go far. And it’s not like if
the calf made a noise, they wouldn’t come
swimming and running. But what I find
interesting about that is that it’s kind of like
they are incorporating you in their society. They’re smart enough to not
put their calf in danger. At least, I hope so,
because that wouldn’t be very survival-driven. But it is interesting. And that’s why,
I think, the link is there to really go further. It’s just that we have to bridge
the gap with our technology. I think that’s the trick. AUDIENCE: I’m wondering, since
you’ve been observing dolphins for such a long time, do you
observe some young dolphin born without such a full pattern
of dolphin vocabulary, then gradually
they learned this? DENISE HERZING: Watching
young dolphins develop has actually been one of the
biggest things for our work. Because when I first
went out there, I’d see the adults fighting
or the adults mating. And so you get a sense of,
oh, that’s what they do. That’s the ritual. But when you see the
little guys, they’re trying and they’re fumbling,
and they’re learning and they’re getting
feedback from their cohort. So yeah, absolutely. They’re learning social rules. They’re learning
sound restrictions, I’m sure, when to make
sound, when to not. We’ve had a couple young calves
that were real squeakers. We’d hear them coming
from miles away. We could tell who it was. Little Frida– [DOLPHIN SQUEAK]. They’d be like, shut
up, you’re shark bait. Stop it. And sure enough,
two months later, she shows up with a big
chunk out of her fluke. Someone bit her. And then it happened again
and it’s like, come on, you have to quiet down. And finally, her mom,
I guess, convinced her to shut up and not be so vocal. But yeah, they absolutely go
through the learning process. They don’t always
learn sufficiently and they do die
and get in trouble. But yeah, absolutely. And other researchers have taken
dolphin vocalization databases and applied information
theory to them and looked at, do little dolphins babble,
like little human kids do, and then gradually
refine their sounds? And, in fact, they do. The information becomes constrained
as they’re given feedback. So yeah, absolutely. It’s a process. AUDIENCE: I’m just
wondering if you have any sense of the upper
limit of the frequency dolphins are able to hear at. And also, are there
any differences in the way a dolphin might hear
relative to how a human hears and do you capture that? DENISE HERZING:
Dolphins have been tested, at least some species,
and had audio-grams made. It looks like they’re
hearing has a couple of different sensitivity peaks. So they hear best around
30 to 40 kilohertz. And then they have another by
bimodal peak at about a 100, 120 kilohertz. Now, researchers haven’t
measured much higher than that because it looks
like it drops around 120. But on the recording side, just
recently in the last few years, people have recorded sounds
up to about 240 kilohertz. For some reason,
they’re making sounds higher than we
think they can hear, so that’s a question mark. Yeah. Of course, their ears
are completely different. They have mammalian ears. They have certain
specializations. Of course, a lot of high
frequency inner hair cells and a lot of turns in
the cochlea and stuff. Yeah. So they’re adapted to their
environment, certainly. They can produce
really loud sounds, like 220 decibels,
which, underwater, is like a blasting cap. It’s a lot of energy
because of the medium. And they do crazy things with
their sounds, which I could get into on some crazy detail. They can phase out
signals to block certain parts of the bandwidth. So there is some pretty
complicated stuff going on there, which we’re
just starting to get at. THAD STARNER: We’re seeing some
crazy stuff with the chat work that we don’t understand yet. It’s like, was that
intentional or not? Because some of the
stuff they’re doing, acoustically, is really wacky. [APPLAUSE]
