(A Podcast About) Making Better Podcasts is back! And we’re kicking off this new batch by going very deep into the most fundamental building block of the medium: sound itself—what it is, how it works, and how it’s captured.
Want to know more? Subscribe to this series (and leave a like and review) wherever you get your podcasts. Follow us on LinkedIn and Instagram, and feel free to reach out any time: pitcrew@citizenracecar.com
(Intro Music)
David:
This is a podcast about making better podcasts. This is Episode Four, and it's called “Input”.
Today, we're going to talk about sound. And, I mean, of course we are. This is a podcast about audio production and audio storytelling, so what else would we talk about? Smell?
But seriously though, today isn't about words or stories.
(Sound collage begins: crowd milling, chatting)
It's not about structure.
(Orchestra begins tuning, warming up)
Or even music, exactly.
(Conductor taps baton)
It's about sound itself…
(Distant explosion, mysterious hum)
…the actual physical characteristics of sound waves, and how we can understand them, capture them, and manipulate them.
So let's start by talking to someone who works here with me at Citizen Racecar doing the job of manipulating sound. He's an audio engineer, which means it's his job to listen to things, consider how they sound, and tweak them in a variety of ways to get them to sound better.
So I have here an actual professional audio engineer, Mr. José Miguel Baez. Hello, Jose.
Jose:
Hello.
David:
Welcome. You can confirm that you are, in fact, a professional audio engineer?
Jose:
I, uh, have my certifications. Yes.
David:
(Laughs)
Fantastic. So, um, let's start with some basic questions. What… what is sound? How would you describe what sound is?
Jose:
Whoa.
David:
I'm putting you on the spot here.
Jose:
Totally on the spot.
Energy waves pushing particles of air—that's the way I would explain it. Like, because in order to perceive sound, you need a sense, right?
(Eerie, pulsing music fades in)
You need either physical, like, touch or, like, some way to catch it and hear it. But all it is is waves. It's a wave of energy moving through, in this case, air. It's, uh, literally pushing air.
(Distant explosion, wind sounds)
David:
No, I love that. It's one of my favorite things about sound is that it's actually physical, right? It's like, it's like… it's like wind, basically. It's moving air.
Jose:
Totally physics.
(Music)
David:
Sound happens when something causes an object, just about any kind of object, to vibrate. When you do…
(Vibration noise, wind sounds)
…that makes a wave of energy disperse into the air around that object. You can think of this like the way dropping a pebble into a pond causes ripples to spread out in concentric circles from where it entered the water.
Different kinds of vibrations create different kinds of air ripples, or sound waves as they're more properly known. We can describe these waves by measuring the size of the ripples, which we call amplitude, and the distance between them, which we call frequency.
Now, amplitude equates to volume. Our ears and minds register larger, more forceful ripples as louder sounds…
(Closer explosion)
…because they actually push harder on our ears and bodies.
Frequency, on the other hand, creates what we understand as pitch. When the peaks of the wave—the top points of the ripples—are further apart, we register that vibration as sounding lower.
(Electric guitar slowly plays ascending scale in the background)
When those peaks are closer together, we think of it as sounding higher.
Jose:
It has to do with energy. Depending on what's pushing air, you're going to get completely different sounds, or completely different, like, frequency response. If you move a very large object, the amount of air that's being pushed by moving the large object is going to create, like, a lower frequency response. You move, like, a piece of paper on your desk, not a lot of mass, you get like a soft sound. In useful terms, it's good to think of it as a range, and the frequency determines how much energy that specific wave required.
David:
A piano keyboard is sort of a picture of this low-to-high spectrum. When you push the key all the way to the left side, if you're sitting facing the keyboard, the strings associated with that key vibrate in a way that creates a wave with 27.5 peaks per second. We hear that as a very low note.
(Piano plays its lowest note)
At the other end, all the way to the right, is a key that causes a vibration of 4,186 peaks per second, which we hear as a very high note.
(Piano plays its highest note)
The notes produced by each of the keys between those two break up that spectrum of frequencies into 88 divisions, each of which we hear as a progressively higher note.
(Piano plays a rising arpeggio)
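The keyboard spectrum follows the equal-temperament rule: each key is a semitone above the last, and a semitone multiplies frequency by the twelfth root of two, so twelve keys double it. A quick sketch in Python (the function name is ours, purely for illustration):

```python
def key_frequency(n):
    """Frequency in Hz of the n-th piano key, where n = 1 is the lowest
    A at 27.5 Hz. Each key is a semitone apart, and a semitone multiplies
    frequency by 2**(1/12), so an octave (12 keys) doubles it."""
    return 27.5 * 2 ** ((n - 1) / 12)

print(round(key_frequency(1), 1))    # lowest key: 27.5
print(round(key_frequency(88), 1))   # highest key: 4186.0
```

Key 49 lands on 440 Hz, the A that orchestras tune to.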
Now what's interesting is that when we hear those notes, we're not actually just hearing those notes. It's possible to produce just a single clear sound wave on only one pitch. This is called a sine wave, and you can only really do it with an electronic synthesizer. It sounds like this.
(Sine wave plays)
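A sine wave is easy to compute, which is exactly why synthesizers can produce one; here's a minimal Python sketch (the sample rate and names are just illustrative choices):

```python
import math

SAMPLE_RATE = 44_100  # samples per second; a common audio rate

def sine_wave(freq_hz, seconds, amplitude=0.5):
    """One pure frequency, one pitch: a plain sine wave."""
    n_samples = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

tone = sine_wave(440.0, 0.01)  # 10 milliseconds of A440
print(len(tone))  # 441
```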
But any naturally occurring vibrations from a human voice, or a musical instrument, or whacking a table, or dropping a book on the floor produce many different waves all at the same time, in a variety of different shapes at different frequencies.
Jose:
The complexity is that when you talk—like, when anyone talks—or any sound, really, it’s not a pure sine wave at that frequency. Sound is composed of very complex waves at all sorts of ranges of frequency. So, when you actually sit down and observe it, you can see how much energy there is at each frequency. So everyone's voice ends up being different. Every sound ends up being weighted different in different frequencies.
David:
When the wave for one of those frequencies is significantly stronger than the others, we hear that sound as a musical note, with that strong frequency, or fundamental, as the pitch of that note, and the other frequencies as what are called harmonics or overtones.
(Note sounds)
This is why it sounds different when different instruments play the same note.
(Violin, piano, trombone, and flute all play the same note)
They're all playing the same fundamental, but producing different strengths of various overtones when they do.
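That idea can be sketched in code: same fundamental, different overtone weights, different timbre. The weights below are invented for illustration, not measured instrument spectra:

```python
import math

SAMPLE_RATE = 44_100

def note(fundamental_hz, harmonic_weights, seconds=0.01):
    """Sum the fundamental and its integer-multiple overtones.
    harmonic_weights[0] scales the fundamental, harmonic_weights[1]
    the overtone at twice that frequency, and so on."""
    n_samples = int(SAMPLE_RATE * seconds)
    return [sum(w * math.sin(2 * math.pi * fundamental_hz * (k + 1) * i / SAMPLE_RATE)
                for k, w in enumerate(harmonic_weights))
            for i in range(n_samples)]

# Two "instruments" playing the same A440, with made-up overtone mixes:
mellow = note(440.0, [1.0, 0.2, 0.05])  # overtones fall off quickly
bright = note(440.0, [1.0, 0.8, 0.6])   # strong overtones, same pitch
```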
Now, when none of the waves are sufficiently stronger than the others, we don't pick out a clear note anymore, and then we register that sound as noise.
(Collage of noisy sounds)
If all possible frequencies of sound wave are playing equally strongly at the same time, it produces what we call “white noise”.
(White noise)
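White noise is the easiest of all to sketch, since "every frequency equally strong" averages out to "every sample independent and random." A rough illustration (real noise generators are more careful about distribution and level):

```python
import random

def white_noise(n_samples, amplitude=0.5, seed=0):
    """Each sample is an independent random value, so there is no
    repeating pattern for the ear to latch onto as a pitch."""
    rng = random.Random(seed)  # seeded so the "noise" is repeatable here
    return [rng.uniform(-amplitude, amplitude) for _ in range(n_samples)]

hiss = white_noise(1000)
```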
Human speech is somewhere in the middle. There's a clearer fundamental pitch than noise, but not quite clear enough to register as specific notes. When we clarify the sound around…
(Begins singing)
…particular fundamentals, it changes from speech to singing.
(Applause)
Well, thank you.
Obviously sound is really complicated, and so thank goodness we have pretty sophisticated built-in equipment in our bodies for capturing it, primarily your ears.
(Ambient music begins)
The way your ears work is that they have a membrane in the back of the ear canal called an eardrum that captures these vibrations, then connects to special nerves that translate those vibrations into electrical signals that they then transfer to the brain.
This capture of sound waves creates some of the most profound emotional experiences that humans can have. Conversation, comforting, whispering in someone's ear, shouting in anger, not to mention the nearly infinite variety of music. For whatever reason, humans are wired to avidly consume sound waves.
But, for the vast majority of human history and prehistory, this consumption was very personal and totally ephemeral. You had to be physically near enough to the vibrating thing to be able to hear its vibrations. And so the softer those vibrations were, the closer you had to be. And you could only hear that sound once. Every sound that happened, happened, dissipated out into the universe, and was gone forever.
Then, in the second half of the 19th century, a couple of amazing things happened. First, people figured out that this process—capturing sound waves with a membrane and turning them into electrical signals—is something we can also do artificially.
(Sound collage fades in: outdoor concert)
It's called transduction. This is how both microphones and speakers work, except they do it in reverse of each other. Both have a membrane, like your eardrum. A microphone's membrane vibrates when it is near sound, and then the mechanism of the mic transduces those vibrations and turns them into electrical signals. A speaker does exactly the same thing, but backwards. It takes in electrical signals and then transduces them into vibrations of a membrane, which then play out into the world.
This new technology meant that soft sounds could be amplified, and so things that were very quiet and intimate could now be heard by a lot of people and much farther away.
Around the same time, and it's really hard to say which happened first, people also figured out that you could capture a representation of sound waves and then recreate very similar sound waves later on. This is what we call recording.
This started by sending sound vibrations to a needle, which would cause that needle to vibrate and cut grooves into a physical medium.
(Archival footage of man singing a folk song, from an Edison cylinder)
They first did this with wax cylinders, and then discs made out of shellac and then out of vinyl.
Archival Footage:
My Country ‘Tis of Thee, by the Edison Quartet!
(Wind band music)
David:
You could then use these grooves to cause another needle to vibrate in the same patterns and then amplify those vibrations.
At first they did this just by attaching a cone to it, like an old megaphone, to make something called a gramophone. But then later they added this transduction technology…
(Vintage jazz music)
…transducing the vibrations of the needle into electrical signal, and then electrically amplifying them, sending them to speakers.
Now, eventually the needle-into-vinyl approach was supplemented and then largely supplanted by other techniques, like translating the vibrations into magnetic signals captured on a piece of plastic tape, or into digital ones and zeros in a computer, but these are variations on the same theme.
(Modern music begins)
All of this is so commonplace to us now that it's really worth taking a minute to consider what a total revolution this was in human experience. For our first 300,000 years as a species, sound, which we crave and love so much, was always local and temporary. If it existed at all beyond the moment of playing, it was only in people's memory.
But then, just in the last 150 years or so, all of that has changed. You could capture a piece of sound, this totally ephemeral thing, and replay it as many times as you wished. People could hear it much later on, or very far away, or both. For the first time in human history, you could step in the same river twice.
Jose:
It's a, it's kind of freaky. Especially if you start reading on the history of recording. It’s my understanding that it's almost a direct response to the photograph. An inventor from Paris…Uh, I think it was De Martinville, or something? He saw that there was a camera, and he wanted to make a camera so that you could record your dreams as you woke up, without taking pen and paper, so that you wouldn't forget. And…
David:
(Interrupting, incredulous)
THAT’S the original reason for recording? It's like, Jungian, like…
Jose:
That's the oldest…
David:
The original function, intended function of sound recording is to record a personal testimony about your imagination and your dreams.
Jose:
Right. Right.
David:
Kinda sounds like a podcast.
Jose:
Yeah, it's actually very interesting!
Archival Footage:
An Ideal Republic, by William Jennings Bryan, Edison Records. I can conceive of a national destiny surpassing the destiny of the present and the past. A destiny which meets the responsibilities… (fades out)
David:
It's important to remember, though, that as magical as all of this is, you are not actually listening to me speak right now. Not really. The sound waves I created when I said these words dissipated out into the air like all sound waves do, never to be heard again. Instead, you are hearing a replication of the sound of my voice, created by a whole system of transductions, translations, and copies.
(Music starts: cinematic hip-hop)
I spoke into a microphone, which caused the element inside that microphone to vibrate in a way that mirrored the sound of my voice. The microphone then transduced those vibrations into an electrical signal, and sent them down a cable into a digital recorder. This recorder translated that electrical signal into digital code and sent it to my computer. Then I used the computer to manipulate that sound. Jose and our colleague Alex used their computers to manipulate it further. Then we created a finished digital file which we published on the internet.
You found that file and then downloaded it onto some kind of a device, which performed all of that backwards. It took the digital files, sometimes manipulated them further, then transduced them back into electrical signals and sent those signals to some kind of speakers: maybe large speakers in your living room, or smaller speakers in your car, or built into your computer, or tiny speakers inside a set of earbuds. The speakers turned them back into sound waves, which were picked up by your ears. These were not the same sound waves that I produced.
And at every stage in that process, there was the potential to make radical changes to your final listening experience. At every step, the sound can be manipulated. Improved, or worsened, changed further and further from the original sound of my voice into something else, something different.
Now, as a Producer, I am aware that there are many of those changes that I have no control over. You will have a very different experience, for instance, depending on what kind of speakers you use to create the final sound waves that are transmitted to your ears. But I can't do anything about that. You're going to use whatever speakers you want. There are a lot of things, though, that I do have control over. For me, personally, it's really important to me to consider each of them and try to maximize the final value of your listening.
As creators of media, what we're really doing is creating experiences for people. Ways for them to spend a certain amount of their time, and in that time, hopefully to have certain emotional responses. We want people to laugh, to cry, to learn, to be inspired, to be infuriated. Whatever we're trying to accomplish. And so I want to try to make sure that whatever is coming out of whatever speaker you are using is giving you an experience that is as close as possible to the one that I intended you to have.
(Music fades out)
This starts at the very first step when my voice goes into a microphone. Different microphones capture sound in very different ways, and this can make an enormous difference to the final product.
So, since we're here, let's take a little side journey into the wide world of microphones. We'll listen to some different mics and see what we can hear.
(Sound quality changes)
So this is me recording myself directly into my computer, the built-in microphone on my computer. Now, it's not too terrible. You'll notice, though, that a lot of the high end and a lot of the low end are gone from my voice now. There's just sort of this woofy kind of middle sound. That's because the microphone that's built in isn't as sensitive. It can't pick up as wide a range of frequencies as the more professional mic I usually use does.
You’ll also hear a lot more of what we think of as room sound. And that's because the mic is about, I don't know, a foot and a half away from me, and so you're getting some of the sound that's going directly to the microphone that's built into the computer and some of the sound that's leaving my mouth and going out in those concentric circles and bouncing around the room. It's bouncing off the walls, it's bouncing off the ceiling, and then going into the microphone. That's why you hear that sort of… it's not quite echo, but it's sort of resonance, presence of the room. And that makes the sound a little less clear.
(Sound quality changes)
Now, instead of into my computer, I'm recording into my phone. In this case, an iPhone.
This is actually a fairly good quality microphone, as far as it goes. It records a broader range of frequencies than the one we were just hearing. But it's very small, which it has to be to fit inside the phone, and that limits what it can do. It also tends to do what's called “overdrive”, which is, because I'm closer to the mic, it's taking in more physical sound waves than it can really handle.
And so you hear sort of a crunch, kind of a distorted sound that comes from literally just pushing the membrane too hard.
(Sound quality changes)
Now, when I back up, you lose that overdrive, but you hear more of the room. You hear more of that bouncing and echoing effect from what's going on around me, from the sound waves going all over the room before they go into the microphone.
(Clicking sound, sound quality changes)
Okay. So now we're going to start looking at what I would consider to be professional-quality microphones. So, we're going to look at three different kinds of microphone in a row, three different price points, and three different sorts of designs that sound really different from each other and are, kind of, good for different things.
So this first microphone is really kind of the entry level of professional. It's called a Shure SM58, and that's Shure with an “h”. This retails for about $100. But it's maybe the most popular microphone in the world. And this is really a microphone that was designed for live performance. If you go see a band at a bar, there is a 98-and-a-half-percent chance that they are singing into one of these microphones. And because of that, what it's good for is, it's very directional. It's designed to really just capture the area right around the head of the microphone and not much else, so that you can sing into it on stage and have it not capture the other singers or the other instruments very much.
So because of that use case…
(Sound quality changes)
…when you get far away from it—now, I'm only, I don't know, six or eight inches away from the microphone—I get very quiet. It's really hard to hear me at all.
(Sound quality changes)
And then when I get up close, it's a little bit crunchy. There's a little bit of distortion. That little bit of crunchiness doesn't really matter, or even kind of sounds good, when you're on stage and it's blasting through a big PA system. But when you're recording, it's not really what you're going for. So, this is probably the least expensive option for a decent microphone, but it's maybe not the best choice.
(Clicking sound, sound quality changes)
So now I've moved to a much fancier and more expensive microphone. This is a microphone by a company called Neumann. It's called the TLM 103, and it retails for about twelve hundred dollars. Which is a lot more than the last one, obviously, but you can spend even more than this on professional studio microphones. If you set your mind to it, you can spend seven, eight, ten thousand dollars, even more, per microphone. But this is sort of the… I would call this the low end of the high end.
Now, unlike the last microphone we were using (the Shure, which was a “dynamic” microphone), this is something called a “condenser” microphone. I won't go into the mechanical details of what makes those different, but it's a… it's a different sort of, uh, mechanism, and it's much more delicate, more sensitive. And so as you can hear, it's a really rich sound, really clear. You can really hear the nuances of my voice. But it's finickier. The mic itself is more delicate. That last microphone, you can drop it on stage and pick it up again and keep using it just fine. That's what it's designed for. This one's not like that. You have to be sensitive with it. You have to be careful with it.
(Noise in the background)
Now, do you hear that traffic going by? One of the trade-offs of this microphone is it's much less directional. So it's, it sounds much clearer and more beautiful, but it's much more likely to pick up all that extraneous sound.
(Clicking sound, sound quality changes)
Now, this mic is somewhere in the middle. It's another Shure microphone; this one's called an SM7B. This is kind of our standard microphone that we use here at the company. And it's a really popular microphone for broadcast, for radio; a lot of DJs use this microphone. Now, it's also a dynamic microphone like the other Shure was, but it's a much better dynamic microphone. And more expensive: this costs about $400. Now, this one I like because I feel like it really splits the difference between those other two. It's not quite as fragile. It doesn't pick up as much ambient sound. And so, I feel like it's an awesome compromise. This is exactly the microphone that I've been using to record the rest of this podcast.
(Clicking sound, sound quality changes)
Now, with all of these, you can't plug them directly into your computer. These are, sort of, traditional analog microphones that were, um, designed before computers, really, and really what they are is transducers. They take the sound coming out of my mouth and convert it into an electrical signal. Now, that means you have to have another device that turns that electrical signal into a digital signal, some kind of an input or converter that will take the signal and turn it into something your computer can understand. You can't just plug any of these microphones directly into your computer and have it work.
(Clicking sound, sound quality changes)
Well, now I'm using a microphone called a “Blue Yeti”. And this is a USB mic. This kind of combines two steps into one box. It has that transducer, the microphone, that changes the sound coming from my voice into an electrical signal, but then it also has a converter that converts that into a digital signal built right into the microphone body. And then you can go directly out and plug into the USB port on your computer. And this microphone, in total, costs about $100, and it replaces both of the things I was using before: the microphone and the digital recorder.
And honestly, there are some compromises because of that. I don't think this sounds nearly as good. I think neither the microphone nor the digital converter are the same quality as either the ones I was using before, even when I was using the $100 microphone. But is this good enough for your podcast? It's certainly less expensive. It's only one thing to buy instead of two. There's a lot less to think about. You plug it in and there's one knob.
It's really up to you.
(Pensive music starts)
It's not like there are right and wrong options here, there's just different colors. You can make a beautiful picture with crayons, and you can make a beautiful picture with oil paints, but, you know, it's really just about the effect you're going for.
I think what matters is intentionality. When you're using the crayons, you're leaning into the crayon-ness and really exploring what effects that can provide. If you're just ignoring the difference and treating crayons like oil paint, or vice versa, you're just going to end up with a mess.
(Sound quality changes)
And of course, choice of a microphone is only one of many choices that contribute to the finished sound of your podcast. There's nothing preventing you from just taking that raw recording, uploading it to your feed, and calling it done. But, almost every recording benefits significantly from some kind of editing. Which brings us to the magical world of post-production—that's the showbiz term for everything you do to recordings to make them be the way you want them to be before publishing.
We live in a golden age when it comes to the possibilities of audio engineering. When people were recording directly onto wax cylinders, you couldn't really do post-production at all—what you got was what you got. Then in the time of tape machines, manipulations were possible, but they were limited and really tricky to pull off. But now, when everything is digital and computers are stunningly powerful, the possibilities are endless. In the digital realm, you can change, adjust, play with every one of the physical characteristics of sound.
(Music fades out)
You can adjust the amplitude, of course, either by increasing how much of it goes into your signal chain, which is called “gain”, or how much of it comes out at the other end, which is called “volume”.
You can simulate the sound of…
(Sound becomes overdriven)
…pushing too much through an amp or a speaker, which is called “overdrive”.
(Sound is clear again)
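Gain, volume, and overdrive are all just arithmetic on the samples. A toy sketch (the function names are ours):

```python
def apply_gain(samples, gain):
    """Scale amplitude going into the chain ("gain"); the same math
    applies to the level coming out the other end ("volume")."""
    return [s * gain for s in samples]

def overdrive(samples, limit=1.0):
    """Hard clipping: anything pushed past what the "amp" can handle
    gets flattened at the limit, which is the crunchy overdriven sound."""
    return [max(-limit, min(limit, s)) for s in samples]

print(overdrive(apply_gain([0.4, -0.9], 3.0)))  # [1.0, -1.0]
```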
You can adjust the relative strength of the different overtones by adjusting the equalization, which is almost always abbreviated to EQ…
(Sound quality shifts)
…either bringing out the high end of the sound…
(Sound quality shifts)
…or bringing out the low end of the sound…
(Sound quality shifts)
…or emphasizing the tones in the middle.
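Under the hood, EQ is filtering: strengthening or weakening bands of frequencies. About the simplest possible example is a one-pole low-pass filter, which keeps the low end and smooths away the highs (a toy sketch, nothing like a full studio EQ):

```python
def low_pass(samples, alpha=0.1):
    """Each output sample moves only partway toward the input, so fast
    wiggles (high frequencies) get smoothed away while slow changes
    (low frequencies) pass through. Smaller alpha filters more heavily."""
    out, prev = [], 0.0
    for s in samples:
        prev += alpha * (s - prev)
        out.append(prev)
    return out

# A rapidly alternating (high-frequency) signal comes out much quieter,
# while a steady (low-frequency) signal passes through nearly unchanged.
high = low_pass([1.0, -1.0] * 50)
low = low_pass([1.0] * 100)
```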
(Sound is clear again, moves around in the stereo field)
You can move sound all around within the stereo field by using things like panning.
(Panning stops)
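Panning boils down to a pair of per-channel gains. A common recipe is equal-power panning, sketched here (one of several conventions):

```python
import math

def pan(sample, position):
    """Equal-power panning: position -1.0 is hard left, +1.0 is hard
    right. Cosine/sine gains keep total power roughly constant as the
    sound sweeps across the stereo field."""
    angle = (position + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return sample * math.cos(angle), sample * math.sin(angle)

left, right = pan(1.0, 0.0)   # dead center: equal in both channels
hard_left = pan(1.0, -1.0)    # all signal in the left channel
```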
You can distort sound digitally by doing things like…
(Sound becomes digitally decimated)
…decimating the bit depth.
(Sound is clear again)
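Bit-depth decimation is exactly what it sounds like: rounding every sample onto a coarser grid. A minimal sketch (simplified; real bitcrusher effects add more options):

```python
def decimate_bits(samples, bits):
    """Re-quantize floats in [-1.0, 1.0] to a coarser bit depth: fewer
    bits means fewer possible levels, and the rounding error is heard
    as crunchy digital distortion."""
    levels = 2 ** (bits - 1)
    return [round(s * levels) / levels for s in samples]

print(decimate_bits([0.3, -0.72], 3))  # snapped to quarters: [0.25, -0.75]
```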
You can create artificial echoes of different kinds…
(Reverb and delay effects)
… using things like digital reverb and digital delay. This is a fun one because there are lots of different kinds of digital delay. Really, lots of different kinds.
(Sound is clear again)
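A digital delay can be as small as a few lines: mix in a decayed copy of the signal from a moment earlier. This sketch is just one of the many kinds, a simple feedback delay:

```python
def echo(samples, delay_samples, feedback=0.5):
    """Basic feedback delay: each sample gets a decayed copy of the
    (already-echoed) signal from `delay_samples` earlier, producing
    a train of fading repeats."""
    out = list(samples)
    for i in range(delay_samples, len(out)):
        out[i] += feedback * out[i - delay_samples]
    return out

# A single impulse turns into a fading series of echoes:
print(echo([1.0, 0.0, 0.0, 0.0], 1))  # [1.0, 0.5, 0.25, 0.125]
```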
Honestly, there's almost no end to what you can do to sounds after you've recorded them.
Do you think there's a limit on the amount you can manipulate a piece of sound with the tools you have?
Jose:
That's an interesting question. Whenever you think that there is a limit, someone actually goes ahead and builds something different. So, I don't think there's a creative limit. You can do pretty much whatever you want.
David:
And that's what we're going to explore next. When you can pretty much do whatever you want to, what do you want to do?
(Closing music)
This is (A Podcast About) Making Better Podcasts. My name is David Hoffman, and I am the Host, Writer, and Producer of this series. Our in-studio guest today was Jose Miguel Baez, who also did the sound design. Alex Brouwer and I collaborated on the post-production. Publication and promotion by Candice Chantalou. This series is a production of Citizen Racecar, citizenracecar.com


