This post is part of a series delving deep into how we stream Chronicles of Rinn. Each part is linked below:
Howdy folks and welcome to the latest instalment of our deep dive into how we stream Chronicles of Rinn! In the first instalment, we did a broad overview of how the whole stream comes together. This week, we’re taking a deeper look at audio and how that all works. We’re gonna look specifically at audio from my point of view – the way I set up my microphone for play and how I manage music, because mine is the most complicated setup and the one I can explain best.
To recap, I use an AKG P420 condenser microphone with a cardioid pickup pattern (it listens at the front but not really the sides or back). This microphone is connected to my Focusrite Scarlett 2i4 audio interface via XLR cabling, and then I process that audio in REAPER, my digital audio workstation (DAW). I also take a standard 3.5mm (1/8″) cable from an unused audio output into the other input of my Focusrite, which carries the music I use. I mix these together in Reaper, and the final mix gets sent back out from my audio interface and into the line in of my computer via a phono to 3.5mm cable, which is what the players and the stream hear.
Let’s break down some of those words a bit.
You don’t get very far without a microphone, and there’s a lot of lingo when you read about them, so let’s dispel some of the mystery.
Dynamics, Condensers, and Pickup Patterns
You have two categories of microphone to be concerned with (for our purposes) – dynamic, and condenser. Without getting too deep into the weeds about it, a dynamic microphone ‘just works’ when you plug it in, whereas a condenser microphone requires additional power, called ‘phantom power’. That might come from your audio interface (as it does for me), a USB cable, or batteries in the microphone.
Dynamic microphones tend to be sturdier and less sensitive, while condensers tend to be clearer and pick up a lot more. That can be a good and a bad thing. Condenser microphones tend to capture a nicer recording, and they’ll certainly pick up more of the nuances of your voice – but they’ll also pick up more of the nuances of everything else. If your room has a lot of echo, your computer makes a lot of noise, or you bump your microphone, more of that will come through than it would on a dynamic microphone, which is probably not what you want. Dynamic microphones are more forgiving. On the other hand, because they’re more sensitive, condenser microphones don’t have to sit as close to your mouth as a dynamic microphone does, which may be worth bearing in mind if keeping microphones out of frame is important to you.
Microphones also have a variety of pickup patterns (describing the area in which they pick up audio), the most common being cardioid (mostly from the front, rejecting sound from behind – the pickup area is roughly heart-shaped, hence ‘cardioid’), omnidirectional (from all around), and figure-of-eight (front and back). I bought my microphone specifically because it can do all three, and I use them in different scenarios – cardioid when at my desk so it picks up me but minimises other sounds, omnidirectional when we play in person so I get everyone at the table, and…well, I’d maybe use figure-of-eight if I was doing an in-person interview. It’s kinda niche; cardioid and omnidirectional are more common.
If you’re just recording one person, you’ll want to use a cardioid microphone to minimise any background noise. You also get hyper- and supercardioid microphones, which pick up sound in the same direction but in a much narrower and more focused band. Microphones with these patterns are often built as ‘shotgun’ microphones, and shotgun condensers get used a lot for recording dialogue on film sets. If you have or have seen a ‘video microphone’, it’s probably a hyper- or supercardioid condenser like that. Radio presenters, on the other hand, tend to use dynamic microphones – it’s part of what gives them that ‘radio sound’. Vocals for music are usually recorded with a normal cardioid condenser microphone, but that’s also done in an acoustically treated space to minimise the problems condensers can bring up.
USB vs XLR
If you’re shopping for microphones there’s another choice to make, which is whether to get a USB microphone or a ‘normal’ XLR microphone. Microphones have a diaphragm which moves as it’s struck by sound waves to create a weak analog electrical signal, but in order for your dulcet tones to be understood by your computer, those electrical signals need to be amplified and converted into the digital 1s and 0s that your computer works with. USB microphones are fully plug and play – the microphone gets power from its USB connection, which allows it to do that amplification and conversion within the microphone itself automatically. ‘Normal’ microphones, on the other hand, need a dedicated device that can handle that conversion. This is usually a box called an audio interface.
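As a toy illustration of what that amplification-and-conversion stage produces, here’s a little Python sketch of one analog-style sample being turned into the 16-bit integer your computer actually stores. Real converters are far more sophisticated (dithering, higher bit depths, and so on) – this just shows the basic idea:

```python
def quantize_16bit(sample):
    """Convert one analog-style sample (a float from -1.0 to 1.0) into
    the 16-bit integer a converter would hand to the computer."""
    sample = max(-1.0, min(1.0, sample))  # anything beyond full scale clips
    return round(sample * 32767)

print(quantize_16bit(0.5))   # → 16384  (halfway up the 16-bit range)
print(quantize_16bit(1.5))   # → 32767  (pinned at full scale, i.e. clipped)
```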
Normal microphones connect to the audio interface via an XLR cable (other connections are available, but extremely rare), which is a strange looking cable with three connection pins (what they do is cool but not important right now). The audio interface has preamps which allow you to control how much amplification is applied to the microphone signal and usually some other settings too, and it connects to your computer via USB (probably). Exact same process as a USB microphone, it’s just that the stages are decoupled.
Now, that all makes USB microphones sound like the wave of the future, it being one less thing to worry about – and indeed, if you just want to plug something in and go, then yes, a USB microphone is probably the way to go. That said, XLR microphones are the industry standard because, as is often the case with audio equipment, decoupling the two stages tends to produce better performance: dedicated audio interfaces have better preamps and converters than those built into a USB microphone. You can probably imagine how trying to fit all the circuitry necessary to amplify and convert the microphone signal inside the body of the microphone itself could be somewhat limiting compared to having a dedicated device for that job.
You may have also seen people using mixers to manage their audio, mixers being the boxes with sliders and knobs that you see at live concerts and in recording studios, although for consumers they’re usually far smaller. Mixers and audio interfaces are similar, but not the same. Mixers are used to ‘mix’ multiple audio signals together, and then output a stereo feed somewhere else – traditionally loudspeakers. In the modern day, however, some mixers will also feature USB connections to allow them to pass audio into a computer in basically the same way as an audio interface. The exact implementation of this varies, but my understanding is that it’s often only this stereo mix that gets passed into the computer, whereas audio interfaces will pass in the individual inputs into the computer as discrete audio channels.
For just one microphone you won’t notice much of a difference, but if you have multiple microphones or audio sources, the distinction is important. Because I use an audio interface to capture my microphone and my music, I get them as separate tracks in my computer that I can control independently. I can mute one without affecting the other, and I can process them individually. If I used a mixer, they’d be lumped together (although I could control them individually on the mixer itself). Basically, a mixer usually gives you more audio channels to work with, although those channels are less flexible once they’re ‘in’ the computer, whereas an interface gives you fewer channels, but more flexibility over what happens to them in the digital domain.
The choice of USB vs XLR is personal and entirely up to you. There is less of a selection of USB microphones (I’ll give suggestions later), and XLR microphones are usually cheaper than their USB equivalents, but an XLR microphone plus audio interface is more expensive than just a USB microphone – likely higher quality, but more complicated. I don’t think there’s a right or wrong answer, but if you expect to do anything audio related beyond just talking at a single microphone, the audio interface route will likely give you more flexibility in the long term.
Also, audio interfaces usually have headphone outputs that are better than the ones built into your computer (1s and 0s have to be converted back to analog voltages to power headphones and speakers, and some of those converters are better than others), so you might notice an improvement in your listening experience too.
Shields & Stands
Regardless of your microphone, you’re probably going to need something to mount it on, and if you’re speaking into it you’ll likely need something called a pop shield. A pop shield is essentially a little screen that sits between you and your microphone to reduce plosives, the ‘p’ and ‘b’ sounds in your dialogue. When you make a ‘p’ or ‘b’ sound, you’re actually making a blast of air that makes a popping sound in the microphone that you don’t want, and a pop shield just sits in the way of that blast and lessens it. You can get foam ones that cover the whole mic (what I currently use), ones on a bendy arm that clamps to your microphone stand, or you might even get a microphone that has one built in. Or you can also make one out of tights and a coathanger!
You’ll also need something to hold your microphone in place. Some USB microphones will come with a stand for them already, in which case you’re basically sorted, but most of the time you’ll need to get either a regular microphone stand that sits on the floor, or one that clamps to your desk. It’s up to you which you use. You also have to be careful where you mount it, and how shock absorbent it is. Microphones capture sound, sound is vibration, and when you walk around on the floor or tap on your desk or play with the microphone stand you create vibrations that get transmitted up to the microphone and out into your audio feed. You can get shock mounts for microphones (many come with one), and lots of stands will have rubber feet to decouple it from the floor or desk or whatever as well. Make sure you use that stuff, otherwise you’ll get all sorts of weird rumbles in your microphone signal.
When you’re setting up a microphone you need to set its gain – effectively the level of amplification. You can think of it as microphone volume (although that’s not strictly accurate). When you look at it on an audio meter (in a DAW or OBS, or on your mixer or interface if it has one), your speech will land in the green, yellow, or red on most meters. You want to be hitting the edge of green and yellow, probably about -6dB, at your normal level of speech (decibels here are measured relative to the loudest level digital audio can represent, which is why the numbers are negative – don’t get too hung up on it).
Red is bad. Don’t go red. That’s clipping and introduces digital distortion, which is nasty and to be avoided. Digital distortion is different from analog distortion which you get out of an overdriven guitar amp or an old record player. That analog distortion is quite nice to many people, but digital distortion is…it is not nice. Don’t go red.
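If you want to see what those meter numbers actually mean, here’s a small Python sketch that measures peak level in dBFS (decibels relative to full scale – 0dBFS is the very top of the meter, where clipping starts). The 440Hz tone is just an illustrative signal:

```python
import math

def peak_dbfs(samples):
    """Peak level relative to digital full scale (0 dBFS = the clipping point).
    Samples are floats in the range -1.0 to 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak)

def clips(samples):
    """True if any sample hits or exceeds full scale -- the 'red' zone."""
    return any(abs(s) >= 1.0 for s in samples)

# A tone peaking at half of full scale sits at roughly -6 dBFS --
# about where you want your normal speaking level to land
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
print(round(peak_dbfs(tone), 1))  # → -6.0
print(clips(tone))                # → False
```

Halving the signal level costs about 6dB, which is why -6dBFS corresponds to a peak at half of full scale.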
Whew, okay, I think that’s…I think that covers most of the theory. TL;DR: USB microphones are simpler but lower quality (on average), condensers are sensitive, dynamics are forgiving, audio interfaces are more flexible than mixers and both are more flexible than a USB microphone. Get a pop shield, get a stand, don’t go red.
I use an AKG P420 condenser microphone (on a cardioid pattern), which goes into my Focusrite Scarlett 2i4 audio interface via XLR. That should all hopefully make a lot more sense to you now. I have it mounted on a desk clamp stand (it’s an unbranded thing I got a few years back off eBay I think) which has served me perfectly well.
I’m mostly using my P420 because it’s what I have. I didn’t have a whole lot of budget for a microphone when I bought it for our in-person stream, so I bought one that would cover a lot of ground, and it’s done a great job so far. I do have microphones I’d like to move to, however, and some other alternative suggestions.
In terms of dynamic microphones, there are a few worth looking into. For starters, there’s the ubiquitous Shure SM58 – that’s the classic ball-grille microphone that you see at concerts. They retail for under £100 new (I work in British money, sorry Americans), but given how common they are in the audio world, there’s a rich secondhand market for them too. They can take an absolute beating; you can find all sorts of torture test videos for them out there. You’re unlikely to ever have one break on you unless you really mess up.
On the more expensive end of the spectrum there’s also the Rode Procaster, which I’ve heard good things about – it sits in the £100–£200 range and is a nice radio-style microphone with a built-in pop shield. Beyond that you’re looking at microphones like the Shure SM7B or Electro-Voice RE20, which are industry-standard broadcast microphones. Michael Jackson used the SM7B for the vocals on Thriller.
For condensers, there’s obviously the aforementioned P420 I use, and there are a few other microphones in the same price range like the Rode NT1-A (Niall on our stream uses this one; I’ve always thought it sounded thin) or the Audio-Technica AT2020. They both go for about £100 new, but there are higher-end models in the same lines like the NT2-A or AT2050.
The NT1-A and AT2020 also have USB versions that would serve you well if you choose to go down that route, but the Blue Yeti is probably the microphone that pops up the most in the USB world. I’ve used a Blue Yeti before (it was our in-person microphone before I got the P420) and I think it offers bang for buck that no other USB mic I’ve seen can match – it also has a number of different pickup patterns to choose from.
If you’re looking for an audio interface, I think the Focusrite Scarlett series is your best bet – there’s a range of models with varying numbers of inputs, and they start around the £100 mark. I’ve had very few issues with mine, but the drivers can be a little weird with Windows machines. There are cheaper interfaces available like Behringer’s U-Phoria series, and another line I’ve been keeping an eye on is Audient’s iD series – Audient make high-end preamps for the recording industry, which they use in that series. They’re a little pricier, although the entry-level iD4 model isn’t too bad.
Cables…just get any cable. If you go looking, you’ll find people discussing stuff like ‘oxygen-free copper’ cabling and similar nonsense – it doesn’t make a difference. I mean, it might make a difference, on some level, but it doesn’t make anywhere near a significant enough difference to be bothered with and is largely a marketing tactic. The important thing with cables is to get as short a cable as you can reasonably use. The longer a cable run, the more problems it can cause (noise, interference, and high-frequency loss, mainly), so keep it short where you can. I use a 2 metre own-brand cable I bought from Thomann, a European music store. I have no complaints.
There isn’t really an objective right or wrong answer when it comes to microphones. Think about the space you’ll be using it in, do some research, maybe even listen to or try different microphones if you’re able to, and pick what feels right based on that. You should also be aware that as with most technical equipment and hobbies there’s a certain amount of ‘gear snobbery’ that comes with the territory where people will debate at length how shockingly awful entry-level equipment is compared to XYZ piece of equipment that costs a four-figure sum. I’d advise you to ignore 90% of it – whatever you use will likely be a big improvement over the microphone on your earphones, webcam or laptop (although some are really quite reasonable). The main thing is whether it sounds good to you and the people that have to hear it.
Heidphones! If you’re playing online, you should use them. Voice conferencing software’s echo cancellation is usually…passable, but if you listen on speakers, the other people in your call are almost always going to get a little bit of echo, and for some people that really affects their ability to communicate properly – never mind the effect it has on a stream.
This is really just a personal thing, so I won’t spend too much time on it and certainly not with the detail we looked at microphones, but there are similarly two categories of headphones – open and closed back. This is literally whether the earcups on headphones are sealed or partially open, and it influences the sound you hear.
Closed back headphones are the most common, and they both keep the headphone output inside the headphones, and block out other sounds. Open back headphones are often a little pricier and less common, and they have vents on the earcups that let some of the sound out, and some external sound in. This has the effect of making it feel less like you’re wearing headphones (in terms of sound) and gives what you listen to a bit more ‘space’.
This is really a personal preference thing, although closed back headphones will be better at reducing echo in a voice call. I personally use Beyerdynamic DT880s which are partway between open and closed and extremely comfortable, but I also have a pair of Audio Technica ATH-M50s which are a closed-back pair I really like. Both the ATH range and Beyerdynamic have cheaper models available, and if you’ve not used a ‘nice’ pair of headphones before you might be quite surprised by the difference.
Use whatever you have/want, but use them, and probably turn off your echo cancellation in your voice conferencing software – it’ll likely keep your microphone signal more consistent as the software won’t be continually trying to remove echoes that don’t exist.
This might not be relevant to everyone but for those curious, I use music from Epidemic Sound for our streams. They’re a subscription service that give you streaming-cleared music you can use for video production; it’s largely pretty decent although it’s used in a lot of places so it’s pretty recognizable now (I once heard a track I use for the Nethergloam in a Bernie Sanders campaign ad). MediaMonkey is what I actually play my music with, and I set it to output on an unused audio connection that I then pipe into the other input on my audio interface via a standard 3.5mm audio cable like you’d use in your car.
I have playlists for different scenarios, but most of the time I use a ‘General Ambience’ playlist. I have a Regular Combat and a Big Combat playlist that I use for battles, and a Creepy playlist for suspenseful moments or dungeon ambience. I’ve also got some location-specific playlists for certain spots that have a distinct vibe.
When we started I used an array of music from video games, mostly, and then when we began streaming I contacted a number of composers to ask permission to use their stuff instead – many got back to me and just asked for credit, but I moved to Epidemic Sound once we reached affiliate status just for peace of mind. My suggestion is to look into composers for community mods and independent composers – their music is usually less recognizable, often still a very high quality, and they deserve more acknowledgement. Contacting video game studios for soundtrack permissions when there’s money involved is likely to be difficult and/or expensive.
The final piece in the puzzle for my audio is the processing I do in Reaper, my DAW (digital audio workstation). DAWs are programs designed specifically for audio production – you might have heard of programs like Audacity, Pro Tools, Cubase, Logic, etc. Reaper is my DAW of choice for a number of reasons (price being a chief one – there’s a generous trial and the software itself is cheap for personal use), and I use it to process my microphone signal using audio plugins, bundle it with the music, and pipe it out to the stream and my players.
I do that bundling by routing my processed microphone signal and the music out a secondary set of outputs on my interface. I then use a phono to 3.5mm cable between those outputs and the line input on my PC, and use that line input as my ‘microphone’. If you’re on a laptop, you might not be able to do this – you could plug it into the microphone input but that input probably expects a microphone-level signal which is significantly quieter than a line level signal that the interface will output, so it might be extremely loud/distorted.
There are also other options for audio processing, most streaming software allows for the use of audio plugins on audio sources (OBS certainly does), so you can still process the audio the stream hears this way. The most important thing with audio processing (and audio generally) is to listen carefully to your audio. Don’t blindly follow what I or anyone else do – determine what the problems you can hear with it are and process it accordingly. This post is already extremely long so I can’t go through all the ins and outs of audio processing (there’s plenty of literature out there on the internet about it), but I can share with you my processing chain and the reasons behind each section. You can also tweet at me (@JakeMagnificent), or come by our Discord, and I’d be happy to help as far as I can.
My processing begins by trying to siphon off as much noise from my microphone signal as I can. I use iZotope’s RX 7 plugin for this (which costs a hefty sum unless you pick it up on sale), but many DAWs (certainly Reaper) have their own noise reduction plugins that do an OK job. These usually work by looking at some audio from your microphone when you aren’t speaking, and then filtering out that frequency content as far as possible from your signal. The RX plugin handles all that automatically and with quite good results; others can be a bit more finicky.
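For the curious, the classic textbook version of that idea is called spectral subtraction: measure the noise’s spectrum from a stretch where you aren’t speaking, then subtract it from the signal’s spectrum. RX does something far more sophisticated than this, but here’s a toy Python sketch of the basic principle, using a steady synthetic ‘hum’ as the noise:

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for a toy example)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT, returning real samples."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def spectral_subtract(noisy, noise_profile):
    """Subtract the noise profile's magnitude spectrum from the noisy
    signal, keeping the noisy signal's phase -- the 'learn the noise
    when you aren't speaking, then filter it out' idea."""
    X = dft(noisy)
    profile = [abs(v) for v in dft(noise_profile)]
    cleaned = []
    for bin_value, noise_mag in zip(X, profile):
        mag = max(abs(bin_value) - noise_mag, 0.0)  # magnitudes can't go negative
        cleaned.append(cmath.rect(mag, cmath.phase(bin_value)))
    return idft(cleaned)

N = 64
voice = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]       # the 'speech'
hum = [0.3 * math.sin(2 * math.pi * 20 * n / N) for n in range(N)]  # steady noise
noisy = [v + h for v, h in zip(voice, hum)]

cleaned = spectral_subtract(noisy, hum)  # hum captured while 'not speaking'
```

Because the noise here is perfectly steady, the subtraction removes it almost exactly; real-world noise varies over time, which is why the fancier plugins earn their keep.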
The next stage is to EQ my voice. This is where I set the frequencies I want to make louder or quieter, and it’s functionality almost all DAWs can handle natively. All voices are different and therefore need to be EQed differently, but in most cases, the majority of the human voice (the bits that make it intelligible, certainly) lives between 300Hz and 3kHz. If you cut away all the frequency content outside this area, a vocal recording sounds like it’s coming down a phone line, because those are basically the only frequencies transmitted in phone calls. Those frequencies were initially prioritised for phones so that people could understand each other, and nobody has really bothered to change things because folk get by quite easily with just that frequency content. Or at least that’s the explanation I heard.
The first thing I do is remove all the lowest frequency content, below about 60Hz or so. There’s basically no voice down here, but it’s where rumbles live, as well as power line hum (mains electricity alternates at 50 or 60Hz depending on the country, and that’s the frequency the hum appears at), so filtering that off can get rid of some extraneous noise. I give myself a little boost up to around 200Hz for a bit more oomph, and roll off some of the frequencies between 200 and 400Hz, where things can sound muddy and boomy.
After that, I add a subtle boost around 5kHz for a bit more ‘presence’ and intelligibility, and finally I boost frequencies at around 10-12kHz. This is where the ‘sparkle’ is, I’m told – it makes a vocal recording sound polished.
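To make that low-end filtering concrete, here’s a toy one-pole high-pass filter in Python – much cruder than anything in a real EQ plugin, but it shows how the roll-off works. The 60Hz cutoff matches what I described above; the 20Hz ‘rumble’ and 1kHz ‘voice’ test tones are just illustrative:

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter: lets frequencies above cutoff_hz
    through and progressively attenuates everything below it."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out = []
    prev_in, prev_out = 0.0, 0.0
    for s in samples:
        y = a * (prev_out + s - prev_in)
        out.append(y)
        prev_in, prev_out = s, y
    return out

sr = 48000
# A 20Hz 'rumble' and a 1kHz 'voice' tone, one second of each
rumble = [math.sin(2 * math.pi * 20 * n / sr) for n in range(sr)]
voice = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr)]

hp_rumble = one_pole_highpass(rumble, 60, sr)  # roll off below ~60Hz
hp_voice = one_pole_highpass(voice, 60, sr)
```

After the filter, the rumble comes out at roughly a third of its original level while the 1kHz tone passes through almost untouched; the steeper filters in real EQ plugins cut the rumble much harder.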
Once I’ve EQed my voice, it goes to a compressor. Compressors can be difficult to understand, but they basically constrain how loud a sound can get. If the volume of a sound exceeds a certain threshold, the compressor kicks in and sort of ‘squashes’ it to keep it more consistent, stopping any big bursts of volume. The main control on a compressor is the ratio, which compares the original volume to the compressed volume. Mine is set to a 4:1 ratio, which means that for every four ‘units’ my microphone signal goes over the threshold, only one comes out the other side. I have quite a high threshold set – it’s mostly there to catch any spikes in volume and keep things under control.
The settings you need for a compressor will vary quite a lot based on your microphone and the way you speak – if you speak at a very consistent volume, you can use a compressor with a lower ratio, whereas if your voice is very expressive and varies a lot in volume, you’ll probably need a compressor with a higher ratio to keep it under control.
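That threshold-and-ratio arithmetic is simple enough to write down. Here’s a minimal Python sketch of the gain reduction a downward compressor applies (the -18dB threshold is just an illustrative figure, not my actual setting):

```python
def compressor_gain_db(level_db, threshold_db=-18.0, ratio=4.0):
    """Gain reduction (in dB) a downward compressor applies.
    Below the threshold nothing happens; above it, every `ratio` dB of
    input produces only 1 dB of output over the threshold."""
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    return over / ratio - over  # negative number = turn it down

# A peak 8dB over the threshold gets squashed down to 2dB over (4:1),
# so the compressor applies 6dB of gain reduction
print(compressor_gain_db(-10.0))  # → -6.0
# A quieter sound under the threshold passes through untouched
print(compressor_gain_db(-30.0))  # → 0.0
```

Real compressors also have attack and release controls that decide how quickly this gain reduction is applied and let go, but the ratio is the heart of it.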
The plugins I use for most of my processing are paid, but there are many equivalent plugins that ship with DAWs, and there are various collections of good, free plugins you can also use. The plugins that come with Reaper can be downloaded separately for use elsewhere, or there are suites like MeldaProduction’s free offerings. Either of those ought to cover most bases you need without costing you a penny, but just might not have as much functionality as premium plugins.
And…that’s basically it! If you’ve watched Chronicles you’ll know I often use voice modulators as well. I make these by hand with different effects, all built on top of this initial base I use for my voice. They’re outside the scope of this post, but I can cover them in the future if it interests people.
Advanced Mode: Dante
I’m just going to touch on this briefly – it’s not something I use currently, but something I have considered. If you’d like a better way of getting audio from your DAW to where it needs to go, you can look into software called Dante. Dante is audio networking software (mostly used in live sound and broadcast) for sending audio data through a network. You can use it (in theory) within just the one computer to send audio from your DAW to OBS or Discord or wherever it needs to go, or even to other computers on the same network.
People will often mention similar software like Virtual Audio Cable or Voicemeeter to do that, but my experience with those solutions was that they introduced a lot of latency between my audio and video (so they went out of sync), which is why I settled on just using an actual cable. I’ve tested Dante and got it working nicely, but it was a pain to set up and maintain, so I ultimately didn’t follow through with it. If I had a dedicated space and a production PC that wasn’t also my home machine, this is probably what I’d use though. You can give it a look on Audinate’s website, although Dante is paid software (with a month-long free trial).
Ok! That’s…that’s basically it. You can see how this chunk wound up derailing the ‘do it all as one post’ idea, but hopefully you found it useful. I can’t cover all the theory for this stuff in one post so if you’re interested I’d encourage you to do reading elsewhere, but you can tweet me @JakeMagnificent or come by the Discord and I’m happy to answer questions and chat to folk about it.
Next issue we’ll look at video, in what will probably be a significantly shorter post (I’ve been studying and working in audio for…six or seven years now, not so with video). Until then, you can see this stuff in action on Tuesdays at 7pm UK time, or over on YouTube!