Hey folks! It’s been a minute, huh? The last entry in this series looked at video; this week we’re going to dig into what goes into the actual production of the stream – the stuff that takes us from 8 folks playing D&D on Discord and a VTT to a live show you can watch on Twitch! This is the bit that I found most confusing when I started out, because there was relatively little advice out there (and even less good advice), so hopefully this’ll prove useful to folks. As always, let’s start with the broad strokes before we dig in.
Right now, while we’re playing remotely, we get together on Discord and play through a virtual tabletop. On my PC I use software called OBS Studio to stream to Twitch, where I capture our Discord video call and crop it down to individual video feeds, then put them all together in a few different scenes. I’ve got scenes for us playing (both with and without the tabletop visible), and intro/break/end scenes where we’re not visible, as well as a title sequence I made in After Effects.
So in order to get live on Twitch or YouTube or Facebook or whatever you use, you need some form of software that knows how to talk to the server and send it the video feed that makes up your stream. Some services have their own apps that let you do that, like Instagram for example (I think Twitch has one too now?), but those are usually only good for very simple streams, like if you’re just talking at your phone or something.
If you’ve got anything more complicated than that, you need software that can handle your requirements. There are a few options. We use OBS Studio, which is free, open source, and far and away the most popular; there’s also a similar fork of it called Streamlabs OBS, which we used to use. Other popular ones are vMix and Wirecast, both of which I believe are paid software designed for more professional users than the average streamer. OBS is pretty good – I’ve not had any trouble with it.
OBS (or whatever you use, probably) lets you create scenes with different sources in them. Scenes are, well, scenes. Different collections of sources for different purposes. We’ll talk about them in more detail later but as an example, I have a scene for before the stream starts, I’ve got a scene that has the title sequence, I have a scene for me and the players without the VTT and one for me and the players with it. You get the idea.
You can have all sorts of sources in your scenes – video sources (cameras), audio inputs (microphone feeds), audio outputs (headphone/speaker outputs), images, slideshows, web pages, a specific window on your computer or whatever’s displayed on your monitor or a game, whatever. A source that’s just another scene, even.
Once you’ve got your scenes assembled with the sources you want, OBS (or whatever software you’re using) then sends it over to Twitch/YouTube/whoever, with the right details for your channel so it knows where to go. If everything works, it’s available for people to watch!
Twitch and YouTube seem to be the two big venues for streaming, and I’m not sure which one’s the better choice. Twitch is geared for livestreams first and foremost, and video-on-demand (normal video) content second, whereas YouTube is the other way round. We stream on Twitch but only really because that seemed like “the place you stream” when we were starting out, rather than because I put any significant thought into the decision.
Twitch has a bigger livestream audience overall, but is also geared quite heavily towards folks streaming video games, I’m not sure what the difference is in terms of more ‘niche’ streams like RPGs. The larger audience also comes with a larger pool of streamers, who you’re also ‘competing’ against in a way that you don’t have to worry about with video-on-demand content so much, so bear that in mind too. If you’re already established on one platform to begin with then it probably makes sense to use that one.
You don’t necessarily have to choose, though! At least in the beginning. I’ve never done it but I know you can multistream to both YouTube and Twitch at the same time. It requires more processing on your computer, but if it’s up to the job it’d be a good way of comparing the two platforms. If you become a Twitch Affiliate or (I presume) a YouTube Partner, they require your streams on the platform to be exclusive to them, so you can’t multistream after that point. Takes a while to get there though, so plenty of time to trial both!
Discord and the VTT
Outside of OBS, there are two other programs the stream relies on: Discord and the virtual tabletop. As I mentioned in part 1, Discord is our video/voice chat solution of choice. Since we started this deep dive we’ve moved into the Animancer server, which has a separate zone set up for our game. The unified text, voice, and video features Discord has make it nice: we’ve got a voice channel we play in, general and off-topic chat channels, and character-specific channels. Discord isn’t the best solution for the video side of RPG streaming, but in terms of campaign management I think it’s great.
We use the Bingo Collective voice channel for our video. I pop it out, put it in fullscreen mode, and then I capture that window and crop it down for each player. This works…fine? It’s not ideal – if someone has to miss a session (which is relatively common), I need to adjust the cropping and layout. Other video solutions (I know Skype and Microsoft Teams can be made to do this) can output each participant’s video using something called NDI, or Network Device Interface. This basically creates a bespoke video/audio feed that sits on your network, which OBS can pull in as a source (you need a plugin for that), and gives you (in theory – I’ve not tried it myself) more granular control and isolated audio feeds.
Everyone in our group uses Discord a lot anyways so it was natural for us to stick with it for this, and while it’s a headache to adjust the crops each time, I prefer it to the friction of forcing everyone onto a different solution, so we’re sticking with it for the moment.
The last piece of the puzzle is our VTT. Back in part 1 I mentioned we used Fantasy Grounds – while that was true then, we’ve since moved over to Foundry, which is proving a better fit for us. I like Fantasy Grounds and it can do a lot of neat automation, and I’d absolutely use it again for a different game or group, but it caused problems for some folk and Foundry is proving a better solution. Next issue we’ll look specifically at Foundry – what it does and how we have it set up – and maybe discuss other tabletops like Fantasy Grounds or Roll20, which we’ve also used, and what features they offer.
Foundry’s standout feature is the sheer volume of third-party plugins and addons available, most of which are free. I use a module called Stream View to set up a dedicated stream viewport, which I then capture and bring into OBS so viewers can see what the players see. This is a more elegant solution than the dedicated laptop I used to use for Fantasy Grounds.
Beyond that, everything else happens in OBS!
Ok, so let’s look at the different OBS scenes I have setup and what they’re composed of, then we’ll look at the somewhat confusing streaming settings.
The best place to start is…probably the start! This is the Intro scene, which is basically identical to the Break and End Screen scenes, just with different text. The Intro scene runs before the show starts to give people time to turn up, the Break scene is for if we need to vanish for a minute (this saw more use when we played in person), and the End Screen is for when the stream ends, funnily enough.
That’s what things look like in OBS. These scenes are fairly simple – they have a looping video file in the background of smoke which I made myself in After Effects (there’s tutorials online, I just followed one of them), and a slideshow of Izzy and Emma’s art. There’s only one audio source here which is the music I’m playing. We go into that in more detail in part 2, this is just an Audio Output Capture of the unused audio connection that I hook up to my interface. Beyond that, the only other elements are text at the top to let viewers know what’s going on (the show’s not started / we’re on break / see you next week), and the Animancer logo.
Next up is the Title Sequence scene, which is just composed of the title sequence video itself. I put that together in After Effects as well, and composed the music for it myself. It’s fairly simple and at some point I’d like to revise it, but I usually have more important things to be working on than that.
After that we have the ‘proper’ scenes for when we’re playing. These are all fairly similar – frames for all the players with their character nameplates and with a background. This scene is the non-VTT scene, so it uses the smoke background too (I get a lot of mileage out of that).
For the player cameras, I have these all set up as their own scene which is added as a source on the main scene. This lets me keep the camera frames glued onto their camera and lets me propagate changes to their camera throughout all the scenes it’s active in. This was a huge help in getting the stream manageable and slick. I also keep their character portrait as a background for each camera, it means that if someone isn’t up for being on video that week I can put that over their camera or if someone’s absent I can also put their portrait up. Doesn’t come up much but it’s also useful between sessions if I’m making tweaks to layouts to know which camera’s which. The little nameplates are ones I made in Photoshop. I’d like one day to animate these, but these work well enough for the time being.
There’s a lot of audio sources in here, and most of them aren’t heard by the stream. The Line In source is my microphone plus the music and the Focusrite Output source gives us the player audio. Everything else is just a backup. The Elgato HD60 source is my camera microphone, which I record to a separate track that the stream doesn’t hear in case something goes wrong with my audio (as happened recently), and I have a backup of the music as well in case I need to adjust the level of it before the final video goes onto YouTube.
OBS lets you record six audio tracks, but only one audio track gets encoded with the stream. My audio tracks look like this:
- Track 1: Stream audio – player audio, Jake audio (including music), and any videos with audio the stream needs to hear
- Track 2: Jake mic only
- Track 3: Player audio only
- Track 4: Camera backup audio
- Track 5: Music backup
Those backups are useful in case something goes wrong with the stream audio or if I get the audio mix slightly wrong. I don’t have a good way to monitor the stream audio while we’re playing, so I sort of have to get things as good as I can beforehand, then hope for the best while we play. If there’s a problem, I’ll tweak stuff in Premiere before the video goes onto YouTube. I’ve been burned a few times with not having separate audio tracks for stuff and as a result some of the audio on earlier episodes isn’t where I’d like it to be, but I have a pretty good handle on stuff now.
You can see in that image I also have a 6 player and 5 player preset of this scene – those are, as you’d imagine, laid out for only playing with 5 or 6 players.
Last but not least there’s the VTT scene. This is basically identical to the main one, but it has Foundry for a background rather than the smoke. I used to just put the tabletop in a frame in the middle (that’s what the ‘Remote Play Map’ scene is), but I don’t think it was much use there, so I’ve been experimenting with this layout instead.
The spacing between the cameras is compressed so that we can fit chat alongside, and I also put an Opacity filter on everyone’s camera so that we’re all at 95% opacity rather than 100%. That means that tokens and map detail will be visible under people if you’re really looking, but things are still opaque enough that they don’t look weird. You can see you still get the impression of the E in my camera up there but it doesn’t look jarring.
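That 95% opacity is essentially a standard alpha blend – each pixel you see is 95% camera and 5% background. Here’s a little sketch of the arithmetic (illustrative only; this isn’t literally what OBS runs internally, and the pixel values are made up):

```python
def blend(camera_px, background_px, opacity=0.95):
    """Alpha-blend one RGB camera pixel over the background at the given opacity."""
    return tuple(
        round(opacity * c + (1 - opacity) * b)
        for c, b in zip(camera_px, background_px)
    )

# A bright camera pixel over a dark map pixel barely shifts, which is why
# the cameras still read as solid while map detail faintly shows through.
print(blend((240, 240, 240), (20, 20, 20)))
```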
The Foundry background is another scene-as-a-source like the cameras – I pull out another browser instance with the stream view in it and capture that for this scene, and that’s what everyone else sees. I’m having an issue where if something is maximised over that browser, it won’t update. Not sure what’s responsible for that problem – I’ll try to fix it, but it works alright as long as I avoid maximising things over it, so I can get by.
That’s basically…it! At least in terms of the scenes.
Streaming settings are very jargony and quite confusing, so I’ll show you the settings we use. Most software has an automated wizard that’ll figure out what you should be streaming at, it’s probably worth listening to that unless you have a problem with what it suggests.
These settings are based mostly on Twitch’s recommendations. It’s worth checking what your streaming service of choice has to say. The main streaming settings that people talk about are resolution, framerate, and bitrate. Resolution and framerate are probably fairly clear – it’s how many pixels your video is pumping out, and how often it’s pumping them out. Most people stream at 1920×1080 or 1280×720, at either 30 or 60 frames per second.
Your best resolution and framerate depend on what sources you have available and what kind of content you’re streaming. Higher framerates make video look smoother and more like real life, while framerates closer to 24 frames per second make things look more film-like. Higher resolution is usually a straight-up improvement over lower resolution, but the more visually detailed something is in the first place, the more it benefits. A flat colour won’t look any different at 1920×1080 versus 1280×720; a person’s face, on the other hand, will.
We stream at 1280×720 and 50 frames per second. We use 50 frames per second because it’s the framerate my camera operates at, and we don’t have any sources that operate at a higher framerate than that. This is important, because streaming at a higher framerate or resolution than your sources output at doesn’t make any improvement to things. Your computer streaming to Twitch at 60 fps doesn’t make your 30fps camera work twice as hard, your computer just sends the same frame twice, which doesn’t make a difference in the end.
To explain the choice of resolution, we have to touch on bitrate. Bitrate is simply the number of bits you’re sending Twitch/YouTube/wherever every second: the amount of data you’re firing over the internet to Twitch, and importantly, the amount of data Twitch is firing out to your viewers. This is important to understand for two reasons: a) if you have fast internet and are taking advantage of it to stream at a high bitrate, viewers with slower internet speeds will struggle to watch, and b) your bitrate caps the amount of data you can send to Twitch, which affects how much of an impact a higher framerate or resolution can have.
I won’t get into the why of that because I can’t think of a simple enough analogy, but basically a ‘good’ stream at 1920×1080 and 60fps requires a lot more data than Twitch can handle nicely (YouTube might be better, I don’t know), and more data means potentially fewer people able to watch. You effectively choose between a ‘good’ stream at 720p or an ‘average’ stream at 1080p, and we opted for the better stream at a lower resolution. We’re able to stream to Twitch at a 4,000Kbps bitrate and get good quality, whereas we might have to go to 6,000 or 7,000Kbps to see any benefit from 1080p.
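To put rough numbers on that trade-off, you can divide the bitrate by the pixels sent per second to get a crude ‘bits per pixel’ budget. This is just a back-of-the-envelope sketch (bits-per-pixel ignores how well the encoder compresses, so treat it as illustrative only; the 6,000Kbps 1080p figure is the guess from above):

```python
# Crude bandwidth-per-pixel comparison for the stream settings discussed above.
def bits_per_pixel(bitrate_kbps, width, height, fps):
    """Bits of bandwidth available per pixel per frame."""
    return (bitrate_kbps * 1000) / (width * height * fps)

# Our settings: 720p at 50fps and 4,000Kbps.
bpp_720 = bits_per_pixel(4000, 1280, 720, 50)
# Hypothetical 1080p60 at 6,000Kbps.
bpp_1080 = bits_per_pixel(6000, 1920, 1080, 60)

print(f"720p50  @ 4,000Kbps: {bpp_720:.4f} bits/pixel")
print(f"1080p60 @ 6,000Kbps: {bpp_1080:.4f} bits/pixel")
```

Even at the higher bitrate, the 1080p60 stream has only a bit more than half the data budget per pixel, which is why it tends to look ‘average’ rather than ‘good’.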
It’s worth noting that once you reach Affiliate on Twitch you can sometimes get transcoding, where Twitch will give your viewers the option to watch at lower resolutions, and if you reach Partner you always get that. That’d minimise those concerns, but we’re not there yet.
That’s our stream settings window (at least the relevant portion). We record the streams at 1920×1080, that’s why the Rescale Output box is checked and set to 720p. That means Twitch receives the stream at a healthy resolution for streaming, but the final video will be a bit crisper for YouTube.
The Rate Control setting sets how you want your computer to figure out the bitrate. I use the CBR option (Constant BitRate), which just uses a fixed value. There are other ways to do it where the computer adjusts bitrate dynamically based on the content, but I’ve heard that CBR is the way to go for streaming.
Recording is a different beast, however. These are the settings I use for that. I use the CRF (Constant Rate Factor) mode for that, where my PC adjusts things based on the visual content. I use a CRF of 17 which produces good videos without overtaxing my PC. I record to .mkv format rather than the default .mp4 because if my computer crashes mid-stream, the video file isn’t corrupted as it would be with an MP4. MKV and MP4 files are the same information, just in different containers, so it’s really easy to convert from one to the other. If I need to edit videos before they go to YouTube then I’ll convert the MKV to MP4, then take it into Premiere.
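For example, one common way to do that container swap is ffmpeg’s stream copy, which rewraps the audio/video without re-encoding, so it’s fast and lossless. A minimal sketch (it assumes ffmpeg is installed and on your PATH, and the filenames are placeholders):

```python
# Remux an MKV recording into an MP4 container using ffmpeg's stream copy.
# "-c copy" copies the streams as-is: no re-encoding, no quality loss.
import subprocess

def remux(src_mkv, dst_mp4, run=False):
    """Build (and optionally run) the ffmpeg command for a lossless container swap."""
    cmd = ["ffmpeg", "-i", src_mkv, "-c", "copy", dst_mp4]
    if run:
        subprocess.run(cmd, check=True)  # requires ffmpeg on PATH
    return cmd

print(remux("session.mkv", "session.mp4"))
```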