what if I told you you could build your own media server using nothing but free software, one that's more flexible than commercial solutions, supports mismatched drive sizes, and grows with your needs? Well, in this series I'm going to show you how to do exactly that. Choosing how to store your data is one of the most hotly debated topics amongst data hoarders; in fact, I frequently find myself discussing this topic with friends, my poor family, and anybody else who will listen. Back in 2016 I wrote a post over at linuxserver.io entitled "The Perfect Media Server", and this video is a long overdue discussion of those ideas and principles. More recently I converted that blog post, and the ones that came after it, into a documentation website you can find at perfectmediaserver.com. In this multi-part series we're going to dive deep into setting up your own media server, and we'll do it using nothing but free and open-source software: namely Linux, mergerfs, SnapRAID, and Docker. In this part one, we'll discuss the tools and why I selected them. Part two will come in a week or two, and in it I'll walk you through taking a bunch of random hardware bits and putting them together into a functional system, so part two will be the hardware video. Part three will be the "install and configure this thing" phase, followed soon after by part four, where I'll start looking at top-10 self-hosted app lists you can install things from, like the one I have over at perfectmediaserver.com. Of all the things I've ever done in my, well, I don't want to say my life, but it's kind of true, building a DIY NAS around these technologies has been a true labor of love, and I am pouring a decade-plus of experience and testing into this series. That's not to say there aren't other perfectly valid approaches to building a media server, but many of these recommendations earned many of these gray hairs in my beard, and
they are the result of hard-fought, late-night battles from before we had AI to dig us out of any holes. This is a collection of tools, techniques, and philosophies for building a perfect media server, as opposed to "this is the way you must do it". Now, before we get too caught up in the tooling itself, let's take a few moments to think about what we want this thing to be able to do when we're done with it. First of all, I want it to act as network-attached storage, a NAS, to serve files across the network. I also want to be able to group multiple hard drives under one pool or mount point, and each disk should be its own individually mountable, standalone thing: no striping. It should also be flexible and upgradeable over time, supporting hard drives of mismatched sizes, and it shouldn't require a large upfront investment in ten identical hard drives; for me, adding one to two drives a year is quite easily achieved, either as replacements or additions. I also want some kind of fault tolerance for failed hard drives, because these little guys are out to get you. Make no mistake, they will fail; it's not a matter of if but when. I also want file-level checksumming to guard against bit rot, and I want to be able to saturate a gigabit LAN connection for both reads and writes. On top of all that, I want it to run on commodity, consumer-grade hardware. This isn't an expensive Dell box, for example; it's just a random motherboard and a random power supply, commodity hardware you can buy used off Facebook Marketplace or eBay, or even brand new if you want to. Once we've got the hardware taken care of, I want to run a suite of self-hosted applications, like Jellyfin, and Immich for photo processing; if it can do a little bit of local AI too, maybe that would be fun. But perhaps most importantly, this needs to use free and open-source software, because I don't want to be worrying about subscriptions or licenses or rug pulls
happening down the road. I want to build this solution once and have it last a lifetime. Let's start with a quick overview of the building blocks we'll be using to put this thing together. First up is mergerfs. This is the layer that sits on top of your data disks, and it provides a single mount point for your applications to read from and write to. It's not a file system per se, despite the name; it transparently merges other file systems together for you instead of being a file system in and of itself. Next up is SnapRAID. Now, if we were using something like Unraid or ZFS, fault tolerance would be built in as part of the design of the system, and we'd get it for free out of the box; but with mergerfs providing no fault tolerance, we have to turn to SnapRAID instead. It creates a snapshot of parity for your data disks, and it is an imperfect solution, because there's always a delta between when your parity snapshot was taken and reality. This is less of an issue for large, mostly static data sets like media files, but it can be a pretty poor choice for use cases with large amounts of churn, like application data or databases; stuff like your Plex metadata, for example, would be a poor fit for SnapRAID. And finally there's Docker. Docker probably doesn't need much introduction at this point if you're already into any kind of self-hosting, but just in case you're not familiar, Docker is a way to run applications in what are known as containers. Containers are self-contained little boxes of code, happy little accidents of code, that wrap up everything an app needs to run. Using a few lines of Docker Compose configuration, we can deploy any application without needing any knowledge of how the underlying application itself works under the covers, and in fact, over the last decade or so, Docker has become the standard packaging format for self-hosting. So let's dive a little deeper now into each of these three pillars. Now that we understand what we're
trying to build, let's examine the core technologies that make this possible. We'll start with mergerfs, which solves our first major challenge: how to combine multiple drives into a single, easy-to-use storage pool. mergerfs takes just a bunch of drives, like these five on the table here, sometimes referred to as a JBOD, or "just a bunch of disks", and makes them appear as a single drive. It's tempting to call this merging an "array", but given there's no fault tolerance or striping of any kind, let's just stick with "a merging", shall we? I mean, crows have a murder; why can't hard drives have a merging? Each drive that mergerfs merges typically has its own individually readable file system. This means you can pull a disk from your server, put it into another one, and read the contents immediately. So if you want to do off-site backups, you physically take a hard drive out of your server, put it into another machine at another house, and it will just be readable: no syncing, no rebuild time, no resilvering. Each disk is just a disk. This speaks to what mergerfs is doing behind the scenes: it doesn't really care too terribly much about the underlying storage at all, because it works at the file layer. Give mergerfs any file system, XFS, ext4, Btrfs, and yes, even ZFS, and it will merge them together into one merging. Disks that already have data on them can be included, and mergerfs will pick them and their contents up immediately. So let's look again at our requirements and see where mergerfs fits in. We want to group multiple drives together under one mount point; I don't want any data striping; I want each disk to be individually readable; and I need it to support mismatched drive sizes. That last one is the killer feature for mergerfs: take any drive of any size, throw it into your mergerfs configuration, and it will be merged immediately. Unraid, I should note, is also capable of this to a degree, but it's not as
flexible, because you have to set your config in stone, and then you can't start and stop the array with a drive missing. It's the only other solution in this space, though, that supports individually mountable drives of mismatched physical sizes. Because of this flexibility, you are not locked into the disks you buy on day one; this solution can grow with you very easily. Let me give you an example. Each year on Black Friday I typically buy one or two hard drives, depending on the sales that are going on, and I tend to target the $200 to $250 range. With mergerfs, all you need to do is format the disk and add a small entry to your fstab file, and suddenly your storage pool grows by 18 terabytes: no rebuilding required, just instant storage. Over the last five years or so, that's meant my average disk size has grown from about 10 terabytes per disk to 18 terabytes per disk. So suppose you have five disks in your system: that also means no disk will ever be more than five years old, because you're always rotating one or two disks a year, and there's always a project waiting for those old drives too; maybe you can use them to create an off-site backup using something like ZFS, perhaps. This flexibility in every aspect is why I truthfully think that, despite the many advantages of ZFS, it's a poor choice for those of us building media servers, and no, RAID-Z expansion is not going to fix this. I will link to an oldie-but-a-goodie blog post in the description down below, talking about the hidden cost of ZFS, if you want to know more about what I mean. There is a place for ZFS in this system, but it's just not the primary storage medium we're going to use; we'll get to ZFS a little later, and by the way, if that scares you off, don't let it, because the use of ZFS here is 100% optional.
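To make that fstab workflow concrete, here's a sketch of what the entries might look like. The disk labels, mount points, and pool path are hypothetical placeholders, and the mergerfs mount options shown are just common choices; check the mergerfs documentation for what suits your setup:

```shell
# /etc/fstab -- hypothetical sketch; labels, paths, and options are placeholders
# Each data disk mounts individually (here, XFS-formatted disks, by label)
LABEL=disk1  /mnt/disk1  xfs  defaults  0 0
LABEL=disk2  /mnt/disk2  xfs  defaults  0 0
LABEL=disk3  /mnt/disk3  xfs  defaults  0 0

# mergerfs pools every /mnt/disk* branch into a single mount at /mnt/storage
/mnt/disk*  /mnt/storage  fuse.mergerfs  defaults,allow_other,category.create=mfs,minfreespace=10G,fsname=mergerfs  0 0
```

Growing the pool is then exactly the process described above: format the new disk, add one more `LABEL=diskN` line, mount it, and the wildcard branch picks it up.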
As disk sizes have exploded over the last decade, the need for tons and tons of them in a single box has diminished significantly, especially when we consider things like the H.265 and AV1 video codecs. In the old days, getting to 100 terabytes, for example, often required 20-plus drives; I remember people on the old Unraid forums posting about Norco RPC-4224s and massive disk shelves that needed a small data center in your basement, and that's a lot of noise, a lot of heat, and a lot of cost, not just in buying 20-plus drives but in electricity and all the rest of it. Nowadays, 100 terabytes is pretty easily achieved with just five 20-terabyte disks, and a five-bay system can be built and deployed really easily under a desk, without requiring the aforementioned basement data center. Again, this is all about flexibility: mergerfs has meant that as drive sizes have increased over the last decade or so, it's been pretty trivial for me to keep up with the latest drive sizes and buy disks, which are going to be your primary expense when building a server, when they go on sale. So you might be wondering what the performance impact of using mergerfs is; this all sounds too good to be true. In my testing, the answer has been "pretty minimal". Since there's no striping involved, you get the native performance of each individual drive, and that usually means these are easily capable of saturating a gigabit network link; in most cases the physical write speeds of the drives themselves will be the limit. These older drives, this is a 4 TB drive here, these older Red drives typically write in the 130 to 140 megabytes-per-second range, whereas the newer ones, like the 16 terabyte at the front there, or is it a 10, whatever, can write in the 200-ish range, and mergerfs handles that no problem. So far, mergerfs has given us the flexibility to combine drives of any size into a single pool, but there's one critical piece missing: what happens when one of these drives inevitably fails? Well, this is where SnapRAID enters the
picture, and it provides the data protection layer that mergerfs itself doesn't include out of the box. While mergerfs handles the organization of our files across multiple disks, SnapRAID works behind the scenes to create recovery information that can protect those files from both drive failures and silent data corruption. These two technologies complement each other perfectly: mergerfs for storage flexibility, SnapRAID for data protection. It follows the old Unix philosophy of "do one thing and do it well". Now, recall that mergerfs provides absolutely no fault tolerance; as I just said, if a disk in that merging fails, the data is gone. This may or may not be a problem, depending on how you sourced the data in question, if you know what I mean; I've left my eye patch over there. SnapRAID is a free and open-source project, licensed under the GPLv3, which calculates parity information for disk arrays, and it can protect against up to six simultaneous failures. It essentially takes a JBOD and provides the cohesive glue, I suppose, for protecting it against drive failure and bit rot. It's primarily targeted at media center users with lots of big files that rarely change. SnapRAID supports mismatched drive sizes just the same as mergerfs does, although one caveat is that the parity drive must be as large as, or larger than, the largest data disk; this is very similar to Unraid, if you're familiar with that. The name SnapRAID comes from the fact that it isn't really RAID at all: it's a snapshot RAID. No striping occurs, and parity information is devoted to an entire dedicated parity disk. This implementation is a major difference from Unraid or traditional RAID, like mdadm or ZFS, all of which calculate parity in real time. So let's see which of our requirements SnapRAID has checked off. We need fault tolerance to protect against the inevitable hard drive failure; I want checksums for files to guard against bit rot; I need to support hard drives
of differing and mismatched sizes; and each drive should have its own separately readable file system, with no striping of data. SnapRAID helps us meet each of these criteria, and when it's combined with mergerfs, it enables each drive to remain individually formatted, no striping, whilst still having some kind of fault tolerance. Data integrity is checked against bit rot using 128-bit checksumming, which enables the silent fixing of these errors in case there has been any bit rot whilst you weren't paying attention. Furthermore, any files changed since the last sync can be restored on a file-by-file basis, allowing for quite a sophisticated backup solution at the file level. I should note an important caveat there: any file since the last sync. Your snapshots aren't versioned or anything like that; you get one snapshot, and it was the last one. So if you're looking to recover a file from a month ago, this won't help you; but if you're looking to rebuild the contents of a failed drive, that's exactly what SnapRAID is for. SnapRAID will also work on already-populated data drives, which is a big win over traditional RAID, and again, it allows only the drives in use to be spun up, unlike RAID, which requires all of the drives to be spinning to access a file on just one of them. I'll put a link in the description to a section of perfectmediaserver.com discussing "is SnapRAID right for me?", and the answer is "probably, maybe, it kind of depends on your use case". As I've said, SnapRAID was designed with large, mostly static data sets in mind, like media collections, and we are building a media server here, right? A common use case is a media or home file server. Let's say you have a two-terabyte collection of movies and a two-terabyte collection of TV shows. How often does that content change? Well, "not very often" is probably the answer. You'll acquire a show, and it will just sit on your disks for the next ten years, until you get bored and delete it one Sunday afternoon.
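For a sense of how this hangs together in practice, here's a minimal sketch of a snapraid.conf for a layout like the one described. The paths and disk names are hypothetical placeholders; consult the SnapRAID manual for the full set of directives:

```shell
# snapraid.conf -- hypothetical sketch; paths and names are placeholders
# Dedicated parity disk (must be at least as large as the largest data disk)
parity /mnt/parity1/snapraid.parity

# Content files hold the checksums; keep several copies on different disks
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content
content /mnt/disk2/snapraid.content

# Data disks (the same individually mounted disks mergerfs pools together)
data d1 /mnt/disk1/
data d2 /mnt/disk2/
data d3 /mnt/disk3/

# Exclude churny files that don't belong in snapshot parity
exclude *.tmp
exclude /lost+found/
```

Note how this reuses the individually mounted disks as-is: SnapRAID sits alongside mergerfs rather than underneath it.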
So does it therefore make sense to require a real-time parity calculation and pay that penalty every time you write to your merging (I nearly said "array", but I'm going to call it a merging)? Does it make sense to pay that parity-write calculation tax, like Unraid would have you do, or to just run that computation once daily, at a quiet time when you're not actually using the server? Here's an example: you acquire a file and save it to disk, and it's called best-movie-ever.mkv. That file sits on disk and is
immediately available to Jellyfin or Plex or whatever you want to use for playback, but the file remains unprotected on disk until you run the SnapRAID parity sync. This means that if, between your acquisition of said file and the parity sync, you were to experience a drive failure for some reason, that file would be lost and unrecoverable. It's simple to run a manual parity sync if that file is really important to you, just by running snapraid sync, but it's really important you understand that window of time. That window is often just a few hours, and depending on how you acquired said file, does it really matter? I've actually been running without SnapRAID on my server at all for a good two or three years at this point, and so far I haven't experienced any issues, but your mileage may vary. I include SnapRAID here because I think it's important that we have at least feature parity with the commercial offerings like Unraid, and I don't really want to bring ZFS and TrueNAS and all that kind of stuff into the conversation too much, because they're aimed at totally different use cases in my mind. SnapRAID, for example, is incredibly badly suited to high-turnover applications such as databases or app data; stuff like your Plex metadata in a SnapRAID-backed array of disks would perform horribly, with tens of thousands of tiny little files, each of which has to be opened, closed, and transacted through the parity system. It's just not what SnapRAID is designed for. If this is your use case, you should probably look at a real-time parity-based solution, like running a ZFS mirror, for example, and we'll get to all that later in the video series. For now, if you can cope with this risk window and have a largely static data set, SnapRAID is probably the right choice for you. There are several projects available to automate this parity sync daily; you can just throw it in a cron job if you want to.
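As a rough sketch of the cron approach, something like the following would close the risk window nightly. The schedule, log path, and scrub percentage here are arbitrary illustrative choices, not recommendations from the video:

```shell
# /etc/cron.d/snapraid -- hypothetical sketch; schedule and paths are placeholders
# Run the parity sync every night at 03:00, when the server is quiet,
# then scrub a small slice of the array to catch bit rot early.
0  3 * * * root /usr/bin/snapraid sync       >> /var/log/snapraid.log 2>&1
30 5 * * * root /usr/bin/snapraid scrub -p 5 >> /var/log/snapraid.log 2>&1
```

With this in place, any file acquired during the day is unprotected for at most a few hours, which is exactly the trade-off discussed above.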
But there are some more sophisticated ways to do it, which again we'll dive into later in the video series. SnapRAID, by the way, does offer a drive-pooling feature that, on the face of it, looks like it makes mergerfs superfluous. However, SnapRAID's pooling feature creates a read-only virtual view of the files in your array using symbolic links, and we want to be able to read from and write to this array with various different applications; the SnapRAID pooling option can be useful as a hobby project, but it's not suitable as the primary mount point for our system. Now that we've established how to store and protect our data with mergerfs and SnapRAID, let's talk about how we're actually going to use that storage. This is where Docker comes in, and it provides a simple approach to deploying and managing the applications that will bring your media server to life. Back in 2016, when I wrote the original Perfect Media Server article over at linuxserver.io, Docker was barely 1.0; it was brand new. Fast forward several years, and containerization has cemented itself as a major player in the industry. In fact, ten years ago you were pretty crazy if you ran containers in production; now you're pretty crazy if you don't. For those of us looking to build a media server, containers offer a uniquely brilliant way to run applications: they divorce the running application from its data, whilst making the management of persistent data and configuration really straightforward and simple. So why should you use containers? We'll come on to what a container is shortly, but first let's discuss why you might want to consider using them at all. My light-bulb moment with Docker came a few years ago, after I'd reinstalled my server OS. I'd gotten everything working pretty much the way I wanted it, but for some reason an update came along and broke everything. This time, however, I was using Docker for my applications, storing the data for those applications in bind-mounted volumes,
and I'd kept a text file with each of the docker run commands I'd used to create those containers in the first place. Remember, this was pre-Docker-Compose; this really was a long time ago. What normally followed a reinstallation was a lengthy, often multi-day process of getting things back to where they were, of setting up a whole bunch of apps again. This time, though, things were different: I copied each docker run command and was back up and running within about ten minutes. I was done; that was it. The game changed for me that day, because the configuration data for the applications was stored in a separate location, in a volume, rather than being lost when I reloaded the server. Containers also provide a number of other benefits, ranging from portability to standardization to security to, in my case, employability. Yes, a lot of the skills I've learned building an, on the face of it, poxy little media server have led to me having a career in DevOps and networking and all that kind of stuff; do not underestimate the value of having a server at home that you can hack on and break and learn with. Another oldie-but-a-goodie resource is a talk by James Bottomley from FOSDEM 2016 that really cemented my understanding of containers; I'll put a link to it in the description down below. It's good stuff. Docker Compose touts itself as a tool for managing and deploying multi-container applications, but it also excels at managing the single application containers most of us will run on our media systems. You can declaratively define the image, the container name, the volumes, the ports, and everything else that makes up that deployment, and capture it in a YAML file: a self-documenting system that will save you from that really annoying "how did I do this six months ago?" question. Be honest, we've all faced that at one time or another. I made a video in the early days of this channel about how I manage my compose files with Ansible.
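As an illustrative sketch of that declarative approach, a compose file for Jellyfin might look something like this; the image tag, host paths, and port mapping are placeholder assumptions you'd adapt to your own setup:

```yaml
# docker-compose.yml -- hypothetical sketch; image, paths, and ports are placeholders
services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    container_name: jellyfin
    ports:
      - "8096:8096"                    # Jellyfin web UI
    volumes:
      - ./jellyfin/config:/config      # persistent config survives rebuilds
      - /mnt/storage/movies:/data/movies:ro  # media read-only from the mergerfs pool
      - /mnt/storage/tv:/data/tv:ro
    restart: unless-stopped
```

Because the config lives in the bind mount rather than inside the container, a single `docker compose up -d` brings the whole thing back after any reinstall, which is exactly the ten-minute recovery story above.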
Now, we haven't touched on Ansible yet, because I don't want to scare people away, but it's something I use daily for a lot of my system configuration via automation, and don't worry, I will avoid it for as long as possible, just to keep things as simple as possible. We'll probably come to it in, I don't know, part three of this video series; I'm not entirely sure yet, but I will stay away from the Ansible Kool-Aid as long as I possibly can. You can head over to perfectmediaserver.com to take a look at all of the documentation I've written over the years. This brings us to the end of part one of our perfect media server journey. If you found this helpful, please consider subscribing to catch parts two, three, and four in this series, and drop your questions about mergerfs, SnapRAID, or Docker in the comments down below; I will respond to and address as many as I can as I make this series. This is a real-time thing, part two isn't filmed yet, so if you ask a question in the next week or two, I will do my best to address it moving forward. Thank you so much for watching, and I'll see you in part two. I've been Alex, from KTZ Systems.
2025-04-24 21:42