EP011: Backup & Archiving
Data Integrity Is Your #1 Job!
We’ve all heard horrifying sob stories of data loss. Generally, these stories start with ‘I thought I had it backed up’ or ‘I don’t understand, I had it on an expensive RAID.’
As a postproduction professional, data integrity should be your number one job – after all, clients have trusted you to shepherd their data through final finishing and delivery.
In this episode of The Offset Podcast, we dive into backup and archiving strategies.
We’ll start out exploring the differences between a backup and an archive, why it’s important to NEVER work off client-supplied drives, the online, nearline, and offline data lifecycle states, redundancy at each state, and the gear you’ll need.
We’ll also dive into an overview of LTO and why it is the best option for long-term archiving.
We’ll discuss LTO generations, connectivity, using LTFS as a file system, tape redundancy, and why a stack of drives is NOT a suitable replacement for LTO. Finally, we’ll discuss some business/billing implications of archiving.
Like The Show?
If you like The Offset Podcast we’d love it if you could do us a big favor. It’d help a lot if you could like and rate the show on Apple Podcasts, Spotify, YouTube, or wherever you listen/watch the show. Thank you!
-Robbie & Joey
The Offset Podcast is sponsored by Flanders Scientific, leaders in color-accurate display solutions for professional video. Whether you are a colorist, editor, DIT, or broadcast engineer, Flanders Scientific has a professional display solution to meet your needs. Learn more at FlandersScientific.com
Video
Links
- Wikipedia article on LTO
- mLogic/mTape - Popular provider of LTO solutions
- Hedge Canister - popular LTO software for Mac
- YoYotta LTFS archiving software
- Imagine Products myLTO - LTO software for Mac
Transcript
01:00:00:15 - 01:00:19:06
Robbie
Hey there. Welcome back to another installment of The Offset Podcast. And today we're talking about a not so exciting topic, but one that's really, really important. Backing up & archiving. Stay tuned.
01:00:19:08 - 01:00:37:22
Joey
This podcast is sponsored by Flanders Scientific, leaders in color-accurate display solutions for professional video. Whether you're a colorist, an editor, a DIT, or a broadcast engineer, Flanders Scientific has a professional display solution to meet your needs. Learn more at FlandersScientific.com.
01:00:38:00 - 01:00:51:19
Robbie
All right. Welcome back everybody. I am Robbie Carman and that is Joey D’Anna. And Joey, we are here today to talk about, as I said in the tease, a subject that's not all that sexy, not all that exciting. It doesn't really get people.
01:00:51:19 - 01:00:53:04
Joey
Maybe not to you.
01:00:53:06 - 01:01:14:04
Robbie
People really going. And that is the idea of backing up and archiving, right? I mean, how many times have we over the years heard, I mean, sob stories of all sorts, right, of things going bad, wrong, you know, very fast for people. And our first question is, oh, well, just restore your backup or go to your archive or whatever.
01:01:14:04 - 01:01:42:14
Robbie
And they look at us with this like blank, you know, stare going, well, how about that? I didn't have a backup or an archive. Right. I think it should be in every post-production Bible. It should be preached from, you know, mountaintops that backing up and archiving is something that you have to inject into your DNA if you want to work in production and video post-production.
01:01:42:14 - 01:01:47:07
Robbie
Right. There is nothing, I mean, nothing worse than that horrible.
01:01:47:07 - 01:01:48:09
Joey
Horrible, horrible.
01:01:48:09 - 01:01:58:02
Robbie
Horrible feeling when you realize that something is gone and you're not getting it back. And in fact, Joey, we just had this happen five minutes before we got here.
01:01:58:04 - 01:02:18:03
Joey
Yeah, we had to dig into an archive and restore a file that had been inadvertently deleted. And this is something that I think you need to run under the assumption that you're going to need it. Right? It's like, you know, people that ride motorcycles say there's two types of motorcyclists, ones that have crashed and ones that will crash.
01:02:18:03 - 01:02:46:06
Joey
Right? There's two types of computers: ones that have failed and ones that will fail. So if your data structure, however you're storing the important assets that you need, is not fault tolerant enough to handle a full failure of all of your data going away, you're sitting on a ticking time bomb because there is no 100% reliable data storage method anywhere.
01:02:46:08 - 01:03:02:15
Robbie
Yeah. And I think the important thing that I'll just riff on that to say is that one, no matter how good the marketing that you look into, somebody selling hard drives, computers, etc., it's all just a matter of time before something goes wrong with that. Hundred percent. And then two.
01:03:02:16 - 01:03:15:17
Joey
They measure it in what's called mean time between failures. Oh yeah. Yeah, the average time it takes for the device to fail. This means the failure rate of these devices is 100%. It's just a question of time.
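To put rough numbers on the "just a question of time" point, here is a minimal Python sketch that turns a mean time between failures (MTBF) figure into a chance of failure over a given service life. It assumes a constant failure rate (the simple exponential model), which is a simplification, and the 1,000,000-hour MTBF and the 8-drive array are made-up illustration values, not figures from the episode or from any vendor.

```python
import math

HOURS_PER_YEAR = 24 * 365.25  # average year length in hours


def failure_probability(mtbf_hours: float, hours_in_service: float) -> float:
    """Chance that one device has failed by `hours_in_service`.

    Assumes a constant failure rate (exponential model). Real drives tend
    to fail more often when very new and very old, so treat this as rough.
    """
    return 1.0 - math.exp(-hours_in_service / mtbf_hours)


def any_failure_probability(mtbf_hours: float, hours_in_service: float, devices: int) -> float:
    """Chance that at least one of `devices` independent devices has failed."""
    survive_one = math.exp(-hours_in_service / mtbf_hours)
    return 1.0 - survive_one ** devices


if __name__ == "__main__":
    mtbf = 1_000_000  # hypothetical spec-sheet figure, in hours
    for years in (1, 5, 10):
        hours = years * HOURS_PER_YEAR
        one = failure_probability(mtbf, hours)
        eight = any_failure_probability(mtbf, hours, devices=8)
        print(f"{years:>2} yr: one drive {one:.2%}, any of 8 drives {eight:.2%}")
```

Under this model the probability only ever climbs toward 1 as time passes, and it climbs faster the more drives you own, which is the sense in which the failure rate is "100%".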
01:03:15:20 - 01:03:44:00
Robbie
Right. And then the second part about that, to riff on that, is that, and I think this is going to be a theme for this episode, it is not a one-step or a one-size-fits-all approach. We are going to make the case in this episode for a multi-tiered approach to archiving and backing things up because, honestly, there are different levels of this to what you need depending on the project, but also, cost plays a factor.
01:03:44:02 - 01:04:07:22
Robbie
Capacity and speed play a factor. So we're going to dive into all three of those things. But Joey, the first place I want to start is this concept, because I think these are phrases that, I mean, we even use them wrong back and forth sometimes too. But the difference between backing something up and a true archive, in your own words, kind of give us what the difference is between those.
01:04:08:00 - 01:04:33:06
Joey
Well, simplified to the very core, an archive is saving things for a long time for the ability to get them back. Yeah. A backup is for fault tolerance and recovering from failures. So your backup is there to get you back to work when your storage explodes. Your archive is there for when a client asks for a project to come back.
01:04:33:11 - 01:05:00:23
Joey
You can get it back for them. And both of those things have very different technical requirements, logistical and operational requirements, and cost implications. And in fact, I think we'll talk about this in more detail later, you know, where your responsibility as a vendor lies changes between those two categories, right? I feel that most vendors would agree that they are responsible for the backup side no matter what, right?
01:05:00:23 - 01:05:27:04
Joey
The client is entrusting you with their data and their project, they expect you not to lose it mid project, correct? However, archiving, nobody can be expected to manage someone's data forever, right? So there are costs and schedules and fees that can be associated with that, that can and should be managed. So you've really got to look at both of these as two kind of completely separate things and separate solutions that are both very important.
01:05:27:06 - 01:05:48:19
Robbie
Yeah. No, I, I 100% agree with that. So let's start with also some vocabulary that I think is going to be germane to archiving and to back up. And again, we'll dive into both of these scenarios in just a second. And I want to bring these vocab words up just because I think that a lot of people, some people use them, some people have no idea what we're talking about.
01:05:48:21 - 01:06:09:14
Robbie
And then there's just, I just feel like I'm on a mission to kind of standardize this language a little bit, because it does get a little confusing. And I think it pertains to both kind of avenues of things, both backing up and archiving. And there's three words that I want everybody to kind of know. Right. And they are online, nearline, and offline.
01:06:09:14 - 01:06:26:18
Robbie
Right. And we can use those to kind of accurately describe where in the life cycle data is. And I think that's an important thing to really kind of cover because it's going to very quickly allow you to go, oh, that's the type of device or the medium that I need to be using.
01:06:26:21 - 01:06:32:08
Robbie
These are the things I need to be thinking about when we're talking about that part of the life cycle for data.
01:06:32:08 - 01:06:54:23
Joey
And that's a very important thing to think about when we're talking about this entire discussion, the phrase you used, data lifecycle. That's what you need to be thinking about. It's not I'm working on a project, I'm done with the project. Right. There is a complicated lifecycle for your data moving through the entire production process and post-production process, and managing it is what we're going to get into.
01:06:55:04 - 01:06:55:16
Joey
Yeah.
01:06:55:18 - 01:07:19:18
Robbie
Okay. So let's talk about the first. The first scenario is the one that we in post-production face every single day. That is, a client hands off a big pile of media, whether that's on a hard drive or it's a download link, whatever. We need to initially transfer and store that stuff somewhere. Right. Think about this as our active storage. But in our discussion I'm going to use that word again.
01:07:19:18 - 01:07:43:15
Robbie
Online storage, meaning this is the storage that is generally going to be the fastest, the most accessible, your best storage, if you will. And so in our case, you know, that's various NAS arrays, it could be, you know, your biggest and best SSD attached to your system. It's going to be the thing that you're going to access most frequently.
01:07:43:21 - 01:07:49:00
Robbie
And it's going to keep the current active projects that you're working on, right?
01:07:49:05 - 01:08:06:12
Joey
Yeah. So the first step here is to think about what your needs are for online storage. How fast does it need to be? Do multiple people need to access it? Does it need to be a NAS, or can it be direct attached? And most importantly, how redundant does it need to be? Because that determines the rest of the backup strategy.
01:08:06:12 - 01:08:08:00
Joey
Here's our first line of defense.
01:08:08:05 - 01:08:23:12
Robbie
Okay, I want to stop for a second and pick up on something that you just said there, which I think is really, really important. Right. Because I see this happening to people all the time. They go, well, I got my stuff from the client on my biggest and fastest online storage, you know, whether that's a NAS or SSD or whatever.
01:08:23:14 - 01:08:50:05
Robbie
And they stop there. But you just said something that's really important. And that is the word redundancy. Right. And redundant does not mean one. As far as I'm concerned, and I think you'll agree with this, just because it's on one unit or one drive does not mean that we now have redundancy at all, and that includes if you're using a RAID or a ZFS or some other technology, even though, you know, redundant is the R in RAID, right?
01:08:50:07 - 01:09:07:12
Robbie
It's not a backup. Okay. And that's an important thing. Think about it more like fault tolerance, right? Like, yeah, you could have a hard drive die within this unit and we still have data, right. But that is not the same thing as having a backup. And so let's begin there, okay?
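As a rough illustration of what fault tolerance does and does not buy you, here is a minimal Python sketch of the two parity RAID levels that come up in this conversation. RAID 5 spends one drive's worth of capacity on parity and survives one drive failure; RAID 6 spends two and survives two. The function and class names are made up for illustration, and the drive counts and sizes are placeholder values.

```python
from dataclasses import dataclass

# Drives' worth of capacity spent on parity, which is also the number of
# simultaneous drive failures the array can survive.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


@dataclass(frozen=True)
class ArraySummary:
    usable_tb: float
    failures_tolerated: int


def summarize_array(level: str, drive_count: int, drive_tb: float) -> ArraySummary:
    """Usable capacity and drive-failure tolerance for equal-sized drives."""
    parity = PARITY_DRIVES[level]
    if drive_count < parity + 2:  # RAID 5 needs at least 3 drives, RAID 6 at least 4
        raise ValueError(f"{level} needs at least {parity + 2} drives")
    return ArraySummary(
        usable_tb=(drive_count - parity) * drive_tb,
        failures_tolerated=parity,
    )


def survives(summary: ArraySummary, failed_drives: int) -> bool:
    """True if the array still holds its data after this many drive failures.

    This only covers drive failures. A deleted file, a corrupted volume, or a
    fire is outside what any RAID level protects against.
    """
    return failed_drives <= summary.failures_tolerated


if __name__ == "__main__":
    for level, count in (("raid5", 5), ("raid6", 8)):
        summary = summarize_array(level, count, drive_tb=16.0)
        print(level, summary, "survives 2 failures:", survives(summary, 2))
```

Nothing in this arithmetic accounts for human error or a whole-unit failure, which is why the array on its own is fault tolerance rather than a backup.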
01:09:07:12 - 01:09:28:10
Joey
So your RAID, you know, and I recommend having some kind of RAID solution for your online storage, because what that does is it lets you survive one, two or more hard drive failures. Right? Right. That's great. When the hard drive fails, you can keep working, but guess what? It doesn't protect you at all if you fat-fingered and deleted a file.
01:09:28:16 - 01:09:53:14
Joey
Well, that's gone forever. Now, your entire array's power supply dies and it gets corrupted somehow? Doesn't protect you from that. Somehow two drives fail before you can replace them in time? Doesn't protect you from that. So the first stage of the online storage is once everything is copied over, I feel like it should also be cloned to another completely separate media.
01:09:53:16 - 01:10:21:13
Joey
So that's stage one, right? And that's what we do with our projects, right? We have the NAS systems. Everything goes on the NAS. And then there's an active projects folder, and that gets a nightly clone to a completely separate RAID 5. So our NASes are RAID 6. They can tolerate two drive failures. Then we have a complete clone of the working storage set that can tolerate 1 or 2 drive failures, depending on if it's configured for RAID 6 or RAID 5.
01:10:21:14 - 01:10:54:15
Joey
But what makes that a backup, not just a redundancy, is that, let's say the entire storage goes up in flames. Yep. I can take my RAID 5 or RAID 6 array that I've been cloning to, I can plug that into a brand new computer that I just bought at the Mac store, and I can continue to work. Right. And then the third layer of this backup solution for us is that we have multiple sites, so our NASes are cloned to multiple sites.
01:10:54:15 - 01:11:21:05
Joey
So to start with, our online storage consists of the NAS at my house, Robbie's house, and at our office, then the backup of the local clones to another array at those offices. Yep. And then finally the second backup, and this is a very, very important thing to have, is an off-site one, as in we have the three NASes synced together.
01:11:21:05 - 01:11:31:08
Joey
All the active projects are synced. So those are in multiple physical locations. So literally if my house burns down the project doesn't go away.
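The nightly clone of an active projects folder described above could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the episode does not say which tool does the cloning, the function names are hypothetical, and hashing every file on every run would be slow on many terabytes of media, so a real setup would more likely lean on a dedicated sync tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large media files are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def clone_and_verify(source: Path, destination: Path) -> list[Path]:
    """Copy new or changed files from source to destination, then verify.

    Returns the relative paths whose copy does not match the original.
    Files deleted from the source are left alone on the destination, so an
    accidental delete is not mirrored onto the clone.
    """
    mismatches = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        src_hash = sha256_of(src_file)
        if not dst_file.exists() or sha256_of(dst_file) != src_hash:
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)
        # Re-read the copy from the destination to confirm it landed intact.
        if sha256_of(dst_file) != src_hash:
            mismatches.append(relative)
    return mismatches
```

A scheduled job could call `clone_and_verify(Path("/Volumes/NAS/ActiveProjects"), Path("/Volumes/CloneRAID/ActiveProjects"))` each night and alert on a non-empty result; both paths are placeholders. The deliberate choice not to mirror deletions is what lets the clone rescue a fat-fingered delete.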
01:11:31:10 - 01:11:40:14
Robbie
Yeah. And so when you're thinking about that clone device, because I think people are probably going, what do you mean? I just bought my NAS and it cost me ten grand. I'm not going to buy.
01:11:40:14 - 01:11:41:21
Joey
Another one, buy double.
01:11:42:01 - 01:12:06:21
Robbie
Right? Well, so here's the thing. Right? I think there is an argument to be made for that literal redundancy and that mirroring of a similar caliber device. Right. So for example, here at home, I have my main NAS and I actually have a second NAS device that operates, and it clones every night, or morning, at 3 a.m.
01:12:06:23 - 01:12:28:06
Robbie
It's cloning all the active stuff on one device over to the other device. Now I have it on a NAS because of the situation that you just described, right? One NAS completely blows up. I can't afford that time of, you know, kind of going back to other hard drives. I need something that I can just plug and play and it's ready to go.
01:12:28:06 - 01:12:28:22
Robbie
Right.
01:12:29:00 - 01:12:54:15
Joey
But that right there is a cost to convenience decision. Right? The further we move down this data life cycle, the more sacrifices we can make to the performance of the device. So you might not have the, you know, budget to double your storage just to have a clone. But guess what? You could buy a cheap RAID array with slower drives that's direct attached, that costs a fraction of what your NAS is.
01:12:54:15 - 01:12:55:21
Robbie
That's what you do. Right? You have.
01:12:55:22 - 01:13:21:15
Joey
Yeah, exactly. I've got a 5 or 6 drive RAID 5 external Thunderbolt chassis, way less cost than my NAS, but also, you know, less interactive performance. That's fine. You know, you just make a decision based on, when I get into a disaster recovery mode, how quickly, and how much performance, do I need to have immediately?
01:13:21:15 - 01:13:38:15
Robbie
And just for those of you who are wondering, like, what's the reality that this is going to happen and something's going to break, I'm going to give you a perfect example of this. About a month ago, I'm sitting here and I'm grading the show and all of a sudden, like, hitting playback and I'm like, gosh, that really should be playing back.
01:13:38:15 - 01:13:59:14
Robbie
Like, why is it dropping frames or whatever? And next thing you know, I get a message, an email message from my main NAS that says, hey, look, performance has been degraded on the array because you have lost a drive. And it turns out a second drive is actually failing. Right? So after a mini panic attack, I was like, well, this is this is really bad.
01:13:59:16 - 01:14:29:08
Robbie
So I just happen to have some spares. So I popped in those spares. But guess what? Cool. Popped in the spares. The whole system has to rebuild that data. So that's actually a really dangerous time, by the way, to be working on something when something's rebuilding that. So what did I do? I said, cool, I'm just going to unmount that volume from my machine, and I'm just going to let it do its thing, and I'm just going to relink things over to this other array and like nothing ever happened to me and I can wait two time.
01:14:29:08 - 01:14:49:04
Robbie
So it does happen. As you said, and I started with this at the beginning, it's not a matter of if, it's just a matter of when. And in that case, that redundancy of having two online systems kind of greatly helped me. Okay. So when I bring stuff in, you transfer to online storage. We're making a backup of that, ideally onto something that's similarly equipped.
01:14:49:06 - 01:14:49:17
Robbie
in terms of.
01:14:49:19 - 01:14:51:11
Joey
Backups, one off site.
01:14:51:13 - 01:15:13:05
Robbie
Right. Exactly. So if something goes wrong, Joey's house burns down, my house burns down, RAID blows up, we have that data, and we're not going to lose it. I want to make another point there that I think I see too often, especially in message boards. And I understand it's a cost factor and I understand not everybody can do this, but it makes me gasp every time I look into it.
01:15:13:07 - 01:15:26:07
Robbie
And that is people describing working off of their client drives as their primary storage to do something right. Oh, that is that is an absolute recipe for disaster. So that.
01:15:26:07 - 01:15:27:19
Joey
Is a hard never for.
01:15:27:20 - 01:15:49:18
Robbie
Hard never. Never ever, ever. You have to think of that client-supplied hard drive or that client-supplied link as like something out of, like, Mission Impossible, right? It will self-destruct at any given point in time. And your first responsibility to that client is integrity of their data. And that means, I'm going to say never, I'm going to really underline that in bold.
01:15:49:18 - 01:16:07:02
Robbie
Just never work off their storage. Okay? You need to be transferring things to your own storage. And we have a whole episode talking about conforming and offline/online with client drives and that kind of stuff that I think is coming out soon, and we'll talk about that more in that episode. But that's a big never, never do.
01:16:07:04 - 01:16:28:20
Robbie
Okay. All right. So we do our project. Everything is hunky dory. Drives work flawlessly. And I'm done with the project. Client has paid their bill. They're giving me a high five. What next? Well, if you're like most of us, right, you're not convinced that a project is done when the client says, oh, thanks, we're done.
01:16:28:20 - 01:17:01:04
Robbie
And I paid you, right? Inevitably, a week or two, a month, maybe 2 or 3 months later: hey, you know, I was just looking at this more, and I think we want to make that title change, or, I got a new mix, could you remarry that and just do a new output for me? Okay, the thing about online storage is that it's going to be your most expensive storage, and it could also potentially be your lowest capacity storage, you know, because if you're wanting to have super, super high performance and let's say you're using SSDs as well, SSDs are really expensive.
01:17:01:10 - 01:17:31:00
Robbie
So it's really difficult to build up, you know, hundreds of terabytes of online storage unless you have unlimited funds. Right. So that brings us kind of to this next idea that I want to differentiate from backing up. And that is the idea of nearline storage. Right. This is something that you often hear talked about. And to me, the differentiating factor between nearline and online storage is mainly a factor of speed and connectivity.
01:17:31:00 - 01:17:31:08
Robbie
Right.
01:17:31:08 - 01:17:32:16
Joey
And time and speed.
01:17:32:18 - 01:17:56:07
Robbie
Time and space. Right. So in my mind, a perfect nearline option is high capacity, but not necessarily high bandwidth or high speed. Right? I want to have a lot of space that I can just kind of throw things over to, this storage that I may or may not need to access, but I'm not ready to put it in, as we'll discuss in a minute, offline or long term.
01:17:56:07 - 01:18:03:10
Robbie
Archive. I just need it around, but I don't need it around taking up space on my main storage, right?
01:18:03:12 - 01:18:35:04
Joey
Yeah. And again we've got to look at the pros and cons here, because if we take it off of our main storage we're losing one layer of redundancy. So if we go to a separate nearline storage, or if we use the same clone volume as our nearline storage, you do need to make sure there's some other level of redundancy there, because if it's only sitting on a single array now, where you're between your online and your final archive, well, you're in a really dangerous spot because now you could lose that data.
01:18:35:05 - 01:19:06:18
Joey
My personal kind of mentality on this is that I like to go to the offline, to what we're going to talk about in a little bit, which is a full tape archive. I go to that the same time I go to nearline. So my nearline is a convenience. Basically, when a project is done, it goes to tape for my final archive, and then it goes off of the main project folder that syncs to the three locations and syncs to my clone drive.
01:19:06:20 - 01:19:20:19
Joey
So in nearline world, I can still get to it if I need to, but if it gets blown up and I have to blow it away, I've got it on tape. Yeah, right. So it's just a matter of time versus expense versus speed.
01:19:20:21 - 01:19:23:01
Robbie
Right? Right. Exactly. And and to.
01:19:23:01 - 01:19:41:07
Joey
Be clear, nearline storage, as we're describing it, is a convenience. You know, I consider a good online storage and a good archive solution and a good backup solution to be necessities. Right? Nearline is a convenience. It means, oh, the client comes back, now I don't need to restore from tape, and I don't need to keep this online forever.
01:19:41:07 - 01:20:01:10
Robbie
I agree, and on a certain level, your backup of your online storage is kind of your nearline storage, for all intents and purposes, right? Yeah, we kind of made it out to say like, oh, well, we have this. I mean, that's a best case scenario, that you have something as high performing as your online storage as your backup for that or your nearline.
01:20:01:11 - 01:20:19:12
Robbie
But in reality, you know, the case that I see a lot of people having is, oh, okay, I have everything on my fast NAS and, you know, high capacity NAS. And now I went out and bought a Thunderbolt RAID, as you have. And that's my nearline. Right. And I agree with the idea that it's kind of a temporary location.
01:20:19:12 - 01:20:43:16
Robbie
Right. It's kind of a nice to have thing. but, you know, the other the other way of looking at Nearline, too, is that, you know, if you have high enough capacity online storage and you have an archiving solution that you trust, you might be able to in the truest sense, forego nearline, but your nearline essentially just becomes your online backup at that point, right?
01:20:43:16 - 01:20:45:04
Robbie
It's a one at one time.
01:20:45:06 - 01:21:08:15
Joey
So in my case, I'm actually kind of backwards. My clone volume is smaller than my big NAS because of the order in which I bought things. So my nearline solution is a folder on my main NAS where it doesn't get cloned to the essential backup area, right? But it's still accessible, so I don't have to restore it from tape.
01:21:08:15 - 01:21:09:20
Joey
If I need to.
01:21:09:22 - 01:21:33:08
Robbie
Cool, all right. So online, nearline as a convenience factor. And then finally we're going to arrive at offline, right. And offline, as its name implies, is something that we just don't really need to access all that often. This is the archive. This is the cold storage. There's a lot of different ways people describe this.
01:21:33:08 - 01:21:57:12
Robbie
Right. But generally speaking, the offline storage is going to be the slowest, least performing storage that you are going to utilize. Right. Just because of that speed, it's not going to be readily accessible in the same sense of just like plugging in an SSD and ready to go, right? So you have to factor in time to get stuff back from it.
01:21:57:14 - 01:22:20:11
Robbie
And third, offline storage is a little bit of a choice of how you want to go with it, because there are now, actually, really two main parts of archiving as far as I'm concerned. And that is, first, local offline storage, which for a lot of you consists of buying some hard drives and putting them on a shelf.
01:22:20:11 - 01:22:42:06
Robbie
And I just want to go on the record and say, this is the most horrible idea that you've had. Do not do it. Don't even think about buying it. It's a waste of money. I understand that you look at archiving solutions and go, oh, the service or this drive is expensive, but it is not worth putting a hard drive on a shelf and just praying that it's.
01:22:42:10 - 01:23:02:00
Joey
And this is the only other thing that I'm going to say in this episode that is a hard, hard absolute never. A drive sitting on the shelf is the technological equivalent of a drive sitting at the bottom of the ocean. Just assume that it is dead. And by drive I mean anything.
01:23:02:00 - 01:23:04:21
Joey
I mean RAID array. I mean SSDs, any.
01:23:04:21 - 01:23:05:15
Robbie
Magnetic media.
01:23:05:15 - 01:23:26:18
Joey
Any one of them. Yeah. Has a higher likelihood of failure when they start up, which means it will sit on a shelf forever, you plug it in, and the second all those drives spool up, 2 or 3 of them will fail. All your data is gone. Now, I'm not saying that happens every time, but it will always happen eventually.
01:23:26:18 - 01:23:44:12
Joey
Again, hard drives and SSDs have a 100% failure rate. It is just a matter of time. So if you spend money buying drives and putting them on a shelf for an archive because the other solutions are too expensive, you're throwing away money to avoid spending money.
01:23:44:14 - 01:24:08:04
Robbie
So I had a filmmaker a year or two ago that is local here in the DC area, and I went over to their house to have a meeting about a film we were going to work on. And I walked into, you know, like their work area or their studio area, and there's this big double set of Ikea bookshelves with probably, no joke, 300, 400 of those little LaCie orange Rugged drives.
01:24:08:06 - 01:24:11:08
Robbie
And I literally had a panic attack because I was just sort.
01:24:11:08 - 01:24:12:09
Joey
Of like, not that rugged.
01:24:12:13 - 01:24:21:22
Robbie
They're not that rugged. And the money spent on those hard drives could have bought a far more robust solution. So yes.
01:24:21:23 - 01:24:45:03
Joey
Now I want to go back to one thing we said at the very beginning here. I consider backup to be the mandatory responsibility of the post-production vendor. Archive doesn't have to be. Archive has a hard cost associated with it, and a hard responsibility of keeping these things over time. If a client is not willing to pay you for that archive, there is nothing wrong.
01:24:45:03 - 01:25:09:06
Joey
As long as you give clear communication about the responsibility to the client to say, hey, here is all of your project back. Permanent archive is now your responsibility because this is your intellectual property. if you need to come back to me, you need to bring it back to me. That's perfectly valid thing. I agree. If you can't afford a long term archive solution.
01:25:09:08 - 01:25:26:13
Robbie
And we'll hit on some of that business stuff in just a minute. But I did mention that there's two main parts of this, now that we've cleared away don't ever put it on some extra drives you have lying around. And those two main parts are LTO and some sort of in-the-cloud cold storage. Right.
01:25:26:15 - 01:25:50:12
Robbie
The in-the-cloud stuff, honestly, I don't recommend a lot. I know there's a lot of, you know, Backblaze, there's Amazon Glacier, I mean, there's a lot of those services that are out there to kind of back up, you know, very, very, very cost effectively. We're talking about, you know, a 10th of a cent per gigabyte or something like that on some of these.
01:25:50:12 - 01:26:19:14
Robbie
Right? The downside of going to a cloud cold storage archive like that for offline storage is, in our workflows, it is going to take a long time to potentially get there and even longer to come back from there. Right. That's one of the reasons that it is so cheap, because these data centers have, you know, a Mount Everest worth of drives sitting in a rack.
01:26:19:16 - 01:26:35:14
Robbie
And these are something that they treat just like we're describing. It is offline storage for them. So to get something back from there, it requires, you know, a lot of effort on the data center's part or whatever to get it back. And I've just never found it practical for our volume of media.
01:26:35:14 - 01:26:43:16
Robbie
You know, any given project might be 4 or 5, six terabytes. And, you know, times 20 at a time in a month, you know, you got 20TB, 30TB to backup.
01:26:43:18 - 01:27:03:09
Joey
The hard part, too, is also, you know, your data size needs for archive never shrink, right. So the more you archive, the more your monthly bill goes up and it'll just never go down. You know, you're only adding cost to a recurring monthly fee. And if you're not recouping that from your client, then it's just not economically viable.
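The "bill only goes up" point is easy to see with a little arithmetic. Here is a minimal Python sketch, assuming a flat storage price and a steady amount of new archive data each month. The 20 TB per month and the 1.00 per TB-month rate are placeholder numbers loosely in the range mentioned in this conversation, not a quote from any provider, and retrieval or egress fees are left out entirely.

```python
def monthly_bills(added_tb_per_month: float, price_per_tb_month: float, months: int) -> list[float]:
    """Storage bill for each month when data is only ever added, never deleted."""
    bills = []
    stored_tb = 0.0
    for _ in range(months):
        stored_tb += added_tb_per_month  # the archive never shrinks
        bills.append(stored_tb * price_per_tb_month)
    return bills


if __name__ == "__main__":
    bills = monthly_bills(added_tb_per_month=20.0, price_per_tb_month=1.00, months=36)
    print(f"month 1 bill:  {bills[0]:.2f}")
    print(f"month 36 bill: {bills[-1]:.2f}")
    print(f"total paid over 3 years: {sum(bills):.2f}")
```

Because every earlier month's data is billed again every later month, the total paid grows roughly with the square of the number of months, which is why the fee needs to be recouped from clients to stay viable.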
01:27:03:13 - 01:27:26:15
Robbie
So those solutions, like I said, Amazon Glacier, Backblaze, and the like, I think they're valuable. I think they're valuable to a point. And for certain types of data for sure. Smaller files, project files, you know, Photoshop files, you know, graphics, some of that kind of stuff. Yeah. Of course, that's going to be quick, easy to get up there, long term storage.
01:27:26:19 - 01:27:32:10
Robbie
But for moving terabytes or petabytes of data not not going to be at the top of my list.
01:27:32:13 - 01:27:53:07
Joey
Especially not for a smaller studio, right? For a large company that has existing cloud contracts and, you know, a normal monthly cloud spend that scales up and down with their needs, you know, the cloud archive is a fantastic solution. I think for businesses of Robbie's and my size, a local LTO is better in most cases.
01:27:53:07 - 01:28:05:08
Robbie
Okay, so LTO, or Linear Tape-Open, which is the most obtuse name, like, ever, right? People look at it, and I say to clients all the time, like, oh, we're just gonna put that back on tape. And they look at me like I have two heads.
01:28:05:08 - 01:28:06:12
Joey
Like I thought tape was dead.
01:28:06:14 - 01:28:33:02
Robbie
Yeah, I thought it was dead. What are you talking about? Okay, LTO is a data tape format. It has its roots, and we're probably dating ourselves a little bit here, you know, going back to the days of DAT and WORM and all these kind of tape technologies that were out there. LTO has become a standard archiving medium that literally everybody involved in big data uses.
01:28:33:02 - 01:28:57:21
Robbie
Right? These tapes, a single tape has a shelf life of 20, 30 years. It's, you know, think about it this way. It's what your bank uses. It's what, you know, governments use. It is a verified, bona fide archiving format that is designed for longevity and for, you know, future restoring. Right.
01:28:57:23 - 01:29:15:01
Robbie
And there's a couple things you need to know about LTO besides just the name of it. Number one, there's two components of an LTO system, or really three components. There is the drive itself. Right. This is a physical piece of hardware that has connectivity on it. And we'll put this in the show notes. There's a couple different ones out there.
01:29:15:03 - 01:29:22:06
Robbie
Could have, you know, Thunderbolt backplane where it's Thunderbolt connected connectivity. It could have SAS connectivity right in the black one there.
01:29:22:11 - 01:29:34:01
Joey
They're all SAS. They're all SAS. LTO drives are SAS. There's a couple vendors that make really convenient enclosures that have things like Thunderbolt or USB to make it more accessible to desktop computers.
01:29:34:01 - 01:29:41:10
Robbie
Yeah. And these days, you know, there's still there's only a few companies that are actually making the the actual drives. IBM comes to mind HP.
01:29:41:16 - 01:29:59:11
Joey
IBM, Quantum, HP. And I believe there's one more. LTO stands for Linear Tape-Open. It is a consortium of those companies that make the drives, and they all are the same standard. So there's no, like, the IBM version or HP version or Quantum version. Right. They're all cross compatible.
01:29:59:13 - 01:30:28:10
Robbie
So there's the drive itself. There's the tape, which is the physical medium that data gets stored on. We'll get back to that in a second. And then there is the archiving software that you, optionally, can use. Right? Now, I say optionally because on most of the platforms, you know, macOS or whatever, you can install a toolset to kind of just mount that LTO drive directly in your OS and treat it like any other drive, you know, to just drag things back.
01:30:28:12 - 01:30:45:11
Robbie
Now, to be clear, it's not like any other drive. It's not as fast as any other drive. You're not going to have that interactivity, but you can, through a technology or a standardized format called LTFS, mount that just like any other drive and copy back. You may want to consider, and there's various tools out there.
01:30:45:11 - 01:30:56:08
Robbie
We'll link to some of them in the show notes, but there are archiving pieces, LTO archiving software. Think about it as a piece of software that builds a database about the content that's on that tape, right?
01:30:56:08 - 01:31:21:00
Joey
Yeah. It tells you what tapes have what files. And more importantly, the LTO software is usually designed to make your copies and interactions efficient. You know, LTFS as a thing, it presents to the operating system as a regular file system. So you think you could just open up Explorer or Finder and start navigating around. But when you start navigating around, like in Finder, if you open up a folder, it's going to, like, look at all the files and try to make thumbnails.
01:31:21:00 - 01:31:25:13
Joey
Well, if all those files are on tape, that tape drive is going to be spinning up and down like crazy.
01:31:25:15 - 01:31:27:07
Robbie
Beach ball was all like, what's going on here?
01:31:27:08 - 01:31:42:05
Joey
Yeah, right. Your normal file system interaction on the computer is not designed for tape. And there are programs available that, yes, they copy via LTFS, but they make it more of a tape-based interaction. So you're not waiting on the tape drive. Yeah.
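The "database about the content that's on that tape" idea can be illustrated with a very small index. This is only a sketch of the concept, not how any of the products in the show notes work internally; the class name, tape labels, and file paths are all hypothetical.

```python
from collections import defaultdict


class TapeCatalog:
    """Tiny in-memory index of which files live on which tape (illustrative only)."""

    def __init__(self):
        self._tapes_by_file = defaultdict(set)

    def record(self, tape_label, file_paths):
        """Note that every path in file_paths was written to tape_label."""
        for path in file_paths:
            self._tapes_by_file[path].add(tape_label)

    def find(self, text):
        """Return {path: sorted tape labels} for every path containing text."""
        return {
            path: sorted(tapes)
            for path, tapes in self._tapes_by_file.items()
            if text in path
        }


catalog = TapeCatalog()
# Hypothetical labels and paths, made up for this example.
catalog.record("A00001", ["ProjectX/conform/reel1.mov", "ProjectX/project.drp"])
catalog.record("A00002", ["ProjectX/conform/reel1.mov"])  # duplicate copy on a second tape
hits = catalog.find("reel1")
print(hits)
```

The point of the lookup is that you can answer "which tape do I pull?" without mounting anything, so the drive never has to seek just to satisfy a search.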
01:31:42:05 - 01:32:12:21
Robbie
And they do a lot of things, including formatting and erasing tapes. They can add some metadata to the tape, you know, that kind of stuff. And there's a lot out there. Hedge comes to mind. YoYotta, you know, there's various platforms out there and you just have to compare the features. This is not a show about what those features are, but the thing that is the most confusing about those three components, you know, drive, tape, and software, I think, to people is the generational numbering of LTO.
01:32:12:21 - 01:32:36:09
Robbie
Right. So you see LTO five, six, seven, eight, I think we're up to nine now. Right. And essentially this is the way it works, right? Every generation that comes out is a new tape format, and it is a new drive that supports that tape format. And essentially what you get over every successive generation is usually two things. Increased capacity on that tape, right.
01:32:36:09 - 01:32:42:22
Robbie
So, you know, going from a couple terabytes to 5 or 6, I forget how many, terabytes, eight and nine are now. But it's, it's LTO.
01:32:42:22 - 01:32:43:19
Joey
Eight is 12.
01:32:43:23 - 01:33:07:01
Robbie
Well, right. Okay. So more capacity on a single tape and then, generally speaking, some speed improvements writing back to each tape. Right. Now, those generations are really important because any given LTO drive, right, so let's say you go out and buy an LTO eight drive. The way that the standard works is that an LTO eight drive will read and write eight tapes.
01:33:07:03 - 01:33:30:15
Robbie
It will read and write the previous generation. So if I've got an eight drive, it will read and write eight and it will read and write seven, but only at the specs that seven was originally capable of. Yeah. And it will read, and read only, another generation back. So in the case of LTO eight: read and write eight, read and write seven at seven speeds and capacity, and read only from six.
01:33:30:17 - 01:33:37:19
Robbie
You have an LTO four tape? Guess what? You either need to get an LTO five or an LTO four drive to be able to support that. You cannot read that.
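The generation rule as Robbie describes it (read and write the drive's own generation and one back, read only two back, nothing older) can be written as a short lookup. This sketch encodes the rule of thumb from the conversation, with a hypothetical function name. As far as I know, the published compatibility for LTO-8 and newer drives is narrower than this and does not include reading two generations back, so treat the output as a starting point and confirm against the drive's spec sheet.

```python
def drive_access(drive_gen, tape_gen):
    """Access level for a tape in a drive, per the rule of thumb in the conversation.

    Assumption: real drives may deviate from this rule (notably LTO-8 onward),
    so always confirm against the manufacturer's compatibility table.
    """
    gap = drive_gen - tape_gen
    if gap < 0:
        return "unsupported"   # tape is newer than the drive
    if gap <= 1:
        return "read/write"    # same generation or one generation back
    if gap == 2:
        return "read-only"     # two generations back
    return "unsupported"       # anything older


for tape in (8, 7, 6, 4):
    print(f"LTO-8 drive, LTO-{tape} tape: {drive_access(8, tape)}")
```

Running the same check for every tape on the shelf before retiring an older drive is an easy way to avoid stranding an archive.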
01:33:37:21 - 01:34:02:18
Joey
And the reason for this is that this kind of technology is a very long term investment with a very long life cycle. Right. Like Robbie said, these tapes can last 20 to 40 years. So it is not something like the latest iPad where a new generation of tape comes out and you need to jump on and buy the new version to stay up to date.
01:34:02:18 - 01:34:15:10
Joey
No, when you make an investment in your archiving solution, that is an investment that lasts years and years and years and years. There's no reason why, if you have an LTO seven drive, you need to move to an LTO eight.
01:34:15:10 - 01:34:16:03
Robbie
I was gonna say.
01:34:16:05 - 01:34:16:23
Joey
You have to buy more.
01:34:16:23 - 01:34:37:01
Robbie
Tapes. I'm on seven and I've probably been on seven for 5 or 6 years and really not run into any limitations, other than I have to use more tapes than you do because you have an eight drive, right? So that is something to consider. I generally find with LTO, I think probably every 6 or 7 years I consider an upgrade.
01:34:37:03 - 01:34:42:12
Robbie
And by that time, you know, if it's a huge jump in capacity or a huge jump in speed, maybe, but that investment.
01:34:42:13 - 01:34:51:11
Joey
Part of this also is, you know, the cost of entry. The biggest cost to entry is the drive, right? The drives range from like 3 to 6 or even 7 or $8000.
01:34:51:11 - 01:34:52:17
Robbie
Oh, yeah. Yeah, exactly.
01:34:52:21 - 01:34:55:21
Joey
You can save money by buying a generation back.
01:34:55:23 - 01:35:13:19
Robbie
Oh yeah, a hundred. And the tapes are cheaper too. So that's a generational thing about LTO. When it comes to the software, again, the LTO software, mainly think about it as a database for being able to quickly find and restore things, because great, it's on tape. Now, who knows what you have? That's a big thing.
01:35:13:19 - 01:35:30:22
Robbie
But the other thing it helps you do is things like make redundant backups, which I'm going to talk about in a second. So you can actually have multiple tapes be written at the same time, especially if you have dual drive systems. So one of the things to understand about LTO software, and this is a change that happened.
01:35:31:00 - 01:35:45:17
Robbie
What do you think? Probably around LTO four. LTO five was that prior to that? Yes, LTO was an open standard, but there was a lot of people that were doing their kind of own proprietary ways to write that tape.
01:35:45:17 - 01:35:56:23
Joey
Right. My old Quantel ECC system literally plugged SCSI into an LTO drive and archived in completely its own format. And that's how it worked, right? Only another Quantel could pull those tapes.
01:35:56:23 - 01:36:07:08
Robbie
Right? So so that's not a concern for modern archiving, but it is something to pay attention to if you have I you'd be surprised how many, you know, clients say, whoa, I have this project. I have.
01:36:07:08 - 01:36:08:00
Joey
An LTO.
01:36:08:02 - 01:36:31:13
Robbie
I've got a project from 12 years ago. And I want you to restore this. Right. And you plug it into your drive, and it looks like the tape is dead. First thing is, it might be one of those proprietary formats. And so unfortunately, with one of those proprietary formats, the only thing you're going to be able to do is get the piece of software that that tape was originally written with to be able to read it, which brings us to the change that happened right around then.
01:36:31:18 - 01:36:59:08
Robbie
And it's important for people to understand the technology of LTFS. So essentially LTFS is an open file system that is shared between manufacturers and is used by everybody pretty much now. But what this means is that I can make an LTO archive on my end and bring that, assuming that they have the correct hardware to support that generation of tape, I can bring that tape anywhere and they can mount it.
01:36:59:08 - 01:37:27:08
Robbie
So for interoperability and delivery, there was a period for a while where a lot of networks, like Discovery Channel, for example, were asking for deliverables on LTO as an LTFS tape. Right. And they had some specifics about metadata and stuff. But the beauty is you can make archives and deliverables like that. So that begs the question that we've been hinting at and going around over the course of this episode, which is: what is the business model in post-production
01:37:27:10 - 01:37:52:08
Robbie
for doing this? Right. I want to clarify something that I think is, if it's not abundantly clear about Joey and I's outlook on this, I want to make it perfectly clear: we archive for sanity. And what I mean by that is that the feeling that we get of data going, you know, hitting the delete button is just not something that's in our DNA.
01:37:52:08 - 01:38:00:15
Robbie
Right? And so we invested in these solutions long before we were offering it as a service to our clients.
01:38:00:21 - 01:38:05:06
Joey
To me, an LTO drive is like anti-anxiety medication.
01:38:05:08 - 01:38:22:20
Robbie
I'm not, like, I need to tell an anecdote here because I feel sorry. This is not meant to poke fun at you, and I feel you on this. I remember last summer, Joey was about to go on vacation with his family, right? And he's like rushing, you can tell. Like, he was rushing when I talked to him in a phone call.
01:38:22:22 - 01:38:33:15
Robbie
What are you doing? He's like, I'm just trying to write another tape. I'm just trying to write another tape. Literally. Could not leave the house for vacation without archiving sweet stuff.
01:38:33:17 - 01:38:34:07
Joey
Because.
01:38:34:07 - 01:39:03:04
Robbie
Of the anxiety associated with that data. And I'm right there with you, man. I feel I feel the same, the same way. So it's one of those things that like, if you're obsessive compulsive like us and you're scared of data loss like us, it's probably something you want to just think about doing anyway. But I don't think as Joey pointed out a few minutes ago, you know, you could back up that, that online storage, do the project, deliver the project, give it back to the client and just wipe your hands of it.
01:39:03:04 - 01:39:22:22
Robbie
That's truly an option. So it brings up the idea of, do we charge people for this? And I think it really kind of depends. Right? I try to do it more and more and more and more. We have rates for long form and short form archival, and we have rates for restoration, which is an important part.
01:39:22:22 - 01:39:43:12
Robbie
About those years later, they come back to us. We got to find that tape. We got to restore those files back from tape to, nearline or online storage. And there's time involved in that. Right. So we have rates for, long form and short form for that. What that rate might be is something that I don't think, there's probably much standardization on.
01:39:43:13 - 01:40:11:12
Robbie
I think it varies, you know, depending on what people want to do. It's also something that I would just I would urge you to include in your project proposals and bids. Yep. On everything. Just have that fee, make it transparent to the client. Right. And I think the one important distinction to make to the client, if you're going to charge for this is we are archiving the media that you delivered to us, and we are archiving the work that we did.
01:40:11:14 - 01:40:17:12
Robbie
It does not mean that you are archiving everything for that project for the client. Yeah.
01:40:17:12 - 01:40:19:18
Joey
We're not becoming an archive vendor.
01:40:19:18 - 01:40:38:01
Robbie
Right. If you wanted to offer that as a service, great. It seems like a headache to me to to take mountains of data from client to archives. And it's a lot of wear on your tapes and drives, but I think that's something important just to make sure, like, hey, I'm not backing up all your camera originals, I'm just backing up the, you know, the conform that you gave me or something like that, right?
01:40:38:03 - 01:40:59:14
Joey
Yeah. And I think the last thing I want to talk about on the business and billing side of this is, you know, like Rob said, we archive just about everything, right? For our own sanity. That doesn't mean that we nickel and dime every client on every single archive. If it's a 30-second spot, we're probably not going to put a line item on the bid for tape archiving.
01:40:59:14 - 01:41:20:06
Joey
Right? But if it's a two hour feature, yes, we're going to talk to the client in advance, explain to them what the process of LTO archiving is, and say this is what it costs to archive it. But if a client comes back two years later and says, I need my 30-second spot back because I deleted it off my system, sure, we can say, of course, but it'll be an hour of archive retrieval time to do that.
01:41:20:11 - 01:41:36:20
Robbie
Yeah, totally. And I think, you know, at the end of the day, that discussion is something your clients will ultimately appreciate going, you know, knowing that, oh, thank God I don't have to spend megabucks, on, you know, crappy hard drives to put on a shelf.
01:41:36:22 - 01:41:40:11
Joey
Or even if I do, when they blow up, I've got a backup.
01:41:40:11 - 01:42:13:03
Robbie
Right. And one of the the last thing I'll add to that part of the business, part of this is, I think it's important that clients walk with their data as well. Right? There's no guarantee that they're going to come back to you. So one of the things that I would consider is when you're making that archive of a client's project, that you think about doing it in duplicate, and we didn't really touch on this, but duplicate LTO authoring is something that for from a pure data integrity point of view, is something that you probably always want to do.
01:42:13:08 - 01:42:30:17
Robbie
You know, want to have that tape, you know, have it just like a backup of the, the online storage. We're backing up our archives to two different tapes. Should something go wrong with that. And putting ideally putting one of the one of the copies in a separate location from the other copy. So, you know, whatever office burns down, we're not losing two.
01:42:30:19 - 01:42:53:02
Robbie
But what I tend to think about with that is, hey, I'm going to make two copies, one for us to keep and one for, you know, a deliverable for the client to have. So I am going to charge them not just the archiving and restoration fee, but the nominal cost, you know, the 50, 60, 70 bucks or whatever it is for, you know, an LTO seven or so tape, and charge that as a line item as well, just so they have their own copy of it.
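The media line item Robbie describes comes down to tapes per copy, times the number of copies, times the price of a tape. Here is a minimal sketch of that arithmetic. The 12 TB figure for LTO-8 is from the conversation and 6 TB is the published native capacity for LTO-7; the function name is hypothetical, the tape price is just the middle of the "50, 60, 70 bucks" range mentioned, and real usable space on an LTFS-formatted tape is somewhat below native capacity.

```python
import math

# Native (uncompressed) capacities in TB, keyed by LTO generation.
NATIVE_TB = {7: 6, 8: 12}


def media_cost(project_tb, generation, price_per_tape, copies=2):
    """Return (tapes_needed, total_media_cost) for `copies` identical tape sets."""
    tapes_per_copy = math.ceil(project_tb / NATIVE_TB[generation])
    total_tapes = tapes_per_copy * copies
    return total_tapes, total_tapes * price_per_tape


# A 5 TB project on LTO-7, one copy for the facility and one for the client.
tapes, cost = media_cost(project_tb=5, generation=7, price_per_tape=60)
print(f"{tapes} tapes, ${cost} in media")
```

This covers the tapes only; the archiving and restoration labor the hosts mention would be billed on top of it.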
01:42:53:04 - 01:42:57:06
Robbie
And as LTFS, they can bring it anywhere they want if they need to restore.
01:42:57:07 - 01:43:18:22
Joey
So let me just wrap this all up. I'm going to try to wrap this up in one nice little bundle here, because I know we've talked about a ton of things here. First, backup is different than archive. Backup is an absolute requirement. An archive is negotiable with your client. We've talked about very expensive solutions from beginning to end.
01:43:18:22 - 01:43:44:05
Joey
Right? We every single thing that we've mentioned has a cost associated with it. So you got to look at your data lifecycle and where it's required to be redundant, where it's required to be fast, where it's required to not be fast. Come up with your own strategy that works for your workflow, your budget, but like anything else, there's there's ratios of cost to benefit.
01:43:44:05 - 01:44:13:00
Joey
And you need to analyze that kind of yourself and figure out what makes the most sense. But having that plan, having real backups in place, having an understanding of what RAID can and can't do, what backup is, what archive is, I think is just absolutely essential to anybody working in post-production. And when you look at those costs, right, when you look at that initial investment of I need to buy extra drives, I need to buy extra tapes, I need to buy a tape drive.
01:44:13:00 - 01:44:27:23
Joey
I, you know, Joey is telling me to spend thousands of dollars on this thing when I've never had a problem just plugging in my SSD. Well, let me just tell you this. I have worked in post-production for almost 20, 25 years.
01:44:27:23 - 01:44:28:04
Robbie
Old man.
01:44:28:04 - 01:44:59:09
Joey
Yeah, I have been around computers my entire life. I've been around computers and post-production for literally the entire time. Computers have been used in post-production as data devices. I have never, ever once lost an essential piece of customer data. Yep, I have endured hundreds of hardware failures and software failures and temporary losses of data of many different things.
01:44:59:11 - 01:45:06:17
Joey
I have never using these philosophies that we've talked about. I have never lost a customer project.
01:45:06:19 - 01:45:45:06
Robbie
And it's one of those things where people are cavalier about it because they haven't had the pain, the guilt, the agony of losing. And all it takes is one time for somebody to be a true believer. But I think if you really digest what we've talked about on this episode, the time to become a believer is not in the middle of it happening, but is ahead of time. And yeah, I would concur that, you know, spending two, three, four, five, six grand, whatever it is, for an LTO solution or an additional RAID or whatever seems like a lot of money upfront.
01:45:45:06 - 01:46:07:03
Robbie
And it is, it is. But in the grand scheme of things, think about the detriment to your business, your reputation, all of that kind of stuff, if in the middle of a, you know, super important project, things go wrong. And I've seen those arguments, right, like, you know, like the operator blames the client for not having done their own backup.
01:46:07:03 - 01:46:26:04
Robbie
Like, you just have to go into it assuming that the client doesn't care or understand how to maintain data integrity. And as we said at the top of this episode, besides the amazing creative work you're going to do, data integrity should be your, you know, it's the 1B to the 1A of being creative, right?
01:46:26:06 - 01:46:48:19
Robbie
That data integrity throughout the entire pipeline and life cycle is something you should consider. So, awesome. I think this is a great talk. If you're new to LTO, new to archiving, feel free to leave us some comments. As always, we really appreciate you listening to and watching the show. As a reminder, the show is available on all major podcast platforms, including Spotify and Apple Music.
01:46:48:21 - 01:47:09:05
Robbie
You can also get our RSS feed right off of our site. If you go to YouTube, you can find the show there, where you can actually watch video of this episode and listen to Joey and I and watch us gesticulate with the hand talking that we always do. And if you do like the show, please give us a thumbs up and a like wherever you've seen it.
01:47:09:06 - 01:47:21:16
Robbie
And subscribe wherever you've seen it. And if you are able to do a review, even better. Those reviews really help us out, gain traction with the show. So for The Offset Podcast, I'm Robbie Carman.
01:47:21:18 - 01:47:23:07
Joey
And I'm Joey D’Anna -thanks for listening.
 
							    			
Robbie Carman
Robbie is the managing colorist and CEO of DC Color. A guitar aficionado who’s never met a piece of gear he didn’t like.
 
							    			
Joey D'Anna
Joey is lead colorist and CTO of DC Color. When he’s not in the color suite you’ll usually find him with a wrench in hand working on one of his classic cars or bikes
 
						    			
Stella Yrigoyen
Stella Yrigoyen is an Austin, TX-based video editor specializing in documentary filmmaking. With a B.S. in Radio-Television-Film from UT Austin and over 7 years of editing experience, Stella possesses an in-depth understanding of the post-production pipeline. In the past year, she worked on Austin PBS series like 'Taco Mafia' and 'Chasing the Tide,' served as a Production Assistant on 'Austin City Limits,' and contributed to various post-production roles on other creatively and technically demanding projects.