Changes to Remote Media Retention on MastodonApp.UK

We're making some major changes to how long we cache remote media files on MastodonApp.UK. Read this article to find out more.


On January 1st 2023 I posted a toot on the MastodonApp.UK site asking folks to vote in a poll to help guide our policy on Remote Media Retention on the site.

Toot:

Ryan Wild (@[email protected])
We are currently considering adding a retention policy to the storage of media (Video’s / Images) that are stored from other instances. Originally I thought this was already set and enforced at 7 days, but it appears not to be. Would folks be happy with a 30 day retention policy for media that we…

What Is Remote Media and why does retention matter?

Before I explain the changes we're making, I want to explain what remote media is, how it works, how we store it and why the retention matters for a server of our scale.

Remote media in this context refers to any images, videos or similar attachments that are posted on toots that originate from a server that is not ours. When the toot is sent to us, we pull a copy of any images / videos that are attached to the toot and serve it locally on our site for our users. This reduces latency for our users and ensures we're not flooding the remote site with our traffic, and is something that works very well.

It's important to note that if the author of the original toot decides to delete or modify it, that change will also get propagated through to us. The purpose of caching the media is not resiliency, remote backups or anything like that.

What are we changing?

From February 4th 2023 we will start to enforce a 30-day retention policy on all remote media. This means that 30 days after the media was last pulled from the remote server, we will purge it from our disks.

We believe that, at the moment, the majority of remote content is only ever viewed on the day it is posted, so keeping it in a cache for longer takes up valuable space, which we pay for by the GB. If someone needs to load a toot older than 30 days with media on it, there will be a delay while our server reaches out and pulls the media back down. It will then be stored for another 30 days.
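For readers who like to see the idea in code, here is a minimal sketch of the policy described above. It is not how Mastodon or our server actually implements it; the `RemoteMediaCache` class, the `fetch_remote` callable and the in-memory dictionary are all assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # the 30-day policy described in this post


class RemoteMediaCache:
    """Toy in-memory cache. `fetch_remote` is a hypothetical callable
    that downloads a file from the originating server."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote
        self._files = {}  # url -> (content, time it was last pulled)

    def get(self, url, now):
        entry = self._files.get(url)
        if entry is None:
            # Cache miss: re-pull from the remote server. This is the
            # delay a user sees on a toot whose media was purged.
            content = self._fetch_remote(url)
            self._files[url] = (content, now)
            return content
        return entry[0]

    def purge_expired(self, now):
        """Delete files last pulled 30 or more days ago; return their URLs."""
        expired = [
            url
            for url, (_, pulled_at) in self._files.items()
            if now - pulled_at >= RETENTION
        ]
        for url in expired:
            del self._files[url]
        return expired
```

In this sketch the 30-day clock starts when a file is pulled, so a file that gets purged and later requested again is simply fetched once more and kept for a fresh 30 days.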

Why Now?

At the time of writing this post, we're using nearly 3TB of media storage space. While not all of this is remote media, we believe remote media accounts for a non-trivial share of it, and files that are never accessed again take up space we could otherwise use.

We're also going to be changing how and where we store our multimedia in the near future. At the moment we use Amazon Web Services Simple Storage Service (AWS S3) to store our multimedia content, and serve it over AWS CloudFront. Right now our AWS CloudFront bill is substantial and is unfortunately not something we can sustain long term. It is also quite a bit higher than we had originally expected, even with some aggressive caching by the other content delivery networks that you as a user interact with directly.

We will be moving our multimedia storage to a new cloud provider, operating behind a new media proxy server. In the short term, the proxy will check whether a file is present in our new storage service and, if it isn't, will try to serve it from the soon-to-be-legacy AWS S3 storage pool. Once we start the switch, we will need to start moving content out of AWS S3 and into the new storage. The less we have to move, the cheaper and quicker the operation will be.
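The fallback behaviour of the proxy can be sketched in a few lines. This is only an illustration under assumptions: the `MediaProxy` class is hypothetical, and plain dictionaries stand in for the new storage service and the legacy AWS S3 pool.

```python
from typing import Dict, Optional


class MediaProxy:
    """Serve from the new store first, then fall back to the legacy one.
    Both stores are plain dicts here, standing in for real object storage."""

    def __init__(self, new_store: Dict[str, bytes], legacy_store: Dict[str, bytes]):
        self.new_store = new_store
        self.legacy_store = legacy_store

    def fetch(self, key: str) -> Optional[bytes]:
        # Prefer the new storage service when the file is already there.
        if key in self.new_store:
            return self.new_store[key]
        # Otherwise try the legacy pool; None means the file is in neither.
        return self.legacy_store.get(key)
```

A lookup order like this means files can be moved across gradually, since anything not yet migrated is still reachable through the legacy pool.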

Our current plan is to move only local media files off AWS S3, and to manually age off the existing remote media files. We won't start this work until we've changed our retention of remote media, so that a number of the older media files have been purged before we try to move content around.
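As a rough sketch of that plan, the split between "move" and "age off" might look like the following. The `MediaObject` type, its `is_remote` flag and the `plan_migration` function are hypothetical names used for illustration, not part of Mastodon or AWS.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MediaObject:
    key: str
    is_remote: bool  # True if the file was cached from another server


def plan_migration(objects: Iterable[MediaObject]) -> Tuple[List[str], List[str]]:
    """Split stored objects into keys to move and keys to age off."""
    to_move: List[str] = []
    to_age_off: List[str] = []
    for obj in objects:
        if obj.is_remote:
            # Remote media can always be re-pulled, so it is not migrated.
            to_age_off.append(obj.key)
        else:
            # Local media only exists with us, so it must be moved.
            to_move.append(obj.key)
    return to_move, to_age_off
```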

How does this change impact me?

The short answer: It shouldn't.

We don't expect this change to have any noticeable impact on the vast majority (99%+) of our community. It will only be noticeable to individuals who regularly look through the accounts of people on remote servers that store media, as there may be a short lag while we re-pull the media.

How do I find out more about future changes?

We're aiming to publish major changes such as this here on the ATLAS Blog, and we would encourage you to sign up for e-mail notifications from our blog so the latest updates can get delivered straight to your mailbox.

Alternatively, for those already on the Mastodon network, I toot a lot of these posts along with the other smaller changes we make. You can follow me @[email protected]