Mastodon and Amazon S3

30 Sep 2018 • on amazon english linux mastodon security

Back on 21 August I came across a conversation between Gargron, the founder and main developer of Mastodon, and a user about the security of images attached to direct messages.

As the name suggests, direct messages are only visible to the people mentioned in them, which makes them somewhat private.¹

The user pointed out that even pictures attached to direct messages can be accessed, because they are publicly available on the internet to anyone who knows the link.

Gargron countered that the links themselves are not predictable, so this shouldn’t be a problem, and I agree. A typical link looks like this:

https://[some hosting]/media_attachments/files/000/174/738/small/54dfe16d034b38b6.jpeg

That’s a long path and hard to guess. Still, I thought I should mention that misconfigured Amazon S3 buckets allow anyone to list their contents and thereby expose these paths.
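To illustrate what such a listing exposes: when listing is public, an anonymous GET on the bucket root returns an XML `ListBucketResult` in which every `<Key>` is the full path of an object. The helper below is a hypothetical sketch that pulls those keys out with grep and sed instead of a real XML parser, so it assumes keys contain no XML-escaped characters such as `&amp;`.

```bash
# Hypothetical helper: print the object keys from an S3 ListBucketResult
# XML document read from stdin, one key per line.
# grep/sed instead of an XML parser: fine for a quick look, but keys
# containing escaped characters (e.g. &amp;) are printed still escaped.
list_keys() {
  grep -o '<Key>[^<]*</Key>' | sed -e 's/^<Key>//' -e 's/<\/Key>$//'
}
```

Usage, with a made-up bucket name: `curl -s "https://s3-eu-west-1.amazonaws.com/example-bucket/" | list_keys`. A single response lists at most 1000 keys, so larger buckets are paginated, but even the first page hands out valid media paths.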

Having opened this can of worms, I was curious whether I was right. Mastodon is federated, so it is administered by many different people who know (and care) more or less about security.

Getting a list of instances

To figure out who might be affected, I first had to find out which instances use Amazon S3.

So I fetched a list of all Mastodon instances known to instances.social through their API.

From this data set I extracted the instances’ hostnames to continue my research:

jq -r '.instances[].name' results.json | grep -oP '[a-zA-Z0-9-]+\.[a-zA-Z0-9\.-]+' | sort -u

This returns a plain-text list of all Mastodon instances they know about.

Next, I needed to identify the instances that use AWS S3 as their storage backend.

Investigating instances for S3

If you know the basics of how Mastodon works behind the scenes, you know that it caches all content “locally”, where “locally” doesn’t mean “on the same host” but “on the instance you are using”. This is good practice: it keeps Content Security Policies (CSPs) tight, helps prevent XSS attacks, and avoids leaking referrer information and IP addresses (and tons of other things used to track people) to other instance owners.

If an instance uses Amazon S3 as object storage for media, this cached media is uploaded to the instance’s S3 bucket as well.

With this knowledge, and since the Mastodon API publicly exposes an instance’s public timeline, I just needed to write a crawler that searches this timeline for Amazon S3 URLs, or more precisely, the avatar URLs of the users appearing in it.

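#!/bin/bash
# Usage: mastodon-scan.sh <instance-hostname>
host="$1"

# Collect the path-style S3 bucket URLs of the avatars on the instance's public timeline
mkdir -p /tmp/aws-list/
curl -s "https://${host}/api/v1/timelines/public" | jq -r '.[].account.avatar' \
  | grep -oP 'https://s3(\.|-|\.dualstack\.)[^.]+\.amazonaws\.com/[^/]+/' | sort -u >> "/tmp/aws-list/${host}-aws.txt"

# Request each bucket root and record its HTTP status code (200 = public listing)
mkdir -p /tmp/aws-list-vul/
sort -u "/tmp/aws-list/${host}-aws.txt" | xargs -r -L1 curl -s -o /dev/null -w "%{http_code}\n" > "/tmp/aws-list-vul/${host}.txt"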

This yields a list of candidate S3 bucket URLs per instance, plus the HTTP status code each bucket root returns. Now we just need to run the script for every hostname in the list to find out which instances are vulnerable.
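The bucket-matching part of the scanner can be tried on its own. The sketch below is a hypothetical `bucket_root` function using the same pattern (hosts of the form `s3.<region>`, `s3-<region>` and `s3.dualstack.<region>`), rewritten as POSIX ERE so it does not need `grep -P`. Like the script, it only recognizes path-style URLs; virtual-hosted URLs such as `https://<bucket>.s3.amazonaws.com/...` are not matched.

```bash
# Hypothetical helper: print the bucket root of every path-style S3 URL
# read from stdin (one URL per line). Non-S3 URLs produce no output.
# Same host variants as the scanner's grep -P pattern, written as ERE.
bucket_root() {
  grep -oE 'https://s3(\.dualstack\.|\.|-)[^./]+\.amazonaws\.com/[^/]+/'
}
```

For example, piping `https://s3-us-west-2.amazonaws.com/other-bucket/accounts/avatars/x.png` through `bucket_root` prints `https://s3-us-west-2.amazonaws.com/other-bucket/`, while an avatar served from the instance’s own domain prints nothing.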

To run it, I used GNU parallel to speed things up, since the work is not CPU-intensive but mostly spent waiting on I/O:

jq -r '.instances[].name' results.json | grep -oP '[a-zA-Z0-9-]+\.[a-zA-Z0-9\.-]+' | sort -u | parallel --jobs 150 '/path/to/mastodon-scan.sh {}'

This starts 150 workers that run the script, writing the S3 links to /tmp/aws-list/ and the HTTP status codes to /tmp/aws-list-vul/.
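If GNU parallel is not installed, `xargs -P` can run the same per-host script concurrently. This swaps in a different tool than the one used above; `scan_hosts` is a hypothetical wrapper, and the `-r` flag assumes GNU findutils.

```bash
# Hypothetical wrapper: read hostnames from stdin (one per line) and run the
# given command once per host, with at most <jobs> processes at a time.
# Usage: scan_hosts <jobs> <command> [args...]
scan_hosts() {
  local jobs="$1"
  shift
  # -r: don't run the command at all on empty input (GNU xargs)
  # -n 1: pass exactly one hostname per invocation
  xargs -r -P "$jobs" -n 1 "$@"
}
```

Usage: `jq -r '.instances[].name' results.json | sort -u | scan_hosts 150 /path/to/mastodon-scan.sh`. Unlike parallel, xargs does not group each job’s output, but since the script writes into one file per host, that does not matter here.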

Informing the vulnerable instances

Here things got a bit messy: the first version of my script didn’t cover all AWS S3 domain variants and scanned all images, which caused various false positives.

With the version of the script shown above, I finally got the results organized and started informing people.

Some interesting facts: of the ~5000 instances I scanned, only 131 used S3, which I really welcome. About 30 of them were misconfigured and therefore leaking their users’ pictures.
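“Misconfigured” here boils down to the status code recorded by the scanner: an anonymous GET on the bucket root returns the listing (200) only when listing is public; otherwise S3 answers with an error. A hypothetical helper that spells out the usual cases when reviewing results (301 appears when a path-style request hits the wrong regional endpoint; curl reports 000 when it got no response at all):

```bash
# Hypothetical helper: explain the HTTP status code of an anonymous GET
# on an S3 bucket root in terms of the bucket's list permission.
listing_status() {
  case "$1" in
    200) echo "listing public" ;;        # ListBucketResult returned: leak
    403) echo "listing denied" ;;        # AccessDenied: anonymous listing is off
    404) echo "no such bucket" ;;        # NoSuchBucket
    301) echo "wrong region endpoint" ;; # PermanentRedirect to the bucket's region
    *)   echo "unknown ($1)" ;;          # anything else, incl. 000 (no response)
  esac
}
```

Usage, with a made-up bucket: `listing_status "$(curl -s -o /dev/null -w '%{http_code}' "https://s3-eu-west-1.amazonaws.com/example-bucket/")"`.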

Since there were only around 30 vulnerable instances, I decided to inform everyone by hand instead of writing yet another script (and since I had false positives in the beginning, I also wanted to make sure there were none left).

So in the /tmp/aws-list-vul directory I ran the following command to open the main page of every vulnerable instance:

grep -roPe '200$' . | sed -e 's/^\.\//https:\/\//' -e 's/\.txt:200$/\//' | sort -u | xargs -L1 xdg-open
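Printing the URLs before opening them makes it easier to double-check the list, which mattered given the early false positives. A sketch of the same logic as a hypothetical function; it relies on the scanner naming its result files `<host>.txt`:

```bash
# Hypothetical helper: print https://<host>/ for every result file in DIR
# (default /tmp/aws-list-vul) that has a line ending in status 200.
vulnerable_hosts() {
  local dir="${1:-/tmp/aws-list-vul}"
  local file
  # -l lists each matching file once, even if several buckets answered 200
  grep -lE '200$' "$dir"/*.txt 2>/dev/null | while read -r file; do
    # The file name (minus .txt) is the instance's hostname
    echo "https://$(basename "$file" .txt)/"
  done
}
```

Once the list looks right, `vulnerable_hosts | xargs -L1 xdg-open` opens the pages as before.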

Then I verified that the leak actually existed and wrote a direct message to the instance’s admin, which looked like this:

@admin@example.com I want to inform you that your instance is leaking all of its users’ pictures through a publicly listable S3 bucket.

You should consider disabling the “list objects” permission for everyone in the S3 console.

See: [link to leaking S3 bucket]

Fixing the S3 buckets

As mentioned in the message to the admins, the S3 bucket configuration has a simple option that allows everyone to “list objects”, which leaks the paths of all images uploaded to the bucket.

So basically:

Log in to the AWS console
Go to the S3 console
Select your bucket
Switch to the “Permissions” tab
Under “Public access”, find the entry “Everyone”
Deselect the “List objects” box
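For admins who prefer the command line, a roughly equivalent change can be sketched with the AWS CLI. This is an assumption-laden sketch, not the console procedure above: it assumes configured CLI credentials and that public listing was granted through the bucket ACL (the “Everyone” entry in the console) rather than a bucket policy; the helper names are hypothetical. Resetting the bucket ACL to the `private` canned ACL removes the public READ grant, which is what allows listing; object ACLs are separate, so media uploaded as public-read stays readable.

```bash
# Hypothetical helpers around the AWS CLI (assumes configured credentials).

# Show the bucket ACL: a grant with "Permission": "READ" to the group
# http://acs.amazonaws.com/groups/global/AllUsers means anyone can list.
show_bucket_acl() {
  aws s3api get-bucket-acl --bucket "$1"
}

# Replace the bucket ACL with the "private" canned ACL (owner-only access to
# the bucket itself). Note: this also drops any other bucket-level grants.
disable_public_listing() {
  aws s3api put-bucket-acl --bucket "$1" --acl private
}
```

Afterwards, an anonymous request to the bucket root (for example `curl -s -o /dev/null -w '%{http_code}\n' "https://s3-eu-west-1.amazonaws.com/example-bucket/"`, with your own region and bucket) should return 403 instead of 200.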

Lessons learned

I learned a few things while doing all this. First, you should double-check your results even when they look correct, especially when it comes to privacy or security issues.

Second, informing a lot of people is tedious work. While doing it, I reached out to Gargron, and he pointed me to the /api/v1/instance API endpoint, which shows the admin and an email address to send the information to.

As it turns out, the results are of mixed quality. Not all instances provide an email address, and even when they do, I’m not confident these addresses are actually read. Later on I also came across some rather unmaintained admin accounts, which doesn’t make me very happy either.
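The endpoint mentioned above returns JSON describing the instance, including an `email` field and, on versions that provide it, a `contact_account`. A minimal sketch, assuming those two fields; either may be empty or missing, matching the mixed results described above. `instance_contact` is a hypothetical helper and needs jq, like the scanner.

```bash
# Hypothetical helper: read the JSON from /api/v1/instance on stdin and
# print the contact email and the admin account's handle, tab-separated.
# Missing fields become empty strings instead of "null".
instance_contact() {
  jq -r '[(.email // ""), (.contact_account.acct // "")] | @tsv'
}
```

Usage, with a made-up instance: `curl -s "https://example.social/api/v1/instance" | instance_contact`.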

And finally, something positive: the people who got back to me were very happy about the heads-up, and even those who turned out to be false positives appreciated it (I of course apologized for the false alarm).

Conclusion

Since the Fediverse is run by a few thousand people, from professional admins to hobbyists to people who just wanted to try it out, informing everyone when you find something is pretty tedious. The 30 leaking instances were manageable, but I don’t want to think about what happens when some other problem affects a few hundred instances (which I expect will happen).

But all in all, I think it’s a pretty safe place to hang out. Of more than 5000 instances, I could reach around 3000; only ~130 of those use S3, and of those only ~30 were leaking. So basically around 1% or less.
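Worked through with the rounded figures from this post (so only approximate), the share of leaking instances depends on the base:

```latex
\frac{30}{131} \approx 23\,\% \ \text{(of S3 users)}, \qquad
\frac{30}{3000} = 1\,\% \ \text{(of reached instances)}, \qquad
\frac{30}{5000} = 0.6\,\% \ \text{(of listed instances)}
```

So the overall rate is small, but almost a quarter of the instances that used S3 were leaking.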

And regarding S3 itself: keep an eye on exactly what you publish in your buckets. It may include sensitive content!

¹ Just as a hint: direct messages are not encrypted, so server admins can read them as well if they really want to.