We’ve been telling a white lie
We cloud security professionals™️ talk a big game about how important cloud security is. We hype tools and frameworks and best practices. “That bucket is public, ZOMGGGGGG” we make the CNAPP platforms yell at users regularly. What we don’t tell you is that there’s a huge leap required to go from exposed AWS resource to hacked AWS resource: knowing its identifier.
This shouldn’t be a big surprise. Attackers have been port scanning the internet for decades looking for systems that might be vulnerable, noting their identifiers (IP addresses and ports) when they find them. But most cloud services can’t be targeted by an IP address and port. AWS talks APIs not IPs (with some exceptions). An attacker needs to provide an account ID, or an ARN, or a host name, or some random-looking string to an API to make it do bad things.
So, how do you know if your identifiers have leaked or are discoverable? I’m glad you asked!
Hacking AWS by example
That sounds nice in theory, but I sense that you sense that it’s not quite right. So, instead of just believing the premise, here are some recently published attack examples.
- AWS CDK Risk: Exploiting a Missing S3 Bucket Allowed Account Takeover
Ofek Itach and Yakir Kadkoda discovered it’s possible to take over the CDK deployment process of some AWS accounts by creating a bucket with a predictable name of the format cdk-{qualifier}-assets-{account_id}-{region}, stuffing it with their own CloudFormation code, and waiting for a deploy to run. They figured out that the qualifier looks random but is almost never changed from the default hnb659fds. Since there's a small finite list of region names, the whole attack hinges on finding valid account IDs.
- Your Queues, Your Responsibility
Sid Rajalakshmi figured out a way to identify SQS queues that he could read from and write to. In order to do that, he needed to send a request to valid queue URLs of the format https://sqs.{region}.amazonaws.com/{account_id}/{queue_name}. In this case, Sid compiled a list of valid account IDs but brute-forced queue names with a wordlist, ultimately identifying 209 exposed queues after testing billions of mostly invalid URLs.
- Abusing Misconfigured ECR Resource Policies
Nick Frichette observed that Elastic Container Registry (ECR) private repositories use resource-based policies which could be misconfigured to allow anyone with an AWS account to pull or push images. Of course, to execute the necessary commands, an ECR URL is required, of the format {account_id}.dkr.ecr.{region}.amazonaws.com/{repository_name}.
- DNS and CloudFront Domain Takeover via Deleted S3 Buckets
Once upon a time, CloudFront distributions that pointed to a no-longer-existing S3 bucket would return a message indicating exactly which bucket no longer existed. An attacker could register the bucket, and the CloudFront distribution would start serving their malicious content. CloudFront distribution names are randomly generated and long (e.g. d21y75miwcfqoq.cloudfront.net), so the attack required a way to find valid host names because brute force was infeasible.
These are not cherry picked attacks. This pattern repeats over and over. To hack something in AWS, you need to identify ‘the something’.
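To make the pattern concrete, here is a minimal Python sketch of how the identifier formats quoted in the examples above expand from an account ID. The formats and the default CDK qualifier come from the write-ups listed; the function names, the sample account ID, and the wordlist are made up for illustration. It only builds strings and never talks to AWS.

```python
from itertools import product
from typing import Iterable, Iterator

# Default CDK bootstrap qualifier quoted in the CDK example above.
DEFAULT_CDK_QUALIFIER = "hnb659fds"


def cdk_asset_bucket(account_id: str, region: str,
                     qualifier: str = DEFAULT_CDK_QUALIFIER) -> str:
    """Predictable CDK staging bucket name."""
    return f"cdk-{qualifier}-assets-{account_id}-{region}"


def sqs_queue_url(account_id: str, region: str, queue_name: str) -> str:
    """SQS queue URL in the format quoted above."""
    return f"https://sqs.{region}.amazonaws.com/{account_id}/{queue_name}"


def ecr_repository_uri(account_id: str, region: str, repository_name: str) -> str:
    """ECR repository URI in the format quoted above."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository_name}"


def candidate_identifiers(account_ids: Iterable[str], regions: Iterable[str],
                          names: Iterable[str]) -> Iterator[str]:
    """Yield every candidate identifier for the given accounts and regions.

    `names` is a hypothetical wordlist of queue/repository names.
    """
    names = list(names)
    for account_id, region in product(account_ids, regions):
        yield cdk_asset_bucket(account_id, region)
        for name in names:
            yield sqs_queue_url(account_id, region, name)
            yield ecr_repository_uri(account_id, region, name)


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not a real target.
    for candidate in candidate_identifiers(["123456789012"], ["us-east-1"], ["prod"]):
        print(candidate)
```

Everything except the account ID is either fixed, a short list, or guessable from a wordlist, which is why the account ID carries so much of the weight in these attacks.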
Introducing Awseye
If an attacker can’t just port scan your AWS account, how are they going to hack it? Well, good reader, it turns out there are a variety of ways that AWS identifiers can be guessed, validated, discovered, leaked, and predicted. Here is but a sample to whet the nerd appetite:
- Tapping the Leaking AWS Account ID Faucet
- Breaching AWS Accounts Through Shadow Resources
- AWS OIDC Provider Enumeration
- Publicly Exposed AWS Document DB Snapshots
We’ve taken many of these, turned them into magical internet code that runs around the clock, and made the data searchable with Awseye.
Awseye (pronounced o-zee 🦘🇦🇺) is an open-source intelligence (OSINT) and reconnaissance service that analyzes publicly accessible data for AWS identifiers. It helps identify known and exposed AWS resources that might need your attention. It levels the playing field between attackers and defenders, by giving defenders access to the same data attackers have been harvesting since flip phones stopped being cool.
To try it out, navigate to awseye.com, type in your account ID (or any account ID really) and hit "check". If the account ID is not previously known, Awseye will check if it is active. If it is, it will do some enumeration to find predictable resource names in that account. You can even use the “Surprise me” button to see what’s discoverable about well-known account IDs owned by various vendors.
That will show all (up to 100) of the resources Awseye has discovered for that account, including whether it has been able to determine that they are active and/or publicly exposed.
If you want to be notified when Awseye finds new resources in your account, you can do that with “get notified”. Once you enter an account ID and email, we’ll send you instructions, including Terraform and CloudFormation templates, for creating a temporary deny-all role in your account. Awseye will use the same trick it uses to find common principals in accounts to check for the existence of the role. Once that’s done, it will send you a confirmation email, including a dump of all (even over 100) resources identified in that account. Then it’s just a waiting game.
Important identifiers
At the core of AWS is the concept of Amazon Resource Names (ARNs), which uniquely identify AWS resources. You might assume that’s the only way to identify things in AWS, but don’t do that. ARNs are often critical in many attacks but not all.
- Account IDs - Every AWS account is addressed by a 12-digit account ID (although that space will eventually get exhausted). All resources must belong to a single account. ARNs almost always have an account ID component.
- IAM keys - IAM keys on their own aren’t enough for an attack; usually a secret (password-like) component is required. However, it’s possible to extract the account ID from most IAM keys, as explained here and here.
- Amazon hosts - There are a surprising number of public (and public adjacent) resources that require a hostname to interact with them: SQS queues, CloudFront distributions, Amplify apps, and EFS file systems, to name a few.
- Resource IDs - Sometimes resources don’t have an ARN, or in addition to having an ARN, have a randomly generated identifier that is used to address them. I believe they are generally unique across all accounts, not just the one they belong to, but I can’t see inside the machine to tell for sure. For example, vpc-03914afb3ed6c7632.
- Resource names - The last component of an ARN often includes its resource name, and most resource names are user defined. They are typically unique within an account+region for a given service. Many APIs make an implicit assumption about what the relevant account ID is and only require a resource name.
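As an illustration of the IAM keys point in the list above, here is a small Python sketch of the publicly documented community technique for decoding an account ID from an access key ID. Treat it as an assumption-laden sketch: the encoding is not an official AWS specification, it is generally reported to apply only to newer-format keys, and the function name is ours.

```python
import base64


def account_id_from_access_key_id(access_key_id: str) -> str:
    """Decode the 12-digit account ID embedded in a newer-format access key ID.

    Assumes the community-documented layout: a 4-character prefix
    (such as AKIA or ASIA) followed by 16 base32 characters, with the
    account ID stored in bits 7..46 of the first 6 decoded bytes.
    """
    body = access_key_id[4:]                      # drop the key-type prefix
    decoded = base64.b32decode(body)              # 16 base32 chars -> 10 bytes
    value = int.from_bytes(decoded[:6], byteorder="big")
    account = (value & 0x7FFFFFFFFF80) >> 7       # mask off the top bit and low 7 bits
    return f"{account:012d}"                      # account IDs keep leading zeros


if __name__ == "__main__":
    # Example key ID string only; no secret key is involved.
    print(account_id_from_access_key_id("ASIAQNZGKIQY56JQ7WML"))  # 029608264753
```

No network access or secret is needed, which is why a leaked key ID on its own is already a useful identifier leak.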
Classifying identifiers isn’t the easiest task; consider how many types there are defined in CloudFormation. We’ve taken the approach of starting with this list of exposable resources. It’s not up to date or complete, but we’ll expand it over time.
At launch, Awseye captures hosts ending with amazonaws.com, on.aws, cloudfront.net, awsapps.com, elasticbeanstalk.com, awsapprunner.com, and amplifyapp.com. Let us know if we’re missing something.
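For the curious, suffix matching like this can be sketched in a few lines of Python. The suffix list is the one above; the function name and the normalisation rules (lowercasing, stripping a trailing dot, matching only on label boundaries) are our assumptions, not a description of Awseye's actual code.

```python
from typing import Optional

# Host suffixes listed in the article.
AWS_HOST_SUFFIXES = (
    "amazonaws.com",
    "on.aws",
    "cloudfront.net",
    "awsapps.com",
    "elasticbeanstalk.com",
    "awsapprunner.com",
    "amplifyapp.com",
)


def matching_suffix(hostname: str) -> Optional[str]:
    """Return the AWS suffix a hostname ends with, or None if there is none."""
    host = hostname.strip().rstrip(".").lower()
    for suffix in AWS_HOST_SUFFIXES:
        # Match on a label boundary so "notamazonaws.com" is not a hit.
        if host == suffix or host.endswith("." + suffix):
            return suffix
    return None


if __name__ == "__main__":
    print(matching_suffix("d21y75miwcfqoq.cloudfront.net"))  # cloudfront.net
    print(matching_suffix("notamazonaws.com"))               # None
```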
How does Awseye work?
It’s hot in here. You’re either a big nerd like me, or you are hitting on me. Let’s just keep it a mystery.
Awseye uses a few types of magic that are not always super distinguishable from each other.
- Mention scraping - There are many places on the internet where code and code-like blobs get saved. There are code repositories like GitHub, library registries like NuGet, plugin registries like the Eclipse Marketplace, paste sites like Pastebin, and so on. Awseye polls these for new submissions, downloads them, and searches for identifiers in the downloaded blobs.
- Direct listing - AWS hosts official and unofficial APIs that directly list public resources. For example, EC2 has a DescribeSnapshots action that can list publicly shared EBS snapshots. Awseye makes these calls regularly and stores the resources it finds.
- Guess and validate - For some resource types, it’s possible to guess, predict, or otherwise know their name, e.g. service-linked role names are documented. A subset of those resource types also has some kind of ‘trick’ to test for their existence, e.g. IAM principals can be validated using the policy validation engine. Awseye uses these conditions to run a reconnaissance on each new account it sees with wordlists we’ve curated.
- Other - sometimes an account ID is not needed to validate the existence of a resource. S3 buckets have global names and can be validated via a simple HTTP HEAD request. Awseye makes all sorts of guesses and inferences about bucket names. Sometimes it even uses account IDs inside bucket names because AWS and popular open source software automatically create buckets using specific naming patterns.
- Other other - Look, things get weird sometimes. We stumble into stuff, I don’t know.
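The mention scraping step in the list above boils down to running loose patterns over downloaded blobs. Here is a minimal Python sketch for ARNs only. The regex, the function name, and the shape of the returned records are illustrative assumptions rather than Awseye's real implementation, and a pattern this loose will happily match dummy identifiers too.

```python
import re
from typing import Dict, List, Optional

# Loose ARN pattern: arn:partition:service:region:account-id:resource
# Region and account ID may be empty (e.g. S3 bucket ARNs).
ARN_PATTERN = re.compile(
    r"arn:(aws[a-z-]*):([a-z0-9-]+):([a-z0-9-]*):(\d{12})?:([A-Za-z0-9_/:.+=,@*-]+)"
)


def extract_arn_mentions(blob: str, source: str) -> List[Dict[str, Optional[str]]]:
    """Return one record per ARN-looking string found in a text blob.

    `source` is wherever the blob came from (a hypothetical URL or path).
    """
    mentions = []
    for match in ARN_PATTERN.finditer(blob):
        partition, service, region, account_id, resource = match.groups()
        mentions.append({
            "arn": match.group(0),
            "partition": partition,
            "service": service,
            "region": region or None,
            "account_id": account_id,   # None when the ARN has no account ID
            "resource": resource,
            "source": source,
        })
    return mentions


if __name__ == "__main__":
    sample = 'role_arn = "arn:aws:iam::123456789012:role/deploy"'
    for mention in extract_arn_mentions(sample, "https://example.com/paste/1"):
        print(mention["account_id"], mention["resource"])
```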
That’s how Awseye finds the things. Once found, it stores them in 3 separate datasets:
- Accounts - This represents an individual account ID and the metadata known about it. Is it active, suspended, or invalid? Who we’ve guessed owns it? etc.
- Resources - These are things we believe are or were actually deployed resources inside AWS accounts. Most of them include a link to an account ID. Each unique resource should be represented only once. Many are parsed and validated from mentions.
- Mentions - This dataset is more raw. It’s the link between the identifier that was mentioned and where it was mentioned. Uniqueness and correctness are less of a concern here. For example, there are plenty of dummy identifiers from various AWS libraries that just happened to match a loose regex.
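One way to picture how the three datasets above relate is the following Python sketch. The class names, fields, and the `validated` flag are hypothetical and only mirror the description above: mentions are kept raw with duplicates, resources are stored once each, and accounts hang off validated resources.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Account:
    account_id: str
    status: str = "unknown"              # e.g. active, suspended, invalid
    guessed_owner: Optional[str] = None


@dataclass
class Resource:
    identifier: str
    account_id: Optional[str] = None     # most, but not all, link to an account


@dataclass
class Mention:
    identifier: str
    source: str                          # where the identifier was seen


class Datasets:
    """In-memory stand-in for the three datasets (illustrative only)."""

    def __init__(self) -> None:
        self.accounts: Dict[str, Account] = {}
        self.resources: Dict[str, Resource] = {}
        self.mentions: List[Mention] = []

    def record(self, identifier: str, source: str,
               account_id: Optional[str] = None, validated: bool = False) -> None:
        # Mentions are raw: every sighting is kept, duplicates included.
        self.mentions.append(Mention(identifier, source))
        if not validated:
            return
        # Each unique resource is represented only once.
        self.resources.setdefault(identifier, Resource(identifier, account_id))
        if account_id is not None:
            self.accounts.setdefault(account_id, Account(account_id))
```

Keeping mentions separate from resources means a noisy regex hit costs one extra row rather than a bogus resource.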
Who is Awseye for
Awseye is for everyone. It’s open and free to use. But we had these use cases in mind when building it:
- Cloud engineers trying to understand and secure their own AWS accounts
- Security engineers connecting more contextual data to their SIEM or other security tooling
- Bounty hunters looking for vulnerabilities in specific AWS accounts
- Security researchers looking for patterns and vulnerabilities that apply to a large number of accounts
- Security product owners improving the context in platforms they sell
Not on the list but have a cool use case? Let us know at weird@awseye.com.
Why would we give attackers access to this?
This is an important and valid question to ask, but ultimately the premise of the question is flawed. The question implies that attackers don’t already have access to this information. They do. It’s the defenders who don’t.
The graphic below from the Orca 2023 Honeypotting in the Cloud Report illustrates this point. Attackers monitor public repositories like GitHub, Pastebin, GitLab, etc. for exposed AWS access keys and use them within minutes of publication. Rami McCarthy has compiled a list of similar research.
The average engineer or average business can’t justify spending the resources building the tooling to maintain a dataset like that of Awseye. It takes months of engineering to create, and then it must be maintained and improved. However, the data has limited utility to any individual organisation. It’s a nice to have. Whereas an attacker who builds the dataset can abuse it over and over for significant gain, thus making it a worthy investment.
How we’re integrating it into the Plerion platform
Our position at Plerion is that we identify the 1% of cloud risks that matter so you can spend your time and money where it’s needed most. To do that, we’ve built a risk engine that takes into account all the contributing factors when assessing a risk. You can start a free trial any time to test it out.
For example, it’s not enough to say that an S3 bucket is publicly exposed because it might be exposed on purpose to host a public website. There’s a difference between reading and writing to a bucket. Sometimes encryption keys act as access control even if the bucket is technically public.
One of the contributing factors in our risk engine is whether the resource identifier is known or guessable. The Awseye dataset will be used to establish that datapoint in the risk engine. It’s not the most important factor in a risk, but when an organisation has thousands of things it could be fixing or doing, choosing amongst them becomes critical.
Tell us what you think
This is just the beginning. We want to make Awseye Awsome. Have ideas about what resources we should be looking for? Where we should be looking for them? Search options we should have? Send your thoughts to thoughts@awseye.com.