Post mortem: oauth2-proxy security incident

The use of the trusted-ip flag for oauth2-proxy resulted in authentication being skipped for various services in the Shivering-Isles infrastructure.

Impact

Multiple endpoints that were considered protected by OIDC authentication through the central SI-Auth SSO provider were exposed without authentication.

After a brief investigation regarding the exposed endpoints, it can be confidently said that no PII or otherwise sensitive information has been leaked. Further, no indicators of compromise were found.

While Prometheus, Alertmanager and Forecastle were exposed to the internet, this is unlikely to be a reason for concern, as their access is read-only in nature.

For the Longhorn web UI, which was exposed to the internet, the attack vectors were limited to adjusting the Longhorn configuration, deleting or creating backups, and deleting or creating volumes. As metrics indicate that none of these actions took place, it’s assumed that no attacker took advantage of these capabilities.

All other services affected by this incident, including all IoT endpoints, were not exposed to the internet or to untrusted network devices. Therefore, it’s assumed that no compromise has taken place.

Root Causes

Graphic showing that requests from external users are forwarded to ingress-nginx, which queries the `/oauth2/auth` endpoint. That endpoint returns 202 because it considers the IP of `ingress-nginx` to be trusted, and thereby signals to `ingress-nginx` that the external, unauthenticated user is allowed to access the protected resource.

The trusted-ip flag was introduced under the misconception that it would work similarly to the trusted IP/proxy options in other software, where such an option allows trusted headers like X-Forwarded-For to be interpreted. The expectation was that oauth2-proxy would then show the original request IP in its request logs.

The actual implementation, however, skips authentication entirely when the request comes from an IP address within the configured range. The configuration option was set to the pod CIDR of the Kubernetes cluster, so that ingress-nginx Pods would be identified as a trusted entity.
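
The effect can be illustrated with a simplified model. This is a sketch of the behaviour described above, not the oauth2-proxy source code. The CIDR `10.42.0.0/16`, the example addresses and the function names are hypothetical, and the 401 for unauthenticated requests is the response normally expected from `/oauth2/auth`.

```python
import ipaddress

# Hypothetical pod CIDR for illustration; the real range is not part of this post.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.42.0.0/16")]


def is_trusted(remote_addr, trusted=TRUSTED_NETWORKS):
    """Return True if the connecting address lies in any trusted network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in trusted)


def auth_endpoint_status(remote_addr, has_valid_session):
    """Simplified model of the decision made for /oauth2/auth."""
    if is_trusted(remote_addr):
        # Trusted source: authentication is bypassed, the session is never checked.
        return 202
    return 202 if has_valid_session else 401


# The auth subrequest always originates from an ingress-nginx Pod, so the
# address seen here is a pod IP regardless of who the external user is.
print(auth_endpoint_status("10.42.3.7", has_valid_session=False))
print(auth_endpoint_status("203.0.113.9", has_valid_session=False))
```

In this model the first call is accepted although no session exists, because only the address of the proxying Pod is evaluated.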

Trigger

The incident itself was triggered by commit b404d3ca, which rolled out the use of trusted-ip to the oauth2-proxy instances across the entire infrastructure.

Resolution

The mistake was fixed by removing the trusted-ip option from the deployment in commit a500e1ca.

Detection

The issue was detected while reviewing a configuration change in Alertmanager, when access was granted without an authentication screen appearing. A follow-up check in a “private Firefox window” showed that no authentication was required.

The initial investigation suspected a problem with ingress-nginx, but access logs from both ingress-nginx and oauth2-proxy confirmed that requests were routed correctly. However, oauth2-proxy always answered with HTTP status 202 on the /oauth2/auth endpoint, which is only expected for authenticated users.
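
Why a 202 is enough to grant access can be sketched as follows. The model assumes that the external authentication of ingress-nginx follows the documented semantics of nginx's `auth_request` module: a 2xx response from the subrequest allows the request, 401 or 403 denies it, and anything else is treated as an error. The function name is hypothetical.

```python
def ingress_decision(auth_status):
    """Model how an auth subrequest status maps to the ingress decision."""
    if 200 <= auth_status < 300:
        return "allow"  # includes the 202 returned by /oauth2/auth
    if auth_status in (401, 403):
        return "deny"
    return "error"      # any other status counts as a failed subrequest


for status in (202, 401, 500):
    print(status, ingress_decision(status))
```

A constant 202 from the auth endpoint therefore looks, from the ingress side, exactly like a stream of properly authenticated users.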

Action Items

Remove the use of trusted-ip from all oauth2-proxy instances
Put network-level restrictions in place to add an additional layer of security
Revisit all deployed oauth2-proxy instances and check configuration options
Add monitoring for expected authentication requests
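
The monitoring item could look roughly like the following sketch. It is not the check that was actually deployed. It assumes that an unauthenticated request to a protected endpoint should never be answered with a 2xx status (a redirect to the sign-in page or a 401/403 is expected instead), and all function names and URLs are hypothetical.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so a 302 to the sign-in page stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url, timeout=10):
    """Return the HTTP status of an unauthenticated GET request."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def is_exposed(status):
    """A 2xx answer without credentials means the endpoint is unprotected."""
    return 200 <= status < 300


def find_exposed(urls, fetch=fetch_status):
    """Return every URL that answers an unauthenticated request with 2xx."""
    return [url for url in urls if is_exposed(fetch(url))]
```

Run periodically from a network location outside the trusted ranges, a non-empty result from `find_exposed` would be the signal to alert on. The `fetch` parameter exists so the logic can be tested without network access.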

Lessons Learned

“Trusted IPs” can have very different meanings depending on software implementations
Regularly validate that endpoints actually require authentication

What went well

Most endpoints protected by oauth2-proxy were also restricted to local networks as a defense-in-depth measure. As a result, they continued to be inaccessible from the internet
All services that were exposed contained non-critical information
Even if an attacker had deleted volumes and backups, the second level of backups could have been recovered, and by extension all content of the volumes
Most software was already using its built-in SSO capabilities for OIDC and was therefore not vulnerable, even when additionally placed behind an oauth2-proxy (like Grafana or Minio)

What went wrong

The Longhorn web UI was exposed, which could have resulted in deleted volumes and deleted backups of those volumes
The skipped authentication went unnoticed for 96 days and was only discovered by accident

Where we got lucky

Noticing this issue was pure luck; it could have stayed unnoticed for weeks longer
The fact that no one decided to mess with the Longhorn web UI was also lucky, preventing actual damage to services

Timeline

Time (Europe/Berlin)	Action
2023-09-26 20:18:55	Introduction of the `trusted-ip` configuration option
2023-12-31 03:58:00	Noticing the unauthenticated endpoint for Alertmanager
2023-12-31 04:00:00	Restrict monitoring endpoints to local networks
2023-12-31 04:11:00	Validating configuration and searching for recent bug reports about external authentication with `ingress-nginx`
2023-12-31 04:14:00	Validating configuration and searching for recent bug reports about external authentication with `ingress-nginx` in combination with `oauth2-proxy`
2023-12-31 04:50:00	Validating issue with `trusted-ip` flag
2023-12-31 04:56:00	Fix disabling `trusted-ip` lands in GitOps Repository and is deployed to production
2023-12-31 05:09:00	Investigating some unrelated problems with `oauth2-proxy` integration that now show up due to actual authentication taking place
2023-12-31 05:30:00	Add monitoring for authenticated endpoints that validates the authentication requirement
2023-12-31 05:45:00	Investigating exposure (relevant endpoints and introduction of `trusted-ip` setting) and validation of `oauth2-proxy` logic
2023-12-31 06:31:00	Writing post-mortem for incident

Supporting information

Quote from the oauth2-proxy configuration page regarding the trusted-ip setting:

list of IPs or CIDR ranges to allow to bypass authentication (may be given multiple times). When combined with --reverse-proxy and optionally --real-client-ip-header this will evaluate the trust of the IP stored in an HTTP header by a reverse proxy rather than the layer-3/4 remote address. WARNING: trusting IPs has inherent security flaws, especially when obtaining the IP address from an HTTP header (reverse-proxy mode). Use this option only if you understand the risks and how to manage them.

Relevant sections in the oauth2-proxy code:

Further information regarding usage of the external-auth feature with ingress-nginx: