Your cloud bill has two parts: the costs you know about, and the costs that are quietly bleeding money while you focus on the big line items.
Most teams look at their bill and see EC2, RDS, S3 — the services they intentionally provisioned. They optimize those. They right-size instances, buy reservations, delete unused volumes. Good.
But underneath those obvious costs sits a layer of charges that most teams never examine closely: data transfer fees, NAT Gateway processing charges, idle load balancers, CloudWatch log ingestion, and over-provisioned Kubernetes clusters running at a fraction of their capacity.
These hidden costs typically account for 15–30% of a cloud bill. And the cloud providers have no incentive to make them obvious.
Data Egress: The Tax on Moving Your Own Data
Data transfer pricing is where cloud providers make a significant chunk of margin, and it’s designed to be confusing. There are at least six different data transfer rate categories in AWS alone, and they change depending on source, destination, and service.
Here are the ones that catch teams off guard:
Cross-AZ Transfer: $0.01/GB in Both Directions
Every time your service in us-east-1a talks to a database in us-east-1b, you pay $0.01/GB each way. That’s $0.02/GB round trip.
This sounds trivial until you do the math. A microservices architecture with 10 services making frequent cross-AZ calls can easily push 50–100 TB/month of inter-AZ traffic. At $0.02/GB, that’s $1,000–$2,000/month just for your services to talk to each other within the same region.
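The arithmetic above is worth having as a reusable model. A minimal sketch, using the rates quoted in this section ($0.01/GB charged in each direction):

```python
# Rough model of cross-AZ data transfer cost.
# Assumption: traffic is billed $0.01/GB in each direction ($0.02/GB round trip).

def cross_az_monthly_cost(tb_per_month: float, rate_per_gb_each_way: float = 0.01) -> float:
    """Monthly cross-AZ transfer cost in USD for a given TB/month of traffic."""
    gb = tb_per_month * 1024
    return gb * rate_per_gb_each_way * 2  # charged on both sides of the link

# 50-100 TB/month of inter-AZ chatter:
low = cross_az_monthly_cost(50)    # ~$1,024
high = cross_az_monthly_cost(100)  # ~$2,048
print(f"${low:,.0f} - ${high:,.0f} per month")
```

Plugging in your own Cost Explorer numbers for DataTransfer-Regional-Bytes gives a quick sanity check before you invest in re-architecting.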
How to find it: In AWS Cost Explorer, filter by “Usage Type” and look for entries containing DataTransfer-Regional-Bytes. The number is almost always higher than people expect.
How to fix it:
- Co-locate latency-sensitive services in the same AZ where possible
- Use topology-aware routing in Kubernetes to prefer same-AZ endpoints
- For read-heavy workloads, deploy read replicas in each AZ instead of crossing AZ boundaries for every query
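For the Kubernetes bullet above, topology-aware routing is enabled per Service with an annotation. A minimal sketch (the service name is hypothetical; the annotation shown is for Kubernetes 1.27+, while older versions used service.kubernetes.io/topology-aware-hints: auto):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments          # hypothetical service name
  annotations:
    # Ask kube-proxy to prefer endpoints in the caller's zone when capacity allows
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: payments
  ports:
    - port: 80
      targetPort: 8080
```

Note that Kubernetes only applies the hints when endpoints are spread evenly enough across zones, so verify the routing behavior with your actual replica distribution.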
CloudFront to Origin: You Pay Both Sides
When CloudFront fetches from your origin (say, an ALB or S3 bucket), you pay for the data transfer from the origin to CloudFront, plus CloudFront’s own data transfer to the end user. For frequently invalidated content or low cache-hit ratios, this double-charge adds up.
We worked with a team that had a CloudFront distribution with a 40% cache hit ratio serving 8 TB/month. They were paying ~$680/month in origin-to-CloudFront transfer that they could have eliminated by fixing their cache key configuration and increasing TTLs. Their cache hit ratio went to 92%, and origin transfer dropped to ~$60/month.
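The relationship here is simple: origin transfer scales with the miss ratio. A sketch of that model, where the per-GB rate is illustrative rather than an AWS price card:

```python
# Sketch: how origin-to-CloudFront transfer cost scales with cache hit ratio.
# Assumption: origin transfer is proportional to miss ratio at a flat $/GB rate.
# RATE below is an illustrative blended figure, not an official AWS price.

def origin_transfer_cost(total_gb: float, hit_ratio: float, rate_per_gb: float) -> float:
    """Monthly origin fetch cost: only cache misses go back to the origin."""
    miss_gb = total_gb * (1 - hit_ratio)
    return miss_gb * rate_per_gb

TOTAL_GB = 8 * 1024          # 8 TB/month served, as in the example above
RATE = 0.085                 # illustrative $/GB (assumption)

before = origin_transfer_cost(TOTAL_GB, 0.40, RATE)
after = origin_transfer_cost(TOTAL_GB, 0.92, RATE)
print(f"40% hit ratio: ${before:,.0f}/mo; 92% hit ratio: ${after:,.0f}/mo")
```

Going from a 40% to a 92% hit ratio cuts miss traffic, and therefore origin transfer, by a factor of 7.5, which is why cache key and TTL fixes pay off so disproportionately.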
VPC Peering: Free Within Region, Expensive Across
VPC peering within the same region is free for data transfer. Cross-region VPC peering costs $0.01/GB. If you have a multi-region architecture with significant cross-region traffic, this can be substantial. One team we saw was running $3,400/month in cross-region peering costs because their analytics pipeline was pulling production data from us-east-1 to a processing cluster in eu-west-1 every hour.
The fix was replicating the data to an S3 bucket in eu-west-1 (S3 cross-region replication is cheaper than sustained cross-region peering traffic) and having the analytics pipeline read locally.
NAT Gateway: The Silent Budget Killer
If I had to pick one AWS cost that surprises the most teams, it’s NAT Gateway. It’s consistently in the top 5 line items for organizations running workloads in private subnets, and almost nobody budgets for it.
The pricing:
- $0.045/hour per NAT Gateway (~$32/month just to exist)
- $0.045/GB for data processed through it
That processing charge is the killer. Every byte of traffic from your private subnets to the internet passes through the NAT Gateway and gets charged at $0.045/GB. Docker image pulls, apt package updates, API calls to third-party services, sending logs to external monitoring — all of it.
A Real Example
A 50-person engineering team we worked with had their NAT Gateway as their #3 AWS line item at $2,100/month. When we dug in, here’s what was driving it:
| Traffic source | Monthly GB | Monthly cost |
|---|---|---|
| Docker image pulls (CI/CD) | 12,000 GB | $540 |
| CloudWatch Logs (public endpoint) | 8,500 GB | $382 |
| Third-party API calls | 6,200 GB | $279 |
| OS package updates | 4,800 GB | $216 |
| Everything else | 15,200 GB | $683 |
Total: ~46,700 GB processed, ~$2,100/month.
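The table above reduces to a few lines of arithmetic, which is handy for estimating your own breakdown. A sketch using the pricing quoted earlier:

```python
# Reproducing the NAT Gateway bill above: hourly charge plus $0.045/GB processed.

HOURLY_RATE = 0.045      # $/hour per NAT Gateway
PROCESS_RATE = 0.045     # $/GB processed

traffic_gb = {
    "docker_pulls": 12_000,
    "cloudwatch_logs": 8_500,
    "third_party_apis": 6_200,
    "os_updates": 4_800,
    "everything_else": 15_200,
}

processing = sum(traffic_gb.values()) * PROCESS_RATE
standing = HOURLY_RATE * 730  # one gateway, ~730 hours/month
print(f"{sum(traffic_gb.values()):,} GB processed -> "
      f"${processing:,.0f} processing + ${standing:,.0f} hourly")
```

Note that the data processing charge dwarfs the standing hourly charge by two orders of magnitude, which is why the fixes below all target traffic volume.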
How to Cut NAT Gateway Costs
VPC Endpoints for AWS services. This is the single highest-impact fix. If your workloads in private subnets are talking to S3, DynamoDB, CloudWatch, ECR, or other AWS services through the NAT Gateway, you’re paying data processing charges for traffic that could go over AWS’s internal network for free (gateway endpoints) or at a lower rate (interface endpoints).
An S3 Gateway Endpoint is free. It routes S3 traffic over AWS’s backbone instead of through your NAT Gateway. If you’re pulling large datasets from S3 in private subnets, this alone can save hundreds per month.
Interface endpoints for ECR cost $7.20/month per AZ, plus roughly $0.01/GB of data processed, but they eliminate the $0.045/GB NAT Gateway charge for Docker image pulls. Past roughly 200 GB/month of image pulls, the endpoint pays for itself.
Cache Docker images locally. Use a pull-through cache for ECR, or run a registry mirror. One team reduced their ECR-related NAT Gateway traffic from 12 TB/month to 400 GB/month by deploying a pull-through cache.
Move logging to VPC endpoints. CloudWatch Logs has an interface endpoint. If you’re shipping significant log volume, the endpoint cost ($7.20/month per AZ) is usually far less than the NAT Gateway processing charge.
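Whether any given interface endpoint pays for itself is a simple break-even calculation: the endpoint's fixed monthly cost against the NAT processing you avoid. A sketch, assuming the $7.20/month-per-AZ figure above and that interface endpoints bill about $0.01/GB processed (verify both against current AWS pricing):

```python
def endpoint_break_even_gb(azs: int,
                           endpoint_monthly_per_az: float = 7.20,   # assumption
                           nat_rate: float = 0.045,                 # $/GB via NAT
                           endpoint_rate: float = 0.01) -> float:   # assumption
    """GB/month at which an interface endpoint beats routing via NAT Gateway."""
    fixed = azs * endpoint_monthly_per_az
    saved_per_gb = nat_rate - endpoint_rate
    return fixed / saved_per_gb

# Single-AZ endpoint: break-even around 206 GB/month of traffic.
print(f"{endpoint_break_even_gb(1):.0f} GB/month")
```

Setting endpoint_rate to 0 gives the simpler break-even if you ignore the endpoint's own per-GB processing charge; either way, most teams shipping container images or logs clear the threshold easily.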
Over-Provisioned Kubernetes Clusters
Kubernetes has a resource management problem, and it’s costing most teams significantly more than they realize.
The pattern looks like this: a developer writes a deployment manifest and sets resource requests to “something safe” — maybe 500m CPU and 512Mi memory. The pod actually uses 80m CPU and 120Mi memory at steady state. But the scheduler reserves the full requested amount on the node. Multiply that across 200 pods, and your cluster nodes are running at 25–35% actual utilization while Kubernetes reports them as 70%+ allocated.
The Math Gets Ugly Fast
Consider a cluster with 10 m5.2xlarge nodes ($0.384/hr each):
- Monthly node cost: 10 x $0.384 x 730 hours = $2,803/month
- Actual CPU utilization: 30%
- If right-sized to actual usage (with headroom): 4–5 nodes needed
- Potential savings: $1,400–$1,680/month
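The node math above can be sketched as a quick estimator. The headroom multiplier is an assumption you should tune; it also assumes CPU is the binding dimension:

```python
import math

def rightsized_nodes(current_nodes: int, utilization: float, headroom: float = 1.3) -> int:
    """Nodes needed to carry current actual usage plus headroom (CPU-bound assumption)."""
    return max(1, math.ceil(current_nodes * utilization * headroom))

NODE_HOURLY = 0.384  # m5.2xlarge on-demand rate quoted above
current = 10
needed = rightsized_nodes(current, 0.30)            # 4 nodes at 30% utilization
savings = (current - needed) * NODE_HOURLY * 730
print(f"{needed} nodes needed; ~${savings:,.0f}/month saved")
```

Run the same estimate per cluster; with 3–5 clusters the projected savings multiply accordingly.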
Now consider that many organizations run 3–5 clusters (prod, staging, dev, maybe per-region). The waste compounds.
Why This Happens
Default resource requests are too high. Most Helm charts and example manifests ship with generous defaults. The nginx-ingress chart defaults to 100m CPU / 90Mi memory per replica. The actual usage for a low-traffic cluster is a fraction of that.
Nobody revisits resource requests after initial deployment. The request was set during the first deployment when nobody knew the actual resource profile. Six months later, they have plenty of metrics data but the requests were never updated.
Request/limit confusion. Some teams set requests equal to limits, which means the scheduler reserves the maximum amount the pod could ever use, even though it almost never hits that peak.
How to Fix It
Step 1: Install a tool that shows the gap between requested and actual resources. Kubernetes Metrics Server plus a dashboard (Grafana with Prometheus, or your APM tool’s Kubernetes view) will show you actual usage vs. requests per pod.
Step 2: Reduce requests to the p95 of actual usage plus 20% headroom. Not the peak. The p95. If a pod’s CPU usage is 80m at p95 and you have requests set at 500m, drop to 100m.
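Step 2 can be expressed directly in code. A minimal sketch, assuming you have exported per-pod CPU samples (in millicores) from your metrics stack; the nearest-rank method below is a simple p95 approximation:

```python
import math

def recommended_request_millicores(samples: list[float], headroom: float = 1.2) -> int:
    """Nearest-rank p95 of observed CPU usage, padded with headroom, rounded up to 10m."""
    ordered = sorted(samples)
    idx = math.ceil(0.95 * len(ordered)) - 1   # nearest-rank p95 index
    return math.ceil(ordered[idx] * headroom / 10) * 10

# 20 samples hovering around 55-80m with one spike to 210m:
samples = [55, 58, 60, 61, 63, 64, 66, 68, 70, 71,
           72, 73, 74, 75, 76, 78, 79, 80, 80, 210]
print(recommended_request_millicores(samples))  # 100 -> request 100m, not 500m
```

The spike to 210m is deliberately excluded by the p95: that's the point of sizing to the p95 rather than the peak.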
Step 3: Enable the Cluster Autoscaler or Karpenter. Once requests are accurate, the autoscaler can make correct decisions about node count. Over-requested pods mean the autoscaler provisions nodes you don’t need.
Step 4: Consider separate node pools for different workload profiles. Memory-intensive pods on r-series instances, compute-heavy on c-series. Running everything on general-purpose m-series means you’re over-provisioning one dimension to satisfy the other.
Load Balancers That Do Nothing
Every ALB costs a minimum of ~$16.20/month (the hourly charge alone, before LCU costs). Every NLB costs ~$16.20/month. Every Classic Load Balancer costs ~$18/month.
Check how many load balancers you have. Then check how many are actually receiving traffic.
In a typical AWS account that’s been running for 2+ years, we find 3–8 idle load balancers. That’s $50–$130/month for resources serving zero requests.
They accumulate because:
- A service was decommissioned but the infrastructure wasn’t fully torn down
- A blue/green deployment left the old LB in place
- A test environment was partially cleaned up
- Someone created an LB manually “to test something” and forgot about it
How to find them: AWS Cost Explorer, filter by ELB, look for load balancers with near-zero LCU charges but non-zero hourly charges. Or check CloudWatch RequestCount metrics — anything with zero requests over 30 days is a candidate for deletion.
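Once you have exported 30-day RequestCount totals per load balancer (from CloudWatch, by whatever means you prefer), the triage itself is trivial. A sketch with hypothetical load balancer names:

```python
# Flag deletion candidates: load balancers with zero requests over the window.

def idle_load_balancers(request_counts: dict[str, int]) -> list[str]:
    """Names of load balancers that served zero requests."""
    return sorted(name for name, count in request_counts.items() if count == 0)

counts = {                      # hypothetical names and 30-day totals
    "prod-api": 41_250_000,
    "old-blue-green": 0,
    "test-something": 0,
    "staging-web": 12_400,
}
monthly_waste = len(idle_load_balancers(counts)) * 16.20  # ALB hourly minimum
print(idle_load_balancers(counts), f"~${monthly_waste:.2f}/month")
```

Before deleting, check DNS and infrastructure-as-code references; an LB with zero requests may still be wired into a record somewhere.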
CloudWatch Logs: Ingestion Costs Add Up
CloudWatch Logs charges $0.50/GB for log ingestion. That’s the cost to write logs, before any storage or analysis costs.
A single verbose application logging at INFO level with request/response bodies can easily produce 50–100 GB/month of logs. That’s $25–$50/month for one service. Across 20 services, you’re looking at $500–$1,000/month just to write logs that, in many cases, nobody reads.
Common offenders:
- Debug-level logging left enabled in production
- Full request/response body logging for APIs
- Kubernetes control plane logging with all log types enabled (audit logs alone can be massive)
- AWS service logs (VPC Flow Logs, CloudTrail data events) enabled broadly without filtering
Fixes:
- Set appropriate log levels per environment. Debug in dev, Warning or Error in prod for most services.
- Sample verbose logs instead of capturing everything. Log 10% of requests at INFO level, 100% at ERROR.
- Filter before ingestion. CloudWatch subscription filters only route logs after they've been ingested, so you still pay $0.50/GB for the full stream. Filter at the application or log-agent level instead, so low-value logs never reach CloudWatch.
- For VPC Flow Logs, use the custom log format to capture only the fields you need, and consider sending to S3 instead of CloudWatch ($0.50/GB vs. ~$0.023/GB for S3 storage).
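The sampling rule above (keep every ERROR, keep ~10% of INFO) can be sketched as a standard logging filter. A minimal example; the hash-based sampling is a design choice so that a given message is deterministically kept or dropped rather than flickering between runs:

```python
import hashlib
import logging

class SampleInfoFilter(logging.Filter):
    """Keep all WARNING+ records; keep a deterministic fraction of lower levels."""

    def __init__(self, keep_ratio: float = 0.10):
        super().__init__()
        self.keep_ratio = keep_ratio

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # always keep WARNING and above
        # Hash the message so the same record is always kept or always dropped
        digest = hashlib.sha256(record.getMessage().encode()).digest()
        return digest[0] / 255 < self.keep_ratio

logger = logging.getLogger("api")
logger.addFilter(SampleInfoFilter(0.10))
```

At 10% INFO sampling, a service producing 50 GB/month of mostly-INFO logs drops to roughly 5 GB, cutting ingestion cost proportionally.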
Secrets Manager: Per-Secret Pricing
AWS Secrets Manager charges $0.40/month per secret, plus $0.05 per 10,000 API calls. This is negligible at small scale, but teams that store every config value as a separate secret can accumulate costs.
One team had 340 secrets in Secrets Manager — many of them non-sensitive configuration values that could have lived in SSM Parameter Store (free for standard parameters). At $0.40/secret, they were spending $136/month on Secrets Manager when $100/month of that could have been eliminated by using Parameter Store for non-sensitive values.
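The arithmetic behind that example, as a two-line model using the $0.40/secret/month rate quoted above:

```python
# Secrets Manager at $0.40/secret/month vs. free standard parameters in
# SSM Parameter Store. "truly_sensitive" is how many values actually need
# Secrets Manager features (rotation, cross-account access, etc.).

def secrets_monthly_cost(total_secrets: int, truly_sensitive: int,
                         rate: float = 0.40) -> tuple[float, float]:
    """(current cost, cost after moving non-sensitive values to Parameter Store)."""
    return total_secrets * rate, truly_sensitive * rate

current, after = secrets_monthly_cost(340, 90)
print(f"${current:.0f}/month now, ${after:.0f} after; ${current - after:.0f} saved")
```

In the example above, 90 of the 340 values genuinely warranted Secrets Manager; the other 250 were plain configuration.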
Not a massive cost, but it’s representative of the broader pattern: services with per-unit pricing that seem cheap individually but accumulate when nobody’s watching.
How to Find Hidden Costs Systematically
Hunting these costs one by one is tedious. Here’s a systematic approach:
Cost and Usage Report (CUR) Analysis
The AWS Cost and Usage Report is the most granular cost data available. It’s a CSV (or Parquet) dump of every line item on your bill. Set it up to export to S3, then query it with Athena.
Useful queries:
- Group by product_product_name and sort by cost to find unexpected services
- Filter for usage_type containing DataTransfer to see all transfer costs in one view
- Look for line_item_type = "Usage" where the service isn't in your known infrastructure list
Cost Explorer’s “Daily Unblended Cost” View
Switch to daily granularity and look for services with steady daily charges that you don’t recognize. A $3/day charge is easy to miss but costs $90/month.
Tag-Based Gap Analysis
If you have a tagging strategy, look for untagged costs. Untagged costs are usually either forgotten resources or services you didn’t know were incurring charges (like NAT Gateway data processing or cross-AZ transfer).
Automated Anomaly Detection
Manual reviews catch known problems. Automated anomaly detection catches unexpected ones. A spike in data transfer costs, a new service appearing on the bill, or a gradual increase in a line item that should be flat — these are the signals that lead you to hidden costs.
Xplorr provides anomaly detection across AWS, Azure, and GCP that surfaces exactly these kinds of hidden costs automatically. Instead of querying CUR data manually, you get alerts when something doesn’t look right — which is often the first sign of a hidden cost you didn’t know existed.
The Uncomfortable Truth
Cloud providers price data transfer, NAT Gateways, and managed service API calls in a way that makes each individual charge seem insignificant. $0.045/GB here, $0.01/GB there, $0.40/secret/month. Individually, nobody would notice or care.
But these charges are multiplicative. They scale with your traffic, your number of services, your data volume. And they’re not prominently displayed in any default billing view. You have to go looking for them.
The cloud providers have no incentive to make this obvious. They benefit from the complexity. The default billing dashboard shows you your top services by spend, which are the ones you already know about. The hidden costs stay hidden until someone does the work to find them.
That work — auditing data transfer paths, checking NAT Gateway processing, reviewing Kubernetes utilization, hunting idle resources — should be a quarterly exercise at minimum. For teams spending over $50K/month on cloud, it should be monthly.
The 15–30% savings sitting in hidden costs is real money. For a team spending $100K/month on cloud, that’s $15K–$30K/month waiting to be found. That’s an engineer’s salary, recovered from waste that never should have existed in the first place.
Keep reading
- AWS Cost Optimization Strategies That Actually Work
- 5 Signs Your Cloud Bill Is About to Spike
- What Is a Cloud Cost Anomaly (And Why You Should Care)
See how Xplorr helps → Features
Xplorr finds an average of 23% in unnecessary cloud spend. Get started free.