Your cloud bill has two parts: the costs you know about, and the costs that are quietly bleeding money while you focus on the big line items.
Most teams look at their bill and see EC2, RDS, S3 — the services they intentionally provisioned. They optimize those. They right-size instances, buy reservations, delete unused volumes. Good.
But underneath those obvious costs sits a layer of charges that most teams never examine closely: data transfer fees, NAT Gateway processing charges, idle load balancers, CloudWatch log ingestion, and over-provisioned Kubernetes clusters running at a fraction of their capacity.
These hidden costs typically account for 15–30% of a cloud bill. And the cloud providers have no incentive to make them obvious.
Data Egress: The Tax on Moving Your Own Data
Data transfer pricing is where cloud providers make a significant chunk of margin, and it’s designed to be confusing. There are at least six different data transfer rate categories in AWS alone, and they change depending on source, destination, and service.
Here are the ones that catch teams off guard:
Cross-AZ Transfer: $0.01/GB in Both Directions
Every time your service in us-east-1a talks to a database in us-east-1b, you pay $0.01/GB each way. That’s $0.02/GB round trip.
This sounds trivial until you do the math. A microservices architecture with 10 services making frequent cross-AZ calls can easily push 50–100 TB/month of inter-AZ traffic. At $0.02/GB, that’s $1,000–$2,000/month just for your services to talk to each other within the same region.
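The arithmetic above is worth having as a reusable model. A minimal sketch, using the rates quoted in this section ($0.01/GB charged in each direction):

```python
# Rough model of cross-AZ data transfer cost.
# Assumption: traffic is billed $0.01/GB in each direction ($0.02/GB round trip).

def cross_az_monthly_cost(tb_per_month: float, rate_per_gb_each_way: float = 0.01) -> float:
    """Monthly cross-AZ transfer cost in USD for a given TB/month of traffic."""
    gb = tb_per_month * 1024
    return gb * rate_per_gb_each_way * 2  # charged on both sides of the link

# 50-100 TB/month of inter-AZ chatter:
low = cross_az_monthly_cost(50)    # ~$1,024
high = cross_az_monthly_cost(100)  # ~$2,048
print(f"${low:,.0f} - ${high:,.0f} per month")
```

Plugging in your own Cost Explorer numbers for DataTransfer-Regional-Bytes gives a quick sanity check before you invest in re-architecting.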
How to find it: In AWS Cost Explorer, filter by “Usage Type” and look for entries containing DataTransfer-Regional-Bytes. The number is almost always higher than people expect.
How to fix it:
- Co-locate latency-sensitive services in the same AZ where possible
- Use topology-aware routing in Kubernetes to prefer same-AZ endpoints
- For read-heavy workloads, deploy read replicas in each AZ instead of crossing AZ boundaries for every query
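For the Kubernetes bullet above, topology-aware routing is enabled per Service with an annotation. A minimal sketch (the service name is hypothetical; the annotation shown is for Kubernetes 1.27+, while older versions used service.kubernetes.io/topology-aware-hints: auto):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments          # hypothetical service name
  annotations:
    # Ask kube-proxy to prefer endpoints in the caller's zone when capacity allows
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: payments
  ports:
    - port: 80
      targetPort: 8080
```

Note that Kubernetes only applies the hints when endpoints are spread evenly enough across zones, so verify the routing behavior with your actual replica distribution.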
CloudFront to Origin: You Pay Both Sides
When CloudFront fetches from your origin (say, an ALB or S3 bucket), you pay for the data transfer from the origin to CloudFront, plus CloudFront’s own data transfer to the end user. For frequently invalidated content or low cache-hit ratios, this double-charge adds up.
We worked with a team that had a CloudFront distribution with a 40% cache hit ratio serving 8 TB/month. They were paying ~$680/month in origin-to-CloudFront transfer that they could have eliminated by fixing their cache key configuration and increasing TTLs. Their cache hit ratio went to 92%, and origin transfer dropped to ~$60/month.
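The relationship here is simple: origin transfer scales with the miss ratio. A sketch of that model, where the per-GB rate is illustrative rather than an AWS price card:

```python
# Sketch: how origin-to-CloudFront transfer cost scales with cache hit ratio.
# Assumption: origin transfer is proportional to miss ratio at a flat $/GB rate.
# RATE below is an illustrative blended figure, not an official AWS price.

def origin_transfer_cost(total_gb: float, hit_ratio: float, rate_per_gb: float) -> float:
    """Monthly origin fetch cost: only cache misses go back to the origin."""
    miss_gb = total_gb * (1 - hit_ratio)
    return miss_gb * rate_per_gb

TOTAL_GB = 8 * 1024          # 8 TB/month served, as in the example above
RATE = 0.085                 # illustrative $/GB (assumption)

before = origin_transfer_cost(TOTAL_GB, 0.40, RATE)
after = origin_transfer_cost(TOTAL_GB, 0.92, RATE)
print(f"40% hit ratio: ${before:,.0f}/mo; 92% hit ratio: ${after:,.0f}/mo")
```

Going from a 40% to a 92% hit ratio cuts miss traffic, and therefore origin transfer, by a factor of 7.5, which is why cache key and TTL fixes pay off so disproportionately.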
VPC Peering: Free Within Region, Expensive Across
VPC peering within the same region is free for data transfer. Cross-region VPC peering costs $0.01/GB. If you have a multi-region architecture with significant cross-region traffic, this can be substantial. One team we saw was running $3,400/month in cross-region peering costs because their analytics pipeline was pulling production data from us-east-1 to a processing cluster in eu-west-1 every hour.
The fix was replicating the data to an S3 bucket in eu-west-1 (S3 cross-region replication is cheaper than sustained cross-region peering traffic) and having the analytics pipeline read locally.
NAT Gateway: The Silent Budget Killer
If I had to pick one AWS cost that surprises the most teams, it’s NAT Gateway. It’s consistently in the top 5 line items for organizations running workloads in private subnets, and almost nobody budgets for it.
The pricing:
- $0.045/hour per NAT Gateway (~$32/month just to exist)
- $0.045/GB for data processed through it
That processing charge is the killer. Every byte of traffic from your private subnets to the internet passes through the NAT Gateway and gets charged at $0.045/GB. Docker image pulls, apt package updates, API calls to third-party services, sending logs to external monitoring — all of it.
A Real Example
A 50-person engineering team we worked with had their NAT Gateway as their #3 AWS line item at $2,100/month. When we dug in, here’s what was driving it:
| Traffic source | Monthly GB | Monthly cost |
|---|---|---|
| Docker image pulls (CI/CD) | 12,000 GB | $540 |
| CloudWatch Logs (public endpoint) | 8,500 GB | $382 |
| Third-party API calls | 6,200 GB | $279 |
| OS package updates | 4,800 GB | $216 |
| Everything else | 15,200 GB | $683 |
Total: ~46,700 GB processed, ~$2,100/month.
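The table above reduces to a few lines of arithmetic, which is handy for estimating your own breakdown. A sketch using the pricing quoted earlier:

```python
# Reproducing the NAT Gateway bill above: hourly charge plus $0.045/GB processed.

HOURLY_RATE = 0.045      # $/hour per NAT Gateway
PROCESS_RATE = 0.045     # $/GB processed

traffic_gb = {
    "docker_pulls": 12_000,
    "cloudwatch_logs": 8_500,
    "third_party_apis": 6_200,
    "os_updates": 4_800,
    "everything_else": 15_200,
}

processing = sum(traffic_gb.values()) * PROCESS_RATE
standing = HOURLY_RATE * 730  # one gateway, ~730 hours/month
print(f"{sum(traffic_gb.values()):,} GB processed -> "
      f"${processing:,.0f} processing + ${standing:,.0f} hourly")
```

Note that the data processing charge dwarfs the standing hourly charge by two orders of magnitude, which is why the fixes below all target traffic volume.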
How to Cut NAT Gateway Costs
VPC Endpoints for AWS services. This is the single highest-impact fix. If your workloads in private subnets are talking to S3, DynamoDB, CloudWatch, ECR, or other AWS services through the NAT Gateway, you’re paying data processing charges for traffic that could go over AWS’s internal network for free (gateway endpoints) or at a lower rate (interface endpoints).
An S3 Gateway Endpoint is free. It routes S3 traffic over AWS’s backbone instead of through your NAT Gateway. If you’re pulling large datasets from S3 in private subnets, this alone can save hundreds per month.
Interface endpoints for ECR cost $7.20/month per AZ, plus roughly $0.01/GB of data processed, but they eliminate the $0.045/GB NAT Gateway charge for Docker image pulls. Past roughly 200 GB/month of image pulls, the endpoint pays for itself.
Cache Docker images locally. Use a pull-through cache for ECR, or run a registry mirror. One team reduced their ECR-related NAT Gateway traffic from 12 TB/month to 400 GB/month by deploying a pull-through cache.
Move logging to VPC endpoints. CloudWatch Logs has an interface endpoint. If you’re shipping significant log volume, the endpoint cost ($7.20/month per AZ) is usually far less than the NAT Gateway processing charge.
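Whether any given interface endpoint pays for itself is a simple break-even calculation: the endpoint's fixed monthly cost against the NAT processing you avoid. A sketch, assuming the $7.20/month-per-AZ figure above and that interface endpoints bill about $0.01/GB processed (verify both against current AWS pricing):

```python
def endpoint_break_even_gb(azs: int,
                           endpoint_monthly_per_az: float = 7.20,   # assumption
                           nat_rate: float = 0.045,                 # $/GB via NAT
                           endpoint_rate: float = 0.01) -> float:   # assumption
    """GB/month at which an interface endpoint beats routing via NAT Gateway."""
    fixed = azs * endpoint_monthly_per_az
    saved_per_gb = nat_rate - endpoint_rate
    return fixed / saved_per_gb

# Single-AZ endpoint: break-even around 206 GB/month of traffic.
print(f"{endpoint_break_even_gb(1):.0f} GB/month")
```

Setting endpoint_rate to 0 gives the simpler break-even if you ignore the endpoint's own per-GB processing charge; either way, most teams shipping container images or logs clear the threshold easily.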
Over-Provisioned Kubernetes Clusters
Kubernetes has a resource management problem, and it’s costing most teams significantly more than they realize.
The pattern looks like this: a developer writes a deployment manifest and sets resource requests to “something safe” — maybe 500m CPU and 512Mi memory. The pod actually uses 80m CPU and 120Mi memory at steady state. But the scheduler reserves the full requested amount on the node. Multiply that across 200 pods, and your cluster nodes are running at 25–35% actual utilization while Kubernetes reports them as 70%+ allocated.
The Math Gets Ugly Fast
Consider a cluster with 10 m5.2xlarge nodes ($0.384/hr each):
- Monthly node cost: 10 x $0.384 x 730 hours = $2,803/month
- Actual CPU utilization: 30%
- If right-sized to actual usage (with headroom): 4–5 nodes needed
- Potential savings: $1,400–$1,680/month
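The node math above can be sketched as a quick estimator. The headroom multiplier is an assumption you should tune; it also assumes CPU is the binding dimension:

```python
import math

def rightsized_nodes(current_nodes: int, utilization: float, headroom: float = 1.3) -> int:
    """Nodes needed to carry current actual usage plus headroom (CPU-bound assumption)."""
    return max(1, math.ceil(current_nodes * utilization * headroom))

NODE_HOURLY = 0.384  # m5.2xlarge on-demand rate quoted above
current = 10
needed = rightsized_nodes(current, 0.30)            # 4 nodes at 30% utilization
savings = (current - needed) * NODE_HOURLY * 730
print(f"{needed} nodes needed; ~${savings:,.0f}/month saved")
```

Run the same estimate per cluster; with 3–5 clusters the projected savings multiply accordingly.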
Now consider that many organizations run 3–5 clusters (prod, staging, dev, maybe per-region). The waste compounds.
Why This Happens
Default resource requests are too high. Most Helm charts and example manifests ship with generous defaults. The nginx-ingress chart defaults to 100m CPU / 90Mi memory per replica. The actual usage for a low-traffic cluster is a fraction of that.
Nobody revisits resource requests after initial deployment. The request was set during the first deployment when nobody knew the actual resource profile. Six months later, they have plenty of metrics data but the requests were never updated.
Request/limit confusion. Some teams set requests equal to limits, which means the scheduler reserves the maximum amount the pod could ever use, even though it almost never hits that peak.
How to Fix It
Step 1: Install a tool that shows the gap between requested and actual resources. Kubernetes Metrics Server plus a dashboard (Grafana with Prometheus, or your APM tool’s Kubernetes view) will show you actual usage vs. requests per pod.
Step 2: Reduce requests to the p95 of actual usage plus 20% headroom. Not the peak. The p95. If a pod’s CPU usage is 80m at p95 and you have requests set at 500m, drop to 100m.
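Step 2 can be expressed directly in code. A minimal sketch, assuming you have exported per-pod CPU samples (in millicores) from your metrics stack; the nearest-rank method below is a simple p95 approximation:

```python
import math

def recommended_request_millicores(samples: list[float], headroom: float = 1.2) -> int:
    """Nearest-rank p95 of observed CPU usage, padded with headroom, rounded up to 10m."""
    ordered = sorted(samples)
    idx = math.ceil(0.95 * len(ordered)) - 1   # nearest-rank p95 index
    return math.ceil(ordered[idx] * headroom / 10) * 10

# 20 samples hovering around 55-80m with one spike to 210m:
samples = [55, 58, 60, 61, 63, 64, 66, 68, 70, 71,
           72, 73, 74, 75, 76, 78, 79, 80, 80, 210]
print(recommended_request_millicores(samples))  # 100 -> request 100m, not 500m
```

The spike to 210m is deliberately excluded by the p95: that's the point of sizing to the p95 rather than the peak.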
Step 3: Enable the Cluster Autoscaler or Karpenter. Once requests are accurate, the autoscaler can make correct decisions about node count. Over-requested pods mean the autoscaler provisions nodes you don’t need.
Step 4: Consider separate node pools for different workload profiles. Memory-intensive pods on r-series instances, compute-heavy on c-series. Running everything on general-purpose m-series means you’re over-provisioning one dimension to satisfy the other.
Load Balancers That Do Nothing
Every ALB costs a minimum of ~$16.20/month (the hourly charge alone, before LCU costs). Every NLB costs ~$16.20/month. Every Classic Load Balancer costs ~$18/month.
Check how many load balancers you have. Then check how many are actually receiving traffic.
In a typical AWS account that’s been running for 2+ years, we find 3–8 idle load balancers. That’s $50–$130/month for resources serving zero requests.
They accumulate because:
- A service was decommissioned but the infrastructure wasn’t fully torn down
- A blue/green deployment left the old LB in place
- A test environment was partially cleaned up
- Someone created an LB manually “to test something” and forgot about it
How to find them: AWS Cost Explorer, filter by ELB, look for load balancers with near-zero LCU charges but non-zero hourly charges. Or check CloudWatch RequestCount metrics — anything with zero requests over 30 days is a candidate for deletion.
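Once you have exported 30-day RequestCount totals per load balancer (from CloudWatch, by whatever means you prefer), the triage itself is trivial. A sketch with hypothetical load balancer names:

```python
# Flag deletion candidates: load balancers with zero requests over the window.

def idle_load_balancers(request_counts: dict[str, int]) -> list[str]:
    """Names of load balancers that served zero requests."""
    return sorted(name for name, count in request_counts.items() if count == 0)

counts = {                      # hypothetical names and 30-day totals
    "prod-api": 41_250_000,
    "old-blue-green": 0,
    "test-something": 0,
    "staging-web": 12_400,
}
monthly_waste = len(idle_load_balancers(counts)) * 16.20  # ALB hourly minimum
print(idle_load_balancers(counts), f"~${monthly_waste:.2f}/month")
```

Before deleting, check DNS and infrastructure-as-code references; an LB with zero requests may still be wired into a record somewhere.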
CloudWatch Logs: Ingestion Costs Add Up
CloudWatch Logs charges $0.50/GB for log ingestion. That’s the cost to write logs, before any storage or analysis costs.
A single verbose application logging at INFO level with request/response bodies can easily produce 50–100 GB/month of logs. That’s $25–$50/month for one service. Across 20 services, you’re looking at $500–$1,000/month just to write logs that, in many cases, nobody reads.
Common offenders:
- Debug-level logging left enabled in production
- Full request/response body logging for APIs
- Kubernetes control plane logging with all log types enabled (audit logs alone can be massive)
- AWS service logs (VPC Flow Logs, CloudTrail data events) enabled broadly without filtering
Fixes:
- Set appropriate log levels per environment. Debug in dev, Warning or Error in prod for most services.
- Sample verbose logs instead of capturing everything. Log 10% of requests at INFO level, 100% at ERROR.
- Filter before ingestion. CloudWatch subscription filters only route logs after they've been ingested, so you still pay $0.50/GB for the full stream. Filter at the application or log-agent level instead, so low-value logs never reach CloudWatch.
- For VPC Flow Logs, use the custom log format to capture only the fields you need, and consider sending to S3 instead of CloudWatch ($0.50/GB vs. ~$0.023/GB for S3 storage).
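The sampling rule above (keep every ERROR, keep ~10% of INFO) can be sketched as a standard logging filter. A minimal example; the hash-based sampling is a design choice so that a given message is deterministically kept or dropped rather than flickering between runs:

```python
import hashlib
import logging

class SampleInfoFilter(logging.Filter):
    """Keep all WARNING+ records; keep a deterministic fraction of lower levels."""

    def __init__(self, keep_ratio: float = 0.10):
        super().__init__()
        self.keep_ratio = keep_ratio

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # always keep WARNING and above
        # Hash the message so the same record is always kept or always dropped
        digest = hashlib.sha256(record.getMessage().encode()).digest()
        return digest[0] / 255 < self.keep_ratio

logger = logging.getLogger("api")
logger.addFilter(SampleInfoFilter(0.10))
```

At 10% INFO sampling, a service producing 50 GB/month of mostly-INFO logs drops to roughly 5 GB, cutting ingestion cost proportionally.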
Secrets Manager: Per-Secret Pricing
AWS Secrets Manager charges $0.40/month per secret, plus $0.05 per 10,000 API calls. This is negligible at small scale, but teams that store every config value as a separate secret can accumulate costs.
One team had 340 secrets in Secrets Manager — many of them non-sensitive configuration values that could have lived in SSM Parameter Store (free for standard parameters). At $0.40/secret, they were spending $136/month on Secrets Manager when $100/month of that could have been eliminated by using Parameter Store for non-sensitive values.
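The arithmetic behind that example, as a two-line model using the $0.40/secret/month rate quoted above:

```python
# Secrets Manager at $0.40/secret/month vs. free standard parameters in
# SSM Parameter Store. "truly_sensitive" is how many values actually need
# Secrets Manager features (rotation, cross-account access, etc.).

def secrets_monthly_cost(total_secrets: int, truly_sensitive: int,
                         rate: float = 0.40) -> tuple[float, float]:
    """(current cost, cost after moving non-sensitive values to Parameter Store)."""
    return total_secrets * rate, truly_sensitive * rate

current, after = secrets_monthly_cost(340, 90)
print(f"${current:.0f}/month now, ${after:.0f} after; ${current - after:.0f} saved")
```

In the example above, 90 of the 340 values genuinely warranted Secrets Manager; the other 250 were plain configuration.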
Not a massive cost, but it’s representative of the broader pattern: services with per-unit pricing that seem cheap individually but accumulate when nobody’s watching.
How to Find Hidden Costs Systematically
Hunting these costs one by one is tedious. Here’s a systematic approach:
Cost and Usage Report (CUR) Analysis
The AWS Cost and Usage Report is the most granular cost data available. It’s a CSV (or Parquet) dump of every line item on your bill. Set it up to export to S3, then query it with Athena.
Useful queries:
- Group by product_product_name and sort by cost to find unexpected services
- Filter for usage_type containing DataTransfer to see all transfer costs in one view
- Look for line_item_type = "Usage" where the service isn't in your known infrastructure list
Cost Explorer’s “Daily Unblended Cost” View
Switch to daily granularity and look for services with steady daily charges that you don’t recognize. A $3/day charge is easy to miss but costs $90/month.
Tag-Based Gap Analysis
If you have a tagging strategy, look for untagged costs. Untagged costs are usually either forgotten resources or services you didn’t know were incurring charges (like NAT Gateway data processing or cross-AZ transfer).
Automated Anomaly Detection
Manual reviews catch known problems. Automated anomaly detection catches unexpected ones. A spike in data transfer costs, a new service appearing on the bill, or a gradual increase in a line item that should be flat — these are the signals that lead you to hidden costs.
Xplorr provides anomaly detection across AWS, Azure, and GCP that surfaces exactly these kinds of hidden costs automatically. Instead of querying CUR data manually, you get alerts when something doesn’t look right — which is often the first sign of a hidden cost you didn’t know existed.
The Uncomfortable Truth
Cloud providers price data transfer, NAT Gateways, and managed service API calls in a way that makes each individual charge seem insignificant. $0.045/GB here, $0.01/GB there, $0.40/secret/month. Individually, nobody would notice or care.
But these charges are multiplicative. They scale with your traffic, your number of services, your data volume. And they’re not prominently displayed in any default billing view. You have to go looking for them.
The cloud providers have no incentive to make this obvious. They benefit from the complexity. The default billing dashboard shows you your top services by spend, which are the ones you already know about. The hidden costs stay hidden until someone does the work to find them.
That work — auditing data transfer paths, checking NAT Gateway processing, reviewing Kubernetes utilization, hunting idle resources — should be a quarterly exercise at minimum. For teams spending over $50K/month on cloud, it should be monthly.
The 15–30% savings sitting in hidden costs is real money. For a team spending $100K/month on cloud, that’s $15K–$30K/month waiting to be found. That’s an engineer’s salary, recovered from waste that never should have existed in the first place.
Keep reading
- AWS Cost Optimization Strategies That Actually Work
- 5 Signs Your Cloud Bill Is About to Spike
- What Is a Cloud Cost Anomaly (And Why You Should Care)
See how Xplorr helps → Features
Xplorr finds an average of 23% in unnecessary cloud spend. Get started free.