Using AWS WAF the Right Way (Not Just Turning It On)

If you think AWS WAF is “set it and forget it” security, you’re going to have a bad time. And an expensive one.

I learned this the hard way three years ago when I confidently enabled AWS Managed Rules on our production API, patted myself on the back, and went to lunch. By the time I got back, our Slack was on fire. Legitimate mobile app users couldn’t log in. Our monitoring dashboard looked like a Christmas tree. And our CEO was asking very pointed questions about why customers were locked out.

That day, I learned everything I’m about to share with you.

The Problem Nobody Talks About

AWS WAF is incredibly powerful. It’s also incredibly easy to misconfigure. The documentation makes it sound simple: enable a few managed rule groups, maybe add some rate limiting, and boom—you’re protected.

This is dangerously incomplete advice.

What actually happens in the real world is this: you enable AWS Managed Rules Core Rule Set (CRS), feeling responsible and security-conscious. Within hours, your WAF is blocking:

Your own monitoring tools (false positive on SQL injection rules)
Mobile apps with legitimate but “suspicious” user agents
International customers whose requests look different from your testing
That critical third-party integration you forgot about, which sends XML in POST bodies

I’ve seen teams disable WAF entirely after incidents like these. That’s like removing your car’s airbags because they deployed accidentally once. It’s not the solution.

Start with Visibility, Not Protection

Here’s my controversial opinion: you should run AWS WAF in COUNT mode for at least two weeks before blocking anything.

I know, I know. Security teams hate this. “Two weeks of vulnerability!” they cry. But here’s the thing: if you don’t know what normal traffic looks like in your environment, you’re flying blind. And blind security is worse than no security because it gives you false confidence.

Here’s what I do on every new WAF implementation:

Weeks 1-2: Pure Observation

{
  "Name": "CoreRuleSet-COUNT-MODE",
  "Priority": 1,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesCommonRuleSet"
    }
  },
  "OverrideAction": {
    "Count": {}
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "CoreRuleSetCount"
  }
}

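During this period, I’m obsessively checking CloudWatch Logs Insights (assuming the web ACL logs to a CloudWatch Logs log group, whose name AWS WAF requires to start with aws-waf-logs-). One catch: counted matches never appear in the top-level action field, which only records the terminating action (ALLOW, BLOCK, CAPTCHA, or CHALLENGE), so filtering on action = "COUNT" finds nothing. Matches inside a managed rule group are recorded in that group’s ruleGroupList entry instead, and since the Core Rule Set is the only rule group during observation, it sits at index 0. My go-to query:

fields coalesce(ruleGroupList.0.terminatingRule.ruleId, ruleGroupList.0.nonTerminatingMatchingRules.0.ruleId) as countedRule
| filter ispresent(countedRule)
| stats count(*) as hits by countedRule
| sort hits desc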

This tells me exactly which rules would have fired and how often. The insights are gold.
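If your web ACL logs to S3 or Firehose instead, or you just want to slice an export offline, the same tally is a few lines of Python. A sketch, assuming a simplified version of the AWS WAF JSON log format with one record per line, where matches inside a managed rule group appear under that group's ruleGroupList entry. tally_counted_rules and sample_record are made up for illustration, so check the field names against your own logs:

```python
import json
from collections import Counter
from typing import Iterable, Optional


def tally_counted_rules(log_lines: Iterable[str]) -> Counter:
    """Count matches per rule inside managed rule groups, from JSON log lines.

    Assumes a simplified AWS WAF log record where rule group matches appear
    under ruleGroupList[*].terminatingRule and
    ruleGroupList[*].nonTerminatingMatchingRules.
    """
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in an export
        record = json.loads(line)
        for group in record.get("ruleGroupList") or []:
            terminating = group.get("terminatingRule")
            if terminating:
                counts[terminating["ruleId"]] += 1
            for rule in group.get("nonTerminatingMatchingRules") or []:
                counts[rule["ruleId"]] += 1
    return counts


def sample_record(rule_id: Optional[str]) -> str:
    """Hypothetical, heavily trimmed log line; real records carry many more fields."""
    match = {"ruleId": rule_id, "action": "BLOCK"} if rule_id else None
    group = {
        "ruleGroupId": "AWS#AWSManagedRulesCommonRuleSet",
        "terminatingRule": match,
        "nonTerminatingMatchingRules": [],
    }
    return json.dumps({"action": "ALLOW", "ruleGroupList": [group]})


lines = [sample_record("GenericRFI_BODY"), sample_record("GenericRFI_BODY"), sample_record(None)]
for rule_id, hits in tally_counted_rules(lines).most_common():
    print(rule_id, hits)  # GenericRFI_BODY 2
```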

What I Found (And You Probably Will Too)

In my last implementation for a SaaS platform serving 50,000+ daily users:

GenericRFI_BODY triggered 847 times, all of them false positives from our rich text editor
SizeRestrictions_BODY would have blocked legitimate file upload requests
CrossSiteScripting_BODY flagged our own HTML email templates being previewed

Without COUNT mode, I would have broken core product features on day one.

The Right Way: Layered Defense with Surgical Exceptions

After your observation period, here’s the architecture I’ve battle-tested across five production environments:

Layer 1: Rate Limiting

Before you even think about managed rules, implement rate limiting. In my experience, this is where you get 80% of your protection with near-zero false positives.

{
  "Name": "RateLimitGeneral",
  "Priority": 0,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 2000,
      "AggregateKeyType": "IP"
    }
  },
  "Action": {
    "Block": {
      "CustomResponse": {
        "ResponseCode": 429
      }
    }
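  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "RateLimitGeneral"
  }
}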

I set this to 2,000 requests per 5 minutes per IP. Adjust based on your traffic patterns, but start with a generous limit and tighten it once your logs show what normal looks like. Real users don’t make 2,000 requests in 5 minutes. Bots do.

Real impact: This single rule blocked 2.3 million malicious requests last quarter for one of my clients. That’s 2.3 million fewer requests that needed deep inspection.
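To build intuition for what a 2,000-per-5-minutes limit does, here is an idealized sliding-window model of a per-IP rate limit. It is a sketch, not how AWS WAF implements rate-based rules: AWS WAF estimates request rates, so real blocking starts and stops with some lag. simulate_rate_limit and the sample traffic (from documentation IP ranges) are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

WINDOW_SECONDS = 300  # 5-minute evaluation window, as in RateLimitGeneral
LIMIT = 2000          # requests allowed per IP per window


def simulate_rate_limit(
    requests: Iterable[Tuple[float, str]],
    limit: int = LIMIT,
    window: float = WINDOW_SECONDS,
) -> List[bool]:
    """Return True (blocked) or False (allowed) for each (timestamp, ip) request.

    Idealized model: requests must be in time order per IP. A request is
    blocked when its IP already made `limit` requests in the trailing
    `window` seconds. Blocked requests still count toward the rate in this
    model, so a bot that keeps hammering stays blocked.
    """
    history: Dict[str, Deque[float]] = defaultdict(deque)
    decisions: List[bool] = []
    for ts, ip in requests:
        recent = history[ip]
        while recent and ts - recent[0] >= window:
            recent.popleft()  # drop requests that aged out of the window
        decisions.append(len(recent) >= limit)
        recent.append(ts)
    return decisions


bot = [(i * 0.04, "203.0.113.7") for i in range(2500)]   # 2,500 requests in 100 s
human = [(i * 6.0, "198.51.100.20") for i in range(50)]  # 50 requests in 5 min
decisions = simulate_rate_limit(bot + human)
print("bot blocked:", sum(decisions[:2500]))    # bot blocked: 500
print("human blocked:", sum(decisions[2500:]))  # human blocked: 0
```

The bot gets its first 2,000 requests through and is blocked after that; the human never comes close to the limit.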

Layer 2: Endpoint-Specific Rate Limits

Your login endpoint needs different protection than your static assets. This is obvious, yet I constantly see blanket rules applied everywhere.

{
  "Name": "LoginRateLimit",
  "Priority": 1,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 20,
      "AggregateKeyType": "IP",
      "ScopeDownStatement": {
        "ByteMatchStatement": {
          "SearchString": "/api/auth/login",
          "FieldToMatch": {
            "UriPath": {}
          },
          "TextTransformations": [{
            "Priority": 0,
            "Type": "LOWERCASE"
          }],
          "PositionalConstraint": "EXACTLY"
        }
      }
    }
  },
  "Action": {
    "Block": {}
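  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "LoginRateLimit"
  }
}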

20 login attempts per 5 minutes per IP is generous for legitimate users, brutal for credential stuffing attacks.
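Once you have more than one sensitive endpoint, generating these rules beats copy-pasting JSON. A minimal sketch: build_scoped_rate_rule is a hypothetical helper that returns the same rule shape as LoginRateLimit, plus the VisibilityConfig block that every rule in a web ACL requires. It lowercases the path because the LOWERCASE transformation applies to the request's URI, not to the search string.

```python
import json
from typing import Any, Dict


def build_scoped_rate_rule(name: str, priority: int, path: str, limit: int) -> Dict[str, Any]:
    """Rate-based rule (per IP, default 5-minute window) scoped to one exact URI path."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                "AggregateKeyType": "IP",
                "ScopeDownStatement": {
                    "ByteMatchStatement": {
                        # The URI is lowercased before matching, so the search string must be too.
                        "SearchString": path.lower(),
                        "FieldToMatch": {"UriPath": {}},
                        "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
                        "PositionalConstraint": "EXACTLY",
                    }
                },
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


login_rule = build_scoped_rate_rule("LoginRateLimit", 1, "/api/auth/login", 20)
print(json.dumps(login_rule, indent=2))
```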

Layer 3: Managed Rules with Calculated Exceptions

Only now do I enable managed rule groups. But here’s the critical part: I create exceptions based on my COUNT mode observations.

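{
  "Name": "CoreRuleSet-With-Exceptions",
  "Priority": 10,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesCommonRuleSet",
      "RuleActionOverrides": [
        {
          "Name": "GenericRFI_BODY",
          "ActionToUse": { "Count": {} }
        },
        {
          "Name": "SizeRestrictions_BODY",
          "ActionToUse": { "Count": {} }
        }
      ]
    }
  },
  "OverrideAction": {
    "None": {}
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "CoreRuleSetWithExceptions"
  }
}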

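Excluding rules isn’t “weakening” security if you’re doing it based on data. It’s making security actually work in your environment. And an exception set to Count doesn’t switch the rule off: it still evaluates and logs every match, so you can keep checking whether those hits stay false positives.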
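One way to keep exceptions tied to data is to generate them from the COUNT-mode tallies. A sketch, assuming you've reviewed the sampled requests and recorded which rules are confirmed false positives; overrides_from_observations is a hypothetical helper. It emits RuleActionOverrides entries, which the AWS WAF API recommends over the older ExcludedRules field; both put the rule in Count rather than removing it.

```python
from typing import Any, Dict, List, Mapping, Set


def overrides_from_observations(
    counted_hits: Mapping[str, int],
    confirmed_false_positives: Set[str],
) -> List[Dict[str, Any]]:
    """Build RuleActionOverrides entries that switch confirmed false positives to Count.

    Refuses to override a rule that never matched during observation, since
    there is no data behind that exception.
    """
    overrides: List[Dict[str, Any]] = []
    for rule in sorted(confirmed_false_positives):
        if counted_hits.get(rule, 0) == 0:
            raise ValueError(f"{rule} never matched during observation; no data to justify an override")
        overrides.append({"Name": rule, "ActionToUse": {"Count": {}}})
    return overrides


# Illustrative tallies: only the 847 for GenericRFI_BODY comes from the story above.
observed = {"GenericRFI_BODY": 847, "SizeRestrictions_BODY": 12, "CrossSiteScripting_BODY": 3}
print(overrides_from_observations(observed, {"GenericRFI_BODY", "SizeRestrictions_BODY"}))
```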

Layer 4: Custom Rules for Your Specific Threats

This is where you need to think about YOUR application, not generic threats.

For an e-commerce client, we added:

{
  "Name": "BlockCheckoutBots",
  "Priority": 5,
  "Statement": {
    "AndStatement": {
      "Statements": [
        {
          "ByteMatchStatement": {
            "SearchString": "/checkout",
            "FieldToMatch": {
              "UriPath": {}
            },
            "TextTransformations": [{
              "Priority": 0,
              "Type": "LOWERCASE"
            }],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "NotStatement": {
            "Statement": {
              "ByteMatchStatement": {
                "SearchString": "mozilla/5.0",
                "FieldToMatch": {
                  "SingleHeader": {
                    "Name": "user-agent"
                  }
                },
                "TextTransformations": [{
                  "Priority": 0,
                  "Type": "LOWERCASE"
                }],
                "PositionalConstraint": "CONTAINS"
              }
            }
          }
        }
      ]
    }
  },
  "Action": {
    "Block": {}
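  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "BlockCheckoutBots"
  }
}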

This blocked checkout bots that were sniping limited inventory during product drops. Generic WAF rules wouldn’t catch this.
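One subtlety in rules like this: a text transformation applies to the request component before the comparison, never to your search string. With LOWERCASE in place, a search string containing capital letters (such as Mozilla/5.0) can never match, and wrapped in a NotStatement that turns the rule into “block every checkout request”. A rough model of the behavior; contains_match is illustrative, not an AWS API:

```python
def contains_match(field_value: str, search_string: str, lowercase: bool) -> bool:
    """Rough model of a CONTAINS byte match with an optional LOWERCASE transformation.

    The transformation is applied to the request field only; the search
    string is compared exactly as written.
    """
    inspected = field_value.lower() if lowercase else field_value
    return search_string in inspected


chrome_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
print(contains_match(chrome_ua, "Mozilla/5.0", lowercase=True))  # False: every browser looks like a bot
print(contains_match(chrome_ua, "mozilla/5.0", lowercase=True))  # True
print(contains_match("python-requests/2.31", "mozilla/5.0", lowercase=True))  # False
```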

Monitoring: The Difference Between Success and Disaster

Here’s what keeps me up at night: silent failures, where your WAF blocks legitimate traffic and nobody notices until customers complain.

I set up three critical alerts:

1. Sudden Spike in Blocks

fields @timestamp
| filter action = "BLOCK"
| stats count() as blockCount by bin(5m)
| filter blockCount > 100

If blocks jump above baseline, something changed. Maybe you’re under attack (good catch!). Maybe you deployed new code that triggers rules (bad!).
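A Logs Insights query only tells you something when someone runs it. To turn this into an actual alert, one option is a CloudWatch alarm on the web ACL's BlockedRequests metric in the AWS/WAFV2 namespace. A sketch that builds the keyword arguments for boto3's put_metric_alarm, assuming a regional web ACL (for example, on an ALB or API Gateway); blocked_requests_alarm, the alarm name, and the 100-block threshold are placeholders, and you would still add AlarmActions pointing at your own SNS topic:

```python
from typing import Any, Dict


def blocked_requests_alarm(web_acl_name: str, region: str, threshold: int = 100) -> Dict[str, Any]:
    """Keyword arguments for CloudWatch PutMetricAlarm on a web ACL's blocked requests.

    Fires when more than `threshold` requests are blocked in a 5-minute period.
    The Rule dimension value "ALL" aggregates every rule in the web ACL.
    """
    return {
        "AlarmName": f"{web_acl_name}-blocked-requests-spike",
        "Namespace": "AWS/WAFV2",
        "MetricName": "BlockedRequests",
        "Dimensions": [
            {"Name": "WebACL", "Value": web_acl_name},
            {"Name": "Region", "Value": region},
            {"Name": "Rule", "Value": "ALL"},
        ],
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # Periods with no blocked requests may have no datapoints; treat them as healthy.
        "TreatMissingData": "notBreaching",
    }


alarm = blocked_requests_alarm("prod-web-acl", "us-east-1")
# With credentials configured:
#   import boto3
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**alarm)
```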

2. New Rules Triggering

filter action = "BLOCK"
| stats count(*) as blocks, earliest(@timestamp) as firstSeen by terminatingRuleId
| sort firstSeen desc
| limit 20

I check this daily. New rules appearing in blocks mean new attack patterns or new false positives.
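The judgment call in that daily check is “new compared to what?” A sketch of the comparison, assuming you keep per-rule block counts for a baseline period and for today over comparable time spans; rules_to_review and its 3x growth factor are hypothetical choices to tune:

```python
from typing import List, Mapping


def rules_to_review(
    baseline: Mapping[str, int],
    today: Mapping[str, int],
    growth: float = 3.0,
) -> List[str]:
    """Flag rules that are blocking for the first time, or far more than usual.

    Both mappings hold block counts per rule ID over comparable time spans
    (for example, a daily average from the last two weeks versus today).
    """
    flagged = []
    for rule_id, blocks in today.items():
        usual = baseline.get(rule_id, 0)
        if usual == 0 and blocks > 0:
            flagged.append(rule_id)  # never blocked anything during the baseline
        elif usual > 0 and blocks > growth * usual:
            flagged.append(rule_id)  # volume jumped past the growth factor
    return sorted(flagged)


baseline = {"RateLimitGeneral": 400, "AWSIPReputationList": 120}
today = {"RateLimitGeneral": 450, "AWSIPReputationList": 900, "CoreRuleSet-With-Exceptions": 35}
print(rules_to_review(baseline, today))  # ['AWSIPReputationList', 'CoreRuleSet-With-Exceptions']
```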

3. Geographic Anomalies

If 90% of your blocks suddenly come from a country you don’t serve, you’re under attack. If 90% of your blocks suddenly come from a country where you just launched, you have a false positive problem.
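AWS WAF log records carry the client's two-letter country code in httpRequest.country, so this check is straightforward to automate. A sketch over already-parsed log records; classify_geo_anomaly, the minimum sample size, and the sample records are assumptions to tune, and the 90% threshold comes from the rule of thumb above:

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Optional, Set, Tuple


def classify_geo_anomaly(
    records: Iterable[Mapping[str, Any]],
    served_countries: Set[str],
    share_threshold: float = 0.9,
    min_blocks: int = 50,
) -> Optional[Tuple[str, str]]:
    """Return (country, verdict) when one country dominates blocked requests, else None."""
    blocked_by_country = Counter(
        r["httpRequest"]["country"] for r in records if r.get("action") == "BLOCK"
    )
    total = sum(blocked_by_country.values())
    if total < min_blocks:
        return None  # too few blocks to call anything a pattern
    country, blocks = blocked_by_country.most_common(1)[0]
    if blocks / total < share_threshold:
        return None
    if country in served_countries:
        return country, "possible false positives: you have real users there"
    return country, "likely attack traffic: you do not serve this country"


records = [{"action": "BLOCK", "httpRequest": {"country": "BR"}}] * 95
records += [{"action": "BLOCK", "httpRequest": {"country": "US"}}] * 5
records += [{"action": "ALLOW", "httpRequest": {"country": "US"}}] * 1000
print(classify_geo_anomaly(records, served_countries={"US"}))
print(classify_geo_anomaly(records, served_countries={"US", "BR"}))
```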

The Cost Reality Check

Let’s talk money, because WAF costs add up fast if you’re not careful.

What you pay for:

Per Web ACL: $5/month (cheap)
Per rule: $1/month (still cheap)
Per million requests: $0.60 (this is where it hurts)

My staging environment for a medium-traffic app costs about $45/month. Production runs $800-1,200/month depending on attack volume.

Here’s my cost optimization strategy:

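Don’t inspect everything. Your health check endpoint getting hit 100,000 times a day doesn’t need deep inspection, so scope your rules. Be clear about what that saves, though: the $0.60 per million fee applies to every request the web ACL evaluates, scoped or not. The savings come from paid managed rule groups such as Bot Control, which add a charge for each request they inspect; a scope-down statement keeps those health checks off that meter.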
Use rule groups efficiently. Five separate rules cost $5/month. One rule group with five rules costs $1/month.
Monitor request counts. Set a CloudWatch alarm when WAF-inspected requests exceed your budget threshold.
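The three list prices above are enough for a back-of-the-envelope estimate, and inverting the formula gives the monthly request count at which you exceed a given budget, a natural threshold for that CloudWatch alarm. A sketch using only those three prices; monthly_waf_cost and request_budget are hypothetical helpers that deliberately ignore paid managed rule groups such as Bot Control, logging costs, and future price changes, so check the current AWS WAF pricing page before trusting the numbers:

```python
WEB_ACL_USD_PER_MONTH = 5.00
RULE_USD_PER_MONTH = 1.00
USD_PER_MILLION_REQUESTS = 0.60


def monthly_waf_cost(web_acls: int, rules: int, million_requests: float) -> float:
    """Estimated monthly bill from the three list prices above."""
    fixed = web_acls * WEB_ACL_USD_PER_MONTH + rules * RULE_USD_PER_MONTH
    return fixed + million_requests * USD_PER_MILLION_REQUESTS


def request_budget(budget_usd: float, web_acls: int, rules: int) -> float:
    """Millions of requests per month you can afford before exceeding budget_usd."""
    fixed = web_acls * WEB_ACL_USD_PER_MONTH + rules * RULE_USD_PER_MONTH
    return max(0.0, (budget_usd - fixed) / USD_PER_MILLION_REQUESTS)


# One web ACL with seven rules (illustrative traffic and budget):
print(round(monthly_waf_cost(1, 7, 100), 2))  # 72.0 for 100 million requests
print(round(request_budget(1000, 1, 7), 1))   # 1646.7 million requests fit in $1,000
```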

Common Mistakes I’ve Seen (And Made)

Mistake #1: Enabling All AWS Managed Rule Groups

The “more is better” approach. I tried this. Our WAF bill hit the roof in one month. Most of those rules were irrelevant to our application.

Fix: Start with Core Rule Set only. Add Known Bad Inputs if you’re paranoid. Add others based on actual threats you observe.

Mistake #2: Blocking Instead of Challenging

AWS WAF can send CAPTCHA challenges instead of hard blocks. This is massively underutilized.

{
  "Action": {
    "Captcha": {
      "CustomRequestHandling": {
        "InsertHeaders": [
          {
            "Name": "x-waf-action",
            "Value": "challenge"
          }
        ]
      }
    }
  }
}

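For suspicious-but-not-certain traffic, challenge it. Real users can solve CAPTCHAs; most bots won’t. One detail: AWS WAF prefixes the names of headers it inserts with x-amzn-waf-, so your origin receives this one as x-amzn-waf-x-waf-action.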

Mistake #3: Not Testing Changes

I cannot stress this enough: test WAF changes in staging first. Not just “does the site load” testing. Real load testing with production-like traffic patterns.

I use Artillery to run production-like traffic patterns against staging with different WAF configurations:

config:
  target: 'https://staging.example.com'
  phases:
    - duration: 300
      arrivalRate: 50
scenarios:
  - name: "Normal user flow"
    flow:
      - get:
          url: "/api/products"
      - post:
          url: "/api/cart"
          json:
            productId: "{{ $randomNumber(1, 1000) }}"

Mistake #4: Forgetting About IP Reputation

AWS IP Reputation Lists come with no subscription fee (you pay only the standard per-rule and per-request charges), and they’re effective. Enable them.

{
  "Name": "AWSIPReputationList",
  "Priority": 2,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesAmazonIpReputationList"
    }
  },
  "OverrideAction": {
    "None": {}
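  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AWSIPReputationList"
  }
}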

This blocks known malicious IPs before they even reach your custom rules. It’s blocked 15-20% of attack traffic in my experience.

When WAF Isn’t Enough

Unpopular opinion: AWS WAF alone isn’t complete application security.

You still need:

Input validation in your application code
Rate limiting at the application level for business logic abuse
CSRF tokens for state-changing operations
Proper authentication (WAF doesn’t replace this)

I’ve seen developers think WAF means they can skip input sanitization. No. Defense in depth means layers. WAF is one layer.

My Standard WAF Configuration Template

After three years and dozens of implementations, here’s my starting template for most applications:

Priority 0: Rate limit (2000 req/5min per IP)
Priority 1: Login endpoint rate limit (20 req/5min per IP)
Priority 2: AWS IP Reputation List
Priority 3: Geo-blocking if applicable
Priority 5: Custom rules for application-specific threats
Priority 10: Core Rule Set with exceptions
Priority 15: Known Bad Inputs (in COUNT mode initially)

This order matters. Cheap, effective rules first. Expensive rules last.
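Two properties this template depends on: AWS WAF evaluates rules from the lowest priority number up and stops at the first terminating action, and every rule in a web ACL needs its own priority. A small planning check you could run in CI before deploying; it is a sketch with hypothetical names (PlannedRule, evaluation_order, and GeoBlock as a placeholder for the optional geo rule):

```python
from typing import List, NamedTuple


class PlannedRule(NamedTuple):
    priority: int
    name: str
    action: str  # "Block", or "Count" while you are still observing


# The template above; GeoBlock is a placeholder name for the optional geo rule.
TEMPLATE = [
    PlannedRule(0, "RateLimitGeneral", "Block"),
    PlannedRule(1, "LoginRateLimit", "Block"),
    PlannedRule(2, "AWSIPReputationList", "Block"),
    PlannedRule(3, "GeoBlock", "Block"),
    PlannedRule(5, "BlockCheckoutBots", "Block"),
    PlannedRule(10, "CoreRuleSet-With-Exceptions", "Block"),
    PlannedRule(15, "KnownBadInputs", "Count"),
]


def evaluation_order(rules: List[PlannedRule]) -> List[str]:
    """Names in the order AWS WAF evaluates them: lowest priority number first."""
    priorities = [rule.priority for rule in rules]
    if len(priorities) != len(set(priorities)):
        raise ValueError("each rule in a web ACL needs a unique priority")
    return [rule.name for rule in sorted(rules, key=lambda rule: rule.priority)]


print(evaluation_order(list(reversed(TEMPLATE))))
```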

The Bottom Line

AWS WAF is not a checkbox on your security compliance form. It’s a tool that requires tuning, monitoring, and ongoing maintenance.

The teams that get WAF right:

Start with observation, not protection
Build exceptions based on data, not fear
Monitor continuously
Test changes thoroughly
Understand their application’s actual threat model

The teams that get it wrong treat it like antivirus software from 2005: install and forget.

My final piece of advice: Budget time for WAF maintenance. I spend 2-3 hours monthly reviewing logs, adjusting rules, and checking for new threats. That small investment has prevented multiple outages and caught attacks early.

Your mileage may vary, but if you follow this approach, you’ll avoid the painful lessons I learned. Your WAF will actually protect you instead of collecting dust or, worse, breaking your application.