Context
As part of an internal WAF module for Cloudflare in Terraform, I had to implement a rule to block bots from accessing and scraping certain domains. Since it is a module, the implementation doesn't know in advance which domains will be specified, so the resource definition has to be generic enough to accommodate any number of domains.
Solution
After several iterations and failures, I landed on an implementation leveraging the following Terraform string functions:
- formatlist: Produces a list of strings by applying a format to a list of input values
- join: Produces a single string by concatenating multiple strings
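To get a feel for the two functions before combining them, they can be evaluated in isolation, for example in `terraform console` or in a throwaway locals block. The following is a minimal sketch; the local names and input values are made up purely for illustration.

```hcl
locals {
  # Hypothetical input values, used only for illustration.
  names = ["alpha", "beta"]

  # formatlist applies the format string to every element:
  # ["Hello, alpha!", "Hello, beta!"]
  greetings = formatlist("Hello, %s!", local.names)

  # join concatenates the elements with the given separator:
  # "Hello, alpha! and Hello, beta!"
  sentence = join(" and ", local.greetings)
}
```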
Building a filter matching a variable number of domains
To build the section of the filter that matches a single domain, we can use:
"http.host contains \"example\""
Since the number of domains is only known when the module is invoked, we need an expression that matches incoming traffic trying to reach any of the domains. This is where the formatlist function comes into play:
formatlist("http.host contains \"%s\"", var.domains_without_bots)
The expression above will generate a new list where each string is of the form (the backslashes are only HCL escapes and are not part of the resulting value):
http.host contains "<domain>"
The above list is not a valid filter expression. We need to combine these strings into a coherent filter expression by using the join function:
join(" or ", formatlist("http.host contains \"%s\"", var.domains_without_bots))
For two domains, this would generate:
http.host contains "<domain1>" or http.host contains "<domain2>"
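As a worked example, the two steps can be split into intermediate locals and the result inspected through an output. This is only a sketch for experimenting outside the module: the local names and the output name are hypothetical, and the domains are sample values.

```hcl
locals {
  # Sample domains, standing in for var.domains_without_bots.
  domains_without_bots = ["example.com", "test.example.io"]

  # Step 1: one "http.host contains ..." clause per domain.
  host_clauses = formatlist("http.host contains \"%s\"", local.domains_without_bots)

  # Step 2: combine the clauses into a single expression.
  host_filter = join(" or ", local.host_clauses)
}

# Exposes the rendered string so it shows up in `terraform plan` / `terraform apply`.
output "host_filter" {
  value = local.host_filter
}
```

With these sample values, the rendered string should read `http.host contains "example.com" or http.host contains "test.example.io"`.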
Lastly, the filter should only match when the traffic is coming from a bot, so the filter expression becomes:
"(cf.client.bot and (${join(" or ", formatlist("http.host contains \"%s\"", var.domains_without_bots))}))"
Final Terraform definition
Combining everything above, the filter definition looks as follows:
resource "cloudflare_filter" "domains_without_bots" {
  zone_id     = var.zone_id
  description = "Filter bots trying to access domains: ${join(", ", var.domains_without_bots)}"
  expression  = "(cf.client.bot and (${join(" or ", formatlist("http.host contains \"%s\"", var.domains_without_bots))}))"
}
The filter is then used as part of the Cloudflare firewall rule definition by referencing the filter ID dynamically:
resource "cloudflare_firewall_rule" "domains_without_bots" {
  zone_id     = var.zone_id
  description = "Block bots trying to access domains: ${join(", ", var.domains_without_bots)}"
  filter_id   = cloudflare_filter.domains_without_bots.id
  action      = "block"
}
Invoking the module would then look like:
module "waf" {
  source  = "/path/to/cloudflare/waf"
  zone_id = var.zone_id

  domains_without_bots = [
    "example.com",
    "test.example.io"
  ]
}