Azure IaC · Part 2 of 2 · Functions, Scripting & Drift

Functions, Scripting & the Truth About Drift Detection

Part 1 covered how each tool talks to Azure. Part 2 gets practical: multi-cloud reach, how ARM, Bicep, and Terraform functions resolve, running Bash and PowerShell mid-deployment, Bicep's rough edges, and why terraform plan and What-If both fall short as production drift detection.

Multi-cloud usage

ARM and Bicep can only be used with Azure, although Bicep extensions have recently been made available for external use and may well expand in the future. Terraform, on the other hand, can be used with other platforms such as AWS, GCP and beyond. This capability is one of the most compelling reasons to opt for Terraform. From a practitioner's perspective, it eliminates the need to learn a completely new declarative syntax every time you interact with a different cloud provider. Especially for engineers jumping between AWS and Azure in their day-to-day work, using a single tool like Terraform makes a lot of sense.

ARM/Bicep functions vs. Terraform functions

While there are syntactical differences, ARM, Bicep and Terraform functions are quite similar in usage and purpose, and in some cases even their syntax is exactly the same, as in the case of the concat() function. The significant difference is that Terraform functions are resolved on the client side, whereas ARM and Bicep functions are resolved by the ARM engine.

Implicit Resource referencing

When writing Infrastructure as Code, both Terraform and Bicep allow you to reference another resource simply by using its symbolic name directly in your code. This mechanism is known as implicit resource referencing. ARM templates do not have this capability: you either need to use the resourceId() function or write out the full Azure resource ID of the resource you want to reference.

• ARM Template

"id": "[resourceId('Microsoft.Network/virtualNetworks/subnets', parameters('vnetName'), 'mySubnet')]"

• Bicep / Terraform

subnetId: mySubnet.id

Apart from the syntactical advantage, implicit referencing lets Bicep and Terraform figure out dependencies automatically, even if you do not populate their dependency blocks. With ARM templates this is not possible: you have to fill in the dependency blocks explicitly to let the ARM engine know the deployment order. This is another complexity that users long complained about in ARM templates, and it was one of the major motivations behind Microsoft's decision to build Bicep.
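To make the idea of automatic dependency inference concrete, here is a minimal Python sketch. It is not how Bicep or Terraform are implemented internally. The resource names, the property expressions, and the infer_dependencies helper are all hypothetical, and the sketch assumes references always take the form symbolicName.id.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical declarations: symbolic name -> property expressions.
resources = {
    "nic": {"subnetId": "mySubnet.id"},
    "mySubnet": {"virtualNetworkId": "vnet.id"},
    "vnet": {"name": "'my-vnet'"},
}

def infer_dependencies(resources):
    """Map each resource to the set of declared resources it references."""
    deps = {}
    for name, props in resources.items():
        refs = set()
        for expr in props.values():
            # Assumption: a reference always looks like <symbolicName>.id
            for symbol in re.findall(r"\b([A-Za-z_]\w*)\.id\b", expr):
                if symbol in resources and symbol != name:
                    refs.add(symbol)
        deps[name] = refs
    return deps

deps = infer_dependencies(resources)

# TopologicalSorter yields every resource after the resources it depends on.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['vnet', 'mySubnet', 'nic']
```

Although nic is declared first, the inferred order deploys vnet, then mySubnet, then nic. That is the bookkeeping explicit dependency blocks would otherwise make you do by hand.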

Bash/PowerShell scripting within deployments

When your deployment strategy requires executing custom Bash or PowerShell scripts mid-flight, ARM/Bicep and Terraform handle the orchestration through fundamentally different network pathways.

Terraform lets you do it with the local-exec provisioner, which is quite handy. Because the Terraform process runs on your CI/CD runners, the scripts run there too. If those runners have private IPs from your company network, your Bash scripts will run without being blocked by security rules that reject anything other than company-internal IPs.

ARM and Bicep also let you run your Bash/PowerShell scripts, but they do not run them on your CI/CD pipeline runners. To run scripts within ARM and Bicep, you need to use the “Deployment Scripts” resource (Microsoft.Resources/deploymentScripts). The purpose of this resource is to run Bash/PowerShell scripts within an Azure Container Instance. Until 2024, it was not possible to assign a private IP to these container instances, so your script would get blocked by company policies requiring a private IP. This is now resolved: with the newer API version of Deployment Scripts, it's possible to run your scripts over private IPs.

Bicep issues

Since its inception Bicep has matured quite well, but it’s still possible to hit bugs in edge cases. You can track these issues in the official Bicep repository. If you encounter a blocking parsing error or an API-specific compiler issue, you can file a bug report and, until a patch is released, temporarily fall back to raw ARM templates for that specific resource, since Bicep ultimately transpiles down to the standard ARM format anyway.

Terraform Plan and drift detection

A fundamental architectural divergence between Terraform and native tools like ARM or Bicep is the reliance on a state file. Terraform maps your live Azure estate directly into these state files by querying the Azure REST API during execution. Historically, a major selling point of Terraform architecture has been inherent drift detection via terraform plan.

When you run “terraform plan”, Terraform first refreshes its state by reading the current resources from Azure (this is the default behavior) and then compares your deployment code against that refreshed state. If a deviation is found (because, for example, someone manually tampered with a resource through the Azure Portal outside of the CI/CD pipeline), the plan flags the difference. In an idealized GitOps pipeline, any manual modification made outside the repository is supposed to be caught by the plan and reverted to the codebase's source of truth on the next terraform apply. While this promise sounds flawless in theory, in practice you frequently run into a "ghost change" nightmare. Standard Azure behaviors, such as automated etag rotations, read-only property updates by the Azure backend, or shifting array object orders (such as subnets in VNets), constantly trick Terraform into reporting dozens of non-existent drifts, and suddenly your small deployment shows 30 "changes" that are NOT real.
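The ghost-change causes named above (rotating etags, backend-managed read-only properties, reordered arrays) can be illustrated with a small Python sketch. This is not Terraform's actual diff logic. The READ_ONLY_KEYS set, the sample VNet dictionaries, and both helper functions are hypothetical, and treating every list as order-insensitive is a simplifying assumption that would be wrong for properties where order matters.

```python
import json

# Assumption: server-managed fields to skip; a real list depends on the resource type.
READ_ONLY_KEYS = {"etag", "provisioningState"}

def normalize(value):
    """Drop read-only keys and sort lists so element order does not matter."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items() if k not in READ_ONLY_KEYS}
    if isinstance(value, list):
        items = [normalize(v) for v in value]
        # Sort by a canonical JSON form so dictionaries become comparable.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value

def has_real_drift(desired, live):
    """True only when the normalized configurations differ."""
    return normalize(desired) != normalize(live)

desired = {
    "name": "my-vnet",
    "subnets": [
        {"name": "a", "prefix": "10.0.0.0/24"},
        {"name": "b", "prefix": "10.0.1.0/24"},
    ],
}

# Same configuration as returned by the backend: new etags, subnets reordered.
live = {
    "name": "my-vnet",
    "etag": "W/\"123\"",
    "subnets": [
        {"name": "b", "prefix": "10.0.1.0/24", "etag": "W/\"456\""},
        {"name": "a", "prefix": "10.0.0.0/24", "etag": "W/\"789\""},
    ],
}

# A genuine manual change: the prefix of subnet "b" was edited.
tampered = {
    "name": "my-vnet",
    "subnets": [
        {"name": "a", "prefix": "10.0.0.0/24"},
        {"name": "b", "prefix": "10.0.9.0/24"},
    ],
}

print(has_real_drift(desired, live))      # False
print(has_real_drift(desired, tampered))  # True
```

A naive field-by-field comparison of desired and live would report several differences even though nothing meaningful changed. Only the tampered case is real drift.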

Far from benign noise, these false positives actively break deployment pipelines. For example, a ghost change detected on a role assignment leads to a re-deployment of that role assignment, but a role assignment cannot be re-deployed under the same name (which is a GUID). So when you run “terraform apply”, Azure responds with an error saying that a role assignment with that name already exists. And you find yourself quite annoyed that even a simple deployment is not working! Most teams end up sprinkling ignore_changes throughout their Terraform files just to get a clean “terraform plan”.

Beyond false positives, state-driven drift detection suffers from deeper structural limitations. Consider this enterprise pattern: deploying Azure API Management (APIM) that utilizes a System-Assigned Managed Identity to pull SSL certificates from an Azure Key Vault. This architecture dictates the deployment of APIM twice within the same deployment:

First deployment: A simple API Management instance with a System Managed Identity is deployed in the first step, followed by the secret reader role assignment to that System Managed Identity.

Second deployment: The same API Management instance is re-deployed in the next step, now containing the certificate properties as well. The System Managed Identity is now able to read the certificates from Key Vault as secrets.

For such multi-stage patterns, state-file-based drift detection simply breaks! You would then have two different objects with the same resource ID in your state file, which means the resource will always show as drifted when you run “terraform plan”, and if you mistakenly run terraform apply, it will break your APIM instantly. Because of this and the issues mentioned above, people dig into their state files, exclude fields from tracking, or even remove the whole resource from the state file directly. The moment you begin selectively purging objects from state files or blinding resource tracking, drift detection becomes a complete fairy tale.

Native Azure alternatives like Bicep and ARM offer a What-If deployment operation. However, What-If introduces its own set of noisy false positives and structural parsing gaps, which prevent it from serving as a dependable, production-grade drift detection mechanism either.

Ultimately, defining drift from static deployment snapshots, whether via terraform plan or via What-If, is an incomplete strategy. A robust, enterprise-ready drift management framework requires continuous backend monitoring modeled after modern desired state engines. True compliance requires instantaneous alerts and knowing not just what changed but also who changed it and exactly when, along with a safe, automated path to remediation.

For environments running complex multi-repository strategies across ARM, Bicep, and Terraform, Clophi is currently the only tool that addresses this precise engineering challenge. Using an asynchronous monitoring plane, Clophi tracks live infrastructure accurately across hundreds of thousands of resources within a tenant, resolving real-time platform drift reliably without introducing the pipeline-blocking noise described above. Interested readers can find the details at clophi.com/drift-detection.

Who uses what, and for what reason?

The adoption of an Infrastructure as Code (IaC) framework within the Azure ecosystem is rarely driven by syntax preference alone. Instead, it is almost always dictated by an organization's architectural history, existing skill sets, and migration path.

Enterprises use mostly Bicep (started with ARM)

It's a known fact that Azure has a large footprint in the enterprise cloud landscape. 95% of Fortune 500 companies already use Azure, a significant proportion of them as their primary provider. Enterprise IT systems were integrated with Microsoft technologies like Active Directory and Windows Server long before the cloud, and this integration has probably been the main reason why many enterprises chose, and still choose, Azure as their cloud provider. As Azure happened to be their first cloud provider, they all started with ARM (if not Azure Classic through PowerShell). When Microsoft created Bicep, its simplicity attracted many, and over time these organizations have steadily refactored their repositories into native Bicep configurations. Even if these large companies later embrace multi-cloud architectures, Azure usually represents the vast majority of their core digital assets. Consequently, if you get hired by a large, established enterprise, you are very likely to encounter a Bicep (sometimes ARM) driven code base, and you will be required to use these tools as well.

Migrated enterprises — mostly Terraform

In my experience, enterprises that entered the Azure ecosystem later in their life cycle, often driven by a corporate mandate to migrate from AWS or Google Cloud, heavily favor Terraform. This is because the workforce of such enterprises is already well versed in Terraform from their previous cloud providers, and the decision makers don't want to burden them with a new Azure-specific declarative syntax. They mostly use AzureRM, and when a deployment hits a blocker with AzureRM they opt for AzAPI.

Small to medium businesses — a mix

In the SMB space, the landscape is highly fragmented. While many try to stick to Bicep to enjoy its lightweight tooling, others adopt Terraform to preserve future multi-cloud flexibility.

What should we use?

As a matter of fact, you (or I) will use whatever the company we work for has decided to use. Most of the time there is not much choice. But for your own use, it's wise to stick to what you like until you hit a blocker; then you can check what options the other tools offer. Such blockers can and will happen when you get into more complex deployments, applications or systems. It's also clear that Bicep and Terraform offer a visibly better user experience than ARM. On the other hand, ARM strictly follows the Azure REST API structure, which you need to master if you want to dive really deep into the mechanics of Azure.

If you are in a decision-making position for a company's future Azure use, consider your workforce, their past knowledge, and which of the blockers mentioned above matter most to you, and make your decision accordingly.

A different approach to Azure IaC

Throughout my career, I have constantly found myself jumping between different enterprise clients inheriting whatever legacy tool, state file setup, or corporate IaC standard they had already chosen for me. Dealing with the syntax discrepancies and architectural limitations between formats became a continuous source of frustration. To solve this, a couple of years ago I began building Clophi.

In a nutshell, Clophi's IaC generator abstracts the resource configuration from the IaC format. With it, you simply decide the configuration and Clophi deterministically (it's not AI!) generates structured, enterprise-grade, production-ready code in the format you pick (ARM, Bicep or Terraform). This abstraction layer also means that if you hit a hard technical roadblock similar to the ones discussed in the previous sections, you can instantly export the exact same configuration in one of the alternative formats.

It is important to note that Clophi's IaC generator should not be confused with an AI tool, the Azure Portal's Export template option, or just another user interface. The output generated by Clophi is fully engineered, structured code that requires no manual patching or cleanup before deployment.

While the full scope of Clophi deserves its own deep dive, you can explore how resources are represented and how the overall IaC generation process works, and request a test drive for your organization.