Infrastructure as Code at Enterprise Scale: Identify the Right Approach for Your Organization

February 23, 2021

An in-depth look at the tools and guidelines to help scale your IaC approach

One of the defining attributes of the cloud is that the resources it provides can be defined in code. The advantages of this approach are obvious, but infrastructure code has long struggled with two key aspects of traditional software engineering: modularity and reuse.

  • Modularity: defining small components of functionality that can be composed together into a greater whole, leading to…
  • Reuse: where the same code can be used by many teams, eliminating redundant effort.

These concerns exist even for small organizations (for example, a start-up with a sole product) but compound into a significant productivity drag for large companies with hundreds of teams all trying to build and ship services.

Fortunately, as the concept of Infrastructure as Code (IaC) has matured, so has the tooling available for implementation.

In this series, I’ll start by setting the stage with some general guidelines for scaling your IaC approach. Then, we’ll look at some options for creating and sharing infrastructure code and their suitability for use in an enterprise environment. How you define “enterprise” is up to you: whether you’re a Fortune 500 company or a garage-based upstart, this guide is for you.

I will focus on the two biggest public clouds: AWS and Azure. If you’re interested in a similar take on GCP’s Deployment Manager and other options, let me know!

Six Considerations for Success

There are some general guidelines that apply to scaling your IaC approach regardless of what tools you use.

  1. Be cautious of public modules. Just like public software libraries for any other language, be wary of having managed dependencies on unvetted public code. Validate the code, check licenses, and, if you decide to use it, mirror your own copy of it internally to avoid availability issues and version stomping. You don’t want your deployment pipeline to fail at a critical moment because the public Terraform module you rely on is unavailable while GitHub or Terraform Cloud is offline. When specifying dependencies, use a hash instead of the version tag wherever possible to prevent situations where the upstream maintainer (or a malicious actor) replaces the code in a published version. Use an artifact management tool like Nexus or Artifactory, or a privately hosted registry, to host your infrastructure code where possible. At the very least, mirror the sources into your own Git platform.
  2. Make your code extensible. Your reusable infrastructure code should enforce enterprise requirements and standards while also allowing teams to change or extend it in approved ways. Parameterize as much as possible so that the behavior can be configured by users rather than requiring new code. It can be a tricky balance to strike, especially in a large enterprise with diverse needs, but extensible infrastructure code modules reduce the need for teams to create and maintain their own forks. And fewer redundant code modules make it easier for new teams to decide what code they can reuse. Imagine having to choose between three similar load balancer modules with slight variations versus one standardized load balancer module that is configurable. Make this an easy choice for your teams.
  3. Open-source it within your enterprise. You will likely have a team developing and curating sharable infrastructure code. Try to develop in the open and encourage contributions from across the enterprise.
  4. Have a rigorous release process. This is no different than any software you publish, but arguably more important since it’s consumed by many teams and a bad release will cascade across the organization. Your customers are your engineering teams. Use semantic versioning and provide clear release notes. And since the software supply chain is an increasingly popular attack vector, consider digitally signing Git tags and released artifacts to build an extra layer of verification into your process.
  5. Curate and make it discoverable. One of the biggest challenges in getting broad enterprise adoption is making it easy for teams to find stable, functional, and vetted infrastructure code. At scale, expect to have a small team curating an online catalog of what’s officially sanctioned. It could be as simple as a GitHub Organization, or as complex as a private Terraform registry or searchable site with user reviews. The team that owns this doesn’t have to do all the maintenance work on the modules (encourage pull requests!) but should own the roadmap for each one and be the maintainers.
  6. Standardize interfaces. Consistency encourages adoption. Use a standard naming convention for code layout, module names, parameters, variables, and resource names. Reduce friction.
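
To make the "use a hash instead of the version tag" advice from the first guideline concrete, here is a minimal sketch in Python. It assumes you have already downloaded a module archive into your pipeline and recorded its SHA-256 digest when you vetted it. The function names and the sample bytes are hypothetical, and a real pipeline would more likely lean on the checksum features of your artifact manager or registry than on hand-rolled code.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, pinned_sha256: str) -> None:
    """Raise ValueError if the content does not match the digest pinned at vetting time."""
    actual = sha256_hex(data)
    if actual != pinned_sha256.lower():
        raise ValueError(f"digest mismatch: expected {pinned_sha256}, got {actual}")


# Hypothetical example: the bytes stand in for a vetted module archive.
vetted_archive = b"contents of the module archive we reviewed"
pinned = sha256_hex(vetted_archive)  # recorded once, at review time

verify_pinned(vetted_archive, pinned)  # passes silently

# The same version tag now points at different content upstream.
replaced_archive = b"contents silently replaced under the same version tag"
try:
    verify_pinned(replaced_archive, pinned)
except ValueError as err:
    print("rejected:", err)
```

A version tag can be moved to point at new content; a digest cannot, which is why the pinned hash catches the replacement even though the tag looks unchanged.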

With the stage set, let’s look at some of the most popular IaC tools and how to best use them at enterprise scale.

AWS CloudFormation

AWS CloudFormation, one of the first IaC solutions, provisions infrastructure according to a declarative template that defines the desired state. Originally JSON only, these templates can now be written in YAML, which is easier to read and write by hand. Templates define resources (e.g., Lambda functions, load balancers, queues) and their properties. Subsets of these resources and properties are exactly the thing you want to share across teams. Luckily, CloudFormation provides quite a few ways to accomplish this.

I explain more about AWS CloudFormation in this post; check it out.

Terraform

Terraform is one of the most popular Infrastructure as Code tools for good reason. It supports all the major public cloud platforms (and much more via a plethora of providers), its HCL language is declarative and easy to read while also offering useful constructs like loops, and it offers first-class support for code modularity and packaging.

Modules are simply Terraform code in a subdirectory. They have their own variables, resources, and outputs (and if you’re doing them well, their own documentation). If you’re using Terraform, you’re almost certainly using modules. At enterprise scale, the real question is how your organization is using modules to eliminate redundancies and drive reuse.

We’ve seen plenty of teams create their own modules with grand intentions of sharing them across their organization, but those plans usually fall flat. Modules serve as useful tools for code organization within a team, but without a central clearinghouse of modules (like CloudFormation’s Registry), it’s hard to get widespread adoption. Fortunately, Terraform makes it much easier to share infrastructure code than CloudFormation does.

I describe Terraform in greater detail here.  

AWS Cloud Development Kit (CDK)

In contrast to CloudFormation’s JSON/YAML and Terraform’s HCL, AWS’ CDK offers what purists would consider true Infrastructure as Code, using standard programming languages instead of configuration markup and DSLs.

The CDK offers libraries for TypeScript, Python, Java, and .NET that allow you to define your infrastructure in software. Rather than wrestle with CloudFormation conditionals and its lack of loops, for example, you can use the native control flow of your language to achieve the same result with more readable code.
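
As a rough illustration of that idea, the sketch below uses an ordinary Python loop and conditional to build a CloudFormation-style template as a plain dictionary. It deliberately does not use the CDK libraries, so it runs with the standard library alone. The function name, the `env` parameter, and the queue names are all hypothetical; only the `AWS::SQS::Queue` resource type and its `QueueName` and `MessageRetentionPeriod` properties come from CloudFormation itself.

```python
import json


def build_queue_template(queue_names, env="dev"):
    """Build a CloudFormation-style template with one SQS queue per name."""
    # A native conditional: keep messages longer in production (14 days vs. 4 days).
    retention_seconds = 1209600 if env == "prod" else 345600

    resources = {}
    # A native loop: no copy-and-paste of near-identical resource blocks.
    for name in queue_names:
        logical_id = name.title().replace("-", "") + "Queue"
        resources[logical_id] = {
            "Type": "AWS::SQS::Queue",
            "Properties": {
                "QueueName": f"{env}-{name}",
                "MessageRetentionPeriod": retention_seconds,
            },
        }

    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_queue_template(["orders", "billing-events"], env="prod")
print(json.dumps(template, indent=2))
```

Adding a third queue is a one-word change to the list, where the equivalent hand-written template would need another full resource block.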

CDK infrastructure is defined via “constructs”. The lowest-level constructs provide a direct mapping to CloudFormation resource properties. The next level provides a more idiomatic definition of a resource and includes sensible defaults. The highest-level constructs, patterns, are composed of lower-level ones to make it easy to build useful stacks with the least amount of code. For example, you can use patterns to easily define the infrastructure for a Fargate service behind a Network Load Balancer or a REST API Gateway backed by a Lambda function. CDK’s ability to create useful abstractions exceeds what CloudFormation alone can do, and unlike Terraform, it does so using the same programming languages your application code is (probably) written in.

The CDK CLI synthesizes a CDK app into a CloudFormation template. From there, the CLI can deploy a stack from the template, or you can use your existing approach to deploy the stack.
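
The layering of constructs and the synthesis step can be sketched in plain Python. This is a toy model of the concept, not the CDK API: every class and function name here (`RawBucket`, `StandardBucket`, `BucketWithAccessLogs`, `synth`) is hypothetical, and the example skips details a real deployment needs, such as the permissions that let S3 deliver logs to the log bucket.

```python
import json


class RawBucket:
    """Lowest level: a one-to-one mapping onto the resource's properties."""

    def __init__(self, logical_id, properties):
        self.logical_id = logical_id
        self.properties = properties

    def resources(self):
        return {self.logical_id: {"Type": "AWS::S3::Bucket", "Properties": self.properties}}


class StandardBucket(RawBucket):
    """Middle level: the same resource with organization-approved defaults."""

    def __init__(self, logical_id, versioned=True):
        properties = {
            "BucketEncryption": {
                "ServerSideEncryptionConfiguration": [
                    {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                ]
            }
        }
        if versioned:
            properties["VersioningConfiguration"] = {"Status": "Enabled"}
        super().__init__(logical_id, properties)


class BucketWithAccessLogs:
    """Highest level: a pattern composed of lower-level constructs."""

    def __init__(self, name):
        self.logs = StandardBucket(f"{name}Logs", versioned=False)
        self.data = StandardBucket(f"{name}Data")
        # Wire the data bucket to the log bucket by reference.
        self.data.properties["LoggingConfiguration"] = {
            "DestinationBucketName": {"Ref": self.logs.logical_id}
        }

    def resources(self):
        return {**self.logs.resources(), **self.data.resources()}


def synth(constructs):
    """Flatten a list of constructs into a CloudFormation-style JSON template."""
    resources = {}
    for construct in constructs:
        resources.update(construct.resources())
    template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
    return json.dumps(template, indent=2)


print(synth([BucketWithAccessLogs("Reports")]))
```

The caller writes one line to get two encrypted buckets wired together, which is the appeal of patterns: the defaults and the composition live in the shared library rather than in every team's template.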

CDK apps are separate from your application code, but they can be stored together, and if you use the same language for both, you make it easier for your engineers to jump between application logic and infrastructure. There is nascent support for unit testing your infrastructure code (TypeScript is currently the only supported language), but expect this to improve as CDK continues to mature.

The big benefit of CDK in an enterprise environment is that it’s straightforward to create your own organization-specific constructs and then share them as any other internal library via npm, PyPI, Maven, and NuGet. Constructs themselves must be written in TypeScript, but the jsii compiler underlying CDK allows the constructs to be used in any CDK-supported language. This is the killer feature of CDK.

Unless you’re using Terraform with AWS, CDK is the future. And unlike its predecessor CloudFormation, it has robust support for developing modular code that is reusable across your organization. Its chief downside is that it’s still relatively new compared to other options. But CDK is rapidly maturing and we expect it to become a new standard for infrastructure code on AWS.

And while they are both still in alpha, there are also a CDK for Terraform (cdktf) and a CDK for Kubernetes (cdk8s). Keep your eye on these.

AWS Service Catalog

AWS Service Catalog is not an infrastructure as code tool, but it is mentioned here because it provides administrators a way to offer standard cloud infrastructure stacks to teams in their organization. Service Catalog builds CloudFormation stacks on behalf of users. Service Catalog supports versioning the offerings and users can update their stacks on demand.

Service Catalog isn’t the best fit for teams building cloud-native solutions since the application and infrastructure code move in tandem. It’s highly reusable (as long as the catalog offers what teams need) but also less flexible compared to CDK and Terraform. If a team needs changes to a Service Catalog offering, they lack the agency to make them on their own and must convince a catalog administrator to expand the offering or create a new one.

Service Catalog has its uses and is worth a look, but if you’re a progressive technology organization—or want to be one—then CDK or Terraform are better options.

AWS Proton

Proton is a new service that’s still in preview—don’t use it for anything important yet. Like Service Catalog, it’s not an infrastructure code tool but it does use CloudFormation to define and build pre-defined environments and then deploy pre-defined application service stacks into those environments. If your organization has a central platform team that is trying to drive standardization and remove infrastructure concerns from application teams, Proton may be an appropriate solution. Or it may not. Either way, the promise of Proton is that it can lessen the need to manage infrastructure code at enterprise scale because it abstracts infrastructure concerns away from application teams. This may (not) fit your organization’s team structure and way of working.

We’ll explore Proton more in a future blog post.

Azure Resource Manager

If you’re using Azure, Azure Resource Manager (ARM) is Microsoft’s native IaC solution. It provides a way to declaratively define infrastructure in JSON-formatted templates and deploy that infrastructure via the CLI, PowerShell, and the Azure Portal. It is well-integrated within Azure and supports rollbacks of failed deployments. ARM is like CloudFormation in concept and execution. And while writing JSON isn’t fun, tools like the Visual Studio Code extension are quite good and make the process more bearable. And just like CloudFormation, ARM provides a few ways to promote code reuse.

I talk more about Azure Resource Manager here.

Other Options

We’ve focused on the most popular IaC tools for the two biggest public cloud providers. There are, of course, other options out there, including:

If you’re already happily using one of these – that’s great! If you’re just getting started or looking to adopt a new tool, stick with the options discussed in this post. Having a novel or obscure IaC solution isn’t a competitive advantage. It’s harder to maintain and hire for, and it may be harder to scale for enterprise use.

Software agility = business agility

Programmability is one of the defining features of the cloud. In the past decade, numerous solutions have cropped up to make writing and managing infrastructure code easier. While “regular” languages are inherently modular and have standardized ways to package and reuse code, infrastructure code tools are still catching up. They have made significant strides, however, in adapting to the needs of enterprise customers who have hundreds of teams deploying cloud infrastructure daily.

How is your company managing its infrastructure code today? SingleStone has helped companies of all sizes, from start-up to Fortune 500, with getting into the cloud and using it effectively to drive significant business agility—and results. This series has touched on the highlights. If your company is ready to manage its infrastructure code like a high-performing organization, SingleStone is your guide.


Chris Belyea

Technical Director (DevOps)
Chris Belyea is SingleStone’s Cloud and DevOps Technical Director. Chris guides clients through Cloud and DevOps transformations, including cloud strategy, migrating workloads to the cloud, automating infrastructure deployment, creating CI/CD pipelines, and improving automated configuration management.