Managing infrastructure manually can be both time-consuming and error-prone. Setting up components like servers and databases through a graphical user interface often leads to inefficiencies, especially at the enterprise level. This becomes even more challenging when dealing with multiple environments for development, user acceptance testing, validation, and production.
What is Infrastructure-as-Code?
Let's introduce Infrastructure-as-Code (IaC), a powerful process for managing and provisioning infrastructure through code. IaC brings the principles of software development to infrastructure management, making it possible to automate tasks, track changes, and ensure consistent environments across development, testing, and production. By treating infrastructure like software, teams can write scripts to set up and configure servers, databases, networks, and more, ensuring that these configurations can be replicated, versioned, and maintained with ease. Placing your infrastructure under version control also enables automation of both the initial deployment and ongoing updates of specific components through a CI/CD pipeline.
Imperative vs declarative
When discussing Infrastructure-as-Code, two fundamental approaches emerge: imperative and declarative IaC. Each method offers a distinct perspective on how to manage and provision infrastructure.
Imperative IaC emphasizes how the infrastructure is created or modified. You provide a sequence of commands or instructions to be executed step by step, effectively detailing the process to build or change the infrastructure. This approach resembles traditional scripting. It offers fine-grained control, but it can be more complex to manage, as it requires a deep understanding of the underlying infrastructure and scripting languages.
Declarative IaC, on the other hand, focuses on defining the desired end state of the infrastructure. You specify what the infrastructure should look like, and the IaC tool takes care of achieving that state, managing the underlying processes for you. To do this, the tool relies on state management: it tracks the current configuration of your infrastructure and ensures that the actual state matches the desired state defined in your IaC code.
Example
For example, imagine you’re looking to secure a table for two at a restaurant. A declarative approach states only the desired outcome: “A table for two, please.” You leave it to the staff to work out how to seat you. An imperative approach spells out the steps yourself: “Walk past the bar, head to the window, and seat us at the table without the reserved sign.”
While both approaches ultimately result in a table at the restaurant, the processes differ significantly.
Key differences
Both approaches offer unique advantages and are best suited to different use cases. Let’s delve into some of the key distinctions between imperative and declarative IaC.
- Focus
- Imperative: Details how to achieve the desired infrastructure by explicitly defining step-by-step instructions.
- Declarative: Specifies what the final infrastructure state should look like. The tool handles the steps to reach that state.
- Abstraction
- Imperative: Provides granular control over the process, exposing all the steps.
- Declarative: Abstracts the process, focusing only on the outcome.
- Ease of use
- Imperative: Can be more complex to implement, especially for large-scale infrastructure.
- Declarative: Often simpler to implement and maintain, as it focuses on the desired state.
- Error proneness
- Imperative: More prone to errors, as mistakes in the sequence of steps can lead to unintended consequences.
- Declarative: Less error-prone, as the tool manages the process of achieving the desired state.
HashiCorp Terraform
This post will delve into Terraform, a powerful tool from HashiCorp, which embodies the principles of Infrastructure-as-Code by enabling you to define and provision infrastructure resources using a declarative approach.
Manage any infrastructure
Terraform plugins, known as providers, allow Terraform to communicate with cloud platforms and other services through their APIs. The extensive Terraform Registry offers over 1,000 providers, covering a wide range of platforms, including Amazon Web Services (AWS), Azure, Google Cloud Platform (GCP), Kubernetes, and many others. If you require a provider that isn’t available in the registry, you can even develop your own.
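As a sketch, declaring and configuring a provider in HCL might look like the following (the version constraint is illustrative):

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0" # illustrative version constraint
    }
  }
}

# Configure the AzureRM provider; the features block is required, even when empty.
provider "azurerm" {
  features {}
}
```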
Configure your infrastructure
At the heart of Terraform lies the HashiCorp Configuration Language (HCL), a human-readable and intuitive language that simplifies infrastructure definition. HCL’s syntax shares similarities with JSON, employing key-value pairs and curly braces for scoping. However, it surpasses JSON in flexibility, offering features like interpolation and custom data structures.
Example
The following code snippet shows a Terraform configuration to provision an Azure resource group.
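A minimal sketch might look like this (the resource group name and location are illustrative):

```hcl
# Provision an Azure resource group with an illustrative name and region.
resource "azurerm_resource_group" "example" {
  name     = "rg-data-platform-dev"
  location = "westeurope"
}
```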
Track your infrastructure
Terraform utilizes state management to track and record the current state of your infrastructure in a state file. This file acts as a central repository for information about your resources, including their attributes and configurations. By comparing the current state with the desired state defined in your Terraform configuration, Terraform can accurately identify the required changes to bring your infrastructure into alignment.
Collaborate on your infrastructure
Terraform supports various remote backends, including cloud-based storage solutions like AWS S3, Azure Blob Storage, and Google Cloud Storage. By storing the state file in a remote backend, you can ensure that it is accessible to multiple team members and can be version controlled, facilitating collaboration and enabling automated deployment pipelines.
Terraform init, plan, apply
We previously talked about what Terraform is and what it does, but not how to use it. There are three main commands that you will use 99% of the time: terraform init, terraform plan, and terraform apply.
The terraform init command performs several initialization steps to prepare the current working directory for use with Terraform: downloading the necessary provider plugins, initializing the backend configuration, and bringing the working directory up to date with any configuration changes. In practice, this command often carries extra parameters, such as -backend-config arguments that point Terraform at the right backend. (Variable files, by contrast, are passed to terraform plan and terraform apply via -var-file.)
To visualize the proposed changes to your infrastructure, you can use the terraform plan command. This command generates an execution plan, which provides a detailed preview of the modifications Terraform intends to make. Before generating the plan, Terraform performs the following steps:
- Synchronizes State: It fetches the latest state of remote objects to ensure the Terraform state is up-to-date.
- Compares Configurations: Terraform compares the current configuration with the previous state, identifying any discrepancies.
- Proposes Changes: Based on the identified differences, Terraform proposes a set of actions to bring the remote objects in line with the desired configuration.
By reviewing the execution plan, you can assess the impact of the proposed changes before applying them to your infrastructure.
Finally, the terraform apply command executes the actions specified in the Terraform plan, making the necessary modifications to your infrastructure.
Use case
Your company is embarking on a cloud migration journey, selecting Microsoft Azure as the cloud provider. To support your data engineering initiatives, you’re planning to leverage Azure Data Factory (ADF) for simple data pipelines and Databricks as a powerful data lakehouse. The goal is to establish three distinct environments: Development (Dev), User Acceptance Testing (UAT), and Production (Prod). Manually provisioning and configuring these environments can be time-consuming and error-prone.
To streamline this process, IaC principles will be employed. By leveraging Terraform and the AzureRM provider, we can automate the creation and management of our Azure infrastructure.
Some key infrastructure components are:
- Azure Subscriptions: Three separate subscriptions (Dev, UAT, Prod) within a single Azure tenant.
- Databricks Workspaces: A unified analytics platform for data engineering, data science, and machine learning.
- Storage Accounts: To store data, logs, and other artifacts.
- Data Factory: To orchestrate data ingestion, transformation, and movement between various data sources and sinks.
IaC repository
The first step in your IaC journey is to establish a centralized repository for your Terraform code. This repository will serve as a single source of truth for your infrastructure. To ensure your Terraform code is well-structured and maintainable, consider adhering to the official Terraform Style Guide. This guide provides best practices for organizing your code, including directory structure, naming conventions, and formatting.
By following this structure, you can effectively organize your Terraform code into reusable modules, making it easier to manage and maintain your infrastructure.
Defining your resources
The next step involves defining your Azure resources using the appropriate resource blocks provided by the AzureRM provider. To promote code reusability and modularity, we’ll organize our Terraform configuration into modules. Each module will encapsulate a specific set of resources, such as a Databricks workspace, storage account, or data factory.
The main.tf file will serve as the entry point, calling the necessary modules and passing the appropriate parameters. These parameters will be sourced from environment-specific .tfvars files, ensuring flexibility and consistency across different environments.
The following code snippet shows the configuration of a Databricks workspace module.
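A minimal sketch of such a module, with illustrative variable and resource names, might look like this:

```hcl
# modules/databricks_workspace/main.tf — illustrative module layout

variable "name" {
  type = string
}

variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

variable "sku" {
  type    = string
  default = "premium"
}

# The Databricks workspace itself, built from the module's input variables.
resource "azurerm_databricks_workspace" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = var.sku
}
```

The root main.tf would then call this module with a `module` block, passing environment-specific values into these variables.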
To tailor the configuration to each environment, we’ll use .tfvars files. For instance, the dev.tfvars file looks like this:
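A sketch of such a file, with illustrative values, could be:

```hcl
# dev.tfvars — illustrative environment-specific values
environment         = "dev"
location            = "westeurope"
resource_group_name = "rg-data-platform-dev"
databricks_sku      = "standard"
```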
Remote backend
To maintain the state of your infrastructure, we’ll utilize an Azure Storage Account Blob Container as the Terraform backend. This allows us to track changes and manage the lifecycle of your resources.
Note: While this approach offers flexibility, it requires some manual setup before deploying Terraform. You’ll need to create a resource group and storage account manually to serve as the Terraform backend.
Once these resources are in place, you can configure your Terraform backend to point to the specified storage account container.
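Such a backend block might look like this (the resource group and storage account names are illustrative and must match the ones you created manually):

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state" # created manually beforehand
    storage_account_name = "stterraformstate"   # created manually beforehand
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate"
  }
}
```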
Deploying your infrastructure
The final step of the infrastructure setup is to deploy the defined resources. In this example we will deploy them using our local machine. The steps can be replicated when deploying via a pipeline. To log into the right Azure tenant and select the subscription, we will use the az login command from the Azure CLI. Once you have logged into Azure and selected the appropriate subscription (in this case, the Dev subscription), you’re ready to use Terraform to deploy your infrastructure.
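As a sketch, the login flow with the Azure CLI looks like this (the subscription id is a placeholder):

```shell
az login                                               # opens a browser to authenticate
az account set --subscription "<dev-subscription-id>"  # placeholder: your Dev subscription
az account show --output table                         # verify the active subscription
```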
To initialize our Terraform project, we use terraform init with additional -backend-config parameters that point to the resource group, storage account, and container serving as the backend. (The environment-specific variable file comes into play later, when running terraform plan and terraform apply.)
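A sketch of the invocation, with illustrative backend values matching the manually created storage account:

```shell
terraform init \
  -backend-config="resource_group_name=rg-terraform-state" \
  -backend-config="storage_account_name=stterraformstate" \
  -backend-config="container_name=tfstate" \
  -backend-config="key=dev.terraform.tfstate"
```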
The terraform init command not only initializes the working directory but also creates an empty state file in the specified Azure Storage Account Blob Container, which is our backend.
The next step is to create an execution plan using the terraform plan command. This plan outlines the changes that Terraform will make to your infrastructure to bring it into the desired state as defined in your configuration files. The -out=tfplan flag saves this plan to a file named tfplan for later reference.
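For the Dev environment, this might look like:

```shell
# Generate an execution plan using the dev variables and save it for later apply.
terraform plan -var-file="dev.tfvars" -out=tfplan
```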
The terraform apply command executes the plan generated in the previous step, provisioning or modifying the specified Azure resources. By referencing the tfplan file, Terraform ensures that the correct changes are applied to your infrastructure.
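Applying the saved plan is then a single command:

```shell
# Apply exactly the changes recorded in the saved plan file.
terraform apply tfplan
```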
After execution, the specified Azure resources will be created or modified as per the plan. You can verify these changes by logging into the Azure portal and inspecting the resource groups and resources.
With Terraform and AzureRM, we’ve successfully automated the provisioning of our Azure infrastructure for the Dev environment. By following the same approach and adapting the environment-specific variables, we can easily replicate this process to set up identical infrastructures for the UAT and Prod environments. This level of automation and consistency significantly streamlines our cloud operations.
Conclusion
By embracing Infrastructure-as-Code with Terraform, we’ve demonstrated how to efficiently and reliably provision complex Azure infrastructures, including Databricks workspaces and Data Factory pipelines. Terraform’s declarative approach and powerful module system enable us to manage our infrastructure as code, promoting consistency, reproducibility, and security.
While we’ve focused on Azure in this blog post, it’s important to note that Terraform’s versatility extends to a wide range of cloud providers, including Google Cloud Platform (GCP) and Amazon Web Services (AWS). This flexibility allows organizations to adopt a consistent IaC strategy across multiple cloud environments.
By adopting Terraform, you can significantly accelerate your cloud adoption journey, reduce operational overhead, and empower your team to focus on delivering innovative solutions.