Hello and welcome back to the third and final part of the article about the serverless pipeline and how it is used to automate configuration management of EC2 instances. In the previous chapter we went through the client requirements and all the problems encountered while working on the project. We also discovered which solutions can be used at every stage of the process, and in which cases. Today will be more about the implemented solution, with all the configuration steps and a DEMO. To sum everything up, we will finish with the conclusions reached after a few months of use and answer some questions:
- What was good?
- What could be better?
- What are our plans for the future?
1. How did we configure our pipelines?
1.1 Workflow process
We wanted to have a pipeline enhanced with notification directly to Slack, so we created a CodePipeline with 3 stages - Source, Deploy and Notify. Our Pipeline is managing configuration of EC2 instances. Let’s have a look at the workflow.
The whole mechanism is started by a user or, more specifically, by a change (event) in the git repository, e.g. a push to a specific branch.
- CodePipeline with GitHub source - getting the code with Ansible configuration from the source
- AWS CodeBuild - establishing a connection to EC2 and deploying the Ansible playbooks stored in the git repository
- AWS Lambda - triggering a Lambda function that connects to Slack and sends a notification about the status of the Ansible deployment to a specific Slack channel
1.2 AWS High Level Design
The diagram shown below describes the main AWS services required to build a pipeline: CodeBuild, CodePipeline, ECR, S3, Systems Manager, CloudWatch and Lambda.
- CodeBuild - for security purposes our EC2 instances run only in a private network. We don't want to allow any incoming traffic from the Internet. Because of that, we placed CodeBuild in the Virtual Private Cloud (VPC) as well, and all integrated AWS services need to go through VPC Endpoints. In that situation, the connection between the build server and EC2 is enabled by Security Groups. Moreover, because Ansible connects to EC2 using SSH, the keys must already be in place - the private key on CodeBuild and the public key in the authorized_keys file on the EC2 instances.
- CodePipeline gets the code from the source git repository and generates an artifact.
- The artifact is stored in an S3 bucket. CodeBuild pulls the artifact through the S3 gateway VPC endpoint.
- Besides a compute type, CodeBuild needs a proper image. We build our own Docker images in another pipeline and store them in ECR.
- The SSH private key for CodeBuild is stored encrypted as a SecureString in AWS Systems Manager (SSM) Parameter Store and can be retrieved through the SSM interface VPC endpoint.
- The public key needs to be already in place on the EC2 instance, in the ~/.ssh/authorized_keys file.
- During job execution, all logs are stored in CloudWatch Logs. The CloudWatch Log Group is created automatically and you can search the outputs from the CodeBuild console at any time, even when the project doesn’t exist anymore. Here as well, CodeBuild needs to send data through an interface VPC endpoint.
- In the last step, the pipeline triggers the Lambda function.
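As a rough illustration of the key-handling step in the list above, the sketch below reads a SecureString parameter with boto3's `get_parameter` call and writes it to disk with owner-only permissions. The helper name `fetch_ssh_private_key`, the parameter name and the target path are assumptions made for this example, not taken from the project. The client is passed in, so the same function can be used with a real `boto3.client("ssm")` running inside the VPC.

```python
import os
import stat
from pathlib import Path


def fetch_ssh_private_key(ssm_client, parameter_name, target_path):
    """Read a SecureString from SSM Parameter Store and save it as an SSH key.

    ssm_client is expected to behave like boto3.client("ssm").
    parameter_name and target_path are hypothetical, caller-chosen values.
    """
    # WithDecryption=True asks SSM to return the decrypted SecureString value.
    response = ssm_client.get_parameter(Name=parameter_name, WithDecryption=True)
    key_material = response["Parameter"]["Value"]
    if not key_material.endswith("\n"):
        key_material += "\n"  # private key files should end with a newline

    path = Path(target_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 from the start; ssh rejects keys
    # that are readable by other users.
    fd = os.open(str(path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_material)
    # Enforce the mode even if the file already existed with wider permissions.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

In a build container this could be called as `fetch_ssh_private_key(boto3.client("ssm"), "/pipeline/ansible/ssh_key", "/root/.ssh/id_rsa")`, where both arguments are placeholders.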
1.3 Pipeline as code
1.3.1 CodeBuild project
Ansible playbooks are executed in CodeBuild project. This is the heart of the pipeline and its second stage.
Code: Definition of CodeBuild project in Terraform.
Access to other AWS resources is granted through an IAM service role. It allows the project, for example, to read from a repository in ECR, create network interfaces in the subnet, send logs to a CloudWatch Log Group and S3, get the artifact from S3 and the parameter from SSM, and store the cache in a separate S3 bucket.
All those resources are referenced in the project definition as well. Target artifacts and the cache for the project are stored in S3, and the environment uses a predefined image from ECR with Ansible already configured. The source is the artifact generated in the first stage. The specification of the project (buildspec) is stored in a separate file, so we point to its location. The most important part is the definition specifying that our job will run inside our VPC, in a private subnet. The security group allows only outbound access:
- HTTPS to SSM, CloudWatch Logs and S3 VPC Endpoints.
- SSH to subnets with target EC2 instances.
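To make the outbound-only rule set above more concrete, here is a minimal sketch that builds the egress permissions in the `IpPermissions` shape accepted by boto3's `authorize_security_group_egress`. The security group ID, prefix list ID and CIDR ranges are placeholders, and reaching the interface endpoints through their security group is an assumption about the setup, not the project's actual Terraform definition.

```python
def build_egress_rules(endpoint_sg_id, s3_prefix_list_id, target_cidrs):
    """Return IpPermissions allowing only HTTPS to VPC endpoints and SSH to targets.

    All identifiers passed in are hypothetical examples.
    """
    https_to_endpoints = {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        # Interface endpoints (SSM, CloudWatch Logs), referenced by security group.
        "UserIdGroupPairs": [
            {"GroupId": endpoint_sg_id, "Description": "HTTPS to interface endpoints"}
        ],
        # The S3 gateway endpoint is addressed through an AWS-managed prefix list.
        "PrefixListIds": [
            {"PrefixListId": s3_prefix_list_id, "Description": "HTTPS to S3 gateway endpoint"}
        ],
    }
    ssh_to_targets = {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [
            {"CidrIp": cidr, "Description": "SSH to target EC2 subnet"}
            for cidr in target_cidrs
        ],
    }
    return [https_to_endpoints, ssh_to_targets]
```

The result could be passed as `ec2.authorize_security_group_egress(GroupId=..., IpPermissions=rules)`. Note that a security group created through the API starts with an allow-all egress rule, which would have to be revoked for the restriction to take effect.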
1.3.2 CodeBuild module
The CodeBuild module is defined to create, at once, all the resources required to make a project run - the IAM role, Security Group, S3 bucket for cache and, of course, the CodeBuild project. The module source can be located in the same repository or accessed remotely. In this case, we are pointing to a remote source in a separate GitHub repository, released and tagged with a proper version according to Semantic Versioning 2.0.0 (MAJOR.MINOR.PATCH).
Code: Calling the CodeBuild module in Terraform.
The basic infrastructure with the VPC Endpoints configuration is defined in a separate stack (the VPC and the S3 gateway endpoint are not managed by Terraform). State files for our stacks are stored in separate remote states on S3. Because of this structure, we can reference the variables required to run the project in several ways:
1.3.3 CodeBuild specification and environment
The CodeBuild project still needs information about the Ansible playbooks and where we would like to execute them. We can define this using shell commands in the buildspec file. Here, we specify the location of the Ansible configuration files, additional plugins which depend on the service, the required parameters from AWS SSM Parameter Store (SSH keys) and the rest of the variables.
Code: Buildspec file of CodeBuild project for Ansible playbooks execution.
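The build step described above ultimately comes down to calling `ansible-playbook` with the right inventory, key and variables. The following is a minimal sketch of how such a command line could be assembled; it is not the project's buildspec, and the helper name and all paths are hypothetical. Extra variables are passed as a single JSON string, which `--extra-vars` accepts, to avoid quoting problems with values that contain spaces.

```python
import json


def build_ansible_command(playbook, inventory, private_key_path,
                          extra_vars=None, limit=None):
    """Assemble an ansible-playbook command as an argument list.

    All arguments are caller-supplied; nothing here is taken from the
    original buildspec.
    """
    command = [
        "ansible-playbook",
        "-i", inventory,                      # inventory file or directory
        "--private-key", private_key_path,    # SSH key fetched earlier in the build
    ]
    if limit:
        command += ["--limit", limit]         # restrict the run to a host pattern
    if extra_vars:
        command += ["--extra-vars", json.dumps(extra_vars, sort_keys=True)]
    command.append(playbook)
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)`, so that a failed playbook run fails the build as well.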
Now, it’s important to remember that CodeBuild doesn’t offer variables with a drop-down list of values, only a simple text field, at least for now. This may change in the future, but at the moment we had to work with a shell script that exports environment variables for us.
Code: Shell script for managing environment variables.
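One way to compensate for the missing drop-down list is to validate free-text variables against a fixed set of allowed values before the playbook runs. The sketch below shows the idea in Python rather than shell; the variable names and the allowed values are invented for illustration and do not come from the original script.

```python
import os

# Hypothetical variables and their permitted values; the first entry of each
# tuple acts as the default, similar to a pre-selected drop-down option.
ALLOWED_CHOICES = {
    "ENVIRONMENT": ("dev", "test", "prod"),
    "PLAYBOOK": ("site.yml", "patching.yml"),
}


def resolve_choice(name, environ=None, choices=ALLOWED_CHOICES):
    """Return the value of an environment variable if it is an allowed choice."""
    env = os.environ if environ is None else environ
    allowed = choices[name]
    value = env.get(name, allowed[0])
    if value not in allowed:
        raise ValueError(
            f"{name}={value!r} is not allowed; expected one of {', '.join(allowed)}"
        )
    return value
```

Failing early with a clear message keeps a typo in the text field from reaching Ansible as an unexpected inventory or playbook name.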
1.3.4 CodePipeline
The order of pipeline stages is defined by CodePipeline. It allows multiple sources, actions and stages, run in parallel or sequentially, depending on your needs.
Our pipeline consists of 3 stages. For each of them, CodePipeline requires proper permissions defined in an IAM role, such as:
- Write access to S3 bucket to store artifact
- Start action of CodeBuild project with Ansible playbooks
- Invoke Lambda function for notification
- Get and decrypt SSM parameter with GitHub token
It is not possible to parametrize everything inside the resource. The number and types of pipeline stages have to be defined statically, like:
Stage 1: Defining the Source, which, in our case, is a private GitHub repository. The target branch and OAuth token are stored in AWS SSM Parameter Store.
Stage 2: Referencing the already created CodeBuild project for Ansible playbooks execution.
Stage 3: Referencing the Lambda function that is already in place with all required variables.
Code: Definition of CodePipeline project and stages in Terraform
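For readers who prefer to see the three stages as data, the sketch below builds an equivalent pipeline structure in the shape that boto3's `create_pipeline` expects. It is an illustration only, since the project itself defines the pipeline in Terraform; the stage and action names, the artifact name `source` and every argument value are assumptions.

```python
def build_pipeline_definition(name, role_arn, artifact_bucket, repo_owner, repo_name,
                              branch, oauth_token, build_project, notify_function):
    """Return a three-stage pipeline declaration (Source, Deploy, Notify)."""

    def action(action_name, category, owner, provider, configuration,
               inputs=(), outputs=()):
        # Every action is identified by category/owner/provider/version.
        return {
            "name": action_name,
            "actionTypeId": {"category": category, "owner": owner,
                             "provider": provider, "version": "1"},
            "configuration": configuration,
            "inputArtifacts": [{"name": item} for item in inputs],
            "outputArtifacts": [{"name": item} for item in outputs],
            "runOrder": 1,
        }

    return {
        "name": name,
        "roleArn": role_arn,
        "artifactStore": {"type": "S3", "location": artifact_bucket},
        "stages": [
            # Stage 1: private GitHub repository as the source.
            {"name": "Source", "actions": [action(
                "Source", "Source", "ThirdParty", "GitHub",
                {"Owner": repo_owner, "Repo": repo_name,
                 "Branch": branch, "OAuthToken": oauth_token},
                outputs=["source"])]},
            # Stage 2: existing CodeBuild project that runs the Ansible playbooks.
            {"name": "Deploy", "actions": [action(
                "Ansible", "Build", "AWS", "CodeBuild",
                {"ProjectName": build_project}, inputs=["source"])]},
            # Stage 3: existing Lambda function that notifies Slack.
            {"name": "Notify", "actions": [action(
                "Slack", "Invoke", "AWS", "Lambda",
                {"FunctionName": notify_function})]},
        ],
    }
```

With real values, the returned dictionary could be submitted through `boto3.client("codepipeline").create_pipeline(pipeline=definition)`. The branch and OAuth token would be read from SSM Parameter Store beforehand, as described in Stage 1.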
1.3.5 Lambda notification to Slack channel configuration
Our AWS Lambda function uses the Python 3.7 runtime which, at the time of writing, provides boto3 1.9.221, botocore 1.12.221 and an Amazon Linux v1 underlying environment. Besides the libraries imported in the pipeline_slack_notification.py script, it also requires the requests package to post a message to the Slack channel. The message contains the pipeline URL, AWS account ID, region, date, pipeline name and the commit ID with the change, as described in the message.json template.
Code: Python script with template for message.
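To illustrate how a JSON message template with placeholders can be filled in, here is a small sketch based on `string.Template` from the standard library. The template below is a hypothetical stand-in for message.json: the field layout and placeholder names are assumptions, chosen only to cover the values listed above.

```python
import json
from string import Template

# Hypothetical stand-in for message.json; the real template may look different.
MESSAGE_TEMPLATE_JSON = """
{
  "channel": "$channel",
  "text": "Pipeline $pipeline_name finished",
  "attachments": [
    {
      "fields": [
        {"title": "Pipeline URL", "value": "$pipeline_url"},
        {"title": "Account", "value": "$account_id"},
        {"title": "Region", "value": "$region"},
        {"title": "Date", "value": "$date"},
        {"title": "Commit", "value": "$commit_id"}
      ]
    }
  ]
}
"""


def render(node, values):
    """Recursively substitute $placeholders in every string of a JSON structure."""
    if isinstance(node, str):
        return Template(node).substitute(values)  # raises KeyError if a value is missing
    if isinstance(node, list):
        return [render(item, values) for item in node]
    if isinstance(node, dict):
        return {key: render(item, values) for key, item in node.items()}
    return node


def render_message(values, template_json=MESSAGE_TEMPLATE_JSON):
    """Parse the template first, then substitute, so values cannot break the JSON."""
    return render(json.loads(template_json), values)
```

Substituting after parsing, rather than into the raw JSON text, means that a commit message or pipeline name containing quotes cannot produce invalid JSON.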
To get the commit ID we use the commit.py function. Here the AWS SDK for Python (Boto3) looks up the current CodePipeline and the commit ID of its last execution.
Code: Python script that gets information about last commit ID used in CodePipeline.
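A lookup like the one described above could be sketched with the `list_pipeline_executions` call of the boto3 CodePipeline client, whose execution summaries carry the source revisions. This is not the original commit.py; the function name is hypothetical, and the sketch assumes that the first returned summary is the most recent execution and that the pipeline has a single source action.

```python
def get_last_commit_id(codepipeline_client, pipeline_name):
    """Return the source revision (commit) ID of the latest pipeline execution.

    codepipeline_client is expected to behave like boto3.client("codepipeline").
    Returns None when the pipeline has no executions or no source revision yet.
    """
    response = codepipeline_client.list_pipeline_executions(
        pipelineName=pipeline_name,
        maxResults=1,  # only the newest execution summary is needed
    )
    summaries = response.get("pipelineExecutionSummaries", [])
    if not summaries:
        return None
    revisions = summaries[0].get("sourceRevisions", [])
    if not revisions:
        return None
    # With a single source action, the first revision is the triggering commit.
    return revisions[0]["revisionId"]
```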
The main function (pipeline_slack_notification.py) takes care of sending messages to Slack. It requires the following environment variables:
- aws_region - region where pipeline is placed
- pipeName - pipeline name
- slack_channel_info - channel name on Slack
- slack_url_info - hook URL for Slack
What is the pipeline_slack_notification.py function responsible for? In a few words: getting all the required information from the pipeline and the change, generating a message based on the message template, and sending that message immediately to the defined Slack channel.
Code: Python script that generates the message and sends it to the Slack channel.
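The flow described above can be sketched as a small Lambda handler. This is an illustration, not the original script: it uses `urllib.request` from the standard library instead of the requests package, builds a plain text message instead of rendering the message.json template, and assumes the console URL format shown in the code. The environment variable names are the ones listed above. Because CodePipeline invokes the function as a job, the handler reports the result back with `put_job_success_result` or `put_job_failure_result`.

```python
import json
import os
import urllib.request


def post_to_slack(webhook_url, payload, opener=urllib.request.urlopen):
    """POST a JSON payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=10) as response:
        return response.status


def lambda_handler(event, context, codepipeline_client=None,
                   opener=urllib.request.urlopen):
    """Notify Slack, then tell CodePipeline whether the job succeeded."""
    job_id = event["CodePipeline.job"]["id"]
    if codepipeline_client is None:
        import boto3  # provided by the Lambda runtime
        codepipeline_client = boto3.client("codepipeline")
    try:
        region = os.environ["aws_region"]
        pipeline = os.environ["pipeName"]
        # Assumed console URL format; adjust if your console links differ.
        pipeline_url = (
            f"https://{region}.console.aws.amazon.com/codesuite/codepipeline/"
            f"pipelines/{pipeline}/view?region={region}"
        )
        payload = {
            "channel": os.environ["slack_channel_info"],
            "text": f"Pipeline {pipeline} ({region}) finished: {pipeline_url}",
        }
        post_to_slack(os.environ["slack_url_info"], payload, opener)
        codepipeline_client.put_job_success_result(jobId=job_id)
    except Exception as error:
        # Without a result the pipeline stage would wait until it times out.
        codepipeline_client.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed", "message": str(error)},
        )
```

The optional `codepipeline_client` and `opener` parameters exist only to make the handler easy to exercise locally with stand-ins; Lambda itself calls the handler with `event` and `context` alone.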
An example of Slack message output generated by Lambda function:
DEMO video: https://chaosgears.com/wp-content/uploads/2019/12/Serverless-Pipeline-for-EC2-configuration-management-DEMO.mp4
3.1 What have we achieved?
Thanks to native AWS services we have limited maintenance time to a minimum: it now comes down to rebuilding Docker images and updating the pipeline configuration with the newer image.
Unlike local deployments, which lasted 4 to 5 minutes and were often interrupted by environmental errors or expiring tokens, the new ones take ~50 seconds.
AWS native services are fully managed. AWS takes responsibility for providing the most up-to-date and secure solutions. You pay only for what you use and, in practice, it works exactly like this. After 5 months we are paying ~$1.85 monthly for the main AWS services (CodeBuild, CodePipeline and ECR) in our pipelines, in total across 2 AWS accounts and 5 regions.
*existing for more than 30 days and with at least one code change that runs through it during the month
3.2 What was good? What could be better?
What was good?
- Flexible - CodePipeline lets us create any workflow with multiple sources, stages and even Approval actions.
- Reusable standard - AWS configuration defined as code allows us to replicate the solution in any environment.
- Continuous provisioning - Version Control Systems and S3/ECR let us run pipelines automatically based on events.
- No servers to maintain - there is no compute layer to manage with CodeBuild and Lambda.
- Native integration with AWS services - services authorize each other using IAM roles, so there is no more password rotation or expiring tokens.
- Audit trail of logs - logs about changes and executions are quickly accessible from the CodeBuild console or CloudWatch Logs, and can be analyzed further or stored in S3 to save money.
- Notifications about status - Lambda's possibilities are nearly limitless, and its integration with third-party tools allows us to achieve any effect we need.
- Pay-per-minute model - no commitment, paying only for used resources.
What could be better?
- Lack of a drop-down list for variables - a nice feature which everyone appreciated in Jenkins; here, we had to use a bash script to force environment variables.
- Path-based triggers on the CodePipeline source - CodePipeline, unlike CodeBuild, allows only two detection options to automatically start a pipeline:
  - GitHub webhooks - when a change occurs in the branch.
  - Scheduler - check periodically for changes.
- More version control sources - the lack of Bitbucket and GitLab sources in CodePipeline puts this solution out of reach for many teams.
- Better docs for developer services - there are very few implementations and not many good examples of CodeBuild inside a VPC; we spent most of the time figuring out this configuration.
Someone could say that the AWS services used in our solution are too primitive, that CodePipeline, CodeBuild or other AWS services we know lack many functionalities. Remember, however, that this will not necessarily stay that way. AWS is still actively developing its solutions. For example, CodePipeline announced this month that it will pass environment variables globally across the pipeline. Until recently, we could only configure variables in CodeBuild. It is in our interest to report the demand for new features and give feedback to AWS. In the end, we all want to use the most effective solutions :)
3.3 What are our plans for the future?
Despite the fact that the presented solution for the automation of the EC2 instance configuration is not finished, it allowed us to significantly speed up the work at a very low cost.
In the future, we plan at least the full parameterization of our pipeline and cross-region / cross-account deployment to reduce the number of pipelines. Moreover, we want to test our infrastructure to increase reliability. For security reasons, we would also like to use a private version control system, like AWS CodeCommit, as a source. It allows finer granularity of access rights to specific repositories, branches and Pull Requests, whereas GitHub grants permissions to all repositories within the organisation. These are not all of the changes, of course. Our demands will change and grow over time. The main thing is to achieve all goals as effectively as possible, and that is what we have done.