November 13, 2018

Serverless EBS backup & first aid

One of our customers needed backup protection for their EBS volumes. We approached this serverless – and added some extras. Here's how.

Karol Junde

One of our current customers decided that they needed backup protection for their EBS volumes. There are plenty of solutions, either already customized or provided out of the box by Amazon Web Services, but for me it was a good chance to test myself against the Serverless Framework (which I promised to write about last time), write some more Python, and customize the solution a little bit. Apart from the snapshot and retention features, saving information to DynamoDB and passing notification messages to Slack were added as extras.

Based on a cron event (we’ve since changed that to an SSM Maintenance Window — I’ll describe that later) configured in CloudWatch Events, a Lambda function is invoked and, based on a specific TAG value on Auto Scaling Groups or single instances, snapshots of the attached volumes are created. The final step of ‘Round 1’ is to save the data about the new snapshots in a DynamoDB table called “created-snapshots” (example below). For us it was just a simple way of keeping track of when tasks finished and which snapshots were created.
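The table itself is nothing fancy; as a rough illustration (the attribute names here are my own guess, not necessarily the exact schema), an item could look like:

{
    "SnapshotId": "snap-08b8b0819720d9f80",
    "CreatedOn": "2018-08-21 23:00:00",
    "DeleteOn": "2018-09-11 23:00:00"
}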

Apart from the small bunch of information in DynamoDB, the Lambda writes a response to CloudWatch Logs like the following:

('Response Body:', set(['
{
    "Status": "SUCCESS",
    "Reason": "See the details in CloudWatch Log Stream: 2018/08/21/[$LATEST]a1e01c47b0a44b79af9d26c8ea2b6979",
    "Data": {
        "SnapshotId": [
            "snap-08b8b0819720d9f80",
            "snap-02da7b4af1edd5633",
            "snap-05fe0099524fd7f11"
        ],
        "Change": true
    },
    "PhysicalResourceId": "2018/08/21/[$LATEST]a1e01c47b0a44b79af9d26c8ea2b6979"
}
']))

There’s also a second part, focused on deleting the created snapshots based on the retention policy and the TAG value set by the ‘snapshot’ Lambda. Periodically (the occurrence is specified via CloudWatch Events) a Lambda checks the “DeleteOn” tag to decide whether it’s time to delete the snapshot (by comparing the current date against the one set in the tag).

We’ve glanced at the general concept of the solution, but I’d like to talk a little bit about the change of approach in terms of deployment. Remember, last time I mentioned replacing (just a little bit of) CloudFormation with a new framework. And here it is…

First change — Serverless Framework instead of CloudFormation

In my previous article about ‘pythoning’ I unveiled some information about replacing the well-known CloudFormation with a fancy-named framework called Serverless. Simply put, in a project focused on Lambdas (as the main force) I’ve ditched CloudFormation for Lambda provisioning and started using the Serverless Framework, because it’s much EASIER AND FASTER to launch the environment. I’ll show you this later, but in the meantime let me tell you briefly about that comprehensive Swiss Army knife.

The definition of the framework says “The Serverless Framework is a CLI tool that allows users to build & deploy auto-scaling, pay-per-execution, event-driven functions”, but for me it is the easiest way to deploy your prepared Lambda functions together with the additional, necessary AWS services and, of course, to invoke them. Serverless is written in Node.js, which might not be perfect for everyone, since you still need to install Node and NPM.

First of all, you have to install this shiny tool by typing:

$ npm install serverless -g

The CLI can be accessed using either serverless or sls. To create the template for your new project type:

$ serverless create --template TEMPLATE_NAME

The variety of available templates is quite huge:

“aws-nodejs”, “aws-nodejs-typescript”, “aws-nodejs-ecma-script”, “aws-python”, “aws-python3”, “aws-groovy-gradle”, “aws-java-maven”, “aws-java-gradle”, “aws-kotlin-jvm-maven”, “aws-kotlin-jvm-gradle”, “aws-kotlin-nodejs-gradle”, “aws-scala-sbt”, “aws-csharp”, “aws-fsharp”, “aws-go”, “aws-go-dep”, “azure-nodejs”, “fn-nodejs”, “fn-go”, “google-nodejs”, “kubeless-python”, “kubeless-nodejs”, “openwhisk-java-maven”, “openwhisk-nodejs”, “openwhisk-php”, “openwhisk-python”, “openwhisk-swift”, “spotinst-nodejs”, “spotinst-python”, “spotinst-ruby”, “spotinst-java8”, “webtasks-nodejs”, “plugin” and “hello-world”.
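For a Python project like this one it could be, for example:

$ serverless create --template aws-python --path ebs-autobackup

where --path tells the framework in which directory to create the project.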

After generating the selected template, in your project directory you should see:

-rw-r--r--  1 user  staff   497B Jun 27 23:03 handler.py
-rw-r--r--  1 user  staff   2.8K Jun 27 23:04 serverless.yml

Where handler.py holds the code of your function and serverless.yml describes the service.

The Serverless Framework translates all syntax in serverless.yml into a single AWS CloudFormation template, which makes the whole process trivial. In other words, you define the services you want to add in a declarative way and the Framework is then responsible for this little bit of magic. Digging deeper, the process can be broken down into:

  1. Your new CloudFormation template is born from serverless.yml
  2. The stack is created together with an S3 bucket for the zip files of your functions
  3. The code of your impressive functions is packaged into zip files
  4. Serverless gathers the hashes of all files from the previous deployment (if one exists) and compares them against the hashes of the local files
  5. The deployment is terminated if all file hashes are the same; if not, the zip files of your functions are uploaded to S3 (provisioned by Serverless)
  6. Any additional AWS services like IAM roles, Events etc. are added to CloudFormation
  7. The CloudFormation stack is updated with the new template
  8. Important – each deployment creates a new version of each Lambda function

There’s much more information on the official website but now, more or less, we know what’s hidden inside. Let’s get back to the code.

The first part of the serverless.yml file contains general configuration regarding the AWS environment and, if needed (and in my case it was), some custom variables:

provider:
  name: aws
  runtime: python2.7
  region: eu-central-1
  memorySize: 128
  timeout: 60 # optional, in seconds
  versionFunctions: true
  tags: # Optional service wide function tags
    Owner: chaosgears
    ContactPerson: chaosgears
    Environment: dev
custom:
  region: ${opt:region, self:provider.region}
  app_acronym: ebs-autobackup
  default_stage: dev
  owner: YOUR_ACCOUNT_ID
  stage: ${opt:stage, self:custom.default_stage}
  stack_name: basic-${self:custom.app_acronym}-${self:custom.stage}
  dynamodb_arn_c: arn:aws:dynamodb:${self:custom.region}:*:table/${self:custom.dynamodb_created}
  dynamodb_arn_d: arn:aws:dynamodb:${self:custom.region}:*:table/${self:custom.dynamodb_deleted}
  dynamodb_created: created-snapshots
  dynamodb_deleted: deleted-snapshots
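One handy detail in the block above is the fallback syntax: ${opt:stage, self:custom.default_stage} takes the value passed on the command line and falls back to the default when the option is absent. For instance:

sls deploy --stage prod --aws-profile YOUR_AWS_PROFILE   # stage resolves to "prod"
sls deploy --aws-profile YOUR_AWS_PROFILE                # stage falls back to "dev"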

I won’t focus on this part, but keep in mind that if you want to use a custom-defined variable inside another variable, use this pattern: variable_a: ${self:custom.variable_b}. The really important part is the ‘functions’ one. Here’s the place for the forged Lambda functions you’ve been creating for weeks. Look how simple it is: with a couple of lines you define the environment variables, timeout, event scheduling and even roles. I’ve omitted obvious elements like tags, names and descriptions.

functions:
  ebs-snapshots:
    name: ${self:custom.app_acronym}-snapshots
    description: Create EBS Snapshots and tags them
    timeout: 120 # optional, in seconds
    handler: snapshot.lambda_handler
    # events:
    #   - schedule: cron(0 21 ? * THU *)
    role: EBSSnapshots
    environment:
      region: ${self:custom.region}
      owner: ${self:custom.owner}
      slack_url: ${self:custom.slack_url}
      input_file: input_1.json
      slack_channel: ${self:custom.slack_channel}
      tablename: ${self:custom.dynamodb_created}
    tags:
      Name: ${self:custom.app_acronym}-snapshots
      Project: ebs-autobackup
      Environment: dev
  ebs-retention:
    name: ${self:custom.app_acronym}-retention
    description: Deletes old snapshots according to retention policy
    handler: retention.lambda_handler
    timeout: 120 # optional, in seconds
    environment:
      region: ${self:custom.region}
      owner: ${self:custom.owner}
      slack_url: ${self:custom.slack_url}
      input_file: input_2.json
      slack_channel: ${self:custom.slack_channel}
      tablename: ${self:custom.dynamodb_deleted}
    events:
      - schedule: cron(0 21 ? * WED-SUN *)
    role: EBSSnapshotsRetention
    tags:
      Name: ${self:custom.app_acronym}-retention
      Project: ebs-autobackup
      Environment: dev

As you’ve probably noticed, the variables are presented as plain text and are easy to capture. Obviously, hardcoding anything into code is a very, very bad idea; likewise, putting sensitive data into plain text is also a bad one. My code is just an example, but I’d like to show you an easy way I often use, called Parameter Store, to avoid problems with sensitive data leaking. It is an AWS service that acts as a centralized config and secrets storage for a whole bunch of your applications.

First of all, you might use AWS CLI to store your new SSM parameters:

aws ssm put-parameter --name PARAM_NAME --type String --value PARAM_VALUE

then in serverless.yml starting from version 1.22:

   environment:
      VARIABLE: ${ssm:PARAM_NAME}
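For anything sensitive you would rather store it encrypted, as a KMS-backed SecureString, and (if I recall the v1 syntax correctly) reference it with a trailing ~true to have it decrypted:

aws ssm put-parameter --name PARAM_NAME --type SecureString --value PARAM_VALUE

   environment:
      VARIABLE: ${ssm:PARAM_NAME~true}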

NOTE: Personally, I’m not a fan of injecting sensitive data through plain variables like this. I would rather fetch it at runtime:

import boto3

# fetch the (encrypted) parameter at runtime instead of baking it into the template
ssm = boto3.client('ssm')
parameter = ssm.get_parameter(Name='NAME_OF_PARAM', WithDecryption=True)
api_token = parameter['Parameter']['Value']

Having the functions and variables configured, we can seamlessly jump to the ‘resources’ part, which is nothing more than the additional CloudFormation resources required for our solution.

NOTE: If you don’t want to keep the resources code in the serverless file, use the following method, which is simply a path to a YAML file containing your CloudFormation code (the Resources part only):

resources:
  - ${file(FOLDER/FILE.yml)}

with a project layout like:

project_folder
  serverless.yml
  FOLDER
    FILE.yml
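As a sketch, the external file then contains just the Resources section in plain CloudFormation syntax, for example (this table definition is illustrative, not a copy of my actual file):

Resources:
  CreatedSnapshotsTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: created-snapshots
      AttributeDefinitions:
        - AttributeName: SnapshotId
          AttributeType: S
      KeySchema:
        - AttributeName: SnapshotId
          KeyType: HASH
      ProvisionedThroughput:
        ReadCapacityUnits: 1
        WriteCapacityUnits: 1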

As you’ll see in my serverless.yml file, the IAM roles and DynamoDB tables are added to the environment this way. Last but not least, since this is a Python project, the requirements plugin has to be added:

plugins:
  - serverless-python-requirements
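The plugin itself has to be installed into the project first, e.g. with npm:

npm install --save-dev serverless-python-requirements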

So now, the time has come to launch your serverless project. Simply type:

serverless deploy --aws-profile YOUR_AWS_PROFILE

or simply:

sls deploy --aws-profile YOUR_AWS_PROFILE

and then deployment progress information should appear on the screen:

Serverless: Installing requirements of requirements.txt in .serverless...
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Injecting required Python packages to package...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
.....
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (4.6 KB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress…

It’s important to notice that you can deploy this to different accounts simply by switching the well-known --aws-profile option. After a while you’ll get a final message saying that your new stack has been deployed:

Serverless: Stack update finished...
Service Information
service: ebs-autobackup
stage: dev
region: eu-central-1
stack: ebs-autobackup-dev
api keys:
  None
endpoints:
  None
functions:
  ebs-snapshots: ebs-autobackup-dev-ebs-snapshots
  ebs-retention: ebs-autobackup-dev-ebs-retention
Serverless: Publish service to Serverless Platform...
Service successfully published! Your service details are available at:
https://platform.serverless.com/services/YOUR_PROFILE/ebs-autobackup

After you check that everything has been launched properly, you can invoke your deployed function directly via a serverless call:

serverless invoke -f FUNCTION_NAME

Additional arguments you might use include --stage, --region, --data (to pass input) and --log (to print the function’s logs).
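For example, to invoke the snapshot function and print its logs in one go:

serverless invoke -f ebs-snapshots --aws-profile YOUR_AWS_PROFILE --log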

After all the hard work you’ve performed with your project it can be easily deleted. Just type:

serverless remove --aws-profile AWS_PROFILE

and our “new” baby will take care of the rest. Honestly, I have to say that the Serverless Framework has literally made my day. Deployment of new Lambda functions, even with extra AWS services, is extremely easy. Of course, after some time spent with the framework you’ll realize that ‘the devil’s in the details’. Nonetheless, I strongly encourage you to test it, and believe me or not, after the first day of trial you’re gonna love it.

Next let me talk a little bit about prepared Lambda functions.

Snapshotter: my way of creating volume snapshots

In the snapshot.py file you’ll see a function called “determine_snap_retention” which does nothing more than add the number of days the newly created snapshot should be kept to today’s date. The result is the date of deletion:

import datetime
from datetime import timedelta

def determine_snap_retention(retention_type='monthly', mdays=21, wdays=7):
    d_today = datetime.datetime.today()
    d_today = d_today.replace(hour=23, minute=0, second=0, microsecond=0)
    snapshot_expiry = ""
    # roll forward to the next Friday (weekday 4) so expiry always falls on the same weekday
    while d_today.weekday() != 4:
        d_today += timedelta(days=1)
    if retention_type == 'monthly':
        snapshot_expires = d_today + timedelta(days=mdays)
        snapshot_expiry = snapshot_expires.strftime('%Y-%m-%d %H:%M:%S')
    elif retention_type == 'weekly':
        snapshot_expires = d_today + timedelta(days=wdays)
        snapshot_expiry = snapshot_expires.strftime('%Y-%m-%d %H:%M:%S')
    return snapshot_expiry
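As a quick sanity check of the date math (the dates below are just a worked example):

# assuming today is Tuesday, 2018-08-21
determine_snap_retention(retention_type='weekly')
# the date first rolls forward to Friday 2018-08-24 23:00,
# then 7 days are added -> '2018-08-31 23:00:00'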

I’ve also added a class called “Volumes” which has a method:

def create_snapshot(self, owner, slack_url, file, slack_channel, tablename='created-snapshots')

It creates snapshots of all attached EBS volumes belonging to instances with a specific tag. Then, after a successful job, it tags each snapshot with a specific key/value pair: “DeleteOn”/DATE. This helps the retention Lambda determine the date of deletion. As an extra feature, the function puts info about the snapshots into the DynamoDB table and sends Slack notifications via:

slack_notify_snap(slack_url=slack_url, file=file, channel=slack_channel, snap_num=len(snaps_notify), snap_ids=snaps_notify, owner=owner, region=self.region)

which is imported: from slack_notification import slack_notify_snap

Snippet:

def slack_notify_snap(slack_url, file, channel, snap_num, region, snap_ids, owner):
    snap_ids = ', '.join(snap_ids)
    slack_message = custom_message(filename=file, snapshot_number=snap_num, snapshot_ids=snap_ids, region=region, owner=owner)
    try:
        req = requests.post(slack_url, json=slack_message)
        if req.status_code != 200:
            print(req.text)
            raise Exception('Received non 200 response')
        else:
            print("Successfully posted message to channel: ", channel)
    except requests.exceptions.RequestException as e:
        print("Request to Slack failed: ", e)

In the handler pasted below, environment variables are used to avoid hardcoding. The function is still under development, so elements like ‘event’ or ‘context’ are intended for future use (literally, each time I look at the code I find something that could be done another way). What they’re actually for is…

As you probably know, AWS Lambda is an event-driven service; simply put, invoking a function means triggering an event within AWS Lambda. Moving further, you’ve seen the handler definition def lambda_handler(event, context). The first argument contains the event which triggers the function, represented as a JSON object inside Lambda and passed to your Python code as a dictionary. In my particular case it is an empty dictionary. If you were using API Gateway, the whole HTTP request would be represented as a dictionary. For your Lambda it’s just an input with the additional parameters you want to pass.

The second one is the context, containing metadata about the invocation. The moment you start debugging your function, you’ll find the context very useful.

def lambda_handler(event, context):
    region = os.environ['region']
    owner = os.environ['owner']
    slack_url = os.environ['slack_url']
    file = os.environ['input_file']
    slack_channel = os.environ['slack_channel']
    tablename = os.environ['tablename']
    ec2 = Volumes('ec2', region, event, context)
    ec2.create_snapshot(owner, slack_url, file, slack_channel, tablename)
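As a side note on the context mentioned above, here is a quick illustration of the kind of metadata it exposes (this is just a debugging aid, not part of the deployed functions):

def lambda_handler(event, context):
    # a few context fields that come in handy while debugging
    print('Request ID:', context.aws_request_id)
    print('Log stream:', context.log_stream_name)
    print('Time left (ms):', context.get_remaining_time_in_millis())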

Retention attention: my approach to snapshot lifecycle maintenance

I’ve followed the same methodology as with the snapshot code and created another class. The method called “delete_old_snapshots” filters snapshots based on the tag and compares the current date with the one saved in the tag. If the current date matches or is past the “deletion” date, the snapshot is removed immediately. Information about the job is sent to the Slack channel. Similarly to the snapshot function, information is put into DynamoDB, but into a different table. This particular one contains only info about deleted snapshots.

def delete_old_snapshots(self, owner, slack_url, file, slack_channel, tablename='deleted-snapshots'):
        delete_on = datetime.date.today().strftime('%Y-%m-%d')
        deleted_snapshots = []
        dynamo = Dynamodb('dynamodb', self.region)
        change = False
        filters = [
        {
            'Name': 'owner-id',
            'Values': [
                owner,
            ]
        },
        {
            'Name': 'tag-key',
            'Values': [
                'DeleteOn',
            ]
        },
        ]
        try:
            snapshot_response = self.client.describe_snapshots(Filters=filters, OwnerIds =[owner])['Snapshots']
            for snap in snapshot_response:
                for i in snap['Tags']:
                    if i['Key'] == 'DeleteOn':
                        data = i['Value'][:10]
                        if time.strptime(data,'%Y-%m-%d') == time.strptime(delete_on,'%Y-%m-%d') or time.strptime(delete_on,'%Y-%m-%d') > time.strptime(data,'%Y-%m-%d'):
                            print('Deleting snapshot "%s"' % snap['SnapshotId'])
                            deleted_snapshots.append(snap['SnapshotId'])
                            self.client.delete_snapshot(SnapshotId=snap['SnapshotId'])
                            dynamo.batch_write(tablename, deleted_snapshots, region=self.region)
                            change = True
                            slack_notify_snap(slack_url=slack_url, file=file, channel=slack_channel, snap_num=len(deleted_snapshots), snap_ids=deleted_snapshots, owner=owner, region=self.region)
                        elif time.strptime(delete_on,'%Y-%m-%d') < time.strptime(data,'%Y-%m-%d'):
                            print(str(snap['SnapshotId'])+' has to be deleted on %s. Now we keep it' % i['Value'])
                            change = False
            responseData = {
              'SnapshotId': deleted_snapshots,
              'Changed': change
             }
            sendResponse(self.event, self.context, 'SUCCESS', responseData)
        except Exception as e:
            # report the failure through the same channel as a success
            print(e)
            sendResponse(self.event, self.context, 'FAILED', {'Error': str(e)})

Finale, finale what’s next…

I’m totally aware that there are a bunch of similar tools, some of them already with out-of-the-box features, but honestly, it was quite a nice lesson in implementing (almost from scratch) and, what’s more important, combining some additional features together. That mix of add-ons is now quite frequently used by our team in other projects, like keeping data outside the function in DynamoDB or another lightweight store and notifying ourselves about different events coming from AWS. Moreover, it’s the next step towards being more familiar and comfortable with the Serverless Framework which, in our case, has moved to first place for serverless architecture implementations. Hmm, but what’s next? My piece of advice is to start using the framework if you’re an enthusiast of the IaC approach. Last but not least, find other areas to automate! At the end of the day it will bring some order to everyday chaos and give you time for more proactive tasks.

Technologies

Amazon EBS
AWS Lambda
AWS DynamoDB
