September 25, 2019

How to build your own serverless Dropbox in AWS

A way to get full transparency of your cloud storage bills.

Dorian Vetrila

There must be a way to get full transparency of your cloud storage bills, which is something I never got out of the commercial solutions with fixed subscription packages. Some files I store are better off in cold storage, and my family will always drop a file, forget it, and remember it six months later. Psychologically we are inclined to compensate for our ordered lives in some unordered way, and I chose to go full-on hoarder-style with S3 object storage and a serverless architecture for this little PoC.

I am going to describe the infrastructure behind what could be called a fully serverless object uploader, with some validation (in this case a user upload cap, but it could be anything else) and, less obvious but just as important, object listing. I should point out this is a proof of concept (PoC), and what it proved is that it is possible in practice to push the boundaries of IAM and Lambda to control how much an authenticated user can upload, and where.

Dive in

My first checkpoint was to inspect what my browser sends as part of the multipart upload request, to find something I could use.

And bingo: we can use the content length, in number of octets, to check that our request is not shooting over the upload cap.

“The Content-Length entity-header field indicates the size of the entity-body, in decimal number of OCTETs, sent to the recipient or, in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET.”

https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
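For illustration, the relevant slice of such a request could look like this (hypothetical host and size):

    POST / HTTP/1.1
    Host: my-storage-bucket.s3.amazonaws.com
    Content-Type: multipart/form-data; boundary=----WebKitFormBoundary2qCr
    Content-Length: 5242880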

Now back to IAM to check whether it can handle this validation for us.

POST policy conditions

IAM can be really granular. In this case, we use POST policy conditions, which let us set the minimum and maximum allowable size for the uploaded content. This is exactly the type of logic offloading I was looking for to make this solution cheaper: we should be able to generate these policies on the fly whenever usedStorage is less than the user's uploadCap.

Below is an example of generating a POST policy that restricts exactly where the upload goes and how big it can be. The resulting fields are attached to the formData your browser uses to upload the object directly to S3.
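A minimal sketch of generating such a policy with boto3; the bucket name, the per-user key prefix, and the expiry are my assumptions:

    import boto3

    s3 = boto3.client("s3")

    def presign_upload(user_id: str, file_name: str, max_bytes: int):
        # content-length-range is enforced by S3 itself, so an oversized
        # upload is rejected without ever invoking a Lambda.
        return s3.generate_presigned_post(
            Bucket="my-storage-bucket",    # assumed bucket name
            Key=f"{user_id}/{file_name}",  # namespace objects per user
            Conditions=[
                ["content-length-range", 1, max_bytes],
                ["starts-with", "$key", f"{user_id}/"],
            ],
            ExpiresIn=300,  # the policy expires after 5 minutes
        )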

The glue

So far so good; the question is how fast we can glue this together so that we can start working on a client application. Get it down on paper and you will see it more clearly.


Diagram 1

Diagram 2

Clearly you don’t want to give just anyone access to your serverless app; AWS Cognito is the way to go, and you can read more about it on our blog.

The API is surely better off serverless as well. Go with API Gateway to define your endpoints for GET ‘/session’ and PUT ‘/session’. It will invoke your Lambdas as per Diagrams 1 and 2.

While Diagram 1 is a straightforward wrapper around IAM roles, the flow in Diagram 2 makes more sense if you look at it from the perspective of DynamoDB, which holds the state of your upload (INITIATED, REJECTED, COMPLETE). Using these flags made it easier to debug along the way, and they can be used in the future to backtrack issues with uploads; it costs next to nothing to keep them there anyway.

While the first two (INITIATED, REJECTED) are easy to settle during the initial upload request, the last one (COMPLETE) can be written as a result of the S3 trigger, as sketched below.
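A minimal sketch of that trigger, assuming a DynamoDB table called uploads keyed by the object key:

    import boto3

    # Assumed table name; it holds one item per upload, keyed by objectKey.
    uploads = boto3.resource("dynamodb").Table("uploads")

    def handler(event, context):
        # Fired by the s3:ObjectCreated:* event once the browser upload lands.
        for record in event["Records"]:
            key = record["s3"]["object"]["key"]
            uploads.update_item(
                Key={"objectKey": key},
                UpdateExpression="SET #s = :c",
                # "status" is a DynamoDB reserved word, hence the placeholder
                ExpressionAttributeNames={"#s": "status"},
                ExpressionAttributeValues={":c": "COMPLETE"},
            )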

So, in order of occurrence, in Diagram 2 we have the happy path (a sketch of the Lambda behind it follows the list):

  1. Request to API Gateway with the content length we want to upload
  2. The content length is validated and a POST policy condition is created
  3. The formData for the upload is created with the POST policy from step 2, and an item is inserted into DynamoDB with the INITIATED flag
  4. Used storage + free storage + formData is returned to the client
  5. The client sends the upload request directly to S3, delegating validation to AWS
  6. The S3 object-created event triggers a Lambda which changes the INITIATED flag to COMPLETE
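Tying steps 1 to 4 together, here is a minimal sketch of the Lambda behind PUT ‘/session’. The users and uploads table names, the attribute names, and the request shape are all assumptions, and presign_upload is the helper from earlier:

    import json
    import boto3

    dynamodb = boto3.resource("dynamodb")
    users = dynamodb.Table("users")      # assumed: per-user usedStorage and uploadCap
    uploads = dynamodb.Table("uploads")  # assumed: per-object upload state

    def handler(event, context):
        # With a Cognito authorizer on API Gateway, the user id is in the claims.
        user_id = event["requestContext"]["authorizer"]["claims"]["sub"]
        body = json.loads(event["body"])
        content_length = int(body["contentLength"])

        user = users.get_item(Key={"userId": user_id})["Item"]
        used, cap = int(user["usedStorage"]), int(user["uploadCap"])

        if used + content_length > cap:
            return {"statusCode": 403, "body": json.dumps({"status": "REJECTED"})}

        form_data = presign_upload(user_id, body["fileName"], cap - used)
        uploads.put_item(Item={
            "objectKey": f"{user_id}/{body['fileName']}",
            "status": "INITIATED",
            "size": content_length,
        })
        return {
            "statusCode": 200,
            "body": json.dumps({
                "usedStorage": used,
                "freeStorage": cap - used,
                "formData": form_data,
            }),
        }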

Cost

You are going to love this part, even if it was not easy to work out.

Since this is designed entirely for home use, I estimated a negligible number of Lambda and API Gateway requests, so we stay within the Free Tier.

2 requests per upload + 1 per list + 1 per download = 4 requests per object, max

Give or take, 1 TB of pictures is 312,500 objects.
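Spelled out (the average picture size is my assumption):

    # Back-of-the-envelope, assuming ~3.2 MB per picture on average:
    objects = 1_000_000_000_000 // 3_200_000  # 312,500 objects in 1 TB
    requests = objects * 4                    # 1,250,000 requests in total

Lambda's always-free tier alone covers one million requests per month, so spread over normal home usage the requests never leave the Free Tier; the storage itself is what you actually pay for.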

This is not the end...

While this is somewhat more expensive than some off-the-shelf commercial solutions, I know there will be other S3-API-compatible solutions that will take this PoC to the next level. This is surely not a finished solution: it will require more thought around upload state handling and the object namespace (I would like to make use of object tags to avoid clashes), and what about object lifecycle? Glacier can give us great savings. But this is all a good start; at least we learned what makes up the serverless building blocks.

Technologies

Amazon S3
AWS Lambda
AWS IAM
Amazon Cognito
