Serverless is taking the tech world by storm. The breakthrough technology is changing the way organizations write, deliver, and maintain software. Amazon Web Services, Microsoft Azure, and Google Cloud all offer serverless products — and the hype has people smashing servers.
So what actually is serverless? The Wikipedia definition:
… a code execution model in which the cloud provider fully manages starting and stopping of a function’s container platform as a service (PaaS) as necessary to serve requests, and requests are billed by an abstract measure of the resources required to satisfy the request, rather than per virtual machine, per hour.
Personally, I think the “serverless” nickname is misleading — code is still running on servers!
I think “functions as a service” (FaaS) is better suited to what we are actually getting under the hood. Your application code (i.e. function) gets packaged into an ephemeral, stateless container and runs on a server managed by your cloud provider. Additionally, you only pay when your function is executed.
Some of the major benefits of FaaS include:
- focus on code, not infrastructure
- eliminate server patching & maintenance
- free reliability and scalability
As we’ll go over later in this post, switching an application to FaaS can significantly reduce costs. It also enables you to ship code faster by reducing total software delivery time (no infrastructure setup!) and focusing on smaller units of application logic. Utilizing FaaS can enable you to provide value to your organization and customers with a quicker turnaround time. For the Operations team, it eliminates the headache of constantly patching servers for security vulnerabilities. If you are utilizing a major cloud provider for your functions, you get reliability and the ability to scale quickly baked into the managed service.
At Red Ventures, we primarily use Lambda, which is the AWS implementation of FaaS. Lambda was launched at Re:Invent 2014 and was the first FaaS service offered by a major cloud provider. Below are some of the things you’ll need to know if you plan on using Lambda.
Lambda functions run in an Amazon Linux AMI container, which is RHEL-based and uses kernel v4.4. The official languages and versions supported are NodeJS (4.3.2 & 6.10), Python (2.7 & 3.6), .NET Core (1.0.1), and Java (v8). The AWS SDK comes bundled in the container as well.
The Lambda service includes limitations on things like request/response payload size (6 MB), maximum duration (5 minutes), processes/threads (1024), and ephemeral disk space (512 MB at /tmp). The memory allocation range for functions starts at 128 MB and caps out at 1536 MB. The full list of limits can be found at http://docs.aws.amazon.com/lambda/latest/dg/limits.html
Common Code Model
Regardless of the language runtime you use, there is a shared code model for working with Lambda functions. You will need a handler function that will receive event data along with a context object that will contain runtime information. Your function code should also be stateless — we are operating in ephemeral containers, so keeping any state in the container is a bad idea.
Below is a Python example that prints out some information available in the context object and returns a message with name data received from an event:
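A minimal sketch of such a handler follows. The context attributes are the ones the Python runtime exposes; the `name` key in the event and the `World` fallback are assumptions made for illustration.

```python
def handler(event, context):
    """Entry point that Lambda calls once per invocation."""
    # Runtime details exposed by the context object.
    print("Function name:", context.function_name)
    print("Function version:", context.function_version)
    print("Memory limit (MB):", context.memory_limit_in_mb)
    print("Request ID:", context.aws_request_id)
    print("Log group:", context.log_group_name)
    print("Time remaining (ms):", context.get_remaining_time_in_millis())

    # "name" is a hypothetical key; use whatever your event source sends.
    name = event.get("name", "World")
    return {"message": "Hello, {}!".format(name)}
```

Nothing is kept between invocations: everything the function needs arrives in `event` and `context`.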
Event sources are the heart of event-driven architectures utilizing Lambda functions. Some of the AWS services that can trigger functions include S3, CloudWatch, Kinesis, API Gateway, and DynamoDB. You could theoretically have anything be an event source by calling the Invoke API directly. We’ll cover some practical use cases later in this post.
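As a rough illustration of calling the Invoke API yourself, the sketch below wraps the `invoke` call of a boto3 Lambda client. The helper name `invoke_async` and the idea of passing the client in as a parameter are choices made for this example, not something the post prescribes.

```python
import json


def invoke_async(lambda_client, function_name, payload):
    """Fire-and-forget invocation of a Lambda function through the Invoke API."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        # "Event" is asynchronous; "RequestResponse" would wait for the result.
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Lambda answers 202 once it has accepted an asynchronous invocation.
    return response["StatusCode"]
```

With boto3 installed and AWS credentials configured, a call would look like `invoke_async(boto3.client("lambda"), "my-function", {"name": "Ada"})`, where `my-function` is a placeholder for your own function name.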
Zip files are supported by every official runtime for packaging your code and dependencies up to run in a Lambda function. However, the total size of the deployment zip files can’t exceed 50 MB. Python also supports virtualenv while Java supports jar files and C# supports deploys via the Visual Studio AWS toolkit. My recommendation is to use a framework for packaging and deploying Lambda functions — it makes things simpler.
While there is an ever-growing number of frameworks for FaaS, two of the more widely adopted ones are Apex and Serverless. Both support features like versioning, rollbacks, and tailing logs locally. Apex will create a project structure with the ability to have multiple environments and multiple functions per environment in a single project. There are two main differentiators for Apex. The first is that it supports Golang, Clojure, and Rust as additional runtimes for your function by utilizing a NodeJS shim. The second is that Apex provides a wrapper around Terraform for creating and managing the AWS infrastructure that your Lambda functions need.
Serverless has better built-in support for mapping event sources to your Lambda functions. Things like integrating with API Gateway are dead simple with Serverless, and that is a huge plus: dealing with API Gateway in Terraform is an experience I wouldn’t wish on anyone. Serverless runs CloudFormation under the hood and also supports the C# runtime, which Apex does not.
Logging & Monitoring
Standard best practices (structured logging, code instrumentation, etc.) still apply to your code running in Lambda functions. Logs are automatically shipped to CloudWatch Logs with /aws/lambda/<function_name> as the log group name. Lambda also provides CloudWatch metrics for invocations, errors, execution duration, and throttled executions. One thing to note is that the error metric does not include AWS/Lambda service errors, only errors within your function execution.
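One simple way to get structured logs is to print one JSON object per line, since anything the function writes to stdout ends up in its log group. The sketch below assumes that approach; the field names and the `items` key in the event are made up for the example.

```python
import json
import time


def log_json(level, message, **fields):
    """Print one JSON object per line so log entries are easy to search."""
    record = {"level": level, "message": message, "timestamp": time.time()}
    record.update(fields)
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def handler(event, context):
    started = time.time()
    request_id = context.aws_request_id
    log_json("INFO", "invocation started", request_id=request_id)

    # Stand-in for real work: count the items in a hypothetical "items" list.
    result = {"items": len(event.get("items", []))}

    duration_ms = int((time.time() - started) * 1000)
    log_json("INFO", "invocation finished", request_id=request_id,
             duration_ms=duration_ms, items=result["items"])
    return result
```

Including the request ID in every entry makes it possible to follow a single invocation through the log stream.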
Retry functionality depends on whether the event source is stream-based or not. For stream-based sources like Kinesis, the event source will keep trying to execute the Lambda function until it is successful or until the event data expires. For non-stream sources that invoke Lambda asynchronously, the event source will retry twice and then discard the event if it is still failing. For non-stream sources that invoke synchronously, like an Echo skill, it is up to the client to implement retry logic. Lambda supports Dead Letter Queues so you can avoid losing event data: data for failed event triggers will instead get sent to an SQS queue or an SNS topic.
The first question to answer when it comes to Lambda networking is whether your function will need to access resources inside a VPC in your account, such as an RDS instance. If it doesn’t, then the default network configuration will work great. Outbound internet access is provided by default; however, you can only make TCP connections. If you need to run your function inside a VPC, there are a few things you need to be aware of.
First off, you will need to choose a VPC, subnets, and security groups for your function. You’ll want to select multiple subnets in different availability zones (AZs) so your function can still execute if there are issues in a single AZ. Second, you lose automatic outbound internet access: you will need to apply a security group that allows egress traffic and make sure the subnets you select have a route to either an internet gateway or a NAT. Third, since the function will create and attach elastic network interfaces (ENIs), you need to ensure the function has an IAM role with permissions to do so. Last potential gotcha: if you will have concurrent function executions, make sure you have enough available IPs in your subnets, because each concurrent execution will take up an IP in your subnet. You may also need to request an increase to the default 350 ENIs allowed per AWS region.
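To get a feel for the IP math, here is a rough capacity check. It relies on the rule of thumb stated above (one IP per concurrent execution) and on the fact that AWS reserves five addresses in every subnet; the CIDR blocks and concurrency figures you pass in are your own estimates.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_IPS_PER_SUBNET = 5


def usable_ips(cidr_blocks):
    """Total number of assignable private IPs across the given subnets."""
    total = 0
    for cidr in cidr_blocks:
        size = ipaddress.ip_network(cidr).num_addresses
        total += max(size - AWS_RESERVED_IPS_PER_SUBNET, 0)
    return total


def has_capacity(cidr_blocks, peak_concurrency, ips_already_used=0):
    """True if the subnets can absorb the expected peak of concurrent executions."""
    return usable_ips(cidr_blocks) - ips_already_used >= peak_concurrency
```

For example, two /24 subnets give 502 usable addresses, so `has_capacity(["10.0.1.0/24", "10.0.2.0/24"], 500)` holds only as long as almost nothing else lives in those subnets.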
You’ll need IAM policies for doing anything useful with Lambda. If you are polling streams or storing data in ElastiCache or DynamoDB, you’ll need to grant that access via an IAM policy. Your event triggers also need permissions to invoke the Lambda function, and at a bare minimum your function needs a policy to write execution logs to CloudWatch. If your function is running in a VPC, there is an AWS managed policy you can use named AWSLambdaVPCAccessExecutionRole. This policy allows Lambda to create ENIs for running in your VPC and allows logs to be written to CloudWatch. A good reference guide for crafting custom policies is available here.
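As an illustration of the bare-minimum logging policy, the sketch below builds the policy document as a Python dictionary. The helper name `logging_policy` and the region, account ID, and function name you pass in are placeholders; the three `logs:` actions are the ones a function needs to write its execution logs.

```python
import json


def logging_policy(region, account_id, function_name):
    """Build a minimal IAM policy that lets one function write its own logs."""
    account_scope = "arn:aws:logs:{}:{}".format(region, account_id)
    log_group_arn = "{}:log-group:/aws/lambda/{}".format(account_scope, function_name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets Lambda create the log group on first invocation.
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": account_scope + ":*",
            },
            {
                # Lets the function create streams and write entries in its own group.
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": log_group_arn + ":*",
            },
        ],
    }
```

Calling `json.dumps(logging_policy("us-east-1", "123456789012", "my-function"), indent=2)` produces a document you can paste into the IAM console or hand to your deployment framework.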
Figuring out the cost of your Lambda functions can get a little complicated. Thankfully AWS has a very generous free tier that doesn’t end after 12 months! There are two things that will affect your total cost: requests and request duration.
For requests (function executions), you get 1 million free every month, and then you pay 20 cents per million requests after that. For request duration, the function execution time is rounded up to the nearest 100 ms and then priced based on memory allocation using a unit called GB-seconds: seconds of execution multiplied by the memory allocation in GB. Amazon doesn’t charge for the first 400,000 GB-seconds every month and then it’s $0.00001667 per GB-second thereafter.
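As an example, if your function gets 3 million requests per month and takes 1 second per request with a 512 MB memory allocation, your total cost would be $18.74. Note that the free requests only reduce the request charge; every request counts toward GB-seconds. Here is the breakdown with the GB-seconds calculation as well:
3,000,000 requests * 1 s = 3,000,000 seconds
3,000,000 * 512 MB / 1024 = 1,500,000 GB-s
1,500,000 - 400,000 (free tier) = 1,100,000 GB-s
1,100,000 * $0.00001667 = $18.34 for duration
(3,000,000 - 1,000,000 free requests) * $0.20 per million = $0.40 for requests
$18.34 + $0.40 = $18.74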
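The pricing rules can also be expressed as a small calculator. This is a sketch that hard-codes the rates and free-tier allowances quoted in this post, which may have changed since; the function name `monthly_cost` is made up for the example.

```python
import math

REQUEST_PRICE_PER_MILLION = 0.20  # dollars, as quoted in this post
PRICE_PER_GB_SECOND = 0.00001667  # dollars, as quoted in this post
FREE_REQUESTS = 1000000           # free requests per month
FREE_GB_SECONDS = 400000          # free GB-seconds per month


def monthly_cost(requests, duration_ms, memory_mb):
    """Estimate the monthly Lambda bill in dollars for one function."""
    # Execution time is billed in 100 ms increments, rounded up.
    billed_ms = int(math.ceil(duration_ms / 100.0)) * 100
    # Every request counts toward GB-seconds, including the free ones.
    gb_seconds = requests * (billed_ms / 1000.0) * (memory_mb / 1024.0)
    compute_cost = max(gb_seconds - FREE_GB_SECONDS, 0) * PRICE_PER_GB_SECOND
    request_cost = max(requests - FREE_REQUESTS, 0) / 1000000.0 * REQUEST_PRICE_PER_MILLION
    return compute_cost + request_cost


# 3 million one-second requests at 512 MB.
print(round(monthly_cost(3000000, 1000, 512), 2))  # 18.74
```

Lighter workloads can stay almost free: 5 million requests of 150 ms at 256 MB remain under the GB-second allowance, so only the request charge applies.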
In the serverless architecture below, we have a web app using API Gateway and backed by a multi-AZ Lambda tier that communicates with RDS instances (within the VPC). We are also utilizing CloudFront to cache static assets stored in an S3 bucket.
Tips & Gotchas
- If connecting to a DB, initialize the connection outside of the handler function so the connection is reused for the life of the container running your function.
- You can keep your function “warm” by invoking it every 5 minutes with a CloudWatch Events rule. This can have a big impact on reducing the startup time that comes with AWS having to launch a new container for your “cold” functions.
- If your function is CPU intensive, allocate more memory to your function. While CPU is not a configurable option, CPU increases proportionally to RAM.
- If you don’t see logs for your function in CloudWatch, make sure the IAM role has permissions to create log streams in CloudWatch.
- Don’t DDoS yourself! If running synchronous functions in your VPC, make sure you have enough private IPs available to handle spikes in function executions. You may also hit limits on elastic network interfaces (ENIs), of which you are allotted 350 by default. If you are running staging and prod in the same AWS account, you may inadvertently take down prod functions by hitting concurrency limits or load testing staging functions.
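The first two tips can be sketched together. In the example below, `sqlite3` is only a stand-in for a real database driver so the code runs anywhere, and the `visits` table and `name` key are invented for illustration. The check on `source` relies on scheduled rule events arriving with `"source": "aws.events"`.

```python
import sqlite3

# Module-level code runs once per container, not once per invocation, so the
# connection is reused while the container stays alive. sqlite3 stands in for
# a real database client such as one that talks to RDS.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE IF NOT EXISTS visits (name TEXT)")


def handler(event, context):
    # Warm-up pings from a scheduled rule carry source "aws.events";
    # return early so they never touch the database.
    if event.get("source") == "aws.events":
        return {"warmed": True}

    # "name" is a hypothetical key in the incoming event.
    name = event.get("name", "anonymous")
    connection.execute("INSERT INTO visits (name) VALUES (?)", (name,))
    connection.commit()
    count = connection.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
    return {"visits": count}
```

With a real database, the same layout avoids paying the connection handshake on every invocation; only the first call in a fresh container opens the connection.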