Comparing Two Ways to Trigger Lambda from S3

Eoin Shanaghy
Mar 12, 2020


Update 30 November 2021: There is now a third way to trigger Lambda (and many more services) from S3. This third method uses EventBridge but without CloudTrail: https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/.

There are two primary methods to trigger Lambda when an object is added to an S3 bucket: S3 Notifications and EventBridge. Using S3 notifications is more typical, but for me it has one big drawback: the notification is tied to the bucket resource itself. For this reason, let’s look at EventBridge as an alternative.

S3 Notifications and CloudTrail/EventBridge are differing approaches to triggering functions on S3 Object events

An S3 notification is part of the bucket’s NotificationConfiguration, a property of the bucket itself. When using CloudFormation, this generally means that the notification must be created or modified as part of the bucket resource. In a serverless application, you can imagine having a shared bucket with separately deployable serverless services triggered by objects being created or deleted under specific prefixes. An upload to the /uploads/images path might trigger a different function from an upload to /uploads/metadata. CloudFormation, and therefore its derivatives like CDK and AWS SAM, doesn’t allow you to modify a resource from outside the CloudFormation stack in which the resource resides. The Serverless Framework works around this limitation by using a custom resource to create the notification outside of the bucket’s stack.
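To make the coupling concrete, here is a minimal sketch of the NotificationConfiguration payload that ends up stored on the bucket. The function ARNs, the account ID and the helper name buildNotificationConfiguration are hypothetical placeholders, and the prefixes are written without a leading slash on the assumption that the object keys themselves have none.

```javascript
// Sketch only: builds the NotificationConfiguration structure S3 keeps on the bucket.
// ARNs, account ID and prefixes are hypothetical placeholders.
function buildNotificationConfiguration(routes) {
  return {
    LambdaFunctionConfigurations: routes.map(({ functionArn, prefix }) => ({
      LambdaFunctionArn: functionArn,
      Events: ['s3:ObjectCreated:*'],
      // Each route is limited to keys starting with its prefix.
      Filter: { Key: { FilterRules: [{ Name: 'prefix', Value: prefix }] } },
    })),
  };
}

const notificationConfig = buildNotificationConfiguration([
  {
    functionArn: 'arn:aws:lambda:eu-west-1:123456789012:function:images',
    prefix: 'uploads/images/',
  },
  {
    functionArn: 'arn:aws:lambda:eu-west-1:123456789012:function:metadata',
    prefix: 'uploads/metadata/',
  },
]);

console.log(JSON.stringify(notificationConfig, null, 2));
```

Both routes live in a single document attached to the bucket, which is why two independently deployed services cannot each own their part of it.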

The alternative we will explore is to use EventBridge. To receive events from EventBridge, you create a rule with targets. A rule is a separate resource from your bucket and also from the event recipients. This facilitates the loose coupling we want. Many AWS services publish EventBridge events. Those that don’t can still be integrated using CloudTrail events. S3 publishes CloudTrail events for bucket-level resource modifications but can also be configured to publish data events. Data events relate to creating, modifying and removing objects within a bucket. To receive such events via EventBridge, a CloudTrail trail must exist and be configured for data events on the bucket to be monitored.
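As an illustration of what such a rule matches on, the sketch below builds an EventBridge event pattern for object writes recorded by CloudTrail. The bucket name, the prefix, the choice of eventName values and the helper name buildS3DataEventPattern are assumptions for the example, not taken from the application in this post.

```javascript
// Sketch only: an EventBridge event pattern for S3 object writes delivered via CloudTrail.
// Bucket name and key prefix are hypothetical placeholders.
function buildS3DataEventPattern(bucketName, keyPrefix) {
  return {
    source: ['aws.s3'],
    'detail-type': ['AWS API Call via CloudTrail'],
    detail: {
      eventSource: ['s3.amazonaws.com'],
      // Assumed set of API calls that create objects; adjust to the calls you care about.
      eventName: ['PutObject', 'CopyObject', 'CompleteMultipartUpload'],
      requestParameters: {
        bucketName: [bucketName],
        // Prefix matching on the object key.
        key: [{ prefix: keyPrefix }],
      },
    },
  };
}

const imagePattern = buildS3DataEventPattern('my-shared-bucket', 'uploads/images/');
console.log(JSON.stringify(imagePattern, null, 2));
```

Because the pattern belongs to the rule, each service can deploy its own rule in its own stack without touching the bucket resource.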

The AWS Console CloudTrail section allows us to configure data events for all or specific buckets.
Enable S3 bucket data events in order to receive EventBridge events for objects

To run a comparison, we’ll use AWS SAM to create a simple application with two Lambda functions. One will react to EventBridge S3 CloudTrail data events and the other will use the S3 Notification approach. The SAM template for both functions looks like this (follow this link for the code):

The implementations for each function are similar. They use the AWS SDK’s HeadObject method to get the object’s LastModified property. We will log this along with the current time and any other event timings. These can be used to get a reasonable idea of the latency between object creation and Lambda execution. The handler code can be seen in full here in the GitHub repository for this application.
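The repository has the real handlers; what follows is only a rough sketch of the idea, not the author’s code. It assumes the two event shapes (an S3 notification with Records, and an EventBridge event whose detail is the CloudTrail record) and takes headObject as an injected function so the sketch runs without the AWS SDK. The name 's3-notification' and the helper names are hypothetical; the timings.now and timings.keyTime fields follow the names used in the queries in this post.

```javascript
// Sketch only: derive bucket/key from either event shape and compute the timings.
function getObjectRef(event) {
  if (Array.isArray(event.Records)) {
    // S3 notification: object keys arrive URL-encoded, with '+' standing for a space.
    const { bucket, object } = event.Records[0].s3;
    return {
      name: 's3-notification', // hypothetical label
      bucket: bucket.name,
      key: decodeURIComponent(object.key.replace(/\+/g, ' ')),
    };
  }
  // EventBridge event wrapping a CloudTrail data event.
  const { bucketName, key } = event.detail.requestParameters;
  return { name: 'event-bridge', bucket: bucketName, key };
}

function buildTimings(lastModified, nowMs) {
  return { now: nowMs, keyTime: new Date(lastModified).getTime() };
}

// headObject is injected; in a real handler it would wrap the SDK's HeadObject call.
async function measure(event, headObject, nowMs = Date.now()) {
  const ref = getObjectRef(event);
  const { LastModified } = await headObject({ Bucket: ref.bucket, Key: ref.key });
  return { name: ref.name, key: ref.key, timings: buildTimings(LastModified, nowMs) };
}

// Usage with a stubbed headObject and fixed times.
const sampleEvent = {
  Records: [{ s3: { bucket: { name: 'my-shared-bucket' }, object: { key: 'uploads/images/a+b.png' } } }],
};
const stubHeadObject = async () => ({ LastModified: '2020-03-12T10:00:00.000Z' });
measure(sampleEvent, stubHeadObject, Date.parse('2020-03-12T10:00:01.500Z')).then((entry) =>
  console.log(JSON.stringify(entry))
);
```

Logging the entry as structured JSON is what makes the timings fields queryable later.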

A simple shell script will copy a file repeatedly to different keys with varying prefixes in series. This is not a proper benchmark by any means.

Since we are using structured JSON logging with Pino to log our timings, the results can be extracted and aggregated using CloudWatch Logs Insights. I first let the script create 2100 objects.

First, we use Insights to check how many events were received by each service.

stats count(*) by name

We see that the CloudTrail-EventBridge method received an extra event. We can put this down to EventBridge’s at-least-once delivery. This could also happen with S3 notifications. Only recently, S3’s delivery guarantee changed from probably-once(!) to at-least-once.

S3 Notifications have changed from probably-once to at-least-once

Now, let’s take a look at the timings. I’m only looking at the interval between the object’s reported LastModified time and the current time captured in the Lambda.

filter ispresent(name) | stats min(timings.now - timings.keyTime) as minIntervalMS, avg(timings.now - timings.keyTime) as avgIntervalMS, percentile(timings.now - timings.keyTime, 95) as pc95IntervalMS, max(timings.now - timings.keyTime) as maxIntervalMS by name

We’ve now got some significant differences between S3 notification timings and EventBridge/CloudTrail. Note that we have a negative value for the minimum interval with notifications. This is down to the fact that S3’s LastModified value has only one-second precision. There may also be clock synchronization differences between services, but we will assume these to be negligible given the size of the intervals being observed.

Switching to CloudWatch Logs Insights’ Visualization tab gives us the following comparison bar chart.

EventBridge/CloudTrail intervals are higher than S3 notifications

EventBridge has some really long intervals pushing the maximum value high. Looking at the maximum for each minute, there are clearly a few incidents of intervals in the 30–50 second range.

filter name = 'event-bridge' | stats max(timings.now - timings.keyTime) as maxIntervalMS by bin(1m)

In general, we can conclude that if event latency is critical, S3 notifications are still the way to go and we have to accept the CloudFormation resource ownership limitation. If we can accept delays of close to a minute, EventBridge gives us better separation of triggering infrastructure.

Note that the official line on S3 Notification delivery timing is “Amazon S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer.” Also note that if you want to avoid quick successive events for the same object resulting in missed events, you need to enable object versioning in the bucket. See https://docs.aws.amazon.com/en_gb/AmazonS3/latest/dev/NotificationHowTo.html.

Summary of Differences

It’s clear that the delivery latency with CloudTrail and EventBridge is higher than with S3 Notifications. What else should inform a decision on which method to use? Let’s finish with a summary of the other known differences between the two.

CloudTrail/EventBridge

S3 Notifications

  • There is no additional charge for S3 Notifications. Lambda targets are subject to normal AWS Lambda billing.
  • S3 Notifications can target Lambda, SNS and SQS.
  • Object events can filter based on a prefix or suffix or both.
  • You cannot have multiple notifications with overlapping prefixes.
  • Object versioning is required to avoid missed events.
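
The overlapping-prefix restriction above can be checked before deployment. The sketch below is a simplification that looks at prefixes only (it ignores event types and suffixes, which S3 also takes into account), and the helper names are hypothetical.

```javascript
// Sketch only: two prefixes overlap when one is a prefix of the other.
function prefixesOverlap(a, b) {
  return a.startsWith(b) || b.startsWith(a);
}

// Return every overlapping pair in a list of notification prefixes.
function findOverlaps(prefixes) {
  const overlaps = [];
  for (let i = 0; i < prefixes.length; i++) {
    for (let j = i + 1; j < prefixes.length; j++) {
      if (prefixesOverlap(prefixes[i], prefixes[j])) {
        overlaps.push([prefixes[i], prefixes[j]]);
      }
    }
  }
  return overlaps;
}

console.log(findOverlaps(['uploads/', 'uploads/images/', 'reports/']));
```

Here 'uploads/' and 'uploads/images/' are reported as a conflicting pair, while 'reports/' is independent of both.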

Eoin is the CTO of fourTheorem, an AWS Partner, and author of AI as a Service from Manning.
