Using Lambda to Download to S3
Problem Statement
I have an old cron job that creates object-groups for firewalls based on country. I want to move this job into AWS Lambda and S3. I wrote this script close to a decade ago, primarily in bash with some PHP, and I’ve had to move it a few times as several operating systems were EOL’d. That’s right, EOL’d. Do you guys remember Debian Etch? Yeah, that hit EOL status in 2010. Since this script was written, it has only been updated once, to make sure the downloaded file wasn’t empty. That happened a few times, and I ended up with empty files.
Why move now? The box it’s currently on is running Ubuntu 12.04, which hits its EOL date about a year from now. I’d rather get this moved now than wait until the EOL. Nothing runs PHP 5.3 anymore, and I’d rather move the job into something like AWS Lambda than update everything to work with PHP 5.6 or PHP 7.
This post isn’t going to cover the complete rewrite; I’m going to break it into parts instead. Since AWS Lambda is meant to run a series of functions, I’m going to create one function at a time, starting with downloading. In future posts, I’ll work on aspects like processing.
IAM Policy for Download Job
By design, I’m creating a policy that only has access to a few operations against one bucket. This practice follows the Principle of Least Privilege, under which you grant only the permissions that are absolutely necessary. Since we are building a Lambda function, once we have the permissions narrowed down, we shouldn’t need to worry about them again. If you’re going to use this policy, update it to reflect your bucket name by replacing ip-blocks-raw with the name of your bucket:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:List*",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:GetObject"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::ip-blocks-raw/*"
        },
        {
            "Action": [
                "s3:List*"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::ip-blocks-raw"
        }
    ]
}
Once you’ve added this policy, attach it to the role that you’re going to use to execute your Lambda job. This role must be set up as an AWS Service Role that is associated with AWS Lambda. In my case, I’ve created a role called lambda_download_raw_ip_info with the correct service role type, and I’m attaching the above IAM policy to it.
As a note, the s3:GetObject permission isn’t necessary for the Lambda function in this post; we’re just adding it so we can re-use the policy with another Lambda function later.
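If you manage more than one bucket, the policy above can also be generated rather than edited by hand. Here is a minimal sketch, assuming you only want to swap the bucket name; the helper name build_bucket_policy and the placeholder bucket name are mine, not part of the original setup:

```python
import json


def build_bucket_policy(bucket_name):
    """Return the policy document shown above as a dict for the given bucket."""
    bucket_arn = "arn:aws:s3:::{0}".format(bucket_name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level operations apply to keys inside the bucket.
                "Action": [
                    "s3:List*",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:GetObject",
                ],
                "Effect": "Allow",
                "Resource": bucket_arn + "/*",
            },
            {
                # Listing applies to the bucket itself.
                "Action": ["s3:List*"],
                "Effect": "Allow",
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    # "my-bucket-name" is a placeholder; substitute your own bucket.
    print(json.dumps(build_bucket_policy("my-bucket-name"), indent=4))
```

The printed JSON can be pasted into the IAM console as-is.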
AWS Lambda Job
My Lambda job is written in Python, so select Python 2.7 as your runtime. I’m naming my function fetch_ip_data, so my handler will be fetch_ip_data.lambda_handler. Select the role that you attached the IAM policy to, and you are good to go.
from __future__ import print_function

import gzip
import urllib

import boto3

BUCKET_NAME = "ip-blocks-raw"
TMP_FILE = "/tmp/country.db.gz"  # /tmp is the writable scratch space in Lambda


def lambda_handler(event, context):
    testfile = urllib.URLopener()
    try:
        # Download the gzipped database to local scratch space.
        testfile.retrieve("https://ip.ludost.net/raw/country.db.gz", TMP_FILE)
        # Decompress it in memory.
        with gzip.open(TMP_FILE, 'rb') as f:
            file_content = f.read()
        # Upload the decompressed contents to S3.
        s3 = boto3.resource('s3')
        s3.Bucket(BUCKET_NAME).put_object(Key='country.db',
                                          Body=file_content)
    except Exception as e:
        print(e)
        # Re-raise so the invocation is reported as failed.
        raise
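The old script had a check to make sure the downloaded file wasn’t empty, and the function above doesn’t carry that over. Here is a minimal sketch of one way to add it, assuming that "empty" means the archive decompresses to nothing but whitespace; the helper name read_non_empty_gzip is hypothetical:

```python
import gzip


def read_non_empty_gzip(path):
    """Decompress the gzip file at path and fail loudly if it holds no data."""
    with gzip.open(path, "rb") as f:
        content = f.read()
    # Treat whitespace-only content as empty too (an assumption on my part).
    if not content.strip():
        raise ValueError("{0} decompressed to no data".format(path))
    return content
```

In the handler, this would replace the gzip.open block with file_content = read_non_empty_gzip(TMP_FILE). Because the handler re-raises exceptions, an empty download would fail the invocation before anything is written to S3.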
When you run your job and it completes successfully, you should see something like this:
START RequestId: a86dd418-08ed-11e6-8e18-59c41bb3d528 Version: $LATEST
END RequestId: a86dd418-08ed-11e6-8e18-59c41bb3d528
REPORT RequestId: a86dd418-08ed-11e6-8e18-59c41bb3d528 Duration: 4980.93 ms Billed Duration: 5000 ms Memory Size: 128 MB Max Memory Used: 62 MB
Scheduling Code Execution
Starting at the end of 2015, Lambda began to support scheduled execution of Lambda functions. This essentially makes Lambda a cron replacement for small, self-contained tasks like this one. If you want to trigger the above code to run on a recurring basis, open your function in the AWS console, go to the “Event sources” tab, click “Add event source”, select “CloudWatch Events - Schedule”, and then give it a name and define how frequently the job should run. That’s it! At the time of this post, I don’t believe there is a way to do this through the API.
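The frequency is entered as a schedule expression. As far as I know, the rate form is rate(value unit), where the unit is singular for a value of 1 and plural otherwise, which is easy to get wrong. Here is a small sketch that builds such a string; the helper name rate_expression is hypothetical, and the result is meant to be pasted into the console field:

```python
def rate_expression(value, unit):
    """Build a rate schedule expression such as "rate(1 day)"."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for a value of 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return "rate({0} {1}{2})".format(value, unit, suffix)


if __name__ == "__main__":
    print(rate_expression(1, "day"))
    print(rate_expression(12, "hour"))
```

For a download like this one, a daily rate is probably plenty.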