By Jeff Dwyer • June 11, 2017

Cron Jobs on Amazon ECS with DataPipeline

UPDATE: AWS liked my blog post so much they decided to do exactly what I suggested and implement this themselves right after I got it working :|  http://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduling_tasks.html  #lolcry

We've been huge fans of Amazon ECS here at RateLim.it. It's delightful being able to set up your cluster one time and then run just about everything without another thought about where it will run. To date, this has been RubyOnRails web apps, Java APIs, Java Kinesis readers and Python DynamoDB scaler utilities. I wrote a bit about the overall ECS setup and about RubyOnRails on ECS w/ Terraform, which has a sample repo of Terraform to get you started.

One thing that hasn't been super obvious, though, is how best to run a scheduled job. Do we run a container that runs cron itself? Do we figure out how to schedule a Lambda which starts ECS tasks? Can DataPipeline do this for us?

This all came to a head when AWS finally released CLI support for Athena. That meant we could start kicking off the Presto query that aggregates our logs to figure out how much to bill.

I started to head down the path of a scheduled Lambda because an "always on" cron scheduler service just seemed lame. That said, Lambda is definitely not zero setup. It's got its own ecosystem and it wasn't obvious to me that I wanted to go through the hassle of getting a clean Lambda deploy just to run a single "aws ecs run-task" command.

DataPipeline to the rescue. DataPipeline is a kinda weird AWS service that I hadn't heard much about, but it turns out it makes this very simple. Just pick the default AWS CLI template, type in your command and choose a schedule.
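For a rough idea of what "your command" looks like, here's a minimal Python sketch that assembles the `aws ecs run-task` line you'd paste into the pipeline. The cluster name, task definition and region are made-up placeholders, not our real ones, so swap in your own.

```python
import shlex


def build_run_task_command(cluster, task_definition, count=1, region=None):
    """Return an `aws ecs run-task` invocation as one shell-safe string."""
    args = [
        "aws", "ecs", "run-task",
        "--cluster", cluster,
        "--task-definition", task_definition,
        "--count", str(count),
    ]
    if region:
        args += ["--region", region]
    # Quote each argument so odd characters can't break the shell command.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical names, for illustration only.
command = build_run_task_command(
    "my-cluster", "athena-billing-rollup:3", region="us-east-1"
)
print(command)
```

The printed string is what goes into the command field; DataPipeline just runs it on whatever schedule you pick.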

The only gotcha is making sure your IAM policies let DataPipeline call RunTask in ECS. I accidentally gave it "StartTask" permission, which is of course a totally different action. :(
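To make the RunTask vs. StartTask distinction concrete, here's a small sketch of the kind of policy statement I mean, built as a Python dict and dumped to JSON. The helper name and the wide-open `"*"` resource are assumptions for illustration; in practice you'd probably scope it to your task definition ARN and attach it to whichever role your pipeline's command actually runs as.

```python
import json


def ecs_run_task_policy(task_definition_arn="*"):
    """Build an IAM policy document allowing ecs:RunTask (not ecs:StartTask)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # RunTask lets ECS place the task; StartTask is a separate
                # action that targets a specific container instance.
                "Action": ["ecs:RunTask"],
                # "*" is a placeholder; narrow this to your own ARN.
                "Resource": task_definition_arn,
            }
        ],
    }


policy = ecs_run_task_policy()
print(json.dumps(policy, indent=2))
```

If your task definition uses a task role, you may also need to allow `iam:PassRole` for it; I'm leaving that out of the sketch.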

The last mildly sad thing to note is that DataPipeline is not supported by Terraform, so this is the one piece of my infrastructure that is not in source control. I'm surviving, but it would be rad if Terraform did start supporting this.

In closing, I still think that ECS should support cron-like behavior out of the box. Obviously the AWS infrastructure is there to do it. It would be great if this could all live cleanly right in the ECS console. That said, this is working great and I'm a happy camper for now.