We need high-availability databases and read slaves!
We want custom high-speed background workers!
Log archives and metrics would be nice.
Ruby's too slow!
What's the low-hanging fruit for scaling this thing?
Ultimately, the answer was to leave and run on EC2 directly.
Play with a net
OpsWorks gives us logical containers: stacks and layers.
These map nicely to Heroku apps and Procfiles.
$ cat Procfile
web: bundle exec unicorn -c ./config/unicorn.rb
worker: bundle exec rake resque:work INTERVAL=0.1 QUEUE=*
consumer: bundle exec rake consumer
$ aws opsworks describe-stacks
{
  "Stacks": [
    {
      "Name": "groupme",
      "StackId": "XXXXXX",
      ...
$ aws opsworks describe-instances --stack-id XXXXXX
{
  "Instances": [
    {
      "PublicDns": "ec2-55-55-555-555.compute-1.amazonaws.com",
      ...
Since OpsWorks holds the running state and is easily queried, there's no need to synchronize this data anywhere. That's a great design decision, one also espoused by Netflix's Asgard.
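As a sketch of what that buys us, the live hosts for a stack can be pulled straight from the API whenever they're needed (jq here is an assumption, not necessarily part of our stack):

$ aws opsworks describe-instances --stack-id XXXXXX \
    | jq -r '.Instances[].PublicDns'
ec2-55-55-555-555.compute-1.amazonaws.com
...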
OpsWorks is pre-baked with Chef.
We provide custom cookbooks.
Chef runs only at setup and teardown.
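Wiring that up looks roughly like the following. The repo URL, layer ID, and recipe names are hypothetical placeholders, though the flags are real aws-cli options:

# Point the stack at a custom cookbooks repo (URL is a placeholder).
$ aws opsworks update-stack --stack-id XXXXXX \
    --use-custom-cookbooks \
    --custom-cookbooks-source Type=git,Url=git@github.com:example/cookbooks.git

# Run hypothetical recipes at the Setup and Shutdown lifecycle events.
$ aws opsworks update-layer --layer-id YYYYYY \
    --custom-recipes '{"Setup": ["app::setup"], "Shutdown": ["app::teardown"]}'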
The formation tells us which processes to run, and how many of each.
On Heroku, it looks something like this:
$ heroku scale web=10 -a groupme
Scaling web dynos... done, now running 10
On OpsWorks, we map layers to formations, which in turn maps Procfile processes to EC2 instances.
The mapping is maintained by a little app called meta: basically, a simple REST interface on top of DynamoDB.
$ curl https://meta/apps/show/groupme
{
  "formation.app": "web=1",
  "formation.worker": "worker=4",
  "formation.consumer": "consumer=1",
  ...
}
$ echo "web=1" > /etc/app/formation
Config vars provide runtime configuration.
From Heroku:
$ heroku config -a groupme
RAILS_ENV:    production
REDIS_URL:    redis://ec2-123-123-10-1.compute-1.amazonaws.com:6379/0
DATABASE_URL: postgres://...
Once again, meta is in charge of this.
$ curl https://meta/config_vars/show/groupme
{
  "RAILS_ENV": "production",
  "REDIS_URL": "redis://ec2-44-44-444-444.amazonaws.com:6479/0",
  "DATABASE_URL": "postgres://user:secret@ec2-66-66-666-666.amazonaws.com:5432/",
  ...
}
$ echo "RAILS_ENV=..." > /etc/app/env
For deploys, we're currently using two approaches.
First, a pretty traditional remote git repo:
$ git fetch origin && git reset --hard $REF
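Fleshed out, a deploy on a single instance might look something like this. A sketch only: the checkout location matches the Procfile path used below, but the bundler flags and Upstart job name are assumptions:

# Hypothetical per-instance deploy steps.
$ cd /tmp/app
$ git fetch origin && git reset --hard $REF
$ rvm-exec default bundle install --deployment
$ initctl restart app    # assumed Upstart job name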
Second, a Heroku-style slug-based deploy:
$ curl -s -o /tmp/SLUG https://anvil/slugs/SLUG
$ tar xf /tmp/SLUG -C /tmp/app
An Upstart job glues it all together, running the formation with foreman. A snippet:
exec su groupme -l -s /bin/bash -c \
  "rvm-exec default foreman start \
    -p $PORT \
    -f /tmp/app/Procfile \
    -e /etc/app/env \
    -m $(cat /etc/app/formation) 2>&1" >> /var/log/app/app.log
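For context, the surrounding job might look roughly like this; everything beyond the exec line is our assumption of a typical Upstart setup, not the verbatim script:

# /etc/init/app.conf (hypothetical path and job name)
description "groupme app processes"

start on runlevel [2345]
stop on runlevel [016]
respawn

exec su groupme -l -s /bin/bash -c "..."    # the exec line shown above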
$ gun deploy
$ gun console
$ gun console.db
$ gun status
$ gun instances
$ gun logs
$ gun ps.restart
$ gun ssh.host
$ gun git.compare
For metrics, we run an l2met-inspired EventMachine HTTP server that parses syslog-formatted logs and extracts metrics.
Those metrics are fed to statsd and, ultimately, Graphite.
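Concretely, what lands on statsd is just its UDP line protocol. A minimal sketch; the metric name, host, and port here are placeholders:

# statsd's wire format is "name:value|type", sent over UDP (port 8125 by default).
$ echo "groupme.web.request_time:42|ms" | nc -u -w1 statsd.internal 8125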
Compare scaling on Heroku:

$ heroku scale web=10 -a groupme

...with the equivalent via gun on OpsWorks:

$ gun app:groupme instances.create:app, \
    count=3, \
    instance_type=c1.xlarge, \
    availability_zone=us-east-1a