We need high-availability databases and read slaves!
We want custom high-speed background workers!
Log archives and metrics would be nice.
Ruby's too slow!
What's the low-hanging fruit for scaling this thing?
Ultimately, the answer was to leave and run on EC2 directly.
Play with a net
OpsWorks gives us logical containers: stacks and layers.
These map nicely to Heroku apps and Procfiles.
$ cat Procfile
web: bundle exec unicorn -c ./config/unicorn.rb
worker: bundle exec rake resque:work INTERVAL=0.1 QUEUE=*
consumer: bundle exec rake consumer
$ aws opsworks describe-stacks
{
  "Stacks": [
    {
      "Name": "groupme",
      "StackId": "XXXXXX",
      ...
$ aws opsworks describe-instances --stack-id XXXXXX
{
  "Instances": [
    {
      "PublicDns": "ec2-55-55-555-555.compute-1.amazonaws.com",
      ...
Since OpsWorks holds the running state and is easily queried, there's no need to synchronize this data anywhere. That's a great design decision, one also espoused by Netflix's Asgard.
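As a sketch of what that buys us, the live hosts for a stack can be pulled straight from the API whenever they're needed (jq here is an assumption, not necessarily part of our stack):

$ aws opsworks describe-instances --stack-id XXXXXX \
    | jq -r '.Instances[].PublicDns'
ec2-55-55-555-555.compute-1.amazonaws.com
...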
OpsWorks is pre-baked with Chef.
We provide custom cookbooks.
Chef runs only at setup and teardown.
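Wiring that up looks roughly like the following. The repo URL, layer ID, and recipe names are hypothetical placeholders, though the flags are real aws-cli options:

# Point the stack at a custom cookbooks repo (URL is a placeholder).
$ aws opsworks update-stack --stack-id XXXXXX \
    --use-custom-cookbooks \
    --custom-cookbooks-source Type=git,Url=git@github.com:example/cookbooks.git

# Run hypothetical recipes at the Setup and Shutdown lifecycle events.
$ aws opsworks update-layer --layer-id YYYYYY \
    --custom-recipes '{"Setup": ["app::setup"], "Shutdown": ["app::teardown"]}'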
The formation tells us which processes to run, and how many of each.
On Heroku, it looks something like this:
$ heroku scale web=10 -a groupme
Scaling web dynos... done, now running 10
On OpsWorks, we map layers to formations, which in turn maps Procfile processes to EC2 instances.
The mapping is maintained by a little app called meta: basically, a simple REST interface on top of DynamoDB.
$ curl https://meta/apps/show/groupme
{
  "formation.app": "web=1",
  "formation.worker": "worker=4",
  "formation.consumer": "consumer=1",
  ...
}
$ echo "web=1" > /etc/app/formation
Config vars provide runtime configuration.
From Heroku:
$ heroku config -a groupme
RAILS_ENV:    production
REDIS_URL:    redis://ec2-123-123-10-1.compute-1.amazonaws.com:6379/0
DATABASE_URL: postgres://...
Once again, meta is in charge of this.
$ curl https://meta/config_vars/show/groupme
{
  "RAILS_ENV": "production",
  "REDIS_URL": "redis://ec2-44-44-444-444.amazonaws.com:6479/0",
  "DATABASE_URL": "postgres://user:secret@ec2-66-66-666-666.amazonaws.com:5432/",
  ...
}
$ echo "RAILS_ENV=..." > /etc/app/env
For deploys, we're currently using two approaches.
First, a pretty traditional remote git repo:
$ git fetch origin && git reset --hard $REF
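Fleshed out, a deploy on a single instance might look something like this. A sketch only: the checkout location matches the Procfile path used below, but the bundler flags and Upstart job name are assumptions:

# Hypothetical per-instance deploy steps.
$ cd /tmp/app
$ git fetch origin && git reset --hard $REF
$ rvm-exec default bundle install --deployment
$ initctl restart app    # assumed Upstart job name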
Second, a Heroku-style slug-based deploy:
$ curl -s -o /tmp/SLUG https://anvil/slugs/SLUG
$ tar xf /tmp/SLUG -C /tmp/app
An Upstart job glues it all together, running the formation with foreman. A snippet:
exec su groupme -l -s /bin/bash -c \
  "rvm-exec default foreman start \
    -p $PORT \
    -f /tmp/app/Procfile \
    -e /etc/app/env \
    -m $(cat /etc/app/formation) 2>&1" >> /var/log/app/app.log
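For context, the surrounding job might look roughly like this; everything beyond the exec line is our assumption of a typical Upstart setup, not the verbatim script:

# /etc/init/app.conf (hypothetical path and job name)
description "groupme app processes"

start on runlevel [2345]
stop on runlevel [016]
respawn

exec su groupme -l -s /bin/bash -c "..."    # the exec line shown above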
$ gun deploy
$ gun console
$ gun console.db
$ gun status
$ gun instances
$ gun logs
$ gun ps.restart
$ gun ssh.host
$ gun git.compare
For metrics, we run an l2met-inspired EventMachine HTTP server that parses syslog-formatted logs and extracts metrics.
Those metrics are fed to statsd and, ultimately, Graphite.
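Concretely, what lands on statsd is just its UDP line protocol. A minimal sketch; the metric name, host, and port here are placeholders:

# statsd's wire format is "name:value|type", sent over UDP (port 8125 by default).
$ echo "groupme.web.request_time:42|ms" | nc -u -w1 statsd.internal 8125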
Compare scaling on Heroku:

$ heroku scale web=10 -a groupme

...with the equivalent via gun on OpsWorks:

$ gun app:groupme instances.create:app, \
    count=3, \
    instance_type=c1.xlarge, \
    availability_zone=us-east-1a