In this post, and the next couple of posts, I’ll be describing how you can very quickly start continuously delivering an ASP.NET project.
Before I go into details, let’s first briefly recap what continuous delivery is and why we do it. Continuous delivery is essentially an extension of continuous integration, which in its basic form means regularly committing source code to a version control system (VCS), with each commit verified by an automated build and test. Continuous delivery extends this approach by also automating deployments to one or more environments at the push of a button. The key to continuous delivery is automation, as it enables teams to deploy software reliably, repeatably, faster and more frequently, taking away the pain and risk associated with manual deployments. This new-found ability to deploy on demand to any environment unlocks a number of benefits, most notably the speed with which features can be released into production (smaller but more frequent releases = less risk). I won’t cover all the benefits here; instead, here are a couple of links covering some of the benefits and principles:
The deployment pipeline
Continuous delivery is best visualised by the deployment pipeline, which is a set of quality gates a piece of software passes through on its way to release. Quality gates help shorten the feedback cycle of the build and deploy process.
A deployment pipeline may consist of the following:
- Commit source code into VCS
- Build/compile source code
- Create deployment package
- Run unit tests
- Run acceptance tests
- Deploy to a production-like environment
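To make this a little more concrete, here is a minimal sketch of how the compile, test and package stages might be scripted with MSBuild so any CI server can run them per commit. The solution name, test assembly, runner path and nuspec file are placeholders of my own, and your CI server would pass in the build number; treat it as an illustration of the shape of a build script, not a drop-in file.

```xml
<!-- Build.proj: hypothetical build script a CI server could invoke on every commit -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Package" ToolsVersion="4.0">
  <PropertyGroup>
    <Configuration Condition="'$(Configuration)' == ''">Release</Configuration>
    <!-- The CI server passes the build number in, e.g. /p:BuildNumber=1.2.345 -->
    <BuildNumber Condition="'$(BuildNumber)' == ''">0.0.1</BuildNumber>
  </PropertyGroup>

  <!-- Build/compile source code -->
  <Target Name="Compile">
    <MSBuild Projects="MyWebApp.sln" Properties="Configuration=$(Configuration)" />
  </Target>

  <!-- Run unit tests against the compiled output (runner path is illustrative) -->
  <Target Name="UnitTest" DependsOnTargets="Compile">
    <Exec Command="packages\NUnit.Runners\tools\nunit-console.exe MyWebApp.Tests\bin\$(Configuration)\MyWebApp.Tests.dll" />
  </Target>

  <!-- Create a versioned deployment package (covered in the next section) -->
  <Target Name="Package" DependsOnTargets="UnitTest">
    <Exec Command="nuget.exe pack MyWebApp.Deploy.nuspec -Version $(BuildNumber)" />
  </Target>
</Project>
```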
Versioned deployment packages
Every commit to the VCS should trigger a build, every build should generate a deployment package, and every deployment package can potentially be deployed to any one of your environments. A deployment package is essentially a versioned zip file containing your build output.
In ASP.NET you could use a Web Deployment Package (msdeploy). However, by default a Web Deployment Package is created for the ‘Release’ (or another) configuration, which isn’t very helpful if you need to apply environment-specific web.config transformations. To support those, you need to take a different approach: first create a ‘parameters.xml’ file defining the overridable config parameters, and then create an environment-specific XML file containing the overrides. Documentation on how to do this can be found here: Configuring Parameters for Web Package Deployment
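As a rough sketch (the parameter name, match expression and connection strings below are made up, and the two files are shown in one block for brevity), parameters.xml sits at the root of the web project and declares what can be overridden, while a per-environment SetParameters file supplies the values at deploy time:

```xml
<!-- parameters.xml (project root): declares which settings can be overridden at deploy time -->
<parameters>
  <parameter name="AppDb-ConnectionString"
             description="Connection string for the application database"
             defaultValue="Server=(local);Database=AppDb;Integrated Security=true">
    <parameterEntry kind="XmlFile"
                    scope="\\web\.config$"
                    match="/configuration/connectionStrings/add[@name='AppDb']/@connectionString" />
  </parameter>
</parameters>

<!-- Staging.SetParameters.xml: environment-specific overrides applied when deploying the package -->
<parameters>
  <setParameter name="AppDb-ConnectionString"
                value="Server=staging-sql01;Database=AppDb;Integrated Security=true" />
</parameters>
```

At deploy time the override file is handed to msdeploy via -setParamFile (or to the generated .deploy.cmd script), so the same package can be pushed to any environment.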
I find the Web Deployment Package approach a bit messy. I like my deployment packages to contain all of my environment-specific configuration for a more ‘complete’ package, so I package my build output into a NuGet package instead. NuGet packages carry rich metadata, such as version and author information, and they can be consumed via a feed, which is ideal for continuous delivery. Most .NET developers are more familiar with NuGet packages containing a reusable library rather than a deployment package, but this is a genuine and practical use case for NuGet. You’ll need a package repository to host your NuGet packages, and I recommend hosting an internal one. Personally, I use ProGet, as there’s a free version that lets you host the server yourself and it supports multiple feeds, which is useful. There are also hosted NuGet solutions available, such as MyGet. The choice is yours.
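For example, a minimal .nuspec for a deployment package might look like the sketch below; the package id, file paths and layout are my own illustrative choices, not a prescribed convention.

```xml
<?xml version="1.0"?>
<!-- MyWebApp.Deploy.nuspec: wraps the build output (including per-environment config) as a versioned package -->
<package>
  <metadata>
    <id>MyWebApp.Deploy</id>
    <version>0.0.0</version> <!-- replaced by the CI build number via nuget pack -Version -->
    <authors>Build Server</authors>
    <description>Deployment package for MyWebApp</description>
  </metadata>
  <files>
    <!-- published web site output -->
    <file src="build\_PublishedWebsites\MyWebApp\**\*" target="content\website" />
    <!-- environment-specific configuration shipped inside the package -->
    <file src="deploy\config\**\*" target="content\config" />
  </files>
</package>
```

The CI server then runs something along the lines of `nuget pack MyWebApp.Deploy.nuspec -Version 1.0.0.123` and pushes the result to the internal feed, giving you one versioned artefact per build.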
Tools & infrastructure
So, what tools and infrastructure do you need to start continuously delivering your projects? The diagram below illustrates the continuous delivery infrastructure and four environments.
The continuous delivery tooling and infrastructure consists of the following:
- Version control system
- Continuous integration servers
- Package repository server
- Deployment server
- Reporting server (SonarQube)
It’s your choice whether you use git, svn, or both; likewise, it’s your choice what continuous integration server and package repository you use. You’ll be able to continuously deliver projects with whatever tools you choose to use. At the end of the day, continuous delivery is powered by automation, and all continuous integration servers are capable of running custom commands such as MSBuild.
The number of servers you use is again entirely up to you; you could load all of the above tools onto a single server, but I don’t recommend it. As a bare minimum, I recommend spreading the tools across two to three servers. Your continuous integration server may support remote agents, which will allow you to scale your build infrastructure with minimal hassle later on.
Hopefully this post has given you a fairly comprehensive overview of continuous delivery and the tools and infrastructure required to get started. In the next post I’ll provide an ASP.NET MVC Web API template that’s been optimised for continuous delivery, explain how it differs from the standard out-of-the-box Visual Studio template, and show how it’ll help you start continuously delivering faster.
This is a summary of the video I watched at infoq.com, titled "Growing from the Few to the Many: Scaling the Operations Organization at Facebook". The presenter is Pedro Canahuati, who leads the Infrastructure Production Engineering and Site Reliability teams at Facebook. Warning: it is heavily paraphrased to improve reader understanding.
Originally in 2009, the team was called SRE and was on call 24/7. Every time there was any type of problem on the site or something went down, SRE just jumped into the problem area and tried to put out fires.
The problem the team was facing was that it was too “interrupt-driven”: any time there was a problem with a server, an alert went out that a human had to deal with by manually SSH’ing in.
To give some scope to this model, in a single 30-minute period during 2012 the servers were handling the following:
- 5 billion realtime messages sent
- 3.8 trillion cache operations
- 10 billion profile pics served
- 160 million newsfeed stories created
- 7 million photos uploaded
- 200 billion objects checked
- 300 million objects blocked
So imagine the number of problems and errors that were being handled by SRE!
Step 1 of the Solution – Restructuring
Facebook realized that it wasn’t structured in a way that allowed it to fulfill its business goals, maintain growth and maintain performance.
The first thing they did was reorganize their Ops team. They renamed it from SRE to SRO, and its only task was to keep the site running.
This indicated to the entire company that the team had a new focus, and would no longer “do everything for everybody”.
Facebook then created a new team called ClusterOps, and their job was to turn up capacity, perform load tests, and bring up new infrastructure in an automated way.
Step 2 – Re-Prioritizing
Next, Facebook made a list of all the jobs that Ops was performing and tried to find a way to remove the human component from each one.
For example, if an employee SSH’ed into a box, this was logged. All such tasks were logged and then graphed every two weeks, so Facebook was able to see which areas of work were causing the most pain.
Another pain point that became obvious was that Facebook simply didn’t have enough Ops Engineers, so they went on a recruitment drive to bring the needed talent to the company.
Step 3 – Reorganizing
After looking at all the services of the company, Facebook reorganized its services into 6 core service units, and reassigned Ops Engineers to them accordingly. This was a very hard conversation to have with the engineering managers because it was ‘different to the way things had been done’.
Incident Managers On Call (IMOCs) were also assigned, and whenever something went down, they made sure that the right set of people was looking at the problem. The key focus of this strategy was not to assign blame; rather, they looked for what caused the error and what needed to be fixed so that it didn’t happen again.
Step 4 – Retooling
Automated tools were needed to monitor server health, so Facebook created graphical reporting tools that displayed heat maps of server health. This way, you could see that a cluster was having problems as it transitioned from green (OK) to yellow (warning) to red (serious problems). At a glance, you could see how Facebook’s entire fleet of servers was handling the load.
Previously, this was all done manually.