Real-World SRE: The Survival Guide for Responding to a System Outage and Maximizing Uptime, by Nat Welch (ISBN 9781788628884)

By Nelson James on Sunday, April 14, 2019


This hands-on survival manual will give you the tools to confidently prepare for and respond to a system outage.

Key Features

  • Proven methods for keeping your website running
  • A survival guide for incident response
  • Written by an ex-Google SRE expert

Book Description

Real-World SRE is the go-to survival guide for the software developer in the middle of catastrophic website failure. Site Reliability Engineering (SRE) has emerged on the frontline as businesses strive to maximize uptime. This book is a step-by-step framework to follow when your website is down and the countdown is on to fix it.

Nat Welch has battle-hardened experience in reliability engineering at some of the biggest outage-sensitive companies on the internet. Arm yourself with his tried-and-tested methods for monitoring modern web services, setting up alerts, and evaluating your incident response.

Real-World SRE goes beyond just reacting to disaster. Uncover the tools and strategies needed to safely test and release software, plan for long-term growth, and foresee future bottlenecks. Real-World SRE gives you the capability to set up your own robust plan of action to see you through a company-wide website crisis.

The final chapter of Real-World SRE is dedicated to acing SRE interviews, either in getting a first job or a valued promotion.

What you will learn

  • Monitor for approaching catastrophic failure
  • Alert your team to an outage emergency
  • Dissect your incident response strategies
  • Test automation tools and build your own software
  • Predict bottlenecks and fight for user experience
  • Eliminate the competition in an SRE interview

Who this book is for

Real-World SRE is aimed at software developers facing a website crisis, or who want to improve the reliability of their company's software. Newcomers to Site Reliability Engineering looking to succeed at interview will also find this invaluable.

Table of Contents

  1. Introduction
  2. Monitoring
  3. Incident Response
  4. Postmortems
  5. Testing & Releasing
  6. Capacity Planning
  7. Building Tools
  8. User Experience
  9. Networking Foundations
  10. Linux And Cloud Foundations



"This book sets up a framework for SRE supported by real-life examples and experiences. It focuses not only on the positive side, but also emphasizes continuous improvement of SRE tools and processes, as well as engineer skills and happiness. It's full of examples and use cases that present the perspective of companies of all sizes and priorities. I thought I was a complete SRE newbie when I started reading it and I was surprised how many of the outlined strategies followed best practices and common sense I was already familiar with.

I especially liked Nat's focus on communication - positive, encouraging, teamwork-driven culture that automates the boring parts of the job and gives space for growing in other areas. Building tools and thinking about user experience, especially for users we might never meet or get feedback from, shows the long-term focus on delivering quality products. Nat never shies away from sharing lessons learned from past experiences.

I think this book is a good introduction to SRE both on a detail-oriented level of building a system of alerts, metrics and schedules, as well as understanding the bigger picture and impact it can have on the whole team and company. There are certain parts of it I wouldn't expect to show up in such a technical read, e.g. examples of how to have a growth-mindset as an SRE expert.

I thoroughly enjoyed this book. The only drawback I found was that the code samples don't have syntax highlighting, so the longer snippets were hard to parse."

Product details

  • Paperback: 340 pages
  • Publisher: Packt Publishing (August 31, 2018)
  • Language: English
  • ISBN-10: 1788628888

Reviews


  • While there is certainly already a book on SRE describing how Google runs its production systems, it can be a challenge to map those practices to your own system if it is not running at Google scale. Nat has written a book that draws on his experience as an SRE at Google, along with his experience at a variety of other companies operating at varying scales and budgets. His book is a well-organized guide to SRE, complete with many concrete examples of how to take readily available tooling and apply it to systems of all sizes.

    This book has something for everyone, from someone interested in making their personal site more resilient to an engineer on an SRE team who is responsible for large scale production systems. I highly recommend this book.
  • Most of the technical books I’ve read tend to fall into one of two categories. The first is an overview of the technical details that covers everything you’d like to know but is really dry. The other is a glorified autobiography of the author’s experience, providing little technical information beyond what can be found on a Wikipedia page. This book defies the odds and succeeds at providing a great level of technical detail while being inherently easy to read. I certainly did not intend to finish half of it in one sitting, ignoring everything else I needed to do, but I did.

    What I like most about the book is that each major element of SRE isn’t just thrown out there as a fact. Nat introduces each topic, follows it up with an explanation of why it is important, and provides a story that shows why it matters. This style of writing is not only easy to read, but it also helps me retain the material and gives me a concrete use case.

    Full disclosure: I’ve known Nat since high school, so I received a review copy for free. But since we had no issues making fun of each other back in school, I’d certainly have no problem calling him out if this book was bad.
  • A great book that covers in detail the ground from beginner to competent SRE. It is useful not only for SREs but also for anyone working closely with them. My favorite chapter would be chapter 2 (monitoring), which goes over the more recent jargon and tools that SREs use and that you will not find in a UNIX programming book.

    I also enjoyed chapter 9 about networking fundamentals. It explains in detail the way the internet works, but not in so much detail as to overwhelm the reader. Throughout the book, but especially in this chapter, the writer manages to keep the theoretical aspects grounded in practice by providing ways for readers to test their knowledge with simple tools and techniques.

    I would recommend this book without hesitation.
  • This book really succeeds in covering the baselines of what you need to know for applying SRE to your organization, at a reasonable level. That is, Welch explicitly talks multiple times in the book about scaling efforts to what makes sense for your company, because in tech publishing it's easy to act like everyone is Google, but we aren't. I learned something in every chapter, and I really appreciate the inclusion of UX and the "bonus" (to me) chapters on Linux fundamentals. A great book for someone who needs to do SRE but doesn't come from an ops background.
  • Welch does a fantastic job describing the area of SRE. As the title suggests, he ties in much of his personal experience to explain the details of this field. The book covers a breadth of topics, touching on SRE at many different levels of scale. It is an easy read with a lot of useful information. I would highly recommend this book.
  • An easy-to-follow playbook with some insightful gems. Nat's book should be required reading for DevOps engineers and benefits anyone who is building or maintaining software. If you want to learn a lot about the headaches of building software and how to handle them (simulating years of experience), then this book is for you.