Senior Site Reliability Engineer


Role Description

As a Site Reliability Engineer, you will be part of a team that is passionately automating everything possible to make Guidewire systems run more efficiently. The Platform team is dedicated full-time to creating and running software that improves the reliability of systems in production, serving hundreds of customers and supporting millions of transactions each day.

Ensure the reliability of Guidewire's flagship cloud platform and InsuranceSuite products

Build tooling to help ensure efficient operations and optimal availability of all SaaS multi-tenant and customer-focused systems

Collaborate closely with Guidewire's core product developers to ensure that the Guidewire core cloud products address functional and non-functional requirements such as availability, performance, observability, and maintainability

Engage with product development (PD) teams by participating in design reviews and production readiness checks

Analyze data from observability and monitoring tools to improve operational metrics of microservices as well as the entire platform

Create system documentation and training materials to empower and educate team members

Oversee and automate the team's growing presence in AWS

Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure

Improve incident management lifecycle to identify, mitigate, and learn from reliability risks and issues

Qualifications

Bachelor's Degree in Computer Science or related field

Software engineering and task automation skills with Bash, Python, and/or Go

Experience supporting web applications running on Java / Apache / Tomcat in a live production environment

Familiarity with the Agile software development lifecycle

Deep background with Linux systems and engineering

Highly experienced with engineering and automating on Amazon Web Services (AWS)

Prior experience with IaC tools like Terraform/Terragrunt/Terraspace

Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity)

Background supporting production at scale in a heavily microservice-based environment

Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking)

Strong understanding of Single Sign-On (SSO), SAML, and OAuth (hands-on experience with Okta is a bonus)

Seasoned expertise in X.509 certificate technology and basic concepts of encryption

Experience working with Relational Databases such as Aurora Postgres and/or Oracle RDS

Advanced exposure to application development, web UI (design and development), JSON, and application architecture

Extensive experience using observability tools (logging/APM) such as Datadog, CloudWatch, and PagerDuty

Familiarity with event store/stream-processing technologies like Kafka or AWS SQS

Understanding of Open Application Model systems such as KubeVela or Crossplane

Requirements

Ability to read, write, and speak English

Ability to speak in public settings and interface confidently with customers, partners, and vendors

Travel: up to 25% of the job will require travel, approximately one week per month

Personal Qualities and Soft Skills

Greatly prefer writing code to clicking through a GUI

Enjoy teaching, being a mentor to others, and working across boundaries

Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving

Strong analytical mind with a penchant for process development and enhancement

A highly positive can-do attitude with desire for being a team player

Great communication skills and ability to explain complex technical concepts to a varied audience

Demonstrate strong follow-through and a strong work ethic, and consistently meet commitments

Speak Japanese
