If you are a Principal Engineer of Site Reliability looking for an opportunity to join a disruptive, fast-growing, and stable business, then this could be the right role for you!
You will drive the system architecture, make fundamental decisions about the technology environment, and be responsible for the reliability and stability of the company's sites.
· Build the roadmap for all engineering teams to use highly scalable, observable, and reliable infrastructure services.
· Drive the execution and delivery of this roadmap across the engineering organization.
· Mentor and coach other engineers to help solve technical challenges and grow overall technical abilities.
· Propose and drive large improvements to production systems to achieve a significant impact on the business and engineering teams.
· Make iterative improvements to incident management processes, incident post-mortems, reliability initiatives, and system recovery planning across the organization.
· Set practices for KPI measurement, logging, monitoring, alerting, and on-call.
· 5+ years of professional programming experience in an enterprise environment.
· Expert experience with cloud providers; AWS is a must.
· Deep knowledge of reliability testing at scale.
· Experience with multiple languages, web frameworks, and datastores.
Architecture, SDLC, SRE, Site, Reliability, Technology, Mentor, Coach, Python, JS, OpenCV, React, Reactive Cocoa, Tornado, AWS, REST, JSON-RPC, SOAP/WSDL