Overview
Groupon is a marketplace where customers discover new experiences and services every day, and where local businesses thrive. We aim to transform our business through a relentless pursuit of results, and we offer engineers the autonomy and opportunity to make an impact on scale and reliability.
Role
Principal Site Reliability Engineer (SRE Level V / VI)
You will play a central role in ensuring the performance, availability, and resilience of our platforms. You will lead initiatives that redefine operational excellence, collaborate with diverse teams to implement technologies and best practices, and mentor engineers as they grow their skills.
Key Responsibilities
- Architect and maintain fault-tolerant systems with uptime SLAs of 99.9% or higher.
- Drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and related tools.
- Create and optimize CI/CD pipelines for reliable, secure, and efficient software delivery.
- Build and enhance observability solutions (monitoring, logging, alerting) using Prometheus, Grafana, and the ELK stack.
- Define SLIs and SLOs, meet them, and manage error budgets aligned with business needs.
- Lead incident response during on-call rotations, driving rapid resolution and root-cause analysis.
- Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
- Identify and resolve bottlenecks to improve system performance and developer efficiency.
- Mentor junior engineers and foster a collaborative, growth-oriented team environment.
- Guide architectural decisions to drive innovation and reliability.
Qualifications
- 10+ years in systems engineering, including at least 5 years in SRE or DevOps roles.
- Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker).
- Proficiency in programming/scripting languages such as Python, Go, and Bash.
- Advanced knowledge of Infrastructure as Code (Terraform, Ansible).
- Deep understanding of networking, DNS, load balancing, and security principles.
- Proven track record of managing high-availability systems.
- Exceptional analytical and problem-solving skills.
Preferred Qualifications
- Cloud or container certifications (e.g., AWS, GCP, or Azure; Kubernetes CKA).
- Experience in eCommerce, FinTech, or SaaS industries.
- Familiarity with Agile development processes and frameworks.
What We Offer
- Opportunity to work with cutting-edge technologies in a transformative environment.
- Collaborative and innovative work culture that values expertise and contributions.
- Professional growth and leadership development pathways.
- Chance to shape the future of reliable and scalable systems.

Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world.
Groupon is an AI-first company. We encourage candidates to use AI tools during the hiring process where they add value. Groupon's purpose is to build strong communities through thriving small businesses. For legitimate job openings, visit Groupon Careers.
Important Notices
Beware of Recruitment Fraud: Groupon follows a merit-based recruitment process and never charges job seekers any fees. Be cautious of fake job offers or interviews, and always check our official career site at Groupon Careers for legitimate openings.
Job Details
- Seniority level: Not Applicable
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: Technology, Information and Internet