Senior Software Engineer - Site Reliability - India Remote

Introduction

We are seeking an accomplished Senior Site Reliability Engineer (SRE) to lead the design, implementation, and evolution of highly available, scalable, and resilient systems across our multi-cloud infrastructure. In this senior role, you will drive architectural decisions, establish reliability standards, and mentor teams while ensuring operational excellence across complex distributed systems. You will partner with engineering leadership, development teams, and product stakeholders to shape infrastructure strategy, implement sophisticated automation, and champion a culture of reliability engineering.

At Axelerant, we are committed to fostering an environment where innovation and operational excellence thrive. As a Senior SRE, you'll tackle sophisticated, large-scale challenges using cutting-edge technologies across AWS and Azure platforms. You will lead critical initiatives that impact system reliability at scale, architect solutions for complex infrastructure problems, and guide teams in adopting industry-leading practices that drive meaningful improvements across our entire technology ecosystem.

Key Responsibilities

Architect and implement highly reliable, scalable, and cost-effective infrastructure solutions for mission-critical applications across multi-cloud environments (AWS and Azure).
Lead the definition and refinement of service level objectives (SLOs), service level indicators (SLIs), and error budgets, establishing reliability standards across the organization.
Design and implement sophisticated Infrastructure as Code (IaC) solutions using Terraform, Ansible, and Azure Resource Manager (ARM) templates or Bicep.
Drive automation strategies to eliminate toil, improve operational efficiency, and enable self-service capabilities for development teams.
Lead incident response efforts, conduct thorough post-incident reviews, and implement systemic improvements to prevent recurrence.
Champion cloud-native architectures and modern reliability practices, serving as a technical advisor for infrastructure and platform decisions.
Mentor junior SREs and engineers, fostering a culture of reliability, observability, and continuous improvement.
Participate in and help optimize the on-call rotation, ensuring sustainable practices and effective escalation procedures.
Establish and maintain comprehensive documentation standards, runbooks, and knowledge repositories that enable team autonomy and effective incident response.
Design and implement advanced monitoring, logging, and alerting strategies using observability platforms to enable proactive issue detection and resolution.
Lead container orchestration initiatives using Kubernetes (AKS, EKS) and implement sophisticated deployment strategies including blue-green, canary, and progressive delivery patterns.
Ensure security, compliance, and governance standards are embedded throughout the infrastructure lifecycle, implementing security-as-code practices.
Drive capacity planning, performance optimization, and cost management initiatives across cloud platforms.
Collaborate with architecture and security teams to establish platform standards, reference architectures, and best practices.

Skills, Knowledge and Expertise

5+ years of proven experience as a Site Reliability Engineer or in a similar role, with demonstrated expertise in designing, implementing, and operating large-scale distributed systems.
Deep expertise in Infrastructure as Code (IaC) with Terraform and Ansible, including module development, state management, and multi-environment orchestration.
Extensive hands-on experience with both AWS and Azure cloud platforms, including advanced services, networking, and security features in both environments.
Expert-level knowledge of container orchestration with Kubernetes, including architecture, custom resource definitions (CRDs), operators, service mesh implementations, and production-scale cluster management.
Advanced proficiency in Linux system administration, performance tuning, and troubleshooting complex system-level issues.
Proven experience implementing GitOps workflows using ArgoCD, Flux, or similar tools, including advanced deployment patterns and progressive delivery.
Deep understanding of observability principles and hands-on experience with tools such as Prometheus, Grafana, Datadog, Azure Monitor, or the ELK stack.
Expert knowledge of networking concepts, including load balancing, CDNs, DNS, VPNs, service mesh architectures, and distributed systems communication patterns.
Strong programming and scripting capabilities in Python, Bash, Go, or PowerShell, with the ability to develop custom tooling and automation frameworks.
Extensive experience designing and optimizing CI/CD pipelines using Jenkins, GitLab CI, Azure DevOps, GitHub Actions, or CircleCI.
Demonstrated ability to lead incident response, conduct root cause analysis, and drive systemic reliability improvements.
Excellent communication and leadership skills with proven ability to influence technical decisions and collaborate with stakeholders at all levels.
Current certification in AWS (Solutions Architect Associate/Professional or equivalent) and Azure (Azure Administrator or Azure Solutions Architect), with practical experience managing production workloads on both platforms.

Good To Have

Experience with hybrid and multi-cloud networking strategies, including ExpressRoute, Direct Connect, and cloud interconnects.
Knowledge of serverless architectures on AWS (Lambda) and Azure (Functions, Logic Apps) and their operational considerations.
Proven experience with disaster recovery planning, business continuity, and implementing multi-region active-active architectures.
Understanding of machine learning operations (MLOps), data pipeline orchestration, and supporting ML workloads in production.
Experience with service mesh technologies such as Istio, Linkerd, or Consul.
Familiarity with chaos engineering principles and tools like Chaos Monkey or Gremlin.
Experience with configuration management at scale and policy-as-code tools like Open Policy Agent (OPA).
Knowledge of FinOps principles and cloud cost optimization strategies.

What Would Success Look Like For You?

Success in this role means establishing and maintaining industry-leading reliability standards, consistently achieving or exceeding SLOs, and driving strategic initiatives that significantly enhance system resilience and operational maturity. You will be recognized for your technical leadership, your ability to architect solutions that prevent entire classes of incidents, your impact on team capability through mentorship, and your contribution to establishing a robust reliability engineering culture across the organization.

Your Work's Impact:

Your contributions will fundamentally shape the reliability, performance, and scalability of our platform, directly enabling engineering teams to innovate with confidence and deliver exceptional value to our customers. Your architectural decisions and reliability practices will influence system design across the organization, setting standards that ensure operational excellence at scale.

Why Work At Axelerant?

We're a people-centric company, driven by our core values: Openness, Enthusiasm, and Kindness.

We highly value our people and invest in their growth and well-being through progressive benefits, which puts us among India's top 40 companies in health and wellbeing.

Excellent work exposure - Some of our recent clients were the UN, the University of East London, and Doctors Without Borders.
Meaningful projects to contribute back - Most of our projects are in the education, government, healthcare, and not-for-profit sectors. We also encourage and support team members for open-source contributions.
Work-life flexibility and remote work - You decide when and where to work. This has allowed many team members, who couldn’t have held a regular job otherwise, to have thriving careers.
Eight-hour workdays - We don't say 8 hours and expect 12 hours minimum.
No micromanagement - Micromanagement makes us grunt like the Hulk, so nobody will be looking over your shoulder. But help is always available when you ask.
No discrimination - We believe in equal pay for equal work. Personal decisions like planning to have children will not stop you from getting promoted.
Championing inclusivity - We like diversity. It enriches our lives and products. If you see something wrong, or something that could be better, on day 1, share it through established channels to bring positive change. We listen.
Meaningful time off - 52 weekends and 40 days per year of consolidated leave, plus maternity, paternity, adoption, and sabbatical allowances. We also have Kindness leave for emergencies.
Family Medical Insurance - You want your family’s health secured. So do we. We’ve got you, your spouse, and your little ones covered, along with free doctor, health, and wellness consultations from medical experts whenever you need them.
Performance coaching - Our professional, empathetic coaches will help you become your best version through career and personal development.
Event sponsorship - If your session at any event is selected and aligns with sponsorship guidelines, we cover all expenses for the trip, whether domestic or international.
Continuing education allowance - Each year, we’ll cover up to 2% of your annual salary for classes, certifications, or books to further your capabilities.

There Are Many Other Progressive Benefits:

Health and wellness allowance
Generous home office set-up allowance
Sponsored team meet-ups
Co-working space allowance
Event allowance

Growth can't be one-sided.
When you grow, we grow.

About Axelerant

As a global company that cares about employee happiness, engineering excellence, and customer success, Axelerant brings together top talent, success management as our service framework, and an empowering, unconventional work environment to deliver transformational outcomes for our clients and team members alike.
