Job Summary
We are seeking a highly skilled and motivated AWS Cloud DevOps / Site Reliability Engineer (SRE) to join our team. This role focuses on building, automating, and maintaining reliable, secure, and scalable AWS infrastructure while supporting CI/CD pipelines and improving system observability. The ideal candidate is passionate about automation, resilient cloud systems, and continuous improvement in software delivery and operations.
Key Responsibilities
- Design, build, and maintain scalable and secure AWS cloud infrastructure using services such as Lambda, EC2, S3, RDS, API Gateway, VPC, and IAM.
- Develop and manage Infrastructure as Code (IaC) using Terraform and Terragrunt.
- Implement and optimize CI/CD pipelines using Azure DevOps or similar tools.
- Automate deployments and integrate IaC into delivery workflows.
- Maintain monitoring, alerting, and observability systems using CloudWatch, Dynatrace, and Splunk.
- Troubleshoot infrastructure and application issues, conduct root cause analysis, and support incident response.
- Apply security best practices, manage IAM roles and policies, and perform vulnerability assessments.
- Optimize system performance and cost efficiency through automation and tuning.
- Collaborate with development teams to support application reliability and deployments.
- Create and maintain comprehensive runbooks and documentation for operational procedures.
- Participate in on-call rotation and continuously improve system resilience and recovery processes.
Required Qualifications
- Bachelor’s Degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in a DevOps or SRE role.
- Hands-on experience with AWS cloud services and Infrastructure as Code tools like Terraform.
- Proficiency with scripting languages such as Python or TypeScript, and with the AWS SDK for Python (Boto3).
- Strong understanding of CI/CD concepts and experience with tools such as Azure DevOps.
- Familiarity with monitoring and observability platforms such as CloudWatch, Splunk, and Dynatrace.
- Solid grasp of cloud security, networking fundamentals, and cost management.
Preferred Skills
- Experience optimizing infrastructure for cost and performance.
- Knowledge of serverless architectures and event-driven systems.
- Passion for automation, system reliability, and continuous improvement.
- Excellent communication, team collaboration, and problem-solving skills.