Site Reliability Engineer (SRE)
Requirements
- 5+ years of experience in SRE, DevOps, or infrastructure engineering roles
- Day-to-Day Activities:
  - Managing and scaling cloud infrastructure and services
  - Monitoring system health, alerting, and incident response
  - Automating deployment, updates, and infrastructure provisioning
  - Performing capacity planning and performance tuning
  - Troubleshooting and resolving outages or performance issues
  - Collaborating with development teams to improve architecture and resilience
  - Conducting post-incident reviews and implementing preventative measures
  - Maintaining documentation for infrastructure and processes
- Bachelor's or Master's degree in Computer Science, Information Technology, or related fields
- Proven experience working with cloud providers and infrastructure automation tools
- Relevant certifications in cloud platforms (AWS Certified Solutions Architect, GCP Professional Cloud Architect, etc.) are preferred
- Soft Skills:
  - Strong analytical and problem-solving skills
  - Excellent communication and collaboration skills
  - Ability to work under pressure and handle incidents calmly
  - Proactive mindset with a focus on automation and efficiency
  - Eagerness to learn new tools and technologies
  - Good organizational and time-management skills
- SYNECHRON'S DIVERSITY & INCLUSION STATEMENT
- All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.
Benefits
Additional Information
Overall Responsibilities:
- Design, implement, and maintain scalable, reliable infrastructure
- Automate deployment, scaling, and management of applications and services
- Monitor system health and troubleshoot issues proactively
- Participate in on-call rotations to ensure uptime and incident management
- Develop runbooks, best practices, and automation scripts
- Collaborate with development teams to improve system architecture and reliability
- Conduct performance tuning and capacity planning
- Improve observability and monitoring across the stack
- Document operational procedures and incident post-mortems
Software Requirements:
- Strong experience with cloud platforms such as AWS, GCP, or Azure
- Proficiency in Linux/Unix system administration
- Knowledge of scripting languages: Python, Bash, or Go
- Experience with Infrastructure as Code (IaC) tools like Terraform, CloudFormation
- Familiarity with container orchestration: Kubernetes, Docker Swarm
- Monitoring and alerting tools: Prometheus, Grafana, Nagios, Datadog
- Configuration management tools: Ansible, Chef, Puppet
- CI/CD pipelines setup and management (Jenkins, GitLab CI, CircleCI)
- Log management and analysis: ELK Stack (Elasticsearch, Logstash, Kibana)
Category-wise Technical Skills:
- Cloud Platforms: AWS, GCP, Azure
- Containerization and Orchestration: Docker, Kubernetes
- Scripting & Automation: Python, Bash, Go
- Infrastructure as Code: Terraform, CloudFormation
- Monitoring & Logging: Prometheus, Grafana, Nagios, ELK Stack
- Configuration Management: Ansible, Chef, Puppet
- CI/CD Tools: Jenkins, GitLab CI, CircleCI
- Operating Systems: Linux/Unix administration skills
- Networking & Security: VPNs, load balancers, firewalls, SSL/TLS