Site Reliability and Observability Engineer

HCF · Sydney CBD
Full-time · On-site · 2w ago
Agile · Ansible · Azure · CI/CD · Compliance · Confluence

Responsibilities

  • Maintain and enhance automation frameworks for application deployment, configuration management, and patching.
  • Identify opportunities for process improvements, drive initiatives to enhance system performance, and implement best practices for DevOps and automation.
  • Develop dashboards, alerts, and analytics to monitor system health, performance, and user experience.
  • Troubleshoot and resolve system faults, performance bottlenecks, and integration issues across environments.
  • Support business-as-usual (BAU) activities, including environment maintenance, break-fix resolution, and batch job troubleshooting.
  • Lead infrastructure-as-code (IaC) initiatives and environment provisioning using Ansible, Terraform, and related tools.
  • Maintain comprehensive documentation for infrastructure, automation processes, and system configurations to ensure knowledge sharing and continuity.
  • Collaborate with L1/L2 teams to mentor, document, and automate manual processes.
  • Work with cross-functional teams to deliver automation solutions for business processes.
  • Engage with vendors and partners to support platform upgrades, license management, and compliance.
  • Implement and maintain security best practices, conduct regular security audits, and ensure compliance with industry standards and regulations.
  • Represent the team in strategic initiatives and external forums where applicable.
  • Perform work in a manner that complies with relevant regulatory standards including Work Health & Safety (WHS) legislation.

Essential Skills

  • Bachelor's degree in Computer Science, Information Technology, or equivalent experience.
  • 5+ years in DevOps, Site Reliability, or Platform Engineering roles.
  • Hands-on experience with Linux (SUSE, Red Hat), Windows Server, and container platforms (Docker, Kubernetes).
  • Strong experience designing and managing configuration management and automation platforms.
  • In-depth knowledge of server operating systems and of managing microservices-based applications.
  • Strong understanding of security best practices and experience in implementing security measures in DevOps processes.
  • Familiarity with collaboration tools like Jira and Confluence for project management, as well as Agile methodologies and practices.
  • Excellent problem-solving, documentation, and communication skills.

Desirable (any two of the following)

  • Experience with Azure DevOps, Ansible, Terraform and scripting languages (PowerShell, Python).
  • Experience with CI/CD pipelines, repository management and Infrastructure as Code (IaC) tooling.
  • Experience with cloud platforms and cloud-native services.
  • Familiarity with RESTful APIs, service integrations, and secure automation practices.
  • Experience with ServiceNow ITOM, CMDB, and MID server configuration.
  • Exposure to database performance tuning and backup strategies (SQL, Informix, RDS).

About HCF

Culture & Benefits

Purpose-driven passion
We're united by a common purpose: to make healthcare affordable, understandable, high quality and member-focused.

Wellness and work-life balance
We'll empower you with the necessary skills and tools to support your personal wellbeing journey, ensuring you perform at your best. Our offerings include:
  • Flexible working arrangements
  • 50% subsidy on HCF hospital and/or extras cover
  • Family-friendly certified employer
  • 18 weeks of parental leave for all new parents
  • Mental health and wellbeing programs, including workshops, fitness classes, flu vaccinations, skin checks and more
  • Discounts on HCF's products, including life, pet and travel insurance, as well as discounts at Fitness First gyms and on our eyecare products.

Collaboration and inclusivity

Benefits

Health insurance · Vision insurance · Flexible schedule · Parental leave

Additional Information

The Site Reliability and Observability Engineer plays a pivotal role in the Enterprise Services & Automation team, focusing on the design, implementation, support and maintenance of scalable, secure, and automated infrastructure services. This position is crucial for leading automation initiatives and supporting critical applications and core platforms.
