Site Reliability and Observability Engineer

External

HCF · Sydney CBD

Full-time · On-site · 2w ago

Agile · Ansible · Azure · CI/CD · Compliance · Confluence

Responsibilities

Maintain and enhance automation frameworks for application deployment, configuration management, and patching.
Identify opportunities for process improvements, drive initiatives to enhance system performance, and implement best practices for DevOps and automation.
Develop dashboards, alerts, and analytics to monitor system health, performance, and user experience.
Troubleshoot and resolve system faults, performance bottlenecks, and integration issues across environments.
Support BAU activities including environment maintenance, break-fix resolution, and batch job troubleshooting.
Lead infrastructure-as-code (IaC) initiatives and environment provisioning using Ansible, Terraform, and related tools.
Maintain comprehensive documentation for infrastructure, automation processes, and system configurations to ensure knowledge sharing and continuity.
Collaborate with L1/L2 teams to mentor, document, and automate manual processes.
Work with cross-functional teams to deliver automation solutions for business processes.
Engage with vendors and partners to support platform upgrades, license management, and compliance.
Implement and maintain security best practices, conduct regular security audits, and ensure compliance with industry standards and regulations.
Represent the team in strategic initiatives and external forums where applicable.
Perform work in a manner that complies with relevant regulatory standards including Work Health & Safety (WHS) legislation.

Essential Skills

Bachelor's degree in Computer Science, Information Technology, or equivalent experience.
5+ years in DevOps, Site Reliability, or Platform Engineering roles.
Hands-on experience with Linux (SUSE, RedHat), Windows Server, and container platforms (Docker, Kubernetes).
Strong experience designing and managing configuration management and automation platforms.
In-depth knowledge of server operating systems and managing microservices-based applications.
Strong understanding of security best practices and experience in implementing security measures in DevOps processes.
Familiarity with collaboration tools like Jira and Confluence for project management, as well as Agile methodologies and practices.
Excellent problem-solving, documentation, and communication skills.

Desirable (Any 2)

Experience with Azure DevOps, Ansible, Terraform and scripting languages (PowerShell, Python).
Experience with CI/CD pipelines, repo management and Infrastructure as Code (IaC) tooling.
Experience with cloud platforms and cloud-native services.
Familiarity with RESTful APIs, service integrations, and secure automation practices.
Experience with ServiceNow ITOM, CMDB, and MID server configuration.
Exposure to database performance tuning and backup strategies (SQL, Informix, RDS).

About HCF

Culture & Benefits
Purpose-driven passion
We're united by a common purpose: to make healthcare affordable, understandable, high quality and member-focused.
Wellness and work-life balance
We'll empower you with the necessary skills and tools to support your personal wellbeing journey, ensuring you perform at your best. Our offerings include:
Flexible working arrangements
50% subsidy on HCF hospital and/or extras cover
Family-friendly certified employer
18 weeks of parental leave for all new parents
Mental health and wellbeing programs, including workshops, fitness classes, flu vaccinations, skin checks and more
Discounts on HCF's products, including life, pet and travel insurance, as well as discounts at Fitness First gyms and on our eyecare products.
Collaboration and inclusivity

Benefits

Health insurance · Vision insurance · Flexible schedule · Parental leave

Additional Information

The Site Reliability and Observability Engineer plays a pivotal role in the Enterprise Services & Automation team, focusing on the design, implementation, support and maintenance of scalable, secure, and automated infrastructure services. This position is crucial for leading automation initiatives and supporting critical applications and core platforms.
