Senior Site Reliability Engineer
About the role
Join Alvys, an early-stage company ambitiously transforming the $800B US freight industry. As a Senior Site Reliability Engineer, you will play a crucial role in ensuring the reliability, performance, and efficiency of our systems. Your expertise will contribute to the stability of our infrastructure and directly impact the quality of service we provide to our customers. In this role, you will collaborate closely with both the development and operations teams to build scalable and robust systems.
Industry Insight
Transportation logistics, a complex and fragmented domain, is ripe for technological revolution. You'll be at the forefront of automating and standardizing a sector that moves trillions of dollars' worth of goods annually, predominantly by truck, yet lacks modern tools and solutions.
About Alvys
Alvys is on a mission to revolutionize transportation logistics. Combining hands-on industry experience with a world-class technical vision, we're building a multi-tenant SaaS platform that's becoming an essential tool for transportation companies.
Our Principles
- Engineering Excellence: We're committed to a principled approach, blending the best of practical and theoretical techniques to ensure superior code quality and architecture.
- End-to-End Ownership: Our collaborative environment ensures that if you build it, you run it.
- Blameless Culture: We focus on ownership and learning from mistakes in a supportive, finger-pointing-free environment.
- Core Values: Trust, transparency, and fairness are not just our company values; they're also the solution to the industry's underlying problems.
Tech Stack
Our cloud-native environment leverages Azure, .NET/C#, CosmosDB, Cognitive Search, and a suite of Azure services. Our front end uses JavaScript, TypeScript, Angular, Dart, and Flutter. Monitoring and alerting are handled by Azure Monitor and Application Insights, alongside numerous integrations with external services.
Responsibilities
- Ensuring the reliability and availability of our systems and services.
- Collaborating with development teams to enhance system scalability and performance.
- Implementing and managing continuous integration and deployment pipelines.
- Conducting incident response and post-incident analysis to prevent recurrence.
- Automating system deployment and scaling processes to improve efficiency.
- Monitoring system performance and responding to alerts and issues as they arise.
- Contributing to the development of tools and processes that support system reliability and performance.
- Participating in on-call rotations, providing critical support outside of standard working hours as needed.
Requirements
- 5+ years of experience in software development with a modern programming language such as Python, Go, Java, or similar.
- Proficiency in managing and troubleshooting Linux/Unix-based environments.
- Understanding of network fundamentals (TCP/IP, HTTP, DNS).
- Experience with monitoring tools and with automation software such as Ansible, Puppet, or Chef.
- Proficiency in managing infrastructure as code using tools like Terraform or CloudFormation.
- Experience deploying and managing infrastructure in Azure.
- Strong analytical and problem-solving abilities, particularly in high-pressure environments.
- Experience with continuous integration and continuous deployment practices.
- Knowledge of security best practices in a cloud environment.
- Experience in performance tuning and optimization, both at the code and system levels.
- Proven track record in incident response and post-mortem analysis.