Senior System Debug Engineer

Intel · Bangalore, India
Full-time · Hybrid · 6d ago
Computer Vision · Cross-functional Collaboration · Deep Learning · Documentation · Linux · Machine Learning

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
  • At least 10 years of industry experience with a Bachelor's degree, 9 years with a Master's degree, or 8 years with a PhD.
  • Linux Kernel Debugging Expertise: Extensive experience in diagnosing and resolving kernel-level issues, including a deep understanding of Linux internals. Expert-level proficiency in developing and debugging applications in Linux system and user space.
  • Strong System-Level Debug Skills: Proven ability to isolate, root-cause, and resolve complex hardware-software interaction issues across multiple system components.
  • Deep Technical Knowledge Across Key Domains: Proficient in RAS, Power Management (PM), PCIe, Performance, Security, Ethernet, HBM, and GPU subsystems, with strong command of logs, traces, instrumentation, and debug tools used in system bring-up and validation.
  • Machine Learning Framework Experience (Good to Have): Familiarity with frameworks such as PyTorch and TensorFlow, and their application in data science and model development workflows.
  • AI/ML Deployment and Debugging (Good to Have): Knowledge of deploying, debugging, and troubleshooting AI and machine learning models, including identifying and resolving model-level issues.
  • Platform Architecture Understanding (Good to Have): Knowledge of Intel and ARM platform architectures, along with hands-on experience using related debug tools and frameworks.
  • Programming Skills: Strong proficiency in Python, C, and C++.
  • Excellent Self-Learning and Communication Skills: Ability to quickly acquire new technical knowledge, coupled with strong written and verbal communication capabilities for cross-functional collaboration.
  • GPU Architecture and Debug Expertise: Strong understanding of GPU architecture, memory hierarchy, performance bottlenecks, and associated debug methodologies.
  • Don't miss this opportunity to join an elite team creating revolutionary AI solutions that shape tomorrow's technology. Apply now to make a meaningful impact at Intel.
  • Job Type: Experienced Hire
  • Shift: Shift 1 (India)
  • Primary Location: India, Bangalore
  • Additional Locations:
  • Business group:

Benefits

Vision insurance

Additional Information

Job Description: Responsible for the overall design and development of integrated Artificial Intelligence (AI) solutions for deep learning and machine learning systems that integrate hardware, software, firmware, board, and silicon components with specific focus on customer requirements and implementation limitations throughout the systems lifecycle. May also be responsible for AI systems architecture and definition, including translating the business opportunity into use cases and developing product specifications for required hardware and software needed to deliver system requirements. Impacts and influences the AI product roadmap and development based on profound comprehension of AI and deep learning algorithms, deep learning customer requirements, and deep learning software frameworks. Impacts related technologies/components such as memory, security, and OS that may be central to the final solution. Develops new methods in the areas of reinforcement learning, policy learning, computer vision, machine learning, simulation, sim2real, autonomous driving, and robotics. Leads design, analysis, and implementation of component-level choices across the integrated AI systems on performance, features, and cost, including analysis of risks and emphasis on ease of use, reliability, security, availability, maintainability, sustainability, and quality. Defines systems implementation and integration approach and plans to ensure optimum performance and reliability across hardware and software that comprise the system. Delivers end-to-end technical solutions to solve customer problems, deploying solutions, executing benchmark tests, and preparing documentation. Conducts analysis and makes reliable engineering recommendations to ensure reliability/resiliency of the AI infrastructure. Monitors and reports on utilization and plans continuous process improvement.
Collaborates with other teams to analyze next-generation requirements and opportunities and may influence and guide research and academic collaboration in the space of cloud systems and solutions, including proof-of-concept and solutions beyond current industry approaches. Simulates real-life environments in the cluster environment and analyzes performance of prototypes. Contributes applied/customer knowledge to AI roadmap working with AI system architects.

