Senior Systems Software Engineer, Performance Architecture - Analytics and Data Intelligence
Responsibilities
- Extend JIT and compiler-based execution support in cuDF and related GPU-accelerated structured data processing systems.
- Design approaches for lowering expressions, ASTs, or query fragments into optimized GPU execution paths.
- Investigate kernel fusion strategies across cuDF operations to reduce materialization, memory traffic, launch overhead, and end-to-end query latency.
- Analyze structured analytics workloads to identify performance bottlenecks in expression evaluation, joins, aggregations, scans, data movement, and memory management.
- Build benchmarks and regression tests that capture real dataframe and SQL-like workloads, from micro-benchmarks to end-to-end pipelines.
- Collaborate with cuDF, CUDA, compiler/runtime, and query engine teams to translate workload analysis into implementation plans and architecture decisions.
- Prototype and evaluate execution strategies inspired by high-performance database engines, including fused operators, code generation, vectorized execution, and adaptive planning.
What we need to see:
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field, or equivalent hands-on experience.
- 12+ years of validated experience in systems performance engineering or performance-focused architecture.
- Proven skills in profiling, instrumentation, and optimization for CPU and GPU systems, applying tools like tracing, counters, flame graphs, and kernel-level profiling.
- Experience with compiler, JIT, code generation, query execution, or runtime optimization techniques.
- Experience optimizing analytic database engines and/or query runtimes, including vectorized execution, join strategies, and columnar formats like Arrow and Parquet.
- Proficient in C++ and/or Python, with a strong ability to analyze performance-critical code and implement effective solutions.
- Experience with cuDF, RAPIDS, CUDA, Numba, LLVM, MLIR, NVRTC, or other JIT/codegen systems.
- Experience with benchmarking frameworks, performance dashboards, and CI/CD regression gating, along with a proven grasp of modern analytics and machine learning workflows.
Ways to stand out from the crowd:
- Deep familiarity with NVIDIA GPUs and GPU programming (CUDA), including memory hierarchy, concurrency, and profiling toolchains such as Nsight Systems.
- Experience with TPC-style benchmarking (TPC-H, TPC-DS, or analogous), ClickBench-like workloads, and building credible, repeatable performance narratives.
- Prior work on database execution engines, especially operator fusion, query compilation, vectorized execution, or adaptive execution.
- Demonstrated open-source contributions to performance-critical systems, including libraries, runtimes, databases, and ML or data tooling.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD. You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until June 16, 2026. This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
Additional Information
NVIDIA's Analytics and Data Intelligence (ADI) organization is building the next generation of GPU-accelerated data analytics, data science, and vector search systems, spanning libraries, engines, and end-to-end reference architectures. As an NVIDIAN, you will find yourself immersed in a diverse, supportive environment where everyone is encouraged to do their best work. Come join the team and see how you can make a lasting impact on the world!
We are seeking a Senior Systems Software Engineer focused on performance architecture for GPU-accelerated structured data processing. This is a high-impact individual contributor role for someone passionate about developing coordinated SQL and user-friendly interfaces across diverse CPU and GPU query engines. It involves improving performance, reliability, and workload optimization.
The ideal candidate has deep experience in systems performance, compiler/runtime design, and database or dataframe execution engines. This role will focus on compiler and JIT-based execution techniques for cuDF and related analytics runtimes.