AI/HPC Network Development Engineer – Networking

Job Expired
  • Posted Date: Oct 28, 2025
  • Closing Date: Nov 24, 2025
  • Full Time
  • Memphis, TN
  • Applications have closed

xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

Job Description

xAI is seeking an experienced AI/HPC Network Development Engineer (Networking) to join its fast-growing infrastructure team. This role focuses on designing, optimizing, and operating ultra-high-performance networks that power AI and HPC workloads. You will work with cutting-edge GPU clusters, debug NCCL issues, analyze network metrics, and drive network innovation that supports AI training and inference at scale.

This position requires a deep understanding of Ethernet-based RoCEv2 networking, congestion control, and large-scale distributed AI systems. You’ll collaborate with engineers across hardware, software, and data-center operations to build scalable network infrastructure and automation that minimizes downtime and maximizes throughput.

Key Responsibilities

  • Design, deploy, and optimize large-scale Ethernet/RoCEv2 networks for AI and HPC workloads.

  • Work closely with NCCL and GPU cluster frameworks to fine-tune performance and availability.

  • Build and maintain dashboards, monitoring systems, and observability tools for network performance.

  • Automate routine tasks using Python and other scripting tools.

  • Partner with data-center teams to support buildouts and maintenance activities.

  • Participate in on-call rotations and contribute to operational readiness and scalability initiatives.

Qualifications

  • Minimum 10 years of experience designing and operating large-scale networks, including 5 years in Ethernet AI/HPC environments.

  • Deep knowledge of RoCEv2 and congestion control techniques for high-bandwidth networks.

  • Familiarity with AI training and inference workloads and their impact on network performance.

  • Strong proficiency in Python for automation and data analysis.

  • Expertise in building metrics systems and performance dashboards for large distributed clusters.

  • Excellent communication skills and the ability to work effectively across teams.

Benefits

  • Competitive base salary plus equity opportunities.

  • Comprehensive medical, vision, and dental coverage.

  • 401(k) retirement plan, life and disability insurance, and additional perks.

  • Opportunities to work on frontier AI infrastructure that advances xAI’s mission to understand the universe.
