xAI
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.
Job Description
xAI is seeking an experienced AI/HPC Network Development Engineer (Networking) to join its fast-growing infrastructure team. This role focuses on designing, optimizing, and operating ultra-high-performance networks that power AI and HPC workloads. You will work with cutting-edge GPU clusters, debug NCCL issues, analyze network metrics, and drive network innovation that supports AI training and inference at scale.
This position requires a deep understanding of Ethernet-based RoCEv2 networking, congestion control, and large-scale distributed AI systems. You’ll collaborate with engineers across hardware, software, and data-center operations to build scalable network infrastructure and automation that minimizes downtime and maximizes throughput.
Key Responsibilities
- Design, deploy, and optimize large-scale Ethernet/RoCEv2 networks for AI and HPC workloads.
- Work closely with NCCL and GPU cluster frameworks to fine-tune performance and availability.
- Build and maintain dashboards, monitoring systems, and observability tools for network performance.
- Automate routine tasks using Python and other scripting tools.
- Partner with data-center teams to support buildouts and maintenance activities.
- Participate in on-call rotations and contribute to operational readiness and scalability initiatives.
Qualifications
- Minimum 10 years of experience designing and operating large-scale networks, including 5 years in Ethernet AI/HPC environments.
- Deep knowledge of RoCEv2 and congestion control techniques for high-bandwidth networks.
- Familiarity with AI training and inference workloads and their impact on network performance.
- Strong proficiency in Python for automation and data analysis.
- Expertise in building metrics systems and performance dashboards for large distributed clusters.
- Excellent communication skills and the ability to work effectively across teams.
Benefits
- Competitive base salary plus equity opportunities.
- Comprehensive medical, vision, and dental coverage.
- 401(k) retirement plan, life and disability insurance, and additional perks.
- Opportunities to work on frontier AI infrastructure that advances xAI’s mission to understand the universe.
