
IT Search Corp

4 Remote Nvidia Engineers

Full Time • Fully Remote - US
NVIDIA AI Infrastructure & Kubernetes Platform Engineer (DGX Systems), Remote
NVIDIA certification required; candidates without one will not be interviewed
Duration: 6 months to 1+ years
Rate: open
US citizenship or green card required

 Alternate titles depending on context:
  • AI Platform Architect – DGX & SuperPOD
  • AI Infrastructure DevOps Engineer – NVIDIA DGX Stack
  • Senior AI Systems Engineer – DGX | Kubernetes | InfiniBand 
Job Description:
 

We are seeking a highly skilled AI Infrastructure & Kubernetes Platform Engineer with a proven track record in deploying and managing NVIDIA DGX-based AI clusters, orchestrating containerized AI workloads using Kubernetes, and ensuring secure, high-throughput operations across InfiniBand-powered networks. The ideal candidate will hold a combination of Kubernetes certifications (CKA, CKAD, CKS) and NVIDIA certifications (NCA-AIIO, NCP-AIO, NCP-AII, NCP-AIN), coupled with hands-on training in DGX, BlueField, and high-speed network operations.
 
 
This position plays a key role in supporting AI/ML infrastructure at scale, enabling efficient training and inference for complex models, and integrating NVIDIA's cutting-edge compute, storage, and fabric solutions with modern DevOps practices.
 
 

 AI Infrastructure Operations
  • Deploy and manage NVIDIA DGX BasePODs and SuperPODs for high-performance AI workloads.
  • Oversee DGX system lifecycle operations including provisioning, monitoring, firmware upgrades, and capacity planning.
  • Operate Base Command Manager to manage GPU clusters, schedule workloads, and integrate with MLOps tools.
  • Perform DGX node health validation, NCCL interconnect testing, and NVLink topology verification following new deployments or hardware changes.
 Kubernetes Platform Engineering
  • Architect secure and scalable Kubernetes clusters optimized for GPU-accelerated workloads using NVIDIA GPU Operator.
  • Leverage expertise from CKA/CKAD/CKS to develop, deploy, and secure AI applications on Kubernetes.
  • Implement CI/CD pipelines and GitOps methodologies for deploying and managing ML workflows.
 High-Performance Networking & DPUs
  • Administer InfiniBand networks and BlueField DPUs using Unified Fabric Manager (UFM).
  • Enable NVLink/NVSwitch performance across GPU nodes and tune fabric configurations for minimal latency and maximum throughput.
  • Use BlueField for offloading storage, firewalling, and telemetry, enhancing AI workload security and performance.
 Security & Compliance
  • Apply best practices from the CKS certification to secure containerized AI environments.
  • Configure runtime security, secrets management, network segmentation, and auditing using DPU-enhanced Kubernetes deployments.
  • Support zero-trust architecture initiatives by enforcing workload identity, RBAC policies, and supply chain integrity across AI container images and model artifacts.
 

 Performance Monitoring & Optimization
  • Monitor GPU, CPU, and I/O performance using NVIDIA DCGM, Prometheus, Grafana, and Base Command APIs.
  • Tune system performance and model training pipelines for cost-efficiency and throughput.
  • Build and maintain operational runbooks, incident response playbooks, and SLA reporting dashboards covering GPU utilization, thermal thresholds, and fabric health.

 Expertise With:
  • DGX System, BasePOD, and SuperPOD Administration
  • BlueField DPU Configuration & Operations
  • InfiniBand Fabric and UFM Management
  • Base Command Manager for workload orchestration

This is a remote position.

Compensation: $110.00 - $135.00 per hour



