
NVIDIA NCP-AIO Dumps

Page: 1 / 7
Total 66 questions

NVIDIA AI Operations Questions and Answers

Question 1

An organization has multiple containers and wants to view STDIN, STDOUT, and STDERR I/O streams of a specific container.

What command should be used?

Options:

A. docker top CONTAINER-NAME
B. docker stats CONTAINER-NAME
C. docker logs CONTAINER-NAME
D. docker inspect CONTAINER-NAME
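As a refresher on the log-related subcommand, a container's captured output streams can be inspected like this (the container name web is a placeholder; requires a running Docker daemon):

```shell
# Show the STDOUT/STDERR streams captured from the container
docker logs web

# Follow the streams live, with timestamps
docker logs --follow --timestamps web

# Limit output to the most recent 50 lines
docker logs --tail 50 web
```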

Question 2

You are an administrator managing a large-scale Kubernetes-based GPU cluster using Run:AI.

To automate repetitive administrative tasks and efficiently manage resources across multiple nodes, which of the following is essential when using the Run:AI Administrator CLI for environments where automation or scripting is required?

Options:

A.

Use the runai-adm command to directly update Kubernetes nodes without requiring kubectl.

B.

Use the CLI to manually allocate specific GPUs to individual jobs for better resource management.

C.

Ensure that the Kubernetes configuration file is set up with cluster administrative rights before using the CLI.

D.

Install the CLI on Windows machines to take advantage of its scripting capabilities.

Question 3

An administrator is troubleshooting a bottleneck in a deep learning training run and needs consistent data feed rates to the GPUs.

Which storage metric should be used?

Options:

A. Disk I/O operations per second (IOPS)
B. Disk free space
C. Sequential read speed
D. Disk utilization in performance manager

Question 4

You are monitoring the resource utilization of a DGX SuperPOD cluster using NVIDIA Base Command Manager (BCM). The system is experiencing slow performance, and you need to identify the cause.

What is the most effective way to monitor GPU usage across nodes?

Options:

A. Check the job logs in Slurm for any errors related to resource requests.
B. Use the Base View dashboard to monitor GPU, CPU, and memory utilization in real-time.
C. Run the top command on each node to check CPU and memory usage.
D. Use nvidia-smi on each node to monitor GPU utilization manually.
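For comparison with a dashboard view, per-node GPU utilization can also be sampled manually with nvidia-smi (a sketch; requires the NVIDIA driver on the node):

```shell
# Per-GPU utilization and memory use in CSV form
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv

# Repeat the query every 5 seconds
nvidia-smi --query-gpu=index,utilization.gpu --format=csv -l 5
```

The manual approach does not scale across a cluster, which is the point of the question.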

Question 5

A system administrator needs to scale a Kubernetes Job to 4 replicas.

What command should be used?

Options:

A. kubectl stretch job --replicas=4
B. kubectl autoscale deployment job --min=1 --max=10
C. kubectl scale job --replicas=4
D. kubectl scale job -r 4
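For context, the scale subcommand takes a resource type, a resource name, and the long-form --replicas flag (the Job name my-job is a placeholder; note that on newer Kubernetes versions Jobs may instead need their .spec.parallelism edited directly):

```shell
# Scale the Job named "my-job" to 4 parallel pods
kubectl scale job my-job --replicas=4

# Verify the change
kubectl get job my-job
```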

Question 6

What should an administrator check if GPU-to-GPU communication is slow in a distributed system using Magnum IO?

Options:

A. Limit the number of GPUs used in the system to reduce congestion.
B. Increase the system's RAM capacity to improve communication speed.
C. Disable InfiniBand to reduce network complexity.
D. Verify the configuration of NCCL or NVSHMEM.
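As background, a common first step when checking NCCL configuration is enabling its debug logging before relaunching the job (the environment variables are standard NCCL settings; the training command is a placeholder):

```shell
# Print NCCL's transport, topology, and network decisions during startup
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET

# Placeholder for the actual distributed training launch
python train.py
```

The resulting log shows whether NCCL selected NVLink, InfiniBand, or fell back to slower transports.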

Question 7

You are configuring cloudbursting for your on-premises cluster using BCM, and you plan to extend the cluster into both AWS and Azure.

What is a key requirement for enabling cloudbursting across multiple cloud providers?

Options:

A. You only need to configure credentials for one cloud provider, as BCM will automatically replicate them across other providers.
B. You need to set up a single set of credentials that works across both AWS and Azure for seamless integration.
C. You must configure separate credentials for each cloud provider in BCM to enable their use in the cluster extension process.
D. BCM automatically detects and configures credentials for all supported cloud providers without requiring admin input.

Question 8

You are configuring networking for a new AI cluster in your data center. The cluster will handle large-scale distributed training jobs that require fast communication between servers.

What type of networking architecture can maximize performance for these AI workloads?

Options:

A. Implement a leaf-spine network topology using standard Ethernet switches to ensure scalability as more nodes are added.
B. Prioritize out-of-band management networks over compute networks to ensure efficient job scheduling across nodes.
C. Use standard Ethernet networking with a focus on increasing bandwidth through multiple connections per server.
D. Use InfiniBand networking to provide low-latency, high-throughput communication between servers in the cluster.

Question 9

A data scientist is training a deep learning model and notices slower than expected training times. The data scientist asks a system administrator to investigate, and the administrator suspects disk I/O is the issue.

What command should be used?

Options:

A. tcpdump
B. iostat
C. nvidia-smi
D. htop
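As a refresher, disk I/O can be inspected with the extended statistics mode of iostat (from the sysstat package):

```shell
# Extended device statistics, sampled every 2 seconds for 5 reports
iostat -x 2 5
# Key columns: r/s and w/s (IOPS), rkB/s and wkB/s (throughput),
# await (average request latency), %util (device saturation)
```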

Question 10

A system administrator needs to collect the information below:

    GPU behavior monitoring

    GPU configuration management

    GPU policy oversight

    GPU health and diagnostics

    GPU accounting and process statistics

    NVSwitch configuration and monitoring

What single tool should be used?

Options:

A. nvidia-smi
B. CUDA Toolkit
C. DCGM
D. Nsight Systems
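For context, DCGM ships a command-line client, dcgmi, that covers these areas (a sketch; requires the DCGM host engine running on the node, and the dmon field IDs shown are assumptions worth double-checking against the DCGM field reference):

```shell
# List the GPUs and NVSwitches DCGM can see
dcgmi discovery -l

# Run a quick (level 1) health diagnostic
dcgmi diag -r 1

# Stream per-GPU metrics; 203/204 are the GPU and memory-copy
# utilization field IDs in the DCGM field catalog
dcgmi dmon -e 203,204
```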

Question 11

A Fleet Command system administrator wants to create an organization user that will have the following rights:

For locations - read only

For Applications - read/write/admin

For Deployments - read/write/admin

For Dashboards - read only

What role should the system administrator assign to this user?

Options:

A.

Fleet Command Operator

B.

Fleet Command Admin

C.

Fleet Command Supporter

D.

Fleet Command Viewer

Question 12

A Slurm user frequently finds that a job gets stuck in the “PENDING” state and never progresses to the “RUNNING” state.

Which Slurm command can help the user identify the reason for the job’s pending status?

Options:

A. sinfo -R
B. scontrol show job
C. sacct -j
D. squeue -u
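As background, the full job record includes a Reason field explaining why a job is still pending (job ID 1234 is a placeholder; requires access to the Slurm controller):

```shell
# Full state of job 1234, including "Reason=" for pending jobs
scontrol show job 1234

# Compact view of pending jobs for the current user; %R prints the reason
squeue -u $USER -t PENDING -o "%.10i %.9P %.20j %R"
```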

Question 13

You are deploying AI applications at the edge and want to ensure they continue running even if one of the servers at an edge location fails.

How can you configure NVIDIA Fleet Command to achieve this?

Options:

A. Use Secure NFS support for data redundancy.
B. Set up over-the-air updates to automatically restart failed applications.
C. Enable high availability for edge clusters.
D. Configure Fleet Command's multi-instance GPU (MIG) to handle failover.

Question 14

A GPU administrator needs to virtualize AI/ML training in an HGX environment.

How can the NVIDIA Fabric Manager be used to meet this demand?

Options:

A. Video encoding acceleration
B. Enhance graphical rendering
C. Manage NVLink and NVSwitch resources
D. GPU memory upgrade
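For context, on HGX systems Fabric Manager runs as a systemd service that initializes and manages the NVLink/NVSwitch fabric; its state can be checked like this (a sketch; requires an HGX node with the Fabric Manager package installed):

```shell
# Check that the service managing NVLink/NVSwitch is running
systemctl status nvidia-fabricmanager

# Service logs often surface NVSwitch initialization issues
journalctl -u nvidia-fabricmanager --since "1 hour ago"
```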

Question 15

An administrator wants to check if the BlueMan service can access the DPU.

How can this be done?

Options:

A. Via system logs
B. Via the DOCA Telemetry Service (DTS)
C. Via a lightweight database operating in the DPU server
D. Via Linux dump files

Question 16

A cloud engineer is looking to deploy a digital fingerprinting pipeline using NVIDIA Morpheus and the NVIDIA AI Enterprise Virtual Machine Image (VMI).

Where would the cloud engineer find the VMI?

Options:

A. GitHub and Docker Hub
B. Azure, Google, Amazon Marketplaces
C. NVIDIA NGC
D. Developer Forums

Question 17

Which of the following correctly identifies the key components of a Kubernetes cluster and their roles?

Options:

A. The control plane consists of the kube-apiserver, etcd, kube-scheduler, and kube-controller-manager, while worker nodes run kubelet and kube-proxy.
B. Worker nodes manage the kube-apiserver and etcd, while the control plane handles all container runtimes.
C. The control plane is responsible for running all application containers, while worker nodes manage network traffic through etcd.
D. The control plane includes the kubelet and kube-proxy, and worker nodes are responsible for running etcd and the scheduler.
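As a quick way to see these components on a live cluster (assuming kubeadm-style deployments, where control-plane components run as pods in kube-system):

```shell
# Control-plane components (kube-apiserver, etcd, scheduler,
# controller-manager) typically appear as pods in kube-system
kubectl get pods -n kube-system -o wide

# Node roles distinguish control-plane nodes from workers
kubectl get nodes
```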

Question 18

You are using BCM to configure an active-passive high availability (HA) cluster for a firewall system. To ensure seamless failover, what is one best practice for session synchronization between the active and passive nodes?

Options:

A. Configure both nodes with different zone names to avoid conflicts during failover.
B. Use heartbeat network for session synchronization between active and passive nodes.
C. Ensure that both nodes use different firewall models for redundancy.
D. Set up manual synchronization procedures to transfer session data when needed.

Question 19

You are tasked with deploying a deep learning framework container from NVIDIA NGC on a stand-alone GPU-enabled server.

What must you complete before pulling the container? (Choose two.)

Options:

A. Install Docker and the NVIDIA Container Toolkit on the server.
B. Set up a Kubernetes cluster to manage the container.
C. Install TensorFlow or PyTorch manually on the server before pulling the container.
D. Generate an NGC API key and log in to the NGC container registry using docker login.
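For context, the NGC registry login uses the literal username $oauthtoken with the API key as the password (the key value and container tag below are placeholders; requires Docker and the NVIDIA Container Toolkit):

```shell
# Log in to the NGC registry; '$oauthtoken' is the literal username
echo "<NGC_API_KEY>" | docker login nvcr.io -u '$oauthtoken' --password-stdin

# Pull a framework container from NGC (tag is illustrative)
docker pull nvcr.io/nvidia/pytorch:24.05-py3
```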
