CertNexus AIP-210 Dumps

Page: 1 / 9
Total 92 questions

CertNexus Certified Artificial Intelligence Practitioner (CAIP) Questions and Answers

Question 1

When should the model be retrained in the ML pipeline?

Options:

A.

A new monitoring component is added.

B.

Concept drift is detected in the pipeline.

C.

More data become available for the training phase.

D.

Some outliers are detected in live data.
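
For reference, concept drift (option B) often shows up as a rising error rate on recent labeled data. The sketch below is one minimal way to turn that into a retraining trigger. It assumes ground-truth labels arrive some time after each prediction; the `ErrorRateMonitor` class, its window size, and its tolerance are hypothetical illustrations, not part of any CertNexus material or specific library.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical monitor: flags retraining when recent errors exceed a baseline."""

    def __init__(self, baseline_error, tolerance=0.05, window=100):
        self.baseline_error = baseline_error  # error rate measured at deployment
        self.tolerance = tolerance            # allowed increase before flagging
        self.recent = deque(maxlen=window)    # 1 = wrong prediction, 0 = correct

    def record(self, prediction, actual):
        self.recent.append(0 if prediction == actual else 1)

    def should_retrain(self):
        # Wait for a full window so a few early mistakes do not trigger retraining.
        if len(self.recent) < self.recent.maxlen:
            return False
        recent_error = sum(self.recent) / len(self.recent)
        return recent_error > self.baseline_error + self.tolerance


monitor = ErrorRateMonitor(baseline_error=0.10, tolerance=0.05, window=10)
for _ in range(10):
    monitor.record(prediction=1, actual=1)  # predictions still match reality
stable = monitor.should_retrain()
for _ in range(5):
    monitor.record(prediction=1, actual=0)  # the input/label relationship has shifted
drifted = monitor.should_retrain()
print(stable, drifted)  # False True
```

A rising error rate is only a proxy for drift; production pipelines typically combine it with statistical tests on the input distribution.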

Question 2

You create a prediction model with 96% accuracy. While the model's true positive rate (TPR) is performing well at 99%, the true negative rate (TNR) is only 50%. Your supervisor tells you that the TNR needs to be higher, even if it decreases the TPR. Upon further inspection, you notice that the vast majority of your data is truly positive.

What method could help address your issue?

Options:

A.

Normalization

B.

Oversampling

C.

Principal components analysis

D.

Quality filtering
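
The numbers in this question can be checked directly: accuracy is a weighted average of TPR and TNR, so the stated figures imply how skewed the classes are. The sketch below solves for the positive fraction and then shows naive random oversampling of the minority class. The `random_oversample` helper is a hypothetical illustration that assumes exactly two classes; it is not a library function.

```python
import random
from collections import Counter

accuracy, tpr, tnr = 0.96, 0.99, 0.50

# accuracy = p * TPR + (1 - p) * TNR, solved for the positive fraction p.
positive_fraction = (accuracy - tnr) / (tpr - tnr)
print(round(positive_fraction, 3))  # 0.939


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority, majority = sorted(counts, key=counts.get)  # assumes exactly two classes
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    extra = rng.choices(minority_rows, k=counts[majority] - counts[minority])
    return rows + extra, labels + [minority] * len(extra)


rows = list(range(100))
labels = [1] * 94 + [0] * 6
new_rows, new_labels = random_oversample(rows, labels)
balanced = Counter(new_labels)  # both classes now have 94 rows
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into evaluation data.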

Question 3

Which of the following is the definition of accuracy?

Options:

A.

(True Positives + False Positives) / Total Predictions

B.

(True Positives + True Negatives) / Total Predictions

C.

True Positives / (True Positives + False Negatives)

D.

True Positives / (True Positives + False Positives)
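
The four formulas above can be evaluated side by side from confusion-matrix counts. The sketch below uses made-up counts purely for illustration and labels each formula with its usual name.

```python
def rates(tp, fp, tn, fn):
    """Evaluate the four formulas from the options on confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "predicted_positive_rate": (tp + fp) / total,  # formula in option A
        "accuracy": (tp + tn) / total,                 # formula in option B
        "recall": tp / (tp + fn),                      # formula in option C (also TPR)
        "precision": tp / (tp + fp),                   # formula in option D
    }


# Illustrative counts only: 100 predictions in total.
metrics = rates(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```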

Question 4

A big data architect needs to be cautious about personally identifiable information (PII) that may be captured with their new IoT system. What is the final stage of the Data Management Life Cycle, which the architect must complete in order to implement data privacy and security appropriately?

Options:

A.

De-Duplicate

B.

Destroy

C.

Detain

D.

Duplicate

Question 5

Which of the following is a common negative side effect of not using regularization?

Options:

A.

Overfitting

B.

Slow convergence time

C.

Higher compute resources

D.

Low test accuracy

Question 6

An AI system recommends New Year's resolutions. It has an ML pipeline without monitoring components. What retraining strategy would be BEST for this pipeline?

Options:

A.

Periodically before New Year's Day and after New Year's Day

B.

Periodically every year

C.

When concept drift is detected

D.

When data drift is detected

Question 7

When should you use semi-supervised learning? (Select two.)

Options:

A.

A small set of labeled data is available but not representative of the entire distribution.

B.

A small set of labeled data is biased toward one class.

C.

Labeling data is challenging and expensive.

D.

There is a large amount of labeled data to be used for predictions.

E.

There is a large amount of unlabeled data to be used for predictions.

Question 8

Given a feature set with rows that contain missing continuous values, and assuming the data is normally distributed, what is the best way to fill in these missing features?

Options:

A.

Delete entire rows that contain any missing features.

B.

Fill in missing features with random values for that feature in the training set.

C.

Fill in missing features with the average of observed values for that feature in the entire dataset.

D.

Delete entire columns that contain any missing features.
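
Mean imputation (option C) can be sketched in a few lines. The example below assumes missing values are represented as `None` and that every column has at least one observed value; `impute_mean` is a hypothetical helper name.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, col_means)] for row in rows]


data = [[1.0, 10.0], [None, 20.0], [3.0, None]]
filled = impute_mean(data)
print(filled)  # [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```

For roughly symmetric data such as a normal distribution, the mean and median nearly coincide, so the mean is a reasonable fill value; for skewed data the median is often preferred.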

Question 9

For a particular classification problem, you are tasked with determining the best algorithm among SVM, random forest, K-nearest neighbors, and a deep neural network. Each of the algorithms has similar accuracy on your data. The stakeholders indicate that they need a model that can convey each feature's relative contribution to the model's accuracy. Which is the best algorithm for this use case?

Options:

A.

Deep neural network

B.

K-nearest neighbors

C.

Random forest

D.

SVM

Question 10

Which two techniques are used to build personas in the ML development lifecycle? (Select two.)

Options:

A.

Population estimates

B.

Population regression

C.

Population resampling

D.

Population triage

E.

Population variance

Question 11

Which of the following best describes distributed artificial intelligence?

Options:

A.

It does not require hyperparameter tuning because the distributed nature accounts for the bias.

B.

It intelligently pre-distributes the weight of starting a neural network.

C.

It relies on a distributed system that performs robust computations across a network of unreliable nodes.

D.

It uses a centralized system to speak to decentralized nodes.

Question 12

Which of the following pieces of AI technology provides the ability to create fake videos?

Options:

A.

Generative adversarial networks (GAN)

B.

Long short-term memory (LSTM) networks

C.

Recurrent neural networks (RNN)

D.

Support-vector machines (SVM)

Question 13

Which database is designed to better anticipate and avoid risks of AI systems causing safety, fairness, or other ethical problems?

Options:

A.

Asset

B.

Code Repository

C.

Configuration Management

D.

Incident

Question 14

You and your team need to process large datasets of images as fast as possible for a machine learning task. The project will also use a modular framework with extensible code and an active developer community. Which of the following would BEST meet your needs?

Options:

A.

Caffe

B.

Keras

C.

Microsoft Cognitive Services

D.

TensorBoard

Question 15

In which of the following scenarios is lasso regression preferable over ridge regression?

Options:

A.

The number of features is much larger than the sample size.

B.

There are many features with no association with the dependent variable.

C.

There is high collinearity among some of the features associated with the dependent variable.

D.

The sample size is much larger than the number of features.
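
The practical difference between the two penalties is easiest to see in the special case of an orthonormal design matrix, where both estimators have closed forms in terms of the least-squares coefficients. The sketch below assumes that special case, with the objectives written as ½‖y − Xβ‖² + λ‖β‖₁ for lasso and ½‖y − Xβ‖² + (λ/2)‖β‖² for ridge. The coefficient values are made up for illustration.

```python
def ridge_shrink(ols_coefs, lam):
    """Ridge under an orthonormal design: every coefficient is scaled toward zero."""
    return [b / (1.0 + lam) for b in ols_coefs]


def soft_threshold(b, lam):
    """Lasso under an orthonormal design: shift toward zero, clipping at exactly zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_shrink(ols_coefs, lam):
    return [soft_threshold(b, lam) for b in ols_coefs]


ols = [3.0, -2.0, 0.4, -0.1]  # last two mimic features with little real association
lam = 0.5
ridge = ridge_shrink(ols, lam)
lasso = lasso_shrink(ols, lam)
print(ridge)  # all four stay non-zero
print(lasso)  # [2.5, -1.5, 0.0, 0.0]
```

Ridge keeps every feature with a smaller weight, while lasso sets weak coefficients exactly to zero, which acts as feature selection.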

Question 16

Which of the following is the primary purpose of hyperparameter optimization?

Options:

A.

Controls the learning process of a given algorithm

B.

Makes models easier to explain to business stakeholders

C.

Improves model interpretability

D.

Increases recall over precision

Question 17

A data scientist is tasked to extract business intelligence from primary data captured from the public. Which of the following is the most important aspect that the scientist cannot forget to include?

Options:

A.

Cyberprotection

B.

Cybersecurity

C.

Data privacy

D.

Data security

Question 18

Which of the following are true about the transform-design pattern for a machine learning pipeline? (Select three.)

Options:

A.

It aims to separate inputs from features.

B.

It encapsulates the processing steps of ML pipelines.

C.

It ensures reproducibility.

D.

It represents steps in the pipeline with a directed acyclic graph (DAG).

E.

It seeks to isolate individual steps of ML pipelines.

F.

It transforms the output data after production.

Question 19

You are developing a prediction model. Your team indicates they need an algorithm that is fast and requires low memory and low processing power. Assuming the following algorithms have similar accuracy on your data, which is most likely to be an ideal choice for the job?

Options:

A.

Deep learning neural network

B.

Random forest

C.

Ridge regression

D.

Support-vector machine

Question 20

In general, models that perform their tasks:

Options:

A.

Less accurately are less robust against adversarial attacks.

B.

Less accurately are neither more nor less robust against adversarial attacks.

C.

More accurately are less robust against adversarial attacks.

D.

More accurately are neither more nor less robust against adversarial attacks.

Question 21

Which of the following approaches is best if a limited portion of your training data is labeled?

Options:

A.

Dimensionality reduction

B.

Probabilistic clustering

C.

Reinforcement learning

D.

Semi-supervised learning

Question 22

Which of the following sentences is true about model evaluation and model validation in ML pipelines?

Options:

A.

Model evaluation and validation are the same.

B.

Model evaluation is defined as an external component.

C.

Model validation is defined as a set of tasks to confirm the model performs as expected.

D.

Model validation occurs before model evaluation.

Question 23

When working with textual data and trying to classify text into different languages, which approach to representing features makes the most sense?

Options:

A.

Bag of words model with TF-IDF

B.

Bag of bigrams (two-letter pairs)

C.

Word2Vec algorithm

D.

Clustering similar words and representing words by group membership
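
A bag of character bigrams, as named in option B, can be built with the standard library. The sketch below is a minimal illustration that lowercases the text and skips pairs spanning a word boundary; `char_bigrams` is a hypothetical helper name, and the tokenization choices are assumptions.

```python
from collections import Counter


def char_bigrams(text):
    """Count adjacent letter pairs within words, ignoring case and punctuation."""
    cleaned = "".join(c for c in text.lower() if c.isalpha() or c == " ")
    pairs = (cleaned[i:i + 2] for i in range(len(cleaned) - 1))
    return Counter(p for p in pairs if " " not in p)


english = char_bigrams("the thing that they think")
spanish = char_bigrams("que quiere el que llega")
print(english.most_common(1))  # [('th', 5)]
print(spanish.most_common(1))  # [('qu', 3)]
```

Letter-pair frequencies differ sharply between languages and do not depend on a shared vocabulary, which is why they are a common feature choice for language identification.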

Question 24

Which of the following scenarios is an example of entanglement in ML pipelines?

Options:

A.

Add a new method for drift detection in the model evaluation step.

B.

Add a new pipeline for retraining the model in the model training step.

C.

Change in normalization function in the feature engineering step.

D.

Change the way output is visualized in the monitoring step.

Question 25

Which of the following text vectorization methods is appropriate and correctly defined for an English-to-Spanish translation machine?

Options:

A.

Using TF-IDF because in translation machines, we do not care about the order of the words.

B.

Using TF-IDF because in translation machines, we need to consider the order of the words.

C.

Using Word2vec because in translation machines, we do not care about the order of the words.

D.

Using Word2vec because in translation machines, we need to consider the order of the words.
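
The role of word order in these options can be checked with a small experiment. TF-IDF reweights bag-of-words counts, so two sentences with the same words in a different order receive the same representation. The sketch below demonstrates this with plain word counts; `bag_of_words` is a hypothetical helper and whitespace tokenization is an assumption.

```python
from collections import Counter


def bag_of_words(sentence):
    """Word counts only: the position of each word is discarded."""
    return Counter(sentence.lower().split())


first = "the dog bites the man"
second = "the man bites the dog"

same_bag = bag_of_words(first) == bag_of_words(second)
same_sequence = first.split() == second.split()
print(same_bag, same_sequence)  # True False
```

The two sentences mean different things but are indistinguishable as bags of words, whereas a sequence of per-word vectors keeps the order available to the translation model.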

Question 26

Which two of the following statements about the beta value in an A/B test are accurate? (Select two.)

Options:

A.

The Beta value is the rate of type II errors for the test.

B.

The Beta value is the rate of type I errors for the test.

C.

The statistical power of a test is the complement of the Beta value, or 1 - Beta.

D.

The Beta in an Alpha/Beta test represents one of the two variants of the A/B test.
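
The relationship between beta and statistical power can be worked through numerically. The sketch below assumes a one-sided z-test with a known standard error; the effect size and standard error are made-up values, and `beta_and_power` is a hypothetical helper name.

```python
from statistics import NormalDist


def beta_and_power(effect, std_error, alpha=0.05):
    """One-sided z-test: beta = P(fail to reject H0 | effect is real); power = 1 - beta."""
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha)            # rejection threshold under H0
    beta = z.cdf(critical - effect / std_error)  # type II error rate
    return beta, 1.0 - beta


# Illustrative values: a 2-point lift measured with a standard error of 0.8 points.
beta, power = beta_and_power(effect=0.02, std_error=0.008)
print(f"beta={beta:.3f} power={power:.3f}")
```

Shrinking the standard error (for example by collecting more samples) lowers beta and therefore raises power at the same alpha.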

Question 27

In addition to understanding model performance, what does continuous monitoring of bias and variance help ML engineers to do?

Options:

A.

Detect hidden attacks

B.

Prevent hidden attacks

C.

Recover from hidden attacks

D.

Respond to hidden attacks
