
CompTIA DY0-001 Dumps

Page: 1 / 9
Total 85 questions

CompTIA DataX Exam Questions and Answers

Question 1

A team is building a spam detection system. The team wants a probability-based identification method without complex, in-depth training from the historical data set. Which of the following methods would best serve this purpose?

Options:

A.

Logistic regression

B.

Random forest

C.

Naive Bayes

D.

Linear regression
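Naive Bayes scores a message by combining a class prior with per-word likelihoods counted from labelled examples, so "training" amounts to counting words. A minimal sketch, assuming a tiny hypothetical corpus and a multinomial model with Laplace (add-one) smoothing. The function names train_nb and predict_nb are illustrative, not from any library.

```python
import math
from collections import Counter

def train_nb(docs):
    """Count words per class; docs is a list of (text, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace smoothing so an unseen word does not zero out the product
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data set
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team on monday", "ham"),
]
model = train_nb(training)
print(predict_nb(model, "free prize money"))     # spam
print(predict_nb(model, "team meeting monday"))  # ham
```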

Question 2

A data scientist built several models that perform about the same but vary in the number of features. Which of the following models should the data scientist recommend for production according to Occam's razor?

Options:

A.

The model with the fewest features and highest performance

B.

The model with the fewest features and the lowest performance

C.

The model with the most features and the lowest performance

D.

The model with the most features and the highest performance

Question 3

Which of the following layer sets includes the minimum three layers required to constitute an artificial neural network?

Options:

A.

An input layer, a pooling layer, and an output layer

B.

An input layer, a convolutional layer, and a hidden layer

C.

An input layer, a hidden layer, and an output layer

D.

An input layer, a dropout layer, and a hidden layer

Question 4

Which of the following types of machine learning is a GPU most commonly used for?

Options:

A.

Deep learning/neural networks

B.

Clustering

C.

Natural language processing

D.

Tree-based

Question 5

Under perfect conditions, E. coli bacteria would cover the entire earth in a matter of days. Which of the following types of models is the best for explaining this type of growth?

Options:

A.

Linear

B.

Logarithmic

C.

Polynomial

D.

Exponential
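Unconstrained bacterial growth doubles at a fixed interval, so the population follows N(t) = N0 * 2^(t/d), an exponential curve. A minimal sketch, assuming a doubling time of 20 minutes (an often-quoted figure for E. coli under ideal lab conditions) and a single starting cell; both values are illustrative.

```python
def population(n0, doubling_minutes, elapsed_minutes):
    """Exponential growth: the count doubles every doubling_minutes."""
    return n0 * 2 ** (elapsed_minutes / doubling_minutes)

# Assumed values for illustration only
for hours in (1, 2, 4, 8):
    print(hours, "h:", int(population(1, 20, hours * 60)))
# 1 h: 8, 2 h: 64, 4 h: 4096, 8 h: 16777216
```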

Question 6

A computer vision model is trained to identify cats on a training set that is composed of both cat and dog images. The model predicts a picture of a cat is a dog. Which of the following describes this error?

Options:

A.

Error due to reality

B.

False positive error

C.

Sampling error

D.

Type II error
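If "cat" is treated as the positive class the model is looking for, a cat predicted as a dog is a missed positive: a false negative, which is a Type II error. A minimal sketch assuming that labelling convention for a two-class problem; the function name error_type is hypothetical.

```python
def error_type(actual, predicted, positive="cat"):
    """Classify a single two-class prediction relative to the positive class."""
    if actual == predicted:
        return "true positive" if actual == positive else "true negative"
    if predicted == positive:
        return "false positive (Type I)"
    return "false negative (Type II)"

print(error_type(actual="cat", predicted="dog"))  # false negative (Type II)
print(error_type(actual="dog", predicted="cat"))  # false positive (Type I)
```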

Question 7

In a modeling project, people evaluate phrases and provide reactions as the target variable for the model. Which of the following best describes what this model is doing?

Options:

A.

Sentiment analysis

B.

Named-entity recognition

C.

TF-IDF vectorization

D.

Part-of-speech tagging

Question 8

A data analyst wants to find the latitude and longitude of a mailing address. Which of the following is the best method to use?

Options:

A.

One-hot encoding

B.

Binning

C.

Geocoding

D.

Imputing

Question 9

Which of the following explains back propagation?

Options:

A.

The passage of convolutions backward through a neural network to update weights and biases

B.

The passage of accuracy backward through a neural network to update weights and biases

C.

The passage of nodes backward through a neural network to update weights and biases

D.

The passage of errors backward through a neural network to update weights and biases
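Backpropagation runs the network forward, measures the error at the output, and then uses the chain rule to pass that error backward so each weight and bias receives a gradient. A minimal sketch with one sigmoid neuron and a squared-error loss; the input, target, and learning rate are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.5, 1.0        # one training example (illustrative)
w, b, lr = 0.2, 0.0, 0.5    # initial weight, bias, learning rate

losses = []
for _ in range(50):
    # Forward pass
    y = sigmoid(w * x + b)
    error = y - target
    losses.append(0.5 * error ** 2)
    # Backward pass: the chain rule carries the output error back to w and b
    dloss_dy = error
    dy_dz = y * (1.0 - y)
    grad_w = dloss_dy * dy_dz * x
    grad_b = dloss_dy * dy_dz
    # Gradient-descent update
    w -= lr * grad_w
    b -= lr * grad_b

print(f"loss went from {losses[0]:.4f} to {losses[-1]:.4f}")
```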

Question 10

Which of the following image data augmentation techniques allows a data scientist to increase the size of a data set?

Options:

A.

Clipping

B.

Cropping

C.

Masking

D.

Scaling

Question 11

A data scientist is building an inferential model with a single predictor variable. A scatter plot of the independent variable against the real-number dependent variable shows a strong relationship between them. The predictor variable is normally distributed with very few outliers. Which of the following algorithms is the best fit for this model, given the data scientist wants the model to be easily interpreted?

Options:

A.

A logistic regression

B.

An exponential regression

C.

A linear regression

D.

A probit regression
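With one predictor and a continuous target, simple linear regression has a closed-form least-squares fit, and the slope reads directly as "change in y per one-unit change in x", which is what makes it easy to interpret. A minimal sketch on hypothetical data:

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying close to y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = fit_simple_ols(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")  # y = 1.09 + 1.97 * x
```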

Question 12

A data scientist is attempting to identify sentences that are conceptually similar to each other within a set of text files. Which of the following is the best way to prepare the data set to accomplish this task after data ingestion?

Options:

A.

Embeddings

B.

Extrapolation

C.

Sampling

D.

One-hot encoding
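Embeddings map each sentence to a dense vector so that conceptually similar sentences land near each other; similarity can then be measured with cosine similarity. A minimal sketch of the comparison step only. The three-dimensional vectors below are made-up stand-ins for the output of a real embedding model, which would typically produce hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings (a real model would produce these vectors)
embeddings = {
    "The cat sat on the mat.": [0.9, 0.1, 0.0],
    "A kitten rested on the rug.": [0.85, 0.15, 0.05],
    "Quarterly revenue rose 4%.": [0.0, 0.2, 0.95],
}

query = "The cat sat on the mat."
for sentence, vec in embeddings.items():
    if sentence != query:
        score = cosine_similarity(embeddings[query], vec)
        print(f"{score:.3f}  {sentence}")
```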

Question 13

A data scientist would like to model a complex phenomenon using a large data set composed of categorical, discrete, and continuous variables. After completing exploratory data analysis, the data scientist is reasonably certain that no linear relationship exists between the predictors and the target. Although the phenomenon is complex, the data scientist still wants to maintain the highest possible degree of interpretability in the final model. Which of the following algorithms best meets this objective?

Options:

A.

Artificial neural network

B.

Decision tree

C.

Multiple linear regression

D.

Random forest

Question 14

Which of the following best describes the minimization of the residual term in a LASSO linear regression?

Options:

A.

|e|

B.

e

C.

0

D.

Question 15

The term "greedy algorithms" refers to machine-learning algorithms that:

Options:

A.

update priors as more data is seen.

B.

examine every node of a tree before making a decision.

C.

apply a theoretical model to the distribution of the data.

D.

make the locally optimal decision.
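A greedy algorithm commits to whichever choice looks best at each step and never revisits it. Decision-tree induction is a common machine-learning example (each split is chosen to be best at that node only), but the idea is easiest to see in coin change. A minimal sketch with a hypothetical coin system in which the locally optimal choice is not globally optimal:

```python
def greedy_change(amount, coins):
    """Repeatedly take the largest coin that fits (the locally optimal step)."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used

print(greedy_change(6, [1, 3, 4]))  # [4, 1, 1]: three coins, though [3, 3] needs only two
```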

Question 16

A data scientist is designing a real-time machine-learning model that classifies a user based on initial behavior. The run times of these models are provided in the following table:


Which of the following models should the data scientist recommend for deployment?

Options:

A.

XGBoost

B.

Random forest

C.

Decision trees

D.

Artificial neural network

Question 17

A movie production company would like to find the actors appearing in its top movies using data from the tables below. The resulting data must show all movies in Table 1, enriched with actors listed in Table 2.


Which of the following query operations achieves the desired data set?

Options:

A.

Perform an INNER JOIN between Table 1 using column Movie, and Table 2 using column Acted_In.

B.

Perform a UNION between Table 1 using column Movie, and Table 2 using column Acted_In.

C.

Perform an INTERSECT between Table 1 using column Movie, and Table 2 using column Acted_In.

D.

Perform a LEFT JOIN on Table 1 using column Movie, with Table 2 using column Acted_In.
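A LEFT JOIN keeps every row from the left table and attaches matching rows from the right table, leaving NULL where there is no match. The actual Table 1 and Table 2 are not reproduced on this page, so this sketch uses Python's built-in sqlite3 module with hypothetical tables top_movies(Movie) and actors(Actor, Acted_In) and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE top_movies (Movie TEXT)")
cur.execute("CREATE TABLE actors (Actor TEXT, Acted_In TEXT)")
cur.executemany("INSERT INTO top_movies VALUES (?)",
                [("Movie A",), ("Movie B",), ("Movie C",)])
cur.executemany("INSERT INTO actors VALUES (?, ?)",
                [("Actor 1", "Movie A"), ("Actor 2", "Movie A"),
                 ("Actor 3", "Movie B"), ("Actor 4", "Movie Z")])

# Every movie in top_movies survives; Movie C has no actor, so Actor is NULL (None)
rows = cur.execute("""
    SELECT t.Movie, a.Actor
    FROM top_movies AS t
    LEFT JOIN actors AS a ON t.Movie = a.Acted_In
    ORDER BY t.Movie, a.Actor
""").fetchall()
for row in rows:
    print(row)
conn.close()
```

An INNER JOIN on the same tables would drop Movie C, which is why it does not satisfy "all movies in Table 1".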

Question 18

A company created a very popular collectible card set. Collectors attempt to collect the entire set, but the availability of each card varies, because some cards have higher production volumes than others. The set contains a total of 12 cards. The attributes of the cards are shown.


The data scientist is tasked with designing an initial model iteration to predict whether the animal on the card lives in the sea or on land, given the card's features: Wrapper color, Wrapper shape, and Animal.

Which of the following is the best way to accomplish this task?

Options:

A.

ARIMA

B.

Linear regression

C.

Association rules

D.

Decision trees

Question 19

An analyst wants to show how the component pieces of a company's business units contribute to the company's overall revenue. Which of the following should the analyst use to best demonstrate this breakdown?

Options:

A.

Box-and-whisker chart

B.

Sankey diagram

C.

Scatter plot matrix

D.

Residual chart

Question 20

A data scientist is analyzing a data set with categorical features and would like to make those features more useful when building a model. Which of the following data transformation techniques should the data scientist use? (Choose two.)

Options:

A.

Normalization

B.

One-hot encoding

C.

Linearization

D.

Label encoding

E.

Scaling

F.

Pivoting
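Label encoding replaces each category with an integer code, while one-hot encoding creates one binary column per category so that no artificial order is implied. A minimal pure-Python sketch of both on a hypothetical "color" feature; libraries such as scikit-learn (OneHotEncoder, LabelEncoder) and pandas (get_dummies) provide production versions.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted for stability)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

def one_hot_encode(values):
    """Create one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values], categories

colors = ["red", "green", "blue", "green"]
labels, mapping = label_encode(colors)
print(mapping)   # {'blue': 0, 'green': 1, 'red': 2}
print(labels)    # [2, 1, 0, 1]
onehot, columns = one_hot_encode(colors)
print(columns)   # ['blue', 'green', 'red']
print(onehot)    # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```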

Question 21

Given a logistics problem with multiple constraints (fuel, capacity, speed), which of the following is the most likely optimization technique a data scientist would apply?

Options:

A.

Constrained

B.

Unconstrained

C.

Non-iterative

D.

Iterative

Question 22

Which of the following environmental changes is most likely to resolve a memory constraint error when running a complex model using distributed computing?

Options:

A.

Converting an on-premises deployment to a containerized deployment

B.

Migrating to a cloud deployment

C.

Moving model processing to an edge deployment

D.

Adding nodes to a cluster deployment

Question 23

A data analyst wants to generate the most data using tables from a database. Which of the following is the best way to accomplish this objective?

Options:

A.

INNER JOIN

B.

LEFT OUTER JOIN

C.

RIGHT OUTER JOIN

D.

FULL OUTER JOIN
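A FULL OUTER JOIN returns every matched row plus the unmatched rows from both sides, so on the same keys it never returns fewer rows than the INNER, LEFT, or RIGHT variants. A minimal pure-Python sketch comparing row counts for two hypothetical key lists, assuming one row per key on each side for simplicity:

```python
def join_counts(left_keys, right_keys):
    """Row counts for each join type, assuming keys are unique on each side."""
    left, right = set(left_keys), set(right_keys)
    matched = left & right
    return {
        "INNER": len(matched),
        "LEFT OUTER": len(left),          # all left rows, matched or not
        "RIGHT OUTER": len(right),        # all right rows, matched or not
        "FULL OUTER": len(left | right),  # matched plus unmatched from both sides
    }

print(join_counts(["a", "b", "c"], ["b", "c", "d", "e"]))
# {'INNER': 2, 'LEFT OUTER': 3, 'RIGHT OUTER': 4, 'FULL OUTER': 5}
```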

Question 24

A data analyst is examining the correlation matrix of a new data set to identify issues that could adversely impact model performance. Which of the following is the analyst most likely checking for?

Options:

A.

Undersampling

B.

Multicollinearity

C.

Oversampling

D.

Overfitting
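Multicollinearity shows up in a correlation matrix as a large absolute correlation between two predictors. A minimal sketch that computes pairwise Pearson correlations by hand and flags pairs above a threshold; the data and the 0.9 cutoff are illustrative assumptions, not fixed rules.

```python
import math
from itertools import combinations

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical predictors: sqft and rooms move together, age does not
features = {
    "sqft": [800, 1200, 1500, 2000, 2400],
    "rooms": [2, 3, 4, 5, 6],
    "age": [30, 5, 22, 12, 18],
}

THRESHOLD = 0.9
flagged = []
for (name_a, a), (name_b, b) in combinations(features.items(), 2):
    r = pearson(a, b)
    if abs(r) > THRESHOLD:
        flagged.append((name_a, name_b))
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
print("possible multicollinearity:", flagged)
```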

Question 25

A data scientist is using the following confusion matrix to assess model performance:

                      Actually Fails   Actually Succeeds
Predicted to Fail     80%              20%
Predicted to Succeed  15%              85%


The model is predicting whether a delivery truck will be able to make 200 scheduled delivery stops.

Every time the model is correct, the company saves 1 hour in planning and scheduling.

Every time the model is wrong, the company loses 4 hours of delivery time.

Which of the following is the net model impact for the company?

Options:

A.

25 hours lost

B.

25 hours saved

C.

165 hours lost

D.

165 hours saved
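One way to reach a net figure is to read the four cells as counts out of 200 predictions (80 + 20 + 15 + 85 = 200), so that the diagonal holds the correct predictions and the off-diagonal cells hold the errors. A minimal sketch of that arithmetic under this assumption:

```python
# Assumption: the matrix cells are counts that sum to 200 predictions
true_fail, false_fail = 80, 20        # predicted to fail: actually fails / actually succeeds
false_succeed, true_succeed = 15, 85  # predicted to succeed: actually fails / actually succeeds

correct = true_fail + true_succeed    # 165
wrong = false_fail + false_succeed    # 35
HOURS_SAVED_PER_CORRECT = 1
HOURS_LOST_PER_WRONG = 4

net_hours = correct * HOURS_SAVED_PER_CORRECT - wrong * HOURS_LOST_PER_WRONG
print(net_hours)  # 25, i.e. a net saving of 25 hours
```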
