Question #1
Reading: Reading 4 Big Data Projects
Question
Which of the following uses of data is most accurately described as curation?
Answer Choices:
A. An investor creates a word cloud from financial analysts’ recent research reports about a company
B. An analyst adjusts daily stock index data from two countries for their different market holidays
C. A data technician accesses an offsite archive to retrieve data that has been stored there.
Explanation
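Curation is ensuring the quality of data, for example by adjusting for bad or missing data.
Adjusting two countries' daily index data for their different market holidays (choice B) is
therefore curation. Word clouds (choice A) are a visualization technique. Moving data from a
storage medium to where they are needed (choice C) is referred to as transfer.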
(Module 4.1, LOS 4.a)
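A minimal sketch of the curation step in choice B, assuming each index is stored as a mapping from trading date to closing level, so that a date missing from one series is a holiday in that market. The series values and the function names align_on_common_dates and align_with_carry_forward are hypothetical, and these are only two common ways to handle the gap; the reading does not prescribe either.

```python
from datetime import date

# Hypothetical closing levels; a date absent from a series is a market holiday there.
index_a = {date(2024, 1, 2): 100.0, date(2024, 1, 3): 101.0, date(2024, 1, 4): 102.5}
index_b = {date(2024, 1, 2): 50.0, date(2024, 1, 4): 51.0}


def align_on_common_dates(a, b):
    """Keep only the dates on which both markets traded."""
    common = sorted(set(a) & set(b))
    return [(d, a[d], b[d]) for d in common]


def align_with_carry_forward(a, b):
    """Use every date either market traded; on a holiday, reuse that market's last close."""
    rows = []
    last_a = last_b = None
    for d in sorted(set(a) | set(b)):
        last_a = a.get(d, last_a)  # new close if the market traded, else the previous one
        last_b = b.get(d, last_b)
        if last_a is not None and last_b is not None:
            rows.append((d, last_a, last_b))
    return rows


for row in align_with_carry_forward(index_a, index_b):
    print(row)
```

Dropping non-common dates keeps only observed prices but loses observations; carrying the last close forward keeps every trading day but treats the holiday as a zero-return day for that market.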
Freja Karlsson is a bond analyst with Storbank AB. Over the past several months, Karlsson
has been developing her own machine learning (ML) model, which she plans to use to predict
defaults on the bonds she covers. The inputs to the model are various pieces of financial
data that Karlsson has compiled from multiple sources.
After constructing the model using her knowledge of appropriate variables, Karlsson runs it
on the training set. Each firm's bonds are classified as predicted-to-default or
predicted-not-to-default. When the model predicts that a bond will default and the bond
actually defaults, Karlsson considers this a true positive. Karlsson then evaluates the
performance of her model using error analysis. The resulting confusion matrix is shown in
Exhibit 1.
Exhibit 1: Confusion Matrix (N = 474)

                                 Actual Bond Status
Model Prediction          Bond Default     No Default
Bond Default                   307              31
No Default                      23             113
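As a sketch of the error analysis, the usual metrics can be computed from Exhibit 1, taking "bond defaults" as the positive class as Karlsson does. The vignette does not state which metrics she uses; precision, recall, accuracy and F1 score are assumed here as the standard set, and the function name error_metrics is hypothetical.

```python
# Exhibit 1 cell counts, with "bond defaults" as the positive class.
TP = 307  # predicted default, bond defaulted (true positive)
FP = 31   # predicted default, bond did not default (false positive, Type I error)
FN = 23   # predicted no default, bond defaulted (false negative, Type II error)
TN = 113  # predicted no default, bond did not default (true negative)


def error_metrics(tp, fp, fn, tn):
    """Return the standard confusion-matrix metrics as a dict."""
    precision = tp / (tp + fp)                    # share of predicted defaults that defaulted
    recall = tp / (tp + fn)                       # share of actual defaults the model caught
    accuracy = (tp + tn) / (tp + fp + fn + tn)    # share of all bonds classified correctly
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


metrics = error_metrics(TP, FP, FN, TN)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.4f}")
```

Rounded to four decimals this gives precision 307/338 ≈ 0.9083, recall 307/330 ≈ 0.9303, accuracy 420/474 ≈ 0.8861 and F1 ≈ 0.9192.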