Which library is NOT part of the Apache Spark distribution?
Answer : B
You are analyzing written transcripts of focus groups conducted on product X. Your approach is to use TF-IDF for your analysis.
What combination of TF-IDF scores should you examine to ensure you only report on the most important terms?
Answer : C
Why would a company decide to use HBase to replace an existing relational database?
Answer : A
What is an intended application of the MapReduce framework?
Answer : A
You conduct a TF-IDF analysis on 3 documents containing raw text and derive TF-IDF("data", document y) = 1.908. You know that the term "data" only appears in document 2.
What is the TF of "data" in document 2?
Answer : B
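The arithmetic behind this question can be sketched as follows. This is a minimal sketch that assumes the common definition TF-IDF = TF × IDF with IDF = log10(N / df); the question does not state the logarithm base, so base 10 is an assumption, and the variable names are illustrative only.

```python
import math

# Values taken from the question.
tfidf = 1.908        # TF-IDF score of "data" in the document
n_docs = 3           # total number of documents
docs_with_term = 1   # "data" appears in only one document

# Assumption: IDF uses a base-10 logarithm, IDF = log10(N / df).
idf = math.log10(n_docs / docs_with_term)

# Since TF-IDF = TF * IDF, solve for TF.
tf = tfidf / idf

print(round(tf, 2))  # 4.0
```

With a natural logarithm instead, the same division gives roughly 1.74, so the whole-number result under base 10 suggests that is the convention the question intends.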
Which graph structure would best model the relationship between job seekers and employers?
Answer : A
A marketing team creates a graph using a square for each data point, where the length of each side is set to the data value. The data values are 10 and 20.
What is the lie factor of the graph?
Answer : B
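One way to work this out is sketched below. It assumes Tufte's definition of the lie factor (relative change shown in the graphic divided by relative change in the data) and assumes that viewers read the area of each square as its size. The function name `lie_factor` and its parameters are hypothetical and chosen only for illustration.

```python
def lie_factor(data_low, data_high, shown_low, shown_high):
    """Ratio of the relative change shown in the graphic
    to the relative change present in the data."""
    data_effect = (data_high - data_low) / data_low
    shown_effect = (shown_high - shown_low) / shown_low
    return shown_effect / data_effect

# Data values from the question.
low, high = 10, 20

# Side length equals the data value, so the visible area is the value squared.
area_low, area_high = low ** 2, high ** 2   # 100 and 400

result = lie_factor(low, high, area_low, area_high)
print(result)  # 3.0
```

Under these assumptions the data doubles (a 100% increase) while the drawn area quadruples (a 300% increase), which gives a lie factor of 3. A definition based on the plain ratio of ratios (4 / 2) would give 2 instead, so the result depends on which definition the exam uses.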