Data Science Assessment

You will have 90 minutes to complete the assessment.
You may only take the assessment once.
The assessment questions are multiple choice. Please pick the best answer.

Applicant Data



Section 1: Stats and Linear Algebra

Question 1

In an experiment, the depth of a lake was measured every day for approximately two years, and the historical average depth was subtracted from each measurement. A measurement of zero therefore corresponds to the historical average depth. Figure 1 is a histogram of the measurements. During the experiment, for approximately how many days was the lake above the historical average depth, but by no more than 1 inch?

Figure 1


Question 2

The cumulative distribution function for the lake depth experiment of Question 1 is shown in Figure 2. During the experiment, what was the probability that the lake was deeper than the historical average when measured?

Figure 2


Question 3

For a weighted die, the probability of each outcome is listed below. Which of the following correctly separates the results into quantiles?


Question 4

Question 5

What is the mean of the set {80, 125, 140, 85}?

Question 6

What is the standard deviation of the set {1, 2, 3, 4, 5}?
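
The question does not say whether the population or the sample convention is intended, and the two give different values. A minimal Python sketch, assuming only the standard-library `statistics` module, computes both for comparison:

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

# Population standard deviation: squared deviations are divided by N.
population_sd = statistics.pstdev(data)

# Sample standard deviation: squared deviations are divided by N - 1.
sample_sd = statistics.stdev(data)

print(f"population: {population_sd:.4f}")  # equals sqrt(2)
print(f"sample:     {sample_sd:.4f}")      # equals sqrt(2.5)
```

The mean is 3 and the squared deviations sum to 10, so the variance is 10/5 under the population convention and 10/4 under the sample convention.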

Question 7

If a fair die is rolled six times, which result is more likely?

 a) 4, 4, 4, 4, 4, 4

 b) 3, 1, 1, 4, 2, 5
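
A short sketch of the arithmetic behind this question, assuming that "result" means the exact ordered sequence of six rolls and that the rolls are independent. The helper name `sequence_probability` is hypothetical:

```python
from fractions import Fraction

def sequence_probability(rolls, sides=6):
    """Probability of one exact ordered sequence of rolls of a fair die."""
    # Each roll matches its required face with probability 1/sides,
    # and independent rolls multiply.
    return Fraction(1, sides) ** len(rolls)

p_a = sequence_probability([4, 4, 4, 4, 4, 4])
p_b = sequence_probability([3, 1, 1, 4, 2, 5])
print(p_a, p_b)
```

Under a different reading, for example "the same multiset of faces in any order", the calculation would change.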


Question 8

If a fair coin is flipped three times, what is the probability of observing the sequence: heads, heads, tails?
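
One way to check a probability like this is to enumerate every equally likely outcome. The sketch below assumes a fair coin and independent flips, and counts how many of the possible three-flip sequences match the target:

```python
from fractions import Fraction
from itertools import product

# All ordered sequences of three flips, each equally likely for a fair coin.
outcomes = list(product("HT", repeat=3))
target = ("H", "H", "T")

p = Fraction(outcomes.count(target), len(outcomes))
print(len(outcomes), p)
```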

Question 9

Which of the following is the least likely observation?


Question 10


Section 2: Programming

Question 11

Four hexadecimal digits represent the range from ‘0000’ to ‘FFFF’. How many bits are necessary to represent eight hexadecimal digits?
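
The relationship between hexadecimal digits and bits can be sketched as follows. Each hexadecimal digit takes one of 16 = 2^4 values, so it carries 4 bits. The function name `bits_for_hex_digits` is hypothetical:

```python
BITS_PER_HEX_DIGIT = 4  # one hex digit has 16 == 2**4 possible values

def bits_for_hex_digits(n_digits):
    """Number of bits needed to represent n_digits hexadecimal digits."""
    return n_digits * BITS_PER_HEX_DIGIT

# Cross-check against Python's own integer representation.
largest_four_digit = int("FFFF", 16)
print(largest_four_digit.bit_length(), bits_for_hex_digits(4))
print(int("F" * 8, 16).bit_length(), bits_for_hex_digits(8))
```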

Question 12

The series 1+2+3+4+5+6+…+998+999+1000 is equivalent to:
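
As a quick check, the series can be summed directly and compared with the closed form n(n + 1)/2 for the sum of the first n positive integers:

```python
n = 1000

brute_force = sum(range(1, n + 1))   # add the terms one by one
closed_form = n * (n + 1) // 2       # arithmetic-series formula

print(brute_force, closed_form)
```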

Question 13

Different sorting algorithms can have very different average time complexities when sorting integers. Using Big O notation, what is the expected time complexity of, respectively, [1] Bubble Sort, [2] Quicksort, [3] Merge Sort, and [4] Bucket Sort (aka Counting Sort or Distribution Sort)?



Question 14

Dijkstra’s Algorithm is typically used for:

Question 15

Given a sorted array of N numbers, which assertion is true about binary search?
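
For reference, a minimal sketch of an iterative binary search that also counts how many elements it probes. The counter is added only for illustration and is not part of the usual algorithm:

```python
def binary_search(sorted_values, target):
    """Return (index, probes); index is -1 when target is absent."""
    lo, hi = 0, len(sorted_values) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # one element inspected per loop iteration
        if sorted_values[mid] == target:
            return mid, probes
        if sorted_values[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1, probes

values = list(range(1_000_000))
print(binary_search(values, 999_999))
print(binary_search(values, -1))
```

Because each probe halves the remaining range, the number of probes grows with log2(N) rather than with N.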



Question 16

Dictionary lookups are often implemented using a Hash Table. What can be said about Hash Tables storing N key-value pairs?

Question 17

Divide and Conquer algorithms break down a problem recursively into a larger number of smaller problems that can be solved in a very simple way. Which of these algorithms are based on divide and conquer?



Question 18

An “in-order” traversal of a tree data structure can be accomplished by this sequence of operations:
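
For reference, a minimal sketch of the conventional in-order traversal, assuming a binary tree. The `Node` class and the `visit` callback are hypothetical names introduced for illustration:

```python
class Node:
    """Binary tree node; left and right are child nodes or None."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def in_order(node, visit):
    """Traverse the left subtree, then the node itself, then the right subtree."""
    if node is None:
        return
    in_order(node.left, visit)
    visit(node.value)
    in_order(node.right, visit)

# A small binary search tree:   2
#                              / \
#                             1   3
root = Node(2, Node(1), Node(3))
visited = []
in_order(root, visited.append)
print(visited)
```

On a binary search tree, this order visits the values in ascending order.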



Question 19

A Heap is a tree-based data structure frequently used:

Question 20

What does the function Zombie(N) below do?

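int Zombie(int N) {
    // Base case: stop recursing once N drops below 1.
    if (N < 1)
        return 1;

    // Recursive case: multiply N by the result for N - 1.
    return N * Zombie(N - 1);
}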
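
Tracing a few calls by hand or in an interpreter is a practical way to work out what a recursive function computes. The sketch below is a direct Python port of the function above, written only for experimentation:

```python
def zombie(n):
    # Mirrors the original: any n below 1 returns 1.
    if n < 1:
        return 1
    # Otherwise multiply n by the result for n - 1.
    return n * zombie(n - 1)

for n in range(6):
    print(n, zombie(n))
```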



Section 3: Databases and SQL

Question 21

A row in a relational database always contains:

Question 22

In a SQL database a BLOB is used:

Question 23

MiniCorp wants to track how their stores are doing by region. In order to do so, they added two new tables to their SQL database. The first table is called REGIONS and stores the name and ID of each region. The second is called STOREREGIONS and indicates which stores are in which regions.

You are beginning to explore these tables and you run the following queries with the results shown:

SELECT, count(s.store_id) FROM storeregions s, regions r WHERE s.region_id = r.regionid GROUP BY s.region_id;

 North      156
 South      591
 East       302
 Southwest  208
 Northwest  396

SELECT COUNT(DISTINCT store_id) FROM storeregions;


What can we conclude?
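
The five per-region counts shown above sum to 1,653, and the comparison that matters is between that total and the distinct store count. The sketch below reproduces the two query shapes with Python's built-in `sqlite3` module on a tiny, entirely hypothetical data set. The `name` column and all row values are assumptions, not MiniCorp data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE regions (regionid INTEGER, name TEXT);
    CREATE TABLE storeregions (store_id INTEGER, region_id INTEGER);
    INSERT INTO regions VALUES (1, 'North'), (2, 'South');
    -- Hypothetical data: store 102 is assigned to both regions.
    INSERT INTO storeregions VALUES (101, 1), (102, 1), (102, 2), (103, 2);
""")

# Store assignments counted per region.
per_region = conn.execute("""
    SELECT r.name, COUNT(s.store_id)
    FROM storeregions s JOIN regions r ON s.region_id = r.regionid
    GROUP BY s.region_id
    ORDER BY r.name
""").fetchall()

# Stores counted once each, however many regions they appear in.
distinct_stores = conn.execute(
    "SELECT COUNT(DISTINCT store_id) FROM storeregions"
).fetchone()[0]

total_assignments = sum(count for _, count in per_region)
print(per_region)
print(total_assignments, distinct_stores)
```

In this toy data the per-region total exceeds the distinct count because one store belongs to two regions. Whether the same holds for the real tables depends on the distinct count returned by the second query.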


Question 24

A full table scan:

Question 25

Compared to an inner join, an outer join:


Question 26

Which of the following is not true of SQL databases?

Question 27

In a SQL database table, NULL values:

Question 28

In SQL, a WHERE clause:

Question 29

Consider the query:

SELECT price FROM products WHERE product_id = 2387436;

We run this query on two SQL databases, one from vendor A and one from vendor B. Both systems typically execute the query in 20 milliseconds on a database of ten thousand products. If we increase the number of products in the database to ten million, the query takes 10 seconds on the system from vendor A and 25 milliseconds on the system from vendor B.

Which of the following is our best next step?
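
One plausible reading of these timings, offered as an assumption rather than something the question confirms, is that vendor A's system scans the whole table while vendor B's system uses an index on `product_id`. A database's query plan can help confirm or rule this out. The sketch below uses Python's built-in `sqlite3` module with a hypothetical schema and index name to show how the plan for the same query changes once an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: no primary key or index on product_id to begin with.
conn.execute("CREATE TABLE products (product_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    ((i, i * 0.01) for i in range(10_000)),
)

QUERY = "SELECT price FROM products WHERE product_id = 2387436"

def query_plan(connection):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return [row[-1] for row in rows]

plan_without_index = query_plan(conn)
conn.execute("CREATE INDEX idx_products_product_id ON products (product_id)")
plan_with_index = query_plan(conn)

print(plan_without_index)  # describes a scan of the table
print(plan_with_index)     # describes a search using the index
```

Other database systems expose similar information through their own `EXPLAIN` variants, and the exact wording of the plan differs by vendor and version.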


Question 30

Sharding is a technique for: