# Data Science Assessment

You will have 90 minutes to complete the assessment.
You may only take the assessment once.

Section 1: Stats and Linear Algebra

## Question 1

In an experiment, the depth of a lake was measured every day for approximately two years, and the historical average depth was subtracted from each measurement. A measurement of zero therefore means the lake was at its historical average depth. Figure 1 is a histogram representing the collection of measurements. During the experiment, for approximately how many days was the lake at a depth above the historical average, but not more than 1 inch above the average?

- 0 to 100
- 100 to 200
- 200 to 300
- 300 to 400

## Question 2

The cumulative distribution function for the lake depth experiment of Question 1 is shown in Figure 2. During the experiment, what was the probability that the lake was deeper than the historical average when measured?

- less than 20%
- 20%-40%
- 40%-60%
- greater than 60%

## Question 3

For a weighted die, the probability of each outcome is located below. Which of the following correctly separates the results into quantiles?

- {1}{2,3}{4,5,6}
- {1,2}{3,4}{5,6}
- {1}{2}{3}{4,5}{6}
- {1}{2}{3,4,5,6}

## Question 4

- a
- b
- c
- d

## Question 5

What is the mean value of the following set: {80, 125, 140, 85}?

- 78
- 112.4
- 107.5
- 132

## Question 6

What is the standard deviation of the set {1, 2, 3, 4, 5}?

- 3
- √2
- √15
- 4
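
The result of a standard deviation calculation depends on which convention is used. The following is a minimal sketch using Python's standard `statistics` module; it assumes nothing about which convention the question intends and simply computes both the population form (divide by n) and the sample form (divide by n - 1) for the same set.

```python
import math
import statistics

values = [1, 2, 3, 4, 5]

# Population standard deviation: sqrt(sum((x - mean)^2) / n)
population_sd = statistics.pstdev(values)

# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))
sample_sd = statistics.stdev(values)

# Print each result next to the square root it should equal.
print(population_sd, math.sqrt(2))
print(sample_sd, math.sqrt(2.5))
```

The squared deviations from the mean of 3 sum to 10, so the two conventions differ only in whether that sum is divided by 5 or by 4 before taking the square root.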

## Question 7

If a fair die is rolled six times, which result is more likely?

a) 4, 4, 4, 4, 4, 4

b) 3, 1, 1, 4, 2, 5

- (a)
- (b)
- both are equally likely
- cannot be determined

## Question 8

If a fair coin is flipped three times, what is the probability of observing the sequence: heads, heads, tails?

- 1/3
- 1/8
- 3/8
- 1/4
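
One way to check a probability for a specific ordered sequence is to enumerate every equally likely outcome and count the matches. The sketch below does this for three flips of a fair coin; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely ordered outcomes of three fair coin flips.
outcomes = list(product("HT", repeat=3))

# The specific ordered sequence of interest: heads, heads, tails.
target = ("H", "H", "T")

# Exactly one outcome matches an ordered sequence.
probability = Fraction(outcomes.count(target), len(outcomes))
print(probability)
```

The same enumeration approach works for any short sequence of independent, equally likely events, although it grows exponentially with the number of flips.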

## Question 9

Which of the following is the least likely observation?

- Normal distribution, mean 0, standard deviation 1, observed value 1
- Normal distribution, mean 0, standard deviation 2, observed value 3
- Normal distribution, mean 1, standard deviation 1, observed value -1 (2 standard deviations from the mean)
- Normal distribution, mean 0, standard deviation 3, observed value 4
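
Observations drawn from different normal distributions can be compared by converting each to a z-score, that is, its distance from the mean measured in standard deviations. The sketch below assumes that "least likely" is meant in this sense (the observation farthest out in the tail) and simply computes |z| for each listed option.

```python
# (mean, standard deviation, observed value) for each option, in order.
options = [(0, 1, 1), (0, 2, 3), (1, 1, -1), (0, 3, 4)]

# z-score: distance from the mean in units of standard deviation.
z_scores = [abs(x - mu) / sigma for mu, sigma, x in options]

for (mu, sigma, x), z in zip(options, z_scores):
    print(f"mean={mu}, sd={sigma}, x={x}: |z| = {z:.2f}")
```

A larger |z| means the observation lies farther into the tail of its own distribution.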

## Question 10

- a
- b
- c
- d
- e

Section 2: Programming

## Question 11

Four hexadecimal digits represent the range from ‘0000’ to ‘FFFF’. How many bits are necessary to represent eight hexadecimal digits?

- 16
- 32
- 64
- 128
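
Each hexadecimal digit encodes one of 16 values, which is exactly 4 bits. The sketch below applies that rule and cross-checks it against the bit length of the largest value with a given number of digits; the helper name `bits_for_hex_digits` is hypothetical.

```python
BITS_PER_HEX_DIGIT = 4  # one hex digit covers 16 = 2**4 values


def bits_for_hex_digits(digits: int) -> int:
    """Number of bits needed to represent `digits` hexadecimal digits."""
    return digits * BITS_PER_HEX_DIGIT


# Cross-check against the largest value with that many digits.
largest_four = int("FFFF", 16)
largest_eight = int("FFFFFFFF", 16)

print(bits_for_hex_digits(4), largest_four.bit_length())
print(bits_for_hex_digits(8), largest_eight.bit_length())
```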

## Question 12

The series 1+2+3+4+5+6+…+998+999+1000 is equivalent to:

- (1+1000)*500
- 1000*(500+1)
- 1000*500/2
- none of the above
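
An arithmetic series like this one can be summed by pairing the first term with the last, the second with the second to last, and so on. The sketch below compares a brute-force sum with that pairing formula for n = 1000.

```python
n = 1000

# Brute-force sum of 1 + 2 + ... + n.
brute_force = sum(range(1, n + 1))

# Closed form: pair 1 with n, 2 with n - 1, ...; that gives n/2 pairs,
# each adding up to (n + 1).
closed_form = (1 + n) * n // 2

print(brute_force, closed_form)
```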

## Question 13

Different sorting algorithms may have very different average time complexities when sorting integers. Using Big O notation, what is the expected time complexity of, respectively, [1] Bubble Sort, [2] QuickSort, [3] Mergesort, [4] Bucket sort (aka Count sort or Distribution sort)?

- [1] O(n^2), [2] O(n log(n)), [3] O(log(n)), [4] O(n log(n))
- [1] O(n^2), [2] O(n log(n)), [3] O(n log(n)), [4] O(n)
- [1] O(n log(n)), [2] O(n log(n)), [3] O(log(n)), [4] O(n^2)
- none of the above

## Question 14

Dijkstra’s Algorithm is typically used for:

- finding the overlap between two arbitrary strings.
- finding the elements that intersect between two sets in linear time.
- finding the shortest path in a graph with non-negative edge weights.
- finding the most likely ancestor between two leaves in a binary tree.

## Question 15

Given a sorted array with N numbers, what assertion is true about Binary Search?

- Binary Search can find an arbitrary element in the array in O(N log(N)) time
- Binary Search can find an arbitrary element in the array in O(N) time
- Binary Search can find an arbitrary element in the array in O(log(N)) time
- Linear Search can find an arbitrary element in the array in O(N log(N)) time
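
Binary search halves the remaining search range on every comparison, which is where its running time comes from. The following is a minimal sketch (not a library implementation) that also counts how many elements it inspects; the function name and the test data are illustrative assumptions.

```python
def binary_search(sorted_values, target):
    """Return (index or -1, number of probes) for target in a sorted list."""
    low, high = 0, len(sorted_values) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_values[mid] == target:
            return mid, probes
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1, probes


data = list(range(0, 2_000_000, 2))  # 1,000,000 sorted even numbers
index, probes = binary_search(data, 1_999_998)
print(index, probes)
```

For one million elements the loop runs at most 20 times, since 2^20 is just over one million. In practice, Python's standard `bisect` module provides the same search on sorted lists.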

## Question 16

Dictionary lookups are often implemented using a Hash Table. What can be said about Hash Tables storing N key-value pairs?

- Hash Tables need O(N^2) storage to accomplish fast search times.
- Hash Tables cannot be implemented with Linked Lists.
- Hash Tables have O(1) average search time.
- Hash Tables keep values internally in sorted order.

## Question 17

Divide and Conquer algorithms break down a problem recursively into a larger number of smaller problems that can be solved in a very simple way. Which of these algorithms are based on divide and conquer?

- Discrete Fast Fourier Transform (FFT)
- QuickSort
- Binary search
- all of the above

## Question 18

A function to traverse a tree data structure “in-order” (aka in-order traversal) can be accomplished by this sequence of operations:

- (1) visit the root, (2) traverse the left subtree, (3) traverse the right subtree.
- (1) visit the root, (2) traverse the right subtree, (3) traverse the left subtree.
- (1) traverse the left subtree, (2) traverse the right subtree, (3) visit the root.
- (1) traverse the left subtree, (2) visit the root, (3) traverse the right subtree.
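
Tree traversal orders differ only in when the current node is visited relative to its subtrees. The sketch below builds a small binary search tree and compares two orderings; the `Node` class and function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order(node):
    """Left subtree, then the node itself, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def pre_order(node):
    """The node itself first, then the left and right subtrees."""
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


# A small binary search tree:   4
#                              / \
#                             2   6
#                            / \ / \
#                           1  3 5  7
tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(in_order(tree))
print(pre_order(tree))
```

On a binary search tree, the in-order sequence comes out sorted, which is a convenient way to recognise it.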

## Question 19

A Heap is a tree-based data structure frequently used:

- to find the top K largest elements of a set.
- to find the top K smallest elements in a set.
- to find the median value of the elements in a set.
- all of the above.
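
The uses listed above can each be sketched with Python's standard `heapq` module, which implements a binary min-heap on top of a list. The two-heap running median below is one common approach, shown here as an illustration under that assumption; the sample values and the `running_median` name are made up.

```python
import heapq

values = [15, 3, 42, 8, 23, 4, 16]

# Top-K selection in either direction.
largest_three = heapq.nlargest(3, values)
smallest_three = heapq.nsmallest(3, values)
print(largest_three, smallest_three)


def running_median(stream):
    """Median after each new element, using a max-heap for the lower half
    and a min-heap for the upper half. heapq only provides a min-heap, so
    the lower half stores negated values."""
    lower, upper, medians = [], [], []
    for x in stream:
        heapq.heappush(lower, -x)
        # Move the largest element of the lower half up, then rebalance
        # so that lower holds the same number of items as upper, or one more.
        heapq.heappush(upper, -heapq.heappop(lower))
        if len(upper) > len(lower):
            heapq.heappush(lower, -heapq.heappop(upper))
        if len(lower) > len(upper):
            medians.append(-lower[0])
        else:
            medians.append((-lower[0] + upper[0]) / 2)
    return medians


print(running_median(values))
```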

## Question 20

What does the function Zombie(N) below do?

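    int Zombie(int N) {
        /* Base case: stop the recursion once N drops below 1. */
        if (N < 1)
            return 1;
        /* Recursive case: multiply N by the result for N - 1. */
        else
            return N * Zombie(N - 1);
    }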

- calculates the square root of N
- calculates N factorial
- calculates 2 to the power of N (that is, 2^N)
- calculates the Nth Fibonacci series number
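
A practical way to work out what a short recursive function does is to port it and print its first few values. The sketch below is a direct Python port of `Zombie` (renamed `zombie` to follow Python naming), printed next to `math.factorial` for comparison.

```python
import math


def zombie(n: int) -> int:
    """Direct Python port of the recursive Zombie(N) function."""
    if n < 1:
        return 1
    return n * zombie(n - 1)


# Print n, zombie(n), and the factorial of n side by side.
for n in range(6):
    print(n, zombie(n), math.factorial(n))
```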

Section 3: Databases and SQL

## Question 21

A row in a relational database always contains:

- a map of arbitrary keys to arbitrary values.
- a set of values for columns whose names and types are defined by the database schema.
- an index of all data in the database.
- either a primary key or a foreign key.

## Question 22

In a SQL database a BLOB is used:

- to store binary data whose precise encoding and meaning is unknown to the database server.
- to move data to and from the disk.
- when the database grows too large to fit in memory.
- only as a hack, and should be avoided at all costs.

## Question 23

MiniCorp wants to track how their stores are doing by region. In order to do so, they added two new tables to their SQL database. The first table is called REGIONS and stores the name and ID of each region. The second is called STOREREGIONS and indicates which stores are in which regions.

You are beginning to explore these tables and you run the following queries with the results shown:

SELECT r.name, count(s.store_id) FROM storeregions s, regions r WHERE s.region_id = r.regionid GROUP BY s.region_id;

North  156
South  591
East  302
Southwest 208
Northwest 396

SELECT COUNT(DISTINCT store_id) FROM storeregions;

956

What can we conclude?

- there are more stores in Oregon and Washington than in Michigan.
- MiniCorp is growing most rapidly in the south.
- some stores are in more than one region.
- there are other regions where there are no stores.
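
The two queries count different things: the first counts rows per region, while the second counts distinct stores. The sketch below reproduces that difference with Python's built-in `sqlite3` module on a tiny table; the rows are hypothetical and only the table and column names `storeregions`, `store_id`, and `region_id` come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE storeregions (store_id INTEGER, region_id INTEGER);
    -- Hypothetical rows: store 2 is assigned to two regions.
    INSERT INTO storeregions VALUES (1, 10), (2, 10), (2, 20), (3, 20);
""")

# Sum of the per-region row counts (what the first query reports per row).
per_region_total = sum(
    count for (count,) in conn.execute(
        "SELECT COUNT(store_id) FROM storeregions GROUP BY region_id"
    )
)

# Number of distinct stores (what the second query reports).
distinct_stores = conn.execute(
    "SELECT COUNT(DISTINCT store_id) FROM storeregions"
).fetchone()[0]

print(per_region_total, distinct_stores)

# The same comparison with the figures reported in the question.
reported_total = 156 + 591 + 302 + 208 + 396
print(reported_total, 956)
```

Whenever the per-region counts add up to more than the number of distinct stores, at least one store must appear in more than one row of the mapping table.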

## Question 24

A full table scan:

- is used to detect viruses in the database.
- is a critical step in verifying data integrity on the disk.
- uses more memory than any other database operation.
- can often be avoided by adding an appropriate index.

## Question 25

Compared to an inner join, an outer join:

- is never a better option.
- will always produce a larger result.
- will always produce a smaller result.
- none of the above is true.

## Question 26

Which of the following is not true of SQL databases?

- they can model many-to-many relationships.
- they can only be used effectively by writing regular expressions.
- they normally support unique key constraints.
- stored procedures run on the server.

## Question 27

In a SQL database table, NULL values:

- are forbidden.
- can be disallowed for some columns with the appropriate schema definition.
- indicate that data has not yet been loaded but you should check back later.
- should rarely be present because they indicate an error condition.

## Question 28

In SQL, a WHERE clause:

- can cause the result set of the query to be empty.
- can only be used with a GROUP BY clause.
- will prevent the query planner from relying on an index.
- will force the query planner to rely on an index.

## Question 29

Consider the query:

SELECT price FROM products WHERE product_id = 2387436;

We run this query on two SQL databases, one from vendor A and one from vendor B. Both systems typically execute the query in 20 milliseconds on a database of ten thousand products. If we increase the number of the products in the database to ten million, the query takes 10 seconds on the system from vendor A and 25 milliseconds on the system from vendor B.

Which of the following is our best next step?

- select vendor B. Vendor A clearly cannot handle the larger data set.
- make sure that the server running A has as much memory as the server running B.
- make sure we have the same indices on our tables in both systems.
- make sure that the disks in the two systems are either exactly the same make and model or at least comparable models from different manufacturers.

## Question 30

Sharding is a technique for:

- slicing data in reports.
- distributing large data sets across more than one machine.
- compressing data to use disk space more efficiently.
- locating inflection points in large data sets.