Main | Dsinterviewdaily

Sample Problems

Core ML

What is the difference between Random Forest (RF) and Gradient Boosted Trees (GBT)?

Solution

SQL

Let's say you work for a supermarket chain. Company leadership has noticed that stores generated historically high profits in 2019 and has decided to award the top 3 stores that sold the most items that year in each city. For cities with fewer than 3 stores, only the top store should be awarded. You are asked to extract the stores that should be awarded.

Sample Input Tables:

Solution
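One possible approach, sketched here with Python's built-in sqlite3 module so the query can be run end to end. The sample input tables are not reproduced on this page, so the schema (a stores table with store_id and city, a sales table with store_id, sale_date, and items_sold) and all rows below are hypothetical. Ties are broken by store_id, which is also an assumption; the problem statement does not say how ties should be handled. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical schema and data: the real input tables are not shown here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales  (store_id INTEGER, sale_date TEXT, items_sold INTEGER);

INSERT INTO stores VALUES
  (1, 'Austin'), (2, 'Austin'), (3, 'Austin'), (4, 'Austin'),
  (5, 'Boise'),  (6, 'Boise');

INSERT INTO sales VALUES
  (1, '2019-01-05', 100), (1, '2019-06-01', 50),
  (2, '2019-02-10', 120),
  (3, '2019-03-15', 90),
  (4, '2019-04-20', 200), (4, '2018-12-31', 999),  -- 2018 row must be ignored
  (5, '2019-05-05', 10),
  (6, '2019-07-07', 30);
""")

QUERY = """
WITH totals AS (
    -- Items sold per store in 2019; stores with no 2019 sales count as 0.
    SELECT st.city,
           st.store_id,
           COALESCE(SUM(sa.items_sold), 0) AS items_2019
    FROM stores AS st
    LEFT JOIN sales AS sa
           ON sa.store_id = st.store_id
          AND sa.sale_date >= '2019-01-01'
          AND sa.sale_date <  '2020-01-01'
    GROUP BY st.city, st.store_id
),
ranked AS (
    -- Rank stores within each city and count how many stores the city has.
    SELECT city,
           store_id,
           items_2019,
           ROW_NUMBER() OVER (PARTITION BY city
                              ORDER BY items_2019 DESC, store_id) AS rn,
           COUNT(*) OVER (PARTITION BY city) AS stores_in_city
    FROM totals
)
SELECT city, store_id, items_2019
FROM ranked
-- Top 3 for cities with at least 3 stores, otherwise only the top 1.
WHERE rn <= CASE WHEN stores_in_city >= 3 THEN 3 ELSE 1 END
ORDER BY city, rn;
"""

awarded = conn.execute(QUERY).fetchall()
for row in awarded:
    print(row)
```

If tied stores should all be awarded, RANK() or DENSE_RANK() could replace ROW_NUMBER(), at the cost of sometimes returning more than 3 stores per city.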

Probability

You are a data scientist working for a global travel company. The company has 10 million customers, but only 10,000 of them bring in the most revenue (we will call them "valuable" clients; all other clients are considered "regular").

Information about each customer is stored in one row of the database (thus, there are 10 million rows in the database). To increase the reliability of the database, the data is not stored on a single server; instead, it is split equally across 5 different servers (horizontal sharding is applied, meaning that each customer's information is stored on exactly one server).

Company leadership wants to estimate the risks associated with a potential equipment failure that results in a complete loss of data from the failed server. You are asked to compute the probability of losing at least one valuable client after one server crashes.
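A minimal sketch of one way to set up the calculation, assuming customers are assigned to servers uniformly at random (independently of whether they are valuable) and that exactly one server fails. Under those assumptions, the number of valuable clients on the failed server follows a hypergeometric distribution, and the probability of losing none of them is the product of (8,000,000 - i) / (10,000,000 - i) for i from 0 to 9,999. The function name below is hypothetical. The product is far too small for floating point, so the sketch works with logarithms.

```python
import math


def log_prob_no_valuable_lost(total, valuable, servers):
    """Natural log of P(failed server holds zero valuable clients).

    Assumes rows are split equally and uniformly at random across servers.
    Placing valuable clients one at a time, the i-th one must land in one of
    the remaining slots on a surviving server.
    """
    per_server = total // servers
    surviving_slots = total - per_server
    return sum(
        math.log((surviving_slots - i) / (total - i)) for i in range(valuable)
    )


log_p_none = log_prob_no_valuable_lost(
    total=10_000_000, valuable=10_000, servers=5
)

# P(at least one valuable client lost) = 1 - exp(log_p_none).
# -expm1(x) computes 1 - exp(x) without losing precision for small |x|.
p_at_least_one = -math.expm1(log_p_none)

# Simpler approximation: treat each valuable client as independently
# surviving with probability 4/5.
log_p_none_approx = 10_000 * math.log(4 / 5)

print(f"ln P(no valuable client lost), exact:  {log_p_none:.2f}")
print(f"ln P(no valuable client lost), approx: {log_p_none_approx:.2f}")
print(f"P(at least one valuable client lost):  {p_at_least_one}")
```

Under these assumptions the probability of losing no valuable client is astronomically small (its natural log is in the neighborhood of -2,230), so the probability of losing at least one is indistinguishable from 1 in floating point.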

FAQ

What will happen after I subscribe?

I subscribed two days ago. Why haven't I received an email yet?

How can I unsubscribe or modify my subscription?

Who should I reach out to if I have a question?