BQ

A/B testing（0）
Statistics（46）
Product Case（14）
Coding（37）
Modeling（2） 高频题（8）

A/B testing（0）
Statistics（46）
Product Case（14）
Coding（37）
Modeling（2） 高频题（8）
1.扔骰子N次概率 2.Normal distribution
3.Occurance of odd and even
4.Even和odd 位置的各自sum 5.Binomial distribution
6.Length of longest continuous increasing subarray
7.Truncated normal distribution
8.颜色survey
10.AB testing
13.Pre-post analysis
14.Linear regression线性回归 15.Top 100 and bottom 100 16.如何提前predict unique user per month
17.Mean和median 18.Linear regression线性回归feature importance
19.Failure rate investigation
20.Hypothesis testing
21.Imbalance data
22.Logistic regression vs OLS
23.Sampling bias
24.Distribution check
25.Bootstrap
26.Writing functions
27.How to reduce standard error
28.Estimate waiting time
29.Estimate mean and median in bootstraping
30.Sampling
31.How to get unbiased ratings
32.Statistical basics
33.Linear regression and overfitting
34.Election forecasting
35.Estimate percent of violent videos
36.What is standard error
37.Overlapping schedule
38.LeetCode 658
39.LeetCode1347
40.Intern apartment arrangement
41.Design prediction system
42.Check timeout
43.Valid directions
44.Build restaurant waiting list
45.Record temperature
46.Replace word by dictionary
47.LeetCode 307
48.Legal squares
49.Return array from a list
50.Estimate arrival time of shuttle bus
51.Explain what AUC is to non-technical audience
52.How to approximate retention rate with insufficient data
53.Difference between independent and uncorrelated
54.Build logistic regression
55.K largest values with at most m labels
56.Embarrased vampires
57.Order of delete
58.Minimum time to travel in matrix
59.Writing a function to pick up balls from basket without replacement
60.Counting two sums
61.Sum of even and odd
62.LeetCode 509
63.Questions about Standard error
64.Relationship between education background and income
65.Delete odd index node
66.Saving time period which can insert and search
67.Data structure
68.test statistics
69.Measure impact of what we have done with no ab test
70.If it is good to survey whether education leads to higher salary in certain areas
71.Measure level of segragation
72.Probability
73.Estimate population
74.General statistics questions
75.CTR
76.Interpret results to different audiences
77.How to prove unfair
78.Sample sizes
79.Sum of even and odd locations
80.Causal effect
81.Explain p-value
82.Type I error
83.Mean VS. Median
84.Making shoes for dogs
85.Mean VS. Median
86.Compare regressions
87.Sampling regression VS. population regression
88.Sum of even and odd index
89.Measure impact of sending coupon
90.Pros and cons of flag bad content on youtube
91.Hypothesis testing
92.Hypothesis testing
93.Setting KPI
94.Launch or not
95.Linear regression线性回归 96.Will colleague degree will earn more salary
97.鞋店metrics问题 98.Linear regression线性回归 99.Graph (t, x)
1. 扔骰子N次概率 2. Normal distribution
Generate normal distribution and plot
3. Occurance of odd and even
Array of square root between 1 and 99, add occurance of odd and even值
4. Even和odd 位置的各自sum 5. Binomial distribution

6. Length of longest continuous increasing subarray

7. Truncated normal distribution
Generate 100 samples from X~N(0,1), filter x>0 and create histogram on it.
8. 颜色survey

simpson's paradox: 给了一个context 是 Google 的 一个product，说有一个 metrics 在每个country average 都上升 across time，但是 globally average 下降了，问为什么
10. AB testing
AB testing， 1000 个人treatment， 1000个人 control, 但是系统坏了， 100个看到 ads 的人 被记录成了没看到， 所以现在系统里显示的是 1100 VS900， 问对结果什么expectation

13. Pre-post analysis

14. Linear regression线性回归 If we have 1000 data points and 900 parameters, what's going to happen to the model.
15. Top 100 and bottom 100 if we test 1000 students on something and pull top 100 and bottom 100 to train for 1 week and test again. What outputs are we expect?
16. 如何提前predict unique user per month

17. Mean和median 1000个data points的sample， mean和median的区别，sample mean和sample median各自的variance怎么求
18. Linear regression线性回归feature importance

19. Failure rate investigation

20. Hypothesis testing

21. Imbalance data

22. Logistic regression vs OLS
logistic regression vs ols，系数怎么求，如何加penalty等.
23. Sampling bias
sampling bias问题，如何解决.
24. Distribution check

25. Bootstrap
bootstrap confidence interval
26. Writing functions

27. How to reduce standard error
How to reduce sample mean standard error from 3 to 0.3
28. Estimate waiting time
How to estimate the waiting time for users at airport security
29. Estimate mean and median in bootstraping
Sample user for a given week, how to estimate mean and median (boostrap)
30. Sampling
1000 students, every week evaluate the perf of new training. What is the problem of using only top 100 and bottom 100 students
31. How to get unbiased ratings
1000 youtube videos. 100 reviewers. Each reviewer rate 100 video and gives each a score from 1-10.How do you find a way to get unbiased ratings？
32. Statistical basics

error，bootstrap sample大小怎么选
33. Linear regression and overfitting
Linear regression, overfitting issue, and how to resolve it. (linear regression, sample size
1000，feature 900,可能什么问题，怎么解决？)
34. Election forecasting
Forecast elections in different states for two candidates (each state give one vote to one of the candidate, the final vote is the sum of all states)
Based on the historical data, for each state,we use average to get historical
probability of winnning for one candidate and we use bernoulli distribution to simulate (predict) this time. And we sum them up.What is the problem？
35. Estimate percent of violent videos
We want to measure % of videos are violent. Assuming two categories of videos
(one of them is more likely to get violent content).And we know the % of video for each category based on historical studies
1. Now if we want to select 10K videos to raters to judge whether it is violent. How should we split the samples between two categories
2. Every rater is going to rate 100 videos, and they will stop when they find the first violent videos.
rater 1: good,good, violent
rater 2:good ,good, good ,good violent
How can we estimate % (native approach: the number of violent videos/total rated videos, what is wrong? how to fix it)
36. What is standard error
standard error定义
37. Overlapping schedule

38. LeetCode 658
Given a sorted integer array arr, two integers k and x, return the k closest integers to x in the array. The result should also be sorted in ascending order.

An integer a is closer to x than an integer b if:

• |a - x| < |b - x|, or
• |a - x| == |b - x| and a < b

Example 1:

```Input: arr = [1,2,3,4,5], k = 4, x = 3
Output: [1,2,3,4]
```
Example 2:

```Input: arr = [1,2,3,4,5], k = 4, x = -1
Output: [1,2,3,4]
```

Constraints:

• 1 <= k <= arr.length
• 1 <= arr.length <= 104
• arr is sorted in ascending order.
• -104 <= arr[i], x <= 104
39. LeetCode1347
You are given two strings of the same length s and t. In one step you can choose any character of t and replace it with another character.

Return the minimum number of steps to make t an anagram of s.

An Anagram of a string is a string that contains the same characters with a different (or the same) ordering.

Example 1:

```Input: s = "bab", t = "aba"
Output: 1
Explanation: Replace the first 'a' in t with b, t = "bba" which is anagram of s.
```
Example 2:

```Input: s = "leetcode", t = "practice"
Output: 5
Explanation:
Replace 'p', 'r', 'a', 'i' and 'c' from t with proper characters to make t anagram of s.
```
Example 3:

```Input: s = "anagram", t = "mangaar"
Output: 0
Explanation: "anagram" and "mangaar" are anagrams.
```

Constraints:

• 1 <= s.length <= 5 * 104
• s.length == t.length
• s and t consist of lowercase English letters only.
40. Intern apartment arrangement
class internf {String name ="Tom"，int preference =1} 1表示想自己住，0表示想多人一起住
class apartment{String Num = "N001",int NumOfRooms = 2}

41. Design prediction system
design a search next word prediction system
42. Check timeout

example:
input: id,timestamp,status
["id1",0,"Start",
"id2"，0,"Start",
"id2",1, "End",
"id3",1, "Start",
"id1",2, "End",
"id3"，5,"End",

]

43. Valid directions
Check Directions
N，S，E,W表示东南西北，1 N 2表示1在2的北边1 NW 2表示1在2的西北, 给一个序列，检查是否合法["1N2"，"3N4"，"4NW5"]
44. Build restaurant waiting list
Restaurant waiting list
Build a data structure to perform three operations (Restaurant is full initially)
1) waitList (string customer name int table size):
Add customer with given name and table size they want to book into the waitlist
2) leave(string customer_name):
Customer wants to leave the waitlist so remove them
3) serve (int table_size):
This means restaurant now has a free table of size equal to table size. Find the best customer to serve from waitlist
Best Customer: Customer whose required size is less than or equal to the table size. If multiple customers are matching use first come first serve.
For e.g.if waitlist has customers with these table requirements => [2,3,4, 5,5,7] and restaurant is serving table_size = 6 then best customer is index 3 (0-based indexing)
TreeMap<lnteger,Deque<String>> key是人数size，value是一个存储人名的队列
HashMap<String，Integer> key是人名，value是人数size
45. Record temperature
Temperature
Given a Temperature Manager class, With functions recordTemp(int temp): and
getMaxTemp(): You need to record the temperature when a new temp comes in, and get the max within the last 24 hours.
46. Replace word by dictionary
replace word by dictionary

return "123a234"
return become "123a123"
47. LeetCode 307
Given an integer array nums, handle multiple queries of the following types:

1. Update the value of an element in nums.
2. Calculate the sum of the elements of nums between indices left and right inclusive where left <= right.
Implement the NumArray class:

• NumArray(int[] nums) Initializes the object with the integer array nums.
• void update(int index, int val) Updates the value of nums[index] to be val.
• int sumRange(int left, int right) Returns the sum of the elements of nums between indices left and right inclusive (i.e. nums[left] + nums[left + 1] + ... + nums[right]).

Example 1:

```Input
["NumArray", "sumRange", "update", "sumRange"]
[[[1, 3, 5]], [0, 2], [1, 2], [0, 2]]
Output
[null, 9, null, 8]

Explanation
NumArray numArray = new NumArray([1, 3, 5]);
numArray.sumRange(0, 2); // return 1 + 3 + 5 = 9
numArray.update(1, 2);   // nums = [1, 2, 5]
numArray.sumRange(0, 2); // return 1 + 2 + 5 = 8
```

Constraints:

• 1 <= nums.length <= 3 * 104
• -100 <= nums[i] <= 100
• 0 <= index < nums.length
• -100 <= val <= 100
• 0 <= left <= right < nums.length
• At most 3 * 104 calls will be made to update and sumRange.
48. Legal squares

49. Return array from a list

50. Estimate arrival time of shuttle bus

shuttles[5,10]capacity =2 arrivals=[3,8,12,12]return 9
51. Explain what AUC is to non-technical audience

52. How to approximate retention rate with insufficient data

test这个feature是否可以提高retentionrate。现在情况是没有足够的数据（比如说只有最近一个
quarter的data，怎么来approximate retention rate)
53. Difference between independent and uncorrelated
Independent和uncorrelated的区别，举一个correlated但是not independent的例子
54. Build logistic regression
Build logistic regression from scratch.
55. K largest values with at most m labels
Given an array of data,data有value和label两种属性，给出前k大的数据，要求其中不同label种类不超过m。
56. Embarrased vampires

57. Order of delete

58. Minimum time to travel in matrix

59. Writing a function to pick up balls from basket without replacement

unknown size可能有重复，每一个element is between 0to 99 inclusive。写一个function 模

until all elements in the array has been drawn.
60. Counting two sums
given a binary tree, count how many 2 sum =k, 此处2 sum的两个node必须相连

follow up 1: 3 sum
follow up 2:n sum
61. Sum of even and odd

62. LeetCode 509
The Fibonacci numbers, commonly denoted F(n) form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is,

```F(0) = 0, F(1) = 1
F(n) = F(n - 1) + F(n - 2), for n > 1.
```
Given n, calculate F(n).

Example 1:

```Input: n = 2
Output: 1
Explanation: F(2) = F(1) + F(0) = 1 + 0 = 1.
```
Example 2:

```Input: n = 3
Output: 2
Explanation: F(3) = F(2) + F(1) = 1 + 1 = 2.
```
Example 3:

```Input: n = 4
Output: 3
Explanation: F(4) = F(3) + F(2) = 2 + 1 = 3.
```

Constraints:

• 0 <= n <= 30
63. Questions about Standard error
Standard error, standard error of the median, meaning and methods to find
64. Relationship between education background and income
Find the relationship between education background and income
65. Delete odd index node

66. Saving time period which can insert and search

67. Data structure

68. test statistics
| sample size = 1时怎么判断是不是从N(m,1)中sample的，用什么test，test statistics怎么算，95%CI怎么算(要公式)。
69. Measure impact of what we have done with no ab test
| 给subset of一个ecommerce website customers发promo 邮件，这个邮件已经发出去了，而且已知不是ab test，问你怎么去measure the impact of what we have done。
70. If it is good to survey whether education leads to higher salary in certain areas
| 从MTV随机招人做问卷来研究education是否lead to higher salary，你怎么看。
71. Measure level of segragation

72. Probability
73. Estimate population

74. General statistics questions
| 用什么distribution，怎么确定sample size（公式），简单的stats概念比如type 1 type 2 error。
75. CTR
Find the CTR for iOS user clicking on 'what' s new' button
76. Interpret results to different audiences
| How to interpret the result to DS person and non technical person
77. How to prove unfair
| You flip the coin 7 more times, and you get 5 heads and 2 tails. Now you suspect that the Coin is not fair. How to prove that
78. Sample sizes
| What to do if two tests with different sample size? (问算samping size and intuition of parameters
79. Sum of even and odd locations
| 一个list，算出even和odd位置的各自sum
80. Causal effect
| Talk about the causual effect of a training program. Random 1000 students, take top 100, bottom 100 we want to see the causual effect of the training program
81. Explain p-value

82. Type I error
Can you write type l error in the form of cdf？
83. Mean VS. Median
Give me an example when mean is better than median and an example when median is better than mean.
84. Making shoes for dogs

85. Mean VS. Median
means and median, ste(mean), ste(median)
86. Compare regressions

87. Sampling regression VS. population regression

88. Sum of even and odd index

89. Measure impact of sending coupon

90. Pros and cons of flag bad content on youtube
91. Hypothesis testing

92. Hypothesis testing
hypothesis test to identify if a coin is unfair; linear reg里把X,y都copy append，贝塔和r2怎么变
93. Setting KPI

94. Launch or not

95. Linear regression线性回归 If we have 1000 data points and 900 parameters, what's going to happen to the model.
96. Will colleague degree will earn more salary

97. 鞋店metrics问题 98. Linear regression线性回归 99. Graph (t, x)