Facebook Data Analysis Interview Questions (Real Questions)

Interview Questions
All (42)
Coding (3)
SQL (7)
Stats (3)
ML basics (2)
Product Case (7)
High-frequency (2)
1. SQL aggregation / window function
2. SQL aggregation / window function
3. Product sense
4. Product sense
5. Product sense
6. SQL functions
7. Bayes / stats
8. ML model basics
9. SQL aggregation / window function
10. SQL aggregation / window function
11. SQL aggregation / filter
12. Coding test
13. Array operations
14. Model basics
15. Improve product
16. Hypothesis test
17. CTR increase reasons
18. Launch new functions
19. SQL case when
20. AB Test
21. SQL query
22. Spam Friend Request
23. Dwell time/usage
24. Posts
25. Video(call/portal)
26. SMS Verification
27. Workplace
28. Facebook Shop
29. Friendship
30. Message
31. Ads
32. Spam
33. Composer
34. Search
35. LeetCode 1211
36. Fundraiser
37. Video(watch)
38. Feature
39. Video(watch)
40. LeetCode 71
41. SQL
42. SQL
1. SQL aggregation/ window function
video call table

caller | recipient | date | call_id | call length

Q: In the past 30 days, which 10 callers initiated the most calls?
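A minimal sketch of one way to answer this, run through SQLite. The table name `video_calls`, the sample rows, and the fixed "today" of 2020-01-15 are all assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE video_calls (
    caller INT, recipient INT, date TEXT, call_id INT, call_length REAL)""")
conn.executemany(
    "INSERT INTO video_calls VALUES (?, ?, ?, ?, ?)",
    [(1, 2, "2020-01-10", 1, 60.0),
     (1, 3, "2020-01-11", 2, 30.0),
     (2, 3, "2020-01-12", 3, 10.0)],
)

# Count calls initiated per caller within the 30-day window, then take the top 10.
top_callers = conn.execute("""
    SELECT caller, COUNT(*) AS n_calls
    FROM video_calls
    WHERE date >= date('2020-01-15', '-30 days')
    GROUP BY caller
    ORDER BY n_calls DESC
    LIMIT 10
""").fetchall()
print(top_callers)  # [(1, 2), (2, 1)]
```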
2. SQL aggregation/ window function
user table

user_id | age_group | country | date | dau_flag

Q: What was the total time spent per active user in the US yesterday?
3. Product sense
How would you filter spam friend requests? After listing a set of criteria, how would you verify that your criteria are correct? The last question: what new feature could be added to prevent spam?
4. Product sense
Find how many people accepted friend requests on a given day; for a given person, what is their friend-request acceptance rate over time?
5. Product sense
If the news feed is shrunk by 25%, how would you measure whether that is useful? Then the classic question: how do you explain a metric performing differently in the US vs. Thailand?
6. SQL functions
TABLE 1

Sample Rows:

date | search_id | user_id | age_group | search_query

--------------------------------------------------------------------
'2020-01-01' | 101 | 9991 | '<30' | 'justin bieber'
'2020-01-01' | 102 | 9991 | '<30' | 'menlo park'
'2020-01-01' | 103 | 5555 | '30-50' | 'john'

TABLE 2

Sample Rows:
date | search_id | result_id | result_type | clicked

--------------------------------------------------------------------

'2020-01-01' | 101 | 1001 | 'page' | TRUE
'2020-01-01' | 101 | 1002 | 'event' | FALSE
'2020-01-01' | 101 | 1003 | 'event' | FALSE
'2020-01-01' | 101 | 1004 | 'group' | FALSE

Q1: Over the last 7 days, what were the 10 most popular search queries?
Q2: What % of users performed a search that returned multiple types of results?
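One possible approach to Q2, sketched in SQLite with tiny made-up sample rows; the subquery flags searches whose results span more than one type, and the LEFT JOIN keeps all searchers in the denominator:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE searches (date TEXT, search_id INT, user_id INT,
                       age_group TEXT, search_query TEXT);
CREATE TABLE search_results (date TEXT, search_id INT, result_id INT,
                             result_type TEXT, clicked INT);
INSERT INTO searches VALUES
  ('2020-01-01', 101, 9991, '<30', 'justin bieber'),
  ('2020-01-01', 102, 9991, '<30', 'menlo park'),
  ('2020-01-01', 103, 5555, '30-50', 'john');
INSERT INTO search_results VALUES
  ('2020-01-01', 101, 1001, 'page', 1),
  ('2020-01-01', 101, 1002, 'event', 0),
  ('2020-01-01', 103, 1003, 'page', 0);
""")

# % of users with at least one search whose results span multiple types.
pct = conn.execute("""
    SELECT 100.0 * COUNT(DISTINCT CASE WHEN m.search_id IS NOT NULL
                                       THEN s.user_id END)
           / COUNT(DISTINCT s.user_id) AS pct
    FROM searches s
    LEFT JOIN (SELECT search_id
               FROM search_results
               GROUP BY search_id
               HAVING COUNT(DISTINCT result_type) > 1) m
      ON s.search_id = m.search_id
""").fetchone()[0]
print(pct)  # 50.0
```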
7. 贝叶斯/ stats
Covers basic probability and Bayes' rule; for the comments-per-DAU distribution you are asked to sketch the distribution, then, given your estimated mean and median, explain how the means of the p95 and p50 groups change after x days (regression to the mean).
8. ML model basics
A binary classification case: the differences between random forest and GBDT, their pros and cons, what the loss function is, how it is optimized (gradient descent), and what gradient descent actually means.
9. SQL aggregation/ window function
Table: user_actions

ds (STRING) | actor_id (BIGINT) | post_id (BIGINT)| relationship (STRING)| interaction (STRING)
'2019-07-01'| 431 | 7921 | 'Friend' | 'like'
'2019-07-01'| 431 | 7921 | 'Friend' | 'comment'
'2019-07-01'| 938 | 9235 | 'Page' | 'WOW'
'2019-07-01'| 209 | 7220 | 'Group' | 'LOVE'
'2019-07-01'| 384 | 7128 | 'Group' | 'share'
'2019-07-01'| 492 | 0879 | 'Page' | 'like'
'2019-07-01'| 887 | 3842 | 'Group' | 'HAHA'

Table: user_posts
ds (STRING) | poster_id (BIGINT) | post_id (BIGINT)|

'2019-07-01'| 123 | 7921 |
'2019-07-01'| 123 | 3910 |
'2019-07-01'| 004 | 3495 |
'2019-07-01'| 832 | 3294 |
'2019-07-01'| 283 | 4820 |
'2019-07-01'| 822 | 2472 |
'2019-07-01'| 119 | 8204 |

Q1:How many likes were made on Friend posts yesterday?
Q2: If I were user 123, how would you calculate average number of likes on all of my posts?
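A SQLite sketch of both questions against the tables above. The "yesterday" date is hard-coded for illustration; note the LEFT JOIN in Q2 so that posts with zero likes still count in the denominator:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_actions (ds TEXT, actor_id INT, post_id INT,
                           relationship TEXT, interaction TEXT);
CREATE TABLE user_posts (ds TEXT, poster_id INT, post_id INT);
INSERT INTO user_actions VALUES
  ('2019-07-01', 431, 7921, 'Friend', 'like'),
  ('2019-07-01', 431, 7921, 'Friend', 'comment'),
  ('2019-07-01', 938, 7921, 'Friend', 'like');
INSERT INTO user_posts VALUES
  ('2019-07-01', 123, 7921),
  ('2019-07-01', 123, 3910);
""")

# Q1: likes on Friend posts "yesterday" (date fixed for the sketch).
likes = conn.execute("""
    SELECT COUNT(*) FROM user_actions
    WHERE ds = '2019-07-01' AND relationship = 'Friend'
      AND interaction = 'like'
""").fetchone()[0]

# Q2: average likes per post for user 123. The LEFT JOIN keeps posts
# with zero likes, so they still count in the denominator.
avg_likes = conn.execute("""
    SELECT 1.0 * COUNT(a.post_id) / COUNT(DISTINCT p.post_id)
    FROM user_posts p
    LEFT JOIN user_actions a
      ON p.post_id = a.post_id AND a.interaction = 'like'
    WHERE p.poster_id = 123
""").fetchone()[0]
print(likes, avg_likes)  # 2 1.0
```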

Product:
How to define meaningful interactions in news feed?
10. SQL aggregation/ window function
session id, userid, session start time, session end time, date_stamp, app

a. For each app, find the average gap between sessions, i.e. the time between closing one session and opening the next.

b. Each user's bounce rate; app1 -> app2 -> back to app1 reportedly counts as one bounce.
11. SQL aggregation/ filter
reaction

date, timestamp, senderid, receiverid, messageid, has_reaction (0 or 1)

a. The percentage of users each day who chat with more than 3 people.

b. The percentage of users each day who reply within 60 seconds.

c. The average time from the first message to the first reaction.
12. Coding test
Find customers who made purchases on at least three days and whose combined spend on their first day and last day exceeds a given number.
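A Python sketch under one reading of the prompt (first/last day by date, spend summed per day); the exact definition should be confirmed with the interviewer:

```python
from collections import defaultdict

def qualifying_customers(transactions, threshold):
    """transactions: list of (customer, date, amount), dates comparable.
    Return customers with spend on at least 3 distinct days whose total
    spend on their first day plus their last day exceeds threshold."""
    by_cust = defaultdict(lambda: defaultdict(float))
    for cust, date, amount in transactions:
        by_cust[cust][date] += amount          # sum spend per day
    result = []
    for cust, days in by_cust.items():
        if len(days) >= 3:                     # at least 3 distinct days
            first, last = min(days), max(days)
            if days[first] + days[last] > threshold:
                result.append(cust)
    return sorted(result)

tx = [("a", "2020-01-01", 50), ("a", "2020-01-02", 5),
      ("a", "2020-01-03", 60), ("b", "2020-01-01", 100),
      ("b", "2020-01-02", 100)]
print(qualifying_customers(tx, 100))  # ['a']
```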
13. 数组的运算
1. Compute the average of an array.

Very basic, but with points to discuss: the input array can have edge cases.

2. Given an array of book prices and a budget, find the maximum number of books you can buy; each book can be bought at most once.

3. Given an array of book titles containing originals and derivatives, where a derivative's title is the original's title plus a suffix, output all derivative books.
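Possible Python sketches for problems 2 and 3. The greedy for problem 2 works because only the count matters, so buying cheapest-first is optimal; the derivative check is a simple O(n^2) prefix scan:

```python
def max_books(prices, budget):
    """Greedy: buy the cheapest books first to maximize the count,
    since each book may be bought at most once."""
    count = 0
    for price in sorted(prices):
        if price > budget:
            break
        budget -= price
        count += 1
    return count

def derivative_books(titles):
    """Return titles that start with some *other* title in the list
    (original title + suffix), per the stated definition."""
    originals = set(titles)
    return [t for t in titles
            if any(t != o and t.startswith(o) for o in originals)]

print(max_books([20, 10, 5, 8], 25))                     # 3
print(derivative_books(["dune", "dune messiah", "it"]))  # ['dune messiah']
```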
14. Model basics
About hashtags: how would you recommend hashtags using existing information, and how would you recommend them when there is no information at all (cold start)? What models could be used, and how do you determine which features matter most?
15. Improve product
How to use location data to improve products.
16. Hypothesis test
Some advertisers are bad actors. How to verify the hypothesis that bad actors tend to have multiple accounts.
17. CTR increase reasons
Clicks on events increased a lot this past week; how would you, as the interviewee, look into it?
18. Launch new functions
FB has a built-in dating feature, but usage is low, and the team now wants to grow it. In the dating feature you can select people you are interested in as well as people who are already interested in you; if two people are mutually interested, it is a match.

Questions:

1. Why would Facebook develop this feature? Does it fit FB's mission?

2. Describe concretely how you would recommend this feature to users.

3. How would you select the users who are likely to use this feature?

4. How would you design metrics to measure the growth of this dating feature?
19. SQL case when
Table 1: searches (search_id is unique, primary key)
 
 date | search_id | user_id | age_group | search_query
 
 '2020-01-01' | 101 | 9991 | 'U30' | 'justin biber'
 
 '2020-01-01' | 102 | 9991 | '30-50' | 'menlo park'
 
table 2: search_results (one search_id can produce multiple results)
 
 date | search_id | user_id | result_type | clicked
 
 '2020-01-01' | 101 | 1001 | 'page' | TRUE
 
 '2020-01-01' | 101 | 1001 | 'event' | FALSE
 
 
1. In the past 30 days, find the users who searched more than 10 times.
 
2. The percentage of users who had ever performed a search that returned more than 10 results.
 
(A small tip: for FB percentage questions like this, CASE WHEN ... THEN 1 ELSE 0 END computes the percentage quickly.)
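A toy illustration of that CASE WHEN tip in SQLite; the simplified `searches` table with an `n_results` column is hypothetical, just to have something to flag:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (user_id INT, n_results INT)")
conn.executemany("INSERT INTO searches VALUES (?, ?)",
                 [(1, 12), (1, 3), (2, 5), (3, 40)])

# CASE WHEN ... THEN 1 ELSE 0 END flags each user, and averaging the
# per-user flags yields the percentage directly.
pct = conn.execute("""
    SELECT 100.0 * AVG(flag) FROM (
        SELECT user_id,
               MAX(CASE WHEN n_results > 10 THEN 1 ELSE 0 END) AS flag
        FROM searches
        GROUP BY user_id
    )
""").fetchone()[0]
print(round(pct, 1))  # 66.7
```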
20. AB Test
1. Our product is FB search. If you search for a place name on FB, how would you rank the search results?

2. Suppose we want to uprank events (i.e. put event-type results first); how do you decide whether we should launch or not?

3. Follow-up question: what should you pay attention to when running the A/B test described above?
21. SQL query
Table 1
 
 caller | recipient | ds | call_id | duration
 
 123 | 456 | 2019-01-01 | 4325 | 864.4
 
 032 | 789 | 2019-01-01 | 9395 | 263.7
 
 456 | 032 | 2019-01-01 | 0879 | 22.0
 
 table 2
 
 user_id | age_bucket | country | primary_os | dau_flag | ds
 
 123 | 25-34 | us |
 
Q1: % of users in France who were on a video call yesterday

Q2: time spent per DAU in the US today
22.  Spam Friend Request
friending:
sender_id | receiver_id | send_time | accept_time
users:
user_id| spam_type ("fraud", etc)

-- 1. The same-day accept rate for each day last week
-- 2. Of all requests sent last week, the proportion not sent by spam users

23.  Dwell time/usage
Table1: user_sessions
date | user_id | start_time | end_time | end_reason
----------------------------------------------------------------
'2020-09-01' | 1 | 1964783746 | 1964783924 | 'close-app'
'2020-09-01' | 1 | 1964783528 | 1964783809 | 'crash'
'2020-09-02' | 2 | 1964783123 | 1964783345 | 'close-app'
'2020-09-02' | 3 | 1964783252 | 1964783658 | 'crash'

Table2: dim_all_users
user_id | country | is_active
-----------------------------------
1 | 'US' | TRUE
2 | 'US' | FALSE
3| 'CANADA' | TRUE
-- 1. What is the average time (in seconds) each user stays on Facebook on a specific date?

-- 2. What is the percentage of active users from the US who experienced a crash while using Facebook on a specific date?
-- | ratio |
-- Approach: 1. filter to US, the given date, and active users; 2. users who crashed / all users
select
 count(distinct case when s.end_reason = 'crash' then s.user_id end)
 / cast(count(distinct u.user_id) as float) as ratio
from dim_all_users u
left join user_sessions s
on u.user_id = s.user_id
and s.date = 'xxxx-xx-xx'
where u.country = 'US'
and u.is_active = TRUE
-- note: an inner join on end_reason = 'crash' would drop non-crash users and
-- leave only crashed users in the denominator; the left join keeps all active US users
--------------------------------------------------------------------------
user_event: user_id | event_time | event_name | session_id
-- event_name values include 'nav_menu_open' and 'nav_menu_close'
 
-- 1. Calculate the average dwell time in seconds across all sessions (i.e. return one number).
-- Dwell time is the length of time between opening and closing the menu.

 
-- 2. Get the percentage of all sessions that have both nav_menu_open and nav_menu_close.

 
-- 3. Let's say we want to account for missing events by setting the dwell time to 60 seconds whenever a nav_menu_close event is missing.
-- Can you write a query to re-calculate the new average dwell time when we default to 60 seconds of dwell time whenever nav_menu_close is missing?
 
-- 4. Calculate the average gap between each session
-- assume gap is the difference between open time - previous close time
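A plain-Python sketch of dwell-time questions 1 and 3 above, assuming at most one open/close pair per session; the 60-second default covers a missing nav_menu_close:

```python
def average_dwell_time(events, default=60):
    """events: (session_id, event_time, event_name) with event times in
    Unix seconds. Dwell time is close - open per session; when the close
    event is missing, default to `default` seconds (question 3 above)."""
    opens, closes = {}, {}
    for session_id, event_time, event_name in events:
        if event_name == "nav_menu_open":
            opens[session_id] = event_time
        elif event_name == "nav_menu_close":
            closes[session_id] = event_time
    # For each opened session, use its close time if present, else open + default.
    dwells = [closes.get(sid, t + default) - t for sid, t in opens.items()]
    return sum(dwells) / len(dwells)

events = [
    ("s1", 100, "nav_menu_open"), ("s1", 130, "nav_menu_close"),
    ("s2", 200, "nav_menu_open"),  # close event missing -> 60s default
]
print(average_dwell_time(events))  # 45.0
```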
 


Each login to an app counts as one session; user IDs repeat, session IDs are unique, and each session has a start time and an end time.
q1: What is each app's average daily performance, and which app is better? Define a metric and write a query for the daily average.
q2: Compute the average time between a user's consecutive uses of each app.
q3: Compute each app's daily bounce rate, where app1 -> app2 -> back to app1 counts as a bounce.
 
-- Q1: What is the average number of sessions per user for the past 30 days
-- Q2: Which users spent more than 10s on every session?
-- Q3: Time distribution of each user (Exponential Distribution)
-- Q4: daily active user for the past 30 days
For the past 30 days, the number of users who used Facebook every day?
 
 

Given an app session table
Columns:
userid (varchar)
app ['fb', 'messenger', 'IG'] (string)
interface ['android', 'iphone', 'web']
sessionid (varchar)
session_start_time (int)
session_end_time (int)
1. Which app had the most user visits yesterday?
2. For users with multiple sessions, what is the average time gap between visits?
 
-- 1. Find the absolute differences in day-by-day volume between iOS and Android, ignoring countries
-- 2. Find the absolute differences in day-by-day volume for every pair of countries (without repeating pairs), ignoring platforms
24. Posts
comments: 
user_id | ds | comments
ds -- date
comments -- # of comments posted by the user
users: user_id | country -- country where the user registered
-- 1. return the total number of posts per user in May 2020
-- sum(comments) / count(distinct user_id)
-- filter: May 2020
-- steps: 1. join 2. sum comments posted in 2020-05 3. divide by number of users
select
 sum(c.comments)/
 count(distinct u.user_id) as post_per_user
from user u
left join comments c
on u.user_id = c.user_id
and year(c.ds) = 2020 and month(c.ds) = 5
-- count(comments) would count rows, not comments, so sum the per-day counts
-- assume the denominator is all users regardless of time (confirm with the interviewer)
-- 2. return the percentage of users that posted comments in May 2020
select
 count(distinct c.user_id)/
 count(distinct u.user_id) as ratio
from user u
left join comments c
on u.user_id = c.user_id
and year(c.ds) = 2020 and month(c.ds) = 5
-----------------------------------------------------------------------------------------------
Table: user_actions
date | actor_id | post_id | relationship | interaction
Table: user_posts
date | poster_id | post_id
-- 1. How many likes were made on Friend posts yesterday?
-- 2. If I were user 123, how would you calculate the average number of likes on all of my posts?

-------------------------------------------------------------------------
Content_action:
 | date | user_id | content_id | content_type | target_id | 
User_id (content_creator_id)
Content_id (this is the primary key),
Content_type (with 4 types: status_update, photo, video, comment),
Target_id (the original content_id associated with the comment; if the content type is not comment, this will be null)
-- 1. find the distribution of stories (photo + video) based on comment count?

-- 2. Now what if content_type becomes {comment, post, video, photo, article}; what is the comment distribution for each content type?

25. Video(call/portal)
video_calls:
caller | recipient | ds | call_id | duration
fb_dau:
user_id| DAU_flag| ds| country
--Q1: On 2020-01-01 how many people initiated multiple calls?
--Q2: % of DAU used the video calls function by each country on 2020-01-01?
-----------------------------------------------------------------------------
Table:
timestamp, callerid, receiverid, call length
-- 1. Find the new users on 2019-01-01. A new user can be someone who has placed a call or someone who has received one.
-- 2. For that day's new users, compute the retention rate for each subsequent day up to today, e.g.:
day, retention rate
1, 0.9
2, 0.8
26. SMS Verification
sms_message (fb to users)
| date | country | cell_number | carrier | type
|2018-12-06 | US | xxxxxxxxxx | verizon | confirmation (ask user to confirm)
|2018-12-05 | UK | xxxxxxxxxx | t-mobile | notification
confirmation (users confirmed their phone number)
|date | cell_number |
(User can only confirm during the same day FB sent the confirmation message)
-- Q1. How many requests have we sent to each carrier and country?
-- Q2. Number of users who received notification every single day during the last 7 days
      -- Q: cell_number as id?
-- Q3. confirmation rate over the last 30 days
-- Q4. On Dec 6th, the overall confirmation rate.
 
27. Workplace
Region | country | company_id | no_of_onboarded_employee
-- Q1. Calculate the average number of onboarded employees per country per region
select
 sum(no_of_onboarded_employee)/count(distinct concat(Region,country)) as ratio
from workspace
28.  Facebook Shop
date | order_id | sender | timestamp -- sender = 1: seller, sender = 0: buyer
-- Q1. how many orders have message from buyer on date xxxx.
-- |# order|
select
 count(distinct order_id) as ct
from order
where sender = 0
and date = 'xxxx'
-- what percent of orders have a message from the buyer on date xxxx?

-- Q2. how many orders have messages both from buyer and seller.

-- if there are other sender types, use the following version
select
 count(distinct order_id) as ct
from order o
join (select
 order_id
 from order
 where sender = 0)s
on o.order_id = s.order_id
where o.sender = 1

-- Q3. how many orders with last message sent from seller.

 
29. Friendship
-- Friending: send_id || receive_id || send_time || accept_time || country
-- Age: user_id || age_group
-- 1) Same-day acceptance rate in the last 7 days
select
 sum(case when date(send_time) = date(accept_time) then 1 else 0 end)/
 count(*) as same_day_rate
from Friending
-- 2) Average requests sent per user for each age group
-- friending: | ds | action | actor_uid | target_uid |
--ds = date
--action = {'send_request', 'accept_request', 'unfriend'} 
--actor_uid = uid of person pressing the button to take the action
--target_uid = uid of the other person involved
 
Q1: How is the overall friending acceptance rate changing over time? 
-- v1: How is the overall friending acceptance rate changing over time?
-- acceptance rate per day
-- v2. Define how long you have to wait before a friend request is considered rejected (e.g. 1 week).

-- 1. self join: left = requests, right = accepts, reversed ids, within one week
Q2: Who has the most friends?
------------------------------------------------------------------------------------
-- Q1: For each month in 2020, what is the friend request acceptance rate?
-- Q2: What percent of users in the United States accepted a friend request last week?
--------------------------------------------------------------------------------------------
Two tables: user_network_requests and user_country.
success indicates whether a friend request went through: 1 = accepted, 0 = not accepted.

1. For each data center, the request failure rate
2. For each country, the request failure rate
3. For each country, how many users have never had a friend request fail
30. Message
-- | date | timestamp | senderid | receiverid | reaction flag | message_thread
-- Q1 unique conversation created
-- Q2 percentage of conversations has at least 1 reaction
-- Q3 average num of days between when the conversation started to 1st reaction
-- Q4 How would you show that conversations with reactions are more active than those without?
-----------------------------------------------------------------------------
table 1: connection
user1 | user 2
123 | 456
456 | 123
table 2: interaction
sender | receiver | action | date
123 | 456 | comment| 2019-01-01
-- question: find friend pairs who had no interactions in the last year

31. Ads
1. Find the average/total spend per advertiser today
-- follow-up: after we get this table, what will the distribution look like?
-- 2. how many advertisers have at least one conversion
-------------------------------------------------------------------------------
-- Ads click table
-- |ad_id | user_id | status: {'click', 'view', 'hide'} |
-- 1: What metrics would you use to measure ad performance? Write SQL to find the best-performing ad.
-- 2: If a user clicks one of the ads, write SQL to recommend the next ad to show that user.
-- 3: Show the distribution of user CTR.
-- Question: choose a bin size, or just count frequencies of existing values?
-- 4: What other metrics should we derive and monitor from this table?
-----------------------------------------------------------------------------
Ad4ad
Date | event | user_id | unit_id | ad_id | cost | spend
2018-08-01 impression 123 1111 null 0.12 null
2018-08-01 impression 123 1111 null 0.15 null
2018-08-01 impression 123 1111 null 0.12 null
2018-08-01 impression 456 2222 null 0.14 null
2018-08-01 click 456 2222 null null null
2018-08-01 create_ad 456 2222 9988 null 10
User
User_id | country | age
123 German 38
456 China 20
789 US 28
-- Q1. Last 30 days, by country, the total spend (Facebook's spend is the cost column in the table)
-- Q2. How many impressions do users see before creating an ad, for a given unit?
-- Q3. There is an ads_rolling table holding each ad's lifetime_spend and lifetime_revenue. How would you add each day's new data to update this table?
----------------------------------------------------------------------------
Ad_engagement table:
User_id, session_id, ad_id, action(view, click, hide), timestamp
Q1: find the ad that performs the best.
Q2: given that a user clicked ad_id=123, what is the next ad to show them?
Q3: the average time between a user viewing an ad and clicking it.

1. What percent of normal advertisers were active (i.e. paid) yesterday?
2. How many advertisers were banned yesterday?
3. If banned accounts could appeal and be reinstated, how would you measure the economic benefit?
---------------------------------------------------------------------------------------------------------------
/*We are studying ecommerce advertisers on FB over a certain time period (say a week).
The time period does not matter for this problem.*/ 
adv_info:advertiser_id|| ad_id|| spend (primary: ad_id) 
-- adv_info:Contains information on advertisers.
-- Advertiser_id is id of advertiser
-- ad_id is id of an ad being run by advertiser
-- spend is the amount of money in $ that advertiser pays Facebook for ad_id to show it to FB users.
ad_info: ad_id || user_id || price (primary: ad_id, user_id)
-- price is how much the user_id spent through this ad, assuming all prices > 0.
Q1: What would the average advertiser spend on Facebook? Your query should return a single number.
Q2: The fraction of advertisers that have at least one conversion.
32. Spam
Given the table about posts
User_actions: date | user_id | post_id | content_type | extra
-- content_type : 'view', 'comment', 'photo', 'report'
-- extra : post text, comment text, report type = {'SPAM', ...}
-- 1. how many posts were reported yesterday for each report reason?
-- 2. Introduce a new table, reviewer_removals, and calculate what percent of the daily content that users view on FB is actually spam?
-- reviewer_removals: ds(date, String) | reviewer_id |post_id
--Q: how do we define spam: report as spam and also removed
-- 3. How to find the user who abuses this spam system?
-- 4. What percent of yesterday's content views were on content reported for spam?
--------------------------------------------------------------------------------------------
Tech Analysis: a valid-report question. The table has the following fields:
report
date, owner_id, content_id, reporter_id, is_valid. 
1: % of users who submitted at least one valid report in December 2017?
2. find the users who are spamming the reporting tool? 
33. Composer
-- FB is launching a 'composer' feature on the phone which helps users post faster.
COMPOSER
user_id | date | event("enter", "post", "cancel")
User
user_id | date | country | dau_flag
dau_flag: active user, 0 or 1
Q1: What success metric would you define for the feature? Write SQL to calculate the success metric.
-- % of dau use it
-- % of users successfully make a post
-- what is the average rate of successful post last week?
 
Q2: what is the average rate of posting for daily active users by country today?
34. Search
Tables: user_search and search_result
user_search
| userid | date | queryid |
--------------------------------------------------
| 10032 | 10122020 | 1 |
| 10032 | 10122020 | 2|
| 10033 | 10132020 | 3|
search_result
success indicates whether the user clicked: 1 = clicked, 0 = not clicked.
| queryid | type | success |
--------------------------------------------------
| 1 | event | 1 |
| 2 | friends | 0|
| 3 | event | 0|
1. Find the top 10 users by search volume in the last week.
2. Find the proportion of users who searched for an event and successfully clicked it (success = 1).
--------------------------------------------------------------------------------------------------------
Table: search_id, query, position, result_relevance
The question: define metrics to measure search quality.
35. LeetCode 1211
search {search_id (int), query (varchar), position (int), result_relevance (int) [1-5]}
position is where the search result appears on the page (effectively its rank): a positive integer, and smaller means closer to the top.
result_relevance is how relevant the result is to the query, on a scale of 1 to 5, where larger means more relevant. For example:
1, "dog", 1, 5
2, "dog", 3, 4
3, "cat", 2, 2
Q1: Define a metric to measure search quality that accounts for both position and relevance, and compute the metric for each query.
Q2: If all of a query's relevance scores are below 3, it is defined as a low-quality search; find the proportion of such queries among all queries.
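A Python sketch of one possible answer; the DCG-style score for Q1 is just one reasonable metric choice, not the expected answer:

```python
import math
from collections import defaultdict

def quality_per_query(rows):
    """rows: (search_id, query, position, relevance). One candidate metric,
    DCG-style: sum relevance / log2(position + 1) over each query's results,
    so relevant results ranked higher contribute more."""
    scores = defaultdict(float)
    for _, query, position, relevance in rows:
        scores[query] += relevance / math.log2(position + 1)
    return dict(scores)

def low_quality_fraction(rows):
    """Q2: fraction of queries whose relevance scores are all below 3,
    i.e. whose maximum relevance is < 3."""
    max_rel = defaultdict(int)
    for _, query, _, relevance in rows:
        max_rel[query] = max(max_rel[query], relevance)
    bad = sum(1 for r in max_rel.values() if r < 3)
    return bad / len(max_rel)

rows = [(1, "dog", 1, 5), (2, "dog", 3, 4), (3, "cat", 2, 2)]
print(low_quality_fraction(rows))  # 0.5
```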
 
Table 1: searches
* date (string), date of the search
* search_id (int), the unique identifier of each search
* user_id (int), the unique identifier of the searcher
* age_group (string), ('U30', '30-50', '50+')
* search_query (string), the text of the search query
Sample Rows:
 date | search_id | user_id | age_group | search_query
------------------------------------------------------------------------------------------
'2020-01-01' | 101 | 9991 | 'U30' | 'michael jackson'
'2020-01-01' | 102 | 9991 | 'U30' | 'menlo park'
'2020-01-01' | 103 | 5555 | '30-50' | 'john'
'2020-01-01' | 104 | 1234 | '50+' | 'funny dogs'
 
Table 2: search_results
* date (string), date of the search action
* search_id (int), the unique identifier of each search
* result_id (int), the unique identifier of the result
* result_type (string), ('page', 'event', 'group','person','post', etc.)
* clicked (boolean), did the user click on the result?
Sample Rows:
date | search_id | result_id | result_type | clicked
-----------------------------------------------------------------------------------------
'2020-01-01' | 101 | 1001 | 'page' | TRUE
'2020-01-01' | 101 | 1002 | 'event' | FALSE
'2020-01-01' | 101 | 1003 | 'event' | FALSE
'2020-01-01' | 101 | 1004 | 'group' | FALSE
 Q1: by each age group, how many unique users searched for "john" in the last 7 days?
Q2: what are the top 10 search terms that are most likely to return at least one result about an Event?
36. Fundraiser
user_fundraiser_summary
Columns:
date STRING (format - 2020-01-20)
user_id BIGINT (format - 81238123) - unique identifier for each user on FB
amount_donated BIGINT (format - 1000), amount in dollars
num_fundraiser_viewed INT (format - 1, 2, 3 etc) - total # of fundraisers viewed
num_fundraiser_donated INT (format - 1, 2, 3 etc) - total # of fundraisers donated
--Users don't need to donate to show up in this table but need to have viewed at least one fundraiser
all_fb_users: Daily snapshot of all FB users along with some demographic information and whether they have been active in the last 30 days
Columns:
date STRING (format - 2020-01-20)
user_id BIGINT (format - 81238123) - unique identifier for each user on FB
country STRING (format - US, CA etc) - two letter country code
locale (format - en_US, fr_FR etc) - language code
primary_interface STRING (format - desktop, ios, android, msite) - 4 possible values
30d_active STRING (format - yes or no)
 
Question 1: How many users donated more than $100 in the last 30 days?
Question 2: Broken down by locale, what is
a) the total amount of money donated in the last 30 days
-- assume user's demographic information does not change over day
b) the percentage of 30d active users who viewed a fundraiser in the last 30 days
-- what if user's active status changed over the last 30 days?
c) the percentage of 30d active users who donated to fundraiser in the last 30 days
 
37. Video(watch)
Users {id|country}, 
Videos{userid|videoid|duration|...}
-- 1. find total_watch_time for each country
-- 2. find top three countries with highest total watch time
-- 3. find average of watch time for non-top 3 countries (answer is just one number)
38. Feature
There is a table that tracks every time a user turns a feature on or off, with columns
table: user_id |action ("on" or "off") |date | time
1) How many users turned the feature on today?
-- How many users have ever turned the feature on?
2) Create a table that tracks the user last status every day.
3) In a table that tracks the status of every user every day, how would you add today's data to it?
4) How would you find the users who kept the feature on for an entire day?
39. Video(watch)
Given cus_id and date, generate count, where count is the number of distinct cus_id values that appeared in the past 7 days (excluding the current day):
cus_id | date | count
 c1 | 12/1 | 0
 c1 | 12/2 | 1
 c2 | 12/3 | 1
 c3 | 12/9 | 2
select
 t1.cus_id,
 t1.date,
 count(distinct t2.cus_id) as ct
from table t1
left join table t2
on datediff(t1.date, t2.date) between 1 and 7
group by t1.cus_id, t1.date
-- the left join keeps rows with no prior activity (count 0); the group by was missing
40. LeetCode 71
Given a string path, which is an absolute path (starting with a slash '/') to a file or directory in a Unix-style file system, convert it to the simplified canonical path.
 
 In a Unix-style file system, a period '.' refers to the current directory, a double period '..' refers to the directory up a level, and any multiple consecutive slashes (i.e. '//') are treated as a single slash '/'. For this problem, any other format of periods such as '...' are treated as file/directory names.
 
 The canonical path should have the following format:
 
 The path starts with a single slash '/'.
 Any two directories are separated by a single slash '/'.
 The path does not end with a trailing '/'.
 The path only contains the directories on the path from the root directory to the target file or directory (i.e., no period '.' or double period '..')
 Return the simplified canonical path.
 
  
 
 Example 1:
 
 Input: path = "/home/"
 Output: "/home"
 Explanation: Note that there is no trailing slash after the last directory name.
 Example 2:
 
 Input: path = "/../"
 Output: "/"
 Explanation: Going one level up from the root directory is a no-op, as the root level is the highest level you can go.
 Example 3:
 
 Input: path = "/home//foo/"
 Output: "/home/foo"
 Explanation: In the canonical path, multiple consecutive slashes are replaced by a single one.
  
 
 Constraints:
 
 1 <= path.length <= 3000
 path consists of English letters, digits, period '.', slash '/' or '_'.
 path is a valid absolute Unix path.
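A standard stack-based Python solution: split on '/', skip empty parts and '.', pop on '..', then rejoin from the root:

```python
def simplify_path(path: str) -> str:
    """Canonicalize a Unix-style absolute path with a stack."""
    stack = []
    for part in path.split("/"):
        if part == "" or part == ".":
            continue          # empty parts come from '//'; '.' is a no-op
        if part == "..":
            if stack:
                stack.pop()   # go up one level; no-op at the root
        else:
            stack.append(part)
    return "/" + "/".join(stack)

print(simplify_path("/home/"))       # /home
print(simplify_path("/../"))         # /
print(simplify_path("/home//foo/"))  # /home/foo
```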
41. SQL
Tables: user_network_requests and user_country
user_network_requests
| userid | timestamp | data_center | success |
---------------------------------------------
| 10032 | 15009 | A | 1 |
| 10032 | 15097 | C | 0 |
user_country
| userid | country |
-------------------
| 10032 | US |
1.request failure rate for each data center
2.request failure rate per country
3. for each country, how many users never had a failed request
42. SQL
video call
Table1
Caller | recipient | ds | call_id | duration
123 | 456 | 2019-01-01 | 4325 | 864.4
032 | 789 | 2019-01-01 | 9395 | 263.7
456 | 032 | 2019-01-01 | 0879 | 22.0
Table2
user_id | age_bucket | country | primary_os | dau_flag | ds
123 | 25-34 | us | android | 1 | 2019-07-01
456 | 65+ | gb | ios | 1 | 2019-07-01
789 | 13-17 | fr | ios | 0 | 2019-07-01
032 | 45-54 | eg | android | 1 | 2019-07-01
Q1: Identify the 10 users who initiated the most video calls in the last 30 days.
Q2: What % of daily active users from France were on a video call yesterday?
Hint for Q2: create a temp table with caller and recipient unioned as users could be both caller and recipient.
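Following the hint, a SQLite sketch for Q2: callers and recipients are unioned into one "on a call" set before joining to the DAU table. Sample rows reuse the tables above, and '2019-07-01' stands in for "yesterday":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video_calls (caller INT, recipient INT, ds TEXT,
                          call_id INT, duration REAL);
CREATE TABLE fb_dau (user_id INT, age_bucket TEXT, country TEXT,
                     primary_os TEXT, dau_flag INT, ds TEXT);
INSERT INTO video_calls VALUES (123, 456, '2019-07-01', 4325, 864.4);
INSERT INTO fb_dau VALUES
  (123, '25-34', 'fr', 'android', 1, '2019-07-01'),
  (456, '65+',   'gb', 'ios',     1, '2019-07-01'),
  (789, '13-17', 'fr', 'ios',     1, '2019-07-01');
""")

# Per the hint: union callers and recipients, since a user may be either.
pct = conn.execute("""
    WITH on_call AS (
        SELECT caller AS user_id, ds FROM video_calls
        UNION
        SELECT recipient AS user_id, ds FROM video_calls
    )
    SELECT 100.0 * COUNT(DISTINCT c.user_id) / COUNT(DISTINCT d.user_id)
    FROM fb_dau d
    LEFT JOIN on_call c
      ON d.user_id = c.user_id AND d.ds = c.ds
    WHERE d.country = 'fr' AND d.dau_flag = 1
      AND d.ds = '2019-07-01'   -- stands in for "yesterday"
""").fetchone()[0]
print(pct)  # 50.0
```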