
Spotify Artificial Intelligence Interview Questions

1. ML System Design for a Ranking Problem
2. Designing a Data Pipeline for ML Data Processing
3. Feature Engineering for a Machine Learning Model
4. Evaluating the Reasonableness of a Signal for Predicting User Activity
1. ML System Design for a Ranking Problem
Discuss the engineering decisions involved in a ranking problem, including feature consistency, model serving, model versioning, deployment strategies, preprocessing features, serving embeddings, and model retraining. Be prepared to propose different designs and discuss their trade-offs.
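One common way to approach the feature consistency point is to route both training and serving through a single preprocessing function tied to a versioned spec. The sketch below is only an illustration under that assumption; `FeatureSpec`, `preprocess`, and the field names `play_count` and `is_premium` are hypothetical and do not come from the question.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical versioned description of how raw fields become features."""
    version: str
    play_count_cap: int


def preprocess(raw: dict, spec: FeatureSpec) -> list:
    """Turn one raw record into a feature vector.

    The same function is meant to be called by the training job and by the
    online service, so the two paths cannot drift apart.
    """
    # Clip the count into [0, cap], then compress it with log1p.
    plays = min(max(raw.get("play_count", 0), 0), spec.play_count_cap)
    return [math.log1p(plays), 1.0 if raw.get("is_premium") else 0.0]


SPEC_V1 = FeatureSpec(version="v1", play_count_cap=1000)

record = {"play_count": 5000, "is_premium": True}
training_row = preprocess(record, SPEC_V1)  # offline path
serving_row = preprocess(record, SPEC_V1)   # online path
```

Storing `spec.version` next to each trained model is one way to connect this to model versioning: a serving instance can then refuse to load a model whose spec version it does not implement.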
2. Designing a Data Pipeline for ML Data Processing
How would you design a data pipeline for processing large volumes of data? Discuss the tools you would use (e.g., Spark, HDFS), how you would design the schema, what data you would provide to downstream users, how to make the data more accessible to them (e.g., exporting to a convenient storage layer or publishing to a dashboard), how to handle bot traffic in the table, how to detect data anomalies, and how to optimize for very large data volumes (e.g., using approximation algorithms).
3. Feature Engineering for a Machine Learning Model
Discuss how you would handle the following features for a machine learning model using XGBoost: categorical features (user_region, device_type); high-cardinality features (track_id, top 10 artists); months since registration; registration time in milliseconds for the track and artist; and unique counts of tracks and albums played. Also, consider how to incorporate seasonal features.
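Two of the options a candidate might raise here are frequency encoding for high-cardinality identifiers and a cyclical encoding for seasonality. The sketch below illustrates both under simple assumptions; the helper names are hypothetical, and for tree-based models such as XGBoost a plain integer month may work comparably well, so treat this as one possibility rather than the expected answer.

```python
import math
from collections import Counter


def cyclical_month(month: int) -> tuple:
    """Encode a month (1-12) as a point on the unit circle.

    December and January end up close together, unlike with raw 1..12.
    """
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)


def frequency_encode(values) -> dict:
    """Map each category (e.g. a track_id) to its share of the rows.

    This replaces a very wide one-hot encoding with one numeric column.
    The mapping should be fit on training data only.
    """
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


track_ids = ["t1", "t1", "t2", "t3"]
track_freq = frequency_encode(track_ids)
encoded = [track_freq[t] for t in track_ids]  # [0.5, 0.5, 0.25, 0.25]
```

Categories unseen at training time need a fallback value (for instance 0.0) at serving time, which is worth stating explicitly in an interview answer.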
4. Evaluating the Reasonableness of a Signal for Predicting User Activity
Do you think the signal "if a user listens to music for at least 1 minute in an hour, we consider that hour active" is reasonable for predicting, from historical data, the percentage of hours in which the user will listen to music next week? Should we push back on this signal, and how should we define the target variable to meet the requirements?
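To make the discussion concrete, the target can be computed directly from listening sessions. The sketch below assumes sessions are non-overlapping `(start_second, duration_seconds)` pairs measured from the start of the week; that format, the function names, and the 168-hour denominator are assumptions made for illustration.

```python
from collections import defaultdict

HOUR = 3600  # seconds per hour


def listening_seconds_per_hour(sessions) -> dict:
    """Sum listening seconds per hour index, splitting sessions that
    cross an hour boundary."""
    per_hour = defaultdict(int)
    for start, duration in sessions:
        end = start + duration
        t = start
        while t < end:
            hour_idx = t // HOUR
            hour_end = (hour_idx + 1) * HOUR
            chunk = min(end, hour_end) - t
            per_hour[hour_idx] += chunk
            t += chunk
    return dict(per_hour)


def active_hour_fraction(sessions, total_hours: int = 168,
                         min_seconds: int = 60) -> float:
    """Fraction of hours in the window with at least min_seconds of listening."""
    per_hour = listening_seconds_per_hour(sessions)
    active = sum(1 for seconds in per_hour.values() if seconds >= min_seconds)
    return active / total_hours


# Hour 0 gets 30 + 40 + 10 = 80 s, hour 1 gets 70 s, hour 2 gets 59 s.
sessions = [(0, 30), (100, 40), (3590, 80), (7200, 59)]
per_hour = listening_seconds_per_hour(sessions)
fraction = active_hour_fraction(sessions)  # 2 active hours out of 168
```

Varying `min_seconds` and the denominator (for example, all 168 hours versus waking hours only) shows how sensitive the target is to the threshold, which is a useful input when deciding whether to push back on the definition.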