Spotify AI Interview Experiences: Real Spotify AI Interview Questions and Frequently Asked Questions

1. ML System Design for a Ranking Problem

2. Designing a Data Pipeline for ML Data Processing

3. Feature Engineering for a Machine Learning Model

4. Evaluating the Reasonableness of a Signal for Predicting User Activity

1. ML System Design for a Ranking Problem

Discuss the engineering decisions involved in a ranking problem, including feature consistency, model serving, model versioning, deployment strategies, preprocessing features, serving embeddings, and model retraining. Be prepared to propose different designs and discuss their trade-offs.
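
One way to make the feature consistency and model versioning points concrete is to share a single preprocessing function between training and serving, and to pin each model to the feature spec it was trained with. The sketch below is only an illustration under assumptions: `FeatureSpec`, `ModelArtifact`, the `play_count` and `is_premium` fields, and the linear scorer are all hypothetical and not part of the original question.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    # Hypothetical versioned description of how raw fields become features.
    version: str
    play_count_cap: int


@dataclass(frozen=True)
class ModelArtifact:
    # Hypothetical model record that remembers which spec it was trained on.
    model_version: str
    feature_spec_version: str
    weights: tuple


def preprocess(raw: dict, spec: FeatureSpec) -> list:
    """Single source of truth for feature logic, imported by both the
    offline training job and the online serving path."""
    plays = min(max(raw.get("play_count", 0), 0), spec.play_count_cap)
    is_premium = 1.0 if raw.get("is_premium") else 0.0
    return [math.log1p(plays), is_premium]


def score(raw: dict, model: ModelArtifact, spec: FeatureSpec) -> float:
    """Refuse to serve if the model and the feature spec disagree."""
    if model.feature_spec_version != spec.version:
        raise ValueError(
            f"model {model.model_version} expects feature spec "
            f"{model.feature_spec_version}, got {spec.version}"
        )
    features = preprocess(raw, spec)
    # Stand-in linear scorer; a real ranker would call the trained model here.
    return sum(w * x for w, x in zip(model.weights, features))


SPEC_V1 = FeatureSpec(version="v1", play_count_cap=1000)
MODEL_A = ModelArtifact(model_version="2024-01-01",
                        feature_spec_version="v1",
                        weights=(0.5, 2.0))

request = {"play_count": 5000, "is_premium": True}
print(score(request, MODEL_A, SPEC_V1))
```

The trade-off to discuss is that a shared function removes training/serving skew but couples the two code paths, so a change to `preprocess` requires a new spec version and a retrained model before it can be deployed.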

2. Designing a Data Pipeline for ML Data Processing

How would you design a data pipeline that processes large volumes of data? Discuss which tools you would use (e.g., Spark, HDFS), how you would design the schema, what data you would expose to downstream users, and how you would make that data more accessible (e.g., dumping it to a convenience store or publishing it to a dashboard). Also cover how to handle bot traffic in the table, how to detect data anomalies, and how to optimize for very large data volumes (e.g., with approximation algorithms).
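
As an illustration of the approximation algorithms mentioned above, the following is a minimal sketch of a K-Minimum-Values (KMV) estimator for distinct counts, such as unique listeners per day. It is one possible technique, not one the question prescribes; the class name, the choice of SHA-1 as the hash, and `k=1024` are assumptions. In practice an engine's built-in approximate distinct count would usually be preferred over hand-rolled code.

```python
import hashlib
import heapq


class KMVSketch:
    """Keeps the k smallest hash values seen; memory is O(k) regardless
    of how many items are added."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []        # negated hashes, so the heap root is the largest kept hash
        self._members = set()  # kept hashes, used to ignore duplicates

    @staticmethod
    def _hash01(item: str) -> float:
        # Map the item to a pseudo-uniform value in [0, 1).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item: str) -> None:
        h = self._hash01(item)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


sketch = KMVSketch(k=1024)
for i in range(50_000):
    sketch.add(f"user_{i}")
    sketch.add(f"user_{i}")  # duplicates do not change the estimate
print(round(sketch.estimate()))
```

The relative standard error of this estimator is roughly 1/sqrt(k), so `k` trades memory for accuracy. Sketches built with the same hash can also be merged by keeping the k smallest values of their union, which is what makes the approach useful for partitioned data.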

3. Feature Engineering for a Machine Learning Model

Discuss how you would handle the following features in an XGBoost model: categorical features (user_region, device_type); high-cardinality features (track_id, top 10 artists); months since registration; the registration time, in milliseconds, of the track and of the artist; and unique counts of tracks and albums played. Also consider how to incorporate seasonal features.
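
A few of these transformations can be sketched in plain Python. The helpers below are illustrative assumptions rather than a required answer: frequency encoding is only one option for high-cardinality IDs (target encoding or learned embeddings are others), and the sine/cosine month encoding is one common way to represent seasonality.

```python
import math
from collections import Counter
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def cyclical_month(month: int) -> tuple:
    """Map month 1..12 onto the unit circle so December sits next to January."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)


def frequency_encode(train_values: list) -> dict:
    """Replace each category with its share of the training rows.
    Fit on training data only, to avoid leaking evaluation data."""
    counts = Counter(train_values)
    total = len(train_values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical training column of track IDs.
track_freq = frequency_encode(["t1", "t1", "t2", "t3"])

row = {
    "months_since_registration": months_between(date(2023, 11, 15), date(2024, 2, 1)),
    "track_id_freq": track_freq.get("t1", 0.0),       # known track
    "unseen_track_freq": track_freq.get("t9", 0.0),   # unseen IDs fall back to 0.0
    "month_sin": cyclical_month(12)[0],
    "month_cos": cyclical_month(12)[1],
}
print(row)
```

Because XGBoost splits on thresholds, monotonic rescaling of numeric features such as months since registration has no effect on the splits, so the effort is better spent on encoding the categorical and high-cardinality columns.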

4. Evaluating the Reasonableness of a Signal for Predicting User Activity

Do you think the signal 'If a user listens to music for at least 1 minute in an hour, then we consider that hour active' is reasonable for predicting, from historical data, the percentage of hours in which the user will listen to music next week? Should we push back on this signal, and how should we define the target variable to meet the requirements?
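
To reason about the target, it can help to write the definition down as code. The sketch below computes the fraction of active hours in one week from listening events, using the 1-minute threshold from the question. The event shape `(start_timestamp, listened_ms)` is an assumption, and as a simplification each play is attributed entirely to the hour in which it starts rather than being split across hour boundaries.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ACTIVE_THRESHOLD_MS = 60_000  # "at least 1 minute in an hour"
HOURS_PER_WEEK = 7 * 24


def active_hour_fraction(events, week_start: datetime) -> float:
    """Fraction of the 168 hours starting at week_start in which the
    user listened for at least ACTIVE_THRESHOLD_MS in total."""
    week_end = week_start + timedelta(days=7)
    ms_per_hour = defaultdict(int)
    for started_at, listened_ms in events:
        if week_start <= started_at < week_end:
            # Simplification: credit the whole play to its starting hour.
            bucket = started_at.replace(minute=0, second=0, microsecond=0)
            ms_per_hour[bucket] += listened_ms
    active_hours = sum(
        1 for total in ms_per_hour.values() if total >= ACTIVE_THRESHOLD_MS
    )
    return active_hours / HOURS_PER_WEEK


# Hypothetical events for one user.
events = [
    (datetime(2024, 1, 1, 10, 5), 40_000),
    (datetime(2024, 1, 1, 10, 30), 30_000),  # same hour: 70 s in total, active
    (datetime(2024, 1, 1, 11, 0), 59_999),   # just under the threshold
    (datetime(2024, 1, 2, 9, 0), 60_000),    # exactly at the threshold, active
    (datetime(2024, 1, 9, 0, 0), 120_000),   # outside the week, ignored
]
print(active_hour_fraction(events, datetime(2024, 1, 1)))
```

Writing it this way surfaces the questions worth raising in the interview: how sensitive the target is to the threshold, whether all 168 hours (including typical sleeping hours) belong in the denominator, and how plays that cross an hour boundary should be counted.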