1. Statistical Sampling and Hypothesis Testing
Discuss statistical sampling issues, how to conduct a hypothesis test and the assumptions it relies on, and the concepts of Type I and Type II errors (rejecting a true null hypothesis versus failing to reject a false one).
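As one concrete illustration (the question names no specific test), the sketch below uses a two-sided, two-proportion z-test, such as might be applied to conversion rates in an A/B test. It assumes independent simple random samples and counts large enough for the normal approximation. The helper names `two_proportion_z_test` and `rejection_rate` are hypothetical. The Monte Carlo function makes the error concepts concrete: when both groups share the same true rate, every rejection is a Type I error, and when the rates differ, the rejection rate estimates power, so one minus it estimates the Type II error rate.

```python
import math
import random
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of H0: p_a == p_b using the pooled proportion.

    Assumes independent samples and counts large enough for the normal
    approximation. Returns (z_statistic, two_sided_p_value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Probability of seeing |Z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def rejection_rate(p_a, p_b, n=500, alpha=0.05, trials=1000, seed=0):
    """Monte Carlo estimate of how often the test rejects H0 at level alpha.

    If p_a == p_b, H0 is true, so this estimates the Type I error rate.
    If p_a != p_b, it estimates power; 1 - power is the Type II error rate.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Simulate n Bernoulli trials per group and count successes.
        a = sum(rng.random() < p_a for _ in range(n))
        b = sum(rng.random() < p_b for _ in range(n))
        _, p_value = two_proportion_z_test(a, n, b, n)
        if p_value < alpha:
            rejections += 1
    return rejections / trials
```

With `alpha=0.05`, the simulated Type I error rate should come out close to 0.05, while increasing `n` raises power and so lowers the Type II error rate for a fixed true difference.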
2. Model Variables and Interpretation
Discuss which variables you would choose when building a model, and interpret the results of a given model output.
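For the interpretation part, here is a minimal sketch assuming a single numeric predictor and ordinary least squares. `fit_simple_ols` is a hypothetical helper built only on the standard library; it returns the slope, intercept, and R². The slope reads as the expected change in the outcome per one-unit increase in the predictor, the intercept as the predicted outcome when the predictor is zero, and R² as the share of the outcome's variance explained by the fit. Real model output from a statistics package also reports standard errors and p-values for each coefficient, which this sketch omits.

```python
import math
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # change in y per one-unit change in x
    intercept = y_bar - slope * x_bar    # predicted y when x == 0
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    # Share of y's variance explained by the fit; undefined if y is constant.
    r_squared = 1 - ss_res / ss_tot if ss_tot else math.nan
    return slope, intercept, r_squared
```

For example, the perfectly linear inputs `x = [1, 2, 3, 4]` and `y = [3, 5, 7, 9]` give a slope of 2.0, an intercept of 1.0, and an R² of 1.0.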
3. SQL Query Adjustment for More Information
After completing a SQL task that involved self-joins and multiple WHERE conditions, how would you adjust the query to obtain more information? Consider what additional questions you could explore and what user information might be valuable.
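The original task is not given, so the schema and queries below are hypothetical. They assume a `users` table in which `referred_by` points to another row of the same table, and they use Python's built-in `sqlite3` module with an in-memory database so the example runs without a server. The base query is a self-join filtered by several `WHERE` conditions. The extended version switches to a `LEFT JOIN` with `GROUP BY`, so every user appears even with zero referrals, and adds aggregates that answer follow-up questions such as who refers the most users and how many countries each referrer reaches.

```python
import sqlite3

# Hypothetical schema: each user may have been referred by another user.
SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    country     TEXT,
    signup_date TEXT,            -- ISO-8601 date string
    referred_by INTEGER REFERENCES users(user_id)
);
"""

# Base query: self-join pairing each referred user with their referrer,
# filtered by several WHERE conditions.
BASE_QUERY = """
SELECT u.name AS user_name, r.name AS referrer_name
FROM users AS u
JOIN users AS r ON u.referred_by = r.user_id
WHERE u.country = ? AND u.signup_date >= ?
ORDER BY u.user_id
"""

# Extended query: keep every potential referrer (LEFT JOIN) and aggregate
# over the users they referred.
EXTENDED_QUERY = """
SELECT r.user_id,
       r.name,
       COUNT(u.user_id)          AS referrals,
       COUNT(DISTINCT u.country) AS countries_reached,
       MIN(u.signup_date)        AS first_referral_signup,
       MAX(u.signup_date)        AS last_referral_signup
FROM users AS r
LEFT JOIN users AS u ON u.referred_by = r.user_id
GROUP BY r.user_id, r.name
ORDER BY referrals DESC, r.user_id
"""


def run_queries(rows, country, since):
    """Load rows into an in-memory database and return (base, extended) results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?)", rows)
        base = conn.execute(BASE_QUERY, (country, since)).fetchall()
        extended = conn.execute(EXTENDED_QUERY).fetchall()
        return base, extended
    finally:
        conn.close()
```

If the real table carries other user attributes (signup channel, activity dates, spend), they could be aggregated per referrer in the same way.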
4. Find All Unique IP Addresses from CSV Logs
Given CSV log files of 100 MB each, how would you find all unique IP addresses?
Follow-up 1: How would you handle very large CSV files, for example a few hundred GB each?
Follow-up 2: How would you approach the problem if each CSV file is stored on its own server?
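For files of about 100 MB, a single streaming pass per file is usually enough: read rows one at a time with Python's `csv` module and add each address to a set, so memory grows with the number of distinct IPs rather than with the file size. This sketch assumes each file has a header row with a column named `ip` (a hypothetical name; adjust `ip_column`), and it normalizes values with the standard `ipaddress` module so that different spellings of the same address count once and malformed values are skipped. The function names are illustrative.

```python
import csv
import ipaddress


def unique_ips_from_csv(path, ip_column="ip"):
    """Stream one CSV file row by row and return the set of distinct IPs.

    Assumes a header row containing `ip_column` (hypothetical column name).
    Values that are not valid IPv4/IPv6 addresses are skipped.
    """
    ips = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            raw = (row.get(ip_column) or "").strip()
            try:
                # Normalize so "2001:DB8::1" and "2001:db8::1" count once.
                ips.add(str(ipaddress.ip_address(raw)))
            except ValueError:
                continue  # empty or malformed value
    return ips


def unique_ips_from_many(paths, ip_column="ip"):
    """Union the per-file sets; each file is processed independently."""
    result = set()
    for path in paths:
        result |= unique_ips_from_csv(path, ip_column)
    return result
```

Because each file is handled independently, the same function also fits follow-up 2: run `unique_ips_from_csv` on each server where a file lives and send only the resulting set, usually much smaller than the raw log when addresses repeat, to one machine for a set union. If even the set of distinct addresses outgrows memory, one option is to partition the data on disk by a hash of the IP and deduplicate each partition separately.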
5. Handling Large CSV Files to Extract Unique IP Addresses
Given a CSV file too large to fit in memory, how would you process it to extract the unique IP addresses?
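When the distinct addresses themselves may not fit in memory, one common technique is external hash partitioning, sketched below under the assumption of a hypothetical `ip` header column. A first pass streams the CSV and appends each normalized IP to one of `num_buckets` temporary files chosen by `zlib.crc32` of the address. Because identical addresses always hash to the same bucket, a second pass can deduplicate each bucket independently with an in-memory set, so only one bucket at a time must fit in memory; `num_buckets` would be chosen from the expected data size. The function name and parameters are illustrative.

```python
import csv
import ipaddress
import os
import tempfile
import zlib
from contextlib import ExitStack


def unique_ips_external(csv_path, out_path, ip_column="ip", num_buckets=64):
    """Two-pass external deduplication of IPs from a very large CSV.

    Pass 1 partitions normalized IPs into bucket files by a stable hash.
    Pass 2 deduplicates each bucket in memory and writes it to `out_path`.
    Returns the number of unique IPs written.
    """
    with tempfile.TemporaryDirectory() as tmp:
        bucket_paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(num_buckets)]

        # Pass 1: stream the CSV once, routing each IP to its bucket file.
        with ExitStack() as stack:
            buckets = [stack.enter_context(open(p, "w", encoding="utf-8"))
                       for p in bucket_paths]
            with open(csv_path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    raw = (row.get(ip_column) or "").strip()
                    try:
                        ip = str(ipaddress.ip_address(raw))
                    except ValueError:
                        continue  # skip empty or malformed values
                    idx = zlib.crc32(ip.encode("utf-8")) % num_buckets
                    buckets[idx].write(ip + "\n")

        # Pass 2: every copy of an IP is in one bucket, so per-bucket
        # deduplication yields globally unique results.
        count = 0
        with open(out_path, "w", encoding="utf-8") as out:
            for path in bucket_paths:
                with open(path, encoding="utf-8") as b:
                    seen = {line.rstrip("\n") for line in b}
                for ip in sorted(seen):
                    out.write(ip + "\n")
                count += len(seen)
        return count
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process by default; a stable hash lets separate processes or machines produce buckets that still line up. Distributed frameworks such as MapReduce or Spark rely on the same idea when they shuffle records by key.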