What are the top 10 machine learning questions I ask during a Data Science interview?

Over the past two years of my career, I have been involved in strategy and in recruiting new team members. The role I interview for most often is Data Scientist. Here I summarize the top 10 questions I always ask. In my next few posts, I will give my answers one by one.

  1. What is a logistic regression model? When do you use it?
  2. How do you interpret R^2 and p-value?
  3. Why is a random forest called “random”?
  4. How is a decision tree built? How do you select the next node?
  5. What is bootstrapping / bagging? What is out-of-bag error?
  6. What is a support vector in an SVM?
  7. What is PCA? How do you select the principal components? What are the limitations of PCA? What precautions should you take before applying PCA?
  8. What are the commonly used regularization methods?
  9. What is Bayes’ theorem? What is the assumption?
  10. What will you do to avoid overfitting?

These questions focus on the basics of statistics. I believe it is important to be able to answer them: doing so demonstrates that you not only know how to call a library but also understand why it works. At the same time, I would say that more than half of the candidates who claim to have applied machine learning models successfully cannot give me good answers. Feel free to share your thoughts with me.

As promised, I will give my answers in the next few posts.
