When we think about building a data science team for a business, the first role that comes to mind is the data scientist. But is it enough to have only data scientists? What other roles do we need? And even for the data scientist role, what skills should we look for? Here we only talk about data science teams outside of the companies that create data science algorithms, such as Google, Facebook, or BAT (Baidu, Alibaba, Tencent).
Data scientists specialized in structured data: if your business mainly deals with structured data, these are the people you need. They should know supervised and unsupervised machine learning algorithms, including but not limited to naive Bayes, linear regression, logistic regression, tree-based models, clustering algorithms, fully connected neural networks, and topic modeling methods. With a good understanding of these models, they should be able to pick the right one for a given problem.
Data scientists specialized in unstructured data: if your business mainly deals with unstructured data, such as text or images, these are the people you need. Taking images as an example, their knowledge should cover convolutional neural networks (CNNs), including typical architectures such as ResNet, EfficientNet, DenseNet, and InceptionNet. There are also more basic architectures, such as VGG, which are less used nowadays. If they know these networks, along with the activation functions, loss functions, and optimizers commonly used in neural networks, they should be able to select one or more suitable architectures for the task.
Data engineers: they are programmers who know how to deploy a data science model. This is an important but sometimes undervalued role. They should architect the CI/CD pipeline and work out the best way to expose model results to users. They don't need to know the statistics behind the model, but they do know the programming languages and frameworks, such as Java, Node.js, and React, needed to build the front end and back end. If the model is going to be deployed to the cloud, they should be familiar with the services AWS, Azure, or Google Cloud provide for hosting models.
Lead who understands business needs: it is all about managing stakeholders' expectations! The gap between a business problem and a viable technical solution can sometimes be huge. Translating the business problem into a data science problem, explaining what can and cannot be done, and getting stakeholders' buy-in are just as important as creating the right data science model.
Sometimes one person has several of these skills, which usually happens when that person is more senior. It is also rare to have all of these roles filled at the very beginning, so deciding whom to hire first matters. I will share some thoughts on who should be the first hire in my next post.