By Fritz Lechnitz, Corporate Training Manager
So, you are planning to establish a Data Science Center of Excellence?
If you ask around in the community in general, and in Lucient’s Data Science team in particular, you keep coming up against the following central questions:
- What are the most important guidelines when setting up a Data Science Center of Excellence?
- Which important roles should be filled?
- What distinguishes a good data scientist from an excellent data scientist?
You have already invested in a decent cloud infrastructure. A team of data engineers is also already on board. You may have also already thought about a few research questions, and you are ready to go.
Let us assume that the central goal of data science is to generate information from the correct data using the right algorithms and deliver it to the right people.
A little more concretely: the goal is to detect errors and anomalies more quickly and to secure business decisions through interpreting data instead of relying on one’s intuition.
The main guidelines
What should you look for when building a Data Science Center of Excellence?
Here are 4 essential characteristics of a Data Science Center of Excellence, according to industry experts, including the Lucient Data Science Team.
1. Give preference to generalists
Especially at the beginning, it is quite normal not to know the full range of data science analyses you will need or the exact direction they should take. In this environment, generalists are ideal.
It may be that a generalist needs a little longer initially, but they are usually more flexible in terms of approach, which can be advantageous when first starting out. Generalists are also often more skilled at communicating with stakeholders, which is an enormously important asset, especially in the initial phase.
It is not advisable to look for the “unicorn” that can do everything. One should not expect detailed knowledge in every area. An honest “I don’t know that” shows that there is an awareness of knowledge gaps.
On the other hand, our generalist should at least have sufficient experience and technical expertise in enough areas that it can be leveraged and applied to the overall data science process.
2. Ensure your team can work autonomously
Of course, there are many questions to clarify before our Data Science team can get started. (What data should we work with? What projects should be picked? Business decision support or software for production?) Once all this is answered, the team should be able to act autonomously, as too many outside dependencies can block progress unnecessarily.
To keep the project moving forward, the team should have a broad skill set. A self-sufficient team can work more independently and overcome roadblocks more easily. The most important roles are (we will go into more detail below):
- Data Scientists focus on data analysis and statistics.
- Software Engineers ensure efficient and easy-to-maintain coding.
- Data Engineers manage databases and scalable infrastructures.
- Product Managers specify requirements and coordinate with other teams.
3. Pick the low-hanging fruit
Choose a simple, manageable project over an excessively ambitious one. Fraud detection, for example, is easier to accomplish than answering the question “What makes our customers happy?” In the process, the team has a chance to learn procedures and processes. With a narrow scope, the inevitable “teething problems” won’t have far-reaching ill effects, and corrective measures and refinements are easier to implement quickly.
It is also important, as banal as it sounds, to actually finish the project. By definition, a project has a beginning and an end. Data scientists tend to get bogged down in details and want to keep fine-tuning; after all, there is no theoretical end to the refinements and deeper understanding possible in the world of Data Science. This is where a Product Manager with an overview and decision-making skills is needed. The sooner we complete a project, the sooner we know what we need to improve.
If it must be a big project, it is recommended to divide it into subprojects. If it becomes too complicated, we look for intermediate solutions. Here, Machine Learning can serve as a support for decision making.
4. Set the right expectations
It is not news that companies and organizations are becoming more digital and accumulating large amounts of data. On the one hand, of course, this is beneficial because you can learn from this data.
But it is possible to be overwhelmed by the sheer volume of data. This is where Data Science comes in to give predictions for optimizations and automation. Often, it does not have to be high-end machine learning algorithms at the beginning. In most cases, relatively simple statistical methods are sufficient.
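To illustrate how far a simple statistical method can go, here is a minimal sketch of anomaly detection based on standard deviations (a z-score check). The function name `flag_anomalies`, the threshold, and the `daily_orders` figures are made up for illustration and are not taken from a real project; real data would need a threshold chosen with care.

```python
from statistics import mean, stdev


def flag_anomalies(values, threshold=2.0):
    """Return the indices of values that lie more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily order counts with one obvious outlier (250).
daily_orders = [102, 98, 105, 97, 101, 99, 103, 100, 250, 104]
print(flag_anomalies(daily_orders))
```

A check like this needs no machine learning infrastructure, which makes it a reasonable first step before investing in more elaborate models. Note that a single extreme value also inflates the standard deviation, so on small samples a threshold of 3 may never trigger.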
Data Science projects are difficult to plan. Much happens based on trial and error. But the earlier you start, the faster the workflows and processes mature, and the greater the chance of becoming a data-driven organization.
In any case, as many stakeholders as possible should be involved so that you can explain to them what can be achieved with Data Science in practice. We try to talk to all stakeholders (marketers, managers, engineers) in their language. It may take time to build an operational data infrastructure and see results. That is where good communication is worth its weight in gold.
Roles in the Data Science Team
We have already discussed a few roles in a functioning Data Science Center of Excellence. In discussions with Lucient’s experienced specialists and after reading the relevant literature, the following important roles have emerged.
It should be emphasized at this point that these are not positions that must each be filled by a particular individual. In practice, these competencies are commonly distributed among several people, and a single person may cover multiple roles.
- The Chief Analytics Officer/Chief Data Officer is the technical lead, forms the bridge to the specialist departments, and can be considered a visionary.
- The Data Analyst ensures that the right data is interpreted and analyzed correctly.
- The Business Analyst has the necessary domain expertise, which they bring to data analysis.
- The Data Scientist solves business problems using machine learning and data mining technologies.
- The Machine Learning Engineer focuses on the practical / software (architecture) part of the data model.
- The Data Journalist specializes in telling a story from the acquired data.
- The Data Architect designs a suitable data infrastructure, with a focus on cloud know-how.
- The Data Engineer keeps the infrastructure running.
What distinguishes Data Scientists from really good Data Scientists?
Now that we have briefly examined the most important guidelines for setting up a Data Science Center of Excellence, we would like to turn to the question of what distinguishes Data Scientists from really good Data Scientists.
Willingness to learn
The field of machine learning and AI is advancing rapidly. Especially when you lead a team of data scientists, it is enormously important to keep your finger on the pulse so that you can adopt new technologies at an early stage.
Focus on business impact
Data Science is not l’art pour l’art. We do it to achieve added value for the company. Really good data scientists think entrepreneurially, use resources economically, and do not get bogged down unnecessarily by esoteric or arcane issues that do not provide real business ROI.
Solid software engineering skills
Really good Data Scientists also think like developers and write code that can actually be used in practice.
Expectation management
Really good Data Scientists know that different stakeholders need different approaches, and they know how to communicate with each of them.
At home in the cloud
You will not get very far working offline on your own laptop. Really good data scientists move confidently on platforms such as Azure, AWS or Google Cloud.
Lucient has the experience and know-how to build up a Data Science Center of Excellence and drive it forward through training and project knowledge, and to develop the vision for future tasks and solutions.
We know how to communicate with all stakeholders and keep our finger on the pulse as active community members and in close contact with vendors. At the same time, we keep an eye on economic necessities. We look forward to transforming data into knowledge together with you.