hit counter

How to build an effective DataOps team

A DataOps strategy relies heavily on collaboration as data flows between managers and consumers across the enterprise. Collaboration is essential to DataOps success, so it’s important to start with the right team to drive these initiatives forward.

It’s natural to think of DataOps simply as DevOps for data – not quite. It would be more accurate to say that DataOps is trying to achieve for data what DevOps achieves for coding: a dramatic improvement in productivity and quality. However, DataOps has some other problems to solve, most notably maintaining a mission-critical system in continuous production.

The distinction is important when it comes to assembling a DataOps team. If the DevOps approach is a template, the focus for product managers, scrum masters, and developers will be on delivery. DataOps also needs to focus on ongoing maintenance and requires some other frameworks for it to work with.

A major influence on DataOps has been lean manufacturing techniques. Managers often use terms from the classic Toyota production system, which has been widely studied and imitated. There are also terms like data factory when talking about data pipelines in production.

This approach requires a strong team structure. First, let’s look at some roles within a DataOps team.

Key roles for DataOps

The roles described here apply to a DataOps team using data science in mission-critical production.

What about teams less focused on data science? Do you also need DataOps, for example for a data warehouse? Sure, some of the techniques may be similar, but a traditional team of ETL (Extract, Transform, and Load) developers and data architects will likely work just fine. By its very nature, a data warehouse is less dynamic and more constant than an agile pipeline data environment. The following DataOps team roles cover the slightly more volatile world of pipelines, algorithms, and self-service users.

Still, DataOps techniques are becoming more relevant as data warehouse teams push to be more agile, especially in cloud deployments and data lakehouse architectures.

Let’s start by defining the roles required for these new analysis techniques.

The data scientist

Data scientists do research. If an organization knows what it wants and just needs someone to implement a prediction process, then get a developer who knows algorithms. The data scientist, on the other hand, conducts research for a living and in the process discovers what is relevant and meaningful.

In the course of exploration, a data scientist may try numerous algorithms, often in ensembles of different models. You can even write your own algorithms.

The DataOps team can make the difference between a company that occasionally does cool things with data and one that works efficiently and reliably with data, analytics and insights.

The key attributes for this role are a relentless curiosity and interest in the domain, as well as technical understanding – particularly in statistics – to understand the meaning of what they discover and the impact of their work on the real world.

This care counts. It’s not enough to find a good model and stop there, as business areas evolve quickly. While not everyone works in fields with compelling ethical dilemmas, data scientists in any field sooner or later encounter problems of personal or commercial privacy.

This is a technical role, but don’t overlook the human side, especially if the organization is only hiring a data scientist. A good data scientist is a good communicator, able to explain results to a non-technical audience, often executives, while being clear about what’s possible and what’s not.

Finally, the data scientist, especially when working in a new area for them, is unlikely to know all operational data sources – ERP, CRM, HR systems, etc. – but they certainly have to work with the data. In a well-managed system, they may not have direct access to all of an organization’s raw data. You need to work with other roles that understand the source systems better.

The data engineer

In general, it is the data engineer who moves data between operating systems and the data lake – and from there between zones of the lake such as raw, cleaned and production areas.

The data engineer also supports the data warehouse, which in itself can be a challenging task as they need to maintain history for reporting and analysis while ensuring continuous development.

In the past, the data engineer might have been called a data warehouse architect or an ETL developer, depending on their expertise. but data engineer is the new technical term that better captures the operational focus of the role at DataOps.

The DataOps engineer

Another engineer? Yes, and one focused on operations. But the DataOps engineer has another specialty: assisting the data scientist.

Data scientist skills focus on modeling and deriving insights from data. However, it is common that what works well on the workbench can be difficult or expensive to deploy in production. Sometimes an algorithm on a production dataset runs too slowly, but also consumes too much processing power or disk space to scale effectively. The DataOps engineer helps here by testing, tuning and maintaining models for production.

As part of this, the DataOps engineer knows how to evaluate a model accurately enough over time when data drifts. They also know when to retrain or redesign the model, even if that work falls to the data scientist.

The DataOps engineer ensures models run within budget and resource constraints, which he probably understands better than anyone on the team.

The Data Analyst

In a modern organization, the data analyst can have a wide range of skills, ranging from technical knowledge to aesthetic understanding of visualization to so-called soft skills like collaboration. They are also less likely to have had a lot of technical training compared to a database developer.

Their data ownership—and influence—may depend less on where they sit in the organizational hierarchy and more on their personal engagement and willingness to take ownership of a problem.

There are people in every department. look around Someone is “the data person” who, regardless of job title, knows where the data is, how to interact with it, and how to present it effectively.

To be fair, this role is becoming more formalized today, but there are still a large number of data analysts who have grown into the role from a business rather than a technical background.

The executive sponsor

Is the lead sponsor a member of the team? Maybe not directly, but the team won’t get very far without it. A C-level sponsor can be instrumental in aligning the specific work of a DataOps team with the company’s strategic vision and tactical decisions. You can also ensure the team has budget and resources with long-term goals.

A diagram depicting the different teams, goals, and challenges for DevOps and DataOps
The difference between DevOps and DataOps

Match the team to the organization

Few organizations can or will immediately deploy a team of four or more just for DataOps. The team’s skills and value must grow over time.

So how should a team grow? Who should be the first employee? It all depends on where the organization starts. But there has to be a lead sponsor from day one.

The team is unlikely to start from scratch. Enterprises need DataOps precisely because they already have work in progress that needs to be better operationalized. They may have started exploring DataOps because they have data scientists pushing the boundaries of what they can manage today.

In this case, the first hire should be a DataOps engineer, as their job is to operationalize data science and make it manageable, scalable, and comprehensive enough to be business-critical.

On the other hand, it is possible for a company to have a traditional data warehouse involving data engineers and data analysts downstream from them. In this case, the first position on the DataOps team would be a data scientist for advanced analytics.

An important question is whether to create a formal organization or a virtual team. This is another important reason for the lead sponsor who may have a lot to say in the answer. Many DataOps teams start out as virtual groups that work across organizational boundaries to ensure data and data flow is reliable and trusted.

Whether loosely or tightly organized, these individual disciplines will grow in strength and influence over time, and their strategic direction and use of resources will be related in a consistent framework for exploration and delivery. In this case, the organization can add more engineering for scale and governance, and more scientists and analysts for insights. At this point, wherever the organization began, the team is likely to become more formally organized and recognized.

It’s an exciting process. The DataOps team can make the difference between a company that occasionally does cool things with data and one that works efficiently and reliably with data, analytics and insights.

Leave a Comment