Zeenea Product Recap: A look back at 2023

2023 was another big year for Zeenea. With more than 50 releases and updates to our platform, these past 12 months were filled with lots of new and improved ways to unlock the value of your enterprise data assets. Indeed, our teams consistently work on features that simplify and enhance the daily lives of your data and business teams.

In this article, we’re thrilled to share with you some of our favorite features from 2023 that enabled our customers to:

  • Decrease data search and discovery time
  • Increase Data Steward productivity & efficiency
  • Deliver trusted, secure, and compliant information across the organization
  • Enable end-to-end connectivity with all their data sources

Decrease data search and discovery time

 

One of Zeenea’s core values is simplicity. We strongly believe that data discovery should be quick and easy to accelerate data-driven initiatives across the entire organization.

In fact, many data teams still struggle to find the information they need for a report or use case: either the data is scattered across various sources, files, and spreadsheets, or they are confronted with such an overwhelming amount of information that they don’t know where to begin their search.

In 2023, we designed our platform with simplicity in mind. By providing easy and quick ways to explore data, Zeenea enabled our customers to find, discover, and understand their assets in seconds.

A fresh new look for the Zeenea Explorer

 

One of the first ways our teams wanted to enhance the discovery experience of our customers was by providing a more user-friendly design to our data exploration application, Zeenea Explorer. This redesign included:

New Homepage

 

Our homepage got a brand-new look and feel for a smoother discovery experience. Indeed, for users who don’t yet know exactly what they are looking for, we added new exploration paths directly accessible from the Zeenea Explorer homepage.

 

  • Browsing by Item Type: If users already know the type of asset they are looking for, such as a dataset, visualization, data process, or custom asset, they can go directly to the catalog pre-filtered on that Item Type.
  • Browsing through the Business Glossary: Users can quickly navigate through the enterprise’s Business Glossary by directly accessing the Glossary assets that were defined or imported by stewards in Zeenea Studio.
  • Browsing by Topic: The app enables users to browse through a list of Items that represent a specific theme, use case, or anything else that is relevant to business (more information below).
New Zeenea Explorer Homepage 2023

New Item Detail Pages

 

To help users understand a catalog Item at a glance, one of the first notable changes was the position of the Item’s tabs. The tabs were originally positioned on the left-hand side of the page, which took up a lot of space. Now, the tabs are at the top of the page, more closely reflecting the layout of the Studio app. This new layout allows data consumers to find the most significant information about an Item, such as:

  • The highlighted properties, defined by the Data Steward in the Catalog Design,
  • Associated Glossary terms, to understand the context of the Item,
  • Key people, to quickly reach the contacts that are linked to the Item.

In addition, our new layout allows users to find all fields, metadata, and other related items instantly. Previously divided into three separate tabs, the Item’s description and all related Items now appear in a single “Details” tab. Indeed, depending on the Item Type you are browsing, all fields, inputs & outputs, parent/child Glossary Items, implementations, and other metadata are in the same section, saving you precious data discovery time.

Lastly, the spaces for our graphical components were made larger – users now have more room to see their Item’s lineage, data model, etc.

New Item Detail Page Zeenea Explorer

New Filtering system

 

Zeenea Explorer offers a smart filtering system to contextualize search results. Searches can be narrowed using Zeenea’s preconfigured filters, such as item type, connection, or contact, as well as the organization’s own custom filters. For even more efficient searches, we redesigned our search results page and filtering system:

 

  • Available filters are always visible, making it easier to narrow down the search,
  • By clicking on a search result, an overview panel with more information is always available without losing the context of the search,
  • The filters most relevant to the search are placed at the top of the page, allowing users to quickly get the results they need for specific use cases.
New Filtering System Explorer

Easily browsing the catalog by Topic

 

One major 2023 release was our Topics feature. Indeed, to enable business users to (even more!) quickly find their data assets for their use cases, Data Stewards can easily define Topics in Zeenea Studio. To do so, they simply select the filters in the Catalog that represent a specific theme, use case, or anything else that is relevant to business.

Data teams using Zeenea Explorer can therefore easily and quickly search through the catalog by Topic to reduce their time searching for the information they need. Topics can be directly accessed via the Explorer homepage and the search bar when browsing the catalog.

Browse By Topic Explorer New

Alternative names for Glossary Items for better discovery

 

In order for users to easily find the data and business terms they need for their use cases, Data Stewards can add synonyms, acronyms, and abbreviations for Glossary Items!

Ex: Customer Relationship Management > CRM

Alternative Names Zeenea Studio

Improved search performance

 

Throughout the year, we implemented a significant number of improvements to enhance the efficiency of search. The addition of stop words, encompassing pronouns, articles, and prepositions, ensures more refined and pertinent query results. Moreover, we added an “INFIELD:” operator, giving users the ability to search for Datasets that contain a specific field.

Search In Fields Explorer

Microsoft Teams integration

 

Zeenea also strengthened our communication and collaboration capabilities. Specifically, when a contact is linked to a Microsoft email address, Zeenea now makes it possible to start a direct conversation with them via Teams. This integration allows Teams users to promptly engage with relevant individuals for additional information on specific Items. Other integrations with various tools are in the works. ⭐️

Microsoft Teams Zeenea Explorer

Increase Data Steward productivity & efficiency

 

Our goal at Zeenea is to simplify the lives of data producers so they can efficiently manage, maintain, and enrich the documentation of their enterprise data assets in just a few clicks. Here are some features and enhancements that help them stay organized, focused, and productive.

Automated Datasets Import

 

When importing new Datasets in the Catalog, administrators can turn on our Automatic Import feature which automatically imports new Items after each scheduled inventory. This time-saving enhancement increases operational efficiency, allowing Data Stewards to focus on more strategic tasks rather than the routine import process.

Auto Import Zeenea Studio 2

Orphan Fields Deletion

 

We’ve also added the ability to manage Orphan Fields more effectively. This includes the option to perform bulk deletions of Orphan Fields, accelerating the process of decluttering and organizing the catalog. Alternatively, Stewards can delete a single Orphan Field directly from its detail page, providing a more granular and precise approach to catalog maintenance.

Orphan Field Details

Building reports based on the content of the catalog

 

We added a new section in Zeenea Studio – The Analytics Dashboard – to easily create and build reports based on the content and usage of the organization’s catalog.

Directly on the Analytics Dashboard page, Stewards can view the completion level of their Item Types, including Custom Items. Each Item Type element is clickable to quickly view the Catalog section filtered by the selected Item Type.

For more detailed information on the completion level of a particular Item Type, Stewards can create their own analyses! They select an Item Type and a Property, and can then consult, for each value of that Property, the completion level of the Item’s template, including its description and linked Glossary Items.

New Analytics Dashboard Gif Without Adoption

New look for the Steward Dashboard

 

Zeenea Explorer isn’t the only application that got a makeover! Indeed, to help Data Stewards stay organized, focused, and productive, we redesigned the Dashboard layout to be more intuitive so they can get work done faster. This includes:

 

  • New Perimeter design: A brand new level of personalization when logging in to the Dashboard. The perimeter now extends beyond Dataset completion – it includes all the Items that one is a Curator for, including Fields, Data Processes, Glossary Items, and Custom Items.
  • Watchlists Widget: Just as Data Stewards create Topics for enhanced organization for Explorer users, they can now create Watchlists to facilitate access to Items requiring specific actions. By filtering the catalog with the criteria of their choice, Data Stewards save these preferences as new Watchlists via the “Save filters as” button, and directly access them via the Watchlist widget when logging on to their Dashboard.
  • The Latest Searches widget: Caters specifically to the Data Steward, focusing on their recent searches to enable them to pick up where they left off.
  • The Most Popular Items widget: The most consulted and widely used Items within the Data Steward’s Perimeter by other users. Each Item is clickable, giving instant access to its contents.

 

View the Feature Note

 

New Steward Dashboard Studio

Deliver trusted, secure, and compliant information across the organization

Data Sampling on Datasets

 

For select connections, Data Sampling is available on Datasets. Our Data Sampling capabilities allow users to obtain representative subsets of existing datasets, offering a more efficient approach to working with large volumes of data. With Data Sampling activated, administrators can configure fields to be obfuscated, mitigating the risk of displaying sensitive personal information.

This feature carries significant importance to our customers, as it enables users to save valuable time and resources by working with smaller, yet representative, portions of extensive datasets. This also allows early identification of data issues, thereby enhancing overall data quality and subsequent analyses. Most notably, the capacity to obfuscate fields addresses critical privacy and security concerns, allowing users to engage with anonymized or pseudonymized subsets of sensitive data, ensuring compliance with privacy regulations, and safeguarding against unauthorized access.
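
To illustrate the general idea, here is a minimal conceptual sketch, not Zeenea’s implementation: it samples a hypothetical customer table and masks a configurable list of sensitive fields before they are shown.

```python
# Minimal sketch of the idea behind Data Sampling with field obfuscation.
# This is NOT Zeenea's implementation; the dataset, sample size, and the
# list of fields to obfuscate are hypothetical examples.
import hashlib
import pandas as pd

def sample_with_obfuscation(df: pd.DataFrame, fields_to_obfuscate: list[str],
                            sample_size: int = 100, seed: int = 42) -> pd.DataFrame:
    """Return a representative subset of `df` with sensitive fields masked."""
    sample = df.sample(n=min(sample_size, len(df)), random_state=seed)
    for field in fields_to_obfuscate:
        if field in sample.columns:
            # Replace values with a short, irreversible hash so the column
            # keeps its shape without exposing personal information.
            sample[field] = sample[field].astype(str).map(
                lambda v: hashlib.sha256(v.encode()).hexdigest()[:10]
            )
    return sample

# Example usage with hypothetical customer data:
customers = pd.DataFrame({
    "customer_id": range(1, 1001),
    "email": [f"user{i}@example.com" for i in range(1, 1001)],
    "country": ["FR", "DE", "US", "UK"] * 250,
})
print(sample_with_obfuscation(customers, ["email"], sample_size=5))
```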

Data Sampling Zeenea Studio

Powerful Lineage capabilities

 

In 2022, we made a lot of improvements to our Lineage graph. Not only did we simplify its design and layout, but we also made it possible for users to display only the first level of lineage, expand and close the lineage on demand, and get a highlighted view of the direct lineage of a selected Item.

This year we made other significant UX changes, including the possibility to expand or collapse all lineage levels in one click, hide data processes that don’t have at least one input and one output, and easily view connection names via a tooltip when they are too long to display in full.

However, the most notable release is Field-level lineage! Indeed, it is now possible to retrieve the input and output Fields of tables and reports and, for more context, add the operation’s description. Users can then view their Field-level transformations over time directly in the Data Lineage graph, in both Zeenea Explorer and Zeenea Studio.

Field Level Lineage Zeenea Studio 2

Data Quality Information on Datasets

 

By leveraging GraphQL and knowledge graph technologies, the Zeenea Data Discovery Platform provides a flexible approach to integrating best-of-breed data quality solutions. Datasets are synchronized via simple query and mutation operations from a third-party DQM tool through our Catalog API capabilities. The DQM tool delivers real-time data quality scan results to the corresponding dataset within Zeenea, enabling users to conveniently review data quality insights directly within the catalog.

This new feature includes:

  • A Data Quality tab in your Dataset’s detail pages, where users can view its Quality checks as well as the type, status, description, last execution date, etc.
  • The possibility to view more information on the Dataset’s quality directly in the DQM tool via the “Open dashboard in [Tool Name]” link.
  • A data quality indicator of Datasets directly displayed in the search results and lineage.
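
As a rough illustration of this integration pattern, the sketch below shows how a third-party DQM tool could push a scan result to a dataset through a GraphQL mutation over HTTP. The endpoint, mutation name, and field names are hypothetical placeholders, not Zeenea’s actual schema.

```python
# Minimal sketch: a third-party DQM tool pushing a quality scan result to a
# dataset via a GraphQL mutation. Endpoint and mutation/field names are
# hypothetical; they only illustrate the integration pattern.
import requests

GRAPHQL_URL = "https://your-instance.example.com/api/catalog/graphql"  # hypothetical
API_KEY = "your-api-key"  # hypothetical

MUTATION = """
mutation PublishQualityResult($datasetKey: String!, $check: QualityCheckInput!) {
  publishQualityResult(datasetKey: $datasetKey, check: $check) {
    status
  }
}
"""

variables = {
    "datasetKey": "warehouse/customers",  # hypothetical dataset reference
    "check": {
        "name": "null_rate_email",
        "type": "completeness",
        "status": "PASSED",
        "description": "Share of NULL values in the email column is below 1%",
        "lastExecution": "2023-12-01T08:30:00Z",
    },
}

response = requests.post(
    GRAPHQL_URL,
    json={"query": MUTATION, "variables": variables},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```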

 

View the Feature Note

Zeenea Explorer Data Quality Graph

Enable end-to-end connectivity with all their data sources

 

With Zeenea, you can connect to all your data sources in seconds. Our platform’s built-in scanners and APIs enable organizations to automatically collect, consolidate, and link metadata from their data ecosystem. This year, we made significant enhancements to our connectivity to enable our customers to build a platform that truly represents their data ecosystem.

Catalog Management APIs

 

Recognizing the importance of API integration, Zeenea has developed powerful API capabilities that enable organizations to seamlessly connect and leverage their data catalog within their existing ecosystem.

In 2023, Zeenea developed Catalog APIs, which help Data Stewards with their documentation tasks. These Catalog APIs include:

Query operations to retrieve specific catalog assets: Our API query operations include retrieving a specific asset by its unique reference or by its name and type, as well as retrieving a list of assets for a given connection or Item type. Zeenea’s Catalog APIs offer flexibility when querying, letting users narrow results so they are not overwhelmed with a plethora of information.

Mutation operations to create and update catalog assets: To save even more time when documenting and updating company data, Zeenea’s Catalog APIs enable data producers to easily create, modify, and delete catalog assets. They support the creation, update, and deletion of Custom Items and Data Processes as well as their associated metadata, and the update of Datasets and Data Visualizations. The same is possible for Contacts, which is particularly important when users leave the company or change roles: data producers can easily transfer the information that was linked to one person to another.
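
As a minimal, hypothetical sketch of what scripting against such a GraphQL catalog API can look like (the endpoint, query name, and fields below are illustrative placeholders, not Zeenea’s actual schema), a query by name and type might be issued like this:

```python
# Minimal sketch of querying a GraphQL-based catalog API from a script.
# The endpoint, authentication header, and query/field names below are
# hypothetical placeholders, not Zeenea's actual schema.
import requests

GRAPHQL_URL = "https://your-instance.example.com/api/catalog/graphql"  # hypothetical
API_KEY = "your-api-key"  # hypothetical

# Hypothetical query: retrieve assets by name and type.
QUERY = """
query FindAssets($name: String!, $type: String!) {
  assets(name: $name, type: $type) {
    key
    name
    description
  }
}
"""

response = requests.post(
    GRAPHQL_URL,
    json={"query": QUERY, "variables": {"name": "customers", "type": "Dataset"}},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```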

 

Read the Feature Note

Property & Responsibility Codes management

 

Another feature that was implemented was the ability to add a code to properties & responsibilities so they can easily be used in API scripts for more reliable queries and retrievals.

For all properties and responsibilities that were built in Zeenea (e.g., Personally Identifiable Information) or harvested from connectors, it is possible to modify their name and description to better suit the organization’s context.

Property Responsibility Codes Studio

More than a dozen new connectors added to the list

 

At Zeenea, we develop advanced connectors to automatically synchronize metadata between our data discovery platform and all your sources. This native connectivity saves you the tedious and challenging task of manually finding the data you need for a specific business use case, a task that often requires access to scarce technical resources.

In 2023 alone, we developed over a dozen new connectors! This achievement underscores our agility and proficiency in swiftly integrating with diverse data sources utilized by our customers. By expanding our connectivity options, we aim to empower our customers with greater flexibility and accessibility.

 

View our connectors

The top 5 benefits of data lineage

Do you have the ambition to turn your organization into a data-driven enterprise? You cannot escape the need to accurately map all your data assets, monitor their quality and guarantee their reliability. Data lineage can help you accomplish this mission. Here are some explanations.

To know what data you use, what it means, where it comes from, and how reliable it is throughout its life cycle, you need a holistic view of everything that is likely to transform, modify, or alter it. This is exactly the mission that data lineage fulfills: a data analysis technique that allows you to follow the path of data from its source to its final use. A technique that has many benefits!

Benefit #1: Improved data governance

 

Data governance is a key issue for your business and for ensuring that your data strategy can deliver its full potential. By following the path of data – from its collection to its exploitation – data lineage allows you to understand where it comes from and the transformations it has undergone over time to create a rich and contextualized data ecosystem. This 360° view of your data assets guarantees reliable and quality data governance.

Benefit #2: More reliable, accurate, and quality data

 

As mentioned above, one of the key strengths of data lineage is its ability to trace the origin of data. However, another great benefit is its ability to identify the errors that occur during its transformation and manipulation. Hence, you are able to take measures to not only correct these errors but also ensure that they do not reoccur, ultimately improving the quality of your data assets. A logic of continuous improvement that is particularly effective for the success of your data strategy.

Benefit #3: Quick impact analysis

 

Data lineage accurately identifies data flows, making sure errors never go unnoticed for long. The first phase is based on detailed knowledge of your business processes and your available data sources. Once critical data flows are identified and mapped, it is possible to quickly analyze the potential impacts of a given transformation on data or a business process. With the impacts of each data transformation assessed in real time, you have all the information you need to identify the ways and means to mitigate the consequences. Visibility, traceability, reactivity – data lineage saves you precious time!

Benefit #4: More context to the data

 

As you have probably understood by now, data lineage continuously monitors the course of your data assets. Therefore, beyond the original source of the data, you have full visibility of the transformations that have been applied to the data throughout its journey. This visibility also extends to the use that is made of the data within your various processes or through the applications deployed in your organization. This ultra-precise tracking of the history of interactions with data allows you to give more context to data in order to improve data quality, facilitate analysis and audits, and make more informed decisions based on accurate and complete information.

Benefit #5: Build (even more!) reliable compliance reports

 

The main expectations of successful regulatory compliance are transparency and traceability. This is the core value promise of data lineage. By using data lineage, you have all the cards in your hand to reduce compliance risks, improve data quality, facilitate audits and verifications, and reinforce stakeholders’ confidence in the compliance reports produced.

Breaking down Data Lineage: typologies and granularity

As a concept, Data Lineage seems universal: whatever the sector of activity, any stakeholder in a data-driven organization needs to know the origin (upstream lineage) and the destination (downstream lineage) of the data they are handling or interpreting. And this need has important underlying motives.

For a Data Catalog vendor, the ability to manage Data Lineage is crucial to its offer. As is often the case however, behind a simple and universal question lies a world of complexity that is difficult to grasp. This complexity is partially linked to the heterogeneity of answers that vary from one interlocutor to another in the company.

In this article, we will explain our approach to breaking down data lineage according to the nature of the information sought and its granularity.

 

The typology of Data Lineage: seeking the origin of data

There are many possible answers as to the origin of any given data. Some will want to know the exact formula or semantics of the data. Others will want to know which system(s), application(s), machine(s), or factory it comes from. Some will be interested in the business or operational processes that produced the data. Some will be interested in the entire upstream and downstream technical processing chain. It’s difficult to sort through this maze of considerations!

A layer approach

To structure lineage information, we suggest emulating what is practiced in the field of geo-mapping by distinguishing several superimposable layers. We can identify three:

  • The physical layer, which includes the objects of the information system – applications, systems, databases, data sets, integration or transformation programs, etc.
  • The business layer, which contains the organizational elements – domains, business processes or activities, entities, managers, controls, committees, etc.
  • The semantic layer, which deals with the meaning of the data – calculation formulas, definitions, ontologies, etc.

A focus on the physical layer

The physical layer is the basic canvas on which all the other layers can be anchored. This approach is again similar to what is practiced in geo-mapping: above the physical map, it is possible to superimpose other layers carrying specific information.

The physical layer represents the technical dimension of the lineage; it is materialized by tangible technical artifacts – databases, file systems, integration middleware, BI tools, scripts and programs, etc. In theory, the structure of the physical lineage can be extracted from these systems, and its construction therefore largely automated, which is not generally the case for the other layers.

The following point seems fundamental: for this bottom-up approach to work, the physical lineage must be complete.

This does not mean that the lineage of all physical objects must be available, but for the objects that do have lineage, this lineage must be complete. There are two reasons for this. The first is that a partial (and therefore false) lineage risks misleading the person who consults it, jeopardizing the adoption of the catalog. The second is that the physical layer serves as an anchor for the other layers, which means any shortcomings in its lineage will be propagated.

In addition to this layer-by-layer representation, let’s address another fundamental aspect of lineage: its granularity.

     

Granularity in Data Lineage

When it comes to lineage granularity, we identify 4 distinct levels: values, fields (or columns), datasets, and applications.

The values can be addressed quickly. Their purpose is to track all the steps taken to calculate any particular data (we’re referring to specific values, not the definition of any specific data). For mark-to-model pricing applications, for example, the price lineage must include all raw data (timestamp, vendor, value), the values derived from this raw data, as well as the versions of all algorithms used in the calculation.

Regulatory requirements exist in many fields (banking, finance, insurance, healthcare, pharmaceutical, IoT, etc.), but usually in a very localized way. They are clearly out of the reach of a data catalog, in which it is difficult to imagine managing every data value! Meeting these requirements calls for either a specialized software package or a specific development.

The other three levels deal with metadata and are clearly in the remit of a data catalog. Let’s detail them quickly.

The field level is the most detailed level. It consists of tracing all the steps (at the physical, business, or semantic level) for an item of information in a dataset (table or file), a report, a dashboard, etc., that enable the field in question to be populated.

At the dataset level, the lineage is no longer defined for each field but at the level of the field container, which can be a table in a database, a file in a data lake, an API, etc. On this level, the steps that allow us to populate the dataset as a whole are represented, typically from other datasets (we also find on this level other artifacts such as reports, dashboards, ML models, or even algorithms).

Finally, the application level enables the documentation of the lineage macroscopically, focusing on high-level logical elements in the information system. The term “application” is used here in a generic way to designate a functional grouping of several datasets.

It is of course possible to imagine other levels beyond these three (grouping applications into business domains, for example), but increasing the complexity is more a matter of flow mapping than lineage.

Finally, it is important to keep in mind that each level is intertwined with the level above it. This means the lineage at the higher level can be worked out from the lineage at the lower level (if I know the lineage of all the fields of a dataset, then I can infer the lineage of this dataset).
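
As a minimal illustration of this bottom-up principle (independent of any particular tool), the sketch below derives dataset-level lineage edges from field-level lineage edges, assuming each field is identified by a hypothetical “dataset.field” string:

```python
# Minimal sketch: deriving dataset-level lineage from field-level lineage.
# Assumes field identifiers of the form "dataset.field"; this is an
# illustration of the principle, not any specific catalog's data model.
from collections import defaultdict

# Hypothetical field-level lineage: (source field -> target field) edges.
field_lineage = [
    ("crm_contacts.email", "customers_clean.email"),
    ("crm_contacts.country", "customers_clean.country_code"),
    ("customers_clean.country_code", "sales_report.region"),
]

def dataset_of(field_id: str) -> str:
    """Return the dataset part of a 'dataset.field' identifier."""
    return field_id.split(".", 1)[0]

def infer_dataset_lineage(field_edges):
    """Aggregate field-level edges into dataset-level edges."""
    dataset_edges = defaultdict(set)
    for source_field, target_field in field_edges:
        src, dst = dataset_of(source_field), dataset_of(target_field)
        if src != dst:  # ignore transformations within a single dataset
            dataset_edges[src].add(dst)
    return {src: sorted(dsts) for src, dsts in dataset_edges.items()}

print(infer_dataset_lineage(field_lineage))
# {'crm_contacts': ['customers_clean'], 'customers_clean': ['sales_report']}
```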

We hope that this breakdown of data lineage will help you better understand it for your organization. In a future article, we will share our approach so that each business can derive maximum value from Lineage thanks to our typology / granularity / business matrix.

To learn more about Data Lineage best practices, download our eBook: All you’ve ever wanted to know about Data Lineage!

What is Data Lineage?

In order to access and exploit your data assets on a regular basis, your organization will need to know everything about your data! This includes its origins, its transformations over time, and its overall life cycle. All of this knowledge can be gathered from Data Lineage!

In this article, we will define Data Lineage, give an analogy, and explain its main benefits for data-driven organizations.

After human resources, data has become the most valuable asset for business today.

It is the foundation that links companies, clients, and partners together. Knowing this, data must be preserved and leveraged, as it contains all of an organization’s intelligence.

However, with great information comes great responsibility for those who manage or use this data. On one hand, they must identify the data that reveals strategic insights for the company; on the other, they must apply the right security measures to prevent devastating financial and reputational consequences.

With the arrival of data compliance laws such as BCBS 239 or the GDPR, the person in charge of data compliance (usually the DPO) must put in place transparent conditions to ensure that no data will be exploited to the detriment of a customer.

This is where Data Lineage intervenes. Behind the word lineage lies an essential concept: data traceability. This traceability covers the entire life cycle of the data, from its collection to its use, storage, and preservation over time.

     

How Data Lineage works

As mentioned above, the purpose of Data Lineage is to ensure the absolute traceability of your data assets. This traceability is not limited to knowing the source of a piece of information. It goes much further than that!

To understand the nature of lineage information, let’s use a little analogy.

Imagine that you are dining in a gourmet restaurant. The menu includes dishes with poetic names, composed of many more or less exotic ingredients, some of which are foreign to you. When the waiter brings you your plate, you taste, appreciate, and wonder about the origin of what you are eating.

Depending on your point of view, you will not expect the same answer.

As a fine cuisine enthusiast, you will want to know how the different ingredients were transformed and assembled to obtain the finished product. You will want to know the different steps of preparation, the cooking technique, the duration, the condiments used, the seasoning, etc. In short, you are interested in the most technical aspects of the final preparation: the recipe.

As a controller, you will focus more on the complete supply and processing chain: who the suppliers are, the places and conditions in which the raw products were bred or grown, transport, packaging, cutting and preparation, etc. You will also want to make sure that this supply chain complies with the various labels or appellations that the restaurant owner highlights (origin of ingredients, organic, “home-made”, AOC, AOP, etc.).

Others may focus on the historical and cultural dimensions: from what region or tradition is the dish derived or inspired? When and by whom was it originally created? Others (admittedly rarer) will wonder about the phylogenetic origin of the breed of veal prepared by the chef…

In short, when it comes to gastronomy, the question of origin does not call for a single, homogeneous answer. And the same is true for data.

Indeed, with Data Lineage, you will have access to a real-time data monitoring tool.

Once collected, the data is constantly monitored in order to:

  • detect and monitor any errors in your data processing,
  • manage and continuously monitor all process changes while minimizing the risks of data degradation,
  • manage data migrations,
  • have a 360° view on metadata.

Data Lineage ensures that your data comes from a reliable and controlled source, that the transformations it has undergone are known, monitored, and legitimate, and that it is available in the right place, at the right time, and for the right user.

Acting as a control tool, the main mission of Data Lineage is to validate the accuracy and consistency of your data.

How does it do this? By allowing your employees to conduct research on the entire life cycle of the data, both upstream and downstream, from the source of the data to its final destination, in order to detect and isolate any anomalies and correct them.

     

The main advantages of Data Lineage

The first benefit of Data Lineage has to do with compliance. It helps identify and map all of the data production and exploitation processes and limits your exposure to the risk of non-compliance with personal data regulations.

Data Lineage also facilitates data governance because it provides your company and its employees with a complete repository describing your data flows and metadata. This knowledge is essential to design a 100% operational data architecture.

Data Lineage makes it easier to automate the documentation of your data production flows. So, if you are planning to increase the importance of data in your development strategy, Data Lineage will allow you to save a considerable amount of time in the deployment of projects where data is key.

Finally, the last major benefit of Data Lineage concerns your employees themselves. With data whose origin, quality, and reliability are guaranteed by Data Lineage, they can fully rely on your data flows and base their daily actions on this indispensable asset.

Save time, guarantee the compliance of your data, and make your teams’ work more fluid while taking your company into a new dimension built on an uncompromising data strategy… Don’t wait any longer, get started now!