Data lineage and impact analysis can be applicable in several ways.
- As an ETL Developer:
-
- There are changes in my source system, such as fields which are added, deleted and renamed. What parts of my ETL processes need to adapt? (Impact Analysis)
- I need additional information in my target system, such as for reports. What sources are can provide this additional information? (Data Lineage)
- As a Data Steward:
-
- There is a need for auditability and transparency to determine where data is coming from. A global, company-wide, metadata repository needs data lineage information from different systems and applications, i.e. very fine-grained metadata.
- What elements (fields, tables, etc.) in my ETL processes are never used? How many times is a specific element used in some or all of my ETL processes?
- As a Report/Business User:
-
- Is my data accurate?
- I want to find reports which include specific information from a source, such as a field. This process is "data discovery." For example, are there any data sources which include sales and gender? Are there any reports which include sales and zip codes?
- As a Troubleshooting Operator:
-
- The numbers in the report are wrong (or supposed to be wrong). What processes (transformations, jobs) are involved to help me determine where these numbers are coming from?
- A job or transformation did not finish successfully. What target tables and fields are affected which are used in the reports?
- As an Administrator:
-
- For documentation and auditing purposes, I want to have a report on external sources and target fields, tables, and databases of my ETL processes. I need the data for a specific date and version.
- To ensure compliance, I want to validate naming conventions of artifacts (fields, tables, etc.)
- For integration into third-party data lineage tools, I want a flexible way of exporting the collected data lineage information.