Companies have enormous amounts of data at their disposal, ranging from operational and financial data to benchmarks, social media data, and more. This data is then transformed into comprehensive reports that inform strategic decision-making. To ensure the accuracy and reliability expected of these reports, we must first ensure that the data itself is correct, complete, and in the right format.
Equally important is the freshness of the data in the report: it should always be as recent as the requirements demand. Failing data pipelines directly affect the freshness, and therefore the usability and credibility, of the data in the report.
In reality, however, the integrated data in many data warehouses is of poor quality. This is mainly due to release problems caused by poor delivery quality, corrupt data, and so on. As a result, the data becomes unusable. Data errors that make their way through the pipeline can harm the credibility of the analytics report and undermine fast, correct decision-making. They may even result in financial losses.
Automated Testing
An automated testing approach is preferable to a manual one because it is far more time-efficient and cost-effective. It also keeps the testing process quick, so that poor data quality in the data warehouse is detected in a timely manner.
We have built an automated testing framework and approach based on the open-source Robot Framework.
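To make the three concerns raised above (completeness, format, and freshness) more concrete, here is a minimal Python sketch of what automated checks against a warehouse table could look like. It is an illustration under stated assumptions, not the framework described in this article: the table `sales`, its columns, and all function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse. Since Robot Framework can import a Python module as a library and expose its functions as keywords, checks of this shape could in principle be called from Robot test cases.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# NOTE: table, column, and function names are hypothetical and for illustration only.
# Table and column names are interpolated into SQL, so they must come from trusted
# test definitions, never from user input.


def check_completeness(conn, table, column):
    """Return the number of rows where `column` is NULL (0 means complete)."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def check_date_format(conn, table, column, fmt="%Y-%m-%d"):
    """Return the non-NULL values in `column` that do not parse with `fmt`."""
    query = f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL"
    bad_values = []
    for (value,) in conn.execute(query):
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            bad_values.append(value)
    return bad_values


def check_freshness(conn, table, column, max_age, now=None):
    """Return True if the newest ISO 8601 timestamp in `column` is within `max_age`."""
    now = now or datetime.now(timezone.utc)
    latest = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()[0]
    if latest is None:  # an empty table is never fresh
        return False
    return now - datetime.fromisoformat(latest) <= max_age


def build_demo_db():
    """Create an in-memory stand-in for a warehouse table with two flawed rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (order_date TEXT, amount REAL, loaded_at TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("2024-01-01", 10.0, "2024-01-02T00:00:00+00:00"),
            ("01/02/2024", None, "2024-01-02T06:00:00+00:00"),  # bad date, missing amount
        ],
    )
    return conn
```

Each check returns a plain value (a count, a list of offending values, or a boolean) rather than raising, so the calling test decides what counts as a failure. For example, a test might require `check_completeness(conn, "sales", "amount")` to be zero and `check_freshness(conn, "sales", "loaded_at", timedelta(hours=24))` to be true before a report is published.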