Business Network New York
Companies:51,220
Products and Services:2,879
Articles and publications:31,510 (+4)
Tenders & Vacancies:17

ETL Testing vs. Data Warehouse Testing – Is There a Difference?
Information may not be reliable

ETL Testing vs. Data Warehouse Testing – Is There a Difference?
6/23/2016
The ETL process, as you know, is a data pipeline where the throughput gets altered. Perhaps it is easier to think of it as a manufacturing assembly line.

ETL testing and data warehouse testing are two terms that are often used interchangeably. However, ETL is just one step in the everyday life of a data warehouse (well, okay, it is more likely to be a nightly procedure).  But what more is there to data warehouse testing besides ETL, and is it okay to still refer to the non-ETL part as ETL testing?  First, we have to understand what the ETL process entails.

 

The ETL process, as you know, is a data pipeline where the throughput gets altered.  Perhaps it is easier to think of it as a manufacturing assembly line. Extract (the E in ETL) involves the acquiring of components and getting them into your system.  Transform (the T in ETL) is the alteration of these components into their final form.  Load (the L in ETL) involves delivering the assembled end product, ready to be used for whatever it is to be used for. Like real-life assembly lines, source materials and their quality concerns are known prior to the process and may require inspection, mid-flow allows the progress to be monitored, and the completed end products are eligible for defect inspection.  The only real difference is that ETL is run more like a changeover process to the line, and the operators don’t need to wear protective gear or worry about personal injury.  But let’s get back on track and talk about how this applies to testing.

Logically, functional ETL testing by Qualitest would consist of Extracting, Transforming, and Loading, and non-functional ETL testing would test the performance of those actions. Problems during Load may in fact point to physical problems with hardware devices or physical storage capacity, which has created a push towards cloud-based implementation and its virtues of being virtual. Similarly, there could be load problems during backup recovery, which should exclude the transformation connected to the load process (not to be confused with the ETL load process). But there are more potential problems which fall outside but connected to the actual E, T and L processes.

First of all, before the extraction occurs, the extracting has to come from somewhere.  The variations of the types of those source data somewhere may lay at the root of problems encountered later on. Mid-transformation, a record could be flagged for being incapable of transformation – the question is, is that accurate? And has the business need been properly conceived and programmed?  Next, after the load, testing needs to occur to ensure that everything got loaded right. This post-load checking involves so much data that it is impossible to verify all of the entries – cherry picking sample data records must be involved, and the transformation must be considered when making the comparison. Possible data problems include curtailment, data caching, mistransformation, echoing the same record multiple times, and inability to transform (this should have been flagged during the initial attempted transformation). In addition, there are some problems that are unique to the initial load of the system.

On top of this, there are problems that can occur that are totally outside the extraction, transformation and loading processes. Even if a record is correct after it has been loaded, that does not guarantee that the stored record will not corrupt at a later point. It is also possible that the scheduling software that kicks off the backup and ETL processes may be defective, either in terms of frequency, event kickoff or expected function. The backup procedure may introduce some form of data corruption during transcription, an accidental transformation of data.  Or the backup may have a problem when being resurrected back into the system.

And let's not forget the reporting side. After all, the whole purpose of building a data warehouse is to be able to pull analyzed reports based on the raw data stored within, built in a way that should optimize the speed of your SQL queries and report-building. The scalability of the data warehouse should not pose a performance problem when generating reports; size should not complicate the speed in the way that databases built to optimize CRUD speed may suffer.

Lastly, on the non-functional side, the whole system could suffer from security problems.  The permission accesses may not function as expected.  The stored data, or processes that access the stored data may not have proper security.  Since data warehouses often contain medical records (which require HIPAA protection) and point-of-sale information, proper security may be necessary to prevent access to credit cards numbers, Social Security Numbers, birthdates, home addresses, and personal medical information.

Author Bio:

 

Scott Andery is an expert marketer and author who specialize in software testing tools and resources.

view all (2)

Other articles and publications:

CHOOSING YOUR HOSPITAL TEST AUTOMATION TOOL CHOOSING YOUR HOSPITAL TEST AUTOMATION TOOL
Information may not be reliable
Selenium deficits: It is not designed for testing desktop apps. It does not come with supporting tools, so you need to build out a framework and customize it which is a time drain (for instance..
7/28/2016

Articles and publications of other companies:

How to install avast?
Information may not be reliable
Obtained no Internet, yet required a new antivirus? Don't worry about this, just choose the one, download offline version and put it on your PC. For instance, Avast Offline Installer.
2/5/2020
The benefits of UK EPOS system is experienced by the owners prove how tough these regulations really are.
6/10/2016
Best Tips to Reduce Mobile App Development Cost Best Tips to Reduce Mobile App Development Cost
Information may not be reliable
Today, with technology it is very easy to reach out to people through mobile apps as almost everyone uses mobile devices in their everyday life for different reasons like communication, entertainment.
8/1/2018
The IT sector has benefitted a lot from DevOps innovation. DevOps Development involves creating a single system of systems that incorporate cultural and technical philosophies that regard development.
9/12/2018
Important Practices for DevOps in the Cloud Important Practices for DevOps in the Cloud
Information may not be reliable
Cloud computing and DevOps are two aspects of the technological shift which are completely inseparable.
10/12/2018
In the traditional customer service setup, there were customer service personnel taking calls and responding to customer queries on emails.
12/20/2018
Business details
Wifinotes. com is platform for technology information. You will get research papers, buyers guide and reviews of different tech in the world.
×