ETL testing in QA role?
Hello to all QAs:
I would like to know what exactly is ETL testing? I know the concepts of extracting from source data, transferring to staging tables and then loading them into your final tables. However, what will a QA be involved in? What tools will he use? How will he use them? I have seen the other thread on here but was still unable to understand it. I will really appreciate all the help i can get.
She/he will attempt to find places where the final tables are loaded incorrectly, or too slowly, or by using too many resources, or whatever else the requirements require.
Originally Posted by NaitikT
First off, what Joe said.
Secondly, the ETL process loads the data somewhere. There might be some interesting tests surrounding whether the loaded data is actually usable or not. That may fall on your shoulders. It may fall on the shoulders of someone else on your team. Or, it could fall on the shoulders of a completely separate team.
"The single biggest problem in communication is the illusion that it has taken place."
-George Bernard Shaw, Irish playwright and Nobel Prize winner, 1856-1950
to add more details on Jason and Joes comments
Role of QA will be to validate
A. if target wareshouse schema is implemeted as per design
some techniques to test are
1. Verify all columns defined in schema design are present
2. Analyze data types of each column
3. In case of snowflake schema, verify if the foreign keys are propagated properly
4. How are slowly changing dimensions depicted
B.Verify if correct and complete data is loaded in the warehouse(as compared to source)
Various methods to ensure data completeness are
1. Row count check- Row counts must match between source and target
2. Control total- It is total of sum of any numeric fields which are used in calculations such as account balance, cost etc. Total of source tables is compared with total of target table. If any data has been lost or changed, a mismatch signals an error
3. Hash total- It is a total of several fields of data in a file, fields not normally used in calculations, such as account number. Hash total is calculated for the target and compared with the source. If any data has been lost or changed, a mismatch signals an error
1. Create test data so that every possible condition defined in transformation rule gets validated
2. Validate no data truncation should occur in each column
3. Verify all transformations based on STTM(source to target mapping )
4. In case of incremental loading verify if history tables are correctly populated, and if new data is consistent with old data
another technique is to compare sample data amongst source and target (say top 1000 rows)
D. If exceptions are handled correctly in ETL process
Along with this some non functional requirements can also be tested like
1. This mainly includes monitoring the time required for execution of packages and population of tables
2. Test the warehouse under extreme conditions; try executing procedures in low memory conditions
Infocepts technologies pvt Ltd
Last edited by shrutiC; 03-18-2013 at 11:02 PM.
HI, ETL means Extract the data from the different source systems and transform the data based on bussiness logics and load the data in target systems
What palagiri said: the "T" in ETL stands for Transform, not transfer. In addition to the testing outlined above for the loading process, you'll also be testing the transformations from legacy values to new values.
So for example, if you are ETLing from Business Unit 0001 in legacy to Company Code A0000 in the new system, you'll be wanting to test that all data has been both transformed correctly to the new codes, and also loaded against the new code in the new system.
hey nice information for me,thanks for sharing the information of the ETL testing in QA role. ETL plays the main role in the market and It has been experimental that self-determining Verification and Validation is in advance huge market possible and many companies are now seeing this as prospective business gain.
ETL testing is a unique kind of testing that the customer needs to have it accomplished for their determining and investigation of their business. This is for the most part utilized for the reporting purposes. Case in point if the customers need gives an account of the clients who utilize or go for their item taking into account the day they buy, they need to make utilization of the ETL reports.
Post examination and reporting, this information is information warehoused to an information distribution center where the old authentic business information must be moved.
This is a various level testing as the information from the source is changed into different situations before it achieves the last destination location.