There are four basic ways to get data into your test environment: production sample, starting from scratch, seeding data, or generating it.
If your application is already in production and you need data for regression testing, the most common test-data acquisition technique is to take it from production. Production represents reality, in that it contains the actual situations the software must deal with and it offers both depth and breadth while saving the time required to create new data.
However, something to keep in mind is that capacity in a test environment is rarely the same as production. Therefore, an entire copy of production is not possible and you must ensure you’re getting the right sampling from production (data that will meet various testing needs with different data combinations).
Starting from Scratch:
This approach has the benefit of complete control — the content is always known and can be enhanced or extended over time, preserving prior efforts. Internal cohesion is assured because the software itself creates and maintains the interrelationships, and changes to file structures or record layouts are automatically incorporated.
But reconstructing test data also has its drawbacks.
- Without automation, it’s highly impractical for large-scale applications.
- Some files cannot be created through online interaction: they are system generated only through interfaces or processing cycles.
Because of the above limitations, it may not be possible to start from a truly clean slate.
Seeding test data is a combination of using production files and creating new data with specific conditions. This approach provides a dose of reality tempered by a measure of control.
Using a test data generator tool, automatically generated test data can be used to create databases containing defined cases and conditions as well as enough information to approximate real-world conditions for testing capacity and performance.
If you are doing a large scale response time test based on a database design can support millions of customers or billions of transactions and still deliver acceptable response times, generation may be the only practical means of creating that data.