I am creating automation scripts that handle sample data of up to about 100 employees. But the client is demanding that they run against the live database, which holds far more data (around 10,000 employees). Which is better: running on sample data or on live data?
(I am thinking that running on live data could create unnecessary complications in handling such a large volume of data.)
1) What are you trying to accomplish with these scripts?
Are you automating your functional testing? If so, why would you run them on a live database? Why wouldn't you run them on a test database?
2) Are your scripts constructed in a data-driven manner?
Doing this usually means separating the script actions from the data. And that usually means it doesn't matter whether you have 100 or 10,000 sets of input data - you would still be processing one set at a time.
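To make that concrete, here is a minimal sketch of a data-driven check. The CSV layout, the `validate_employee` rule, and the field names are all assumptions for illustration; the point is that the script logic is identical whether the file holds 100 rows or 10,000.

```python
import csv
import io

def validate_employee(row):
    """Hypothetical per-record rule: non-empty name and a positive salary."""
    return bool(row["name"].strip()) and float(row["salary"]) > 0

def run_checks(csv_text):
    """Apply the same rule to every record; return the row numbers that fail."""
    failures = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        if not validate_employee(row):
            failures.append(i)
    return failures

# A tiny sample set - swapping in a 10,000-row file changes nothing above.
sample = "name,salary\nAlice,50000\n,42000\nBob,-1\n"
print(run_checks(sample))  # rows 2 and 3 fail the rule
```

Because the actions live in `run_checks` and the data lives in the file, scaling the data set is a matter of pointing the script at a bigger input, not rewriting it.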
3) What are your client's expectations?
If they "demand" running with 10,000 employees, are they really asking you for a load test? Or are they just trying to see whether your scripts can handle the wide variety of data in their full set?
For me, the vast majority of my functional testing is performed with smaller test data sets, constructed specifically to exercise the features under test.
Then, as the project nears completion, I usually use some form of the full data set (with personally-identifiable data removed) to ensure the system can handle the size and variety of "real-world" data. I might copy it onto the test system, clean out the identifiers, and keep it as a useful set of test data.
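The "clean out the identifiers" step can be sketched roughly as below. The field names (`name`, `email`, `ssn`) are assumptions; the idea is to replace each PII value with a stable hash so that records still join consistently across tables while the real values are gone.

```python
import hashlib

# Hypothetical list of personally-identifiable fields to scrub.
PII_FIELDS = {"name", "email", "ssn"}

def anonymize(record):
    """Return a copy of the record with PII fields replaced by stable hashes."""
    out = dict(record)
    for field in PII_FIELDS & out.keys():
        # A stable hash keeps the same input mapping to the same token,
        # so relationships between records survive the scrub.
        out[field] = hashlib.sha256(str(out[field]).encode()).hexdigest()[:12]
    return out

emp = {"id": 7, "name": "Alice", "email": "a@example.com", "salary": 50000}
print(anonymize(emp))  # id and salary untouched, name and email hashed
```

This is only a sketch; a real scrub would also need to handle foreign keys, free-text fields, and any PII baked into derived columns.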
(And for many of us, 10,000 would not be "huge" at all.)