Data Mining Tools Can Achieve Results When Used Correctly

Data mining is a very powerful tool, but actuaries need to exercise caution when building insurance risk models because of the potential limitations of data.

“I’ve heard data mining defined as ‘torturing the data until it confesses,’ and there is a lot of truth in that,” featured speaker Dr. John Elder IV, chief scientist, Elder Research Inc., told attendees of the Casualty Actuarial Society (CAS) Predictive Modeling Seminar in Las Vegas, Nev.

“Data mining is very powerful, but it is also very dangerous if you make mistakes. We have to be careful to guard against that,” he said.

Elder went on to outline potential missteps in data mining, focusing on ways it can lead to wrong conclusions.

He observed that the first potential mistake is a lack of data. “You make a mistake if you’re building a model for data and you’re defining relationships from the data, but you actually lack data, or the right kinds of data,” he said.

“You need labeled cases — cases where you have the answer. From studying those cases, you get into the general rules that can apply to similar but new cases,” he explained.

Elder cited the example of a project on contractor fraud that his company completed for the Government Accounting Service.

“There was an assumption that with all of the government contracts out there, there must be a lot of people defrauding the government. But they had only about 12 known examples of fraud out of tens of thousands of contractors.”

Elder noted that it took strenuous effort to go through those few known cases to look for similarities.

“But it did a wonderful thing that data mining is very good at, in terms of the ‘needle in a haystack’ problem. It got rid of 90 percent of the hay with virtually no needles in that hay. We could then focus the attention on the cases that were most likely to include fraud,” he said.
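The “needle in a haystack” screening Elder describes can be pictured as ranking cases by a model score, setting aside the lowest-scoring ones, and then checking how many known fraud cases survive the cut. The Python below is a minimal illustration only: the `screen` function, the 90 percent cut-off parameter and the scores are hypothetical, not data or code from the project.

```python
def screen(cases, discard_fraction=0.9):
    """Rank (score, is_fraud) pairs from highest to lowest score, discard the
    lowest-scoring fraction, and count how many known fraud cases survive."""
    ranked = sorted(cases, key=lambda c: c[0], reverse=True)
    n_discard = round(len(ranked) * discard_fraction)
    kept = ranked[: len(ranked) - n_discard]
    total_fraud = sum(1 for _, is_fraud in cases if is_fraud)
    kept_fraud = sum(1 for _, is_fraud in kept if is_fraud)
    return kept, kept_fraud, total_fraud


# Hypothetical model scores: 45 ordinary contractors plus 3 known fraud cases.
cases = [(s / 100, False) for s in range(1, 91, 2)]
cases += [(0.97, True), (0.95, True), (0.88, True)]

kept, kept_fraud, total_fraud = screen(cases)
print(f"kept {len(kept)} of {len(cases)} cases; "
      f"{kept_fraud} of {total_fraud} known fraud cases retained")
```

With these made-up scores it prints `kept 5 of 48 cases; 3 of 3 known fraud cases retained`. On real data the interesting number is how many known cases are lost as the discarded fraction grows.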

Another mistake is to focus too heavily on training. Elder noted that models have to ‘live’ in the real world, where they must work on new data that is similar to, but different from, the training data.
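The danger of judging a model by its training performance can be shown with a holdout test. The Python sketch below assumes synthetic data (a straight-line trend plus noise, invented for illustration) and compares a hypothetical “memorizer,” which simply recalls the nearest training case, with an ordinary least-squares line. Both are scored on the cases they were built from and on held-out cases they have never seen.

```python
import random


def make_data(n, rng):
    """Hypothetical data: y = 2x plus Gaussian noise with standard deviation 3."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, 2 * x + rng.gauss(0, 3)))
    return points


def fit_memorizer(train):
    """Predicts the y of the nearest training x (1-nearest neighbor), so it
    reproduces every training case exactly."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict


def fit_line(train):
    """Ordinary least-squares straight line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def mse(model, data):
    """Mean squared prediction error."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
data = make_data(1000, rng)
train, holdout = data[:500], data[500:]  # points are i.i.d., so a plain split is random

results = {}
for name, fit in [("memorizer", fit_memorizer), ("straight line", fit_line)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, holdout))
    print(f"{name:13s} train MSE {results[name][0]:6.2f}   "
          f"holdout MSE {results[name][1]:6.2f}")
```

The memorizer looks perfect on its training cases (zero error) yet does worse than the plain line on the holdout cases; only the holdout figure says anything about how a model will fare on new data.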

Another mistake is to rely on a single technique. “There’s a saying, ‘For a little boy with a hammer, all the world’s a nail.’ For the little boy, there are a lot of things that a hammer could be used for,” Elder said. “How many of us are guilty of being an expert with one method and seeing that method as the best?” he asked attendees.

He suggested that actuaries should compare any new method they are introducing to a conventional one. “It’s best to use a handful of tools — not just a hammer, but a whole toolkit,” he said.

Another mistake is to ask the wrong question of the data, which Elder said can result from setting the wrong project goal or the wrong model goal.

“It’s a mistake to listen only to the data. The whole point of data mining is to learn from the data, not to have our preconceived notions dominate the discussion. We need to listen to the data and bring in external expertise,” he said.

Two other key data mining mistakes are discounting pesky cases that could provide the answer, and extrapolating, or going beyond the evidence to draw a conclusion.

According to Elder, it’s also a mistake to answer every inquiry. “There should be some cases for which you don’t know. The uncertainty should be dependent on the question. Allow that. Don’t feel compelled to answer every inquiry,” he observed.
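One simple way to build in the option of not answering is a “reject option”: the model gives a verdict only when its estimated probability is far enough from the middle, and otherwise says “don’t know.” The Python sketch below assumes a model that already outputs a probability of fraud; the function name `decide` and the 0.2 and 0.8 thresholds are hypothetical, and in practice the thresholds would depend on the question and the cost of a wrong answer.

```python
from typing import Optional


def decide(prob_fraud: float, low: float = 0.2, high: float = 0.8) -> Optional[bool]:
    """Return True or False when the model is confident enough, else None.

    The default thresholds are hypothetical; a question where errors are
    costly would call for a wider "don't know" band.
    """
    if not 0.0 <= prob_fraud <= 1.0:
        raise ValueError("prob_fraud must be between 0 and 1")
    if prob_fraud >= high:
        return True
    if prob_fraud <= low:
        return False
    return None  # "don't know": refer the case for review rather than guess


for p in (0.05, 0.50, 0.93):
    print(p, decide(p))
```

Here 0.05 yields `False`, 0.5 yields `None` and 0.93 yields `True`. Widening or narrowing the band between `low` and `high` is how the amount of allowed uncertainty would be tuned to each question.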

Lastly, Elder noted that it is a mistake to believe only the best model. “A model can be useful without being ‘correct’ or explanatory. Don’t pay attention to the particular variables used by the best model; it is best to build a distribution of models,” he said.
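Elder’s warning about the variables chosen by the best model can be illustrated with a bootstrap: refit the “best” model on many resamples of the same data and tally which variable it picks each time. The Python sketch below uses hypothetical data in which two variables, `x1` and `x2`, are noisy measurements of the same underlying driver; the rule “pick the single variable whose one-variable least-squares line fits best” is an assumption chosen for simplicity, not a method from the talk.

```python
import random
import statistics


def sse_one_variable(points, j):
    """Residual sum of squares of a least-squares line of y on feature j."""
    xs = [features[j] for features, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))


def best_variable(points, n_features=2):
    """Index of the single feature whose one-variable model fits best."""
    return min(range(n_features), key=lambda j: sse_one_variable(points, j))


rng = random.Random(7)
# Hypothetical data: x1 and x2 are two noisy measurements of the same
# underlying driver, so they carry essentially the same information about y.
data = []
for _ in range(100):
    driver = rng.gauss(0, 1)
    x1 = driver + rng.gauss(0, 0.3)
    x2 = driver + rng.gauss(0, 0.3)
    data.append(((x1, x2), driver + rng.gauss(0, 1)))

# Refit the "best model" on 500 bootstrap resamples (drawn with replacement).
picks = [0, 0]
for _ in range(500):
    sample = [rng.choice(data) for _ in data]
    picks[best_variable(sample)] += 1

print("best model on the full data uses x%d" % (best_variable(data) + 1))
print(f"bootstrap picks: x1 {picks[0]} times, x2 {picks[1]} times")
```

Because the two variables are nearly interchangeable, the “winning” variable can flip from one resample to the next; the tally of picks across resamples is a simple distribution of models, and it exposes an instability that a single best fit hides.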

Nevertheless, Elder urged actuaries to learn from the challenges in the data mining process. “There is a saying that good judgment comes from experience and experience comes from making mistakes,” he said.

Source: Casualty Actuarial Society