Fishing the Data Pool
(Note - I previously published this article in Marketing News - Jan 1993)
In a field where great emphasis is currently being placed on ways in which qualitative data can be quantified, it may seem unusual to suggest that many "big fish" (valuable marketing "treasures") could be tracked, netted and reeled in using the converse technique: qualitatively exploring quantitative data.
As a marketing researcher, teacher of statistics, and psychologist, I have been consulted on many quantitative studies that were incompletely analyzed. Valuable relationships "swim" in the "pools" (sometimes "oceans") of data formed by previously conducted tracking studies, omnibus studies, and custom quantitative research of all types (Penetration Studies, U & A's, Home Use Tests, Ad Recall and Persuasion Tests, Concept Tests, Diary Panels, etc.). These fish don't swim far below the surface. The problem is that many fishing holes remain untried, and the proper rod, hook and bait are frequently not used.
The research scenario is familiar: a study is designed and information collected to answer a particular marketing problem, or to track results of an ongoing campaign. In either case, the particular statistical analysis to be performed on the data is usually predetermined (frequently not much more than cross-tabs and correlations).
There are good reasons for these procedures. As even many experienced marketers do not realize, conducting numerous unplanned statistical tests can seriously jeopardize the validity of your findings. Every single test carries a certain chance of error, so the more tests you run, the more likely you are to find something that really isn't there (like catching an old tire that seems like a big fish).
There are ways to control for this (e.g., by becoming stricter about how much chance of error you are willing to accept in each test). Nevertheless, once you begin running a lot of unplanned tests, your results become less actionable.
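The following example puts rough numbers on this. Suppose each test uses the conventional 5% error level and every null hypothesis is actually true. The chance of at least one spurious "catch" across k independent tests is then 1 - 0.95^k, which already reaches about 40% at ten tests. The sketch also shows the Bonferroni adjustment, one common way of "becoming stricter": you divide the acceptable error level by the number of tests. It assumes the tests are independent. Real survey tests are usually correlated, so treat the figures as illustrative. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across `num_tests` independent
    tests, assuming every null hypothesis is true (an idealization)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test error level that keeps the family-wise error rate at or below
    `alpha` (Bonferroni correction)."""
    return alpha / num_tests


if __name__ == "__main__":
    for k in (1, 10, 36):  # 36 stands in for "several dozen" tests
        strict = bonferroni_alpha(0.05, k)
        print(
            f"{k:>2} tests: chance of a false catch at 0.05 each = "
            f"{familywise_error_rate(0.05, k):.3f}; "
            f"Bonferroni per-test level = {strict:.5f}"
        )
```

The correction is conservative. It guards against old tires at the cost of letting some real fish slip away, which is part of why exploratory findings are best treated as hypotheses.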
So, most exploration of quantitative data starts and ends with the particular tests and runs designed to solve the particular problem the study was commissioned to solve. Only a few precise spots in the pond/ocean are fished, and only with one particular kind of rod & bait. The rest of the water is left alone because of the fear of mistakenly feeding management a tough old shoe.
But there is another angle: "What's wrong with fishing the whole pond with all the different tools you have, if you admit that's what you're doing?" In other words, once you have answered the questions you set out to answer in a statistically precise and actionable manner, why not continue exploring the data to develop hypotheses and generate new ideas?
You could consider such fishing expeditions simply another weapon in the qualitative research arsenal. The technique could be used to its fullest extent so long as the results were interpreted from a qualitative perspective (as preliminary hypotheses in need of further testing). The benefits are similar to those of all qualitative research: we accept a certain degree of error in exchange for the possibility of a more thorough understanding of the issues.

In focus groups, for example, we concede the possibility of biased results from several sources:
- the influence of group members on each other
- unstructured questioning
- small samples, along with many other factors

Yet we continue to conduct this research because we feel it provides a certain "richness" of information and gives us new ideas; it puts "flesh on the bones". We attempt to minimize the error introduced by the elements mentioned above in three ways:
- precise recruiting and screener development
- moderators skilled in asking unbiased questions
- encouraging individual participants to express their opinions even when they differ from the general consensus of the group

We acknowledge that the results of these sessions will require further testing before we consider them actionable.
Similarly, we can obtain valuable, deeper insight from our quantitative data with the statistical fishing trips discussed above. We can accept a certain degree of error introduced by running numerous tests and violating some statistical assumptions. We can minimize error by tightening the criteria we use for reporting significance, and by avoiding tests which are unlikely to yield practically valuable information. We can acknowledge that the results will require further testing. In effect, qualitative exploration of quantitative research is kind of like doing a focus group with data.
Some examples should illustrate the value of this technique.
Sometimes a whopper can be lurking just under the surface of even the simplest study. A client in the office supply business conducted a large national incidence check to determine the characteristics (length of employment, education, position in the office, etc.) of the person who made decisions on high-ticket items. Simple frequency tabulations and t-tests answered that question with ease. However, the survey also asked which vendor each office preferred. That made it possible to embark on a more in-depth "fishing" expedition to determine which factors predicted vendor preference.
An extremely important finding emerged in a 3-way interaction between city, position in the office, and number of years of employment (several dozen ANOVAs were run to hook this fish). In a particular city it was found that among managers who had held their job for less than X years, the incidence of preference for this particular vendor was extremely small! (Their market share was closer to 50% overall, and not far from that in this city!).
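The sketch below shows the kind of breakdown behind such a finding. It tabulates the vendor-preference rate within each city × position × tenure cell. It is a descriptive cross-tabulation, not an ANOVA. A pattern like the one above appears as a single cell whose rate departs sharply from its neighbors. The record layout, field names, and sample rows are invented for illustration. The tenure cutoff is a parameter; the Article Notes mention a six-year grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Response(NamedTuple):
    """One hypothetical survey record; field names are illustrative only."""
    city: str
    position: str
    years_employed: int
    prefers_client: bool


def tenure_bucket(years: int, cutoff: int) -> str:
    """Collapse a continuous tenure value into two groups at `cutoff` years."""
    return f"<{cutoff}y" if years < cutoff else f">={cutoff}y"


def preference_by_cell(
    responses: Iterable[Response], cutoff: int
) -> Dict[Tuple[str, str, str], Tuple[float, int]]:
    """Return {(city, position, tenure group): (preference rate, cell size)}."""
    counts = defaultdict(lambda: [0, 0])  # [number preferring client, cell total]
    for r in responses:
        key = (r.city, r.position, tenure_bucket(r.years_employed, cutoff))
        counts[key][0] += int(r.prefers_client)
        counts[key][1] += 1
    return {key: (hits / n, n) for key, (hits, n) in counts.items()}


if __name__ == "__main__":
    # Invented rows, not data from the study.
    sample = [
        Response("City A", "manager", 2, False),
        Response("City A", "manager", 4, False),
        Response("City A", "manager", 9, True),
        Response("City A", "clerk", 3, True),
        Response("City B", "manager", 2, True),
    ]
    for cell, (rate, n) in sorted(preference_by_cell(sample, cutoff=6).items()):
        print(cell, f"rate={rate:.2f}", f"n={n}")
```

The function returns each cell's size alongside its rate deliberately. Small cells are exactly where an old tire is most easily mistaken for a big fish.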
The client was very enthusiastic about these findings. Apparently, exactly X years ago the client had almost entirely withdrawn its advertising from that city's newspapers. The client reasoned that it was strongly established there, and that time and money would be better spent on other cities. It supported this position by pointing to its overall market share in that city. However, the results of this "fishing" trip led the client to strongly reconsider this viewpoint and to investigate the matter further.
Another client had a large, ongoing study, and had accumulated thousands of lengthy interviews. A massive expedition was undertaken, which included many multivariate techniques and much data collapsing and recombining. An extensive series of factor analyses revealed that respondents evaluated one of this client's main brands along a different dimension than the five major competitive brands. This dimension represented a strength in the product of which the client had previously been unaware. Factor scores were dichotomized to represent this perceived strength. A C.H.A.I.D. analysis then identified several large segments of users where awareness of the strength was greatest, and several more where awareness was weakest.
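The two mechanical steps in that paragraph can be sketched roughly as follows, under simplifying assumptions:
- Factor scores are split at the median into "perceives the strength" versus not.
- Each candidate segmenting variable is scored by the Pearson chi-square of its cross-tab with that split.

This mimics only the first split of a C.H.A.I.D. tree. The real procedure also merges similar categories, adjusts p-values (Bonferroni) for those merges, and recurses into each resulting segment. Comparing raw chi-square values across predictors with different numbers of categories is also a simplification. The variable names are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Dict, Hashable, List, Sequence


def dichotomize(scores: Sequence[float]) -> List[bool]:
    """True where a respondent's factor score is above the sample median."""
    cut = median(scores)
    return [s > cut for s in scores]


def pearson_chi_square(categories: Sequence[Hashable], target: Sequence[bool]) -> float:
    """Pearson chi-square statistic for the category-by-target cross-tab."""
    n = len(target)
    observed = Counter(zip(categories, target))
    row_totals = Counter(categories)
    col_totals = Counter(target)
    chi2 = 0.0
    for c in row_totals:
        for t in col_totals:
            expected = row_totals[c] * col_totals[t] / n
            chi2 += (observed[(c, t)] - expected) ** 2 / expected
    return chi2


def best_first_split(predictors: Dict[str, Sequence[Hashable]], target: Sequence[bool]) -> str:
    """Name of the predictor most strongly associated with the target (largest chi-square)."""
    return max(predictors, key=lambda name: pearson_chi_square(predictors[name], target))


if __name__ == "__main__":
    # Invented example: four respondents, two candidate segmenting variables.
    aware = dichotomize([0.1, 0.2, 0.9, 1.0])  # -> [False, False, True, True]
    candidates = {
        "region": ["N", "N", "S", "S"],
        "age_group": ["young", "old", "young", "old"],
    }
    print(best_first_split(candidates, aware))  # prints "region"
```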
These segments were then studied further in focus groups. The goal was to understand how awareness of the strength came about, and what language to use to heighten awareness (and increase sales). Although findings like these are very exciting and potentially valuable, it bears repeating that they should be reported as having qualitative validity only, like the results of a focus group. The reason is the potential for introducing error through:
1) running too many statistical tests
2) violating some statistical assumptions
3) deriving findings from small bases

A qualifying statement approximating the following will usually suffice:
"Findings reported herein may be based upon small segments and/or methods of data exploration that may diminish statistical power and generalizability. Although still potentially valuable, they should be considered valid as hypotheses only, and should be subjected to further quantitative testing before being considered actionable."
IN SUMMARY:
When data is collected to solve a particular marketing problem, it frequently contains valuable relationships unrelated to the problem the study was commissioned to solve. The methods needed to uncover these relationships carry extra risk of error. It is therefore probably best to treat such relationships the way one would treat results from qualitative exploration of marketing issues: as hypotheses to be validated at a later date. To return briefly to the fishing analogy, this exploration only hooks the fish. Additional quantitative work is necessary to reel them in and be sure they really are fish and not just driftwood.
These qualitative prizes are more or less routinely ignored. This is surprising because such exploration is usually less expensive than other avenues of qualitative research, and produces results which are at least as valuable (if not more)!
The main practical reason "fishing" is not done more frequently appears to be the unique combination of skills it requires. Modern statistical software greatly reduces the time and energy "fishing" takes. Even so, a good analyst must possess all of the following to explore a database effectively:
- Broad and advanced knowledge of statistics: how to use multivariate techniques, how to combine and create new variables, which techniques to use on what kind of data, how to effectively collapse data, etc. This is like the seasoned fisherman who carries a sophisticated tackle box of hooks, feathers and different kinds of bait. He knows which ones to use to catch the type of fish he wants.
- Extensive knowledge of marketing in general and of the category he or she is working in, in particular. Different kinds of fish reside in different waters, and different kinds of marketing insights reside in different portions of a database. The good statistical fisherman will know not only what kinds of insights (fish) he is after, but will also have a sense of:
  - Where the good fishing spots may be: or, what relationships to examine.
  - How much time to spend there: how many tests to run.
  - Which fish are too small and should be thrown back: what relationships are not practically valuable although perhaps statistically significant.
  - How to track the movement of a school of fish: where to look next once an important finding is uncovered.
- Excellent computer skills: so that the above can be implemented with relative ease, without extensive, time-consuming back-and-forth between the researcher and his/her M.I.S. department. This is like the fisherman's ability to cast, hook, and reel in the catch without expending unnecessary time and effort.
- The personality traits of patience, perseverance, and stamina (just as with the fisherman).
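The point about fish that are too small is easy to make concrete. With a large enough sample, even a trivial difference becomes "statistically significant." The sketch below uses a standard two-proportion z-test, with a pooled standard error and a two-sided p-value computed from the complementary error function. It compares the same one-point difference in vendor preference at two sample sizes. The counts are invented for illustration, and the function name is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z(
    successes_a: int, n_a: int, successes_b: int, n_b: int
) -> Tuple[float, float, float]:
    """Return (difference in proportions, z statistic, two-sided p-value)
    for a pooled two-proportion z-test."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_a - p_b, z, p_value


if __name__ == "__main__":
    # Same 51% vs. 50% split, very different sample sizes (invented counts).
    for label, counts in [("small", (51, 100, 50, 100)), ("huge", (25500, 50000, 25000, 50000))]:
        diff, z, p = two_proportion_z(*counts)
        print(f"{label}: difference={diff:.3f}, z={z:.2f}, p={p:.4f}")
```

Only the huge sample crosses the usual significance threshold, yet the one-point difference is identical in both cases. Whether that difference matters is a marketing judgment, not a statistical one.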
As statistical software becomes more powerful and easier to use, and as more researchers become familiar with advanced statistical techniques, these obstacles will diminish. Until then, those juicy morsels will remain hidden under the surface of vast data pools. Only the rare researcher and/or fisherman who makes a special trip to find them will discover them. Enough of this now; I'd better get some sleep. I'm going fishing in the morning!
Article Notes
Some dramatic license has been taken and facts altered slightly to protect confidential client information.
It is noteworthy that the particular number of years referred to here did not exist as a predefined category within the survey. C.H.A.I.D. analyses performed earlier in this fishing expedition indicated that a grouping of six years was predictive of other variables in that city, so the variable was collapsed as such.