May 01, 2003

Conducting Sensory Research with Children

Sensory testing with children can provide insight into their likes and dislikes, and sensory professionals need to use methods appropriate for different age groups.

RICHARD POPPER, JEFFREY KROLL

Food Technology Magazine

There is no argument that kids have become one of the largest markets in many parts of the world. While it may be difficult to put a finger on any precise amount, the purchasing influence of kids has been estimated at $300 billion in the United States alone. Food and beverages represent as much as 60% of that impressive youth market. Small wonder that food and beverage manufacturers are continuously scrambling to discover what tickles the palates of kids and teens.

Children’s Behavior, Likes, and Dislikes
Today’s kids have more choices and are more in control of their diet than ever before. Parents, in many cases, are more than ready to give in to what their kids want, especially as their kids grow older.

For the child, the power to choose can be confusing and conflicting. A choice might be made to express real personal preference, to exercise control of themselves and their environment, or to be viewed as older and more mature. Urbick (2002) suggests that another key driver of a child’s behavior can be the craving to create excitement and stretch boundaries. He says the stated desire of youths is to push the limits and avoid “the same old thing.”

While children’s need for self-expression and marketers’ wish to capture their attention are important variables in kids’ food choices, early childhood reveals a more basic interplay of nature and nurture in the development of food likes and dislikes.

Some aspects of food preference are innate. We are born with a liking for sweet and an aversion to bitter. The fact that these preferences are hardwired in our brains makes evolutionary sense—in scavenging for food, our ancestral forebears would have been guided by their food preferences to seek foods high in caloric energy (such as sweet fruits) and avoid bitter, potentially poisonous plants. Sour tastes are also rejected by newborns—they grimace when tasting sour substances. A genetic disposition towards liking salt has not been so clearly established—newborns are indifferent to salt, but infants at four months show a liking for moderate levels of salt, possibly the result of a natural maturation process.

While basic tastes such as sweetness and bitterness may be intrinsically pleasant and unpleasant, preferences for specific foods are largely learned. The diversity of world cuisines attests to the role of culture and environment in shaping what we like to eat (Rozin, 1984). Exposure, in and by itself, plays a key role in the acquisition of food preferences. We like to eat foods with which we are familiar. Conversely, unfamiliar foods are often rejected, a culturally universal phenomenon referred to as food neophobia (Pliner, 1982).

Zajonc (1968) was the first to identify the “mere exposure effect” across a variety of domains, demonstrating that repeated exposure to a stimulus (such as a sound or shape) can, by itself, enhance liking. In a study on the food preferences of young children, Birch (1979) found that familiarity was a key factor in explaining what foods children liked, and Birch and Marlin (1982) experimentally demonstrated the “exposure effect” in the development of food preferences. In the latter study, the authors observed the change in food preferences among a group of two-year-olds as they were exposed to unfamiliar foods over a period of several weeks. How often the child experienced a food determined how much the child eventually liked it. Children who were given frequent opportunities to taste a food over the course of the study grew to like it, unlike those kids who were offered the same food less frequently.

The importance of familiarity has several implications for companies trying to introduce new food products to the market. The fact that repeated exposure is likely to build acceptance (rather than breed contempt) means that ensuring repeated exposure can be a key marketing strategy in introducing a new food product. Urbick attributes Procter & Gamble’s successful introduction of Sunny Delight into the UK market to its decision to offer quantities of free product, giving families the opportunity to try the product lots of times.

The role of exposure in developing food preferences also has implications for how novel foods are tested. Most sensory research protocols expose a child only once to a novel food, usually providing only a small sample, not even a representative portion of the product. Urbick has pointed out that new foods may require repeated testing to accurately assess the product’s true potential.

Meanwhile, marketers should be conscious of the need to balance novelty and innovation (something likely to give them a competitive edge) with children’s propensity to prefer the familiar. Urbick suggests that by combining familiar and unfamiliar elements in the same product, marketers may be able to achieve both goals. The flavor may be familiar, but the product’s color may be unexpected (e.g., green ketchup) and the packaging innovative.

The role of peers has often been noted as a key influence on what children like. A study by Birch (1980) provides experimental evidence for the role of peers in children’s food preferences. According to Birch’s research, three- to five-year-old children will change their preferences depending on what they see other children eat, e.g., choosing vegetables that they initially didn’t like after seeing other children eat them. This behavioral change was not just the result of momentary peer pressure. The shift in food choice was also reflected in liking ratings collected weeks after the experiment and in the absence of any peers, suggesting that the change was relatively long lasting and reflected a true change in preference.

Again, there are implications of peer influence for marketers as well as sensory researchers. The fact that children influence other children, even at a very young age, suggests that finding ways to leverage peer influence can be an important element in growing the market for a new product. For example, as part of a grassroots marketing campaign, a company might enlist cool kids to champion their products.

The researcher must decide how to handle the potential for interactions among children. The purist approach, often advocated by researchers, is to minimize such interactions in order to obtain an unbiased opinion. While the merits of this approach are self-evident to many, Urbick, who is a proponent of testing in schools, believes that capturing peer influence in a study can be advantageous and indicative of the real world. Probably all would agree that peer influences in a research setting must be carefully managed. Hemingway (2002) points out that testing in school environments poses many challenges, not the least of which is that friendships and other social structures may not be immediately apparent to the researcher and may unduly influence the study results.

Designing Research with Children
It is always important, in designing research with children, to ask what information we desire to obtain from children and what information children are capable of providing. The Swiss psychologist Jean Piaget is well known for his description of the stages of a child’s cognitive and linguistic development. Gollick (2002) describes some of the limitations of children that may affect their ability to answer research questions at any particular age. Young children, for example, have difficulty with concept formation (e.g., sweetness) and classification (e.g., like/dislike). Even when they understand the principles, their attention span may limit their ability to perform the task. For example, 3½-year-old children can understand a standardized sorting task, but only about half the children may have the attention span to remember the assignment and successfully complete the task.

“Seriation,” the ability to rank things in order of magnitude, is not fully mastered until age seven, according to Gollick, and this has implications for the reliability of any scaling results from younger children. In addition, children have limited memory skills, which may affect their ability to remember a succession of flavors presented for evaluation in a sensory test.

Young children also have limited linguistic skills, which will affect their ability to understand directions, and can have difficulty with the abstract nature of symbols or pictures. For example, children may respond to pictures, such as smiley faces often used in children’s hedonic scales, based on what they show (a happy person), rather than based on what they are supposed to represent (how the food makes you feel). Gollick also notes the difficulty that children under six have in attending to more than one aspect of a situation at one time. For example, when viewing two rows with an equal number of pennies, young children will easily see that the number of pennies is the same, provided the pennies are perfectly lined up. However, when one row of pennies is then spread out, children judge the row with the spread-out pennies to have more pennies, showing an inability to separate numerosity from linear extent in making their judgment. In judging foods, young children may attend to one dimension at the expense of another, unlike older children or adults, who may base their reaction on a simultaneous consideration of multiple aspects.

In school-age children, reasoning ability, memory, and language skills are more mature and allow for more complex tasks. However, there is tremendous variation in skills among children of the same age. Gollick’s experience with cognitive testing has shown that the age at which 10% of children can master a particular task, compared to the age at which 90% of children can do so, varies by as much as four years. Thus, assumptions regarding what a particular age group can do are often going to be true only approximately, and researchers need to take into account the considerable variation in children’s abilities, even at similar ages.

Given the cognitive and linguistic limitations of children at any given age, it is not surprising that research on children has focused attention on what test methods are most appropriate for different ages (see review by Guinard, 2001). The younger the age group, the more challenging it is to devise valid, reliable test methods. Therefore, when products are expected to appeal to a wide age range, it is often convenient to test older children (above age 12), who need far fewer special considerations, relative to adults, than younger children do. However, when the target age for the product is specifically younger children, it may not be appropriate to focus on the older age group.

The taste preferences of newborns and infants have been studied using behavioral measures (e.g., Beauchamp and Moran, 1984). Using a procedure adapted from the baby food industry, Bovell-Benjamin et al. (1999) obtained data on food preferences of infants and toddlers by asking mothers to interpret the behavior of their child as the child tasted the food. Mothers rated their child’s reaction on a traditional (adult-version) 9-point hedonic scale. Using this methodology, the authors were able to draw conclusions regarding the relative acceptability of different fortifications added to porridge.

Testing of children age three and above allows for more-direct methods. Since consumer testing with kids is mostly concerned with measuring a child’s liking of a product, it is especially important to know what the most appropriate hedonic methods are for testing with kids. Kroll (1990) introduced a liking scale for testing children age five and older. It is similar to the traditional 9-point hedonic scale, except that the verbal anchors associated with the scale are more child-friendly—instead of using terms such as “like extremely” and “dislike extremely,” for example, it employs the terms “super good” and “super bad.” This so-called Peryam and Kroll (P&K) scale, now widely used in the industry, was shown to perform better than the adult scale (and better than a variety of alternative children-oriented scales) in determining liking among children.

While older children may be able to use verbal scales effectively (even if they may require some interviewer assistance), there is still no consensus on how best to test pre-schoolers. Pictorial scales continue to be popular in testing pre-literate children, based on the rationale that pre-schoolers cannot read and may not fully understand complex words or phrases but can more accurately deal with facial expressions. Besides, pictures are entertaining and are thought to inspire closer attention to the task. Kroll (1990), however, found that a 9-point face scale actually discriminated less well than the 9-point “super good/super bad” scale.

One reason that face scales may be difficult for children is that the faces themselves can be ambiguous. In her discussion of pictorial scales, Cooper (2002) makes the point that a face intended to show a degree of “dislike” can be interpreted by a child as saying “I am angry,” whereas the child might dislike the food without feeling any anger toward it. Cooper has found that the eyes and the mouth are particularly important to the interpretation of the facial expression and, unless carefully chosen, are more likely than other elements to lead to misinterpretations of the scale.

Despite some of the difficulties with pictorial scales, Cooper has had success in applying them, and even extending them to measuring not just acceptability but also sensory intensity. Working in conjunction with a graphic artist, she has devised scales for measuring level of fruit or chocolate flavor, as well as more conceptual attributes, such as stickiness.

Cooper also reports on her research on devising pictorial scales for testing in different cultures. Working with Asian and Pacific Island groups of children, she has attempted to create culturally relevant acceptance scales. She notes the difficulty of judging cultural relevance from a Western perspective, and the importance of developing the scale from within a culture, piloting it to ensure appropriateness. Certain expressions are appropriate in some cultures, but not others (e.g., showing the tongue to indicate liking is inappropriate for Thai and Malay children).

Two Preschooler Methods Studied
Popper et al. (2002) compared two different methods for measuring liking among preschoolers. They chose not to include a face scale because of some of its potential problems, but instead chose ranking and rating as two methods for eliciting liking judgments. They also assessed the effect the interviewer has on preschoolers’ responses. Preschool-age children are pre-literate and must by necessity be interviewed one-on-one. Typically, research personnel (usually female) serve as interviewers. The study compared using the child’s mother as interviewer to using a female interviewer who was not familiar to the child. On the one hand, children might feel more comfortable with their Mom than with an unfamiliar interviewer, making it possible for them to better focus on the task. Children might also feel more at ease telling Mom they don’t like something, as compared to a researcher, who they might feel expects them to like what they taste. On the other hand, Mom is untrained at interviewing and could introduce her own biases into the test.

The study involved three different formulations of a powdered orange drink, formulated with 0, 30, or 100% of the recommended sugar level. The respondents were 206 children age 3–5, about an equal number of boys and girls.

There were two interviewer conditions—children were either interviewed by their Mom or by a trained female interviewer (a member of Peryam and Kroll’s research staff). When interviewed by the research staff, Mom was not present in the room with her child. When Mom acted as interviewer, she did not taste the products herself, only her child did.

In each interviewer condition, a child evaluated all three orange drinks, using two different procedures. The order of procedures was alternated across children, and children were never explicitly told that the three formulations were repeated.

In the ranking procedure, children first tasted all three products and selected the one they liked best, which was then set aside. Children then retasted the remaining two samples, and selected the one they liked better. The third sample, by default, was the one they liked least.
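The ranking procedure just described amounts to repeatedly asking for the favorite among the samples still on the table. A minimal Python sketch of that logic follows. The function name and the `pick_favorite` callback (a stand-in for the child's answer) are illustrative assumptions, not part of the study protocol.

```python
def rank_by_successive_choice(samples, pick_favorite):
    """Order samples from most to least liked by repeatedly asking for the favorite.

    `pick_favorite` is a hypothetical callback standing in for the child's
    choice: it receives the samples still on the table and returns the one
    liked best.
    """
    remaining = list(samples)
    ranking = []
    while len(remaining) > 1:
        choice = pick_favorite(remaining)
        if choice not in remaining:
            raise ValueError(f"{choice!r} is not among the remaining samples")
        ranking.append(choice)
        remaining.remove(choice)  # set the chosen sample aside
    ranking.extend(remaining)  # the last sample is the least liked by default
    return ranking


# Usage: a stand-in "child" who always prefers the higher sugar level (in %).
sugar_levels = [0, 30, 100]
ranking = rank_by_successive_choice(sugar_levels, pick_favorite=max)
print(ranking)  # [100, 30, 0]
```

With three samples the child makes only two choices, each among a shrinking set, which keeps the memory and attention demands of the task low.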

The other procedure was a rating task using a bifurcated 5-point scale. In this procedure, the child was first asked if the sample was “good” or “bad,” and, depending on the answer, was then asked whether the sample was “really good” (or “really bad”) or “just a little good” (or “just a little bad”). If the child had trouble committing to whether the sample was good or bad, the answer was recorded as “neither.” Only a few children had trouble expressing whether they liked or disliked a sample—even the three-year-olds were quite outspoken and opinionated.
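The two-step questioning behind the bifurcated scale can be sketched as a small lookup in Python. The article states only that 5 corresponds to “really good”; the other numeric codes below (4, 3, 2, 1) and the function name are assumptions made for illustration.

```python
def bifurcated_rating(good_or_bad, how_much=None):
    """Convert the two-step answers into a 1-5 liking score.

    good_or_bad: "good", "bad", or None when the child cannot commit.
    how_much: "really" or "a little" (ignored when good_or_bad is None).
    The numeric coding is assumed, apart from 5 = "really good".
    """
    if good_or_bad is None:
        return 3  # recorded as "neither"
    scores = {
        ("good", "really"): 5,
        ("good", "a little"): 4,
        ("bad", "a little"): 2,
        ("bad", "really"): 1,
    }
    try:
        return scores[(good_or_bad, how_much)]
    except KeyError:
        raise ValueError(
            f"unexpected answers: {good_or_bad!r}, {how_much!r}"
        ) from None


# Usage: first question answered "good", follow-up answered "really".
print(bifurcated_rating("good", "really"))  # 5
print(bifurcated_rating(None))  # 3
```

Each question offers the child only two options, yet the combined answers yield a five-level score.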

A comparison of the test methods showed that the ranking and rating procedures produced very similar results—both methods showed significant differences in liking among all three formulations. Not surprisingly, children liked the sweetest formulation the most, regardless of interviewer condition or test procedure.

Perhaps the most interesting finding concerned the effect of the interviewer. The chart on p. 64 shows the difference in liking ratings when Mom was the interviewer compared to when the unfamiliar researcher was the interviewer. When Mom did the interviewing, the average ratings for the three formulations spanned a larger range than when the researcher did the interviewing, and the differences were more likely to be statistically significant. This effect varied by age—the benefit of Mom as interviewer was evident at ages three and four, but was largely absent by age five. The ranking data showed similar interviewer effects.
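One simple way to express “spanned a larger range” is the difference between the highest and lowest mean rating within an interviewer condition. The Python sketch below computes that span. The ratings in it are invented purely to show the calculation and are not the study's data, and the function name is hypothetical.

```python
from statistics import mean


def rating_span(ratings_by_formulation):
    """Return the highest mean rating minus the lowest mean rating."""
    means = [mean(values) for values in ratings_by_formulation.values()]
    return max(means) - min(means)


# Invented example ratings on the 1-5 scale, keyed by sugar level.
# These are NOT results from Popper et al. (2002).
mom_condition = {"0%": [1, 2, 2], "30%": [3, 3, 4], "100%": [5, 5, 4]}
researcher_condition = {"0%": [3, 3, 2], "30%": [3, 4, 3], "100%": [4, 4, 5]}

print(rating_span(mom_condition) > rating_span(researcher_condition))  # True
```

The span is only a descriptive summary. Whether the differences among formulations are statistically significant requires a separate test that this sketch does not attempt.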

While this study showed that using Mom as the interviewer can increase the sensitivity of the test, especially when working with three-year-olds, it is important to keep in mind that this study posed little risk that Mom could introduce her own biases about the products tested. All formulations looked the same, and Mom was intentionally asked not to taste the samples. In situations when the appearance of the samples might suggest something about the quality or nutrient content of the foods, or when brand information is provided as part of the test, Mom’s role in the interview would need to be reevaluated.

Simple methods of scaling, such as the ranking procedure or the bifurcated liking scale, appear to work well with pre-schoolers. Children age five to seven are probably capable of longer scales (7- or 9-point), as some studies have demonstrated (Kroll, 1990; Kimmel et al., 1994). At that age, kids will still require assistance, and Mom is probably best for that purpose. With children over eight, the test can be self-administered, although the research staff will on occasion need to assist the children.

Differences Between Kids and Adults
There are indications (see Guinard, 2001) that sensory thresholds are higher in children than in adults, suggesting reduced sensory sensitivity. Zandstra and de Graaf (1998) report that in children age 6–12, the perception of sweetness increases less rapidly with increasing sucrose concentration than in older children and adults, although the same was not found with regard to the perception of sourness in response to changes in citric acid. Of course, any determination of threshold sensitivity and supra-threshold perception is complicated by the fact that measuring sensory perception in children is difficult: differences between adults and children in sensory perception may reflect, in part at least, differences in how children interpret the questions they are asked and in how they use the intensity scales on which the research is based.

How do children and adults differ in their preferences? It is well documented that children prefer sweeter foods than adults, a difference which seems to diminish in mid- to late adolescence. Also, as many parents of young children know, and as Urbick (2002) reminds us, children like simple, smooth textures. Even the appearance of bits (e.g., specks, flecks) can turn young children off.

While the prevailing influence of culture is bound to lead to many broad similarities between the food preferences of adults and kids (Pliner, 1983; Rozin and Millman, 1987), it is also the case that the process of optimizing a product for children often results in a formulation that differs from one that is optimal for adults—and from one that adults think would be optimal for children (Moskowitz, 1994).

Other Sensory Research Methods
Swaney-Stueve (2002) undertook the challenge of attempting to conduct descriptive analysis with children. Using six different brands of creamy peanut butter, she trained four sensory panels differing in the age of the panelists: a panel of fourth graders (ages 9–10), eighth graders (ages 13–14), high school students (ages 16–18), and a college-age panel (ages 18–22). She then compared their results with those she obtained using an experienced adult panel (ages 24–56).

[Figure caption: Effect of interviewer (Mom vs. Peryam & Kroll researcher) on children’s liking ratings (5 = “really good”) for powdered orange drinks differing in sugar concentration. Means sharing a common letter are not significantly different from one another (p < 0.05). From Popper et al. (2002).]

Surprisingly, even the fourth graders performed exceptionally well throughout the process. The fourth-grade, eighth-grade, and high school panels generated many terms similar to those generated by the college-age and experienced adult panels, and all panels were quite consistent in how they rated the differences in appearance, flavor, and texture among the products (although the fourth-grade, eighth-grade and high school panels tended to agree more with each other than with the adult panels). In some respects, among the three children’s panels, the fourth graders were the most consistent and reliable.

Since the products were perceived similarly across different ages, the results support the current practice of using adults for the descriptive analysis of children’s products. It is uncertain how generalizable this similarity of adults and kids is—as noted above, children’s sensory perceptions differ in certain respects from those of adults, and children may differ from adults in the importance they place on various product dimensions (e.g., attending to appearance more than flavor). These age differences may not always yield to panel training and the use of reference standards.

But Swaney-Stueve’s results are also noteworthy because they show that even 10-year-olds are capable of performing sensory tasks that are cognitively quite demanding. Crucial to the success of such tasks is providing training and keeping the children engaged. Urbick makes a similar point with reference to conducting his creative discovery work with children—keeping kids involved in the process and excited about working on a real project are key elements of success.

In addition to quantitative testing, qualitative research with children is also a very important tool in product research. Hemingway (2002) identifies several qualitative techniques that she has found to work well with children, such as drawing (where/how the child figures in the picture), collages (children group material provided to them into montages), concept boards (identify which icons are cool or gross for each age), and storytelling (the child tells a story or finishes one).

Importance of Test Conditions
Making sensory testing with children successful requires not only using age-appropriate measurement scales, instructions, and questionnaire wording, but also making the child feel comfortable in the research environment (e.g., by using Moms as interviewers with preschoolers). The research staff needs to create a friendly and inviting atmosphere. Things to avoid, according to Hemingway, are an authoritarian style, criticism of comments that kids make, and hands-up or other classroom-style behavior (especially in a qualitative research context). Hemingway (2002) and others (e.g., Kimmel et al., 1994) have argued for the merits of warm-up exercises as a way of introducing children to the research methods employed.

Several researchers have mentioned the importance of time of day in children’s research. According to Gollick (2002), depending on the time of day, a child’s IQ score on a standardized test can vary by as much as 30 points. Urbick (2002) advocates conducting consumer tests with children in the morning, when kids are most alert, and avoiding the after-school hours, when children are mentally tired and need unstructured playtime and a chance to be physically active.

Another aspect of time of day relevant to sensory testing is whether it is appropriate to test morning foods (e.g., breakfast cereals) at times other than the morning, and lunch/dinner foods at times other than around lunch or in the afternoon. Birch et al. (1984) showed that children as young as three years old have already learned to categorize foods as “for breakfast” or “for dinner.” In their study, both adults and children were asked to taste breakfast and dinner foods at two different times of day, and showed a significant preference shift for foods with time of day, with breakfast items being more preferred when tasted in the morning than in the afternoon, and dinner items more preferred in the afternoon than in the morning (the shifts, however, were larger for adults than for children).

The issue of how time of day affects test results is interesting and deserves further research in children. In contrast to the results of Birch et al. (1984), Kramer et al. (1992), working with adult subjects, did not find an effect of time of day on liking ratings or food intake, when testing breakfast and lunch foods at different times of day. It has also yet to be determined whether time of day affects the relative differences in liking among similar formulations of a product (e.g., several versions of the same breakfast cereal), in addition to affecting the level of liking for very different foods (e.g., cereal vs pizza). Most sensory tests are concerned with relative differences among similar products, and the appropriateness of time of day may be less of a factor in that situation.

Time of day is only one context that may be important to consider. Urbick (2002) notes that for kids, products are a “holistic experience” and that kids are less able than adults to respond meaningfully to a product that is lacking its real-world context, such as packaging, concept, or brand identity. Urbick recommends testing a product in a form “as close to the real thing” as possible. For example, if the product is yogurt in a tube, Urbick says the sensory testing should be done using the tube. Sensory researchers are prone to think about the product itself, but, especially for kids, success of a new product may depend not just on how it tastes but also on how it handles, its play value, and the image it projects (e.g., whether it satisfies kids’ aspiration to be viewed as older).

More Research Needed
Kroll (1990) noted that while sensory testing with children was becoming increasingly important to the food industry, sensory research itself was not keeping pace with the need. “Testing with children is in an embryonic stage,” she wrote. “Over the years, a few sensory researchers have considered the problems involved in applying their science to this special population, but for the most part the field has been static. The need for serious investigation is pointed up by how little research has been done in this area.”

Some of Kroll’s concerns remain true today. Researchers have taken up the challenge during the ensuing dozen years, and several have been the focus of this article. But given the size of the market and the potential that reliable kid testing has for the food industry, there is still a need for more research to help maximize the insights that research with children is able to provide.

Richard Popper and Jeffrey J. Kroll
Author Popper, a Professional Member of IFT, is Vice President, Research and Development, Peryam & Kroll Research Corp., 3033 W. Parker Rd. Suite 217, Plano, TX 75023. Author Kroll is Executive Vice President, Peryam & Kroll Research Corp., 6323 N. Avondale Ave., Chicago, IL 60631. Send reprint requests to author Popper.

References

Beauchamp, G.K. and Moran, M. 1984. Acceptance of sweet and salty tastes in 2-year-old children. Appetite 5: 291-305.

Birch, L.L. 1979. Dimensions of preschool children’s food preferences. J. Nutr. Educ. 11: 91-95.

Birch, L.L. 1980. Effect of peer models’ food choices and eating behaviors on preschoolers’ food preferences. Child Development 51: 489-496.

Birch, L.L. and Marlin, D.W. 1982. I don’t like it; I never tried it: Effects of exposure on two-year-old children’s food preferences. Appetite 3: 353-360.

Birch, L.L., Billman, J., and Richards, S.S. 1984. Time of day influences food acceptability. Appetite 5: 109-116.

Bovell-Benjamin, A.C., Allen, L.H., and Guinard, J.X. 1999. Toddlers’ acceptance of whole maize meal porridge fortified with ferrous bisglycinate. Food Qual. Preference 10: 123-128.

Cooper, H. 2002. Designing successful diagnostic scales for children. Presented at Ann. Mtg., Inst. of Food Technologists, Anaheim, Calif., June 15–19.

Gollick, M. 2002. Asking kids questions: Possible pitfalls. Presented at Ann. Mtg., Inst. of Food Technologists, Anaheim, Calif., June 15–19.

Guinard, J.X. 2001. Sensory and consumer testing with children. Trends Food Sci. Technol. 11: 273-283.

Hemingway, M. 2002. Effective techniques for consumer research in a challenging market. Presented at Ann. Mtg., Inst. of Food Technologists, Anaheim, Calif., June 15–19.

Kimmel, S.A., Sigman-Grant, M., and Guinard, J.X. 1994. Sensory testing with young children. Food Technol. 48 (3): 92-99.

Kramer, F.M., Rock, K., and Engell, D. 1992. Effects of time of day and appropriateness on food intake and hedonic ratings at morning and midday. Appetite 18: 1-13.

Kroll, B.J. 1990. Evaluating rating scales for sensory testing with children. Food Technol. 44 (11): 78-80, 82, 84, 86.

Moskowitz, H.R. 1994. “Food Concepts and Products: Just-in-Time Development.” Food and Nutrition Press, Trumbull, Conn.

Pliner, P. 1982. The effects of mere exposure on liking for edible substances. Appetite 3: 283-290.

Pliner, P. 1983. Family resemblance in food preferences. J. Nutr. Educ. 15: 137-140.

Popper, R., Schraidt, M., and Kroll, J. 2002. Testing with pre-school children: The effect of the interviewer. Presented at Ann. Mtg., Inst. of Food Technologists, Anaheim, Calif., June 15–19.

Rozin, P. 1984. The acquisition of food habits and preferences. In “A Handbook of Health Enhancement and Disease Prevention,” ed. J.D. Matarazzo, S.M. Weiss, J.A. Herd, and N.E. Miller, pp. 590-607. Wiley, New York.

Rozin, P. and Millman, L. 1987. Family environment, not heredity, accounts for family resemblance in food preferences and attitudes: A twin study. Appetite 8: 125-134.

Swaney-Stueve, M. 2002. Can children perform descriptive analysis? Presented at Ann. Mtg., Inst. of Food Technologists, Anaheim, Calif., June 15–19.

Urbick, B. 2002. Kids have great taste: An update to sensory work with children. Presented at Ann. Mtg., Inst. of Food Technologists, Anaheim, Calif., June 15–19.

Zajonc, R.B. 1968. Attitudinal effects of mere exposure. J. Personality Social Psychol. 9(Part 2): 1-27.

Zandstra, E.H. and de Graaf, C. 1998. Sensory perception and pleasantness of orange beverages from childhood to old age. Food Qual. Preference 9: 5-12.