Lies, Damned Lies, and Surveys:
Ten Things You Need to Know Before Believing, or Publishing, a Survey
by Lee Gruenfeld (VP Strategic Initiatives, Support.com)
(Originally published on Medium.com)
Let me say this up front, so there’s no ambiguity about my position: Surveys are among the most unreliable “seemingly reliable” information sources in existence. Like the Bible or the U.S. Constitution, they can be manipulated to “prove” almost anything. To say that you should take survey results with a grain of salt is a towering understatement.
And yet, we in industry rely on survey results to guide nearly every facet of our businesses, including how we market, sell, develop, strategize, compete and serve our customers.
There’s nothing inherently wrong with the survey process. The problems are a) how we design surveys and gather information, and b) how we read surveys undertaken by others.
In the case of those carrying out surveys, the problem subdivides into two categories of issues: intentional vs. inadvertent deception. I can’t help you with the first — if you’re out to mislead people, you’re not going to be interested in the rest of this piece — but I’m hopeful that I can help you avert unintentional misdirection.
In the case of those who rely on surveys, here’s a simple rule whose merit will become clearer as you proceed through this polemic: Don’t ever believe a survey unless you’ve examined the questions that were asked and understand how the survey was conducted.
Here at my company (Support.com), we’ve commissioned several major industry surveys over the past year, and we went to great pains to make sure that any information we published would be truthful, reliable and, therefore, usable. To back that up, we make the survey forms we used freely available, so that any potential users of the results could assess for themselves our methodological rigor.
Here, in no particular order, are ten ways a survey can go horribly wrong. The list isn’t complete, but it should be enough to help you ascertain the veracity of a study before you make the decision to act on the results.
1. Responder self-perception bias: A major problem with surveys is that we’re asking people to self-report. Unfortunately, people are people, and they behave like people, which means that they’re subject to the slings and arrows of human psychology in all its glorious chaos. We’re not getting facts when we ask people questions; we’re getting what people think, and what they think departs from reality with alarming frequency. Ask people what they read and eat, and they’ll tell you the New Yorker and bean sprouts. Watch what they read and eat? It’s People and Twinkies.
People will often respond with how they think they should respond rather than with the truth, usually because they won’t admit it to themselves. People often honestly don’t know what drives them. They think they do, but sometimes they’re just plain wrong, and this type of bias has led to all kinds of bad operational processes in the support industry.
A true story to illustrate: When I was with Deloitte, two partners of mine were returning from Cleveland to Southern California late at night, on two different airlines, during a bad weather incident that caused both their flights to be canceled. When Partner A went to get rebooked, he was treated with open hostility by the airline agent, who made it clear that he was interrupting her more important work, and who practically threw his new ticket at him when she was finished, saying, “You’ll be arriving two hours later than your original flight, but you’ll just have to live with it,” then held him in an “I dare you to say anything” glare before he went to find his new gate.
Partner B’s experience was different. The agent expressed great regret at the inconvenience, sent someone to get him some coffee, rolled up her sleeves and spent half an hour trying to find him another flight. She eventually called in a favor from a friend at a competing airline, set up two flights with a change in Salt Lake City, and proudly announced that he’d be home the next morning in time to get to the office by 9:00.
Both of these travelers wrote follow-up letters to the airline. Which of them do you think said he would fly this airline every single time even if he had to go through Istanbul to get from New York to Chicago, and which swore that he wouldn’t fly the same airline again if they had the last escape flight from the re-eruption of Krakatoa? Easy answer, right?
I exaggerated a little on the contents of the letters, but this story is a true one. Now I’m going to make up something by way of illustration.
The day before their flights, both of these guys were surveyed by the airlines. One of the questions was this one:
What’s most important to you if a flight is canceled or delayed?
1. Making me feel like a valued customer, nurturing me, being sensitive to my needs and feelings and making me feel personally cared for.
2. Getting me re-booked and home as quickly as possible.
Knowing that Big Four partners, like surgeons and fighter pilots, believe themselves to be uber-rational Masters of the Universe, you’ve already guessed that they both were insulted by #1 and checked off #2, right? You’d check off the same answer yourself.
Fact is, you’re all wrong. #1 is the correct answer, as it is in almost every customer service situation. It certainly was in the above example. The guy who got home only two hours late was absolutely livid. The one who traveled all night long and had to change planes thought the airline had pulled out all the stops and treated him splendidly.
(As an aside: The problem with “as quickly as possible” answers is that there is no way to measure that. Was two hours late “as quickly as possible?” Twelve hours? There’s no baseline of comparison, so the only thing you can ever get a sense of is how hard the provider tried.)
Tech support users will self-report that getting the problem fixed “as quickly as possible” is all that matters, but it actually matters very little. Consumers don’t care how long a support call takes so long as they’re treated well. They don’t even mind if it takes another call to get the problem resolved. I once spent three months trying to get an Apple ‘@me.com’ problem resolved. It never was, and the service was eventually dropped altogether, but the way they handled the situation, and handled me, was so superb that it deepened my loyalty to the brand.
The lesson here is that “What’s important to you?” questions are almost always worse than useless. They presume that the responder knows, and they don’t. What makes this type of question especially superfluous is that the people conducting the survey generally know the answer anyway. If you’ve been in the support business for any length of time, you already know what matters to your customers, better than they do. That’s why the agents with the highest customer satisfaction scores are not necessarily the ones with the lowest handling times and the best first-call resolution rates.
2. Sampling bias: This has to do with making sure you’re surveying an appropriate group of people.
In the support world, it’s a common survey system rule not to call on someone who was surveyed more recently than some interval of time, say a month or two. This makes sense, since you don’t want to bother your customers too frequently. It’s also common practice not to survey someone who’s just had a very negative support experience, since it’s well known that this tends to anger people further.
The problem is, these policies skew post-support survey results in a positive direction. The first rule makes it impossible to include someone who’s had two recent support incidents and is therefore more likely to be unhappy about yet another one. He’s also more likely to report negatively on a subsequent incident than he would had the first not occurred.
The second rule — declining to follow up on the worst experiences — is far more obvious in its effect.
That’s only one example of sampling bias. A recent survey asked customers how important the support experience was in creating brand loyalty. The population sample included people who’d never had a support episode as well as those who had. Good practice would have dictated asking these two groups two different questions, one about how they thought their brand loyalty might be affected, and the other about how it was affected.
If you don’t sample correctly, you can’t generalize the results beyond the group of people who responded to the survey, thereby defeating the fundamental purpose of surveying.
3. Set-up bias: This occurs when a question or series of questions is phrased in such a way as to incline the responder in a certain direction. I saw this deployed in a particularly egregious manner at a major New York bank. Management had reason to believe that the customer service department wasn’t as good as its (self-administered) surveys indicated. Among the questions were these two, the first separated from the second by a couple of intervening questions:
On a scale of one to ten, how important is it for the phone to be answered promptly?
How would you rate our customer service department, based on how fast we answer the phone?
Nobody is going to say that they don’t want the phone answered quickly, and explicitly saying so sets them up for the second question that comes later. As it happened, this bank’s customer service department was wildly overstaffed, at great expense, so answering the phone quickly was one of the (few) things they were able to do well, and they got high scores for it.
Which brings us to another kind of bias.
4. Wiring the question: The second question, above (How would you rate our customer service department, based on how fast we answer the phone?) is a perfect example of “wiring,” framing the question to engineer a positive response. One of my favorites comes from an old political survey I once received that had this question:
Do you think that [Party A presidential candidate] should be given the chance to straighten out the economic mess left by [Party B incumbent]?
There are really only two possible responses to this question. One is “Yes,” and the other is not to respond at all. How do you say, No, I don’t think the candidate should be given a chance?
The subtle beauty of this question was only made manifest when the results were eventually published and it was reported that 87% of respondents believed that the Party B guy had created an economic mess. Logically, this is a legitimate conclusion, because for anyone to answer the question, it’s necessary that they first buy into the premise.
Furthermore, it was reported elsewhere in the same article that 87% of respondents felt that the candidate should be given a chance to straighten it out. The beauty part is that everyone with the presence of mind not to answer such a wired question self-selected out of the poll. So while it’s true that 87% of the people who answered the question answered it “Yes,” it’s quite possible that many of the people surveyed never answered the question at all.
Not answering means that they were alert to another type of wiring, predicate baiting.
5. Superfluous predicates: As seen above, this occurs when a marginally relevant but utterly unnecessary premise is added to a question, and then the premise is cited as part of the results, or worse, obscures the results into uselessness. Here’s another example:
“Given the skyrocketing popularity of new devices that make everyone’s life easier and more enjoyable, do you plan to buy a home automation device in the coming twelve months?”
Superfluous (or leading) predicates can also confuse questions to the point of incomprehensibility. History’s greatest example of predicate confusion occurs in the Second Amendment to the United States Constitution. The Amendment is often quoted like this: “The right of the people to keep and bear arms shall not be infringed.” Seems pretty straightforward, and the gun rights advocates cite this phrase frequently.
Problem is, the phrase has a predicate: “A well-regulated militia, being necessary to the security of a free state.” What this predicate presents is the basis for the oft-cited phrase. In other words, the right of the people to bear arms is predicated on the necessity of a militia. So does that mean that the right to bear arms extends to the militia only, and not the people at large? Or is the right absolute, the militia part being explanatory only? And where does “well-regulated” enter in?
I’m not going to weigh in with an opinion, since the argument has been raging for over two hundred years and is patently irresolvable, the Second Amendment being about the worst-articulated (and grammatically atrocious) rule of law in the entire canon of American regulation. The point is that, for a survey question to be valid, it needs to be free of superfluous predicates that are confusing at best, misleading or downright deceptive at worst.
6. Belief vs. Behavior: This problem arises frequently when the topics of security and privacy are on the table. It came up during a recent conference where I participated in a panel on this issue. The moderators kicked it off with a question along these lines: “We know from our research that consumers are deeply concerned about privacy and security. What are we as an industry doing to address these concerns?” A brisk discussion ensued, centered around how we’re going to deal with the fears and anxieties of consumers.
Here’s the problem: When it comes to privacy, we ask a lot of questions about what people feel. What we should be asking is, what are they doing about it? Because the answer to that is that, in real life, they’re doing very little.
If you ask me about terrorism, I’ll tell you that I’m deeply concerned about it, and that’s a truthful answer. I’ve thought about it, written about it, and it worries me a great deal.
What do I actually do about it? Nothing. Does it affect my behavior? Not in the slightest. Do I factor it in when making travel decisions? Nope.
Consumers who report great concern about security and privacy generally do nothing about it, either. They don’t do backups, their passwords are the same for every gated site they visit (and they’re usually “12345” or “password1”), they haven’t changed them since the earth cooled, they open email attachments, they type their credit card numbers into insecure sites, and last year they spent $15.5 million helping a deposed royal family move funds out of Nigeria.
40 million accounts were stolen when Target was hacked, yet the stock is now trading more than 25% higher than it was before the breach. Consumers just plain didn’t care — there was no reason to, since none of them lost so much as a dime — but a phenomenon called “socially desirable responding” makes it difficult for them to confess their apathy, despite headlines in the popular media about how shocked they were. (For some really eye-opening insight into “socially desirable responding,” check out a new study in the Journal of Consumer Research. According to one of the authors: “The tendency of people to portray themselves in a more favorable light than their thoughts or actions…is a problem that affects the validity of statistics and surveys worldwide.”)
So don’t ask them how they feel when what you really want to know is how they behave. A lot of companies are spending some serious development money to assure customers that they’re safe. But you want to know the best way to assure them? Just tell them that they’re safe. They’ll believe you. When it comes to things they can’t see, they’ll believe nearly anything.
Let me pull no punches on this topic. Consumers are gullible and uncritical. The amount of malarkey that they willingly swallow is absolutely breathtaking. They’ll spend billions on diet drugs that don’t work, the latest golf equipment that doesn’t affect their games, they’ll vote for politicians that openly lie to them, so why do we think that, all of a sudden, they’re going to do something about security? They’ll say they will, but if they really have their hearts set on a product, all the manufacturer needs to do is say, “Listen: Don’t worry about security, we’ve got it covered,” and they’ll buy it. Horror stories about Jennifer Lawrence and Hulk Hogan and Erin Andrews getting hacked don’t make any difference. Last year researchers remotely hijacked a Jeep and drove it off the road, and it didn’t affect car sales in the slightest, including Jeep sales.
Surveys focused on what people care about aren’t necessarily misleading. The fallacy is in automatically assuming that they’re doing something about it.
7. Insider jargon: As in most businesses or fields of endeavor, customer support has its own argot, which is fine. Who wants to keep explaining or even spelling out AHT or FCR or CustSat or BPO?
The problem occurs when we let our internal terminology sneak out into the real world and assume people are going to understand it, and then interpret the results based on that assumption. Professional “customer insight” specialists do this depressingly often.
I answer every survey I get. I know how much those data mean to the people who provided me with a product or service and I’m glad to give feedback. But I usually quit when I see the survey going so far off the rails that I know my responses will be worthless. I’ve been in this arena a long time and, I swear, I don’t know what the question, “Did the agent take ownership of your problem?” means. No one I know knows what it means, and I’ve asked. Some professionals tell me that it means the rep made it his or her personal responsibility to resolve the problem. (Well, they solved the problem, so I guess they took ownership. Are we maybe talking about attitude?) Few outsiders I’ve ever asked have cited that interpretation.
“Experience” is another overused and misused word. Consumers don’t think of things this way. They don’t sit back and contemplate the totality of their support experience. They just got some help, is all. They also didn’t experience a customer journey and they are rarely delighted about anything to do with getting help. I mean, they are delighted, but they wouldn’t think of calling it that.
The trick to a valid survey is not to teach customers to think and speak our way. It’s for us to think and speak their way. Jargon-filled surveys should make you suspicious.
Speaking of insiderdom: There’s a tendency of survey designers to attempt to divine levels of subtlety that are way beyond the consumer’s ken. A ubiquitous example of this is a standard form from a major surveying company that is used by thousands of corporations, most notably major financial institutions and healthcare providers. You’ve seen this one. It has three questions in a row that go like this (thinly disguised here to protect the transgressor):
What is your overall satisfaction with this site?
How well does this site meet your expectations?
How does it compare to your idea of an ideal Website?
These are all the same question. I honestly can’t tell what they’re trying to get at by asking three separate questions, or why they assume the responder understands the distinctions among them. If they were asking the questions of professional Website designers, I could maybe get that, but how much effort do you suppose the average Website visitor is going to expend figuring this out, just to conclude that, while he really, really likes this site, by golly, it doesn’t come very close to his ideal? Stated another way, what possible difference can that difference make, and what exactly is the recipient of those answers going to do with them?
One last thing on insiderdom: We eat, breathe and sleep our businesses. Our customers don’t. Yet sometimes we treat them like they just joined a club. When I opened a checking account at one of the largest banks in the U.S., all I wanted to do was be able to access their network of ATMs, but they came at me like I’d just entered a religious cult. Not only was I bombarded with physical, emailed and phoned descriptions of the hundred ways I could take advantage of my “exclusive membership,” I was invited to join focus groups to help make “my bank” better, urged to sit in on customer councils and, of course, surveyed endlessly on my “journey” and “experiences.” After a while I told them exactly what I thought of my journey, and why it was ending.
Using insider jargon on consumers, and making insider assumptions about how they’re going to interpret questions, also gets us directly into the next topic.
8. Bad construction: This is pretty simple: It’s when you phrase a question in a way that doesn’t get you the information you’re actually after. It also brings up the problem of the respondent not knowing what you’re really trying to get at.
I spend a lot of time in hotels, and I almost always get surveyed. I try to be as honest as possible in my answers, but sometimes I get caught between literal interpretations and what I suspect is the real question hidden beneath the overt one. The same thing happened recently to a colleague of mine in the aerospace sector who stayed at a famous hotel in Austin, Tex. She really loved the place and was happy to provide feedback. Three of the questions had to do with how satisfied she was with the room décor, the cleanliness and the comfort of the bed. She was perfectly satisfied with all three, and the sliding-scale answer key, which went from 0 to 10, indicated that 5 meant “satisfied.” So that’s what she checked for all three.
Two days later she got a call from the hotel manager, who wanted to know what the problem was. She told him there was no problem. “So why did you just give us fives?” he asked, somewhat plaintively. She responded that 5 meant satisfied, and she was. As you who are reading this know, fives result in calls from the hotel owner wanting to know why management was falling down on the job. But what was she supposed to do: rate them a 10? Doesn’t that mean she was deliriously happy? If the room is perfectly clean, isn’t that kind of binary? How much cleaner can you get if you don’t see any dirt or dust? How does “clean” jump from a 5 to a 10? She was absolutely clueless as to why the manager was offended.
Better labeling would address the problem. “Clean” should range from “pigsty” to “perfectly clean,” without the “satisfied” nonsense. Or maybe four choices instead of ten (filthy, not really clean, somewhat clean and just plain “clean”). Seriously, how do you tell the difference between a 4 and a 6 when it comes to cleanliness?
There are many other examples of bad construction: multiple-choice answers that aren’t mutually exclusive or don’t include all possible responses; questions and answers so long that the respondent can’t parse out the meaning without more effort than he’s willing to expend; Yes/No as the only possible answers where shades of nuance are called for; and the confusing use of multiple negatives (“How much do you agree or disagree with the statement that the Senator shouldn’t be denied the opportunity to oppose legislation to block a bill that would remove landing rights?”).
The much-vaunted NPS (Net Promoter Score) has an inherent construction problem. I find the question “How likely are you to recommend this company to a friend?” problematical. Being in the business, I know what’s expected of me from this question. But the fact is, I would never recommend, say, an insurance company or a kitchen appliance or a smartphone to a friend. Those suggestions have backfired on me too many times. So is the correct answer ‘1’ (would never recommend)? Technically, yes, but now I have to read into the question more than is there. What if I had a fantastic customer service experience with my Toyota but, if asked by a friend, would always recommend a Jeep? Also (and I’m not the first one to wonder about this), is the phrasing “Would you recommend…” interpretable as “Do you plan to recommend…?”
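For context on what those honest “would never recommend anything” answers do to the score: NPS buckets the 0–10 responses into promoters (9–10), passives (7–8) and detractors (0–6), and reports the percentage of promoters minus the percentage of detractors. A minimal sketch (the sample scores are invented):

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)

# Six perfectly happy customers, two of whom honestly answer "1" because
# they never recommend products to friends, period.
print(round(nps([10, 10, 9, 8, 1, 1]), 1))  # 16.7 -- reads as lukewarm
```

The two honest 1s count as detractors and cancel two genuine promoters, so a delighted customer base reads as a mediocre one — which is exactly the construction problem with the question.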
One last personal favorite: “How did this experience compare to your expectations?” The first time I visited an Apple store or Nordstrom’s or the Fairmont Orchid hotel, I would have rated them 9 or 10, as they “far exceeded” my expectations. What am I supposed to answer the twentieth time? The truth is, if they were every bit as good as they were when I first visited, the correct answer is “Met my expectations,” which is a 5. How do they interpret that 5 if I’m on my twentieth visit? Do they even know if I’ve been there before? Do I inadvertently ding the employees by answering honestly, i.e., my expectations were for a fantastic experience and they delivered?
8. Positioning bias: Remember the Pepsi Challenge? Random people on the street were asked to taste two cups of soda, labeled A and B, and pick the one they liked better. One had Pepsi in it, the other Coke. Amazingly, Pepsi was the clear winner.
I say “amazingly,” because I can barely tell the difference between the two. I can if they’re side by side, kind of, but I can’t say I like one better than the other, or tell you which is which, and that’s true for most people. So how did Pepsi win the Challenge?
Turns out the Pepsi was always in the cup labeled A. And if there’s one thing we know for sure from a lot of research, even if you put the same stuff in both cups, more people are going to report liking the one that was labeled A than the one labeled B.
Want to make the Pepsi victory even more dramatic? Label the Coke cup X. Or ‘32q937.’
Studies in which all four answers to a question are essentially the same, just worded differently, demonstrate conclusively that ‘a’ and ‘b’ always come out as the most frequently picked answers, even when the order of the answers is scrambled. Political pollsters know this well, and always position the favorable answers at the top. Scrupulous researchers never present answers in the same order to every respondent; they mix them up in order to prevent positional bias. (In fact, careful analysis of the effects of such scrambling is one of the techniques used to assess the validity of a survey. If the positional effect is more pronounced than the consistency of the substantive answers, the question is useless.)
Eliminating positional bias isn’t easy — there are twenty-four different sequences in which to present four answers — but it’s necessary if you’re serious about getting meaningful results.
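To see the mechanics of scrambling in practice, here’s a minimal sketch, assuming a simple Python data model (the option IDs and wording are invented for illustration): show each respondent a freshly shuffled ordering, but record and tally answers by a stable option ID rather than by displayed position.

```python
import random
from collections import Counter

# Hypothetical answer options, keyed by a stable ID that never changes,
# regardless of the position in which each respondent sees them.
OPTIONS = {
    "speed": "Getting me re-booked as quickly as possible",
    "care":  "Making me feel like a valued customer",
    "price": "Compensating me for the inconvenience",
    "info":  "Keeping me informed about what's happening",
}

def present_to_respondent(rng: random.Random) -> list[tuple[str, str]]:
    """Return the options in a fresh random order for one respondent."""
    items = list(OPTIONS.items())
    rng.shuffle(items)  # each respondent sees a different ordering
    return items

def tally(responses: list[str]) -> Counter:
    """Aggregate by stable option ID, not by displayed position."""
    return Counter(responses)

# Demo: simulate respondents who blindly pick whatever is shown first
# (pure positional bias). Because the ordering is shuffled per respondent,
# no single option can ride the "slot A" advantage.
rng = random.Random(42)
picks = [present_to_respondent(rng)[0][0] for _ in range(10_000)]
counts = tally(picks)
for opt_id, n in counts.most_common():
    print(opt_id, n)
```

Even a respondent pool that mindlessly picks the first slot produces four roughly equal counts, so any real skew in the tallies reflects the content of the options rather than their position.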
9. Yes, but whatever: I didn’t know what else to call this. It occurs when you force people to answer a question (“Question 9 is required” followed by an inability to continue until you bloody well answer it) without ascertaining whether they actually care about it. I don’t know about you, but among the things about which I couldn’t care less are the exterior appearance of a hotel I pull into at two a.m. and leave five hours later, or whether or not the support rep greeted me by name, or whether I was thanked for my participation in a loyalty program. (These last two are especially annoying because they don’t have anything to do with customer satisfaction but are really about using customers to check up on employee compliance with policy.)
The surveys I personally admire, and therefore take more seriously, are the ones where “N/A” is an allowed response. It’s usually reserved for questions about things you might not have experienced (“How would you rate the breakfast buffet?”) or that you don’t remember, but could just as easily be used for things you don’t care about.
The problem with forced questions is that they lead to false conclusions, conflating weakly held opinions with heartfelt ones. (It’s also a special case of sample population bias: Making people answer questions about things they don’t care about is pretty much the same as asking the question of the wrong people.)
10. Misuse of statistics: The spectrum of potential problems here is so vast, I’m only going to touch on a couple of illustrations.
My favorite recent example came about three weeks before the California Democratic presidential primary. A major national publication reported that Bernie Sanders had assumed an edge over Hillary Clinton, and provided this precious little bit of insight: “The results of the poll were 42.2% in favor of Sanders vs. 42.1% for Clinton, a difference which is well within the margin of error but is nevertheless significant because it’s the first time Sanders has ever been ahead.” Similarly, the Los Angeles Times added this to a story about its own poll two days later: “Sanders’s 1-point lead falls within the poll’s margin of error.”
This betrays a complete lack of understanding on the part of the reporters, as well as their editors. When the difference is inside the margin of error, the difference isn’t just small; it’s statistically meaningless. If fairness were the objective instead of headlines, the difference wouldn’t even be reported. “Inside the margin of error” means that, were we to conduct the same poll under the exact same conditions in an equivalent sample population, it could just as easily go in the other direction, and by an even wider margin. In other words, the only legitimate conclusion is that the poll tells us nothing about who is more likely to win.
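The arithmetic is easy to check. A sketch using the standard 95%-confidence formula for a sampled proportion, assuming a sample of 1,000 respondents (a typical poll size; the actual sample size wasn’t reported):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a sampled proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000                       # assumed sample size; not given in the article
sanders, clinton = 0.422, 0.421

moe = margin_of_error(sanders, n)
print(f"margin of error: +/-{moe:.1%}")             # about +/-3.1 points
print(f"reported gap:    {sanders - clinton:.1%}")  # 0.1 points

# The 0.1-point "lead" is roughly thirty times smaller than the margin
# of error, i.e. statistically indistinguishable from a tie.
print("within margin of error:", abs(sanders - clinton) < moe)
```

Under these assumptions, either candidate’s true support could plausibly be anywhere within about three points of the reported figure, so the 0.1-point gap carries no information at all.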
Another misuse of statistics occurs when we blithely cite averages without taking into account variance, or fail altogether to factor in additional statistics that shed light on seemingly illogical outcomes, or simply use the wrong metric. During World War I, the U.S. Army came up with a vastly improved combat helmet. To the shock of many, use of the new bit of protective gear resulted in a steep rise in the number of head injuries. It took a few weeks of digging to discover that researchers were asking the wrong question. The reason the rate of head injuries was increasing was that the number of deaths due to head trauma was plummeting. Those additional soldiers coming in with head injuries would have died but for the new helmet.
That the “average” homeowner has 4.5 connected devices tells me nothing. That 40% have none while another segment has ten or more tells me a lot, at least if I care about segmentation and targeting.
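The point is easy to demonstrate: two populations with an identical mean can have radically different shapes. A sketch with invented device counts (only the 4.5 average comes from the text):

```python
from statistics import mean, stdev

# Two hypothetical populations of ten homeowners each, with the same
# average number of connected devices but very different shapes.
uniform_ish = [3, 4, 4, 4, 5, 5, 5, 5, 5, 5]  # everyone near the mean
bimodal     = [0, 0, 0, 0, 0, 9, 9, 9, 9, 9]  # haves and have-nots

print(mean(uniform_ish), mean(bimodal))                    # both 4.5
print(round(stdev(uniform_ish), 2), round(stdev(bimodal), 2))
```

The average says the two markets are identical; the spread says one is a single segment clustered around four or five devices, while the other splits into “none” and “power users” — two populations that call for completely different strategies.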
Do we need to survey at all?
We’re so thoroughly wedded to surveys, I wonder if we spend enough time considering alternative, and more direct, ways of getting useful information. One of the things we at Support.com have been evangelizing is that support organizations should stop focusing exclusively on traditional metrics like average handle time and first call resolution. The rationale is that, by focusing on the overall business benefit of support instead of on how much it costs, we stand a much better chance of positively impacting the bottom line. If doubling AHT cost $1 million but resulted in a 10% drop in product returns that saved your company $8 million, would you do it? Of course you would, but you need to ask the right questions, and gather the right statistics, in order to make that happen.
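The back-of-the-envelope version of that trade-off, using the article’s own hypothetical figures:

```python
def net_benefit(extra_support_cost: int, savings: int) -> int:
    """Net business impact of a support policy change, in dollars."""
    return savings - extra_support_cost

# Doubling average handle time costs an extra $1M in agent hours, but the
# improved service cuts product returns enough to save $8M.
print(f"${net_benefit(1_000_000, 8_000_000):,.0f}")  # $7,000,000
```

The math is trivial; the hard part is instrumenting your systems so that the $8 million on the right side of the equation is a measured number rather than a guess.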
What has this got to do with surveying? Simply that there are situations in which you don’t need to survey at all. With the right systems in place, you can measure the impact of policies without ever having to ask your customers to self-report. Bear in mind that, ultimately, surveying your customers is an attempt to predict how they or others are going to behave. The only reason you care whether they’re “satisfied” is because customer satisfaction scores give you some clues as to whether they’re going to buy from you again, or even just keep what they already have, and whether they’re likely to talk up your products and services to others.
But if you’ve got other ways of knowing those things, you really don’t care how they feel. You just care what they do. If you could link individual consumer behavior back to specific touch points, why bother asking questions? Instead of a post-support satisfaction survey, what you really want to know is what that customer actually did: Did he buy more of your goods and services? Did he return what he already had?
We’re a ways away from such direct measures across the board. (The technology is there but the will isn’t.) In the meantime, we’ll continue to depend on inherently vague self-reporting based on surveys of dubiously reliable construction interpreted by overtly or inadvertently biased researchers. The price of good information is eternal vigilance, so treat any survey results you come across with extreme skepticism until you’ve applied a handful of basic filters to weed out the kinds of landmines we’ve discussed. If you do that, you’re going to be amazed at how much misinformation is being cavalierly tossed about as information, and you’ll be better prepared to deal with it.