Security Tools Podcast

Statistician Kaiser Fung: Fishy Stats (Part 3)

Episode Notes

Over the past few weeks, Kaiser Fung has given us some valuable pointers on understanding the big data stats we are assaulted with on a daily basis. To sum up, learn the context behind the stats (sources and biases) and know that the algorithms that crunch numbers may not have the answer to your problems.

In this third segment of our podcast, Kaiser points out all the ways stats can trick us through their inherently random nature, or variability in stats-speak.


Cindy Ng: In parts one and two of our interview with Kaiser Fung, we discussed the process behind a numerical finding, then focused on accuracy. In our last installment, Kaiser reveals one last way to cultivate numbersense.

Your third point is to have a nose for doctored statistics. And for me, it's kind of like…if you don't know what you don't know? Kind of like I was surprised to read in the school rankings chapter in Numbersense that different publications have different rules in ranking. And I didn't know that reporting low GPAs as not available is a magic trick that causes the median GPA to rise. And so if I didn't know this, I would just take any number from any of these publications and use it in my marketing. How do I cultivate a nose for doctored statistics?
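Editor's sketch: the GPA trick Cindy describes can be shown with a few lines of Python. The GPAs and the 3.0 cutoff below are made-up numbers for illustration, not figures from the book; the point is only that dropping the lowest values as "not available" pushes the median up.

```python
from statistics import median

# Hypothetical GPAs for an incoming class (made-up numbers for illustration).
gpas = [2.4, 2.7, 3.0, 3.2, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9]


def reported_median(values, cutoff):
    """Median after treating every GPA below `cutoff` as 'not available'."""
    kept = [v for v in values if v >= cutoff]
    return median(kept)


honest = median(gpas)                  # all ten students counted
doctored = reported_median(gpas, 3.0)  # the two lowest GPAs are dropped

print(f"median with everyone:       {honest:.2f}")
print(f"median with low GPAs 'n/a': {doctored:.2f}")
```

Nothing about any student changed between the two numbers; only the reporting rule did.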

Kaiser Fung: Well, I think...well, for a lot of people, I think it would involve reading certain authors, certain people who specialize in this sort of stuff. I'm one of them, but there are also others out there who have this sort of skepticism and they will point it out, you know. I mean, I think it's all about figuring out how other people do it, and then you can do it too, even if just by following the same types of logic. Oftentimes, there are multiple stages to this. So there's the stage of, can you smell something fishy? So it's sort of this awareness that, "Okay, do I want to believe this or not?"

And then there's the next stage of...once you smell something, do you know where to look, how to look, how to investigate it? So usually when you smell something, that means that you have developed an alternative hypothesis or interpretation that is different from the thing you're reading. So in sort of this scientific method, what we want to do at that point is to try to go out and find corroborating evidence. So then the question becomes, do you have this notion of what kinds of things you could find that could help you decide whether you're right or whether the original person is right? And here the distinction is really around experience: if you're more experienced, you might know that if you are able to find a certain piece of information, that will be sufficient to validate this or to falsify that. So you don't necessarily go through the entire analysis. Maybe you just find a shortcut to get to a certain point.

And then the last stage, that's the hardest to achieve and also not always necessary, but it's sort of like, okay, if you no longer believe in what was published, how do you develop your alternative argument? So that requires a little more work and that's the kind of thing that I try to train my students to do. So oftentimes when I set very open-ended type problems for them, you can see these people in different stages. Like there are people who don't recognize where the problems are, you know, they just believe what they see. There are people who recognize the problems and are able to diagnose what's wrong. Then there are ones that can diagnose what's wrong and, usually through looking at some other data or some other data points, they can decide, okay, instead of making the assumptions that the original people made, which you no longer believe, I'm going to make a different set of assumptions. So if I make this other set of assumptions, what would be the logical outcome of the analysis? So I think it's something that can be trained. It's just difficult in the classroom setting, in our traditional sort of textbook lecture style. That type of stuff is very difficult to train.

Andy Green: Something you said about sort of being able to train ourselves. One thing that comes up in your books a lot is that a lot of us don't have a sense of variability in the data. We don't understand what that means or what it...if we were to sort of put it out on a bar chart, we don't have that picture in our mind. And one example that you talk about, I think in a blog post, is something that we as marketers do a lot, which is A/B testing. So we'll do a comparison by changing one website slightly and then testing it, and then noticing that maybe it does better, we think. And then when we roll it out, we find out it really doesn't make too much of a difference. So you talked about reasons why something might not scale up in an A/B test. I think you wrote about that for one of the blogs. I think it was Harvard Business Review.

Kaiser Fung: ...I'm not sure about whether we're saying the same things. I'm not quite exactly remembering what I wrote about there. But from an A/B testing perspective, I think there are lots of little things that people need to pay attention to because ultimately what you're trying to do is to come up with a result that is generalizable, right? So you can run your test in a period of time but in reality, you would like this effect to hold, I mean that you'll find anything over the next period of time.

Now, I think both in this case as well as what I just talked about before, one of the core concepts in statistics is just understanding variability. Whatever number is put in front of you, it's just an at-the-moment sort of measurement, right? It's sort of like if you measure your weight on the same scale, it's going to fluctuate: morning, night, you know, different days. But you don't have this notion that your weight has changed. The actual measurement of the weight, even if it's still the same weight, will be slightly different.
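Editor's sketch: the bathroom-scale analogy can be simulated as a fixed true value plus random measurement noise. The true weight, the noise level, and the choice of a normal distribution are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_WEIGHT_KG = 70.0  # assumed: the underlying quantity, which never changes
NOISE_SD_KG = 0.5      # assumed: spread of scale readings around the true weight

# Fourteen readings, e.g. morning and night for a week.
readings = [random.gauss(TRUE_WEIGHT_KG, NOISE_SD_KG) for _ in range(14)]

print("readings:", [round(r, 1) for r in readings])
print(f"mean    : {mean(readings):.2f} kg")
print(f"std dev : {stdev(readings):.2f} kg")
```

Every reading differs even though the simulated weight is constant, which is the sense in which any single number is only an at-the-moment measurement.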

So that's the variability, but the next phase is understanding that there are sources of variability. So there are many different reasons why things are variable. And I think that's sort of what we're getting into. So in the case of A/B testing, there are many different reasons why your results may not generalize. One very obvious example is what we call a drift in population. Meaning that, especially with websites, you know, a site changes over time. So even if you keep it stable during the test, when you roll it forward it may have changed. And just a small change in some part of the website could actually cause a very large change in the type of people that come to the page.

So in the past, I've done a lot of A/B testing around what you call the conversion funnel in marketing. And this is particularly an issue if...let's say you're testing on a page that is close to the end of the funnel. Now, people do that because that's the most impactful place, because the conversion rates are much higher on those pages. But the problem is that it's at the end of many steps, so anything that changes in any of the prior steps is going to potentially change the types of people who end up on your conversion page.
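Editor's sketch: one way to picture population drift is a variant that only helps one kind of visitor. The segment names, conversion rates, and traffic shares below are invented for illustration; they are not from Kaiser's tests.

```python
# Assumed conversion rates by visitor segment (made-up for illustration).
# The variant helps returning visitors and does nothing for new ones.
rates = {
    "returning": {"control": 0.20, "variant": 0.26},
    "new": {"control": 0.10, "variant": 0.10},
}


def overall_lift(share_returning):
    """Variant minus control conversion rate for a given traffic mix."""
    share = {"returning": share_returning, "new": 1 - share_returning}
    control = sum(share[s] * rates[s]["control"] for s in rates)
    variant = sum(share[s] * rates[s]["variant"] for s in rates)
    return variant - control


during_test = overall_lift(0.60)    # mix reaching the page while testing
after_rollout = overall_lift(0.20)  # mix after an upstream funnel change

print(f"lift during test  : {during_test:.3f}")
print(f"lift after rollout: {after_rollout:.3f}")
```

The page and the variant are identical in both cases; only the mix of people arriving from earlier funnel steps differs, and the measured lift shrinks with it.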

So that's one reason: there is variability in the type of people coming to your page. Then even if the result worked during a test, it's not going to work later. But there are plenty of other things, including something that people oftentimes fail to recognize, which is that the whole basis of A/B testing is that you are randomly placing people into multiple pockets. And the randomization is supposed to, on average, tell you that they are comparable and the same. But while randomization will get you there almost all of the time, you can throw a coin 10 times and get 10 heads. So there's a possibility that there is something odd about that particular case.
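Editor's sketch: the coin example has an exact answer, and the same effect can be seen by simulating "A/A" tests in which both arms get the identical page. The 5% conversion rate, arm size, and number of repetitions are assumptions chosen for illustration.

```python
import random

# Exact chance of 10 heads in 10 tosses of a fair coin: (1/2)^10 = 1/1024.
p_ten_heads = 0.5 ** 10
print(f"P(10 heads in 10 tosses) = {p_ten_heads:.6f}")

random.seed(1)  # fixed seed so the sketch is repeatable


def aa_test(n_per_arm, rate):
    """Two arms with the SAME true rate; return the observed rate gap."""
    a = sum(random.random() < rate for _ in range(n_per_arm))
    b = sum(random.random() < rate for _ in range(n_per_arm))
    return (a - b) / n_per_arm


# Repeat the identical experiment many times and record each gap.
gaps = [aa_test(1000, 0.05) for _ in range(500)]
largest = max(abs(g) for g in gaps)

print(f"largest gap across {len(gaps)} A/A tests: {largest:.3f}")
```

Most of the simulated gaps are small, but the largest one shows how far pure chance can separate two arms that are, by construction, the same.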

So another problem is, what if your particular test had this weird phenomenon? Now, in statistics, we account for that by putting error bars around these things. But it still doesn't solve the problem that that particular sample was a very odd sample. And so one of the underlying assumptions of all the analysis in statistics is that you're not analyzing that rare sample. That rare sample is kind of treated as being outside of the normal situation. So yeah, there is a lot of subtlety in how you would actually interpret these things. And so A/B testing is still one of the best ways of measuring something. But even there, there are lots of things that you can't tell.
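Editor's sketch: one common way to put error bars on an A/B result is a normal-approximation (Wald) 95% interval for the difference in conversion rates. That choice of interval is an assumption here, not a method Kaiser names, and the visitor and conversion counts are made up.

```python
import math


def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """Approximate 95% interval for (rate B - rate A), Wald method."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical test: 100 of 2000 convert on A, 120 of 2000 convert on B.
low, high = diff_ci(100, 2000, 120, 2000)

print(f"observed lift: {120 / 2000 - 100 / 2000:.3f}")
print(f"95% interval : ({low:.4f}, {high:.4f})")
```

With these numbers the interval includes zero, so the apparent one-point lift could be noise. As the passage above notes, the interval itself still assumes the sample in hand is not one of the rare odd ones.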

I mean, I also wrote about the fact that sometimes it doesn't tell you...we'd like to say A/B testing gives you cause-effect analysis. It all depends on what you mean by cause-effect because even the most...for a typical example, like the red button and the green button, it's not caused by the color. It's like the color change did not cause anything. So there are some more intricate mechanisms there that if you really want to talk about cause, you wouldn't say color is a cause. Although in a particular way of interpreting this, you can say that the color is the cause.

Andy Green: Right, right.

Cindy Ng: It really just sounds like at every point you have to ask yourself, is this accurate? Is this the truth? It's a lot more work to get to the truth of the matter.

Kaiser Fung: Yes. So I think when people sell you the notion that somehow because of the volume of the data everything becomes easy, I think it's the opposite. I think that's one of the key points of the book. When you have more data, it actually requires a lot more work. And going back to the earlier point, which is that when you have more data, the amount of potentially wrong analysis or coming to the wrong conclusion is exponentially larger. And a lot of it is because of the fact that with most analysis, especially with data that is not experimental, not randomized, not controlled, you essentially rely on a lot of assumptions. And when you rely a lot on assumptions, it's the proverbial thing about you can basically say whatever the hell you want with this data.

And so that's why I think it's really important, especially for those people who are not actually in the business of generating analysis: if you're in the business of consuming analysis, you really have to look out for yourself, because in this day and age you really could say whatever you want with the data that you have.

Cindy Ng: So be a skeptic, be paranoid.

Kaiser Fung: Well, the nice thing is, like, when they're only talking about the colors of your bicycles and so on, you can probably just ignore it and not do the work because it's not really that important to the problem. But on the other hand, you know, in the other case that is ongoing, which is the whole Tesla autopilot algorithm thing, right? Like in those cases, and also when people are now getting into healthcare and all these other things where potentially...there's a life and death decision, then you really should pay more attention.

Cindy Ng: This is great. Do you have any kind of final thoughts in terms of Numbersense?

Kaiser Fung: Well, I'm about...I mean, this is a preview of a blog post that I'm going to put out probably this week. And I don't know if this works for you guys because this could be a bit more involved but so here's the situation. I mean, it's again that basically reinforces the point that you can easily get fooled by the data. So my TA and I were reviewing a data set that one of our students is using for their class projects. And this was basically some data about the revenue contributions of various customers and some characteristics of the customers. So we were basically trying to solve the problem of is there a way to use these characteristics to explain why the revenue contributions for different customers have gone up or down?

So we spent a bit of time thinking about it and we eventually came up with a nice way of doing it. You know, it's not an obvious problem, so we had a nice way of doing it. We thought that it actually produced pretty nice results. So then we met with the student, and pretty much the first thing that we learned from this conversation is that, oh, because this is proprietary data, all the revenue numbers were completely made up. Like there is some formula or whatever that she used to generate the numbers.

So that's sort of the interesting dynamic there. Because on the one hand, there's obviously all of the work that we put into creating this model, and the reason why we like the model is that it creates nicely interpretable results. Like it actually makes sense, right? But it turns out that yes, it makes sense in that imaginary world, but it really doesn't have any impact on reality, right? So I think that's the...and then the other side of this, which I kind of touch upon in my book too, is, well, if you were to just look at the methodology of what we did and the model that we built, you would say we did really good work. Because we applied a good methodology and generated quick results.

So the method and the data and then your assumptions, I mean, all these things play a role in this ecosystem. And I think, going back to what I was saying today, I mean, the problem is all this data. I think we have not spent sufficient time to really think about what the sources of the data are and how believable this data is. And in this day and age, especially with marketing data, with online data and all that, like there's a lot of manipulation going on. There are lots of people who are creating this data for a purpose. Think about online reviews and all other things. So on the analysis side, we have really not faced up to this issue. We just basically take the data and we just analyze and we come up with models and we say things. But how much of any of those things would be refuted if we actually knew how the data was created?

Cindy Ng: That's a really good takeaway. You are working on many things, it sounds like. You're working on a blog, you teach. What else are you working on these days?

Kaiser Fung: Well, I'm mainly working on various educational activities that are hoping to train the next generation of analysts and people who look at data, who will hopefully have...the Numbersense that I want to talk about. I have various book projects in mind which I hope to get to when I have more time. And from the Numbersense perspective, I'm interested in exploring ways to describe this in a more concrete way, right? So there's this notion of...I mean, this is a general ecosystem of things that I've talked about. But I want a system that ties it together a bit. And so I have an effort ongoing to try to make it more quantifiable.

Cindy Ng: And so if people want to follow what you're doing, what is your Twitter handle and your website?

Kaiser Fung: Yes, so my Twitter is @junkcharts. And that's probably where most of my, like in terms of updates that's where things go. I have a personal website called just where they can learn more about what I do. And then I try to update my speaking schedule there because I do travel around the country, speak at various events. And then they will also read about other things that I do like for corporations that are mostly around, again, training managers, training people in this area of statistical reasoning, data visualization, number sense and all that.