Completely agree. The sign-up flow for your startup does not need the same rigor as medical research. You don't need transportation-engineering standards for your product packaging, either. They're just totally different levels of risk.
I could write pages on this (I’ve certainly spoken for hours) but the adoption of a scientific research mindset is very limiting for A/B testing. You don’t need all the status quo bias of null hypothesis testing.
At the same time, it's quite impressive how people adapt. An organization experienced with A/B testing will start doing things like multivariate corrections in their heads.
For anyone spinning this stuff up, go Bayesian from the start. You'll end up there whether you realize it or not: people will interpret p-values in light of prior evidence anyway.
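If that sounds abstract, here's a minimal sketch of what the Bayesian version looks like for a conversion test, assuming Beta(1, 1) priors and made-up counts (the numbers are purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up counts: conversions / visitors per arm.
    conv_a, n_a = 120, 2400   # control
    conv_b, n_b = 150, 2400   # variant

    # Beta(1, 1) prior + binomial likelihood -> Beta posterior per arm.
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, 100_000)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, 100_000)

    print("P(B beats A):", (post_b > post_a).mean())
    print("Expected lift:", (post_b - post_a).mean())

No null hypothesis, no "fail to reject": you get a direct probability you can weigh against whatever prior evidence you have.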
0.05 (or any Bayesian equivalent) is not a magic number, and it's really quite high for a default. The harder sciences (the ones not in a replication crisis) use much stricter thresholds by default.
Adjust the required confidence to the cost of the change and the risk of harm. If you're at the point of testing, the cost of the change may be zero (content tweaks); it may be really high; it may even be net negative!
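One way to make "adjust confidence to cost" concrete (my framing, nothing official): turn the posterior into an expected loss and pick the tolerance from the cost of being wrong. The tolerance values here are illustrative, not a recommendation:

    import numpy as np

    def expected_loss_if_ship_b(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
        """Expected drop in conversion rate if we ship B but A was really better."""
        rng = np.random.default_rng(seed)
        post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
        post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
        return np.maximum(post_a - post_b, 0).mean()

    # Tolerance keyed to cost: a one-commit-to-revert content tweak might
    # tolerate ~0.1% absolute loss; an expensive migration, or anything
    # that can harm users, should demand orders of magnitude less.
    tol = 0.001
    ship = expected_loss_if_ship_b(120, 2400, 150, 2400) < tol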
But in most cases, at a startup, you should be going after wins that are way more impactful and end up with p-values far below 0.05 anyway. That's easy to say, but don't waste your time devising methods to squeeze out more signal. Just (just lol) make better changes to your product so that the methods don't matter. A p = 0.00001 is a better signal than p = 0.05 with every correction in this article applied.
If you're going to pick any fanciness from the start (besides Bayes), make it anytime-valid methods. You're certainly already going to be peeking (as you should), so have your analysis reflect that.
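If "anytime-valid" sounds exotic, the simplest version is a likelihood-ratio e-process: under the null the running product is a nonnegative martingale, so by Ville's inequality you can check it after every single visitor without inflating your error rate. A toy one-arm Bernoulli sketch, with the null and alternative rates (p0, p1) picked up front as assumptions; a real two-arm test wants something like an mSPRT, but the idea is the same:

    import numpy as np

    def anytime_valid_monitor(stream, p0=0.05, p1=0.06, alpha=0.05):
        # Running likelihood ratio of H1 (rate p1) vs H0 (rate p0).
        # Under H0 this is a martingale, so P(it ever crosses 1/alpha) <= alpha:
        # peek as often as you like.
        e = 1.0
        for t, converted in enumerate(stream, 1):
            e *= (p1 / p0) if converted else ((1 - p1) / (1 - p0))
            if e >= 1 / alpha:
                return t  # validly reject H0 right now
        return None

    rng = np.random.default_rng(1)
    visitors = rng.random(50_000) < 0.06  # true rate matches the alternative
    print("stopped at visitor:", anytime_valid_monitor(visitors))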