The idea is not to do science. The idea is to loosely systematize and conceptualize innovation: to generate options and create a failure-tolerant system.
I'm sure improvements could be made... but this isn't about being a valid or invalid experiment.
> The idea is not to do science. The idea is to loosely systematize and conceptualize innovation.
Why are you acting like these are completely different frameworks? You have the same goals.

When you A/B test, mistakes are generally reversible and won't bankrupt your company or cost you your job. Something being a 1-in-20 fluke is acceptable risk; you'll get most decisions right. Compare that to hairy decisions like entering a new market or creating a new product line: there are no A/B tests or scientific frameworks there. You gather all the evidence you can, estimate the risk, and make a decision.
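To put rough numbers on that (purely illustrative, and assuming the worst case where every tested change is actually neutral):

```python
# Illustrative only: at a 5% significance threshold, how many "winning"
# A/B decisions are flukes in the worst case where no change has any
# real effect? Both numbers below are made up for the example.
alpha = 0.05      # the conventional 1-in-20 false-positive rate
decisions = 200   # hypothetical number of A/B-tested ship decisions

print(f"expected flukes: ~{decisions * alpha:.0f} of {decisions}")
# ~10 of 200: tolerable when each decision is a reversible UI tweak,
# catastrophic if each were a market entry or a new product line.
```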
> The problem is there can be a large number of potential comparisons when the details of data analysis are highly contingent on data, without the researcher having to perform any conscious procedure of fishing or examining multiple p-values. We discuss in the context of several examples of published papers where data-analysis decisions were theoretically-motivated based on previous literature, but where the details of data selection and analysis were not pre-specified and, as a result, were contingent on data.
Not only are experiments commonly multi-arm, you also repeat your experiment (usually after making some changes) if the previous one failed or didn't pass the launch criteria.
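A quick Monte Carlo sketch of what that retry loop does to the nominal 5% false-positive rate (illustrative numbers; assumes a truly neutral change and a plain two-sample z-test):

```python
import random
import statistics

def one_test(n=200, z_crit=1.96):
    """One experiment under the null: both arms draw from the same distribution."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    se = (statistics.pvariance(a) / n + statistics.pvariance(b) / n) ** 0.5
    z = (statistics.mean(b) - statistics.mean(a)) / se
    return abs(z) > z_crit  # "significant" at ~5% despite no real effect

random.seed(0)
runs = 4000
# Rerun a failed experiment up to 3 times, launching on the first "pass".
launched = sum(any(one_test() for _ in range(3)) for _ in range(runs))
print(f"effective false-positive rate with retries: {launched / runs:.1%}")
# Close to 1 - 0.95**3 ~= 14%: nearly triple the rate anyone thinks
# they're running at, before multi-arm comparisons even enter into it.
```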
This is further complicated by the fact that launch criteria are usually not well defined ahead of time. Unless it's a complete slam dunk, you won't know until the launch meeting whether the experiment will be approved. It's mostly vibes-based, determined from tens or hundreds of "relevant" metric movements, and often decided on the whim of whichever stakeholder is sitting in the launch meeting.
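And the "tens or hundreds of relevant metrics" part guarantees there will always be movements to argue about. A toy chance calculation (the 100 and 0.05 are illustrative, and real metrics are correlated, so treat this as intuition rather than a precise figure):

```python
alpha, metrics = 0.05, 100   # illustrative: 100 independent metrics at p < 0.05
p_any = 1 - (1 - alpha) ** metrics
print(f"P(at least one metric 'moves' by pure chance) = {p_any:.1%}")  # ~99.4%
# Some dashboard metric will essentially always look like it moved,
# which is exactly what a vibes-based launch meeting can latch onto.
```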