
Let's practice what we preach: Planning and interpreting simulation studies with design and analysis of experiments

Abstract: Statisticians recommend design and analysis of experiments (DAE) for evidence-based research but often use tables to present their own simulation studies. Could DAE do better? We outline how DAE methods can be used to plan and analyze simulation studies. Tools for planning include cause-and-effect diagrams and factorial and fractional factorial designs. Analysis is carried out via analysis of variance, main effect and interaction plots, and other DAE tools. We also demonstrate how Taguchi robust parameter design can be used to study the robustness of methods to a variety of uncontrollable population parameters.
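
To make the workflow concrete, here is a minimal sketch in Python, assuming a toy study that compares the sample mean with a 10%-trimmed mean across sample sizes and error distributions. The factors, levels, replicate count, and log-RMSE response are illustrative assumptions of mine, not the authors' example; the point is only the shape of the analysis: lay out a full factorial, replicate it so that Monte Carlo error supplies the residual variance, then summarize with ANOVA.

```python
# A minimal sketch (not the authors' code): treat a small simulation study
# as a designed experiment. Factors, levels, and the RMSE metric are
# illustrative assumptions.
import itertools

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(1)

def rmse_of_estimator(n, dist, estimator, mc_reps=400):
    """Monte Carlo RMSE of a location estimator when the true location is 0."""
    if dist == "t3":
        draws = rng.standard_t(df=3, size=(mc_reps, n))
    else:
        draws = rng.standard_normal((mc_reps, n))
    if estimator == "trimmed":
        est = stats.trim_mean(draws, proportiontocut=0.1, axis=1)
    else:
        est = draws.mean(axis=1)
    return float(np.sqrt(np.mean(est ** 2)))

# 2x2x2 full factorial design, replicated 5 times so that independent
# Monte Carlo batches provide the residual variance for the ANOVA.
cells = list(itertools.product([20, 100], ["normal", "t3"], ["mean", "trimmed"]))
rows = [(n, d, e, rmse_of_estimator(n, d, e))
        for _ in range(5) for (n, d, e) in cells]
df = pd.DataFrame(rows, columns=["n", "dist", "estimator", "rmse"])

# Analyze the simulation study with DAE tools: ANOVA on log(RMSE) with
# all main effects and two-way interactions.
fit = smf.ols("np.log(rmse) ~ (C(n) + C(dist) + C(estimator)) ** 2", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
```

Main effect and interaction plots can be drawn from the same data frame (statsmodels provides interaction_plot), and when the number of factors grows, a fractional factorial or a Taguchi inner/outer array would take the place of the full factorial, as the abstract suggests.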

N is never large
statmodeling.stat.columbia.edu/2005/07/31/n_i…


“Statistics: A Life Cycle View”


This article from Ron Kenett is a few years old but is still relevant:

Statistics has gained a reputation as being focused only on data collection and data analysis. This paper is about an expanded view of the role of statistics in research, business, industry and service organizations. . . . a “life cycle view” consisting of: 1) Problem elicitation, 2) Goal formulation, 3) Data collection, 4) Data analysis, 5) Formulation of findings, 6) Operationalization of findings, 7) Communication and 8) Impact assessment. These 8 phases are conducted with internal iterations that combine the inductive-deductive learning process . . . The envisaged overall approach is that applied statistics needs to involve a trilogy combining: 1) a life cycle view, 2) an analysis of impact and 3) an assessment of the quality of the generated information and knowledge. . . .

It can be hard to write, and to read, this sort of article, as advice about problem elicitation, goal formulation, etc., can sound so vague compared to harder-edged topics such as optimization, computing, and probability theory. But all these things are important, and I think it does help to think them through, in specific examples and more generally.

Statistics is a branch of engineering.


Statistical Concepts in Their Relation to Reality – E.S. Pearson


Egon S. Pearson, 11 August 1895 – 12 June 1980

This is my third and final post marking Egon Pearson’s birthday (Aug. 11). The focus is his little-known paper: “Statistical Concepts in Their Relation to Reality” (Pearson 1955). I’ve linked to it several times over the years, but always find a new gem or two, despite its being so short. E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in repeated applications or long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter to Birnbaum (1974), E. Pearson talks about N-P theory admitting of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

(Nowadays, it might be said that some people concentrate to an absurd extent on “science-wise error rates” in their view of statistical tests as dichotomous screening devices.) One of the best sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It’s his response to Fisher (1955), the first part of what I call the “triad”. It begins like this:

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955, “Statistical Methods and Scientific Induction”), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”. There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!…

You can read “Statistical Concepts in Their Relation to Reality” HERE.

What was the heresy, really? Pearson doesn’t mean it was he who endorsed the behavioristic model that Fisher is here attacking.[i] The “original heresy” refers to the break from Fisher in the explicit introduction of alternative hypotheses (even if only directional). Without considering alternatives, Pearson and Neyman argued, statistical tests of significance are insufficiently constrained–for evidential purposes! Note: this does not mean N-P tests give us merely a comparativist appraisal (as in a report of relative likelihoods!)

But it’s a mistake to suppose that’s all that an inferential or evidential formulation of statistical tests requires. What more is required comes out in my deconstruction of those famous (“miserable”) passages found in the key Neyman and Pearson 1933 paper. We acted out the play I wrote for SIST (2018) in our recent Summer Seminar in Phil Stat. The participants were surprisingly good actors!

Granted, these “evidential” attitudes and practices have never been explicitly codified to guide the interpretation of N-P tests. Doing so is my goal in viewing “Statistical Inference as Severe Testing”.

Notice, by the way, Pearson’s discussion and extension of Fisher’s construal of differences that are not statistically significant on p. 207. These points might have been helpful to those especially concerned with mistaking non-statistically significant differences for supposed “proofs of the null”.

Share your comments.

“The triad”:

Fisher, R.A. (1955), “Statistical Methods and Scientific Induction”
Pearson, E.S. (1955), “Statistical Concepts in Their Relation to Reality”
Neyman, J. (1956), “Note on an Article by Sir Ronald Fisher”

I’ll post some other Pearson items over the week. 

[i] Fisher’s tirades against behavioral interpretations of “his” tests are almost entirely a reflection of his break with Neyman (after 1935) rather than any radical disagreement either in philosophy or method. Fisher could be even more behavioristic in practice (if not in theory) than Neyman, and Neyman could be even more evidential in practice (if not in theory) than Fisher. Moreover, it was really when others discovered that Fisher’s fiducial methods could fail to correspond to intervals with valid error probabilities that Fisher began claiming he never really was too wild about them! (Check fiducial on this blog and in Excursion 5 of SIST.) Contemporary writers tend to harp on the so-called “inconsistent hybrid” combining Fisherian and N-P tests. I argue in SIST that it’s time to dismiss these popular distractions: they are serious obstacles to progress in statistical understanding. Most notably, Fisherians are kept from adopting features of N-P statistics, and vice versa (or they adopt them improperly). What matters is what the methods are capable of doing! For more on this, see the post “it’s the methods, stupid!” and excerpts from Excursion 3 of SIST. Thanks to CUP, my full book, corrected, can still be downloaded for free until August 31, 2022 at

https://www.cambridge.org/core/books/statistical-inference-as-severe-testing/D9DF409EF568090F3F60407FF2B973B2

References

Lehmann, E.L. (1997). Review of Error and the Growth of Experimental Knowledge by Deborah G. Mayo. Journal of the American Statistical Association, Vol. 92.

Also of relevance:

Lehmann, E.L. (1993). “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” Journal of the American Statistical Association, Vol. 88, No. 424, 1242–1249.

Mayo, D. (1996). “Why Pearson Rejected the Neyman-Pearson (Behavioristic) Philosophy and a Note on Objectivity in Statistics” (Chapter 11), in Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press. [This is a somewhat older view of mine; a newer view is in SIST below.]

Mayo, D. (2018). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST). Cambridge: Cambridge University Press.

Assumption‐lean inference for generalised linear model parameters

Abstract: Inference for the parameters indexing generalised linear models is routinely based on the assumption that the model is correct and a priori specified. This is unsatisfactory because the chosen model is usually the result of a data-adaptive model selection process, which may induce excess uncertainty that is not usually acknowledged. Moreover, the assumptions encoded in the chosen model rarely represent some a priori known, ground truth, making standard inferences prone to bias, but also failing to give a pure reflection of the information that is contained in the data. Inspired by developments on assumption-free inference for so-called projection parameters, we here propose novel nonparametric definitions of main effect estimands and effect modification estimands. These reduce to standard main effect and effect modification parameters in generalised linear models when these models are correctly specified, but have the advantage that they continue to capture respectively the (conditional) association between two variables, or the degree to which two variables interact in their association with outcome, even when these models are misspecified. We achieve an assumption-lean inference for these estimands on the basis of their efficient influence function under the nonparametric model while invoking flexible data-adaptive (e.g. machine learning) procedures.
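
To give a feel for what such an estimand looks like in the simplest (identity-link) case, here is a hedged sketch, my own construction rather than the authors' code: the assumption-lean main effect theta = E[Cov(A, Y | W)] / E[Var(A | W)], which equals the coefficient of A in a linear model when that model is correct, estimated with cross-fitted random forests as the data-adaptive nuisance learners and a standard error read off the influence function. The function name and the synthetic data are illustrative.

```python
# A hedged sketch (not the authors' implementation): assumption-lean main
# effect of an exposure A on outcome Y given covariates W, defined as
#   theta = E[Cov(A, Y | W)] / E[Var(A | W)],
# estimated with cross-fitted random-forest nuisance regressions and an
# influence-function-based standard error.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def assumption_lean_effect(W, A, Y, n_splits=5, seed=0):
    resA = np.empty_like(A, dtype=float)
    resY = np.empty_like(Y, dtype=float)
    # Cross-fitting: nuisance models are trained on folds that exclude the
    # observations they are used to residualize.
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(W):
        mA = RandomForestRegressor(random_state=seed).fit(W[train], A[train])
        mY = RandomForestRegressor(random_state=seed).fit(W[train], Y[train])
        resA[test] = A[test] - mA.predict(W[test])   # A - E[A | W]
        resY[test] = Y[test] - mY.predict(W[test])   # Y - E[Y | W]
    theta = np.mean(resA * resY) / np.mean(resA ** 2)
    # Standard error from the empirical variance of the influence function.
    infl = resA * (resY - theta * resA) / np.mean(resA ** 2)
    se = infl.std(ddof=1) / np.sqrt(len(infl))
    return theta, se

# Illustrative use on synthetic data where the true effect of A is 1.0.
rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 3))
A = W[:, 0] + rng.normal(size=n)
Y = 1.0 * A + np.sin(W[:, 1]) + rng.normal(size=n)
theta, se = assumption_lean_effect(W, A, Y)
print(f"theta = {theta:.3f} +/- {1.96 * se:.3f}")
```

Cross-fitting is what permits flexible machine-learning nuisance estimates without the own-observation overfitting bias that would otherwise invalidate the Wald interval; the general GLM estimands in the paper require different nuisance regressions but follow the same pattern.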

P-values don’t measure evidence
