Digging Deeper from the Radical to the Reasoned: p-value Alternatives Offered by Experts in NISS Webinar

On November 19, 2019, about 400 attendees dialed in to hear three speakers, assembled by NISS, share their personal thoughts on next steps for recouping what utility is left in the well-known, perhaps little understood and decidedly misused measurement, the traditional p-value.  Each of the three speakers had a paper published in the special issue of The American Statistician entitled "Statistical Inference in the 21st Century: A World Beyond p < 0.05", which focused on this topic.  This webinar allowed each of these authors to explain what they thought might be viable alternatives, and it served as a follow-up to the very popular NISS-hosted webinar of May 23, 2019, which discussed the major ideas covered in some of the papers in the special issue.

The authors featured in this follow-up webinar were Dr. Jim Berger (Arts and Sciences Professor of Statistics at Duke University), Dr. Sander Greenland (Emeritus Professor of Epidemiology and Statistics at the University of California, Los Angeles) and Robert Matthews (Visiting Professor in the Department of Mathematics at Aston University, Birmingham, UK). The webinar was moderated by Dr. Dan Jeske of UC Riverside, editor of The American Statistician.

Dr. Berger recognized the need for immediate action but also realized that adopting a new process would take time.  In the interim he made three recommendations.  His first recommendation: “If using the current language of ‘statistical significance’ for a novel discovery, replace the 0.05 threshold with 0.005. Refer to discoveries with a p-value between 0.05 and 0.005 as ‘suggestive,’ rather than ‘significant.’ ” He then gave two other recommendations, the second providing support for the first. The second recommendation was to improve understanding by converting a p-value for a point null hypothesis H0 (a quantity that is incredibly difficult to interpret) into a lower bound on the data-based odds of H0 to the alternative hypothesis, under an extremely general class of prior distributions, via the formula −e · p · ln(p), where ln is the natural logarithm.  The third recommendation was to incorporate prior odds of hypotheses into the analysis since, in today’s world of multiple testing (e.g., does eating broccoli cause brain cancer, as one of thousands of tests of food/disease), one must often give high odds to the null hypothesis of ‘no effect’ (see Berger’s original paper).
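As a rough illustration of the second and third recommendations, the bound −e · p · ln(p) can be computed directly and then combined with prior odds. The sketch below is not taken from Dr. Berger's slides: the function names and the example prior odds are hypothetical, and the restriction to p < 1/e is a condition from the literature on this bound rather than something stated in this summary.

```python
import math


def odds_lower_bound(p):
    """Lower bound on the data-based odds of H0 to the alternative: -e * p * ln(p).

    Assumption: the bound is only applied for 0 < p < 1/e (about 0.368).
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("bound is only used here for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def posterior_odds_lower_bound(p, prior_odds_h0):
    """Combine the bound with prior odds of H0 to the alternative.

    Posterior odds = prior odds * data-based odds, so multiplying by the
    lower bound gives a lower bound on the posterior odds of H0.
    """
    return prior_odds_h0 * odds_lower_bound(p)


if __name__ == "__main__":
    for p in (0.05, 0.005):
        bound = odds_lower_bound(p)
        print(f"p = {p}: odds of H0 to alternative are at least {bound:.4f} "
              f"(at most about {1 / bound:.1f} to 1 against H0)")

    # Hypothetical prior odds of 10 to 1 in favor of 'no effect'.
    print(posterior_odds_lower_bound(0.05, prior_odds_h0=10.0))
```

Under these assumptions, p = 0.05 corresponds to odds of at most roughly 2.5 to 1 against H0, while p = 0.005 corresponds to roughly 14 to 1, which is one way to see why the second recommendation supports the first.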

Dr. Greenland followed with a much more aggressive stance regarding change, calling for scientists and statisticians to “rebuild statistics as an information science, not a branch of probability theory, with cognitive science as a core component.”  He went on to illustrate how “Blind acceptance of mathematical frameworks, deification of ‘great men’ and their conceptual errors, and neglect of cognitive problems have rotted the core of statistical training and research practice.”  He advised converting the p-value for a hypothesis H into a surprisal or s-value, −log2(p), as a measure of the information against H supplied by the p-value, and then graphing or tabulating the s-values as H is varied across alternatives.  He also advised incorporating causal models and tests of interval hypotheses into basic statistical training, and treating any p-value as a test of an entire model (including all assumptions made by the test) rather than as a test of only the targeted hypothesis H (see Greenland’s original paper).
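The s-value conversion can be sketched in a few lines. The example below is only an illustration, not Dr. Greenland's own code: the estimate, the standard error, and the normal-approximation two-sided p-value are all hypothetical choices made so that there is something to tabulate as H is varied.

```python
import math


def s_value(p):
    """Surprisal (s-value) in bits: -log2(p), for a p-value in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p) + 0.0  # "+ 0.0" turns -0.0 into 0.0 when p == 1


def two_sided_p(estimate, se, h):
    """Two-sided p-value for the hypothesis 'true value = h'.

    Assumption: a normal approximation, so p = 2 * (1 - Phi(|z|)),
    which equals erfc(|z| / sqrt(2)).
    """
    z = (estimate - h) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical estimate and standard error, for illustration only.
    estimate, se = 1.2, 0.5
    print(" H      p-value   s-value (bits)")
    for h in (0.0, 0.5, 1.0, 1.2, 1.5, 2.0):
        p = two_sided_p(estimate, se, h)
        print(f"{h:4.1f}   {p:8.4f}   {s_value(p):6.2f}")
```

Read this way, p = 0.05 supplies about 4.3 bits of information against H, roughly as surprising as seeing four heads in a row from a coin assumed to be fair.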

The final speaker was Robert Matthews.  He suggested that the research community’s reluctance to abandon p-values and confidence intervals called for a “pragmatic” approach to initiating change, based around making “p-values and 95% confidence intervals more informative, less prone to misinterpretation and more nuanced in their implications.” He walked through a process he calls Analysis of Credibility (AnCred) for extracting more information from standard data summaries and reducing the risk of misinterpreting evidence.  AnCred takes a significant or non-significant finding and determines the level of prior support needed for the finding to also be deemed credible. This allows the notorious “true/false” dichotomization of findings to be replaced by a nuanced, quantitative discussion of whether the required prior support can be justified (see Matthews’s original paper).

One common thread on which all three authors agreed was the need to expand this discussion to include the editors and editorial teams of research journals, with the aim of reforming the requirements that prospective authors must meet before they can claim to have found evidence for or against important effects.  As a neutral party, NISS is in a good position to bring informed parties together.  If you think that this is a discussion that you would like to be involved in or that you would like to see move forward, please contact NISS.

As you might expect, the news summary above cannot begin to report all of the details that each speaker offered during the session, nor their responses to the many provocative questions that were put to them.  For this reason, NISS is embedding a recording of the session below, along with links to the slides that each of the speakers used.  Please also be sure to follow the links to the original papers published in the special issue of The American Statistician.

Please feel free to share this link with others.

Speaker Slides

Berger_3_recommendations.pdf

Greenland_slides-11-2019.pdf

Matthews_NISS 19 Nov 2019.pdf

Wednesday, November 20, 2019 by Glenn Johnson