arxiv:2504.14571

Prompt-Hacking: The New p-Hacking?

Published on Apr 20, 2025
Abstract

As Large Language Models (LLMs) become increasingly embedded in empirical research workflows, their use as analytical tools for quantitative or qualitative data raises pressing concerns for scientific integrity. This opinion paper draws a parallel between "prompt-hacking", the strategic tweaking of prompts to elicit desirable outputs from LLMs, and the well-documented practice of "p-hacking" in statistical analysis. We argue that the inherent biases, non-determinism, and opacity of LLMs make them unsuitable for data analysis tasks demanding rigor, impartiality, and reproducibility. We emphasize how researchers may inadvertently, or even deliberately, adjust prompts to confirm hypotheses while undermining research validity. We advocate for a critical view of using LLMs in research, transparent prompt documentation, and clear standards for when LLM use is appropriate. Rather than letting LLMs replace traditional analytical methods, we recommend that they be used only with caution, oversight, and justification.
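To make the analogy concrete, here is a minimal sketch of the failure mode the abstract describes, written against the OpenAI Python client. The model name, prompt variants, and survey comment are invented for illustration; the loop that stops at the first hypothesis-confirming output is the prompt-hacking analogue of re-running statistical tests until p < 0.05, while the audit log at the end sketches the transparent documentation the authors advocate.

```python
# Sketch of prompt-hacking vs. transparent prompt documentation.
# Assumes the OpenAI Python client (openai >= 1.0) and an API key in
# the OPENAI_API_KEY environment variable; the prompts, model name,
# and survey comment below are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

RESPONSE = "The new interface is fine, I guess, once you get used to it."

PROMPT_VARIANTS = [
    "Classify the sentiment of this user comment as positive or negative:",
    "Considering that users often understate satisfaction, classify this comment:",
    "This comment is from a satisfied long-term user. Classify its sentiment:",
]


def classify(prompt: str, text: str, temperature: float = 0.0) -> str:
    """Ask the model for a sentiment label under a given framing."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; log whatever you use
        temperature=temperature,
        messages=[{"role": "user", "content": f"{prompt}\n\n{text}"}],
    )
    return completion.choices[0].message.content.strip()


# Prompt-hacking: try successively leading framings and report only the
# first one that confirms the hypothesis, discarding the failed attempts.
for prompt in PROMPT_VARIANTS:
    label = classify(prompt, RESPONSE)
    if "positive" in label.lower():
        print("Reported result:", label)
        break

# Transparent alternative: run every pre-registered prompt and log the
# model, prompt, parameters, and output so the analysis can be audited.
# Note that temperature=0 still does not guarantee identical outputs
# across runs or model versions, which is why the model identifier and
# parameters belong in the log.
audit_log = [
    {
        "model": "gpt-4o-mini",
        "prompt": p,
        "temperature": 0.0,
        "output": classify(p, RESPONSE),
    }
    for p in PROMPT_VARIANTS
]
print(json.dumps(audit_log, indent=2))
```

The design point is that the two halves issue the same API calls; what separates questionable from defensible practice is whether every prompt tried is disclosed, not the mechanics of the calls themselves.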
