When I first started reading experimental
research articles, I would spend hours poring over every word, but when
I'd finished I'd often have little idea what specifically the
researchers did, or what big ideas were driving the studies. Such
difficulties are understandable: original research articles usually
represent a different genre of writing than anything a student has previously
encountered. They present novel technical terms and opaque
acronyms. They may describe only a small tweak on an existing
experimental paradigm. And they are occasionally so poorly written
that one can only speculate about how they passed peer review (perhaps
the reviewers (a) already knew what the paper was trying to say, or
(b) were too insecure to admit that they didn't understand it).
It's easier if you can approach an
article with specific questions in mind. Below, I've listed some of the
questions I bring to an article, in the hope that they can give
students some structure for their own reading.
Comments are
welcome.
Why is this experiment being done?
Good research is usually driven by a BIG question
(e.g., Does your language change the way you think?), but only
really addresses a little question (e.g., Do people who speak Greek
respond differently to certain shades of blue?).
- What is the BIG question motivating this experiment?
(hint: this is probably a few steps bigger and more abstract than
whatever's listed in the title)
- What is the little question that this experiment was
designed to address? (hint: this is usually a specific pattern that you could assess with an individual statistical test)
What is the general experimental paradigm?
It's rare for researchers to design an experimental
method de novo. Usually they'll use a well-established technique (which
you might recognize from previous readings), but tweak it a bit or use
different stimuli or different measures, expecting to see different
results.
- If you've encountered the paradigm before, where else
have you seen it?
- What kinds of questions has it been used to address?
- What are some of its strengths and weaknesses?
- How does this use of the paradigm differ from
what's been done in the past?
What kinds of results did the authors expect to see?
Researchers often run experiments expecting to
either support or challenge some pet theory. Recognizing their biases
can help you approach their claims with appropriate skepticism.
- Usually there are several theories that make
contrasting predictions. Describe these theories.
- Sometimes a person who wants to discredit a theory
will start by misrepresenting it (a straw man argument). Do all of
these theories sound reasonable, from the authors' descriptions? (hint:
look out for strong dichotomies -- real theories are often more nuanced
than that)
- Do the claimed predictions make sense for the
theories that the authors have described?
- Do the theories necessarily predict these results?
- Are there any possible results that no theory would
predict?
- Disregarding the theories and the authors'
predictions, what would your own gut feeling predict?
How was the experiment designed?
- Who were the subjects?
- How many subjects participated?
- What was the task?
- How long did the task take?
- How did they instruct the subjects?
- How did they present the experiment (e.g. computer
display)?
- What was the specific experimental manipulation?
- What were their baseline and experimental
conditions?
- How did they counterbalance their conditions? (see the toy sketch after this list)
- Within-subjects or between-subjects?
- Within-items or between-items?
- What would it have felt like to participate in this
experiment?
- If you were a subject, how would you behave in the
experiment?
- What kind of data do you think your behaviour would
generate?
- What was the form of the data (what’s the outcome
variable and how is it measured)?
- How was the data collected (what tools were used)?
- Sometimes an experiment will give you results you
want, but for the wrong reasons. What are some ways that could happen
here? Do any of them seem especially problematic?
- What about the possibility that the experiment could
give you the wrong results for the wrong reasons?
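If counterbalancing is unfamiliar, here is a minimal sketch of the idea in Python. The condition and item names are invented for illustration; the point is just that, across the presentation lists, every item appears in every condition, while each individual subject sees each item only once.

```python
# A toy sketch of Latin-square counterbalancing (invented names).
# Across the lists, every item appears in every condition; within a
# list, each item appears exactly once, in a single condition.

conditions = ["related", "unrelated"]          # hypothetical conditions
items = ["item%02d" % i for i in range(1, 9)]  # hypothetical items

lists = []
for rotation in range(len(conditions)):
    # Rotate the item-to-condition assignment to build each list.
    assignment = {item: conditions[(i + rotation) % len(conditions)]
                  for i, item in enumerate(items)}
    lists.append(assignment)

for n, assignment in enumerate(lists, start=1):
    print("List %d:" % n)
    for item, condition in assignment.items():
        print("  %s -> %s" % (item, condition))
```

Each subject would then be assigned to one of the lists, so that, across subjects, every item contributes data to every condition.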
How good is the data?
- How much data is there?
- How many independent data points are there?
- If this is a repeated-measures study, how many
measures were collected from each subject/item?
- Note that with binomial data (e.g. error or
accuracy data), more trials don't necessarily give you more power; the
count of the less-frequent outcome (e.g. the number of speech errors)
is usually a better indicator (see the toy simulation after this list).
- How have the data been statistically analysed?
- Do you think that way of analysing them is
appropriate?
- Do the claimed patterns in the data look convincing?
- Are the statistical analyses appropriate for
testing these claimed patterns?
- Note: be very
skeptical of any claim that a pattern does not exist; even
professionals are unfortunately bad at interpreting statistical
non-significance, and almost never apply appropriate tests to
demonstrate the absence of a difference.
- How big are the effects? (in terms of milliseconds,
proportions, or percent of variance that they account for)
- Is this effect a major determinant of observed
behaviour, or is it more important for understanding how the mind
works? (hint: in a very large study (e.g. more than a few hundred
participants, like some sex-difference studies), it can be easy to find
a statistically significant difference that means almost nothing for
day-to-day life)
- How big are the confidence intervals, compared to
the claimed effect size?
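Here is a rough simulation of that binomial point, in Python with scipy; the error rates, trial counts, and choice of test are all invented for illustration. A design with a thousand trials per condition but very rare errors ends up with roughly the same power as one with a tenth of the trials but the same expected number of errors.

```python
# Toy simulation: for rare binomial outcomes (e.g. speech errors),
# power tracks the expected COUNT of errors more closely than the
# raw number of trials.  All numbers here are invented.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(0)

def power(n_trials, p_control, p_treatment, n_sims=2000, alpha=0.05):
    """Estimate power to detect a difference between two error rates."""
    hits = 0
    for _ in range(n_sims):
        errors_control = rng.binomial(n_trials, p_control)
        errors_treatment = rng.binomial(n_trials, p_treatment)
        table = [[errors_control, n_trials - errors_control],
                 [errors_treatment, n_trials - errors_treatment]]
        _, p = fisher_exact(table)
        hits += p < alpha
    return hits / n_sims

# 1000 trials per condition, very rare errors (about 10 vs 20 expected).
print(power(1000, 0.01, 0.02))
# 100 trials per condition, more common errors (also about 10 vs 20 expected).
print(power(100, 0.10, 0.20))
```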
What does it all mean?
- How does the data relate to the little question
motivating the experiment?
- Do you think the data means what the authors say it
means?
- How well has the experiment isolated the process
that it claims to address?
- How could you do it better?
- What is another possible reason that the data could
show the patterns that it does?
- What additional experiment could tell you if this
is true?
- How does the data relate to the BIG question
motivating the experiment?
- How does it fit with the other things that you know
that might be relevant to that question?
- What does this experiment really tell us?
- How might this change the way we live?
- What is especially awesome about this experiment?
- Do you have any lingering concerns?
Why would you cite this paper?
You can think of the papers that you read as
building blocks for constructing your own stories, arguments, or experiments.
Usually you'd cite a paper for particular theoretical claims, empirical
findings, or methodology.
- Does it help fill or expose any gaps in your
knowledge?
- Does it test ideas that you thought were probably
true, but no one had actually tested before?
- Does it tell an interesting story?
- Does it simplify or complicate the stories that
you're concerned with?
- Does it use techniques that you would want to use in
your own work?
- How can you imagine this article coming up in
conversation?
A note on dealing with acronyms (and other semantically
opaque terms)
Acronyms are quite common in research articles, which is
unfortunate because they present substantial barriers to new readers'
comprehension and critical thinking. Though such abbreviations are
undoubtedly useful when typing, an author who actually wants to
communicate should always run a find-and-replace before submitting a
paper for peer review.
So how can the reader cope when an author has forgotten to run the
find-and-replace? My suggestion is to keep a notebook with three
columns. When you run across an abbreviation or semantically
opaque term for the first time in an article, copy it to the first
column. In the next column, write the authors' translation of the
term, if they have been kind enough to supply it (otherwise, wikipedia
or google it). In the final column, write out your own definition
-- something that actually makes sense to you in the current context.
You can refer back to these translations as
needed while reading the current article. If you encounter the
term again in a later article, check your notebook to see if they're
using it in the same way, or if you need to revise your definition
(careful, though: researchers don't always use terms in the same way
across subdomains or even across articles, and sometimes they don't
even recognize the inconsistency themselves).
A note on dealing with equations
It's tempting to skip over any equations that
appear in an article, as if they were pretentious quotes in a foreign
language that only the author speaks. But it really is worth taking the
time to work through equations, because they are perhaps the clearest,
least ambiguous part of any article. Here's a basic strategy that
should get you started with most simple equations:
- First, just look at the structure of the equation
(not worrying about the variables) and ask yourself if it (or any part
of it) reminds you of any other equations you know. There are usually a
few basic equations that get adapted and re-used a lot in any field
(e.g. the Luce choice rule, marked up in the example after this list),
so you'll have a head start if you can learn to recognize them by structure.
- Second, find the definitions for each variable, and
mark up or rewrite the equation in terms of those definitions. You
might need a whole sentence to define a variable; that's okay, just
make sure you understand what each component means.
- Third, look for any parts that you can conceptually
condense. For instance, ([target activation] - [actual activation])
can be condensed as 'activation error'. If the author has already done
this condensing for you, then go the other way, breaking up the terms
so you're sure you know what they refer to.
- Fourth, look at what basic operations are used to
relate the variables. What would happen if one of the variables got
very large or very small? Addition or subtraction means something very
different than multiplication or division.
- Fifth, consider any transformations of the
quantities. These might adjust the linearity of a scale (e.g. log,
exp), perform a simple calculation (e.g. a derivative refers to the
rate of change), or refer to a repeating process (e.g. summation).
- Finally, try to verbally describe the equation to
yourself, in your own words.
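As a worked illustration of the first few steps, here is the kind of marked-up version you might produce for the Luce choice rule; the annotations are mine, not quoted from any particular article.

```latex
% The Luce choice rule: the probability of choosing candidate i
% from a set of competing candidates.
P(\text{choose } i) = \frac{a_i}{\sum_{j} a_j}
% where  a_i        = the "strength" (e.g. activation) of candidate i
%        \sum_j a_j = the summed strength of every candidate, i included
%
% Structure: one part divided by the sum of all the parts, so the
% result is a proportion between 0 and 1.  If a_i grows much larger
% than its competitors, P approaches 1; adding strong competitors to
% the denominator pushes P down.
```

Being able to say "the probability of choosing a candidate is just its share of the total strength" is exactly the kind of plain-language description the last step is after.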
A note on reading clumsy statistical prose
Somehow, between copyeditors' resistance to nested parentheses and
ANOVA's inability to detect directional differences, our Results
sections often end up unreadable. In your own writing, I generally
recommend presenting regression tables if you need to, using your text
primarily to make a clear, applied claim (e.g. "Speakers named
pictures more slowly in the semantically related blocks than in the
semantically unrelated blocks"), and then following it up with a
parenthetical description of the relevant statistical test at the end
of the sentence (e.g. "(fixed effect of semantic context: B = 0.2,
p = .01)"). It's amazing how speaking like a real person and focusing
on your story can facilitate communication!
But what if you have to slog through pages of Greek and follow-up
t-tests? You need a plan. First, remember that every statistical test
is asking a very specific question, and should be done for a specific
reason. Usually, the authors would have made it clear in their intro
that they were conducting the experiment to address one particular
question (e.g. "Is the phonemic similarity effect stronger for
self-reported speech errors from overt speech than it is for
self-reported speech errors in inner speech?"), which then identifies
one particular statistical test as the main point of their article
(e.g. a directional similarity * overtness interaction). Identify what
that crucial test should be, and then skim the Greco-English until you
find that test.
Then you can browse the other analyses, but make sure you're always
asking yourself, "Why are they reporting this analysis?" Usually there
will be a lot of housekeeping and descriptive stuff at the beginning --
things whose main point is to convince readers that their data and tests
are valid. You might also get some reports of expected effects -- it's
not surprising that they found these expected effects, but reporting
them could be useful for meta-analyses, and it helps convince us that
the experiments and data are generally okay (e.g. their subjects were
probably doing the task instead of just mashing the spacebar). And
then at the end there will often be some exploratory stuff -- these
analyses probably weren't planned when designing the experiment, but
they give the authors a chance to either geek out about their data or
report converging evidence for their main interpretations (e.g. "In
line with our interpretation that a platypus is a mammal, we also found
that they give birth to live young, p<.05").
A note on interpreting non-significant effects
Even professional researchers can be surprisingly awful at
interpreting
statistics, so it's up to you as a reader to supply the skepticism for
their claims. This is never more true than when they report
non-significant effects. A surprising number of research articles
misinterpret p-values greater than .05 as evidence supporting a
null hypothesis (that the size of the difference in question equals
zero; e.g. "there was no effect of similarity, p = .06, proving that any
apparent difference was merely due to random chance"). But, having
taken introductory statistics, you know better,
right? You'll recall from your intro stats class that a p-value
reflects a specific estimated probability: the probability of the observed
data pattern (or patterns even more extreme), given that the null
hypothesis is
true. By this definition, a p-value of .049 means that the data
are very unlikely under the null hypothesis (you'd only expect to see
such an extreme pattern about one time out of twenty). But what
about a p-value of .051? It's slightly weaker evidence, sure, but it
means that the observed data are still quite unlikely under the null
hypothesis. When interpreting stats, my advice is to de-emphasise the
magical p<.05 cutoff and instead approach it as, "How would I bet
$100?"
Properly testing for the absence of an effect is statistically
difficult. Most methods require the researcher to specify some expected
effect size and then try to show that the actual effect is smaller than
that; this includes power-based arguments, two one-sided t-tests
(TOST), and Bayesian model comparison. Even quite small effects can be
theoretically important, though, and the standard 'small' effect sizes
used in power calculations are often surprisingly large, so you need to
consider carefully just how small an effect you are willing to ignore.
Thus, arguing for the absence of an effect might incorporate a
statistical test, but it is often more about convincing a reader that
your methods as a whole should have been sensitive enough to detect the
effect in question.
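To make the equivalence-testing idea concrete, here is a bare-bones sketch of the two one-sided t-tests (TOST) logic in Python; the data, the equivalence bound, and the function name are invented, and in real work you would more likely use a dedicated equivalence-testing routine or a Bayesian alternative.

```python
# Bare-bones TOST (two one-sided t-tests) for two independent groups.
# "Equivalence" is claimed only if the difference is credibly inside
# (-bound, +bound).  The data and the bound are invented.
import numpy as np
from scipy import stats

def tost_ind(x, y, bound):
    """Return the TOST p-value for |mean(x) - mean(y)| < bound."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    diff = x.mean() - y.mean()
    # Pooled standard error and degrees of freedom (equal-variance t-test).
    pooled_var = ((nx - 1) * x.var(ddof=1) +
                  (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    se = np.sqrt(pooled_var * (1 / nx + 1 / ny))
    df = nx + ny - 2
    # Test 1: is the difference reliably greater than -bound?
    p_lower = 1 - stats.t.cdf((diff + bound) / se, df)
    # Test 2: is the difference reliably smaller than +bound?
    p_upper = stats.t.cdf((diff - bound) / se, df)
    # Both one-sided tests must succeed, so report the larger p-value.
    return max(p_lower, p_upper)

rng = np.random.default_rng(2)
group_a = rng.normal(0.00, 1.0, 60)   # invented data
group_b = rng.normal(0.05, 1.0, 60)   # invented data: a tiny true difference
print("TOST p-value:", tost_ind(group_a, group_b, bound=0.4))
```

Notice that the conclusion hangs entirely on the bound you choose: shrink it and the same data stop looking "equivalent", which is exactly the worry about deciding how small an effect you are willing to ignore.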