When I first started reading experimental
research articles, I would spend hours poring over every word, but when
I'd finished I'd often have little idea what specifically the
researchers did, or what big ideas were driving the studies. Such
difficulties are understandable: original research articles usually
represent a different genre of writing than anything a student has previously
encountered. They present novel technical terms and opaque
acronyms. They may describe only a small tweak on an existing
experimental paradigm. And they are occasionally so poorly written
that one can only speculate about how they passed peer review (perhaps
the reviewers (a) already knew what the paper was trying to say, or
(b) were too insecure to admit that they didn't understand it).
It's easier if you can approach an
article with specific questions in mind. Below, I've listed some of the
questions I bring to an article, in the hope that they can give
students some structure for their own reading.
Comments are
welcome.
Why is this experiment being done?
Good research is usually driven by a BIG question
(e.g., Does your language change the way you think?), but only
really addresses a little question (e.g., Do people who speak Greek
respond differently to certain shades of blue?).
- What is the BIG question motivating this experiment?
(hint: this is probably a few steps bigger and more abstract than
whatever's listed in the title)
- What is the little question that this experiment was
designed to address? (hint: this is usually a specific pattern that you could assess with an individual statistical test)
What is the general experimental paradigm?
It's rare for researchers to design an experimental
method de novo. Usually they'll use a well-established technique (which
you might recognize from previous readings), but tweak it a bit or use
different stimuli or different measures, expecting to see different
results.
- If you've encountered the paradigm before, where else
have you seen it?
- What kinds of questions has it been used to address?
- What are some of its strengths and weaknesses?
- How does this use of the paradigm differ from
what's been done in the past?
What kinds of results did the authors expect to see?
Researchers often run experiments expecting to
either support or challenge some pet theory. Recognizing their biases
can help you approach their claims with appropriate skepticism.
- Usually there are several theories that make
contrasting predictions. Describe these theories.
- Sometimes a person who wants to discredit a theory
will start by misrepresenting it (a straw man argument). Do all of
these theories sound reasonable, from the authors' descriptions? (hint:
look out for strong dichotomies -- real theories are often more nuanced
than that)
- Do the claimed predictions make sense for the
theories that the authors have described?
- Do the theories necessarily predict these results?
- Are there any possible results that no theory would
predict?
- Disregarding the theories and the authors'
predictions, what would your own gut feeling predict?
How was the experiment designed?
- Who were the subjects?
- How many subjects participated?
- What was the task?
- How long did the task take?
- How did they instruct the subjects?
- How did they present the experiment (e.g. computer
display)?
- What was the specific experimental manipulation?
- What were their baseline and experimental
conditions?
- How did they counterbalance their conditions? (see the toy sketch after this list)
- Within-subjects or between-subjects?
- Within-items or between-items?
- What would it have felt like to participate in this
experiment?
- If you were a subject, how would you behave in the
experiment?
- What kind of data do you think your behaviour would
generate?
- What was the form of the data (what’s the outcome
variable and how is it measured)?
- How was the data collected (what tools were used)?
- Sometimes an experiment will give you results you
want, but for the wrong reasons. What are some ways that could happen
here? Do any of them seem especially problematic?
- What about the possibility that the experiment could
give you the wrong results for the wrong reasons?
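If counterbalancing is unfamiliar, here is a minimal sketch of the idea in Python. The condition and item names are invented for illustration; the point is just that, across the presentation lists, every item appears in every condition, while each individual subject sees each item only once.

```python
# A toy sketch of Latin-square counterbalancing (invented names).
# Across the lists, every item appears in every condition; within a
# list, each item appears exactly once, in a single condition.

conditions = ["related", "unrelated"]          # hypothetical conditions
items = ["item%02d" % i for i in range(1, 9)]  # hypothetical items

lists = []
for rotation in range(len(conditions)):
    # Rotate the item-to-condition assignment to build each list.
    assignment = {item: conditions[(i + rotation) % len(conditions)]
                  for i, item in enumerate(items)}
    lists.append(assignment)

for n, assignment in enumerate(lists, start=1):
    print("List %d:" % n)
    for item, condition in assignment.items():
        print("  %s -> %s" % (item, condition))
```

Each subject would then be assigned to one of the lists, so that, across subjects, every item contributes data to every condition.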
How good is the data?
- How much data is there?
- How many independent data points are there?
- If this is a repeated-measures study, how many
measures were collected from each subject/item?
- Note that with binomial data (e.g. error or
accuracy data), more trials don't necessarily give you more power; the
count of the less-frequent outcome (e.g. the number of speech errors)
is usually a better indicator (see the toy simulation after this list).
- How have the data been statistically analysed?
- Do you think that way of analysing them is
appropriate?
- Do the claimed patterns in the data look convincing?
- Are the statistical analyses appropriate for
testing these claimed patterns?
- Note: be very
skeptical of any claim that a pattern does not exist; even
professionals are unfortunately bad at interpreting statistical
non-significance, and almost never apply appropriate tests to
demonstrate the absence of a difference.
- How big are the effects? (in terms of milliseconds,
proportions, or percent of variance that they account for)
- Is this effect a major determinant of observed
behaviour, or is it more important for understanding how the mind
works? (hint: in a very large study (e.g. more than a few hundred
participants, like some sex-difference studies), it can be easy to find
a statistically significant difference that means almost nothing for
day-to-day life)
- How big are the confidence intervals, compared to
the claimed effect size?
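Here is a rough simulation of that binomial point, in Python with scipy; the error rates, trial counts, and choice of test are all invented for illustration. A design with a thousand trials per condition but very rare errors ends up with roughly the same power as one with a tenth of the trials but the same expected number of errors.

```python
# Toy simulation: for rare binomial outcomes (e.g. speech errors),
# power tracks the expected COUNT of errors more closely than the
# raw number of trials.  All numbers here are invented.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(0)

def power(n_trials, p_control, p_treatment, n_sims=2000, alpha=0.05):
    """Estimate power to detect a difference between two error rates."""
    hits = 0
    for _ in range(n_sims):
        errors_control = rng.binomial(n_trials, p_control)
        errors_treatment = rng.binomial(n_trials, p_treatment)
        table = [[errors_control, n_trials - errors_control],
                 [errors_treatment, n_trials - errors_treatment]]
        _, p = fisher_exact(table)
        hits += p < alpha
    return hits / n_sims

# 1000 trials per condition, very rare errors (about 10 vs 20 expected).
print(power(1000, 0.01, 0.02))
# 100 trials per condition, more common errors (also about 10 vs 20 expected).
print(power(100, 0.10, 0.20))
```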
What does it all mean?
- How does the data relate to the little question
motivating the experiment?
- Do you think the data means what the authors say it
means?
- How well has the experiment isolated the process
that it claims to address?
- How could you do it better?
- What is another possible reason that the data could
show the patterns that it does?
- What additional experiment could tell you if this
is true?
- How does the data relate to the BIG question
motivating the experiment?
- How does it fit with the other things that you know
that might be relevant to that question?
- What does this experiment really tell us?
- How might this change the way we live?
- What is especially awesome about this experiment?
- Do you have any lingering concerns?
Why would you cite this paper?
You can think of the papers that you read as
building blocks for constructing your own stories, arguments, or experiments.
Usually you'd cite a paper for particular theoretical claims, empirical
findings, or methodology.
- Does it help fill or expose any gaps in your
knowledge?
- Does it test ideas that you thought were probably
true, but no one had actually tested before?
- Does it tell an interesting story?
- Does it simplify or complicate the stories that
you're concerned with?
- Does it use techniques that you would want to use in
your own work?
- How can you imagine this article coming up in
conversation?
A note on dealing with acronyms (and other semantically
opaque terms)
Acronyms are quite common in research articles, which is
unfortunate because they present substantial barriers to new readers'
comprehension and critical thinking. Though such abbreviations are
undoubtedly useful when typing, an author who actually wants to
communicate should always run a find-and-replace before submitting a
paper for peer review.
So how can the reader cope when an author has forgotten to run the
find-and-replace? My suggestion is to keep a notebook with three
columns. When you run across an abbreviation or semantically
opaque term for the first time in an article, copy it to the first
column. In the next column, write the authors' translation of the
term, if they have been kind enough to supply it (otherwise, wikipedia
or google it). In the final column, write out your own definition
-- something that actually makes sense to you in the current context.
You can refer back to these translations as
needed while reading the current article. If you encounter the
term again in a later article, check your notebook to see if they're
using it in the same way, or if you need to revise your definition
(careful, though: researchers don't always use terms in the same way
across subdomains or even across articles, and sometimes they don't
even recognize the inconsistency themselves).
A note on dealing with equations
It's tempting to skip over any equations that
appear in an article, as if they were pretentious quotes in a foreign
language that only the author speaks. But it really is worth taking the
time to work through equations, because they are perhaps the clearest,
least ambiguous part of any article. Here's a basic strategy that
should get you started with most simple equations:
- First, just look at the structure of the equation
(not worrying about the variables) and ask yourself if it (or any part
of it) reminds you of any other equations you know. There are usually a
few basic equations that get adapted and re-used a lot in any field
(e.g. the Luce choice rule, marked up in the example after this list),
so you'll have a head start if you can learn to recognize them by structure.
- Second, find the definitions for each variable, and
mark up or rewrite the equation in terms of those definitions. You
might need a whole sentence to define a variable; that's okay, just
make sure you understand what each component means.
- Third, look for any parts that you can conceptually
condense. For instance, ([target activation] - [actual activation])
can be condensed as 'activation error'. If the author has already done
this condensing for you, then go the other way, breaking up the terms
so you're sure you know what they refer to.
- Fourth, look at what basic operations are used to
relate the variables. What would happen if one of the variables got
very large or very small? Addition or subtraction means something very
different than multiplication or division.
- Fifth, consider any transformations of the
quantities. These might adjust the linearity of a scale (e.g. log,
exp), perform a simple calculation (e.g. a derivative refers to the
rate of change), or refer to a repeating process (e.g. summation).
- Finally, try to verbally describe the equation to
yourself, in your own words.
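As a worked illustration of the first few steps, here is the kind of marked-up version you might produce for the Luce choice rule; the annotations are mine, not quoted from any particular article.

```latex
% The Luce choice rule: the probability of choosing candidate i
% from a set of competing candidates.
P(\text{choose } i) = \frac{a_i}{\sum_{j} a_j}
% where  a_i        = the "strength" (e.g. activation) of candidate i
%        \sum_j a_j = the summed strength of every candidate, i included
%
% Structure: one part divided by the sum of all the parts, so the
% result is a proportion between 0 and 1.  If a_i grows much larger
% than its competitors, P approaches 1; adding strong competitors to
% the denominator pushes P down.
```

Being able to say "the probability of choosing a candidate is just its share of the total strength" is exactly the kind of plain-language description the last step is after.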
A note on reading clumsy statistical prose
Somehow, between copyeditors' resistance to nested parentheses and
ANOVA's inability to detect directional differences, our Results
sections often end up unreadable. In your own writing, I generally
recommend presenting regression tables if you need to, using your text
primarily to make a clear, applied claim (e.g. "Speakers named
pictures more slowly in the semantically related blocks than in the
semantically unrelated blocks"), and then following it up with a
parenthetical description of the relevant statistical test at the end
of the sentence (e.g. "(fixed effect of semantic context: B = 0.2,
p = .01)"). It's amazing how speaking like a real person and focusing
on your story can facilitate communication!
But what if you have to slog through pages of Greek and follow-up
t-tests? You need a plan. First, remember that every statistical test
is asking a very specific question, and should be done for a specific
reason. Usually, the authors would have made it clear in their intro
that they were conducting the experiment to address one particular
question (e.g. "Is the phonemic similarity effect stronger for
self-reported speech errors from overt speech than it is for
self-reported speech errors in inner speech?"), which then identifies
one particular statistical test as the main point of their article
(e.g. a directional similarity * overtness interaction). Identify what
that crucial test should be, and then skim the Greco-English until you
find that test.
Then you can browse the other analyses, but make sure you're always
asking yourself, "Why are they reporting this analysis?" Usually there
will be a lot of housekeeping and descriptive stuff at the beginning --
things whose main point is to convince readers that their data and tests
are valid. You might also get some reports of expected effects -- it's
not surprising that they found these expected effects, but reporting
them could be useful for meta-analyses, and it helps convince us that
the experiments and data are generally okay (e.g. their subjects were
probably doing the task instead of just mashing the spacebar). And
then at the end there will often be some exploratory stuff -- these
analyses probably weren't planned when designing the experiment, but
they give the authors a chance to either geek out about their data or
report converging evidence for their main interpretations (e.g. "In
line with our interpretation that a platypus is a mammal, we also found
that they give birth to live young, p<.05").
A note on interpreting non-significant effects
Even professional researchers can be surprisingly awful at
interpreting
statistics, so it's up to you as a reader to supply the skepticism for
their claims. This is never more true than when they report
non-significant effects. A surprising number of research articles
misinterpret p-values greater than .05 as evidence supporting a
null hypothesis (that the size of the difference in question equals
zero; e.g. "there was no effect of similarity, p = .06, proving that any
apparent difference was merely due to random chance"). But, having
taken introductory statistics, you know better,
right? You'll recall from your intro stats class that a p-value
reflects a specific estimated probability: the probability of the observed
data pattern (or patterns even more extreme), given that the null
hypothesis is
true. By this definition, a p-value of .049 means that the data
are very unlikely under the null hypothesis (you'd only expect to see
such an extreme pattern about one time out of twenty). But what
about a p-value of .051? It's slightly weaker evidence, sure, but it
means that the observed data are still quite unlikely under the null
hypothesis. When interpreting stats, my advice is to de-emphasise the
magical p<.05 cutoff and instead approach it as, "How would I bet
$100?"
Properly testing for the absence of an effect is statistically
difficult. Most methods require the researcher to specify some expected
effect size and then try to show that the actual effect is smaller than
that; this includes power-based arguments, two one-sided t-tests
(TOST), and Bayesian model comparison. Even quite small effects can be
theoretically important, though, and the standard 'small' effect sizes
used in power calculations are often surprisingly large, so you need to
consider carefully just how small an effect you are willing to ignore.
Thus, arguing for the absence of an effect might incorporate a
statistical test, but it is often more about convincing a reader that
your methods as a whole should have been sensitive enough to detect the
effect in question.
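To make the equivalence-testing idea concrete, here is a bare-bones sketch of the two one-sided t-tests (TOST) logic in Python; the data, the equivalence bound, and the function name are invented, and in real work you would more likely use a dedicated equivalence-testing routine or a Bayesian alternative.

```python
# Bare-bones TOST (two one-sided t-tests) for two independent groups.
# "Equivalence" is claimed only if the difference is credibly inside
# (-bound, +bound).  The data and the bound are invented.
import numpy as np
from scipy import stats

def tost_ind(x, y, bound):
    """Return the TOST p-value for |mean(x) - mean(y)| < bound."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    diff = x.mean() - y.mean()
    # Pooled standard error and degrees of freedom (equal-variance t-test).
    pooled_var = ((nx - 1) * x.var(ddof=1) +
                  (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    se = np.sqrt(pooled_var * (1 / nx + 1 / ny))
    df = nx + ny - 2
    # Test 1: is the difference reliably greater than -bound?
    p_lower = 1 - stats.t.cdf((diff + bound) / se, df)
    # Test 2: is the difference reliably smaller than +bound?
    p_upper = stats.t.cdf((diff - bound) / se, df)
    # Both one-sided tests must succeed, so report the larger p-value.
    return max(p_lower, p_upper)

rng = np.random.default_rng(2)
group_a = rng.normal(0.00, 1.0, 60)   # invented data
group_b = rng.normal(0.05, 1.0, 60)   # invented data: a tiny true difference
print("TOST p-value:", tost_ind(group_a, group_b, bound=0.4))
```

Notice that the conclusion hangs entirely on the bound you choose: shrink it and the same data stop looking "equivalent", which is exactly the worry about deciding how small an effect you are willing to ignore.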