My Meta-analysis Code. Thank You R!

As promised, here’s my meta-analysis code that I wrote for my Masters thesis.

If you’re not sure what a meta-analysis is, but want to conduct one, click here to find out. Feel free to use whatever code you like. All I kindly ask in return is that you cite it appropriately 🙂

Bit of info…

I used the free statistical software programme R to write my code and analyse my data. It’s not only FREE, but it was created with students in mind (like my former self), people who don’t have a degree in programming but want to code. You can download R and RStudio (a nice GUI for those who like GUIs) for both PC and Mac. Once R is up and running, install and load the CRAN packages MAc, metafor, dplyr, Hmisc and ggplot2.
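If you’ve never installed an R package before, here’s a minimal sketch of the setup (assuming you’re installing straight from CRAN; ggplot2 is the name the plotting package goes by on CRAN):

# Install the packages from CRAN (you only need to do this once)
install.packages(c("MAc", "metafor", "dplyr", "Hmisc", "ggplot2"))

# Load them at the start of every R session
library(MAc)
library(metafor)
library(dplyr)
library(Hmisc)
library(ggplot2)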

You can use this meta-analysis code to do the following:

  • Compute correlations from raw data
  • Convert effect sizes into correlation coefficients r
  • Aggregate effect sizes per study
  • Calculate the summary (overall) effect size and heterogeneity (variance)
  • Check for publication bias and outliers in your data
  • Conduct a moderator analysis

Need some extra guidance? No problem – read on.


What is a meta-analysis?

A meta-analysis is an analysis done on multiple studies that researched the same question. I’m going to use a similar example from Andy Field’s meta-analysis guide as it made me chuckle. Say you want to see if listening to Gothic music makes you a Satan worshiper and there just so happens to be hundreds of studies published (and a few unpublished) on this topic. Doing one more study is not going to make a dent, but conducting a meta-analysis on these studies will not only tell you if there’s a relationship based on the previous research, but you can also look at the differences between and within these studies, and if these differences influence the findings!

You can conduct a meta-analysis to clarify:

  • If there is a relationship, or effect, between listening to Gothic music and being a Satan worshiper
  • How big the effect is (in other words, will listening to Gothic music make everyone a Satan worshiper, or just a few?)
  • What factors influence this effect, i.e. does it depend on the band, the album, the track, or were participants already possessed by Satan?

This is why meta-analyses are cool; you’re not just asking, “Is there an effect?”, you’re asking the “why?” along with it.

I’ve outlined the major steps and also included references to guidelines and explanations if you’re hungry for more info.

Back to the top


Pick your research question

Done – does Gothic music make you a Satan worshiper?


Study selection

Pick your studies that also researched this question. This will take you the longest to do. Go to Google Scholar and type in a bunch of keywords, e.g. Gothic music, Satan worshiper, etc. Get a hold of as many studies as you can. Then check the abstracts to see if their research question matches yours (titles can be misleading).

Then come up with inclusion (or exclusion) criteria, e.g. participants must be current Satan worshipers, participants must have listened to Gothic music prior to becoming Satan worshipers, the study’s definition of Satan worshiper must match your definition, and so on. You can also ask the researcher(s) for unpublished data. If they send it to you, thank them and check it. If the data looks dodgy, that’s probably why they didn’t publish it. You don’t want unusable data in your meta-analysis, so just take what’s relevant. Be vigilant!

Just as a side note, don’t worry if you don’t find hundreds of studies that match your research question. You technically only need two studies to do a meta-analysis (Valentine et al., 2010). It’s far more important that you select the right studies than make up the numbers with the wrong studies.

Now it’s time to create your own data table from your included studies! Note down the following:

  1. Author(s) & year
  2. Effect size
  3. Effect size measure (e.g. correlation coefficient, t-value, F-value, etc.)
  4. Sample size, i.e. number of participants
  5. Note the study characteristics or factors for each effect size*

*Most studies report more than one effect size, so it’s important to note the different factors that distinguish each effect size, e.g. Gothic band, Satanic behaviour. You can also include demographic details, like country of testing, participants’ age, gender, ethnicity, etc., as well as the statistical analyses used. You will then analyse these factors in the moderator analysis to see if they have an influence on the effect. But more about this later…
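To make this concrete, here’s a purely made-up example of the first few rows of such a table (the studies, numbers and factor labels are invented just to show the layout; I’ve used the column names that the conversion code further down expects):

Study            Year  ES     ES_type  n_1  n_2  Gothic_band   Satanic_behaviour
Smith & Jones    2010  2.31   t-test   40        The Cult      Ritual attendance
Smith & Jones    2010  0.28   r        40        The Cult      Pentagram doodling
Baal et al.      2013  5.67   f-test   25   27   UK Decay      Ritual attendance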

Back to the top


Coding time!

Before you jump into the analysing stuff, you’ll need to convert all the raw data and the effect sizes into one measurement scale. I’d recommend converting everything into correlation coefficients r as they’re versatile and it cuts out a bunch of other conversion code that you’d have to use later on.


Compute correlations from raw data

Load your data as a .csv or .txt file into R.
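For example (the file name here is just a placeholder for wherever you saved your data):

# Read the raw data in; use read.delim() instead if you saved a tab-separated .txt file
data <- read.csv("my_raw_data.csv", header = TRUE)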

Run the function:

Corr_data <- rcorr(as.matrix(data), type = "pearson")

You can change the correlation type to Spearman. If you’re not sure which one to choose, see Laerd Statistics guides on Pearson and Spearman correlations.

Check that the structure is correct with the function:

str(Corr_data)

Lastly, create a data frame of just the correlations r and save out your results using the code:

df_Results_Corr = data.frame(Corr_data$r)
write.csv(df_Results_Corr, "dataResults_Corr.csv")

Back to the top


Convert effect sizes into correlation coefficients

Once you’ve found all the effect sizes within your studies and compiled them in your table, you’ll probably notice that they all have different scales of measurement, e.g. correlation coefficient r, Cohen’s d,  t-test effect sizes, F-test effect sizes, means and standard deviations (SD), etc. I will just show how to convert t-test and F-test effect sizes into the correlation coefficient r because they’re the most commonly used statistics in Psychology, but you can find all the possible conversions in the MAc guidelines.

To make things easy you can convert all your effect sizes from one data table. Make sure that your data contains an id column, a column for the effect size (ES), a column for the effect size measure (ES_type) that contains the values "t-test", "f-test", etc. (these need to match the labels used in the filtering code below exactly, as R is case-sensitive), separate columns for each sample size (n_1 for t-tests and F-tests, n_2 for F-tests), and so on.
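Once your table is loaded into R (as in the next step), a quick way to sanity-check that it has the columns the conversion code expects is:

str(data)            # should list id, ES, ES_type, n_1, n_2 (plus any moderator columns)
table(data$ES_type)  # how many effect sizes of each type you have to convert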


Convert t-values into correlation coefficients r

Add in your data manually or load the .csv or .txt file into R.

Select the rows that contain all the data for t-test effect sizes with the code:

filter_t_test <- filter(data, ES_type == "t-test")

Select the columns id, ES, and n_1 with the code:

data_t_stats <- select(filter_t_test, id, ES, n_1)

Then run the code to convert the t-test effect sizes into correlation coefficients r:

data_r_from_t <- r_from_t(data_t_stats$ES, data_t_stats$n_1)

This code is based on the function:

r_from_t(t, n)

where:

  • t (data_t_stats$ES) refers to the t-values
  • n (data_t_stats$n_1) refers to the sample size.

Combine the results with the t-test data into one data frame with the code:

df_r_from_t = data.frame(data_r_from_t)
df_dataResults_r_from_t <- data.frame(bind_cols(data_t_stats, df_r_from_t))

Next…


Convert F-values into correlation coefficients r

For F-tests, you’ll first have to convert them into Cohen’s d and then into r. Plus when converting your new d values into r, you’ll have to use separate functions depending on if your two groups have the same number of participants or different number of participants.


F-values with the same sample size in each group

Select the rows that contain all the data for F-test effect sizes with the same sample size with the code:

filter_F_test_same <- filter(data, ES_type == "f-test", n_1 == n_2)

Select the columns id, ES, n_1 and n_2 with the code:

data_F_stats_same <- select(filter_F_test_same, id, ES, n_1, n_2)

Then run the code to convert the F-test effect sizes into Cohen’s d (d):

data_f_to_d <- f_to_d(data_F_stats_same$ES, data_F_stats_same$n_1, data_F_stats_same$n_2)

This code is based on the function:

f_to_d(f, n.1, n.2)

where:

  • f (data_F_stats_same$ES) refers to the F-values
  • n.1 (data_F_stats_same$n_1) refers to group 1 or your experimental group
  • n.2 (data_F_stats_same$n_2) refers to group 2 or your control group

Create data frame…

df_f_to_d = data.frame(data_f_to_d)

and combine the results with the F-test data into one data frame with the code:

dataResults_f_to_d <- bind_cols(data_F_stats_same, df_f_to_d)

Then run the code to convert d and the variance of d (var_d) into correlation coefficients r:

data_r_from_d <- r_from_d(dataResults_f_to_d$d, dataResults_f_to_d$var_d, a = 4)

This code is based on the function:

r_from_d(d, var_d, a = 4)

(a is a correction factor based on the two group sizes, (n_1 + n_2)^2 / (n_1 * n_2); when both groups are the same size this works out to 4, which is why 4 is the default here.)

Combine the results with the F-test data into one data frame with the code:

df_r_from_d = data.frame(data_r_from_d)
df_dataResults_r_from_d <- data.frame(bind_cols(data_F_stats_same, df_r_from_d))

Next…


F-values with different sample sizes in each group

Select the rows that contain all the data for F-test effect sizes with different sample sizes with the code:

filter_F_test_dif <- filter(data, ES_type == "f-test", n_1 != n_2)

Select the columns id, ES, n_1 and n_2 with the code:

data_F_stats_dif <- select(filter_F_test_dif, id, ES, n_1, n_2)

Then run the code to convert the F-test effect sizes into Cohen’s d (d):

data_f_to_d1 <- f_to_d(data_F_stats_dif$ES, data_F_stats_dif$n_1, data_F_stats_dif$n_2)

Create data frame…

df_f_to_d1 = data.frame(data_f_to_d1)

and combine the results with the F-test data into one data frame with the code:

dataResults_f_to_d1 <- bind_cols(data_F_stats_dif, df_f_to_d1)

Then run the code to convert d and the variance of d (var_d) into correlation coefficients r:

data_r_from_d1 <- r_from_d1(dataResults_f_to_d1$d, dataResults_f_to_d1$n_1, dataResults_f_to_d1$n_2, dataResults_f_to_d1$var_d)

This code is based on the function:

r_from_d1(d, n.1, n.2, var_d)

Combine the results with the F-test data into one data frame with the code:

df_r_from_d1 = data.frame(data_r_from_d1)
df_dataResults_r_from_d1 <- data.frame(bind_cols(data_F_stats_dif, df_r_from_d1))


Final data organising bit

Add all your new correlation coefficients r to your original data table. First combine all data results:

dataResults_r_all <- bind_rows(df_dataResults_r_from_t, df_dataResults_r_from_d, df_dataResults_r_from_d1)

Then select the columns r and id:

dataResults_r <- select(dataResults_r_all, id, r)

Arrange the order by id:

dataResults_r_order <- arrange(dataResults_r, id)

Combine r with your original data using the code:

dataResults_r_from_ES <- bind_cols(data, select(dataResults_r_order, r))

Back to the top


Aggregate effect sizes per study

Most studies report more than one effect size; however, including multiple effect sizes from the same study gives that study undue weight and can bias the summary effect size. To counteract this, aggregate the effect sizes per study. Check that your data file contains the study ID (one id per study), all the effect sizes for each study converted into r, and the sample size for each effect size (n).

Load your data as a .csv or .txt file into R.

Run the function:

agg_data <- agg(id, r, n, cor = .50, mod = NULL, data = data)

You’ll then get the study ID (id), one aggregated correlation coefficient per study (r), and the sample size for each study (n).

Compute variance of the correlation coefficient (var_r) and add variance column to the data results:

var_r_funcAggData <- var_r(agg_data$r, agg_data$n)
agg_data$var_r <- var_r_funcAggData

Select the columns Study and Year from original data:

Study_Year_all <- select(data, Study, Year)

Remove the duplicate rows:

Study_Year <- distinct(Study_Year_all)

And merge the columns Study and Year with aggregated data results

agg_data_Results <- bind_cols(Study_Year, agg_data)

Now that that’s all set up, it’s time to analyse the data!

Back to the top


Calculate the summary (overall) effect size and heterogeneity (variance)

First thing you need to do is choose your model. As a rule, researchers make assumptions about their data and this rule also applies to meta-analysts. The rule here is that you need to make one of two assumptions:

  • Assumption 1: The data from your studies shows a typical intervention effect and there were no other effects influencing your data, i.e. your studies are homogeneous.
  • Assumption 2: The data from your studies was sampled from a distributed population of studies (they were not all conducted in exactly the same way), and the typical intervention effect cannot explain the overall effect, i.e. your studies are heterogeneous.

Most meta-analysts go for the more common assumption that their studies are heterogeneous and therefore model their data with the random-effects model (and that’s what we’re going to do). The random-effects model has a number of other benefits too, one of the biggest being that it reduces the likelihood of getting false positives. Bonus!


The stats

Run the function:

m0 <- mareg(r ~ 1, var = var_r, method = "REML", data = agg_data)

  • m0 is just the output object and stands for “model 0”
  • r refers to the correlation coefficients r
  • var = var_r tells it which column holds the variance of each r
  • ~ 1 roughly stands for “predicted by”, with the 1 meaning it is an intercept-only model (no moderators yet)
  • method = "REML" refers to the restricted maximum likelihood estimator and is the default method when applying the random-effects model (Viechtbauer, 2010, p. 12 gives you the other estimator options you can use)
  • data = agg_data tells R to use your aggregated effect size data, not all the effect sizes for each study

Then run the function…

summary(m0)

This will display the summary effect size (estimate), whether the estimate is significant (Pr(>|z|)), and the confidence intervals (ci.l, ci.u). We now have our summary effect size, which is the overall effect size of all the studies, yay!

Model Results: 
 
        estimate       se        z     ci.l   ci.u  Pr(>|z|) 
intrcpt 0.157545 0.075325 2.091542 0.009911 0.3052 0.03648 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The summary(m0) function will also display the Q-statistic (QE & QEp), which tells you if your studies are homogeneous (QEp > 0.05) or not (QEp < 0.05). As we’re assuming that our data is heterogeneous, we want the QEp to be < 0.05.

 Heterogeneity & Fit: 
 
         QE QE.df   QEp     QM QM.df    QMp
[1,] 2.9635    17 0.564 4.3745     1 0.0365

But the Q-statistic only tells us whether heterogeneity is present; it doesn’t tell us how much there is or where it comes from. So we need to quantify the heterogeneity by calculating the I-squared (I2) statistic and the Tau-squared (Tau2) statistic.

The I2 statistic tells us how much of the total variability in the effect sizes is due to heterogeneity rather than chance (given as a %). It’s also worthwhile looking at its 95% confidence interval (CI), because if the bounds are at polar opposites, e.g. 10.01 and 90.05, then we cannot be certain whether the variance is because of differences between studies or differences within studies. But we can find this out by calculating Tau2.

Tau2 tells us how much of the variance that is influencing the effect is due to between-study differences. It does this by calculating the difference between the total variance and the within-study variance. If there’s a lot of variance between studies, Tau2 will be high. If there’s little or no variance between studies, indicating that the variance is due to differences within studies, then Tau2 will be low or 0.

Run the function…

confint(m0)

This will give you the I2 statistic estimate, the Tau2 estimate and the 95% CI for both of these. (You will also be given the Tau estimate and H2 estimate, but these aren’t very useful to us.)

       estimate   ci.lb    ci.ub
tau^2    0.0245  0.0047   0.0580
tau      0.1555  0.0568   0.3801
I^2(%)  78.5890 54.2157  93.5555
H^2      2.1055  1.5522   5.0811

If you want to know more, see Viechtbauer (2010, p. 14).


The forest plot

With all these numbers floating around, it’s a good idea to show what they mean in a visual way. Enter the forest plot, which shows the summary effect size and the heterogeneity in your studies.

If you want a simple forest plot with the study ID on the left, the individual effect sizes and CI on the right, boxes and whiskers in the middle and the summary effect size at the bottom, then simply run the function…

forest(m0)

Ta da!

[Figure: forest plot produced by forest(m0)]

If you want something fancier, like including the names of your studies, the year, etc., then you have to use the rma function.

Run the code:

forestPlot <- rma(yi = r, vi = var_r, data = agg_data)
forest(forestPlot, slab = paste(agg_data$Study, agg_data$Year, sep = ", "), xlim = c(-2.25, 2.25), digits = c(2, 2), cex = 1.3)

Then add in the titles:

par(cex = 1.3, font = 2) 
text(0, 20.5, "Est. effect size from aggregated effect sizes per study", pos = 3)
par(cex = 1.3, font = 1) 
text(-2.25, 20, "Author(s) & Year", pos = 4)
text(2.25, 20, "Observed r [95% CI]", pos = 2)

If you’re wondering what this forest plot actually shows, read on. If you already know about forest plots you can skip ahead to the Baujat plot code, or to the next section: Check for publication bias and outliers in your data.

Easy stuff first. You have your studies on the left, the aggregated effect sizes and 95% CI for each study on the right, and between them you (most likely) have a lot of different sized boxes with horizontal lines (whiskers) through them.

Now for the meat. Each box represents the weight of the study. The bigger the box, the bigger the weight, and the bigger the influence that study has on your summary effect size. The whiskers are the length of the 95% CI for each study and represent the precision. The longer the whiskers, the less precision, and consequently the more variance you have in that study. On the other hand, the shorter the whiskers, the more precision, and the less variance you have in that study. The weight is determined by the sample size and precision of each study, so the larger the sample size and the narrower the 95% CI, the greater weight of the study.
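If you’d rather see the exact numbers than eyeball box sizes, you can pull the weights straight out of the rma object created for the fancier forest plot above (just a quick check; metafor reports them as percentages):

# Percentage weight metafor gave each study in the summary effect size
weights(forestPlot)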

The distribution of the studies (boxes and whiskers) tells us a lot about the overall heterogeneity (images courtesy of Borenstein, Hedges & Rothstein, 2007):

Scenario 1

[Figure: Scenario 1 forest plot]

The boxes are more-or-less in line with each other and the whiskers are short. This means that there is little heterogeneity both within your studies and between each of your studies. You do want to have some variability in order to look at the factors that could be causing this. Realistically though, most studies have some variability so it’s more likely that you’ll get Scenarios 2 or 3.

Scenario 2

[Figure: Scenario 2 forest plot]

The boxes are dispersed and the whiskers are short, which means the heterogeneity is between your studies, e.g. Study 1 only tested Satan worshipers who listened to one Gothic band, whereas Study 2 tested Satan worshipers who listened to many different Gothic bands. You then have a good argument to identify the different varying factors between each of your studies and analyse them in your moderator analysis.

Scenario 3

[Figure: Scenario 3 forest plot]

The whiskers are long (regardless of where the boxes are), which means the heterogeneity is within each of your studies. This usually happens if you have multiple effect sizes in each study, e.g. effect size x was the group of Satan worshipers, and effect size y was the control group (non-Satan worshipers). You can check that the heterogeneity is definitely within your studies with the Tau2 value (if the value is low or 0 then the heterogeneity is due to differences within studies). Again, you have a good argument to identify the different varying factors between each of your studies and analyse them in your moderator analysis.

Last little bit, at the bottom of the plot you’ll see a diamond. This is your summary effect size. The midpoint of the diamond represents the r value, which is also specified in the right column at the bottom. The width of the diamond indicates the summary effect size’s 95% CI. There is also a line running down the middle of your plot, which is known as the line of no effect. If the diamond falls on this line then listening to Gothic music is not related to becoming a Satan worshiper. If the diamond falls to the right of this line, then this shows a positive correlation, meaning the more Gothic music you listen to, the more likely you are to become a Satan worshiper. If the diamond falls to the left of the line, then that’s a negative correlation, meaning the more Gothic music you listen to, the less likely you are to become a Satan worshiper.

If you want more info, here are two easy-to-digest guides on interpreting forest plots and decomposing the variance (from p. 13).


The Baujat plot

Baujat plots show which studies have the most influence on the summary effect size (Influence on Overall Result) and which contribute the most to the heterogeneity (Squared Pearson Residual). The Baujat plot is easy to interpret because it places these studies in the top right-hand section of the plot. In this case studies 11 and 4 are the culprits. Easy!

Run the code:

plotData <- rma(yi = r, vi = var_r, data = agg_data)
par(cex = 1.3, font = 2)

Then run the function:

baujat(plotData, xlim = c(0, 0.4), ylim = c(0, 0.4))

You’ll have to adjust the xlim and ylim numbers to check that all studies are included in the Baujat plot. You should get something that looks like this:

[Figure: Baujat plot]

Back to the top


Check for publication bias and outliers in your data

Before jumping into your moderator analysis, you should check that your studies aren’t biased. Publication bias (or the file drawer problem) is the phenomenon where studies showing statistically stronger effect sizes are more likely to be published than those with null results, which can bias the summary effect size. You can also check for publication bias before you calculate your summary effect size, but I find it’s more beneficial to do this after you’ve got your results, because if your results look skewed it could be due to bias or outliers in your data.

Funnel plots and asymmetry tests

First, create a funnel plot using the function:

par(cex = 1.3, font = 2)
funnel(m0, main = "Aggregated effect sizes per study with 95% CI")

The funnel plot is a scatterplot of the effect sizes (on the x-axis) and a measure of study precision (on the y-axis). It’s a useful visual because you can check if your studies are distributed evenly or symmetrically on the funnel. If they aren’t distributed evenly, and are therefore asymmetric, there may be some studies biasing your results (images courtesy of Quintana, 2015).

Symmetrical funnel plot

[Figure: symmetrical funnel plot]

Asymmetrical funnel plots

[Figure: asymmetrical funnel plots]

However, simply looking for asymmetry in your funnel plot can be subjective, so you should also run some asymmetry tests. The Rank Correlation Test is a standard asymmetry test.

Run the Rank Correlation Test using the function…

ranktest(m0)

Which will give you…

Rank Correlation Test for Funnel Plot Asymmetry

Kendall's tau = -0.4000, p = 0.4833

However, the Rank Correlation Test only has moderate power if you have fewer than 25 studies in your meta-analysis, which is why it’s also good to conduct Egger’s Regression Test.

Run the Egger’s Regression Test using the code:

egger <- rma(yi = r, vi = var_r, data = agg_data)

Followed by the function:

regtest.rma(egger)

Which will give you…

Regression Test for Funnel Plot Asymmetry

model:     mixed-effects meta-regression model
predictor: standard error

test for funnel plot asymmetry: z = -0.3836, p = 0.7012

If both tests are non-significant (p > 0.05), then there is no evidence that your data is biased.

If you find yourself in the (unfortunate) scenario where your funnel plot is asymmetrical, your Rank and/or Egger tests show significant asymmetry and ultimately your data is biased, there are two things you can do:

  • Option 1: Go back and google more studies and pester/sweet talk/bribe fellow colleagues to give you their non-significant, unpublished data
  • Option 2: Use the Trim and Fill method

Trim and Fill and run your analysis again

The Trim and Fill method was created to give a hypothetical scenario of what your summary effect size could be if your studies weren’t biased. This method inputs the “missing” studies into your data to make your funnel plot symmetrical. How does it do this? By simply mirroring the studies you already have.

First, fit a Fixed-effects model to your data:

trimFill_data <- rma(yi = r, vi = var_r, data = agg_data, method = "FE")

Next, input the “missing” studies:

trimFill_results <- trimfill(trimFill_data)

Which will give you a new summary effect size:

Estimated number of missing studies on the right side: 5 (SE = 1.9602)

Fixed-Effects Model (k = 15)

Test for Heterogeneity: 
Q(df = 14) = 109.4011, p-val < .0001

Model Results:

estimate se zval pval ci.lb ci.ub 
 0.3429 0.0356 9.6349 <.0001 0.2731 0.4126 ***

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Lastly, create a new funnel plot:

funnel(trimFill_results, main = "Funnel plot of aggregated effect sizes per study")

Which should look something like this, with your original studies as the black dots and your filled-in studies as the white dots mirroring them:

[Figure: funnel plot after Trim and Fill]

This is a nice-to-have in your analysis as it gives an estimate of how many studies are missing from your analysis (assuming that you’ve already included all the available studies). You cannot draw conclusions from it because these studies are not real, but it does put a positive spin on what the summary effect size could have looked like.

Q-Q plots

It’s (always) a good idea to check that there are no outliers in your selection of studies. You can do this by plotting a normal quantile-quantile (Q-Q) plot. Like the funnel plot, the normal Q-Q plot is used to search for publication bias, but it also includes 95% CI, which is useful for identifying outliers. Basically, any points that fall outside of the 95% CI are outliers and you probably want to remove these from your data.

Simply run the function:

par(cex = 1.3, font = 2)
qqnorm(plotData, label = "all")

[Figure: normal Q-Q plot with 95% CI]

Back to the top


Conduct a moderator analysis

Last hurdle, but the most rewarding…

Now that you’ve got your summary effect size and you’ve established that your studies have variance (either between or within), you can look at the variables that are moderating all this (a.k.a. moderators). You can analyse as many variables as you like, but make sure you can justify why you chose them in your published paper. Otherwise it looks like you’re throwing s*** at the wall and seeing what sticks.

I’ve deliberately chosen to look at categorical variables, because there’s not a lot of information out there on how to interpret them. If your study includes continuous variables, e.g. dosage amount, there are resources on analysing and interpreting continuous variables, as well as analysing confounding continuous and categorical variables.

On a side note about modelling your data, the metafor package automatically assumes that you’ll be using a mixed-effects model. This means that it will assign the appropriate model (random-effects or fixed-effects) to each variable, so you don’t have to worry about which model to choose.


Categorical moderator analysis

Down to business. Use your original data table with all the effect sizes (before you aggregated the effect sizes) and with the variables. Each variable should have its own column, e.g. Gothic artists, and the values should represent all the levels in that variable, e.g. The Cult, Nightmare, UK Decay, etc. Each level for each variable should have its own effect size.

For simplicity’s sake, say we have 2 variables we want to analyse: Gothic_artists, which has 3 levels representing the three artists (The Cult, Nightmare and UK Decay), and Time_of_day, i.e. whether participants listened to these artists during the daytime or at night time. In my code on GitHub, I’ve used mod1 and mod2 for Gothic_artists and Time_of_day, and level1, level2, etc. for the values, e.g. daytime, night time, band names and so on.

First, add in an extra id column, where each effect size has its own id (instead of each study having its own id).
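A quick way to do that (assuming the rows are already in the order you want them numbered):

# Give every effect size (i.e. every row) its own id number
data$id <- 1:nrow(data)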

Calculate the variance of each effect size using the code:

var_r_func <- var_r(data$r, data$n)
data$var_r <- var_r_func

We first want to look at how much each level of each variable influences the summary effect size, i.e. the separate effects. To do this we need to code each level of each variable, like so:

rma(yi = r, vi = var_r, data = data, subset = (mod1 == "level1"))
rma(yi = r, vi = var_r, data = data, subset = (mod1 == "level2"))
rma(yi = r, vi = var_r, data = data, subset = (mod1 == "level3"))
rma(yi = r, vi = var_r, data = data, subset = (mod2 == "level1"))
rma(yi = r, vi = var_r, data = data, subset = (mod2 == "level2"))

We then need to create dummy variables. A dummy variable codes each level as 1 (rows at that level) or 0 (rows at any other level), which effectively gives every level its own dummy experimental and control group. This way we can look at how much each level influences the effect size without the influence of the other levels.

data$mod1.lev1 <- ifelse(data$mod1 == "level1", 1, 0)
data$mod1.lev2 <- ifelse(data$mod1 == "level2", 1, 0)
data$mod1.lev3 <- ifelse(data$mod1 == "level3", 1, 0)
data$mod2.lev1 <- ifelse(data$mod2 == "level1", 1, 0)
data$mod2.lev2 <- ifelse(data$mod2 == "level2", 1, 0)

Or if you want to use a for loop, which will save you a LOT of time, you can use this code below.

First, detect all the levels in each variable:

mod1_levels <- levels(unique(data$mod1, incomparables = FALSE))
mod2_levels <- levels(unique(data$mod2, incomparables = FALSE))

Then code each level and create the necessary dummy variables in a for loop:

for (i in 1:length(mod1_levels))
{
 # separate-effects model for this level of mod1 (wrapped in print() so the result is displayed)
 print(rma(yi = r, vi = var_r, data = data, subset = (mod1 == mod1_levels[i])))
 # create the dummy variable for this level (1 = this level, 0 = any other level)
 variableName <- paste("mod1_dummy", mod1_levels[i], sep=".")
 data[variableName] <- ifelse(data$mod1 == mod1_levels[i], 1, 0)
}
for (i in 1:length(mod2_levels))
{
 print(rma(yi = r, vi = var_r, data = data, subset = (mod2 == mod2_levels[i])))
 variableName <- paste("mod2_dummy", mod2_levels[i], sep=".")
 data[variableName] <- ifelse(data$mod2 == mod2_levels[i], 1, 0)
}

Now it’s time to analyse the separate effects for each level of our 2 variables, mod1 and mod2, using the code:

m1sep <- rma(yi = r, vi = var_r, mods = ~ factor(mod1) - 1, data = data)
m1sep

And:

m2sep <- rma(yi = r, vi = var_r, mods = ~ factor(mod2) - 1, data = data)
m2sep

Which will give you…

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity):     0.0188 (SE = 0.0104)
tau (square root of estimated tau^2 value):             0.1372
I^2 (residual heterogeneity / unaccounted variability): 41.08%
H^2 (unaccounted variability / sampling variability):   1.70

Test for Residual Heterogeneity: 
QE(df = 39) = 65.4436, p-val = 0.0050

Test of Moderators (coefficient(s) 1,2,3): 
QM(df = 3) = 16.6808, p-val = 0.0008

Model Results:

                 estimate     se   zval   pval   ci.lb  ci.ub 
factor(mod1)level1 0.1871 0.0609 3.0697 0.0021  0.0676 0.3065 **
factor(mod1)level2 0.1348 0.0533 2.5287 0.0114  0.0303 0.2392 *
factor(mod1)level3 0.0543 0.0585 0.9293 0.3527 -0.0603 0.1689

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And…

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity):     0.0207 (SE = 0.0107)
tau (square root of estimated tau^2 value):             0.1439
I^2 (residual heterogeneity / unaccounted variability): 43.44%
H^2 (unaccounted variability / sampling variability):   1.77

Test for Residual Heterogeneity: 
QE(df = 40) = 70.0734, p-val = 0.0023

Test of Moderators (coefficient(s) 1,2): 
QM(df = 2) = 13.6047, p-val = 0.0011

Model Results:

                 estimate     se   zval   pval  ci.lb  ci.ub 
factor(mod2)level1 0.3330 0.0478 2.7823 0.0054 0.0393 0.2267 **
factor(mod2)level2 0.1154 0.0477 2.4214 0.0155 0.0220 0.2089 *

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You’ll get a lot of data here, but what you’re interested in is the estimate (the r value for each level), the 95% CI and the significance value (pval). The levels that influence the summary effect size are the ones that are significant, indicated with * (p < 0.05).

In mod1, the results show that level1 (p = 0.0021) and level2 (p = 0.0114) are significant (p < 0.05), but not level3 (p > 0.05). This means that certain Gothic_artists (level1 & level2) influence participants to become Satan worshipers. In mod2, the results show that both level1 (p = 0.0054) and level2 (p = 0.0155) are significant (p < 0.05). This means that listening to Gothic music during the daytime and night time can influence participants to become Satan worshipers.

We also want to know if there is a difference between the levels, i.e. if listening to Gothic music during the daytime is less likely to make you become a Satan worshiper than listening to music in the night time (or the other way around). In other words, even though both levels in mod2, and level1 and level2 in mod1, influence the summary effect size, is there a difference between the levels?

We can check this by looking at the overall fit of the model, which is given as a Q-statistic (QMp).

Run the code:

m1 <- rma(yi = r, vi = var_r, mods = ~ factor(mod1), data = data)
m1

And…

m2 <- rma(yi = r, vi = var_r, mods = ~ factor(mod2), data = data)
m2

(Psst… all we changed between this code and the code to analyse the separate effects for each level is we removed the -1, which changes level1 into the intercept. By including an intercept we can compare the levels to each other.)

And the results are…

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity):     0.0188 (SE = 0.0104)
tau (square root of estimated tau^2 value):             0.1372
I^2 (residual heterogeneity / unaccounted variability): 41.08%
H^2 (unaccounted variability / sampling variability):   1.70
R^2 (amount of heterogeneity accounted for):            5.00%

Test for Residual Heterogeneity: 
QE(df = 39) = 65.4436, p-val = 0.0050

Test of Moderators (coefficient(s) 2,3): 
QM(df = 2) = 2.5316, p-val = 0.2820

Model Results:

                  estimate     se    zval   pval   ci.lb  ci.ub 
intrcpt             0.1871 0.0609  3.0697 0.0021  0.0676 0.3065 **
factor(mod1)level2 -0.0523 0.0810 -0.6461 0.5182 -0.2110 0.1064 
factor(mod1)level3 -0.1327 0.0845 -1.5718 0.1160 -0.2983 0.0328

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And…

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity): 0.0207 (SE = 0.0107)
tau (square root of estimated tau^2 value): 0.1439
I^2 (residual heterogeneity / unaccounted variability): 43.44%
H^2 (unaccounted variability / sampling variability): 1.77
R^2 (amount of heterogeneity accounted for): 0.00%

Test for Residual Heterogeneity: 
QE(df = 40) = 70.0734, p-val = 0.0023

Test of Moderators (coefficient(s) 2): 
QM(df = 1) = 0.0680, p-val = 0.0251

Model Results:

                  estimate     se    zval   pval   ci.lb  ci.ub 
intrcpt             0.3330 0.0478  2.7823 0.0054  0.0393 0.2267 **
factor(mod2)level2  0.0176 0.0675  0.2607 0.0279  0.1499 0.4814 *

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Again, you’ll get a lot of data, but the most important part is the Q-statistic. If this is significant (QMp < 0.05), then some levels significantly influence the summary effect size more than others. In mod1, our QMp is not significant (p > 0.05), but in mod2, our QMp is significant (p = 0.0251). We can then work out the estimated effect size for level2 by adding the intrcpt and the level2 estimate together (0.3330 + 0.0176 = 0.3506). So it looks like there isn’t a difference between the different Gothic_artists, but there is a difference between listening to Gothic music during the daytime vs. during the night time, and it looks like listening to Gothic music during the daytime has a greater influence on participants becoming Satan worshipers.

Let’s say that in mod1 our QMp was significant and that the intrcpt and level2 were significantly different where the intrcpt is higher than level2, but the intrcpt and level3 were not significantly different, i.e.:

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity):     0.0188 (SE = 0.0104)
tau (square root of estimated tau^2 value):             0.1372
I^2 (residual heterogeneity / unaccounted variability): 41.08%
H^2 (unaccounted variability / sampling variability):   1.70
R^2 (amount of heterogeneity accounted for):            5.00%

Test for Residual Heterogeneity: 
QE(df = 39) = 65.4436, p-val = 0.0050

Test of Moderators (coefficient(s) 2,3): 
QM(df = 2) = 2.5316, p-val = 0.0282

Model Results:

                  estimate     se    zval   pval   ci.lb  ci.ub 
intrcpt             0.1871 0.0609  3.0697 0.0021  0.0676 0.3065 **
factor(mod1)level2 -0.0523 0.0810 -0.6461 0.0351 -0.2110 0.1064 *
factor(mod1)level3 -0.1327 0.0845 -1.5718 0.1160 -0.2983 0.0328

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

…and you want to know if there’s a significant difference between level2 and level3. You could manually code level2 (or level3) to be the intrcpt, but this takes a lot of faffing around in my experience. Instead, you could conduct an ANOVA using the anova function, like so:

anova(m1sep, L = c(0, 1, -1))

The L argument specifies a contrast across the levels of your moderator, where each number in c(0, 1, -1) corresponds to level1, level2 and level3 respectively. level1 is set to 0 because you don’t want to analyse it, level2 is set to 1 and level3 to -1 (you can also set level2 to -1 and level3 to 1 and it will give you the same result). Once the levels are set, run the code and you should get something like:

Hypothesis: 
1: factor(mod1)level2 - factor(mod1)level3 = 0

Results:
 estimate se zval pval
1: 0.4360 0.2118 2.0590 0.0395

Test of Hypothesis:
QM(df = 1) = 4.2394, p-val = 0.0395

The QMp value is significant which means that there is a difference between level2 and level3, and from the original data:

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity):     0.0188 (SE = 0.0104)
tau (square root of estimated tau^2 value):             0.1372
I^2 (residual heterogeneity / unaccounted variability): 41.08%
H^2 (unaccounted variability / sampling variability):   1.70
R^2 (amount of heterogeneity accounted for):            5.00%

Test for Residual Heterogeneity: 
QE(df = 39) = 65.4436, p-val = 0.0050

Test of Moderators (coefficient(s) 2,3): 
QM(df = 2) = 2.5316, p-val = 0.0282

Model Results:

                  estimate     se    zval   pval   ci.lb  ci.ub 
intrcpt             0.1871 0.0609  3.0697 0.0021  0.0676 0.3065 **
factor(mod1)level2 -0.0523 0.0810 -0.6461 0.0351 -0.2110 0.1064 *
factor(mod1)level3 -0.1327 0.0845 -1.5718 0.1160 -0.2983 0.0328

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

…the estimates show that level2 (-0.0523) is higher than level3 (-0.1327). It sounds like a lot of complicated interpretation and coding, but all we’ve done here is look at all the levels and find out which ones influence the effect more than others.


Box plots

Box plots are a great way to visualise your moderators. They show you the distribution of the data, the interquartile range (IQR), and any outliers in your data. The image below shows an example of data in a typical Gaussian distribution or bell-curve, and how a box plot represents that data.

[Image: bell curve with the corresponding box plot underneath]

The blue box is the IQR and shows where 50% of your data lies. The IQR also has a line through the middle, which indicates where the median of our data is (also known as Q2). The lower whisker shows where 25% of your data falls below the lower quartile (also known as Q1), and the upper whisker shows where 25% of your data falls above the upper quartile (also known as Q3).

Creating a box plot is super easy. But I’ve found R to be a bit fussy with box plots, so I’d recommend first filtering your data and saving out a .csv file that contains the columns ID, Study, Year, r and the moderator you want to make into a box plot.

Let’s say you want to create a box plot of mod2 (Time_of_day). First, filter your data. You can do this by using the select function:

data_boxPlot_mod2 <- select(data, id, Study, Year, r, mod2)

Then writing your new, filtered data to a .csv file:

write.csv(data_boxPlot_mod2, "importData_boxPlot_mod2.csv")

And loading that new file:

boxPlot_mod2 <- read.csv("C:/.../importData_boxPlot_mod2.csv", header = TRUE)

Then you simply use the plot function:

plot(r ~ mod2, boxPlot_mod2)

If you want to add in a bit of colour, you can use the rainbow function:

plot(r ~ mod2, boxPlot_mod2, col = rainbow(length(unique(boxPlot_mod2$mod2))))

If you find these colours too glaring, like I did, you can also customise the colours by typing them in manually. Remember to type them in the order you want them to appear:

plot(r ~ mod2, boxPlot_mod2, col = c("mistyrose", "powderblue"))

Lastly, add a title:

title("Time of day Moderator")

And you’ve got yourself a box plot:

[Figure: box plot of r by Time_of_day]

Our box plot shows both levels of our mod2 variable (Daytime and Night time), the distribution of each level, the IQR (plus Q1, Q2 and Q3) and the whiskers. It also shows that we have one outlier in our Daytime level, indicated by the white circle below the box. For more information on how to create box plots, check out the Quick-R box plot guide.
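If you’d rather read the quartiles and any outliers off as numbers instead of squinting at the plot, here’s a little sketch (assuming the boxPlot_mod2 data frame from above, and that the level is labelled "Daytime" in your data):

# Q1, median (Q2) and Q3 of r for each level of the moderator
tapply(boxPlot_mod2$r, boxPlot_mod2$mod2, quantile, probs = c(0.25, 0.5, 0.75))

# Values the box plot flags as outliers (more than 1.5 x IQR beyond the box)
boxplot.stats(boxPlot_mod2$r[boxPlot_mod2$mod2 == "Daytime"])$out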


Check for confounding moderators

Lastly, we can check if there are any confounding moderators. In our example, we’re checking to see if Gothic_artists (mod1) still influences whether you become a Satan worshiper when we control for Time_of_day (mod2) and vice versa. In other words, we’re checking if mod1 confounds mod2 and if mod2 confounds mod1. To do this, we simply conduct an ANOVA (analysis of variance) on the two mods. If you want to know more about ANOVAs, this Theory of ANOVA guide and this video on ANOVAs, both by Andy Field, give a good overview.

First, fit a Maximum Likelihood model (method = “ML”) to your two moderators using the code:

m_mod1and2 <- rma(yi = r, vi = var_r, mods = ~ factor(mod1) + factor(mod2), data = data, method = "ML")

And then bring up the results with:

m_mod1and2

The output looks similar to the overall model fit results, but this time the intercept takes the first level of both moderators combined, and all the other levels of your two moderators are displayed after it, starting with mod1 and then mod2.

Mixed-Effects Model (k = 42; tau^2 estimator: ML)

tau^2 (estimated amount of residual heterogeneity):     0.0135 (SE = 0.0088)
tau (square root of estimated tau^2 value):             0.1163
I^2 (residual heterogeneity / unaccounted variability): 33.43%
H^2 (unaccounted variability / sampling variability):   1.50
R^2 (amount of heterogeneity accounted for):            28.02%

Test for Residual Heterogeneity: 
QE(df = 38) = 61.4139, p-val = 0.0095

Test of Moderators (coefficient(s) 2,3,4): 
QM(df = 3) = 5.6818, p-val = 0.1282

Model Results:

                  estimate     se    zval   pval   ci.lb   ci.ub 
intrcpt             0.1883 0.0572  3.2927 0.0010  0.0762  0.3004 ***
factor(mod1)level2 -0.1311 0.0897 -1.4606 0.1441 -0.3069  0.0448 
factor(mod1)level3 -0.3010 0.1282 -2.3486 0.0188 -0.5522 -0.0498 *
factor(mod2)level2  0.1670 0.1007  1.6582 0.0973 -0.0304  0.3644 .

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You can then conduct your ANOVA using the anova function and specifying which moderator you want to check. We also need to use the btt argument to select the coefficients (not the intercept) belonging to the moderator we want to check. Let’s start with mod1. Its levels are the 2nd and 3rd coefficients (factor(mod1)level2 and factor(mod1)level3) in our results, so we use btt = 2:3 in the code, like so:

anova(m_mod1and2, btt = 2:3)

Which will give you…

Test of Moderators (coefficient(s) 2,3): 
QM(df = 2) = 5.5821, p-val = 0.0614

mod1 was not significant (p > 0.05). This tells us that mod1 does not have an effect when mod2 is controlled for. In other words, if we control the Time_of_day that people listen to Gothic music then it doesn’t matter which Gothic_artists they listen to.

Next we’ll check mod2. The coefficient we want to select with the btt argument is the 4th one (factor(mod2)level2) in our results. So we use btt = 4 in our code, like so:

anova(m_mod1and2, btt = 4)

Which will give you…

Test of Moderators (coefficient(s) 4): 
QM(df = 1) = 2.7498, p-val = 0.0973

Now let’s assume instead that our results had shown mod2 to be significant (p < 0.05) when controlling for mod1. This would mean that the Time_of_day is relevant, and therefore certain Gothic_artists only influence whether people become Satan worshipers depending on the Time_of_day.

Back to the top

And you’re done! Now it’s time to write up your meta-analysis.

For the eager beavers out there, you can find all the references, guidelines and other material that I used when compiling this post below. Lastly, feel free to message me. I’m always happy to answer questions, get feedback, or correct stuff that’s outdated, incoherent or needs more explanation.

Written by Alison Holland
Special thanks to Florian Zender


References 

Baujat, B., Mahé, C., Pignon, J. P., & Hill, C. (2002). A graphical method for exploring heterogeneity in meta-analyses: application to a meta-analysis of 65 trials. Statistics in medicine, 21(18), 2641-2652.

Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 1088-1101.

Borenstein, M., Hedges, L., & Rothstein, H. (2007). Meta-analysis: Fixed effect vs. random effects. Meta-analysis. com.

Borenstein, M., Cooper, H., Hedges, L., & Valentine, J. (2009). Effect sizes for continuous data. The handbook of research synthesis and meta-analysis, 2, 221-235.

Coe, R. (2002, September). It’s the effect size, stupid. In Paper presented at the British Educational Research Association annual conference (Vol. 12, p. 14).

Cohn, L. D., & Becker, B. J. (2003). How meta-analysis increases statistical power. Psychological methods, 8(3), 243.

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7-29.

Del Re, A. C. (2015). A practical tutorial on conducting meta-analysis in R. The Quantitative Methods for Psychology, 11(1), 37-50.

Del Re, A. C., & Hoyt, W. T. (2010). MAc: Meta-analysis with correlations. R package version 1.0.5 [Computer software]. Available from http://CRAN.R-project.org/package=MAc

Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. Bmj, 315(7109), 629-634.

Evangelou, E., Ioannidis, J. P., & Patsopoulos, N. A. (2007). Uncertainty in heterogeneity estimates in meta-analyses. BMJ: British Medical Journal, 335(7626), 914-916.

Field, A. (1999). A bluffer’s guide to meta-analysis 1. Newsletter of the Mathematical, Statistical and computing section of the British Psychological Society, 7, 16-25.

Hedges, L. V., & Vevea, J. L. (1998). Fixed-and random-effects models in meta-analysis. Psychological methods, 3(4), 486.

Higgins, J. P., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. Bmj, 327(7414), 557-560.

Hunter, J. E., & Schmidt, F. L. (2000). Fixed effects vs. random effects meta-analysis models: implications for cumulative research knowledge. International Journal of Selection and Assessment, 8(4), 275-292.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias  in research findings (2nd ed.). Thousand Oaks, CA: Sage.

Kock, A., & Gemünden, H. G. (2009). A guideline to Meta-analysis. Retrieved from https://www.tim.tu-berlin.de/fileadmin/fg101/TIM_Working_Paper_Series/Volume_2/TIM_WPS_Kock_2009.pdf

Quintana, D. S. (2015). From pre-registration to publication: a non-technical primer for conducting a meta-analysis to synthesize correlational data. Frontiers in psychology, 6.

R Development Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2013.

Ried, K. (2006). Interpreting and understanding meta-analysis graphs: a practical guide.

Rosenthal, R., & DiMatteo, M. R. (2001). Meta-analysis: Recent developments in quantitative methods for literature reviews. Annual Review of Psychology, 52(1), 59-82.

Sánchez-Meca, J., & Marín-Martínez, F., (2010). Meta-analysis in psychological research. International Journal of Psychological Research, 3(1), 151-163.

Sterne, J. A., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. Journal of clinical epidemiology, 53(11), 1119-1129.

Valentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How many studies do you need? A primer on statistical power for meta-analysis. Journal of Educational and Behavioral Statistics, 35(2), 215-247.

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. J Stat Softw, 36(3), 1-48.

Viechtbauer, W. (2015). Package ‘metafor’. The Comprehensive R Archive Network. http://cran.r-project.org/web/packages/metafor/metafor.pdf

Wampold, B. E., Mondin, G. W., Moody, M., Stich, F., Benson, K., & Ahn, H. N. (1997). A meta-analysis of outcome studies comparing bona fide psychotherapies: Empirically, “all must have prizes”. Psychological Bulletin, 122(3), 203.

Wang, M. C., & Bushman, B. J. (1998). Using the normal quantile plot to explore meta-analytic data sets. Psychological Methods, 3(1), 46.
