As promised, here’s my meta-analysis code that I wrote for my Masters thesis.

If you’re not sure what a meta-analysis is but want to conduct one, read the “What is a meta-analysis?” section below. Feel free to use whatever code you like. All I kindly ask in return is that you cite it appropriately 🙂

Bit of info…

I used the free statistical software programme R to write and analyse my data. It’s not only **FREE**, but it was created for students (like my former self), who don’t have a degree in programming but want to code. You can download R and RStudio (a nice GUI for those who like GUIs) for both PC and Mac. Once *R* is up and running, install and load the CRAN packages MAc, metafor, dplyr, Hmisc and ggplot2.
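The setup looks something like this (a minimal sketch; ggplot2 is the CRAN name of the plotting package, and package availability on CRAN can change over time):

```r
# Install the packages once (names as on CRAN)
install.packages(c("MAc", "metafor", "dplyr", "Hmisc", "ggplot2"))

# Then load them at the start of every session
library(MAc)      # effect size conversion & aggregation
library(metafor)  # meta-analytic models, forest/funnel plots
library(dplyr)    # data wrangling (filter, select, bind_cols, ...)
library(Hmisc)    # rcorr() for correlation matrices
library(ggplot2)  # general plotting
```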

You can use this meta-analysis code to do the following:

- Compute correlations from raw data
- Convert effect sizes into correlation coefficients *r*
- Aggregate effect sizes per study
- Calculate the summary (overall) effect size and heterogeneity (variance)
- Check for publication bias and outliers in your data
- Conduct a moderator analysis

Need some extra guidance? No problem – read on.

What is a meta-analysis?

A meta-analysis is an analysis done on multiple studies that researched the same question. I’m going to use a similar example from Andy Field’s meta-analysis guide as it made me chuckle. Say you want to see if listening to Gothic music makes you a Satan worshiper and there just so happens to be hundreds of studies published (and a few unpublished) on this topic. Doing one more study is not going to make a dent, but conducting a meta-analysis on these studies will not only tell you if there’s a relationship based on the previous research, but you can also look at the differences between and within these studies, and if these differences influence the findings!

You can conduct a meta-analysis to clarify:

- If there is a relationship, or *effect*, between listening to Gothic music and being a Satan worshiper
- How big the effect is (in other words, will listening to Gothic music make everyone a Satan worshiper, or just a few?)
- What factors influence this effect, i.e. does it depend on the band, the album, the track, or were participants already possessed by Satan?

This is why meta-analyses are cool; you’re not just asking, “Is there an effect?”, you’re asking the “why?” along with it.

I’ve outlined the major steps and also included references to guidelines and explanations if you’re hungry for more info.

Pick your research question

Done – does Gothic music make you a Satan worshiper?

Study selection

Pick your studies that also researched this question. This will take you the longest to do. Go to Google Scholar and type in a bunch of keywords, e.g. Gothic music, Satan worshiper, etc. Get a hold of as many studies as you can. Then check the abstracts to see if their research question matches yours (titles can be misleading).

Then come up with inclusion (or exclusion) criteria, e.g. participants must be a current Satan worshiper, participants must have listened to Gothic music prior to becoming a Satan worshiper, the study’s definition of Satan worshiper must match your definition, and so on. You can also ask the researcher(s) for unpublished data. If they send it to you, thank them and check it. If the data looks dodgy, that’s probably why they didn’t publish it. You don’t want unusable data in your meta-analysis, so just take what’s relevant. Be vigilant!

Just as a side note, don’t worry if you don’t find hundreds of studies that match your research question. You technically only need two studies to do a meta-analysis (Valentine et al., 2010). It’s far more important that you select the *right* studies than make up the numbers with the *wrong* studies.

Now it’s time to create your own data table from your included studies! Note down the following:

- Author(s) & year
- Effect size
- Effect size measure (e.g. correlation coefficient, *t*-value, *F*-value, etc.)
- Sample size, i.e. number of participants
- Note the study characteristics or factors for each effect size*

**Most studies report more than one effect size, so it’s important to note the different factors that distinguish each effect size, e.g. Gothic band, Satanic behaviour. You can also include demographic details, like country of testing, participants’ age, gender, ethnicity, etc., as well as statistical analyses used. You will then analyse these factors in the moderator analysis to see if they have an influence on the effect. But more about this later…*

Coding time!

Before you jump into the analysing stuff, you’ll need to convert all the raw data and the effect sizes into one measurement scale. I’d recommend converting everything into correlation coefficients *r* as they’re versatile and it cuts out a bunch of other conversion code that you’d have to use later on.

Compute correlations from raw data

Load your data as a .csv or .txt file into *R*.

Run the function:

Corr_data <- rcorr(as.matrix(data), type = "pearson")

You can change the correlation type to **Spearman**. If you’re not sure which one to choose, see Laerd Statistics guides on Pearson and Spearman correlations.

Check that the structure is correct with the function:

str(Corr_data)

Lastly, create a data frame of just the correlations *r* and save out your results using the code:

df_Results_Corr = data.frame(Corr_data$r)

write.csv(df_Results_Corr, "dataResults_Corr.csv")

Convert effect sizes into correlation coefficients

Once you’ve found all the effect sizes within your studies and compiled them in your table, you’ll probably notice that they all have different scales of measurement, e.g. correlation coefficient *r*, Cohen’s *d*, *t-test* effect sizes, *F-test* effect sizes, *means* and *standard deviations (SD)*, etc. I will just show how to convert *t-test* and *F-test* effect sizes into the correlation coefficient *r* because they’re the most commonly used statistics in Psychology, but you can find all the possible conversions in the MAc guidelines.

To make things easy you can convert all your effect sizes from one data table. Make sure that your data contains an **id** column, a column for the effect size (**ES**), an effect size measure column (**ES_type**) that contains the values *t-test*, *F-test*, etc., separate columns for each sample size (**n_1** for *t-tests* and *F-tests*, **n_2** for *F-tests*), and so on.

Convert t-values into correlation coefficients r

Add in your data manually or load the .csv or .txt file into *R*.

Select the rows that contain all the data for *t-test* effect sizes with the code:

filter_t_test <- filter(data, ES_type == "t-test")

Select the columns **id**, **ES**, and **n_1** with the code:

data_t_stats <- select(filter_t_test, id, ES, n_1)

Then run the code to convert the *t-test* effect sizes into correlation coefficients *r*:

data_r_from_t <- r_from_t(data_t_stats$ES, data_t_stats$n_1)

This code is based on the function:

r_from_t(t, n)

where:

- **t** (**data_t_stats$ES**) refers to the *t*-values
- **n** (**data_t_stats$n_1**) refers to the sample size
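If you’re curious what’s happening under the hood, the standard textbook conversion is *r* = *t* / √(*t*² + *df*). Here’s a hand-rolled sketch assuming *df* = *n* − 2 for two independent groups with total sample size *n* (an assumption on my part; compare it against MAc’s **r_from_t** output before relying on it):

```r
# Sketch of the t -> r conversion, r = t / sqrt(t^2 + df).
# Assumes df = n - 2 (two independent groups, total sample size n);
# check against MAc's r_from_t() before relying on it.
r_from_t_sketch <- function(t, n) {
  df <- n - 2
  t / sqrt(t^2 + df)
}

r_from_t_sketch(2.5, 40)  # a t of 2.5 with n = 40 gives r of about 0.376
```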

Combine the results with the *t-test* data into one data frame with the code:

df_r_from_t = data.frame(data_r_from_t)

df_dataResults_r_from_t <- data.frame(bind_cols(data_t_stats, df_r_from_t))

Next…

Convert F-values into correlation coefficients r

For *F-tests*, you’ll first have to convert them into Cohen’s *d* and then into *r*. Plus, when converting your new *d* values into *r*, you’ll have to use separate functions depending on whether your two groups have the same number of participants or different numbers of participants.

F-values with the same sample size in each group

Select the rows that contain all the data for *F-test* effect with the same sample size with the code:

filter_F_test_same <- filter(data, ES_type == "f-test", n_1 == n_2)

Select the columns **id**, **ES**, **n_1**, **n_2** with the code:

data_F_stats_same <- select(filter_F_test_same, id, ES, n_1, n_2)

Then run the code to convert the *F-test* effect sizes into Cohen’s *d* (**d**):

data_f_to_d <- f_to_d(data_F_stats_same$ES, data_F_stats_same$n_1, data_F_stats_same$n_2)

This code is based on the function:

f_to_d(f, n.1, n.2)

where:

- **f** (**data_F_stats_same$ES**) refers to the *F*-values
- **n.1** (**data_F_stats_same$n_1**) refers to group 1 or your experimental group
- **n.2** (**data_F_stats_same$n_2**) refers to group 2 or your control group
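Under the hood, for a two-group design *F* = *t*², so the conversion works out to *d* = √(*F* · (*n*₁ + *n*₂) / (*n*₁*n*₂)). A minimal sketch under that assumption (compare against MAc’s **f_to_d** output before relying on it):

```r
# Sketch of the F -> d conversion for a two-group design, where F = t^2
# and d = t * sqrt((n1 + n2) / (n1 * n2)); check against MAc's f_to_d().
f_to_d_sketch <- function(f, n1, n2) {
  sqrt(f * (n1 + n2) / (n1 * n2))
}

f_to_d_sketch(6.25, 20, 20)  # F = 6.25 (t = 2.5), equal groups of 20: d of about 0.79
```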

Create data frame…

df_f_to_d = data.frame(data_f_to_d)

and combine the results with the *F-test* data into one data frame with the code:

dataResults_f_to_d <- bind_cols(data_F_stats_same, df_f_to_d)

Then convert **d** and the variance of *d* (**var_d**) into correlation coefficients *r* with the code:

data_r_from_d <- r_from_d(dataResults_f_to_d$d, dataResults_f_to_d$var_d, a = 4)

This code is based on the function:

r_from_d(d, var_d, a = 4)

(**a** is just used to compute the correlation and **4** is the default number.)
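To see where the **a = 4** comes from: the conversion is *r* = *d* / √(*d*² + *a*), where *a* = (*n*₁ + *n*₂)² / (*n*₁*n*₂) is a correction factor that works out to exactly 4 when the two groups are the same size (Borenstein et al., 2009). A minimal sketch:

```r
# Sketch of the d -> r conversion:
#   r = d / sqrt(d^2 + a),  a = (n1 + n2)^2 / (n1 * n2)
# With equal group sizes, a = (2n)^2 / n^2 = 4, hence the default a = 4.
r_from_d_sketch <- function(d, n1, n2) {
  a <- (n1 + n2)^2 / (n1 * n2)
  d / sqrt(d^2 + a)
}

r_from_d_sketch(0.5, 20, 20)  # equal groups, so a = 4: r of about 0.24
```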

Combine the results with the *F-test* data into one data frame with the code:

df_r_from_d = data.frame(data_r_from_d)

df_dataResults_r_from_d <- data.frame(bind_cols(data_F_stats_same, df_r_from_d))

Next…

F-values with different sample sizes in each group

Select the rows that contain all the data for *F-test* effect with different sample sizes with the code:

filter_F_test_dif <- filter(data, ES_type == "f-test", n_1 != n_2)

Select the columns **id**, **ES**, **n_1**, **n_2** with the code:

data_F_stats_dif <- select(filter_F_test_dif, id, ES, n_1, n_2)

Then run the code to convert the *F-test* effect sizes into Cohen’s *d* (**d**):

data_f_to_d1 <- f_to_d(data_F_stats_dif$ES, data_F_stats_dif$n_1, data_F_stats_dif$n_2)

Create data frame…

df_f_to_d1 = data.frame(data_f_to_d1)

and combine the results with the *F-test* data into one data frame with the code:

dataResults_f_to_d1 <- bind_cols(data_F_stats_dif, df_f_to_d1)

Then convert **d** and the variance of *d* (**var_d**) into correlation coefficients *r* with the code:

data_r_from_d1 <- r_from_d1(dataResults_f_to_d1$d, dataResults_f_to_d1$n_1, dataResults_f_to_d1$n_2, dataResults_f_to_d1$var_d)

This code is based on the function:

r_from_d1(d, n.1, n.2, var_d)

Combine the results with the *F-test* data into one data frame with the code:

df_r_from_d1 = data.frame(data_r_from_d1)

df_dataResults_r_from_d1 <- data.frame(bind_cols(data_F_stats_dif, df_r_from_d1))

Final data organising bit

Add all your new correlation coefficients *r* to your original data table. First combine all data results:

dataResults_r_all <- bind_rows(df_dataResults_r_from_t, df_dataResults_r_from_d, df_dataResults_r_from_d1)

Then select the columns **r** and **id**:

dataResults_r <- select(dataResults_r_all, id, r)

Arrange the order by **id**:

dataResults_r_order <- arrange(dataResults_r, id)

Combine **r** with your original data using the code:

dataResults_r_from_ES <- bind_cols(data, (select(dataResults_r_order, r)))

Aggregate effect sizes per study

Most studies report more than one effect size, but including multiple effect sizes from the same study gives that study extra weight and can bias your results. To counteract this, aggregate the effect sizes per study. Check that your data file contains the study ID as one ID per study, all the effect sizes for each study converted into correlation coefficients (**r**), and the sample sizes for each effect size (**n**).

Load your data as a .csv or .txt file into *R*.

Run the function:

agg_data <- agg(id, r, n, cor = .50, mod = NULL, data = data)

You’ll then get the study ID (**id**), the correlation coefficient (**r**), the sample size for each study (**n**), and the variance of *r* (**var_r**).

Compute variance of the correlation coefficient (**var_r**) and add variance column to the data results:

var_r_funcAggData <- var_r(agg_data$r, agg_data$n)

agg_data$var_r <- var_r_funcAggData
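For reference, the usual large-sample formula behind this step is var_r = (1 − *r*²)² / (*n* − 1). A hand-rolled sketch under that assumption (compare against MAc’s **var_r** output before relying on it):

```r
# Sketch of the variance of a correlation coefficient
# (large-sample formula: var_r = (1 - r^2)^2 / (n - 1));
# check against MAc's var_r() before relying on it.
var_r_sketch <- function(r, n) {
  (1 - r^2)^2 / (n - 1)
}

var_r_sketch(0.3, 50)  # r = 0.3 with n = 50 gives a variance of about 0.017
```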

Select the columns Study and Year from original data:

Study_Year_all <- select(data, Study, Year)

Remove the duplicate rows:

Study_Year <- distinct(Study_Year_all)

And merge the columns Study and Year with the aggregated data results:

agg_data_Results <- bind_cols(Study_Year, agg_data)

Now that that’s all set up, it’s time to analyse the data!

Calculate the summary (overall) effect size and heterogeneity (variance)

First thing you need to do is choose your model. All researchers make assumptions about their data, and meta-analysts are no exception. Here you need to make one of two assumptions:

*Assumption 1*: The data from your studies shows a typical intervention effect and there were no other effects influencing your data, i.e. your studies are homogeneous.

*Assumption 2*: The data from your studies was sampled from a distribution of populations (the studies were not conducted in exactly the same way), and a single typical intervention effect cannot explain the overall effect, i.e. your studies are heterogeneous.

Most meta-analysts go for the more common assumption that their studies are heterogeneous and therefore choose to model their data on the random-effects model (and that’s what we’re going to do). There are also a number of benefits when modelling your data on the random-effects model, but one of the biggest benefits of using this model is that it reduces the likelihood of getting false positives. Bonus!

The stats

Run the function:

m0 <- mareg(r ~ 1, var = var_r, method = "REML", data = agg_data)

- **m0** is just the output and stands for “model 0”
- **r** refers to the correlation coefficients *r*
- **var** refers to **var_r**
- **~ 1** roughly stands for *predicted by*, with **1** meaning that it is an intercept-only model
- **method = "REML"** refers to the restricted maximum likelihood estimator and is the default method when applying the random-effects model (Viechtbauer, 2010, p. 12 gives you the other estimator options you can use)
- **data = agg_data** just tells *R* to use your aggregated effect size data, not all the effect sizes for each study

Then run the function…

summary(m0)

This will display the results of the summary effect size (**estimate**), if the estimate is significant (**p**), and the confidence intervals (**ci.l, ci.u**). We now have our summary effect size, which is the overall effect size of all the studies, yay!

Model Results:

         estimate       se        z     ci.l    ci.u  Pr(>|z|)
intrcpt  0.157545 0.075325 2.091542 0.009911  0.3052   0.03648 *

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The **summary(m0)** function will also display the *Q*-statistic (**QE & QEp**), which tells you if your studies are homogenous (**QEp > 0.05**) or not (**QEp < 0.05**). As we’re assuming that our data is heterogeneous, we want the **QEp** to be **< 0.05**.

Heterogeneity & Fit:

         QE  QE.df    QEp      QM  QM.df     QMp
[1,] 2.9635     17  0.564  4.3745      1  0.0365

But the *Q*-statistic only tells us *whether* heterogeneity is present; it doesn’t tell us how much there is or where it comes from. So we need to quantify the heterogeneity by calculating the *I-squared* (*I*^{2}) statistic and *Tau-squared* (*Tau*^{2}) statistic.

The *I*^{2} statistic first tells us how much heterogeneity there is in our studies (given as %). It’s also worthwhile looking at the 95% confidence intervals (CI) because if the CI are at polar opposites, e.g. 10.01 and 90.05, then we cannot be certain if the variance is because of differences between studies or differences within studies. But we can find this out by calculating *Tau*^{2}.

*Tau*^{2} tells us how much of the variance that is influencing the effect is due to between-study differences. It does this by calculating the difference between the total variance and the within-study variance. If there’s a lot of variance between studies, *Tau*^{2} will be high. If there’s little or no variance between studies, indicating that the variance is due to differences within studies, then *Tau*^{2} will be low or 0.
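If you’re curious how *I*^{2} is derived, the classical definition (Higgins & Thompson, 2002) is *I*^{2} = (*Q* − *df*) / *Q* × 100, floored at 0. Note that **confint** computes *I*^{2} from the *Tau*^{2} estimate instead, so the numbers can differ a little. A minimal sketch:

```r
# Sketch of the classical Q-based I^2 (Higgins & Thompson, 2002):
# I^2 = (Q - df) / Q * 100, floored at 0 when Q < df.
I2_sketch <- function(Q, df) {
  max(0, (Q - df) / Q) * 100
}

I2_sketch(65.44, 39)  # Q = 65.44 with df = 39 gives an I^2 of about 40%
```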

Run the function…

confint(m0)

This will give you the *I*^{2} statistic estimate, the *Tau*^{2} estimate and the 95% CI for both of these. (You will also be given the *Tau* estimate and the *H*^{2} estimate, but these aren’t very useful to us. If you want to know more, see Viechtbauer, 2010, p. 14.)

        estimate   ci.lb   ci.ub
tau^2     0.0245  0.0047  0.0580
tau       0.1555  0.0568  0.3801
I^2(%)   78.5890 54.2157 93.5555
H^2       2.1055  1.5522  5.0811

The forest plot

With all these numbers floating around, it’s a good idea to show what they mean in a visual way. Enter the forest plot, which shows the summary effect size and the heterogeneity in your studies.

If you want a simple forest plot with the study ID on the left, the individual effect sizes and CI on the right, boxes and whiskers in the middle and the summary effect size at the bottom, then simply run the function…

forest(m0)

Ta da!

If you want something more fancy, like including the names of your studies, the year, etc. then you have to use the **rma** function.

Run the code:

forestPlot <- rma(yi = r, vi = var_r, data = agg_data)

forest(forestPlot, slab = paste(agg_data$Study, agg_data$Year, sep = ", "), xlim = c(-2.25, 2.25), digits = c(2, 2), cex = 1.3)

par(cex = 1.3, font = 2)
text(0, 20.5, "Est. effect size from aggregated effect sizes per study", pos = 3)

par(cex = 1.3, font = 1)
text(-2.25, 20, "Author(s) & Year", pos = 4)
text(2.25, 20, "Observed r [95% CI]", pos = 2)

If you’re wondering what this forest plot actually shows, read on. If you already know about forest plots you can skip ahead to the *Baujat plot* code, or to the next section: Check for publication bias and outliers in your data.

Easy stuff first. You have your studies on the left, the aggregated effect sizes and 95% CI for each study on the right, and between them you (most likely) have a lot of different sized boxes with horizontal lines (whiskers) through them.

Now for the meat. Each box represents the *weight* of the study. The bigger the box, the bigger the weight, and the bigger the influence that study has on your summary effect size. The whiskers are the length of the 95% CI for each study and represent the *precision*. The longer the whiskers, the less precision, and consequently the more variance you have in that study. On the other hand, the shorter the whiskers, the more precision, and the less variance you have in that study. The weight is determined by the sample size and precision of each study, so the larger the sample size and the narrower the 95% CI, the greater weight of the study.

The distribution of the studies (boxes and whiskers) tells us a lot about the overall heterogeneity (images courtesy of Borenstein, Hedges & Rothstein, 2007):

*Scenario 1*

The boxes are more-or-less in line with each other and the whiskers are short. This means that there is little heterogeneity both within your studies and between each of your studies. You do want to have some variability in order to look at the factors that could be causing this. Realistically though, most studies have some variability so it’s more likely that you’ll get Scenarios 2 or 3.

*Scenario 2*

The boxes are dispersed and the whiskers are short, which means the heterogeneity is between your studies, e.g. Study 1 only tested Satan worshipers who listened to one Gothic band, whereas Study 2 tested Satan worshipers who listened to many different Gothic bands. You then have a good argument to identify the different varying factors between each of your studies and analyse them in your moderator analysis.

*Scenario 3*

The whiskers are long (regardless of where the boxes are), which means the heterogeneity is within each of your studies. This usually happens if you have multiple effect sizes in each study, e.g. effect size *x* was the group of Satan worshipers, and effect size *y* was the control group (non-Satan worshipers). You can check that the heterogeneity is definitely within your studies with the *Tau*^{2} value (if the value is low or 0 then the heterogeneity is due to differences within studies). Again, you have a good argument to identify the different varying factors between each of your studies and analyse them in your moderator analysis.

Last little bit: at the bottom of the plot you’ll see a diamond. This is your summary effect size. The midpoint of the diamond represents the *r* value, which is also specified in the right column at the bottom. The width of the diamond indicates the summary effect size’s 95% CI. There is also a line running down the middle of your plot, which is known as the *line of no effect*. If the diamond falls on this line then listening to Gothic music is not related to becoming a Satan worshiper. If the diamond falls to the right of this line, then this shows a positive correlation, meaning that listening to more Gothic music increases the probability that you will become a Satan worshiper. If the diamond falls to the left of the line, then that’s a negative correlation, meaning the more Gothic music you listen to, the less likely you are to become a Satan worshiper.

If you want more info, here’s two easy-to-digest guides on interpreting forest plots and decomposing the variance from p. 13.

The Baujat plot

Baujat plots show which studies have the most influence on the summary effect size (Influence on Overall Result) and which contribute the most to the heterogeneity (Squared Pearson Residual). The Baujat plot is easy to interpret because it places these studies in the top right-hand section of the plot. In this case studies 11 and 4 are the culprits. Easy!

Run the code:

plotData <- rma(yi = r, vi = var_r, data = agg_data)

par(cex = 1.3, font = 2)

baujat(plotData, xlim = c(0, 0.4), ylim = c(0, 0.4))

Check for publication bias and outliers in your data

Before jumping into your moderator analysis, you should check that your studies aren’t biased. Publication bias (or the *file drawer problem*) is the phenomenon where studies showing statistically stronger effect sizes are more likely to be published than those with null results, which could bias the summary effect size. You can also check for publication bias before you calculate your summary effect size, but I find it more beneficial to do this after you’ve got your results, because if your results look skewed it could be due to bias or outliers in your data.

**Funnel plots and asymmetry tests**

First, create a funnel plot using the function:

par(cex = 1.3, font = 2)

funnel(m0, main = "Aggregated effect sizes per study with 95% CI")

The funnel plot is a scatterplot of the effect sizes (on the *x*-axis) and a measure of study precision (on the *y*-axis). It’s a useful visual because you can check if your studies are distributed evenly or *symmetrically* on the funnel. If they aren’t distributed evenly and are therefore *asymmetric*, there may be some studies biasing your results (images courtesy of Quintana, 2015).

*Symmetrical funnel plot*

*Asymmetrical funnel plots*

However, simply looking for asymmetry in your funnel plot can be subjective, so you should also run some asymmetry tests. The Rank Correlation Test is a standard asymmetry test.

Run the Rank Correlation Test using the function…

ranktest(m0)

Which will give you…

Rank Correlation Test for Funnel Plot Asymmetry

Kendall's tau = -0.4000, p = 0.4833

However, the Rank Correlation Test only has moderate power when you have fewer than 25 studies in your meta-analysis, which is why it’s good to conduct Egger’s Regression Test as well.

Run the Egger’s Regression Test using the code:

egger <- rma(yi = r, vi = var_r, data = agg_data)

Followed by the function:

regtest(egger)

Which will give you…

Regression Test for Funnel Plot Asymmetry

model:     mixed-effects meta-regression model
predictor: standard error

test for funnel plot asymmetry: z = -0.3836, p = 0.7012

If both tests are *non-significant* (**p > 0.05**) then your data is *not biased*.

If you find yourself in the (unfortunate) scenario where your funnel plot is asymmetrical, your Rank and/or Egger tests show significant asymmetry and ultimately your data *is biased*, there are two things you can do:

- Option 1: Go back and google more studies and pester/sweet talk/bribe fellow colleagues to give you their non-significant, unpublished data
- Option 2: Use the *Trim and Fill* method

*Trim and Fill and run your analysis again*

The *Trim and Fill* method was created to give a hypothetical scenario of what your summary effect size *could* be if your studies weren’t biased. This method inputs the “missing” studies into your data to make your funnel plot symmetrical. How does it do this? By simply mirroring the studies you already have.

First, fit a Fixed-effects model to your data:

trimFill_data <- rma(yi = r, vi = var_r, data = agg_data, method = "FE")

Next, input the “missing” studies:

trimFill_results <- trimfill(trimFill_data)

Which will give you a new summary effect size:

Estimated number of missing studies on the right side: 5 (SE = 1.9602)

Fixed-Effects Model (k = 15)

Test for Heterogeneity:
Q(df = 14) = 109.4011, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub
  0.3429  0.0356  9.6349  <.0001  0.2731  0.4126  ***

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Lastly, create a new funnel plot:

funnel(trimFill_results, main = "Funnel plot of aggregated effect sizes per study")

Which should look something like this, with your original studies as the black dots and your filled-in studies as the white dots mirroring the black dots:

This is a nice-to-have in your analysis as it gives an estimate of how many studies are missing from your analysis (assuming that you’ve already included all the available studies). You cannot draw conclusions from it because these studies are not real, but it does put a positive spin on what the summary effect size could have looked like.

**Q-Q plots**

It’s (always) a good idea to check that there are no outliers in your selection of studies. You can do this by plotting a normal quantile-quantile (Q-Q) plot. Like the funnel plot, the normal Q-Q plot is used to search for publication bias, but it also includes 95% CI, which is useful for identifying outliers. Basically, any points that fall outside of the 95% CI are outliers and you probably want to remove these from your data.

Simply run the function:

par(cex = 1.3, font = 2)

qqnorm(plotData, label = "all")

Conduct a moderator analysis

Last hurdle, but the most rewarding…

Now that you’ve got your summary effect size and you’ve established that your studies have variance (either between or within), you can look at the variables that are moderating all this (a.k.a. moderators). You can analyse as many variables as you like, but make sure you can justify why you chose them in your published paper. Otherwise it looks like you’re throwing s*** at the wall and seeing what sticks.

I’ve deliberately chosen to look at categorical variables, because there’s not a lot of information out there on how to interpret them. If your study includes continuous variables, e.g. dosage amount, there are resources on analysing and interpreting continuous variables, as well as analysing confounding continuous and categorical variables.

On a side note about modelling your data, the **metafor** package automatically assumes that you’ll be using a mixed-effects model. This means that it will assign the appropriate model (random-effects or fixed-effects) to each variable, so you don’t have to worry about which model to choose.

Categorical moderator analysis

Down to business. Use your original data table with all the effect sizes (before you aggregated the effect sizes) and with the variables. Each variable should have its own column, e.g. Gothic artists, and the values should represent all the levels in that variable, e.g. The Cult, Nightmare, UK Decay, etc. Each level for each variable should have its own effect size.

For simplicity’s sake, say we have 2 variables we want to analyse: **Gothic_artists**, which has 3 levels representing the three artists (**The Cult**, **Nightmare** and **UK Decay**), and **Time_of_day** that participants listened to these artists, which was either during the **daytime** or **night time**. In my code on GitHub, I’ve used **mod1** and **mod2** for **Gothic_artists** and **Time_of_day**, and **level1**, **level2**, etc. for the values, e.g. **daytime**, **night time**, **band names** and so on.

First, add in an extra **id** column, where each effect size has its own id (instead of each study having its own id).

Calculate the variance of each effect size using the code:

var_r_func <- var_r(data$r,data$n)

data$var_r <- var_r_func

First we need to analyse each level of each variable as *separate effects*. To do this we need to code each level of each variable, like so:

rma(yi = r, vi = var_r, data = data, subset = (mod1 == "level1"))
rma(yi = r, vi = var_r, data = data, subset = (mod1 == "level2"))
rma(yi = r, vi = var_r, data = data, subset = (mod1 == "level3"))

rma(yi = r, vi = var_r, data = data, subset = (mod2 == "level1"))
rma(yi = r, vi = var_r, data = data, subset = (mod2 == "level2"))

We then need to create *dummy variables*. A dummy variable assigns a dummy control group and experimental group to each level. This way we can look at how much each level influences the effect size without the influence of other levels.

data$mod1.lev1 <- ifelse(data$mod1 == "level1", 1, 0)
data$mod1.lev2 <- ifelse(data$mod1 == "level2", 1, 0)
data$mod1.lev3 <- ifelse(data$mod1 == "level3", 1, 0)

data$mod2.lev1 <- ifelse(data$mod2 == "level1", 1, 0)
data$mod2.lev2 <- ifelse(data$mod2 == "level2", 1, 0)

Or if you want to use a **for** loop, which will save you a LOT of time, you can use this code below.

First, detect all the levels in each variable:

mod1_levels <- levels(unique(data$mod1, incomparables = FALSE))
mod2_levels <- levels(unique(data$mod2, incomparables = FALSE))

Then code each level and create the necessary dummy variables in a **for** loop:

for (i in 1:length(mod1_levels)) {
  rma(yi = r, vi = var_r, data = data, subset = (mod1 == mod1_levels[i]))
  variableName <- paste("mod1_dummy", mod1_levels[i], sep = ".")
  data[variableName] <- ifelse(data$mod1 == mod1_levels[i], 1, 0)
}

for (i in 1:length(mod2_levels)) {
  rma(yi = r, vi = var_r, data = data, subset = (mod2 == mod2_levels[i]))
  variableName <- paste("mod2_dummy", mod2_levels[i], sep = ".")
  data[variableName] <- ifelse(data$mod2 == mod2_levels[i], 1, 0)
}

Now it’s time to analyse the separate effects for each level of our 2 variables, **mod1** and **mod2**, using the code:

m1sep <- rma(yi = r, vi = var_r, mods = ~ factor(mod1) - 1, data = data)

m1sep

And:

m2sep <- rma(yi = r, vi = var_r, mods = ~ factor(mod2) - 1, data = data)

m2sep

Which will give you…

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity): 0.0188 (SE = 0.0104)
tau (square root of estimated tau^2 value): 0.1372
I^2 (residual heterogeneity / unaccounted variability): 41.08%
H^2 (unaccounted variability / sampling variability): 1.70

Test for Residual Heterogeneity:
QE(df = 39) = 65.4436, p-val = 0.0050

Test of Moderators (coefficient(s) 1,2,3):
QM(df = 3) = 16.6808, p-val = 0.0008

Model Results:

                    estimate      se    zval    pval    ci.lb   ci.ub
factor(mod1)level1    0.1871  0.0609  3.0697  0.0021   0.0676  0.3065  **
factor(mod1)level2    0.1348  0.0533  2.5287  0.0114   0.0303  0.2392  *
factor(mod1)level3    0.0543  0.0585  0.9293  0.3527  -0.0603  0.1689

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And…

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity): 0.0207 (SE = 0.0107)
tau (square root of estimated tau^2 value): 0.1439
I^2 (residual heterogeneity / unaccounted variability): 43.44%
H^2 (unaccounted variability / sampling variability): 1.77

Test for Residual Heterogeneity:
QE(df = 40) = 70.0734, p-val = 0.0023

Test of Moderators (coefficient(s) 1,2):
QM(df = 2) = 13.6047, p-val = 0.0011

Model Results:

                    estimate      se    zval    pval   ci.lb   ci.ub
factor(mod2)level1    0.1330  0.0478  2.7823  0.0054  0.0393  0.2267  **
factor(mod2)level2    0.1154  0.0477  2.4214  0.0155  0.0220  0.2089  *

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You’ll get a lot of data here, but what you’re interested in is the *r* value, the 95% CI and the significance value (**pval**). The levels that influence the summary effect size are the ones that are significant, indicated with * (**p < 0.05**).

In **mod1**, the results show that **level1** (*p* = 0.0021) and **level2** (*p* = 0.0114) are *significant* (*p* < 0.05), but **level3** is *not* (*p* > 0.05). This means that certain **Gothic_artists** (**level1** & **level2**) influence participants to become Satan worshipers. In **mod2**, the results show that both **level1** (*p* = 0.0054) and **level2** (*p* = 0.0155) are *significant* (*p* < 0.05). This means that listening to Gothic music during the **daytime** and the **night time** can influence participants to become Satan worshipers.

We also want to know if there is a difference between the levels, i.e. if listening to Gothic music during the daytime is less likely to make you become a Satan worshiper than listening to music in the night time (or the other way around). In other words, even though both levels in **mod2**, and **level1** and **level2** in **mod1**, influence the summary effect size, is there a difference between the levels?

We can check this by looking at the *overall fit of the model*, which is given as a *Q*-statistic (**QM**) together with its p-value (**QMp**).

Run the code:

m1 <- rma(yi = r, vi = var_r, mods = ~ factor(mod1), data = data)

m1

And…

m2 <- rma(yi = r, vi = var_r, mods = ~ factor(mod2), data = data)

m2

(*Psst*… all we changed between this code and the code to analyse the separate effects for each level is that we removed the **-1**, which turns **level1** into the intercept. By including an intercept we can compare the levels to each other.)
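If you want to see exactly what dropping the **-1** does, you can peek at the *design matrix* R builds under each coding. A small base-R sketch with a made-up three-level factor (dummy data, not the meta-analysis data):

```r
# Made-up moderator with three levels, just to show the two codings
f <- factor(c("level1", "level2", "level3"))

model.matrix(~ f - 1)  # no intercept: one indicator column per level
model.matrix(~ f)      # with intercept: level1 is absorbed into the intercept,
                       # and the other columns are differences from level1
```
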

And the results are…

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity): 0.0188 (SE = 0.0104)
tau (square root of estimated tau^2 value): 0.1372
I^2 (residual heterogeneity / unaccounted variability): 41.08%
H^2 (unaccounted variability / sampling variability): 1.70
R^2 (amount of heterogeneity accounted for): 5.00%

Test for Residual Heterogeneity:
QE(df = 39) = 65.4436, p-val = 0.0050

Test of Moderators (coefficient(s) 2,3):
QM(df = 2) = 2.5316, p-val = 0.2820

Model Results:

                    estimate      se     zval    pval    ci.lb   ci.ub
intrcpt               0.1871  0.0609   3.0697  0.0021   0.0676  0.3065  **
factor(mod1)level2   -0.0523  0.0810  -0.6461  0.5182  -0.2110  0.1064
factor(mod1)level3   -0.1327  0.0845  -1.5718  0.1160  -0.2983  0.0328

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And…

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity): 0.0207 (SE = 0.0107)
tau (square root of estimated tau^2 value): 0.1439
I^2 (residual heterogeneity / unaccounted variability): 43.44%
H^2 (unaccounted variability / sampling variability): 1.77
R^2 (amount of heterogeneity accounted for): 0.00%

Test for Residual Heterogeneity:
QE(df = 40) = 70.0734, p-val = 0.0023

Test of Moderators (coefficient(s) 2):
QM(df = 1) = 0.0680, p-val = 0.0251

Model Results:

                    estimate      se    zval    pval   ci.lb   ci.ub
intrcpt               0.3330  0.0478  2.7823  0.0054  0.0393  0.2267  **
factor(mod2)level2    0.0176  0.0675  0.2607  0.0279  0.1499  0.4814  *

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Again, you’ll get a lot of output, but the most important part is the *Q*-statistic. If this is significant (**QMp < 0.05**), then some levels significantly influence the summary effect size more than others. In **mod1**, our **QMp** is *not significant* (*p* > 0.05), but in **mod2**, our **QMp** is *significant* (*p* = 0.0251). We can then work out **level2**’s overall effect by adding the **intrcpt** and **level2** estimates together (**0.3330** + **0.0176** = **0.3506**). So it looks like there isn’t a difference between the different **Gothic_artists**, but there is a difference between listening to Gothic music during the **daytime** vs. during the **night time**, and since **level2** comes out higher, listening during the **night time** has the greater influence on participants becoming Satan worshipers.
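Rather than adding coefficients by hand, you can ask metafor for each level’s overall estimate (with its CI) directly via **predict**. A sketch, assuming the **m2** model fitted above:

```r
# Predicted summary effect for each level of mod2 from the intercept model m2.
# newmods excludes the intercept: 0 = level1 (daytime), 1 = level2 (night time)
predict(m2, newmods = rbind(c(0), c(1)))
```
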

Let’s say that in **mod1** our **QMp** was significant and that the **intrcpt** and **level2** were significantly different where the **intrcpt** is higher than **level2**, but the **intrcpt** and **level3** were not significantly different, i.e.:

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity): 0.0188 (SE = 0.0104)
tau (square root of estimated tau^2 value): 0.1372
I^2 (residual heterogeneity / unaccounted variability): 41.08%
H^2 (unaccounted variability / sampling variability): 1.70
R^2 (amount of heterogeneity accounted for): 5.00%

Test for Residual Heterogeneity:
QE(df = 39) = 65.4436, p-val = 0.0050

Test of Moderators (coefficient(s) 2,3):
QM(df = 2) = 2.5316, p-val = 0.0282

Model Results:

                    estimate      se     zval    pval    ci.lb   ci.ub
intrcpt               0.1871  0.0609   3.0697  0.0021   0.0676  0.3065  **
factor(mod1)level2   -0.0523  0.0810  -0.6461  0.0351  -0.2110  0.1064  *
factor(mod1)level3   -0.1327  0.0845  -1.5718  0.1160  -0.2983  0.0328

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

…and you want to know if there’s a significant difference between **level2** and **level3**. You could manually recode **level2** (or **level3**) to be the **intrcpt**, but in my experience this takes a lot of faffing around. Instead, you can test the contrast between the two levels directly with the **anova** function, like so:

anova(m1sep, L = c(0, 1, -1))

The **L** argument specifies a contrast across the levels of your moderator, where each number in the vector **c(0, 1, -1)** corresponds to **level1**, **level2** and **level3** respectively. **level1** is set to **0** because you don’t want it in the comparison, **level2** is set to **1** and **level3** to **-1** (you can also set **level2** to **-1** and **level3** to **1** and it will give you the same result). Once the contrast is set, run the code and you should get something like:

Hypothesis:
1: factor(mod1)level2 - factor(mod1)level3 = 0

Results:
   estimate      se    zval    pval
1:   0.4360  0.2118  2.0590  0.0395

Test of Hypothesis:
QM(df = 1) = 4.2394, p-val = 0.0395

The **QMp** value is significant, which means that there is a difference between **level2** and **level3**, and from the original data:

Mixed-Effects Model (k = 42; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity): 0.0188 (SE = 0.0104)
tau (square root of estimated tau^2 value): 0.1372
I^2 (residual heterogeneity / unaccounted variability): 41.08%
H^2 (unaccounted variability / sampling variability): 1.70
R^2 (amount of heterogeneity accounted for): 5.00%

Test for Residual Heterogeneity:
QE(df = 39) = 65.4436, p-val = 0.0050

Test of Moderators (coefficient(s) 2,3):
QM(df = 2) = 2.5316, p-val = 0.0282

Model Results:

                    estimate      se     zval    pval    ci.lb   ci.ub
intrcpt               0.1871  0.0609   3.0697  0.0021   0.0676  0.3065  **
factor(mod1)level2   -0.0523  0.0810  -0.6461  0.0351  -0.2110  0.1064  *
factor(mod1)level3   -0.1327  0.0845  -1.5718  0.1160  -0.2983  0.0328

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

…the estimates show that **level2** (**-0.0523**) is higher than **level3** (**-0.1327**). It sounds like a lot of complicated interpretation and coding, but all we’ve done here is look at all the levels and find out which ones influence the effect more than others.
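If you have more than a couple of levels, you can test every pairwise contrast in one go by handing **anova** a matrix of contrast vectors, one row per comparison. A sketch, assuming the **m1sep** model from above:

```r
# Each row is one pairwise contrast across level1, level2 and level3 of mod1
pairs <- rbind(c(1, -1,  0),   # level1 vs level2
               c(1,  0, -1),   # level1 vs level3
               c(0,  1, -1))   # level2 vs level3
anova(m1sep, L = pairs)
```
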

Box plots

Box plots are a great way to visualise your moderators. They show you the distribution of the data, the *interquartile range* (IQR), and any outliers in your data. The image below shows an example of data in a typical Gaussian distribution or bell-curve, and how a box plot represents that data.

The blue box is the IQR and shows where 50% of your data lies. The IQR also has a line through the middle, which indicates where the median of our data is (also known as Q2). The lower whisker shows where 25% of your data falls below the lower quartile (also known as Q1), and the upper whisker shows where 25% of your data falls above the upper quartile (also known as Q3).
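You can compute these quartiles for yourself in R. A quick base-R sketch with simulated values (the numbers here are dummy data, not the meta-analysis data):

```r
# Dummy data to illustrate the quartiles a box plot draws
set.seed(42)
x <- rnorm(100, mean = 0.2, sd = 0.1)

quantile(x, probs = c(0.25, 0.5, 0.75))  # Q1, Q2 (the median line), Q3
IQR(x)                                   # Q3 - Q1: the height of the blue box
```
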

Creating a box plot is super easy. *But* I’ve found R to be a bit fussy with box plots, so I’d recommend first filtering your data and saving out a .csv file that contains the columns **id**, **Study**, **Year**, *r* and the moderator you want to make into a box plot.

Let’s say you want to create a box plot of **mod2** (**Time_of_day**). First, filter your data. You can do this by using the **select** function:

dat_boxPlot_mod2 <- select(data, id, Study, Year, r, mod2)

Then writing your new, filtered data to a .csv file:

write.csv(dat_boxPlot_mod2, "importData_boxPlot_mod2.csv", row.names = FALSE)

And loading that new file:

boxPlot_mod2 <- read.csv("C:/.../importData_boxPlot_mod2.csv", header = TRUE)

Then you simply use the **plot** function (R will draw a box plot because **mod2** is categorical; if it comes in as plain text, convert it with **factor()** first):

plot(r ~ mod2, boxPlot_mod2)

If you want to add in a bit of colour, you can use the rainbow function:

plot(r ~ mod2, boxPlot_mod2, col = rainbow(length(unique(boxPlot_mod2$mod2))))

If you find these colours too glaring, like I did, you can also customise them by typing them in manually. Remember to type them in the order you want them to appear:

plot(r ~ mod2, boxPlot_mod2, col = c("mistyrose", "powderblue"))

Lastly, add a title:

title("Time of day Moderator")

And you’ve got yourself a box plot:

Our box plot shows both levels in our **mod2** variable, **Daytime** and **Night time**: the distribution of each level, the IQR (plus Q1, Q2 and Q3) and the whiskers. It also shows that we have one outlier in our **Daytime** level, indicated by the white circle below the **Daytime** box. For more information on how to create box plots, check out the Quick-R box plot guide.
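Since the intro to this post asks you to install **ggplot**, here’s the same box plot built with **ggplot2** instead of base graphics. A sketch, assuming the **boxPlot_mod2** data frame from above:

```r
library(ggplot2)

# Same box plot as above, built with ggplot2
ggplot(boxPlot_mod2, aes(x = mod2, y = r, fill = mod2)) +
  geom_boxplot() +
  scale_fill_manual(values = c("mistyrose", "powderblue")) +
  labs(title = "Time of day Moderator", x = "Time of day", y = "r") +
  theme(legend.position = "none")   # the x-axis already labels the levels
```
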

Check for confounding moderators

Lastly, we can check if there are any confounding moderators. In our example, we’re checking to see if **Gothic_artists** (**mod1**) still influences whether you become a Satan worshiper when we control for **Time_of_day** (**mod2**) and vice versa. In other words, we’re checking if **mod1** confounds **mod2** and if **mod2** confounds **mod1**. To do this, we simply conduct an *ANOVA (analysis of variance)* on the two mods. If you want to know more about ANOVAs, this Theory of ANOVA guide and this video on ANOVAs, both by *Andy Field,* give a good overview.

First, fit a *Maximum Likelihood model* (**method = “ML”**) to your two moderators using the code:

m_mod1and2 <- rma(yi = r, vi = var_r, mods = ~ factor(mod1) + factor(mod2), data = data, method = "ML")

And then bring up the results with:

m_mod1and2

The output looks much like the *overall fit of the model* results, but this time the intercept combines the first levels of both moderators, and the remaining levels of your two moderators are listed below it, starting with **mod1** and then **mod2**.

Mixed-Effects Model (k = 42; tau^2 estimator: ML)

tau^2 (estimated amount of residual heterogeneity): 0.0135 (SE = 0.0088)
tau (square root of estimated tau^2 value): 0.1163
I^2 (residual heterogeneity / unaccounted variability): 33.43%
H^2 (unaccounted variability / sampling variability): 1.50
R^2 (amount of heterogeneity accounted for): 28.02%

Test for Residual Heterogeneity:
QE(df = 38) = 61.4139, p-val = 0.0095

Test of Moderators (coefficient(s) 2,3,4):
QM(df = 3) = 5.6818, p-val = 0.1282

Model Results:

                    estimate      se     zval    pval    ci.lb    ci.ub
intrcpt               0.1883  0.0572   3.2927  0.0010   0.0762   0.3004  ***
factor(mod1)level2   -0.1311  0.0897  -1.4606  0.1441  -0.3069   0.0448
factor(mod1)level3   -0.3010  0.1282  -2.3486  0.0188  -0.5522  -0.0498  *
factor(mod2)level2    0.1670  0.1007   1.6582  0.0973  -0.0304   0.3644  .

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You can then conduct your *ANOVA* using the **anova** function and specifying which moderator you want to check. We use the **btt** argument to select the coefficients (not the intercept) that belong to the moderator we want to test. Let’s start with **mod1**. Its coefficients, minus the intercept, are the 2nd and 3rd coefficients (**factor(mod1)level2** & **factor(mod1)level3**) in our results, so we use **btt = 2:3** in the code, like so:

anova(m_mod1and2, btt = 2:3)

Which will give you…

Test of Moderators (coefficient(s) 2,3): QM(df = 2) = 5.5821, p-val = 0.0614

**mod1** was *not significant* (*p* > 0.05). This tells us that **mod1** does not have an effect when **mod2** is controlled for. In other words, if we control for the **Time_of_day** at which people listen to Gothic music, then it doesn’t matter which **Gothic_artists** they listen to.

Next we’ll check **mod2**. The coefficient we want to select with **btt** is the 4th one (**factor(mod2)level2**) in our results. So we use **btt = 4** in our code, like so:

anova(m_mod1and2, btt = 4)

Which will give you…

Test of Moderators (coefficient(s) 4): QM(df = 1) = 2.7498, p-val = 0.0097

Now let’s assume our results show that **mod2** was *significant* (*p* = 0.0097) when controlling for **mod1**. This means that the **Time_of_day** is relevant, and therefore certain **Gothic_artists** only influence whether people become Satan worshipers depending on the **Time_of_day**.

And you’re done! Now it’s time to write up your meta-analysis.

*For the eager beavers out there, you can find all the references, guidelines and other material that I used while compiling this post below. Lastly, feel free to message me. I’m always happy to answer questions, get feedback, or correct anything that’s outdated, incoherent or needs more explanation.*

*Written by Alison Holland*

*Special thanks to Florian Zender*

References

Baujat, B., Mahé, C., Pignon, J. P., & Hill, C. (2002). A graphical method for exploring heterogeneity in meta-analyses: application to a meta-analysis of 65 trials. *Statistics in medicine, 21*(18), 2641-2652.

Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. *Biometrics*, 1088-1101.

Borenstein, M., Hedges, L., & Rothstein, H. (2007). Meta-analysis: Fixed effect vs. random effects. *Meta-analysis.com*.

Borenstein, M., Cooper, H., Hedges, L., & Valentine, J. (2009). Effect sizes for continuous data. *The handbook of research synthesis and meta-analysis, 2,* 221-235.

Coe, R. (2002, September). It’s the effect size, stupid. Paper presented at the *British Educational Research Association annual conference* (Vol. 12, p. 14).

Cohn, L. D., & Becker, B. J. (2003). How meta-analysis increases statistical power. *Psychological methods, 8*(3), 243.

Cumming, G. (2014). The new statistics: Why and how. *Psychological Science, 25*(1), 7-29.

Del Re, A. C. (2015). A practical tutorial on conducting meta-analysis in R. *The Quantitative Methods for Psychology, 11*(1), 37-50.

Del Re, A. C., & Hoyt, W. T. (2010). MAc: Meta-analysis with correlations. R package version 1.0.5 [Computer software]. Available from http://CRAN.R-project.org/package=MAc

Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. *BMJ, 315*(7109), 629-634.

Evangelou, E., Ioannidis, J. P., & Patsopoulos, N. A. (2007). Uncertainty in heterogeneity estimates in meta-analyses. *BMJ: British Medical Journal, 335*(7626), 914-916.

Field, A. (1999). A bluffer’s guide to meta-analysis 1. *Newsletter of the Mathematical, Statistical and computing section of the British Psychological Society*, *7*, 16-25.

Hedges, L. V., & Vevea, J. L. (1998). Fixed-and random-effects models in meta-analysis. *Psychological methods, 3*(4), 486.

Higgins, J. P., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. *BMJ, 327*(7414), 557-560.

Hunter, J. E., & Schmidt, F. L. (2000). Fixed effects vs. random effects meta-analysis models: implications for cumulative research knowledge. *International Journal of Selection and Assessment, 8*(4), 275-292.

Hunter, J. E., & Schmidt, F. L. (2004). *Methods of meta-analysis: Correcting error and bias in research findings* (2nd ed.). Thousand Oaks, CA: Sage.

Kock, A., & Gemünden, H. G. (2009). A guideline to Meta-analysis. Retrieved from https://www.tim.tu-berlin.de/fileadmin/fg101/TIM_Working_Paper_Series/Volume_2/TIM_WPS_Kock_2009.pdf

Quintana, D. S. (2015). From pre-registration to publication: a non-technical primer for conducting a meta-analysis to synthesize correlational data. *Frontiers in psychology, 6*.

R Development Core Team (2015). R: A language and environment for statistical computing. *R Foundation for Statistical Computing, Vienna, Austria*.

Ried, K. (2006). Interpreting and understanding meta-analysis graphs: a practical guide.

Rosenthal, R., & DiMatteo, M. R. (2001). Meta-analysis: Recent developments in quantitative methods for literature reviews. *Annual Review of Psychology, 52*(1), 59-82.

Sánchez-Meca, J., & Marín-Martínez, F., (2010). Meta-analysis in psychological research. *International Journal of Psychological Research, 3*(1), 151-163.

Sterne, J. A., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. *Journal of Clinical Epidemiology, 53*(11), 1119-1129.

Valentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How many studies do you need? A primer on statistical power for meta-analysis. *Journal of Educational and Behavioral Statistics*, *35*(2), 215-247.

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. *Journal of Statistical Software, 36*(3), 1-48.

Viechtbauer, W., & Viechtbauer, M. W. (2015). Package ‘metafor’. *The Comprehensive R Archive Network*. http://cran.r-project.org/web/packages/metafor/metafor.pdf

Wampold, B. E., Mondin, G. W., Moody, M., Stich, F., Benson, K., & Ahn, H. N. (1997). A meta-analysis of outcome studies comparing bona fide psychotherapies: Empirically, “all must have prizes”. *Psychological Bulletin, 122*(3), 203.

Wang, M. C., & Bushman, B. J. (1998). Using the normal quantile plot to explore meta-analytic data sets. *Psychological Methods, 3*(1), 46.
