Systematic Reviews and Meta Analysis

  • Getting Started
  • Guides and Standards
  • Review Protocols
  • Databases and Sources
  • Randomized Controlled Trials
  • Controlled Clinical Trials
  • Observational Designs
  • Tests of Diagnostic Accuracy
  • Software and Tools
  • Where do I get all those articles?
  • Collaborations
  • EPI 233/528
  • Countway Mediated Search
  • Risk of Bias (RoB)

Systematic review Q & A

What is a systematic review?

A systematic review is a guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and pre-selected eligibility criteria reduces the risk of bias in identifying, selecting, and analyzing relevant studies. A well-designed systematic review includes clear objectives, pre-selected criteria for identifying eligible studies, an explicit methodology, a thorough and reproducible search of the literature, an assessment of the validity or risk of bias of each included study, and a systematic synthesis, analysis, and presentation of the findings of the included studies. A systematic review may include a meta-analysis.

For details about carrying out systematic reviews, see the Guides and Standards section of this guide.

Is my research topic appropriate for systematic review methods?

A systematic review is best deployed to test a specific hypothesis about a healthcare or public health intervention or exposure. By focusing on a single intervention or a few specific interventions for a particular condition, the investigator can ensure a manageable results set. Moreover, examining a single intervention or a small set of related interventions, exposures, or outcomes will simplify the assessment of studies and the synthesis of the findings.

Systematic reviews are poor tools for hypothesis generation: for instance, to determine what interventions have been used to increase the awareness and acceptability of a vaccine, or to investigate the ways that predictive analytics have been used in health care management. In the first case, we don't know what interventions to search for and so have to screen all the articles about awareness and acceptability. In the second, there is no agreed-upon set of methods that make up predictive analytics, and health care management is far too broad. The search will necessarily be incomplete, vague, and very large all at the same time. In most cases, reviews without clearly and exactly specified populations, interventions, exposures, and outcomes will produce results sets that quickly outstrip the resources of a small team and offer no consistent way to assess and synthesize findings from the studies that are identified.

If not a systematic review, then what?

You might consider performing a scoping review. This framework allows iterative searching over a reduced number of data sources and imposes no requirement to assess individual studies for risk of bias. The framework includes built-in mechanisms to adjust the analysis as the work progresses and more is learned about the topic. A scoping review won't help you limit the number of records you'll need to screen (broad questions lead to large results sets) but may give you a means of dealing with a large set of results.

This tool can help you decide what kind of review is right for your question.

Can my student complete a systematic review during her summer project?

Probably not. Systematic reviews are a lot of work. Between creating the protocol, building and running a quality search, collecting all the papers, evaluating the studies that meet the inclusion criteria, and extracting and analyzing the summary data, a well-done review can require dozens to hundreds of hours of work spanning several months. Moreover, a systematic review requires subject expertise, statistical support, and a librarian to help design and run the search. Be aware that librarians sometimes have queues for their search time; it may take several weeks to complete and run a search. In addition, all guidelines for carrying out systematic reviews recommend that at least two subject experts screen the studies identified in the search. The first round of screening can consume one hour per screener for every 100-200 records. A systematic review is a labor-intensive team effort.

How can I know if my topic has been reviewed already?

Before starting out on a systematic review, check to see if someone has done it already. In PubMed you can use the systematic review subset to limit to a broad group of papers that is enriched for systematic reviews. You can invoke the subset by selecting it from the Article Types filters to the left of your PubMed results, or you can append AND systematic[sb] to your search. For example:

"neoadjuvant chemotherapy" AND systematic[sb]

The systematic review subset is very noisy, however. To quickly focus on systematic reviews (knowing that you may be missing some), simply search for the word systematic in the title:

"neoadjuvant chemotherapy" AND systematic[ti]

Any PRISMA-compliant systematic review will be captured by this method since including the words "systematic review" in the title is a requirement of the PRISMA checklist. Cochrane systematic reviews do not include 'systematic' in the title, however. It's worth checking the Cochrane Database of Systematic Reviews independently.

You can also search for protocols that will indicate that another group has set out on a similar project. Many investigators will register their protocols in PROSPERO, a registry of review protocols. Other published protocols as well as Cochrane Review protocols appear in the Cochrane Methodology Register, a part of the Cochrane Library.


Open Access

Ten simple rules for carrying out and writing meta-analyses

Diego A. Forero, Sandra Lopez-Leon, Yeimy González-Giraldo, Pantelis G. Bagos

Affiliations: Laboratory of NeuroPsychiatric Genetics, Biomedical Sciences Research Group, School of Medicine, Universidad Antonio Nariño, Bogotá, Colombia; PhD Program in Health Sciences, School of Medicine, Universidad Antonio Nariño, Bogotá, Colombia; Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, United States of America; Departamento de Nutrición y Bioquímica, Facultad de Ciencias, Pontificia Universidad Javeriana, Bogotá, Colombia; Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece

Published: May 16, 2019

  • https://doi.org/10.1371/journal.pcbi.1006922

Citation: Forero DA, Lopez-Leon S, González-Giraldo Y, Bagos PG (2019) Ten simple rules for carrying out and writing meta-analyses. PLoS Comput Biol 15(5): e1006922. https://doi.org/10.1371/journal.pcbi.1006922

Editor: Scott Markel, Dassault Systemes BIOVIA, UNITED STATES

Copyright: © 2019 Forero et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: YG-G is supported by a PhD fellowship from Centro de Estudios Interdisciplinarios Básicos y Aplicados CEIBA (Rodolfo Llinás Program). DAF is supported by research grants from Colciencias and VCTI. PGB is partially supported by ELIXIR-GR, the Greek Research Infrastructure for data management and analysis in the biosciences. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In the context of evidence-based medicine, meta-analyses provide novel and useful information [ 1 ], as they are at the top of the pyramid of evidence and consolidate previous evidence published in multiple previous reports [ 2 ]. Meta-analysis is a powerful tool to cumulate and summarize the knowledge in a research field [ 3 ]. Because of the significant increase in the published scientific literature in recent years, there has also been an important growth in the number of meta-analyses for a large number of topics [ 4 ]. It has been found that meta-analyses are among the types of publications that usually receive a larger number of citations in the biomedical sciences [ 5 , 6 ]. The methods and standards for carrying out meta-analyses have evolved in recent years [ 7 – 9 ].

Although there are several published articles describing comprehensive guidelines for specific types of meta-analyses, there is still the need for an abridged article with general and updated recommendations for researchers interested in the development of meta-analyses. We present here ten simple rules for carrying out and writing meta-analyses.

Rule 1: Specify the topic and type of the meta-analysis

Considering that a systematic review [ 10 ] is fundamental for a meta-analysis, you can use the Population, Intervention, Comparison, Outcome (PICO) model to formulate the research question. It is important to verify that there are no published meta-analyses on the specific topic in order to avoid duplication of efforts [ 11 ]. In some cases, an updated meta-analysis in a topic is needed if additional data become available. It is possible to carry out meta-analyses for multiple types of studies, such as epidemiological variables for case-control, cohort, and randomized clinical trials. As observational studies have a larger possibility of having several biases, meta-analyses of these types of designs should take that into account. In addition, there is the possibility to carry out meta-analyses for genetic association studies, gene expression studies, genome-wide association studies (GWASs), or data from animal experiments. It is advisable to preregister the systematic review protocols at the International Prospective Register of Systematic Reviews (PROSPERO; https://www.crd.york.ac.uk/Prospero ) database [ 12 ]. Keep in mind that an increasing number of journals require registration prior to publication.

Rule 2: Follow available guidelines for different types of meta-analyses

There are several available general guidelines. The first of such efforts were the Quality of Reports of Meta-analyses of Randomized Controlled Trials (QUORUM) [ 13 ] and the Meta-analysis of Observational Studies in Epidemiology (MOOSE) statements [ 14 ], but currently, the Preferred Reporting Items for Systematic reviews and Meta-analyses (PRISMA) [ 15 ] has been broadly cited and used. In addition, there have been efforts to develop specific guidelines regarding meta-analyses for clinical studies (Cochrane Handbook; https://training.cochrane.org/handbook ), genetic association studies [ 16 ], genome-wide expression studies [ 17 ], GWASs [ 18 ], and animal studies [ 19 ].

Rule 3: Establish inclusion criteria and define key variables

You should establish in advance the inclusion (such as type of study, language of publication, among others) and exclusion (such as minimal sample size, among others) criteria. Keep in mind that the current consensus advises against strict criteria concerning language or sample size. You should clearly define the variables that will be extracted from each primary article. Broad inclusion criteria increase heterogeneity between studies, and narrow inclusion criteria can make it difficult to find studies; therefore, a compromise should be found. Prospective meta-analyses, which usually are carried out by international consortia, have the advantage of the possibility of including individual-level data [ 20 ].

Rule 4: Carry out a systematic search in different databases and extract key data

You can carry out your systematic search in several bibliographic databases, such as PubMed, Embase, The Cochrane Central Register of Controlled Trials, Scopus, Web of Science, and Google Scholar [ 21 ]. Usually, searching in several databases helps to minimize the possibility of failing to identify all published studies [ 22 ]. In some specific areas, searching in specialized databases is also worth doing (such as BIOSIS, Cumulative index to Nursing and Allied Health Literature (CINAHL), PsycINFO, Sociological Abstracts, and EconLit, among others). Moreover, in other cases, direct search for the data is also advisable (i.e., Gene Expression Omnibus [GEO] database for gene expression studies) [ 23 ]. Usually, the bibliography of review articles might help to identify additional articles and data from other types of documents (such as theses or conference proceedings) that might be included in your meta-analysis. The Web of Science database can be used to identify publications that have cited key articles. Adequate extraction and recording of key data from primary articles are fundamental for carrying out a meta-analysis. Quality assessment of the included studies is also an important issue; it can be used for determining inclusion criteria, sensitivity analysis, or differential weighting of the studies. For example the Jadad scale [ 24 ] is frequently used for randomized clinical trials, the Newcastle–Ottawa scale [ 25 ] for nonrandomized studies, and QUADAS-2 for the Quality Assessment of Diagnostic Accuracy Studies [ 26 ]. It is recommended that these steps be carried out by two researchers in parallel and that discrepancies be resolved by consensus. Nevertheless, the reader must be aware that quality assessment has been criticized, especially when it reduces the studies to a single “quality” score [ 27 , 28 ]. In any case, it is important to avoid the confusion of using guidelines for the reporting of primary studies as scales for the assessment of the quality of included articles [ 29 , 30 ].

Rule 5: Contact authors of primary articles to ask for missing data

It is common that key data are not available in the main text or supplementary files of primary articles [ 31 ], leading to the need to contact the authors to ask for missing data. However, the rate of response from authors is lower than expected. There are multiple standards that promote the availability of primary data in published articles, such as the minimum information about a microarray experiment (MIAME) [ 32 ] and the STrengthening the REporting of Genetic Association Studies (STREGA) [ 33 ]. In some areas, such as genetics, in which it was shown that it is possible to identify an individual using the aggregated statistics from a particular study [ 34 ], strict criteria are imposed for data sharing, and specialized permissions might be needed.

Rule 6: Select the best statistical models for your question

For cases in which there is enough primary data of adequate quality for a quantitative summary, there is the option to carry out a meta-analysis. The potential analyst must be warned that in many cases the data are reported in noncompatible forms, so one must be ready to perform various types of transformations. Thankfully, there are methods available for extracting and transforming data regarding continuous variables [ 35 – 37 ], 2 × 2 tables [ 38 , 39 ], or survival data [ 40 ]. Frequently, meta-analyses are based on fixed-effects or random-effects statistical models [ 20 ]. In addition, models based on combining ranks or p -values are also available and can be used in specific cases [ 41 – 44 ]. For more complex data, multivariate methods for meta-analysis have been proposed [ 45 , 46 ]. Additional statistical examinations involve sensitivity analyses, metaregressions, subgroup analyses, and calculation of heterogeneity metrics, such as Q or I 2 [ 20 ]. It is fundamental to assess and, if present, explain the possible sources of heterogeneity. Although random-effects models are suitable for cases of between-studies heterogeneity, the sources of between-studies variation should be identified, and their impact on effect size should be quantified using statistical tests, such as subgroup analyses or metaregression. Publication bias is an important aspect to consider [ 47 ], since in many cases negative findings have less probability of being published. Other types of bias, such as the so-called “Proteus phenomenon” [ 48 ] or “winner’s curse” [ 49 ], are common in some scientific fields, such as genetics, and the approach of cumulative meta-analysis is suggested in order to identify them.
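
As a minimal illustration of these models, the sketch below pools a handful of invented log odds ratios with the R package meta (one of the tools discussed in Rule 7) and reports both the common-effect and random-effects estimates together with the Q and I² heterogeneity statistics; the study labels, effect sizes, and standard errors are assumptions made up for this example, not data from any real study.

# Hedged illustration: pooling invented log odds ratios with the R 'meta' package.
# install.packages("meta")                      # if not already installed
library(meta)
dat <- data.frame(
  study   = c("Study 1", "Study 2", "Study 3", "Study 4", "Study 5"),
  logOR   = c(0.25, 0.10, 0.42, -0.05, 0.31),   # invented effect estimates
  selogOR = c(0.12, 0.20, 0.15, 0.18, 0.25)     # invented standard errors
)
m <- metagen(TE = logOR, seTE = selogOR, studlab = study,
             data = dat, sm = "OR")             # generic inverse-variance method
summary(m)  # prints common-effect and random-effects pooled ORs, Q, tau^2 and I^2

A large I² or a significant Q statistic argues for reporting the random-effects estimate and for exploring the sources of heterogeneity with subgroup analyses or metaregression, as noted above.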

Rule 7: Use available software to carry out the metastatistics

There are several very user-friendly and freely available programs for carrying out meta-analyses [ 43 , 44 ], either within the framework of a statistical package such as Stata or R or as stand-alone applications. Stata and R [ 50 – 52 ] have dozens of routines, mostly user written, that can handle most meta-analysis tasks, even complex analyses such as network meta-analysis and meta-analyses of GWASs and gene expression studies ( https://cran.r-project.org/web/views/MetaAnalysis.html ; https://www.stata.com/support/faqs/statistics/meta-analysis ). There are also stand-alone packages that can be useful for general applications or for specific areas, such as OpenMetaAnalyst [ 53 ], NetworkAnalyst [ 54 ], JASP [ 55 ], MetaGenyo [ 56 ], Cochrane RevMan ( https://community.cochrane.org/help/tools-and-software/revman-5 ), EpiSheet (krothman.org/episheet.xls), GWAR [ 57 ], GWAMA [ 58 ], and METAL [ 59 ]. Some of these programs are web services or stand-alone software. In some cases, certain programs can present issues when they are run because of their dependency on other packages.
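
As a small, hedged illustration of the R route, the sketch below fits a DerSimonian-Laird random-effects model to invented 2 × 2 tables with the metafor package (one of the packages listed in the CRAN MetaAnalysis task view linked above); the counts are made up solely for demonstration.

# Hedged illustration: the same kind of pooling with the 'metafor' package,
# here from invented 2x2 tables, using the DerSimonian-Laird random-effects model.
# install.packages("metafor")
library(metafor)
dat <- data.frame(
  ai = c(12, 8, 20),  bi = c(88, 92, 180),      # events / non-events, treatment arm (invented)
  ci = c(10, 11, 15), di = c(90, 89, 185)       # events / non-events, control arm (invented)
)
res <- rma(measure = "OR", ai = ai, bi = bi, ci = ci, di = di,
           data = dat, method = "DL")
summary(res)                                     # pooled log OR, CI, Q, tau^2, I^2
forest(res, atransf = exp)                       # forest plot on the OR scale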

Rule 8: The records and study report must be complete and transparent

Following published guidelines for meta-analyses guarantees that the manuscript will describe the different steps and methods used, facilitating their transparency and replicability [ 15 ]. Data such as search and inclusion criteria, numbers of abstracts screened, and included studies are quite useful, in addition to details of meta-analytical strategies used. An assessment of quality of included studies is also useful [ 60 ]. A spreadsheet can be constructed in which every step in the selection criteria is recorded; this will be helpful to construct flow charts. In this context, a flow diagram describing the progression between the different steps is quite useful and might enhance the quality of the meta-analysis [ 61 ]. Records will be also useful if, in the future, the meta-analysis needs to be updated. Stating the limitations of the analysis is also important [ 62 ].
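
For instance, if the screening log is kept as a simple spreadsheet, the counts needed for the flow diagram can be tallied directly from it. The sketch below is a minimal R illustration under the assumption that the log was exported to a CSV with hypothetical columns id, stage, decision, and reason; none of these names come from the article.

# Hedged sketch: tallying PRISMA flow-diagram counts from a screening log kept as a CSV.
# Assumed (hypothetical) columns: id, stage ("title_abstract" or "full_text"),
# decision ("include" or "exclude"), reason.
scr_log <- read.csv("screening_log.csv", stringsAsFactors = FALSE)   # hypothetical file name
n_screened <- length(unique(scr_log$id[scr_log$stage == "title_abstract"]))
n_fulltext <- length(unique(scr_log$id[scr_log$stage == "full_text"]))
n_included <- sum(scr_log$stage == "full_text" & scr_log$decision == "include")
# Full-text exclusion reasons, ready to copy into the flow chart
table(scr_log$reason[scr_log$stage == "full_text" & scr_log$decision == "exclude"])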

Rule 9: Provide enough data in your manuscript

A table with complete information about the included studies (such as author, year, details of included subjects, and DOIs or PubMed IDs, among others) is quite useful in an article reporting a meta-analysis; it can be included in the main text of the manuscript or as a supplementary file. Software used for carrying out meta-analyses and for generating key graphs, such as forest plots, should be referenced. Summary effect measures, such as pooled odds ratios or the counts used to generate them, should always be reported, including confidence intervals. It is also possible to generate figures with information from multiple forest plots [ 63 ]. In the case of positive findings, plots from sensitivity analyses are quite informative. In more-complex analyses, it is advisable to include in the supplementary files the scripts used to generate the results [ 64 ].

Rule 10: Provide context for your findings and suggest future directions

The Discussion section is an important scientific component in a manuscript describing a meta-analysis, as the authors should discuss their current findings in the context of the available scientific literature and existing knowledge [ 65 ]. Authors can discuss possible reasons for the positive or negative results of their meta-analysis, provide an interpretation of findings based on available biological or epidemiological evidence, and comment on particular features of individual studies or experimental designs used [ 66 ]. As meta-analyses are usually synthesizing the existing evidence from multiple primary studies, which commonly took years and large amounts of funding, authors can recommend key suggestions for conducting and/or reporting future primary studies [ 67 ].

As open science is becoming more important around the globe [ 68 , 69 ], adherence to published standards, in addition to the evolution of methods for different meta-analytical applications, will be even more important to carry out meta-analyses of high quality and impact.

  • 47. Rothstein HR, Sutton AJ, Borenstein M (2006) Publication bias in meta-analysis: Prevention, assessment and adjustments. Hoboken, NJ: John Wiley & Sons.
  • 50. Sterne JA, Bradburn MJ, Egger M (2001) Meta-analysis in Stata. In: Egger M, Smith GD, Altman DG, editors. Systematic reviews in health care: meta-analysis in context. Hoboken, NJ: Wiley. pp. 347–369.
  • Open access
  • Published: 01 August 2019

A step by step guide for conducting a systematic review and meta-analysis with simulation data

  • Gehad Mohamed Tawfik 1,2,
  • Kadek Agus Surya Dila 2,3,
  • Muawia Yousif Fadlelmola Mohamed 2,4,
  • Dao Ngoc Hien Tam 2,5,
  • Nguyen Dang Kien 2,6,
  • Ali Mahmoud Ahmed 2,7 &
  • Nguyen Tien Huy 8,9,10

Tropical Medicine and Health, volume 47, Article number: 46 (2019)


The number of studies relating to tropical medicine and health has increased strikingly over the last few decades. In this field, a well-conducted systematic review and meta-analysis (SR/MA) is considered a feasible solution for keeping clinicians abreast of current evidence-based medicine. Understanding the steps of an SR/MA is of paramount importance to conducting one, yet it is not easy, and researchers face many obstacles along the way. To address those hindrances, this methodology study aims to provide a step-by-step approach, mainly for beginners and junior researchers in tropical medicine and other health care fields, to properly conducting an SR/MA; the steps described reflect our experience and expertise combined with well-known and accepted international guidance.

We suggest that all SR/MA steps be done independently by 2–3 reviewers, with discrepancies resolved by discussion, to ensure data quality and accuracy.

SR/MA steps include development of the research question, forming criteria, building the search strategy, searching databases, protocol registration, title and abstract screening, full-text screening, manual searching, extracting data, quality assessment, data checking, statistical analysis, double data checking, and manuscript writing.

Introduction

The number of studies published in the biomedical literature, especially in tropical medicine and health, has increased strikingly over the last few decades. This massive abundance of literature makes clinical medicine increasingly complex, and knowledge from many studies is often needed to inform a particular clinical decision. However, the available studies are often heterogeneous in their design, operational quality, and subjects under study, and may handle the research question in different ways, which adds to the complexity of synthesizing evidence and conclusions [ 1 ].

Systematic reviews and meta-analyses (SR/MAs) carry a high level of evidence, as represented by the evidence-based pyramid. Therefore, a well-conducted SR/MA is considered a feasible solution for keeping health clinicians abreast of contemporary evidence-based medicine.

Unlike a systematic review, an unsystematic narrative review tends to be descriptive: the authors often select articles based on their own point of view, which leads to poor quality. A systematic review, on the other hand, is defined as a review that uses a systematic method to summarize evidence on a question with a detailed and comprehensive plan of study. Furthermore, despite the growing number of guidelines for conducting a systematic review effectively, the basic steps start with framing a question, then identifying relevant work (which consists of developing criteria and searching for articles), appraising the quality of included studies, summarizing the evidence, and interpreting the results [ 2 , 3 ]. In reality, however, these simple steps are not easy to follow: a researcher can struggle with many problems for which there is no detailed guidance.

Conducting an SR/MA in tropical medicine and health may be difficult, especially for young researchers; understanding its essential steps is therefore crucial. To address these difficulties, we recommend a flow diagram (Fig. 1) that illustrates the stages of an SR/MA in detail, step by step. This methodology study aims to provide a step-by-step approach, mainly for beginners and junior researchers in tropical medicine and other health care fields, to properly and succinctly conducting an SR/MA; the steps described reflect our experience and expertise combined with well-known and accepted international guidance.

Figure 1. Detailed flow diagram guideline for systematic review and meta-analysis steps. Note: The star icon refers to “2–3 reviewers screen independently”.

Methods and results

Detailed steps for conducting any systematic review and meta-analysis.

We searched the methods reported in published SR/MAs in tropical medicine and other health care fields, as well as published guidance such as the Cochrane Handbook [ 4 ], to collect the lowest-bias method for each step of SR/MA conduct. Furthermore, we used the guidelines that we apply in our own studies for all SR/MA steps. We combined these methods into a detailed flow diagram that shows how each SR/MA step is conducted.

Any SR/MA must follow the widely accepted Preferred Reporting Items for Systematic Review and Meta-analysis statement (PRISMA checklist 2009) (Additional file 5 : Table S1) [ 5 ].

We illustrate our methods with an explanatory simulation example on the topic of “evaluating the safety of Ebola vaccine,” chosen because Ebola is a rare but fatal tropical disease. All the methods described follow the standards used internationally, combined with our compiled experience in the conduct of systematic reviews, which we think lends them some validity. A review on this topic is being conducted by researchers working together in our research group, motivated in part by the 2013–2016 Ebola outbreak in Africa, which resulted in significant mortality and morbidity. Furthermore, since there are many published and ongoing trials assessing the safety of Ebola vaccines, we thought this would provide a great opportunity to tackle this hotly debated issue. Moreover, Ebola has flared up again: a new fatal outbreak has been under way in the Democratic Republic of the Congo since August 2018, which, according to the World Health Organization, has infected more than 1,000 people and killed 629 so far. It is therefore considered the second worst Ebola outbreak, after the one in West Africa in 2014, which infected more than 26,000 people and killed about 11,300 over the course of the outbreak.

Research question and objectives

Like other study designs, the research question of an SR/MA should be feasible, interesting, novel, ethical, and relevant. Therefore, a clear, logical, and well-defined research question should be formulated. Usually, one of two common tools is used: PICO or SPIDER. PICO (Population, Intervention, Comparison, Outcome) is used mostly in quantitative evidence synthesis; it has been shown to have higher sensitivity than the more specific SPIDER approach [ 6 ]. SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) was proposed as a method for qualitative and mixed-methods searches.

We here recommend a combined approach, using either one or both of the SPIDER and PICO tools, to retrieve a comprehensive search, depending on time and resource limitations. When applied to our assumed research topic, the SPIDER approach is the more valid choice if the question is of a qualitative nature.

PICO is usually used for systematic reviews and meta-analyses of clinical trials. For observational studies (without an intervention or comparator), as in many tropical medicine and epidemiological questions, it is usually enough to use only P (Population) and O (Outcome) to formulate a research question. We must indicate the population (P) clearly, then the intervention (I) or exposure. Next, it is necessary to compare (C) the indicated intervention with other interventions, e.g., placebo. Finally, we need to clarify which outcomes (O) are relevant.

To facilitate comprehension, we choose Ebola virus disease (EVD) as an example. Vaccines for EVD are currently being developed and are in phase I, II, and III clinical trials; we want to know whether such a vaccine is safe and can induce sufficient immunogenicity in the subjects.

An example of a research question for an SR/MA based on PICO for this issue is as follows: What are the safety and immunogenicity of Ebola vaccines in humans? (P: healthy subjects (humans), I: vaccination, C: placebo, O: safety or adverse effects)

Preliminary research and idea validation

We recommend a preliminary search to identify relevant articles, ensure the validity of the proposed idea, avoid duplication of previously addressed questions, and make sure that we have enough articles for conducting the analysis. Moreover, themes should focus on relevant and important health care issues, consider global needs and values, reflect the current science, and be consistent with the adopted review methods. Gaining familiarity with, and a deep understanding of, the study field through relevant videos and discussions is of paramount importance for better retrieval of results. If we ignore this step, our study may have to be abandoned when we discover that a similar study has already been published, which means we have wasted our time on a problem that has already been tackled.

To do this, we can start with a simple search in PubMed or Google Scholar with the search terms Ebola AND vaccine. While doing this step, we identify a systematic review and meta-analysis of determinant factors influencing antibody response after Ebola vaccination in non-human primates and humans [ 7 ], which is a relevant paper to read to gain deeper insight and to identify gaps for better formulation of our research question or purpose. We can still conduct a systematic review and meta-analysis of the Ebola vaccine because we evaluate a different outcome (safety) and a different population (humans only).

Inclusion and exclusion criteria

Eligibility criteria are based on the PICO approach, study design, and date. Exclusion criteria mostly cover unrelated articles, duplicates, papers whose full texts are unavailable, and abstract-only papers. These exclusions should be stated in advance to keep the researcher from introducing bias. The inclusion criteria would be articles with the target patients, the investigated interventions, or the comparison between two studied interventions; briefly, articles that contain information answering our research question. Most important, the information should be clear and sufficient, whether positive or negative, to answer the question.

For the topic we have chosen, we can set the inclusion criteria as: (1) any clinical trial evaluating the safety of an Ebola vaccine and (2) no restriction regarding country, patient age, race, gender, publication language, or date. Exclusion criteria are as follows: (1) studies of Ebola vaccines in non-human subjects or in vitro studies; (2) studies with data that cannot be reliably extracted, or with duplicate or overlapping data; (3) abstract-only papers, such as preceding abstracts, conference papers, editorials, author responses, theses, and books; (4) articles without an available full text; and (5) case reports, case series, and systematic review studies. The PRISMA flow diagram template that is used in SR/MA studies can be found in Fig. 2.

Figure 2. PRISMA flow diagram of studies’ screening and selection.

Search strategy

A standard search strategy is built in PubMed and then modified according to each specific database to get the most relevant results. The basic search strategy is built from the research question formulation (i.e., PICO or PICOS). Search strategies are constructed to include free-text terms (e.g., in the title and abstract) and any appropriate subject indexing (e.g., MeSH) expected to retrieve eligible studies, with the help of an expert in the review topic field or an information specialist. Additionally, we advise against including terms for the outcomes, because outcomes are often not mentioned explicitly in the articles and their inclusion may prevent the database search from retrieving eligible studies.

The search terms are refined by running trial searches and looking for additional relevant terms within each concept in the retrieved papers. To search for clinical trials, we can use these descriptors in PubMed: “clinical trial”[Publication Type] OR “clinical trials as topic”[MeSH terms] OR “clinical trial”[All Fields]. After some rounds of trial and refinement of the search terms, we formulate the final search term for PubMed as follows: (ebola OR ebola virus OR ebola virus disease OR EVD) AND (vaccine OR vaccination OR vaccinated OR immunization) AND (“clinical trial”[Publication Type] OR “clinical trials as topic”[MeSH Terms] OR “clinical trial”[All Fields]). Because the literature on this topic is limited, we do not include outcome terms (safety and immunogenicity) in the search term, in order to capture more studies.
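
To check how many records the final term retrieves without leaving R, one option is the sketch below; it assumes the rentrez package (an R client for the NCBI E-utilities), which is not part of the article's own code, and simply reuses the PubMed term built above.

# Hedged sketch: counting PubMed hits for the final search term from R.
# install.packages("rentrez")   # assumed helper package, not mentioned in the article
library(rentrez)
term <- paste(
  "(ebola OR ebola virus OR ebola virus disease OR EVD)",
  "AND (vaccine OR vaccination OR vaccinated OR immunization)",
  'AND ("clinical trial"[Publication Type] OR "clinical trials as topic"[MeSH Terms] OR "clinical trial"[All Fields])'
)
res <- entrez_search(db = "pubmed", term = term, retmax = 0)  # retmax = 0: return only the count
res$count                                                     # number of records at the time of searching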

Search databases, import all results into a library, and export to an Excel sheet

According to the AMSTAR guidelines, at least two databases have to be searched for an SR/MA [ 8 ], but the more databases you search, the more complete, accurate, and comprehensive the results will be. The choice and ordering of databases depend mostly on the review question; for a review of clinical trials, you will rely mostly on Cochrane, mRCTs, or the International Clinical Trials Registry Platform (ICTRP). Here, we propose 12 databases (PubMed, Scopus, Web of Science, EMBASE, GHL, VHL, Cochrane, Google Scholar, ClinicalTrials.gov, mRCTs, POPLINE, and SIGLE), which together cover almost all published articles in tropical medicine and other health-related fields. Among these databases, POPLINE focuses on reproductive health, so researchers should choose databases relevant to their research topic. Some databases do not support the use of Boolean operators or quotation marks, and others have their own special search syntax. Therefore, the initial search terms need to be modified for each database to get appropriate results; manipulation guides for each online database search are presented in Additional file 5: Table S2, and the detailed search strategy for each database is found in Additional file 5: Table S3. The search term that we created in PubMed needs customization based on the specific characteristics of each database. An example of a Google Scholar advanced search for our topic is as follows:

With all of the words: ebola virus

With at least one of the words: vaccine vaccination vaccinated immunization

Where my words occur: in the title of the article

With all of the words: EVD

Finally, all records are collected into one EndNote library in order to delete duplicates and then export the records into an Excel sheet. Using the duplicate-removal function with two settings is mandatory: all references that have (1) the same title and author and were published in the same year, or (2) the same title and author and were published in the same journal, are deleted. The references remaining after this step should be exported to an Excel file with the essential information for screening; this could include the authors’ names, publication year, journal, DOI, URL link, and abstract.
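
If a reference manager is not at hand, the two duplicate-removal settings described above can be approximated directly on the exported sheet. The sketch below is a minimal R illustration assuming the records were exported to a CSV with hypothetical column names title, author, year, and journal.

# Hedged sketch: approximating the two duplicate-removal settings on the exported sheet.
# Assumed (hypothetical) columns: title, author, year, journal.
refs <- read.csv("records_export.csv", stringsAsFactors = FALSE)   # hypothetical file name
ti <- tolower(trimws(refs$title)); au <- tolower(trimws(refs$author))
dup1 <- duplicated(paste(ti, au, refs$year))                # same title + author + year
dup2 <- duplicated(paste(ti, au, tolower(refs$journal)))    # same title + author + journal
refs_unique <- refs[!(dup1 | dup2), ]
write.csv(refs_unique, "records_for_screening.csv", row.names = FALSE)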

Protocol writing and registration

Protocol registration at an early stage guarantees transparency in the research process and protects against duplication problems. Besides, it is a documented proof of the team’s plan of action, research question, eligibility criteria, intervention/exposure, quality assessment, and pre-analysis plan. It is recommended that researchers send the protocol to the principal investigator (PI) to revise it, then upload it to a registry site. There are many registry sites available for SR/MAs, such as those proposed by the Cochrane and Campbell collaborations; however, we recommend registering the protocol in PROSPERO, as it is easier. The layout of a protocol template, according to PROSPERO, can be found in Additional file 5: File S1.

Title and abstract screening

Decisions to select retrieved articles for further assessment are based on the eligibility criteria, to minimize the chance of including non-relevant articles. According to the Cochrane guidance, two reviewers are a must for this step, but for beginners and junior researchers this might be tiresome; thus, based on our experience, we propose that at least three reviewers work independently to reduce the chance of error, particularly in teams with a large number of authors, to add more scrutiny and ensure proper conduct. In most cases, the quality with three reviewers is better than with two, as two reviewers alone may hold different opinions and be unable to decide, so the third opinion is crucial. Here are some examples of systematic reviews that we conducted following the same strategy (by different groups of researchers in our research group) and published successfully, featuring ideas relevant to tropical medicine and disease [ 9 , 10 , 11 ].

In this step, duplications are removed manually whenever the reviewers find them. When there is doubt about a decision on an article, the team should be inclusive rather than exclusive, until the main leader or PI makes a decision after discussion and consensus. All excluded records should be given exclusion reasons.

Full text downloading and screening

Many search engines provide free links to full-text articles. When a full text is not found, we can search research websites such as ResearchGate, which offer an option to request the full text directly from the authors, explore the archives of the relevant journals, or contact the PI to purchase it if available. Similarly, 2–3 reviewers work independently to decide on the included full texts according to the eligibility criteria, reporting the exclusion reasons for the excluded articles. If any disagreement occurs, the final decision is made by discussion.

Manual search

One has to exhaust all possibilities to reduce bias by performing explicit hand-searching to retrieve reports that may have been missed in the first search [ 12 ]. We apply five methods of manual searching: searching the references of included studies/reviews, contacting authors and experts, and looking at related articles and cited articles in PubMed and Google Scholar.

We describe here three consecutive methods to increase and refine the yield of manual searching: first, searching the reference lists of included articles; second, performing citation tracking, in which the reviewers track all the articles that cite each of the included articles, which might involve electronic searching of databases; and third, similarly to citation tracking, following all “related to” or “similar” articles. Each of these methods can be performed by 2–3 independent reviewers, and every possibly relevant article must undergo further scrutiny against the inclusion criteria, following the same process as the records yielded from electronic databases, i.e., title/abstract and full-text screening.

We propose independent reviewing by assigning each member of the team a “tag” and a distinct method, compiling all the results at the end for comparison of differences and discussion, in order to maximize retrieval and minimize bias. Likewise, the number of included articles has to be stated before they are added to the overall included records.

Data extraction and quality assessment

This step entails data collection from the included full texts in a structured extraction Excel sheet, which has previously been pilot-tested for extraction using some random studies. We recommend extracting both adjusted and non-adjusted data, because this allows the greatest account of confounding factors when the data are pooled later in the analysis [ 13 ]. The process of extraction should be executed by 2–3 independent reviewers. Mostly, the sheet is classified into study and patient characteristics, outcomes, and the quality assessment (QA) tool.

Data presented in graphs should be extracted with software tools such as WebPlotDigitizer [ 14 ]. Most of the equations that can be used in extraction prior to analysis, for example to estimate the standard deviation (SD) from other variables, are found in Additional file 5: File S2, with their references: Hozo et al. [ 15 ], Xiang et al. [ 16 ], and Rijkom et al. [ 17 ]. A variety of tools are available for the QA, depending on the design: the RoB 2 Cochrane tool for randomized controlled trials [ 18 ], which is presented in Additional file 1: Figure S1 and Additional file 2: Figure S2 (from previously published article data) [ 19 ], the NIH tool for observational and cross-sectional studies [ 20 ], the ROBINS-I tool for non-randomized trials [ 21 ], the QUADAS-2 tool for diagnostic studies, the QUIPS tool for prognostic studies, the CARE tool for case reports, and ToxRtool for in vivo and in vitro studies. We recommend that 2–3 reviewers independently assess the quality of the studies and add it to the data extraction form before inclusion in the analysis, to reduce the risk of bias. In the NIH tool for observational studies (cohort and cross-sectional), as in this Ebola case, reviewers rate each of the 14 items as yes, no, or not applicable to evaluate the risk of bias. An overall score is calculated by adding up the item scores, with yes counting as one and no or NA as zero. A score is then given to every paper to classify it as a poorly, fairly, or well conducted study, where a score of 0–5 is considered poor, 6–9 fair, and 10–14 good.
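
As a minimal sketch of this scoring step, assuming the 14 NIH item ratings are stored one study per row in a CSV with hypothetical columns study and item1 to item14, the overall score and the poor/fair/good classification could be computed in R as follows:

# Hedged sketch: NIH-tool overall score per study (yes = 1, no/NA = 0) and the
# poor/fair/good classification described above. File and column names are hypothetical.
qa <- read.csv("nih_quality_ratings.csv", stringsAsFactors = FALSE)   # columns: study, item1 ... item14
items <- qa[, grepl("^item", names(qa))]
qa$score <- rowSums(items == "yes", na.rm = TRUE)
qa$quality <- cut(qa$score, breaks = c(-1, 5, 9, 14),
                  labels = c("poor", "fair", "good"))                 # 0-5 poor, 6-9 fair, 10-14 good
qa[, c("study", "score", "quality")]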

In the EBOLA case example above, authors can extract the following information: name of authors, country of patients, year of publication, study design (case report, cohort study, or clinical trial or RCT), sample size, the infected point of time after EBOLA infection, follow-up interval after vaccination time, efficacy, safety, adverse effects after vaccinations, and QA sheet (Additional file 6 : Data S1).

Data checking

Due to the expected human error and bias, we recommend a data checking step, in which every included article is compared with its counterpart in an extraction sheet by evidence photos, to detect mistakes in data. We advise assigning articles to 2–3 independent reviewers, ideally not the ones who performed the extraction of those articles. When resources are limited, each reviewer is assigned a different article than the one he extracted in the previous stage.

Statistical analysis

Investigators use different methods for combining and summarizing the findings of included studies. Before analysis, there is an important step called data cleaning, in which the analyst organizes the extraction sheet data into a form that can be read by the analytical software. The analysis is of two types, qualitative and quantitative. Qualitative analysis mostly describes the data in SR studies, while quantitative analysis consists of two main types: MA and network meta-analysis (NMA). Subgroup, sensitivity, and cumulative analyses and meta-regression are appropriate for testing whether the results are consistent, for investigating the effect of certain confounders on the outcome, and for finding the best predictors. Publication bias should be assessed to investigate the presence of missing studies, which can affect the summary.

To illustrate a basic meta-analysis, we provide imaginary data for the research question about Ebola vaccine safety (in terms of adverse events, 14 days after injection) and immunogenicity (rise in Ebola virus antibodies in geometric mean titer, 6 months after injection). Assume that, from searching and data extraction, we decided to carry out an analysis to evaluate the safety and immunogenicity of Ebola vaccine “A”. Other Ebola vaccines were not meta-analyzed because of the limited number of studies (instead, they will be included in the narrative review). The imaginary data for the vaccine safety meta-analysis can be accessed in Additional file 7: Data S2. To do the meta-analysis, we can use free software, such as RevMan [ 22 ] or the R package meta [ 23 ]. In this example, we will use the R package meta; its tutorial can be accessed through the “General Package for Meta-Analysis” tutorial PDF [ 23 ]. The R code and guidance for the meta-analysis that was done can be found in Additional file 5: File S3.

For the analysis, we assume that the studies are heterogeneous in nature; therefore, we choose a random-effects model. We carried out an analysis of the safety of Ebola vaccine A. From the data table, we can see several adverse events occurring after intramuscular injection of vaccine A in the study subjects. Suppose that we include six studies that fulfill our inclusion criteria. We can then do a meta-analysis for each of the adverse events extracted from the studies, for example arthralgia, using a random-effects meta-analysis in the R meta package.
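
A minimal sketch of such an analysis with the R meta package is shown below. The counts are invented stand-ins, not the imaginary data of Additional file 7: Data S2, so the numbers they produce will differ from those quoted next; the full code actually used for the example is in Additional file 5: File S3.

# Hedged sketch: random-effects meta-analysis of arthralgia after vaccine A versus placebo.
# The 2x2 counts below are invented for illustration only.
library(meta)
dat <- data.frame(
  study  = c("A", "B", "C", "D", "E", "F"),
  ev.vac = c(12, 20,  8, 15, 25, 10),  n.vac = c(100, 150,  80, 120, 200,  90),
  ev.pla = c(11, 19,  9, 14, 24, 10),  n.pla = c(100, 150,  80, 120, 200,  90)
)
m <- metabin(event.e = ev.vac, n.e = n.vac, event.c = ev.pla, n.c = n.pla,
             studlab = study, data = dat, sm = "OR", method = "MH")
summary(m)   # reports the random-effects pooled OR with its 95% CI and p value, plus Q and I^2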

From the results shown in Additional file 3: Figure S3, we can see that the odds ratio (OR) of arthralgia is 1.06 (0.79; 1.42), p value = 0.71, which means that there is no association between the intramuscular injection of Ebola vaccine A and arthralgia: the OR is close to one, and the p value is non-significant (> 0.05).

In the meta-analysis, we can also visualize the results in a forest plot. Figure 3 shows an example of a forest plot from the simulated analysis.

Figure 3. Random-effects model forest plot for comparison of vaccine A versus placebo.

In the forest plot, we can see the six studies (A to F) and their respective ORs (95% CI). The green box represents the effect size (in this case, the OR) of each study; a bigger box means the study carries more weight (i.e., a bigger sample size). The blue diamond represents the pooled OR of the six studies. The diamond crosses the vertical line OR = 1, which indicates that the association is not significant, as the diamond is almost balanced on both sides of the line. We can confirm this from the 95% confidence interval, which includes one, and the p value > 0.05.

For heterogeneity, we see that I² = 0%, which means that no heterogeneity is detected; the studies are relatively homogeneous (this is rare in real studies). To evaluate publication bias in the meta-analysis of the adverse event arthralgia, we can use the metabias function from the R meta package (Additional file 4: Figure S4) and visualize it with a funnel plot. The results of the publication bias assessment are demonstrated in Fig. 4. The p value associated with this test is 0.74, indicating symmetry of the funnel plot. We can confirm this by looking at the funnel plot.

Figure 4. Publication bias funnel plot for comparison of vaccine A versus placebo.

Looking at the funnel plot, the number of studies on the left and right sides of the plot is the same; therefore, the plot is symmetrical, indicating that no publication bias is detected.
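
Continuing the hedged sketch above, the forest plot, the funnel plot, and the asymmetry test can all be produced from the same fitted object m; note that metabias requires at least ten studies by default, so for a six-study example its k.min argument has to be lowered, which is an assumption of this sketch rather than a general recommendation.

# Hedged sketch, continuing from the metabin object m fitted above.
forest(m)    # forest plot as in Fig. 3
funnel(m)    # funnel plot as in Fig. 4
# Egger-type linear regression test of funnel-plot asymmetry; metabias needs >= 10 studies
# by default, so k.min is lowered here purely to make the six-study illustration run.
metabias(m, method.bias = "linreg", k.min = 6)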

Sensitivity analysis is a procedure used to discover how removing one study at a time from the MA influences the significance of the pooled result. If all included studies have p values < 0.05, removing any single study will not change the significant association. It is only performed when there is a significant association; since the p value of the MA done here is 0.71, well above 0.05, a sensitivity analysis is not needed for this case study example. If there are only two studies with p values > 0.05, removing either of the two studies may result in a loss of the significance.
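
When a sensitivity analysis is warranted, the meta package offers a leave-one-out routine; a one-line sketch on the same hypothetical object m is:

# Hedged sketch: leave-one-out sensitivity analysis on the same hypothetical object m.
metainf(m, pooled = "random")   # re-estimates the pooled OR omitting one study at a time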

Double data checking

For further assurance of the quality of the results, the analyzed data should be rechecked against the full-text data, using evidence photos, to allow a straightforward check by the PI of the study.

Manuscript writing, revision, and submission to a journal

Writing is based on the four scientific sections: introduction, methods, results, and discussion, usually with a conclusion. Preparing a table of study and patient characteristics is a mandatory step; a template can be found in Additional file 5: Table S3.

After finishing the manuscript, the characteristics table, and the PRISMA flow diagram, the team should send them to the PI for thorough revision and respond to the comments, and finally choose a suitable journal for the manuscript, one with a reasonable impact factor and a fitting field. The authors should pay attention to the journal’s author guidelines before submitting the manuscript.

The role of evidence-based medicine in biomedical research is rapidly growing, and SR/MAs are also increasing in the medical literature. This paper has sought to provide a comprehensive approach to enable reviewers to produce high-quality SR/MAs. We hope that readers can gain general knowledge about how to conduct an SR/MA and have the confidence to perform one, although this kind of study requires complex steps compared with narrative reviews.

Beyond the basic steps for conducting an MA, there are many advanced steps that are applied for specific purposes. One of these is meta-regression, which is performed to investigate the association between any confounder and the results of the MA. Furthermore, there are other types beyond the standard MA, such as NMA and mega-MA. In an NMA, we investigate the differences between several comparisons when there are not enough data to enable a standard meta-analysis; it uses both direct and indirect comparisons to conclude which of the competitors is best. Mega-MA, or MA of individual patients, on the other hand, summarizes the results of independent studies by using their individual subject data. As a more detailed analysis can be done, it is useful for conducting repeated-measures and time-to-event analyses. Moreover, it can perform analysis of variance and multiple regression analysis; however, it requires a homogeneous dataset and is time-consuming to conduct [ 24 ].

Conclusions

Systematic review/meta-analysis steps include development of research question and its validation, forming criteria, search strategy, searching databases, importing all results to a library and exporting to an excel sheet, protocol writing and registration, title and abstract screening, full-text screening, manual searching, extracting data and assessing its quality, data checking, conducting statistical analysis, double data checking, manuscript writing, revising, and submitting to a journal.

Availability of data and materials

Not applicable.

Abbreviations

NMA: Network meta-analysis
PI: Principal investigator
PICO: Population, Intervention, Comparison, Outcome
PRISMA: Preferred Reporting Items for Systematic Review and Meta-analysis statement
QA: Quality assessment
SPIDER: Sample, Phenomenon of Interest, Design, Evaluation, Research type
SR/MA: Systematic review and meta-analysis

References

1. Bello A, Wiebe N, Garg A, Tonelli M. Evidence-based decision-making 2: systematic reviews and meta-analysis. Methods Mol Biol (Clifton, NJ). 2015;1281:397–416.
2. Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96(3):118–21.
3. Rys P, Wladysiuk M, Skrzekowska-Baran I, Malecki MT. Review articles, systematic reviews and meta-analyses: which can be trusted? Polskie Archiwum Medycyny Wewnetrznej. 2009;119(3):148–56.
4. Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 [updated March 2011]. 2011.
5. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.
6. Methley AM, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi S. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv Res. 2014;14:579.
7. Gross L, Lhomme E, Pasin C, Richert L, Thiebaut R. Ebola vaccine development: systematic review of pre-clinical and clinical studies, and meta-analysis of determinants of antibody response variability after vaccination. Int J Infect Dis. 2018;74:83–96.
8. Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.
9. Giang HTN, Banno K, Minh LHN, Trinh LT, Loc LT, Eltobgy A, et al. Dengue hemophagocytic syndrome: a systematic review and meta-analysis on epidemiology, clinical signs, outcomes, and risk factors. Rev Med Virol. 2018;28(6):e2005.
10. Morra ME, Altibi AMA, Iqtadar S, Minh LHN, Elawady SS, Hallab A, et al. Definitions for warning signs and signs of severe dengue according to the WHO 2009 classification: systematic review of literature. Rev Med Virol. 2018;28(4):e1979.
11. Morra ME, Van Thanh L, Kamel MG, Ghazy AA, Altibi AMA, Dat LM, et al. Clinical outcomes of current medical approaches for Middle East respiratory syndrome: a systematic review and meta-analysis. Rev Med Virol. 2018;28(3):e1977.
12. Vassar M, Atakpo P, Kash MJ. Manual search approaches used by systematic reviewers in dermatology. Journal of the Medical Library Association: JMLA. 2016;104(4):302.
13. Naunheim MR, Remenschneider AK, Scangas GA, Bunting GW, Deschler DG. The effect of initial tracheoesophageal voice prosthesis size on postoperative complications and voice outcomes. Ann Otol Rhinol Laryngol. 2016;125(6):478–84.
14. Rohatgi A. WebPlotDigitizer. 2014.
15. Hozo SP, Djulbegovic B, Hozo I. Estimating the mean and variance from the median, range, and the size of a sample. BMC Med Res Methodol. 2005;5(1):13.
16. Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med Res Methodol. 2014;14(1):135.
17. Van Rijkom HM, Truin GJ, Van’t Hof MA. A meta-analysis of clinical studies on the caries-inhibiting effect of fluoride gel treatment. Caries Res. 1998;32(2):83–92.
18. Higgins JP, Altman DG, Gotzsche PC, Juni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.
19. Tawfik GM, Tieu TM, Ghozy S, Makram OM, Samuel P, Abdelaal A, et al. Speech efficacy, safety and factors affecting lifetime of voice prostheses in patients with laryngeal cancer: a systematic review and network meta-analysis of randomized controlled trials. J Clin Oncol. 2018;36(15_suppl):e18031.
20. Wannemuehler TJ, Lobo BC, Johnson JD, Deig CR, Ting JY, Gregory RL. Vibratory stimulus reduces in vitro biofilm formation on tracheoesophageal voice prostheses. Laryngoscope. 2016;126(12):2752–7.
21. Sterne JAC, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355.
22. The Nordic Cochrane Centre, The Cochrane Collaboration. Review Manager (RevMan). Version 5.0. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration; 2008.
23. Schwarzer G. meta: An R package for meta-analysis. R News. 2007;7(3):40–45.
24. Simms LLH. Meta-analysis versus mega-analysis: is there a difference? Oral budesonide for the maintenance of remission in Crohn’s disease. Faculty of Graduate Studies, University of Western Ontario; 1998.


Acknowledgements

This study was conducted (in part) at the Joint Usage/Research Center on Tropical Disease, Institute of Tropical Medicine, Nagasaki University, Japan.

Author information

Authors and Affiliations

Faculty of Medicine, Ain Shams University, Cairo, Egypt

Gehad Mohamed Tawfik

Online Research Club, http://www.onlineresearchclub.org/

Gehad Mohamed Tawfik, Kadek Agus Surya Dila, Muawia Yousif Fadlelmola Mohamed, Dao Ngoc Hien Tam, Nguyen Dang Kien & Ali Mahmoud Ahmed

Pratama Giri Emas Hospital, Singaraja-Amlapura street, Giri Emas village, Sawan subdistrict, Singaraja City, Buleleng, Bali, 81171, Indonesia

Kadek Agus Surya Dila

Faculty of Medicine, University of Khartoum, Khartoum, Sudan

Muawia Yousif Fadlelmola Mohamed

Nanogen Pharmaceutical Biotechnology Joint Stock Company, Ho Chi Minh City, Vietnam

Dao Ngoc Hien Tam

Department of Obstetrics and Gynecology, Thai Binh University of Medicine and Pharmacy, Thai Binh, Vietnam

Nguyen Dang Kien

Faculty of Medicine, Al-Azhar University, Cairo, Egypt

Ali Mahmoud Ahmed

Evidence Based Medicine Research Group & Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, 70000, Vietnam

Nguyen Tien Huy

Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, 70000, Vietnam

Department of Clinical Product Development, Institute of Tropical Medicine (NEKKEN), Leading Graduate School Program, and Graduate School of Biomedical Sciences, Nagasaki University, 1-12-4 Sakamoto, Nagasaki, 852-8523, Japan


Contributions

NTH and GMT were responsible for the idea and its design. The figure was done by GMT. All authors contributed to the manuscript writing and approval of the final version.

Corresponding author

Correspondence to Nguyen Tien Huy .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Figure S1. Risk of bias assessment graph of included randomized controlled trials. (TIF 20 kb)

Additional file 2:

Figure S2. Risk of bias assessment summary. (TIF 69 kb)

Additional file 3:

Figure S3. Arthralgia results of random effect meta-analysis using R meta package. (TIF 20 kb)

Additional file 4:

Figure S4. Arthralgia linear regression test of funnel plot asymmetry using R meta package. (TIF 13 kb)

Additional file 5:

Table S1. PRISMA 2009 Checklist. Table S2. Manipulation guides for online database searches. Table S3. Detailed search strategy for twelve database searches. Table S4. Baseline characteristics of the patients in the included studies. File S1. PROSPERO protocol template file. File S2. Extraction equations that can be used prior to analysis to derive missing variables. File S3. R code and guidance for the meta-analysis comparing EBOLA vaccine A with placebo. (DOCX 49 kb)

Additional file 6:

Data S1. Extraction and quality assessment data sheets for EBOLA case example. (XLSX 1368 kb)

Additional file 7:

Data S2. Imaginary data for EBOLA case example. (XLSX 10 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Tawfik, G.M., Dila, K.A.S., Mohamed, M.Y.F. et al. A step by step guide for conducting a systematic review and meta-analysis with simulation data. Trop Med Health 47 , 46 (2019). https://doi.org/10.1186/s41182-019-0165-6


Received : 30 January 2019

Accepted : 24 May 2019

Published : 01 August 2019

DOI : https://doi.org/10.1186/s41182-019-0165-6


Meta-Analysis – Guide with Definition, Steps & Examples

Published by Owen Ingram on April 26, 2023; revised on April 26, 2023

“A meta-analysis is a formal, epidemiological, quantitative study design that uses statistical methods to generalise the findings of the selected independent studies.”

Meta-analyses and systematic reviews are two of the most rigorous strategies for synthesising research evidence. When researchers look for the best available evidence concerning their research question, they are advised to begin at the top of the evidence pyramid. Evidence in the form of meta-analyses or systematic reviews addressing important questions is significant in academia because it informs decision-making.

What is Meta-Analysis?

Meta-analysis estimates the overall effect of an intervention or exposure by systematically synthesising, or merging, the results of individual independent studies. Meta-analysis is not only about reaching a larger population by combining several smaller studies. It also involves systematic methods for evaluating inconsistencies among participants, variability in results (also known as heterogeneity), and how sensitive the pooled findings are to the choices made in the systematic review protocol.

When Should you Conduct a Meta-Analysis?

Meta-analysis has become a widely used research method in the medical sciences and other fields for several reasons. The technique statistically summarises the results of the independent studies identified in a systematic review.

The Cochrane Handbook explains that “an important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention” (section 10.2).

A researcher or a practitioner should choose meta-analysis when the following outcomes are desirable. 

To generate new hypotheses or settle controversies arising from conflicting studies. Meta-analysis makes it possible to quantify and evaluate variable results and to identify the extent of conflict in the literature.

To find research gaps left unfilled and address questions not posed by individual studies. Primary studies typically involve specific types of participants and interventions. Reviewing studies with varied characteristics and methodologies allows the researcher to gauge the consistency of findings across a wider range of participants and interventions, and to explore reasons for differences in the effect.

To provide convincing evidence. Many individual studies are based on small samples, so their estimated intervention effects are unreliable in isolation; pooling them yields estimates based on a larger effective sample size and therefore more convincing evidence.

Elements of a Meta-Analysis

Deeks et al. (2019), Haidich (2010), and Grant & Booth (2009) explored the characteristics, strengths, and weaknesses of meta-analysis. These are briefly explained below.

Characteristics: 

  • A systematic review must be completed before the meta-analysis, because it identifies and summarises the individual studies to be synthesised. 
  • A meta-analysis can only be conducted by statistically synthesising the studies identified in a systematic review. 
  • The studies selected for statistical analysis for the purpose of meta-analysis should be similar in terms of comparison, intervention, and population. 

Strengths: 

  • A meta-analysis takes place after the systematic review. The end product is a comprehensive quantitative analysis that is complex but reliable. 
  • It gives weight and value to existing studies that carry little practical weight on their own. 
  • Policy-makers and academics cannot base decisions on individual research studies. Meta-analysis provides them with a consolidated, robust analysis of the evidence on which to make informed decisions. 

Criticisms: 

  • A meta-analysis relies on studies that explore sufficiently similar topics, and finding such studies can be challenging.
  • When the individual studies are biased, or when reporting or methodological biases are present, the results of the meta-analysis can be misleading.

Steps of Conducting the Meta-Analysis 

The process of conducting a meta-analysis has remained a topic of debate among researchers and scientists; however, the following five-step process is widely accepted. 

Step 1: Research Question

The first step involves identifying a research question and proposing a hypothesis. The potential clinical significance of the research question is then explained, and the study design and analytical plan are justified.

Step 2: Systematic Review 

The purpose of a systematic review (SR) is to address a research question by identifying all relevant studies that meet the required quality standards for inclusion. While established journals typically serve as the primary source for identified studies, it is important to also consider unpublished data to avoid publication bias or the exclusion of studies with negative results.

While some meta-analyses may limit their focus to randomized controlled trials (RCTs) for the sake of obtaining the highest quality evidence, other experimental and quasi-experimental studies may be included if they meet the specific inclusion/exclusion criteria established for the review.

Step 3: Data Extraction

After selecting studies for the meta-analysis, researchers extract summary data or outcomes, as well as sample sizes and measures of data variability for both intervention and control groups. The choice of outcome measures depends on the research question and the type of study, and may include numerical or categorical measures.

For instance, numerical means may be used to report differences in scores on a questionnaire or changes in a measurement, such as blood pressure. In contrast, risk measures like odds ratios (OR) or relative risks (RR) are typically used to report differences in the probability of belonging to one category or another, such as vaginal birth versus cesarean birth.
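
To make the dichotomous case concrete, the sketch below shows how a risk ratio and an odds ratio, each with a 95% confidence interval, can be derived from a single study's 2×2 table. The counts are invented for illustration, and the plain-Python calculation is only a minimal sketch of what meta-analysis software does for each included study.

```python
# Hypothetical 2x2 table (all counts invented for illustration):
#                 event   no event
# intervention      12        88     (n = 100)
# control           24        76     (n = 100)
import math

a, b = 12, 88   # intervention: events, non-events
c, d = 24, 76   # control: events, non-events

# Relative risk (risk ratio) and odds ratio
rr = (a / (a + b)) / (c / (c + d))
or_ = (a * d) / (b * c)

# 95% CIs are built on the log scale and then exponentiated
se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96  # critical value for a 95% confidence interval

rr_ci = (math.exp(math.log(rr) - z * se_log_rr), math.exp(math.log(rr) + z * se_log_rr))
or_ci = (math.exp(math.log(or_) - z * se_log_or), math.exp(math.log(or_) + z * se_log_or))

print(f"RR = {rr:.2f}, 95% CI {rr_ci[0]:.2f} to {rr_ci[1]:.2f}")
print(f"OR = {or_:.2f}, 95% CI {or_ci[0]:.2f} to {or_ci[1]:.2f}")
```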

Step 4: Standardisation and Weighting Studies

After gathering all the required data, the fourth step involves computing suitable summary measures from each study for further examination. These measures are typically referred to as Effect Sizes and indicate the difference in average scores between the control and intervention groups. For instance, it could be the variation in blood pressure changes between study participants who used drug X and those who used a placebo.

Since the units of measurement often differ across the included studies, standardization is necessary to create comparable effect size estimates. Standardization is accomplished by determining, for each study, the average score for the intervention group, subtracting the average score for the control group, and dividing the result by the relevant measure of variability in that dataset.

In some cases, the results of certain studies must carry more significance than others. Larger studies, as measured by their sample sizes, are deemed to produce more precise estimates of effect size than smaller studies. Additionally, studies with less variability in data, such as smaller standard deviation or narrower confidence intervals, are typically regarded as higher quality in study design. A weighting statistic that aims to incorporate both of these factors, known as inverse variance, is commonly employed.
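
As a rough illustration of the standardisation and weighting just described, the following sketch computes a standardised mean difference (Cohen's d, using the pooled standard deviation) for one hypothetical study and derives its inverse-variance weight. All numbers are invented, and the large-sample variance formula is one common approximation rather than the only option.

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardised mean difference: (treatment mean - control mean) / pooled SD."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / pooled_sd
    # Approximate large-sample variance of d
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return d, var_d

# Hypothetical study: change in systolic blood pressure for drug X vs placebo
d, var_d = cohens_d(mean_t=-8.0, sd_t=6.0, n_t=40, mean_c=-3.0, sd_c=7.0, n_c=42)
weight = 1 / var_d  # inverse-variance weight: larger, less variable studies count more
print(f"d = {d:.2f}, variance = {var_d:.3f}, weight = {weight:.1f}")
```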

Step 5: Absolute Effect Estimation

The ultimate step in conducting a meta-analysis is to choose and utilize an appropriate model for comparing Effect Sizes among diverse studies. Two popular models for this purpose are the Fixed Effects and Random Effects models. The Fixed Effects model relies on the premise that each study is evaluating a common treatment effect, implying that all studies would have estimated the same Effect Size if sample variability were equal across all studies.

Conversely, the Random Effects model posits that the true treatment effects in individual studies may vary from each other, and endeavors to consider this additional source of interstudy variation in Effect Sizes. The existence and magnitude of this latter variability is usually evaluated within the meta-analysis through a test for ‘heterogeneity.’
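
The sketch below contrasts the two models on a small set of invented effect sizes and variances. The fixed-effects estimate weights each study by the inverse of its variance, while the random-effects estimate adds a between-study variance (tau², estimated here with the DerSimonian-Laird method, one common choice) to each study's variance before weighting.

```python
import math

# Hypothetical per-study effect sizes (e.g., standardised mean differences) and variances
effects   = [0.10, 0.50, 0.80, 0.30]
variances = [0.02, 0.04, 0.09, 0.03]

# Fixed-effects model: weights are simply inverse variances
w_fixed = [1 / v for v in variances]
pooled_fixed = sum(w * y for w, y in zip(w_fixed, effects)) / sum(w_fixed)

# DerSimonian-Laird estimate of the between-study variance tau^2
q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(w_fixed, effects))
df = len(effects) - 1
c = sum(w_fixed) - sum(w ** 2 for w in w_fixed) / sum(w_fixed)
tau2 = max(0.0, (q - df) / c)

# Random-effects model: tau^2 is added to every within-study variance
w_random = [1 / (v + tau2) for v in variances]
pooled_random = sum(w * y for w, y in zip(w_random, effects)) / sum(w_random)
se_random = math.sqrt(1 / sum(w_random))

print(f"Fixed-effects estimate : {pooled_fixed:.3f}")
print(f"tau^2 (DerSimonian-Laird): {tau2:.3f}")
print(f"Random-effects estimate: {pooled_random:.3f} "
      f"(95% CI {pooled_random - 1.96 * se_random:.3f} to {pooled_random + 1.96 * se_random:.3f})")
```

With heterogeneous data such as these, the random-effects estimate typically comes with a wider confidence interval because it acknowledges the extra between-study variation.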

Forest Plot

The results of a meta-analysis are often presented visually using a “Forest Plot”. This type of plot displays, for each study included in the analysis, a horizontal line that indicates the standardised effect size estimate and the 95% confidence interval of the risk ratio used. Figure A provides an example of a hypothetical forest plot in which drug X reduces the risk of death in all three studies.

However, the first study was larger than the other two, and as a result the estimates for the smaller studies were not statistically significant. This is indicated by the lines emanating from their boxes crossing the value of 1 (no effect). The size of each box represents the relative weight assigned to that study by the meta-analysis. The diamond represents the combined estimate of the drug’s effect, which is more precise than any single study, with the diamond indicating both the combined risk ratio estimate and the limits of its 95% confidence interval.


Figure-A: Hypothetical Forest Plot

Relevance to Practice and Research 

  Evidence Based Nursing commentaries often include recently published systematic reviews and meta-analyses, as they can provide new insights and strengthen recommendations for effective healthcare practices. Additionally, they can identify gaps or limitations in current evidence and guide future research directions.

The quality of the data available for synthesis is a critical factor in the strength of conclusions drawn from meta-analyses, and this is influenced by the quality of individual studies and the systematic review itself. However, meta-analysis cannot overcome issues related to underpowered or poorly designed studies.

Therefore, clinicians may still encounter situations where the evidence is weak or uncertain, and where higher-quality research is required to improve clinical decision-making. While such findings can be frustrating, they remain important for informing practice and highlighting the need for further research to fill gaps in the evidence base.

Methods and Assumptions in Meta-Analysis 

Ensuring the credibility of findings is imperative in all types of research, including meta-analyses. To validate the outcomes of a meta-analysis, the researcher must confirm that the research techniques used were accurate in measuring the intended variables. Typically, researchers establish the validity of a meta-analysis by testing the outcomes for homogeneity or the degree of similarity between the results of the combined studies.

Homogeneity is preferred in meta-analyses because it allows the data to be combined without further adjustment. To assess it, researchers test for heterogeneity, the opposite of homogeneity. Two widely used statistics for evaluating heterogeneity in research results are Cochran’s Q and the I² index (I-squared).
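
Both statistics can be computed from the same inverse-variance weights used for pooling. The fragment below, using the same invented effect sizes as the pooling sketch above, computes Cochran's Q and the I² index, where I² expresses the percentage of total variation across studies attributable to heterogeneity rather than chance; the scipy call simply turns Q into the heterogeneity-test p-value.

```python
from scipy.stats import chi2

# Invented effect sizes and variances for four studies (same as the pooling sketch above)
effects   = [0.10, 0.50, 0.80, 0.30]
variances = [0.02, 0.04, 0.09, 0.03]

weights = [1 / v for v in variances]
pooled  = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations of the study effects from the pooled effect
q  = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
df = len(effects) - 1
p  = chi2.sf(q, df)            # p-value of the heterogeneity (Q) test

# I^2: proportion of total variation attributable to between-study heterogeneity
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Q = {q:.2f} on {df} df (p = {p:.3f}), I^2 = {i2:.1f}%")
```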

Difference Between Meta-Analysis and Systematic Reviews

Meta-analysis and systematic reviews are both research methods used to synthesise evidence from multiple studies on a particular topic. However, there are some key differences between the two.

Systematic reviews involve a comprehensive and structured approach to identifying, selecting, and critically appraising all available evidence relevant to a specific research question. This process involves searching multiple databases, screening the identified studies for relevance and quality, and summarizing the findings in a narrative report.

Meta-analysis, on the other hand, involves using statistical methods to combine and analyze the data from multiple studies, with the aim of producing a quantitative summary of the overall effect size. Meta-analysis requires the studies to be similar enough in terms of their design, methodology, and outcome measures to allow for meaningful comparison and analysis.

Therefore, systematic reviews are broader in scope and summarize the findings of all studies on a topic, while meta-analyses are more focused on producing a quantitative estimate of the effect size of an intervention across multiple studies that meet certain criteria. In some cases, a systematic review may be conducted without a meta-analysis if the studies are too diverse or the quality of the data is not sufficient to allow for statistical pooling.

Software Packages For Meta-Analysis

Meta-analysis can be done through software packages, including free and paid options. One of the most commonly used software packages for meta-analysis is RevMan by the Cochrane Collaboration.

Assessing the Quality of Meta-Analysis 

Assessing the quality of a meta-analysis involves evaluating the methods used to conduct the analysis and the quality of the studies included. Here are some key factors to consider:

  • Study selection: The studies included in the meta-analysis should be relevant to the research question and meet predetermined criteria for quality.
  • Search strategy: The search strategy should be comprehensive and transparent, including databases and search terms used to identify relevant studies.
  • Study quality assessment: The quality of included studies should be assessed using appropriate tools, and this assessment should be reported in the meta-analysis.
  • Data extraction: The data extraction process should be systematic and clearly reported, including any discrepancies that arose.
  • Analysis methods: The meta-analysis should use appropriate statistical methods to combine the results of the included studies, and these methods should be transparently reported.
  • Publication bias: The potential for publication bias should be assessed and reported in the meta-analysis, including any efforts to identify and include unpublished studies.
  • Interpretation of results: The results should be interpreted in the context of the study limitations and the overall quality of the evidence.
  • Sensitivity analysis: Sensitivity analysis should be conducted to evaluate the impact of study quality, inclusion criteria, and other factors on the overall results.

Overall, a high-quality meta-analysis should be transparent in its methods and clearly report the included studies’ limitations and the evidence’s overall quality.


Examples of Meta-Analysis

  • Stanley TD, Jarrell SB (1989). Meta-regression analysis: a quantitative method of literature surveys. Journal of Economic Surveys, 3(2), 161–170.
  • Datta DK, Pinches GE, Narayanan VK (1992). Factors influencing wealth creation from mergers and acquisitions: a meta-analysis. Strategic Management Journal, 13, 67–84.
  • Glass G (1983). Synthesising empirical research: meta-analysis. In SA Ward & LJ Reed (Eds.), Knowledge Structure and Use: Implications for Synthesis and Interpretation. Philadelphia: Temple University Press.
  • Wolf FM (1986). Meta-Analysis: Quantitative Methods for Research Synthesis. Sage University Paper No. 59.
  • Hunter JE, Schmidt FL, Jackson GB (1982). Meta-Analysis: Cumulating Research Findings Across Studies. Beverly Hills, CA: Sage.

Frequently Asked Questions

What is a meta-analysis in research?

Meta-analysis is a statistical method used to combine results from multiple studies on a specific topic. By pooling data from various sources, meta-analysis can provide a more precise estimate of the effect size of a treatment or intervention and identify areas for future research.

Why is meta-analysis important?

Meta-analysis is important because it combines and summarizes results from multiple studies to provide a more precise and reliable estimate of the effect of a treatment or intervention. This helps clinicians and policymakers make evidence-based decisions and identify areas for further research.

What is an example of a meta-analysis?

A meta-analysis of studies evaluating physical exercise’s effect on depression in adults is an example. Researchers gathered data from 49 studies involving a total of 2669 participants. The studies used different types of exercise and measures of depression, which made it difficult to compare the results.

Through meta-analysis, the researchers calculated an overall effect size and determined that exercise was associated with a statistically significant reduction in depression symptoms. The study also identified that moderate-intensity aerobic exercise, performed three to five times per week, was the most effective. The meta-analysis provided a more comprehensive understanding of the impact of exercise on depression than any single study could provide.

What is the definition of meta-analysis in clinical research?

Meta-analysis in clinical research is a statistical technique that combines data from multiple independent studies on a particular topic to generate a summary or “meta” estimate of the effect of a particular intervention or exposure.

This type of analysis allows researchers to synthesise the results of multiple studies, potentially increasing the statistical power and providing more precise estimates of treatment effects. Meta-analyses are commonly used in clinical research to evaluate the effectiveness and safety of medical interventions and to inform clinical practice guidelines.

Is meta-analysis qualitative or quantitative?

Meta-analysis is a quantitative method used to combine and analyze data from multiple studies. It involves the statistical synthesis of results from individual studies to obtain a pooled estimate of the effect size of a particular intervention or treatment. Therefore, meta-analysis is considered a quantitative approach to research synthesis.


Introduction to Meta-Analysis: A Guide for the Novice

  • Experimental Psychology
  • Methodology
  • Statistical Analysis

Free Meta-Analysis Software and Macros

MetaXL (Version 2.0)

RevMan (Version 5.3)

Meta-Analysis Macros for SAS, SPSS, and Stata

Opposing theories and disparate findings populate the field of psychology; scientists must interpret the results of any single study in the context of its limitations. Meta-analysis is a robust tool that can help researchers overcome these challenges by assimilating data across studies identified through a literature review. In other words, rather than surveying participants, a meta-analysis surveys studies. The goal is to calculate the direction and/or magnitude of an effect across all relevant studies, both published and unpublished. Despite the utility of this statistical technique, it can intimidate a beginner who has no formal training in the approach. However, any motivated researcher with a statistics background can complete a meta-analysis. This article provides an overview of the main steps of basic meta-analysis.

Meta-analysis has many strengths. First, meta-analysis provides an organized approach for handling a large number of studies. Second, the process is systematic and documented in great detail, which allows readers to evaluate the researchers’ decisions and conclusions. Third, meta-analysis allows researchers to examine an effect within a collection of studies in a more sophisticated manner than a qualitative summary.

However, meta-analysis also involves numerous challenges. First, it consumes a great deal of time and requires a great deal of effort. Second, meta-analysis has been criticized for aggregating studies that are too different (i.e., mixing “apples and oranges”). Third, some scientists argue that the objective coding procedure used in meta-analysis ignores the context of each individual study, such as its methodological rigor. Fourth, when a researcher includes low-quality studies in a meta-analysis, the limitations of these studies impact the mean effect size (i.e., “garbage in, garbage out”). As long as researchers are aware of these issues and consider the potential influence of these limitations on their findings, meta-analysis can serve as a powerful and informative approach to help us draw conclusions from a large literature base.

  Identifying the Right Question

Similar to any research study, a meta-analysis begins with a research question. Meta-analysis can be used in any situation where the goal is to summarize quantitative findings from empirical studies. It can be used to examine different types of effects, including prevalence rates (e.g., percentage of rape survivors with depression), growth rates (e.g., changes in depression from pretreatment to posttreatment), group differences (e.g., comparison of treatment and placebo groups on depression), and associations between variables (e.g., correlation between depression and self-esteem). To select the effect metric, researchers should consider the statistical form of the results in the literature. Any given meta-analysis can focus on only one metric at a time. While selecting a research question, researchers should think about the size of the literature base and select a manageable topic. At the same time, they should make sure the number of existing studies is large enough to warrant a meta-analysis.

Determining Eligibility Criteria

After choosing a relevant question, researchers should then identify and explicitly state the types of studies to be included. These criteria ensure that the studies overlap enough in topic and methodology that it makes sense to combine them. The inclusion and exclusion criteria depend on the specific research question and characteristics of the literature. First, researchers can specify relevant participant characteristics, such as age or gender. Second, researchers can identify the key variables that must be included in the study. Third, the language, date range, and types (e.g., peer-reviewed journal articles) of studies should be specified. Fourth, pertinent study characteristics, such as experimental design, can be defined. Eligibility criteria should be clearly documented and relevant to the research question. Specifying the eligibility criteria prior to conducting the literature search allows the researcher to perform a more targeted search and reduces the number of irrelevant studies. Eligibility criteria can also be revised later, because the researcher may become aware of unforeseen issues during the literature search stage.

Conducting a Literature Search and Review

The next step is to identify, retrieve, and review published and unpublished studies. The goal is to be exhaustive; however, being too broad can result in an overwhelming number of studies to review.

Online databases, such as PsycINFO and PubMed, compile millions of searchable records, including peer-reviewed journals, books, and dissertations.  In addition, through these electronic databases, researchers can access the full text of many of the records. It is important that researchers carefully choose search terms and databases, because these decisions impact the breadth of the review. Researchers who aren’t familiar with the research topic should consult with an expert.

Additional ways to identify studies include searching conference proceedings, examining reference lists of relevant studies, and directly contacting researchers. After the literature search is completed, researchers must evaluate each study for inclusion using the eligibility criteria. At least a subset of the studies should be reviewed by two individuals (i.e., double coded) to serve as a reliability check. It is vital that researchers keep meticulous records of this process; for publication, a flow diagram is typically required to depict the search and results. Researchers should allow adequate time, because this step can be quite time consuming.

Calculating Effect Size

Next, researchers calculate an effect size for each eligible study. The effect size is the key component of a meta-analysis because it encodes the results in a numeric value that can then be aggregated. Examples of commonly used effect size metrics include Cohen’s d (i.e., group differences) and Pearson’s r (i.e., association between two variables). The effect size metric is based on the statistical form of the results in the literature and the research question. Because studies that include more participants provide more accurate estimates of an effect than those that include fewer participants, it is important to also calculate the precision of the effect size (e.g., standard error).
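
For a correlational effect size such as Pearson's r, one standard way to obtain both the effect and its precision is the Fisher z transformation, whose standard error depends only on the sample size. The values below are made up, and the helper functions are illustrative rather than part of any particular meta-analysis package.

```python
import math

def fisher_z(r, n):
    """Convert a Pearson correlation to Fisher's z with its standard error."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)
    return z, se

def inverse_fisher_z(z):
    """Back-transform a Fisher z value to the correlation scale."""
    return math.tanh(z)

# Hypothetical study: correlation between depression and self-esteem, n = 120
z, se = fisher_z(r=-0.35, n=120)
lo, hi = z - 1.96 * se, z + 1.96 * se
print(f"r = {inverse_fisher_z(z):.2f}, "
      f"95% CI {inverse_fisher_z(lo):.2f} to {inverse_fisher_z(hi):.2f}")
```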

Meta-analysis software guides researchers through the calculation process by requesting the necessary information for the specified effect size metric. I have identified some potentially useful resources and programs below. Although meta-analysis software makes effect size calculations simple, it is good practice for researchers to understand what computations are being used.

The effect size and precision of each individual study are aggregated into a summary statistic, which can be done with meta-analysis software. Researchers should confirm that the effect sizes are independent of each other (i.e., no overlap in participants). Additionally, researchers must select either a fixed effects model (i.e., assumes all studies share one true effect size) or a random effects model (i.e., assumes the true effect size varies among studies). The random effects model is typically preferred when the studies have been conducted using different methodologies. Depending on the software, additional specifications or adjustments may be possible.

During analysis, the effect sizes of the included studies are weighted by their precision (e.g., inverse of the sampling error variance) and the mean is calculated. The mean effect size represents the direction and/or magnitude of the effect summarized across all eligible studies. This statistic is typically accompanied by an estimate of its precision (e.g., confidence interval) and a p-value representing statistical significance. Forest plots are a common way of displaying meta-analysis results.

Depending on the situation, follow-up analyses may be advised. Researchers can quantify heterogeneity (e.g., Q, τ², I²), which is a measure of the variation among the effect sizes of included studies. Moderator variables, such as the quality of the studies or age of participants, may be included to examine sources of heterogeneity. Because published studies may be biased towards significant effects, it is important to evaluate the impact of publication bias (e.g., funnel plot, Rosenthal’s Fail-safe N). Sensitivity analysis can indicate how the results of the meta-analysis would change if one study were excluded from the analysis.
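
Of the publication-bias checks mentioned above, the funnel plot is the simplest to reproduce: each study's effect size is plotted against its standard error, with the y-axis inverted so the most precise studies sit at the top, and marked asymmetry suggests possible bias. The matplotlib sketch below uses invented data purely to show the mechanics.

```python
import matplotlib.pyplot as plt

# Invented effect sizes and standard errors for ten studies
effects = [0.42, 0.35, 0.51, 0.60, 0.28, 0.45, 0.70, 0.38, 0.55, 0.33]
ses     = [0.05, 0.08, 0.10, 0.15, 0.07, 0.12, 0.20, 0.06, 0.18, 0.09]

# Fixed-effect (inverse-variance) pooled estimate, used as the reference line
pooled = sum(e / s**2 for e, s in zip(effects, ses)) / sum(1 / s**2 for s in ses)

plt.scatter(effects, ses)
plt.axvline(pooled, linestyle="--", label="pooled estimate")
plt.gca().invert_yaxis()          # most precise (smallest SE) studies at the top
plt.xlabel("Effect size")
plt.ylabel("Standard error")
plt.title("Funnel plot (hypothetical data)")
plt.legend()
plt.show()
```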

If properly conducted and clearly documented, meta-analyses often make significant contributions to a specific field of study and therefore stand a good chance of being published in a top-tier journal. The biggest obstacle for most researchers who attempt meta-analysis for the first time is the amount of work and organization required for proper execution, rather than their level of statistical knowledge.

Recommended Resources

Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2009). Introduction to meta-analysis . Hoboken, NJ: Wiley.

Cooper, H., Hedges, L., & Valentine, J. (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York, NY: Russell Sage Foundation.

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis . Thousand Oaks, California: Sage Publications.

Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis: Prevention, assessment, and adjustments . Hoboken, NJ: Wiley.


It is nice to see the software we developed (MetaXL) being mentioned. However, the reason we developed the software and made it publicly available for free is that we disagree with an important statement in the review: that “researchers must select either a fixed effects model (i.e., assumes all studies share one true effect size) or a random effects model (i.e., assumes the true effect size varies among studies)”. We developed MetaXL because we think the random effects model is seriously flawed and should be abandoned. We implemented two additional models in MetaXL, the inverse variance heterogeneity model and the quality effects model, both meant to be used in the presence of heterogeneity. More details are in the User Guide, available from the Epigear website.


Thank you very much! The article really helped me to start understanding what meta-analysis is about


Thank you for sharing this article; it is very helpful. But I am still confused: how can we quickly remove duplicate papers without wasting time when we have more than 10,000 records?


Not being one to blow my own horn all the time, but I would like to suggest that you may want to take a look at a web-based application I wrote that conducts a Hunter-Schmidt type meta-analysis. The application is very easy to use and corrects for sampling error and for error variance due to unreliability. It also exports the results in Excel format, along with the dataset effect sizes (r, d, and z), sample sizes, and reliability information.

http://www.lyonsmorris.com/lyons/MaCalc/index.cfm


About the Author

Laura C. Wilson is an Assistant Professor in the Psychology Department at the University of Mary Washington. She earned a PhD in Clinical Psychology from Virginia Tech and MA in General/Experimental Psychology from The College of William & Mary. Her main area of expertise is post-trauma functioning, particularly in survivors of sexual violence or mass trauma (e.g., terrorism, mass shootings, combat). She also has interest in predictors of violence and aggression, including psychophysiological and personality factors.

  • Published: 05 January 2022

The 5 min meta-analysis: understanding how to read and interpret a forest plot

  • Yaping Chang   ORCID: orcid.org/0000-0002-0549-5087 1 , 2 ,
  • Mark R. Phillips   ORCID: orcid.org/0000-0003-0923-261X 1 , 3 ,
  • Robyn H. Guymer   ORCID: orcid.org/0000-0002-9441-4356 4 , 5 ,
  • Lehana Thabane   ORCID: orcid.org/0000-0003-0355-9734 1 , 6 ,
  • Mohit Bhandari   ORCID: orcid.org/0000-0001-9608-4808 1 , 2 , 3 &
  • Varun Chaudhary   ORCID: orcid.org/0000-0002-9988-4146 1 , 3

on behalf of the R.E.T.I.N.A. study group

Eye, volume 36, pages 673–675 (2022)


A Correction to this article was published on 08 May 2023

This article has been updated

Introduction

In the evidence-based practice of ophthalmology, we often read systematic reviews. Why do we bother with systematic reviews? In science, new findings are built cumulatively on multiple and repeatable experiments [ 1 ]. In clinical research, rarely is one study definitive. Using a comprehensive and cumulative approach, systematic reviews synthesize the results of individual studies to address a focused question and, when well conducted and current, can guide important decisions [ 2 , 3 , 4 , 5 ].

A systematic review may or may not include a meta-analysis, which provides a statistical approach to quantitatively combine the results of studies eligible for a systematic review topic [ 2 , 3 , 4 , 5 ]. Such pooling also improves precision [ 2 , 4 , 5 ]. A “forest plot” is a form of graphical result presentation [ 2 , 4 ]. In this editorial, we start by introducing the anatomy of a forest plot and then present five tips for understanding the results of a meta-analysis.

Anatomy of a forest plot

We demonstrate the components of a typical forest plot in Fig.  1 , using a topic from a recently published systematic review [ 6 ] but replaced with mockup numbers in analysis. In this example, four randomized trials (Studies #1 to #4) are included to compare a new surgical approach with the conventional surgery for patients with pseudoexfoliation glaucoma. Outcomes of intraocular pressure (IOP) and incidence of minor zonulolysis are evaluated at 1-year follow-up after surgery.

Fig. 1: A Example of a continuous outcome measure: intraocular pressure, assessed with the mean difference; B Example of a dichotomous outcome measure: incidence of minor zonulolysis, at 1 year after surgery. Tau is the estimated standard deviation of underlying effects across studies (Tau² is displayed only in the random-effects model); Chi² is the value of the Chi-square test for heterogeneity; Random denotes the random-effects analysis model used in the meta-analysis.

In a forest plot, the box in the middle of each horizontal line (confidence interval, CI) represents the point estimate of the effect for a single study. The size of the box is proportional to the weight of the study in relation to the pooled estimate. The diamond represents the overall effect estimate of the meta-analysis. The placement of the center of the diamond on the x-axis represents the point estimate, and the width of the diamond represents the 95% CI around the point estimate of the pooled effect.
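
That anatomy, boxes sized by study weight, horizontal lines for the confidence intervals, and a diamond for the pooled estimate, can be reproduced with a few matplotlib calls. In the sketch below, the per-study point estimates, the 42.2% weight for Study #4, and the pooled mean difference of 0.92 mmHg (95% CI 0.21 to 1.63) echo the mockup example discussed in this editorial, while the individual confidence limits and the remaining weights are invented; it is a schematic, not a reproduction of Fig. 1.

```python
import matplotlib.pyplot as plt

# Per-study mean differences from the mockup example; CI limits and most weights are invented
studies = ["Study #1", "Study #2", "Study #3", "Study #4"]
md      = [2.60, 0.20, 0.60, 0.90]
lower   = [1.10, -0.90, -0.50, 0.10]
upper   = [4.10, 1.30, 1.70, 1.70]
weight  = [15.0, 20.0, 22.8, 42.2]
pooled, pooled_lo, pooled_hi = 0.92, 0.21, 1.63

fig, ax = plt.subplots()
ys = range(len(studies), 0, -1)                               # plot studies from top to bottom
for y, est, lo, hi, w in zip(ys, md, lower, upper, weight):
    ax.plot([lo, hi], [y, y], color="black")                  # confidence interval line
    ax.scatter(est, y, s=w * 10, marker="s", color="black")   # box sized by study weight
ax.scatter(pooled, 0, marker="D", s=120, color="black")       # diamond = pooled estimate
ax.plot([pooled_lo, pooled_hi], [0, 0], color="black")
ax.axvline(0, linestyle="--")                                 # line of no effect for a mean difference
ax.set_yticks(list(ys) + [0])
ax.set_yticklabels(studies + ["Pooled estimate"])
ax.set_xlabel("Mean difference in IOP (mmHg)")
plt.show()
```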

Tip 1: Know the type of outcome

There are differences in a forest plot depending on the type of outcome. For a continuous outcome, the mean, standard deviation and number of patients are provided in Columns 2 and 3. A mean difference (MD, the absolute difference between the mean scores in the two groups) with its 95% CI is presented in Column 5 (Fig. 1A). Some examples of continuous outcomes include IOP (mmHg), visual acuity in rank values, subfoveal choroidal thickness (μm) and cost.

For a dichotomous outcome, the number of events and number of patients, together with a risk ratio (RR), also called relative risk, and its 95% CI, are presented in Columns 2, 3 and 5 (Fig. 1B). Examples of dichotomous outcomes include the incidence of any adverse event, zonulolysis, capsulotomy and patients’ need for medication (yes or no).
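
As a small illustration, the mean difference and 95% CI shown in Column 5 for a continuous outcome can be recomputed from the group means, standard deviations and sample sizes in Columns 2 and 3. The group summaries below are made up; a risk ratio for a dichotomous outcome is handled analogously, but on the log scale.

```python
import math

# Hypothetical group summaries for one study: IOP (mmHg) at 1 year after surgery
mean_new, sd_new, n_new = 14.2, 2.8, 55          # new surgical approach
mean_conv, sd_conv, n_conv = 15.1, 3.1, 57       # conventional surgery

md = mean_new - mean_conv
se_md = math.sqrt(sd_new**2 / n_new + sd_conv**2 / n_conv)   # SE of the difference in means
ci = (md - 1.96 * se_md, md + 1.96 * se_md)

print(f"MD = {md:.2f} mmHg, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```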

Tip 2: Understand the weight in a forest plot

Weights (Column 4) are assigned to individual studies according to their contributions to the pooled estimate, by calculating the inverse of the variance of the treatment effect, i.e., one over the square of the standard error. The weight is closely related to a study’s sample size [ 2 ]. In our example, Study #4, which has the largest sample size of 114 patients (57 in each group), carries the greatest weight: 42.2% for the IOP result (Fig. 1A) and 49.9% for the zonulolysis result (Fig. 1B).

Tip 3: Pay attention to heterogeneity

Heterogeneity represents variation in results that might relate to the population, intervention, comparator, outcome measure, risk of bias, study method, healthcare system and other factors of the individual studies in a meta-analysis [ 2 , 7 ]. If no important heterogeneity is observed, we can trust the pooled estimate more, because most or all of the individual studies are giving the same answer [ 7 ].

We can identify heterogeneity by visual inspection of the similarity of point estimates, the overlap of confidence intervals, and the results of the statistical heterogeneity tests reported near the bottom of a forest plot [ 2 , 7 ]. More similar point estimates and more overlap of confidence intervals mean less heterogeneity [ 2 , 7 ]. The P value generated by the Chi-squared test is the probability under the null hypothesis that there is no heterogeneity between studies. When P < 0.10, we reject this null hypothesis and consider that there is heterogeneity across the studies [ 2 ]. A P value of 0.10, rather than 0.05, is typically used for the heterogeneity test because of the test’s lack of power [ 2 ]. The I² statistic, ranging from 0 to 100%, indicates the magnitude of heterogeneity: the greater the I², the more the heterogeneity. An I² below 40% may suggest unimportant heterogeneity, while an I² over 75% may suggest considerable heterogeneity [ 2 ].

For example, in Fig. 1A the point estimate of Study #1 (i.e., the between-group difference in mean IOP, 2.60 mmHg) differs from the point estimates of Studies #2 to #4 (0.20, 0.60 and 0.90 mmHg, respectively). By visual inspection of the 95% CIs (the horizontal lines), the 95% CI of Study #1 only partly overlaps with those of the other studies. The P value for heterogeneity of 0.12 is relatively small but still >0.10. The I² of 49% indicates that moderate heterogeneity may be present [ 2 ]. In Fig. 1B, the 95% CIs of all four studies largely overlap. The large P value for heterogeneity of 0.74 and the I² of 0% both indicate that no important heterogeneity is detected.

Tip 4: Understand subgroups

When heterogeneity is detected, which may indicate unexplained differences between study estimates, a subgroup analysis is one approach to explaining it [ 2 ]. In our example, Study #3 only studied patients who were equal and below 65 years; Studies #1, 2, and 4 also reported IOP separately for the two age groups (Fig. 2). We can find the pooled effects of the two subgroups in the forest plot: 1.1.1 over 65 years, the overall effect favours the new surgery (Section A in Fig. 2; the subtotal MD and 95% CI do not include the line of no effect; P value for overall effect <0.00001; I² = 0%); and 1.1.2 equal and below 65 years, there is no difference between the conventional and new surgeries (Section B in Fig. 2; the subtotal MD and 95% CI include the line of no effect; P value for overall effect 0.10; I² = 0%).

Fig. 2: Subgroup results of IOP by age group.

There is a subgroup effect by patients’ age group. We can find the result of the test for subgroup differences in the last row of the forest plot (Section C in Fig. 2): a P value of 0.001 and an I² of 90.1% indicate a significant difference in treatment effects between the subgroups of older and younger patients.

Tip 5: Interpret the results in plain language

In our example, lower IOP and fewer cases of zonulolysis are the favoured outcomes. The statistical significance of a pooled estimate can be judged by visual inspection of the diamond (if the diamond crosses the line of no effect, there is no statistical difference between the two groups) or by checking the p-value in the last row of the forest plot, “Test for overall effect” (P < 0.05 indicates a significant difference).

In plain language, for patients with pseudoexfoliation glaucoma, the overall effect for IOP is in favour of the new surgery. More specifically, the new surgery is associated with lower IOP than the conventional surgery at 1 year after surgery (mean difference, 0.92 mmHg; 95% CI, 0.21 to 1.63 mmHg), with some concerns about heterogeneity and risk of bias. There is no difference in the incidence of minor zonulolysis between the new and conventional surgeries.

In summary, knowing the structure of a forest plot, the types of outcome measures, heterogeneity and risk of bias assessments will help us understand the results of a systematic review. With more practice, readers will gain confidence in interpreting forest plots and applying the results of systematic reviews in their clinical practice.

Change history

08 May 2023

A Correction to this paper has been published: https://doi.org/10.1038/s41433-023-02493-0

Zeigler D. Evolution and the cumulative nature of science. Evolution: Education Outreach. 2012;5:585–8. https://doi.org/10.1007/s12052-012-0454-6 .


Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions. John Wiley & Sons; 2019.

Haynes RB. Clinical epidemiology: how to do clinical practice research. Lippincott Williams & Wilkins; 2012.

Murad MH, Montori VM, Ioannidis JP, Neumann I, Hatala R, Meade MO, et al. Understanding and applying the results of a systematic review and meta-analysis. User’s guides to the medical literature: a manual for evidence-based clinical practice. 3rd edn. New York: JAMA/McGraw-Hill Global. 2015.

Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, et al. GRADE guidelines 6. Rating the quality of evidence—imprecision. J Clin Epidemiol. 2011;64:1283–93. https://doi.org/10.1016/j.jclinepi.2011.01.012 .


Pose-Bazarra S, López-Valladares MJ, López-de-Ullibarri I, Azuara-Blanco A. Surgical and laser interventions for pseudoexfoliation glaucoma: systematic review of randomized controlled trials. Eye. 2021;35:1551–61. https://doi.org/10.1038/s41433-021-01424-1


Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence—inconsistency. J Clin Epidemiol. 2011;64:1294–302. https://doi.org/10.1016/j.jclinepi.2011.03.017

Download references

Author information

Authors and affiliations

Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, ON, Canada

Yaping Chang, Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

OrthoEvidence Inc., Burlington, ON, Canada

Yaping Chang & Mohit Bhandari

Department of Surgery, McMaster University, Hamilton, ON, Canada

Mark R. Phillips, Mohit Bhandari & Varun Chaudhary

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, Australia

Robyn H. Guymer

Department of Surgery, (Ophthalmology), The University of Melbourne, Melbourne, Australia

Biostatistics Unit, St. Joseph’s Healthcare Hamilton, Hamilton, ON, Canada

Lehana Thabane

Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA

Charles C. Wykoff

Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA

NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK

Sobha Sivaprasad

Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Peter Kaiser

Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA

David Sarraf

Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA

Sophie Bakri

The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA

Sunir J. Garg

Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Rishi P. Singh

Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA

Department of Ophthalmology, University of Bonn, Bonn, Germany

Frank G. Holz

Singapore Eye Research Institute, Singapore, Singapore

Tien Y. Wong

Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore


  • Varun Chaudhary
  • , Mohit Bhandari
  • , Charles C. Wykoff
  • , Sobha Sivaprasad
  • , Lehana Thabane
  • , Peter Kaiser
  • , David Sarraf
  • , Sophie Bakri
  • , Sunir J. Garg
  • , Rishi P. Singh
  • , Frank G. Holz
  • , Tien Y. Wong
  •  & Robyn H. Guymer

Contributions

YC was responsible for the conception of idea, writing of manuscript and review of manuscript. MRP was responsible for the conception of idea, and review of the manuscript. VC was responsible for conception of idea, and review of manuscript. MB was responsible for conception of idea, and review of manuscript. RHG was responsible for critical review and feedback on manuscript. LT was responsible for critical review and feedback on manuscript.

Corresponding author

Correspondence to Varun Chaudhary .

Ethics declarations

Competing interests

YC: Nothing to disclose. MRP: Nothing to disclose. RHG: Advisory boards: Bayer, Novartis, Apellis, Roche, Genentech Inc. LT: Nothing to disclose. MB: Research funds: Pendopharm, Bioventus, Acumed – unrelated to this study. VC: Advisory Board Member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis – unrelated to this study.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: in part 'TIP 4: UNDERSTAND SUBGROUPS', the phrase "In our example, Study #3 only studied patients over 65 years" was corrected to read "In our example, Study #3 only studied patients who were equal and below 65 years".

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chang, Y., Phillips, M.R., Guymer, R.H. et al. The 5 min meta-analysis: understanding how to read and interpret a forest plot. Eye 36 , 673–675 (2022). https://doi.org/10.1038/s41433-021-01867-6


Received : 11 November 2021

Revised : 12 November 2021

Accepted : 16 November 2021

Published : 05 January 2022

Issue Date : April 2022

DOI : https://doi.org/10.1038/s41433-021-01867-6

How to conduct a meta-analysis in eight steps: a practical guide

  • November 2021
  • Management Review Quarterly 72(1)
  • Christopher Hansen (University of Luxembourg), Holger Steinmetz (Universität Trier) & Joern Hendrich Block (Universität Trier)


A Guide to Conducting a Meta-Analysis

  • Published: 21 May 2016
  • Volume 26, pages 121–128 (2016)

Mike W.-L. Cheung & Ranjith Vijayakumar


Meta-analysis is widely accepted as the preferred method to synthesize research findings in various disciplines. This paper provides an introduction to when and how to conduct a meta-analysis. Several practical questions, such as advantages of meta-analysis over conventional narrative review and the number of studies required for a meta-analysis, are addressed. Common meta-analytic models are then introduced. An artificial dataset is used to illustrate how a meta-analysis is conducted in several software packages. The paper concludes with some common pitfalls of meta-analysis and their solutions. The primary goal of this paper is to provide a summary background to readers who would like to conduct their first meta-analytic study.



Acknowledgments

Mike W.-L. Cheung was supported by the Academic Research Fund Tier 1 (FY2013-FRC5-002) from the Ministry of Education, Singapore. We would like to thank Maggie Chan for providing comments on an earlier version of this manuscript.

Author information

Authors and affiliations.

Department of Psychology, Faculty of Arts and Social Sciences, National University of Singapore, Block AS4, Level 2, 9 Arts Link, Singapore, 117570, Singapore

Mike W.-L. Cheung & Ranjith Vijayakumar


Corresponding author

Correspondence to Mike W.-L. Cheung.


About this article

Cheung, M.W.-L., Vijayakumar, R. A Guide to Conducting a Meta-Analysis. Neuropsychol Rev 26, 121–128 (2016). https://doi.org/10.1007/s11065-016-9319-z


Received: 29 February 2016

Accepted: 02 May 2016

Published: 21 May 2016

Issue Date: June 2016

DOI: https://doi.org/10.1007/s11065-016-9319-z


Keywords: Literature review; Systematic review; Meta-analysis; Moderator analysis

Study Design 101: Meta-Analysis

A subset of systematic reviews; a method for systematically combining pertinent qualitative and quantitative study data from several selected studies to develop a single conclusion that has greater statistical power. This conclusion is statistically stronger than the analysis of any single study, due to increased numbers of subjects, greater diversity among subjects, or accumulated effects and results.

Meta-analysis would be used for the following purposes:

  • To establish statistical significance with studies that have conflicting results
  • To develop a more correct estimate of effect magnitude
  • To provide a more complex analysis of harms, safety data, and benefits
  • To examine subgroups with individual numbers that are not statistically significant

If the individual studies are randomized controlled trials (RCTs), a meta-analysis combining their results represents the highest level of evidence in the evidence hierarchy, followed by systematic reviews, which analyze all available studies on a topic.

Advantages

  • Greater statistical power
  • Confirmatory data analysis
  • Greater ability to extrapolate to general population affected
  • Considered an evidence-based resource

Disadvantages

  • Difficult and time consuming to identify appropriate studies
  • Not all studies provide adequate data for inclusion and analysis
  • Requires advanced statistical techniques
  • Heterogeneity of study populations

Design pitfalls to look out for

The studies pooled for review should be similar in type (i.e. all randomized controlled trials).

Are the studies being reviewed all the same type of study or are they a mixture of different types?

The analysis should include published and unpublished results to avoid publication bias.

Does the meta-analysis include any appropriate relevant studies that may have had negative outcomes?

Fictitious Example

Do individuals who wear sunscreen have fewer cases of melanoma than those who do not wear sunscreen? A MEDLINE search was conducted using the terms melanoma, sunscreening agents, and zinc oxide, resulting in 8 randomized controlled studies, each with between 100 and 120 subjects. All of the studies showed a positive association between wearing sunscreen and a reduced likelihood of melanoma. The subjects from all eight studies (860 subjects in total) were pooled and statistically analyzed to determine the relationship between wearing sunscreen and melanoma. This meta-analysis showed a 50% reduction in melanoma diagnoses among sunscreen-wearers.
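To make the arithmetic of a pooled analysis like this concrete, here is a minimal Python sketch that combines eight invented trials (the counts below are made up and do not correspond to the numbers quoted above) using a simple fixed-effect, inverse-variance meta-analysis of log risk ratios.

```python
import math

# Hypothetical per-trial counts: (cases, total) in the sunscreen group, then the no-sunscreen group.
# All numbers are invented for illustration only.
trials = [
    (3, 55, 7, 55), (2, 50, 5, 52), (4, 60, 8, 58), (3, 52, 6, 50),
    (2, 54, 4, 56), (3, 58, 7, 60), (2, 51, 5, 49), (3, 56, 6, 54),
]

num = den = 0.0
for a, n1, c, n0 in trials:
    rr = (a / n1) / (c / n0)                      # risk ratio in this trial
    se = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)       # standard error of log(RR)
    w = 1 / se**2                                 # inverse-variance weight
    num += w * math.log(rr)
    den += w

pooled_log_rr = num / den
half_width = 1.96 / math.sqrt(den)
print(f"Pooled RR = {math.exp(pooled_log_rr):.2f} "
      f"(95% CI {math.exp(pooled_log_rr - half_width):.2f} "
      f"to {math.exp(pooled_log_rr + half_width):.2f})")
```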

Real-life Examples

Goyal, A., Elminawy, M., Kerezoudis, P., Lu, V., Yolcu, Y., Alvi, M., & Bydon, M. (2019). Impact of obesity on outcomes following lumbar spine surgery: A systematic review and meta-analysis. Clinical Neurology and Neurosurgery, 177 , 27-36. https://doi.org/10.1016/j.clineuro.2018.12.012

This meta-analysis was interested in determining whether obesity affects the outcome of spinal surgery. Some previous studies have shown higher perioperative morbidity in patients with obesity while other studies have not shown this effect. This study looked at surgical outcomes including "blood loss, operative time, length of stay, complication and reoperation rates and functional outcomes" between patients with and without obesity. A meta-analysis of 32 studies (23,415 patients) was conducted. There were no significant differences for patients undergoing minimally invasive surgery, but patients with obesity who had open surgery experienced higher blood loss and longer operative times (not clinically meaningful) as well as higher complication and reoperation rates. Further research is needed to explore this issue in patients with morbid obesity.

Nakamura, A., van Der Waerden, J., Melchior, M., Bolze, C., El-Khoury, F., & Pryor, L. (2019). Physical activity during pregnancy and postpartum depression: Systematic review and meta-analysis. Journal of Affective Disorders, 246 , 29-41. https://doi.org/10.1016/j.jad.2018.12.009

This meta-analysis explored whether physical activity during pregnancy prevents postpartum depression. Seventeen studies were included (93,676 women) and analysis showed a "significant reduction in postpartum depression scores in women who were physically active during their pregnancies when compared with inactive women." Possible limitations or moderators of this effect include intensity and frequency of physical activity, type of physical activity, and timepoint in pregnancy (e.g. trimester).

Related Terms

Systematic Review

A document often written by a panel that provides a comprehensive review of all relevant studies on a particular clinical or health-related topic/question.

Publication Bias

A phenomenon in which studies with positive results have a better chance of being published, are published earlier, and are published in journals with higher impact factors. Therefore, conclusions based exclusively on published studies can be misleading.

Now test yourself!

1. A Meta-Analysis pools together the sample populations from different studies, such as Randomized Controlled Trials, into one statistical analysis and treats them as one large sample population with one conclusion.

a) True b) False

2. One potential design pitfall of Meta-Analyses that is important to pay attention to is:

a) Whether it is evidence-based. b) If the authors combined studies with conflicting results. c) If the authors appropriately combined studies so they did not compare apples and oranges. d) If the authors used only quantitative data.

Source: Study Design 101, Himmelfarb Health Sciences Library. Last updated: Sep 25, 2023. URL: https://guides.himmelfarb.gwu.edu/studydesign101



Chapter 10: Analysing data and undertaking meta-analyses

Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Key Points:

  • Meta-analysis is the statistical combination of results from two or more separate studies.
  • Potential advantages of meta-analyses include an improvement in precision, the ability to answer questions not posed by individual studies, and the opportunity to settle controversies arising from conflicting claims. However, they also have the potential to mislead seriously, particularly if specific study designs, within-study biases, variation across studies, and reporting biases are not carefully considered.
  • It is important to be familiar with the type of data (e.g. dichotomous, continuous) that result from measurement of an outcome in an individual study, and to choose suitable effect measures for comparing intervention groups.
  • Most meta-analysis methods are variations on a weighted average of the effect estimates from the different studies.
  • Studies with no events contribute no information about the risk ratio or odds ratio. For rare events, the Peto method has been observed to be less biased and more powerful than other methods.
  • Variation across studies (heterogeneity) must be considered, although most Cochrane Reviews do not have enough studies to allow for the reliable investigation of its causes. Random-effects meta-analyses allow for heterogeneity by assuming that underlying effects follow a normal distribution, but they must be interpreted carefully. Prediction intervals from random-effects meta-analyses are a useful device for presenting the extent of between-study variation.
  • Many judgements are required in the process of preparing a meta-analysis. Sensitivity analyses should be used to examine whether overall findings are robust to potentially influential decisions.

Cite this chapter as: Deeks JJ, Higgins JPT, Altman DG (editors). Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August  2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

10.1 Do not start here!

It can be tempting to jump prematurely into a statistical analysis when undertaking a systematic review. The production of a diamond at the bottom of a plot is an exciting moment for many authors, but results of meta-analyses can be very misleading if suitable attention has not been given to formulating the review question; specifying eligibility criteria; identifying and selecting studies; collecting appropriate data; considering risk of bias; planning intervention comparisons; and deciding what data would be meaningful to analyse. Review authors should consult the chapters that precede this one before a meta-analysis is undertaken.

10.2 Introduction to meta-analysis

An important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention. Potential advantages of meta-analyses include the following:

  • To improve precision. Many studies are too small to provide convincing evidence about intervention effects in isolation. Estimation is usually improved when it is based on more information.
  • To answer questions not posed by the individual studies. Primary studies often involve a specific type of participant and explicitly defined interventions. A selection of studies in which these characteristics differ can allow investigation of the consistency of effect across a wider range of populations and interventions. It may also, if relevant, allow reasons for differences in effect estimates to be investigated.
  • To settle controversies arising from apparently conflicting studies or to generate new hypotheses. Statistical synthesis of findings allows the degree of conflict to be formally assessed, and reasons for different results to be explored and quantified.

Of course, the use of statistical synthesis methods does not guarantee that the results of a review are valid, any more than it does for a primary study. Moreover, like any tool, statistical methods can be misused.

This chapter describes the principles and methods used to carry out a meta-analysis for a comparison of two interventions for the main types of data encountered. The use of network meta-analysis to compare more than two interventions is addressed in Chapter 11 . Formulae for most of the methods described are provided in the RevMan Web Knowledge Base under Statistical Algorithms and calculations used in Review Manager (documentation.cochrane.org/revman-kb/statistical-methods-210600101.html), and a longer discussion of many of the issues is available ( Deeks et al 2001 ).

10.2.1 Principles of meta-analysis

The commonly used methods for meta-analysis follow the following basic principles:

  • Meta-analysis is typically a two-stage process. In the first stage, a summary statistic is calculated for each study, to describe the observed intervention effect in the same way for every study. For example, the summary statistic may be a risk ratio if the data are dichotomous, or a difference between means if the data are continuous (see Chapter 6 ).

  • In the second stage, a summary (pooled) intervention effect estimate is calculated as a weighted average of the intervention effects estimated in the individual studies; most methods differ mainly in how those weights are chosen (see Section 10.3).

  • The combination of intervention effect estimates across studies may optionally incorporate an assumption that the studies are not all estimating the same intervention effect, but estimate intervention effects that follow a distribution across studies. This is the basis of a random-effects meta-analysis (see Section 10.10.4 ). Alternatively, if it is assumed that each study is estimating exactly the same quantity, then a fixed-effect meta-analysis is performed.
  • The standard error of the summary intervention effect can be used to derive a confidence interval, which communicates the precision (or uncertainty) of the summary estimate; and to derive a P value, which communicates the strength of the evidence against the null hypothesis of no intervention effect.
  • As well as yielding a summary quantification of the intervention effect, all methods of meta-analysis can incorporate an assessment of whether the variation among the results of the separate studies is compatible with random variation, or whether it is large enough to indicate inconsistency of intervention effects across studies (see Section 10.10 ).
  • The problem of missing data is one of the numerous practical considerations that must be thought through when undertaking a meta-analysis. In particular, review authors should consider the implications of missing outcome data from individual participants (due to losses to follow-up or exclusions from analysis) (see Section 10.12 ).

Meta-analyses are usually illustrated using a forest plot . An example appears in Figure 10.2.a . A forest plot displays effect estimates and confidence intervals for both individual studies and meta-analyses (Lewis and Clarke 2001). Each study is represented by a block at the point estimate of intervention effect with a horizontal line extending either side of the block. The area of the block indicates the weight assigned to that study in the meta-analysis while the horizontal line depicts the confidence interval (usually with a 95% level of confidence). The area of the block and the confidence interval convey similar information, but both make different contributions to the graphic. The confidence interval depicts the range of intervention effects compatible with the study’s result. The size of the block draws the eye towards the studies with larger weight (usually those with narrower confidence intervals), which dominate the calculation of the summary result, presented as a diamond at the bottom.

Figure 10.2.a Example of a forest plot from a review of interventions to promote ownership of smoke alarms (DiGuiseppi and Higgins 2001). Reproduced with permission of John Wiley & Sons. [Figure not reproduced here.]
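Since the original figure cannot be reproduced here, the sketch below uses invented effect estimates (not the smoke-alarm data) to show how the elements described above (a block whose size reflects study weight, a horizontal confidence interval line, and a summary diamond) could be drawn with matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical studies: log odds ratios and their standard errors (illustrative only).
labels = ["Study A", "Study B", "Study C", "Study D"]
est = np.array([-0.40, -0.10, -0.55, -0.20])
se  = np.array([0.30, 0.20, 0.35, 0.15])

w = 1 / se**2                                    # inverse-variance weights
pooled = np.sum(w * est) / np.sum(w)             # fixed-effect summary estimate
pooled_se = 1 / np.sqrt(np.sum(w))

fig, ax = plt.subplots(figsize=(6, 3))
rows = np.arange(len(est), 0, -1)                # one row per study, plotted top to bottom
for i in range(len(est)):
    lo, hi = est[i] - 1.96 * se[i], est[i] + 1.96 * se[i]
    ax.plot([lo, hi], [rows[i], rows[i]], color="black")        # confidence interval line
    ax.plot(est[i], rows[i], "s", color="black",
            markersize=4 + 8 * w[i] / w.max())                  # block size reflects weight

# Diamond for the summary estimate: its width spans the 95% confidence interval.
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
ax.fill([lo, pooled, hi, pooled], [0, 0.2, 0, -0.2], color="black")

ax.axvline(0, linestyle="--", linewidth=0.8)     # line of no effect (log OR = 0)
ax.set_yticks(list(rows) + [0])
ax.set_yticklabels(labels + ["Summary"])
ax.set_xlabel("log odds ratio")
plt.tight_layout()
plt.show()
```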

10.3 A generic inverse-variance approach to meta-analysis

A very common and simple version of the meta-analysis procedure is commonly referred to as the inverse-variance method . This approach is implemented in its most basic form in RevMan, and is used behind the scenes in many meta-analyses of both dichotomous and continuous data.

The inverse-variance method is so named because the weight given to each study is chosen to be the inverse of the variance of the effect estimate (i.e. 1 over the square of its standard error). Thus, larger studies, which have smaller standard errors, are given more weight than smaller studies, which have larger standard errors. This choice of weights minimizes the imprecision (uncertainty) of the pooled effect estimate.

10.3.1 Fixed-effect method for meta-analysis

A fixed-effect meta-analysis using the inverse-variance method calculates a weighted average as:

$$\text{weighted average} \;=\; \frac{\sum_{i} Y_i / SE_i^{2}}{\sum_{i} 1 / SE_i^{2}}$$

where $Y_i$ is the intervention effect estimated in the $i$th study, $SE_i$ is the standard error of that estimate, and the summation is across all studies. The basic data required for the analysis are therefore an estimate of the intervention effect and its standard error from each study. A fixed-effect meta-analysis is valid under an assumption that all effect estimates are estimating the same underlying intervention effect, which is referred to variously as a ‘fixed-effect’ assumption, a ‘common-effect’ assumption or an ‘equal-effects’ assumption. However, the result of the meta-analysis can be interpreted without making such an assumption (Rice et al 2018).
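As a minimal illustration of this formula (not the RevMan implementation; the function name and example numbers below are ours), the following sketch pools a set of effect estimates and standard errors with fixed-effect inverse-variance weights and derives a 95% confidence interval for the summary effect.

```python
import math

def fixed_effect_iv(estimates, std_errors):
    """Fixed-effect inverse-variance meta-analysis of generic effect estimates."""
    weights = [1 / se**2 for se in std_errors]                    # w_i = 1 / SE_i^2
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))                       # SE of the weighted average
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci

# Invented example: three studies reporting mean differences and their standard errors.
pooled, se, ci = fixed_effect_iv([2.1, 1.4, 3.0], [0.8, 0.5, 1.2])
print(f"Summary effect {pooled:.2f} (SE {se:.2f}), 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```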

10.3.2 Random-effects methods for meta-analysis

A variation on the inverse-variance method is to incorporate an assumption that the different studies are estimating different, yet related, intervention effects (Higgins et al 2009). This produces a random-effects meta-analysis, and the simplest version is known as the DerSimonian and Laird method (DerSimonian and Laird 1986). Random-effects meta-analysis is discussed in detail in Section 10.10.4 .
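A minimal sketch of the DerSimonian and Laird approach is given below, using invented study data (function and variable names are ours): it estimates the between-study variance (tau-squared) from Cochran's Q and then re-weights each study by the inverse of its within-study variance plus tau-squared. Real analyses would normally use a dedicated meta-analysis package rather than hand-rolled code.

```python
import math

def dersimonian_laird(estimates, std_errors):
    """Random-effects meta-analysis using the DerSimonian and Laird estimate of tau^2."""
    k = len(estimates)
    w = [1 / se**2 for se in std_errors]                               # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed)**2 for wi, yi in zip(w, estimates))      # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                                 # DL estimate of tau^2
    w_star = [1 / (se**2 + tau2) for se in std_errors]                 # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1 / sum(w_star))
    return pooled, pooled_se, tau2

# Invented example: four studies with effect estimates and standard errors.
pooled, se, tau2 = dersimonian_laird([0.10, 0.45, 0.30, -0.05], [0.12, 0.15, 0.10, 0.20])
print(f"Random-effects estimate {pooled:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```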

10.3.3 Performing inverse-variance meta-analyses

Most meta-analysis programs perform inverse-variance meta-analyses. Usually the user provides summary data from each intervention arm of each study, such as a 2×2 table when the outcome is dichotomous (see Chapter 6, Section 6.4 ), or means, standard deviations and sample sizes for each group when the outcome is continuous (see Chapter 6, Section 6.5 ). This avoids the need for the author to calculate effect estimates, and allows the use of methods targeted specifically at different types of data (see Sections 10.4 and 10.5 ).

When the data are conveniently available as summary statistics from each intervention group, the inverse-variance method can be implemented directly. For example, estimates and their standard errors may be entered directly into RevMan under the ‘Generic inverse variance’ outcome type. For ratio measures of intervention effect, the data must be entered into RevMan as natural logarithms (for example, as a log odds ratio and the standard error of the log odds ratio). However, it is straightforward to instruct the software to display results on the original (e.g. odds ratio) scale. It is possible to supplement or replace this with a column providing the sample sizes in the two groups. Note that the ability to enter estimates and standard errors creates a high degree of flexibility in meta-analysis. It facilitates the analysis of properly analysed crossover trials, cluster-randomized trials and non-randomized trials (see Chapter 23 ), as well as outcome data that are ordinal, time-to-event or rates (see Chapter 6 ).
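For a dichotomous outcome, the quantities entered under a generic inverse-variance outcome would typically be the log odds ratio and its standard error from each study's 2×2 table. A minimal sketch using the standard large-sample formula follows (the counts and helper name are invented for illustration; this is not RevMan's internal routine).

```python
import math

def log_odds_ratio(events_exp, total_exp, events_comp, total_comp):
    """Log odds ratio and its standard error from a 2x2 table (assumes no zero cells)."""
    a, b = events_exp, total_exp - events_exp          # experimental group: events, non-events
    c, d = events_comp, total_comp - events_comp       # comparator group: events, non-events
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)              # large-sample SE of the log odds ratio
    return log_or, se

# Invented study: 12/100 events with the intervention, 20/102 with the comparator.
log_or, se = log_odds_ratio(12, 100, 20, 102)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, OR = {math.exp(log_or):.2f}")
```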

10.4 Meta-analysis of dichotomous outcomes

There are four widely used methods of meta-analysis for dichotomous outcomes, three fixed-effect methods (Mantel-Haenszel, Peto and inverse variance) and one random-effects method (DerSimonian and Laird inverse variance). All of these methods are available as analysis options in RevMan. The Peto method can only combine odds ratios, whilst the other three methods can combine odds ratios, risk ratios or risk differences. Formulae for all of the meta-analysis methods are available elsewhere (Deeks et al 2001).

Note that having no events in one group (sometimes referred to as ‘zero cells’) causes problems with computation of estimates and standard errors with some methods: see Section 10.4.4 .

10.4.1 Mantel-Haenszel methods

When data are sparse, either in terms of event risks being low or study size being small, the estimates of the standard errors of the effect estimates that are used in the inverse-variance methods may be poor. Mantel-Haenszel methods are fixed-effect meta-analysis methods using a different weighting scheme that depends on which effect measure (e.g. risk ratio, odds ratio, risk difference) is being used (Mantel and Haenszel 1959, Greenland and Robins 1985). They have been shown to have better statistical properties when there are few events. As this is a common situation in Cochrane Reviews, the Mantel-Haenszel method is generally preferable to the inverse variance method in fixed-effect meta-analyses. In other situations the two methods give similar estimates.
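As an illustration of the Mantel-Haenszel weighting scheme for the odds ratio (point estimate only; the usual variance estimator is more involved and omitted here), the sketch below pools three invented 2×2 tables.

```python
# Each study as (events_exp, total_exp, events_comp, total_comp); all counts are invented.
studies = [(4, 120, 9, 118), (2, 60, 5, 62), (6, 200, 11, 198)]

numerator = denominator = 0.0
for a, n1, c, n0 in studies:
    b, d = n1 - a, n0 - c            # non-events in each group
    n = n1 + n0                      # total participants in the study
    numerator += a * d / n           # Mantel-Haenszel weighting of the within-study odds ratio
    denominator += b * c / n

or_mh = numerator / denominator
print(f"Mantel-Haenszel pooled odds ratio = {or_mh:.2f}")
```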

10.4.2 Peto odds ratio method

Peto’s method can only be used to combine odds ratios (Yusuf et al 1985). It uses an inverse-variance approach, but uses an approximate method of estimating the log odds ratio, and uses different weights. An alternative way of viewing the Peto method is as a sum of ‘O – E’ statistics. Here, O is the observed number of events and E is an expected number of events in the experimental intervention group of each study under the null hypothesis of no intervention effect.

The approximation used in the computation of the log odds ratio works well when intervention effects are small (odds ratios are close to 1), events are not particularly common and the studies have similar numbers in experimental and comparator groups. In other situations it has been shown to give biased answers. As these criteria are not always fulfilled, Peto’s method is not recommended as a default approach for meta-analysis.

Corrections for zero cell counts are not necessary when using Peto’s method. Perhaps for this reason, this method performs well when events are very rare (Bradburn et al 2007); see Section 10.4.4.1 . Also, Peto’s method can be used to combine studies with dichotomous outcome data with studies using time-to-event analyses where log-rank tests have been used (see Section 10.9 ).
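A sketch of the ‘O – E’ formulation with invented counts is shown below: the pooled Peto log odds ratio is the sum of observed-minus-expected events divided by the sum of the hypergeometric variances, and its standard error is one over the square root of the summed variances.

```python
import math

# Each study as (events_exp, total_exp, events_comp, total_comp); all counts are invented.
studies = [(1, 150, 4, 148), (0, 80, 2, 82), (2, 200, 5, 201)]

sum_o_minus_e = sum_v = 0.0
for a, n1, c, n0 in studies:
    n = n1 + n0
    events = a + c                                               # total events in the study
    e = n1 * events / n                                          # expected events, experimental arm
    v = (n1 * n0 * events * (n - events)) / (n**2 * (n - 1))     # hypergeometric variance
    sum_o_minus_e += a - e
    sum_v += v

log_or_peto = sum_o_minus_e / sum_v
se = 1 / math.sqrt(sum_v)
print(f"Peto OR = {math.exp(log_or_peto):.2f}, 95% CI "
      f"{math.exp(log_or_peto - 1.96*se):.2f} to {math.exp(log_or_peto + 1.96*se):.2f}")
```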

10.4.3 Which effect measure for dichotomous outcomes?

Effect measures for dichotomous data are described in Chapter 6, Section 6.4.1 . The effect of an intervention can be expressed as either a relative or an absolute effect. The risk ratio (relative risk) and odds ratio are relative measures, while the risk difference and number needed to treat for an additional beneficial outcome are absolute measures. A further complication is that there are, in fact, two risk ratios. We can calculate the risk ratio of an event occurring or the risk ratio of no event occurring. These give different summary results in a meta-analysis, sometimes dramatically so.

The selection of a summary statistic for use in meta-analysis depends on balancing three criteria (Deeks 2002). First, we desire a summary statistic that gives values that are similar for all the studies in the meta-analysis and subdivisions of the population to which the interventions will be applied. The more consistent the summary statistic, the greater is the justification for expressing the intervention effect as a single summary number. Second, the summary statistic must have the mathematical properties required to perform a valid meta-analysis. Third, the summary statistic would ideally be easily understood and applied by those using the review. The summary intervention effect should be presented in a way that helps readers to interpret and apply the results appropriately. Among effect measures for dichotomous data, no single measure is uniformly best, so the choice inevitably involves a compromise.

Consistency Empirical evidence suggests that relative effect measures are, on average, more consistent than absolute measures (Engels et al 2000, Deeks 2002, Rücker et al 2009). For this reason, it is wise to avoid performing meta-analyses of risk differences, unless there is a clear reason to suspect that risk differences will be consistent in a particular clinical situation. On average there is little difference between the odds ratio and risk ratio in terms of consistency (Deeks 2002). When the study aims to reduce the incidence of an adverse event, there is empirical evidence that risk ratios of the adverse event are more consistent than risk ratios of the non-event (Deeks 2002). Selecting an effect measure based on what is the most consistent in a particular situation is not a generally recommended strategy, since it may lead to a selection that spuriously maximizes the precision of a meta-analysis estimate.

Mathematical properties The most important mathematical criterion is the availability of a reliable variance estimate. The number needed to treat for an additional beneficial outcome does not have a simple variance estimator and cannot easily be used directly in meta-analysis, although it can be computed from the meta-analysis result afterwards (see Chapter 15, Section 15.4.2 ). There is no consensus regarding the importance of two other often-cited mathematical properties: the fact that the behaviour of the odds ratio and the risk difference do not rely on which of the two outcome states is coded as the event, and the odds ratio being the only statistic which is unbounded (see Chapter 6, Section 6.4.1 ).

Ease of interpretation The odds ratio is the hardest summary statistic to understand and to apply in practice, and many practising clinicians report difficulties in using them. There are many published examples where authors have misinterpreted odds ratios from meta-analyses as risk ratios. Although odds ratios can be re-expressed for interpretation (as discussed here), there must be some concern that routine presentation of the results of systematic reviews as odds ratios will lead to frequent over-estimation of the benefits and harms of interventions when the results are applied in clinical practice. Absolute measures of effect are thought to be more easily interpreted by clinicians than relative effects (Sinclair and Bracken 1994), and allow trade-offs to be made between likely benefits and likely harms of interventions. However, they are less likely to be generalizable.

It is generally recommended that meta-analyses are undertaken using risk ratios (taking care to make a sensible choice over which category of outcome is classified as the event) or odds ratios. This is because it seems important to avoid using summary statistics for which there is empirical evidence that they are unlikely to give consistent estimates of intervention effects (the risk difference), and it is impossible to use statistics for which meta-analysis cannot be performed (the number needed to treat for an additional beneficial outcome). It may be wise to plan to undertake a sensitivity analysis to investigate whether choice of summary statistic (and selection of the event category) is critical to the conclusions of the meta-analysis (see Section 10.14 ).

It is often sensible to use one statistic for meta-analysis and to re-express the results using a second, more easily interpretable statistic. For example, often meta-analysis may be best performed using relative effect measures (risk ratios or odds ratios) and the results re-expressed using absolute effect measures (risk differences or numbers needed to treat for an additional beneficial outcome – see Chapter 15, Section 15.4 . This is one of the key motivations for ‘Summary of findings’ tables in Cochrane Reviews: see Chapter 14 ). If odds ratios are used for meta-analysis they can also be re-expressed as risk ratios (see Chapter 15, Section 15.4 ). In all cases the same formulae can be used to convert upper and lower confidence limits. However, all of these transformations require specification of a value of baseline risk that indicates the likely risk of the outcome in the ‘control’ population to which the experimental intervention will be applied. Where the chosen value for this assumed comparator group risk is close to the typical observed comparator group risks across the studies, similar estimates of absolute effect will be obtained regardless of whether odds ratios or risk ratios are used for meta-analysis. Where the assumed comparator risk differs from the typical observed comparator group risk, the predictions of absolute benefit will differ according to which summary statistic was used for meta-analysis.
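A small sketch of this kind of re-expression follows, assuming a summary risk ratio or odds ratio from the meta-analysis and a chosen assumed comparator group risk (ACR). The conversions used are the usual ones (intervention risk = ACR x RR, or OR x ACR / (1 - ACR + OR x ACR)); the helper names and numbers are ours.

```python
def absolute_effects_from_rr(rr, assumed_comparator_risk):
    """Risk difference and NNT implied by a summary risk ratio at a given comparator risk."""
    risk_int = rr * assumed_comparator_risk
    rd = risk_int - assumed_comparator_risk
    return rd, 1 / abs(rd)

def absolute_effects_from_or(odds_ratio, assumed_comparator_risk):
    """Risk difference and NNT implied by a summary odds ratio at a given comparator risk."""
    acr = assumed_comparator_risk
    risk_int = odds_ratio * acr / (1 - acr + odds_ratio * acr)
    rd = risk_int - acr
    return rd, 1 / abs(rd)

# Invented summary results: RR = 0.80 (or OR = 0.78) applied at an assumed comparator risk of 10%.
rd, nnt = absolute_effects_from_rr(0.80, 0.10)
print(f"From RR: risk difference = {rd:.3f}, NNT = {nnt:.0f}")
rd, nnt = absolute_effects_from_or(0.78, 0.10)
print(f"From OR: risk difference = {rd:.3f}, NNT = {nnt:.0f}")
```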

10.4.4 Meta-analysis of rare events

For rare outcomes, meta-analysis may be the only way to obtain reliable evidence of the effects of healthcare interventions. Individual studies are usually under-powered to detect differences in rare outcomes, but a meta-analysis of many studies may have adequate power to investigate whether interventions do have an impact on the incidence of the rare event. However, many methods of meta-analysis are based on large sample approximations, and are unsuitable when events are rare. Thus authors must take care when selecting a method of meta-analysis (Efthimiou 2018).

There is no single risk at which events are classified as ‘rare’. Certainly risks of 1 in 1000 constitute rare events, and many would classify risks of 1 in 100 the same way. However, the performance of methods when risks are as high as 1 in 10 may also be affected by the issues discussed in this section. What is typical is that a high proportion of the studies in the meta-analysis observe no events in one or more study arms.

10.4.4.1 Studies with no events in one or more arms

Computational problems can occur when no events are observed in one or both groups in an individual study. Inverse variance meta-analytical methods involve computing an intervention effect estimate and its standard error for each study. For studies where no events were observed in one or both arms, these computations often involve dividing by a zero count, which yields a computational error. Most meta-analytical software routines (including those in RevMan) automatically check for problematic zero counts, and add a fixed value (typically 0.5) to all cells of a 2×2 table where the problems occur. The Mantel-Haenszel methods require zero-cell corrections only if the same cell is zero in all the included studies, and hence need to use the correction less often. However, in many software applications the same correction rules are applied for Mantel-Haenszel methods as for the inverse-variance methods. Odds ratio and risk ratio methods require zero cell corrections more often than difference methods, except for the Peto odds ratio method, which encounters computation problems only in the extreme situation of no events occurring in all arms of all studies.

Whilst the fixed correction meets the objective of avoiding computational errors, it usually has the undesirable effect of biasing study estimates towards no difference and over-estimating variances of study estimates (consequently down-weighting inappropriately their contribution to the meta-analysis). Where the sizes of the study arms are unequal (which occurs more commonly in non-randomized studies than randomized trials), they will introduce a directional bias in the treatment effect. Alternative non-fixed zero-cell corrections have been explored by Sweeting and colleagues, including a correction proportional to the reciprocal of the size of the contrasting study arm, which they found preferable to the fixed 0.5 correction when arm sizes were not balanced (Sweeting et al 2004).
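The sketch below shows the effect of the fixed 0.5 correction on a single invented table with a zero cell; as noted above, a correction proportional to the reciprocal of the opposite arm size is an alternative, and specialist software should be preferred for real analyses (the helper name and counts here are ours).

```python
import math

def log_or_with_correction(a, n1, c, n0, correction=0.5):
    """Log OR and SE, adding a fixed value to every cell when any cell is zero."""
    b, d = n1 - a, n0 - c
    if 0 in (a, b, c, d):                      # apply the correction only where a zero cell occurs
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return log_or, se

# Invented study with no events in the experimental arm.
log_or, se = log_or_with_correction(0, 50, 3, 48)
print(f"log OR = {log_or:.2f} (SE {se:.2f}) after a 0.5 zero-cell correction")
```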

10.4.4.2 Studies with no events in either arm

The standard practice in meta-analysis of odds ratios and risk ratios is to exclude studies from the meta-analysis where there are no events in both arms. This is because such studies do not provide any indication of either the direction or magnitude of the relative treatment effect. Whilst it may be clear that events are very rare on both the experimental intervention and the comparator intervention, no information is provided as to which group is likely to have the higher risk, or on whether the risks are of the same or different orders of magnitude (when risks are very low, they are compatible with very large or very small ratios). Whilst one might be tempted to infer that the risk would be lowest in the group with the larger sample size (as the upper limit of the confidence interval would be lower), this is not justified as the sample size allocation was determined by the study investigators and is not a measure of the incidence of the event.

Risk difference methods superficially appear to have an advantage over odds ratio methods in that the risk difference is defined (as zero) when no events occur in either arm. Such studies are therefore included in the estimation process. Bradburn and colleagues undertook simulation studies which revealed that all risk difference methods yield confidence intervals that are too wide when events are rare, and have associated poor statistical power, which make them unsuitable for meta-analysis of rare events (Bradburn et al 2007). This is especially relevant when outcomes that focus on treatment safety are being studied, as the ability to identify correctly (or attempt to refute) serious adverse events is a key issue in drug development.

It is likely that outcomes for which no events occur in either arm may not be mentioned in reports of many randomized trials, precluding their inclusion in a meta-analysis. It is unclear, though, when working with published results, whether failure to mention a particular adverse event means there were no such events, or simply that such events were not included as a measured endpoint. Whilst the results of risk difference meta-analyses will be affected by non-reporting of outcomes with no events, odds and risk ratio based methods naturally exclude these data whether or not they are published, and are therefore unaffected.

10.4.4.3 Validity of methods of meta-analysis for rare events

Simulation studies have revealed that many meta-analytical methods can give misleading results for rare events, which is unsurprising given their reliance on asymptotic statistical theory. Their performance has been judged suboptimal either through results being biased, confidence intervals being inappropriately wide, or statistical power being too low to detect substantial differences.

In the following we consider the choice of statistical method for meta-analyses of odds ratios. Appropriate choices appear to depend on the comparator group risk, the likely size of the treatment effect and consideration of balance in the numbers of experimental and comparator participants in the constituent studies. We are not aware of research that has evaluated risk ratio measures directly, but their performance is likely to be very similar to corresponding odds ratio measurements. When events are rare, estimates of odds and risks are near identical, and results of both can be interpreted as ratios of probabilities.

Bradburn and colleagues found that many of the most commonly used meta-analytical methods were biased when events were rare (Bradburn et al 2007). The bias was greatest in inverse variance and DerSimonian and Laird odds ratio and risk difference methods, and the Mantel-Haenszel odds ratio method using a 0.5 zero-cell correction. As already noted, risk difference meta-analytical methods tended to show conservative confidence interval coverage and low statistical power when risks of events were low.

At event rates below 1% the Peto one-step odds ratio method was found to be the least biased and most powerful method, and provided the best confidence interval coverage, provided there was no substantial imbalance between treatment and comparator group sizes within studies, and treatment effects were not exceptionally large. This finding was consistently observed across three different meta-analytical scenarios, and was also observed by Sweeting and colleagues (Sweeting et al 2004).

This finding was noted despite the method producing only an approximation to the odds ratio. For very large effects (e.g. risk ratio=0.2) when the approximation is known to be poor, treatment effects were under-estimated, but the Peto method still had the best performance of all the methods considered for event risks of 1 in 1000, and the bias was never more than 6% of the comparator group risk.

In other circumstances (i.e. event risks above 1%, very large effects at event risks around 1%, and meta-analyses where many studies were substantially imbalanced) the best performing methods were the Mantel-Haenszel odds ratio without zero-cell corrections, logistic regression and an exact method. None of these methods is available in RevMan.

Methods that should be avoided with rare events are the inverse-variance methods (including the DerSimonian and Laird random-effects method) (Efthimiou 2018). These directly incorporate the study’s variance in the estimation of its contribution to the meta-analysis, but these are usually based on a large-sample variance approximation, which was not intended for use with rare events. We would suggest that incorporation of heterogeneity into an estimate of a treatment effect should be a secondary consideration when attempting to produce estimates of effects from sparse data – the primary concern is to discern whether there is any signal of an effect in the data.

10.5 Meta-analysis of continuous outcomes

An important assumption underlying standard methods for meta-analysis of continuous data is that the outcomes have a normal distribution in each intervention arm in each study. This assumption may not always be met, although it is unimportant in very large studies. It is useful to consider the possibility of skewed data (see Section 10.5.3 ).

10.5.1 Which effect measure for continuous outcomes?

The two summary statistics commonly used for meta-analysis of continuous data are the mean difference (MD) and the standardized mean difference (SMD). Other options are available, such as the ratio of means (see Chapter 6, Section 6.5.1 ). Selection of summary statistics for continuous data is principally determined by whether studies all report the outcome using the same scale (when the mean difference can be used) or using different scales (when the standardized mean difference is usually used). The ratio of means can be used in either situation, but is appropriate only when outcome measurements are strictly greater than zero. Further considerations in deciding on an effect measure that will facilitate interpretation of the findings appears in Chapter 15, Section 15.5 .
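A minimal sketch of how these two summary statistics can be computed from group-level summary data is given below, using the pooled-SD standardization with a small-sample correction for the SMD (Hedges' g) and invented numbers; the function names are ours. The resulting per-study estimate and standard error would then feed into the generic inverse-variance methods of Section 10.3.

```python
import math

def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference between two groups and its standard error."""
    md = m1 - m2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return md, se

def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """SMD (Hedges' g, pooled-SD standardization) and an approximate standard error."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)                           # small-sample correction factor
    g = j * d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    se = j * math.sqrt(var_d)                          # Var(g) is approximately J^2 * Var(d)
    return g, se

# Invented study: intervention mean 24.1 (SD 5.0, n 40); comparator mean 27.6 (SD 5.5, n 42).
print(mean_difference(24.1, 5.0, 40, 27.6, 5.5, 42))
print(standardized_mean_difference(24.1, 5.0, 40, 27.6, 5.5, 42))
```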

The different roles played in MD and SMD approaches by the standard deviations (SDs) of outcomes observed in the two groups should be understood.

For the mean difference approach, the SDs are used together with the sample sizes to compute the weight given to each study. Studies with small SDs are given relatively higher weight whilst studies with larger SDs are given relatively smaller weights. This is appropriate if variation in SDs between studies reflects differences in the reliability of outcome measurements, but is probably not appropriate if the differences in SD reflect real differences in the variability of outcomes in the study populations.

For the standardized mean difference approach, the SDs are used to standardize the mean differences to a single scale, as well as in the computation of study weights. Thus, studies with small SDs lead to relatively higher estimates of SMD, whilst studies with larger SDs lead to relatively smaller estimates of SMD. For this to be appropriate, it must be assumed that between-study variation in SDs reflects only differences in measurement scales and not differences in the reliability of outcome measures or variability among study populations, as discussed in Chapter 6, Section 6.5.1.2 .

These assumptions of the methods should be borne in mind when unexpected variation of SDs is observed across studies.

10.5.2 Meta-analysis of change scores

In some circumstances an analysis based on changes from baseline will be more efficient and powerful than comparison of post-intervention values, as it removes a component of between-person variability from the analysis. However, calculation of a change score requires measurement of the outcome twice and in practice may be less efficient for outcomes that are unstable or difficult to measure precisely, where the measurement error may be larger than true between-person baseline variability. Change-from-baseline outcomes may also be preferred if they have a less skewed distribution than post-intervention measurement outcomes. Although sometimes used as a device to ‘correct’ for unlucky randomization, this practice is not recommended.

The preferred statistical approach to accounting for baseline measurements of the outcome variable is to include the baseline outcome measurements as a covariate in a regression model or analysis of covariance (ANCOVA). These analyses produce an ‘adjusted’ estimate of the intervention effect together with its standard error. These analyses are the least frequently encountered, but as they give the most precise and least biased estimates of intervention effects they should be included in the analysis when they are available. However, they can only be included in a meta-analysis using the generic inverse-variance method, since means and SDs are not available for each intervention group separately.

In practice an author is likely to discover that the studies included in a review include a mixture of change-from-baseline and post-intervention value scores. However, mixing of outcomes is not a problem when it comes to meta-analysis of MDs. There is no statistical reason why studies with change-from-baseline outcomes should not be combined in a meta-analysis with studies with post-intervention measurement outcomes when using the (unstandardized) MD method. In a randomized study, MD based on changes from baseline can usually be assumed to be addressing exactly the same underlying intervention effects as analyses based on post-intervention measurements. That is to say, the difference in mean post-intervention values will on average be the same as the difference in mean change scores. If the use of change scores does increase precision, appropriately, the studies presenting change scores will be given higher weights in the analysis than they would have received if post-intervention values had been used, as they will have smaller SDs.

When combining the data on the MD scale, authors must be careful to use the appropriate means and SDs (either of post-intervention measurements or of changes from baseline) for each study. Since the mean values and SDs for the two types of outcome may differ substantially, it may be advisable to place them in separate subgroups to avoid confusion for the reader, but the results of the subgroups can legitimately be pooled together.

In contrast, post-intervention value and change scores should not in principle be combined using standard meta-analysis approaches when the effect measure is an SMD. This is because the SDs used in the standardization reflect different things. The SD when standardizing post-intervention values reflects between-person variability at a single point in time. The SD when standardizing change scores reflects variation in between-person changes over time, so will depend on both within-person and between-person variability; within-person variability in turn is likely to depend on the length of time between measurements. Nevertheless, an empirical study of 21 meta-analyses in osteoarthritis did not find a difference between combined SMDs based on post-intervention values and combined SMDs based on change scores (da Costa et al 2013). One option is to standardize SMDs using post-intervention SDs rather than change score SDs. This would lead to valid synthesis of the two approaches, but we are not aware that an appropriate standard error for this has been derived.

A common practical problem associated with including change-from-baseline measures is that the SD of changes is not reported. Imputation of SDs is discussed in Chapter 6, Section 6.5.2.8 .

10.5.3 Meta-analysis of skewed data

Analyses based on means are appropriate for data that are at least approximately normally distributed, and for data from very large trials. If the true distribution of outcomes is asymmetrical, then the data are said to be skewed. Review authors should consider the possibility and implications of skewed data when analysing continuous outcomes (see MECIR Box 10.5.a ). Skew can sometimes be diagnosed from the means and SDs of the outcomes. A rough check is available, but it is only valid if a lowest or highest possible value for an outcome is known to exist. Thus, the check may be used for outcomes such as weight, volume and blood concentrations, which have lowest possible values of 0, or for scale outcomes with minimum or maximum scores, but it may not be appropriate for change-from-baseline measures. The check involves calculating the observed mean minus the lowest possible value (or the highest possible value minus the observed mean), and dividing this by the SD. A ratio less than 2 suggests skew (Altman and Bland 1996). If the ratio is less than 1, there is strong evidence of a skewed distribution.
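
A minimal sketch of this rough check, assuming a lowest possible value of 0 and hypothetical summary values:

```python
def skew_ratio(mean, sd, lowest=0.0):
    """Rough check of Altman and Bland (1996): (observed mean - lowest possible value) / SD."""
    return (mean - lowest) / sd

ratio = skew_ratio(mean=3.1, sd=2.0)   # e.g. a blood concentration, lowest possible value 0
if ratio < 1:
    print("strong evidence of a skewed distribution")
elif ratio < 2:
    print("ratio below 2: suggests skew")
else:
    print("no evidence of skew from this rough check")
```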

Transformation of the original outcome data may reduce skew substantially. Reports of trials may present results on a transformed scale, usually a log scale. Collection of appropriate data summaries from the trialists, or acquisition of individual patient data, is currently the approach of choice. Appropriate data summaries and analysis strategies for the individual patient data will depend on the situation. Consultation with a knowledgeable statistician is advised.

Where data have been analysed on a log scale, results are commonly presented as geometric means and ratios of geometric means. A meta-analysis may be then performed on the scale of the log-transformed data; an example of the calculation of the required means and SD is given in Chapter 6, Section 6.5.2.4 . This approach depends on being able to obtain transformed data for all studies; methods for transforming from one scale to the other are available (Higgins et al 2008b). Log-transformed and untransformed data should not be mixed in a meta-analysis.

MECIR Box 10.5.a Relevant expectations for conduct of intervention reviews

Addressing skewed data

Skewed data are sometimes not summarized usefully by means and standard deviations. While statistical methods are approximately valid for large sample sizes, skewed outcome data can lead to misleading results when studies are small.

10.6 Combining dichotomous and continuous outcomes

Occasionally authors encounter a situation where data for the same outcome are presented in some studies as dichotomous data and in other studies as continuous data. For example, scores on depression scales can be reported as means, or as the percentage of patients who were depressed at some point after an intervention (i.e. with a score above a specified cut-point). This type of information is often easier to understand, and more helpful, when it is dichotomized. However, deciding on a cut-point may be arbitrary, and information is lost when continuous data are transformed to dichotomous data.

There are several options for handling combinations of dichotomous and continuous data. Generally, it is useful to summarize results from all the relevant, valid studies in a similar way, but this is not always possible. It may be possible to collect missing data from investigators so that this can be done. If not, it may be useful to summarize the data in three ways: by entering the means and SDs as continuous outcomes, by entering the counts as dichotomous outcomes and by entering all of the data in text form as ‘Other data’ outcomes.

There are statistical approaches available that will re-express odds ratios as SMDs (and vice versa), allowing dichotomous and continuous data to be combined (Anzures-Cabrera et al 2011). A simple approach is as follows. Based on an assumption that the underlying continuous measurements in each intervention group follow a logistic distribution (which is a symmetrical distribution similar in shape to the normal distribution, but with more data in the distributional tails), and that the variability of the outcomes is the same in both experimental and comparator participants, the odds ratios can be re-expressed as a SMD according to the following simple formula (Chinn 2000):

SMD = (√3/π) × ln(odds ratio) = 0.5513 × ln(odds ratio)

The standard error of the log odds ratio can be converted to the standard error of a SMD by multiplying by the same constant (√3/π=0.5513). Alternatively SMDs can be re-expressed as log odds ratios by multiplying by π/√3=1.814. Once SMDs (or log odds ratios) and their standard errors have been computed for all studies in the meta-analysis, they can be combined using the generic inverse-variance method. Standard errors can be computed for all studies by entering the data as dichotomous and continuous outcome type data, as appropriate, and converting the confidence intervals for the resulting log odds ratios and SMDs into standard errors (see Chapter 6, Section 6.3 ).
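
A minimal sketch of the conversion, using hypothetical values for the odds ratio and its standard error:

```python
import math

log_or = math.log(1.8)          # hypothetical odds ratio of 1.8
se_log_or = 0.25                # hypothetical standard error of the log odds ratio

k = math.sqrt(3) / math.pi      # 0.5513
smd = k * log_or                # re-expressed as an SMD (Chinn 2000)
se_smd = k * se_log_or          # the same constant converts the standard error

log_or_back = (math.pi / math.sqrt(3)) * smd   # converting back: multiply by 1.814
print(smd, se_smd, log_or_back)
```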

10.7 Meta-analysis of ordinal outcomes and measurement scales

Ordinal and measurement scale outcomes are most commonly meta-analysed as dichotomous data (if so, see Section 10.4 ) or continuous data (if so, see Section 10.5 ) depending on the way that the study authors performed the original analyses.

Occasionally it is possible to analyse the data using proportional odds models. This is the case when ordinal scales have a small number of categories, the numbers falling into each category for each intervention group can be obtained, and the same ordinal scale has been used in all studies. This approach may make more efficient use of all available data than dichotomization, but requires access to statistical software and results in a summary statistic for which it is challenging to find a clinical meaning.

The proportional odds model uses the proportional odds ratio as the measure of intervention effect (Agresti 1996) (see Chapter 6, Section 6.6 ), and can be used for conducting a meta-analysis in advanced statistical software packages (Whitehead and Jones 1994). Estimates of log odds ratios and their standard errors from a proportional odds model may be meta-analysed using the generic inverse-variance method (see Section 10.3.3 ). If the same ordinal scale has been used in all studies, but in some reports has been presented as a dichotomous outcome, it may still be possible to include all studies in the meta-analysis. In the context of the three-category model, this might mean that for some studies category 1 constitutes a success, while for others both categories 1 and 2 constitute a success. Methods are available for dealing with this, and for combining data from scales that are related but have different definitions for their categories (Whitehead and Jones 1994).

10.8 Meta-analysis of counts and rates

Results may be expressed as count data when each participant may experience an event, and may experience it more than once (see Chapter 6, Section 6.7 ). For example, ‘number of strokes’, or ‘number of hospital visits’ are counts. These events may not happen at all, but if they do happen there is no theoretical maximum number of occurrences for an individual. Count data may be analysed using methods for dichotomous data if the counts are dichotomized for each individual (see Section 10.4 ), continuous data (see Section 10.5 ) and time-to-event data (see Section 10.9 ), as well as being analysed as rate data.

Rate data occur if counts are measured for each participant along with the time over which they are observed. This is particularly appropriate when the events being counted are rare. For example, a woman may experience two strokes during a follow-up period of two years. Her rate of strokes is one per year of follow-up (or, equivalently 0.083 per month of follow-up). Rates are conventionally summarized at the group level. For example, participants in the comparator group of a clinical trial may experience 85 strokes during a total of 2836 person-years of follow-up. An underlying assumption associated with the use of rates is that the risk of an event is constant across participants and over time. This assumption should be carefully considered for each situation. For example, in contraception studies, rates have been used (known as Pearl indices) to describe the number of pregnancies per 100 women-years of follow-up. This is now considered inappropriate since couples have different risks of conception, and the risk for each woman changes over time. Pregnancies are now analysed more often using life tables or time-to-event methods that investigate the time elapsing before the first pregnancy.

Analysing count data as rates is not always the most appropriate approach and is uncommon in practice. This is because:

  • the assumption of a constant underlying risk may not be suitable; and
  • the statistical methods are not as well developed as they are for other types of data.

The results of a study may be expressed as a rate ratio , that is the ratio of the rate in the experimental intervention group to the rate in the comparator group. The (natural) logarithms of the rate ratios may be combined across studies using the generic inverse-variance method (see Section 10.3.3 ). Alternatively, Poisson regression approaches can be used (Spittal et al 2015).
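
For illustration, a log rate ratio and its usual approximate standard error can be computed from the event counts and person-time in each group (the experimental-group numbers below are hypothetical; the comparator numbers echo the stroke example above):

```python
import math

events_exp, time_exp = 64, 2710.0     # hypothetical: events and person-years, experimental group
events_comp, time_comp = 85, 2836.0   # comparator group, as in the example above

rate_ratio = (events_exp / time_exp) / (events_comp / time_comp)
log_rate_ratio = math.log(rate_ratio)
se_log_rate_ratio = math.sqrt(1 / events_exp + 1 / events_comp)  # usual approximation

print(rate_ratio, log_rate_ratio, se_log_rate_ratio)
```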

In a randomized trial, rate ratios may often be very similar to risk ratios obtained after dichotomizing the participants, since the average period of follow-up should be similar in all intervention groups. Rate ratios and risk ratios will differ, however, if an intervention affects the likelihood of some participants experiencing multiple events.

It is possible also to focus attention on the rate difference (see Chapter 6, Section 6.7.1 ). The analysis again can be performed using the generic inverse-variance method (Hasselblad and McCrory 1995, Guevara et al 2004).

10.9 Meta-analysis of time-to-event outcomes

Two approaches to meta-analysis of time-to-event outcomes are readily available to Cochrane Review authors. The choice of which to use will depend on the type of data that have been extracted from the primary studies, or obtained from re-analysis of individual participant data.

If ‘O – E’ and ‘V’ statistics have been obtained (see Chapter 6, Section 6.8.2 ), either through re-analysis of individual participant data or from aggregate statistics presented in the study reports, then these statistics may be entered directly into RevMan using the ‘O – E and Variance’ outcome type. There are several ways to calculate these ‘O – E’ and ‘V’ statistics. Peto’s method applied to dichotomous data (Section 10.4.2 ) gives rise to an odds ratio; a log-rank approach gives rise to a hazard ratio; and a variation of the Peto method for analysing time-to-event data gives rise to something in between (Simmonds et al 2011). The appropriate effect measure should be specified. Only fixed-effect meta-analysis methods are available in RevMan for ‘O – E and Variance’ outcomes.
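
For orientation, the fixed-effect calculation underlying the ‘O – E and Variance’ outcome type pools the statistics as sum(O – E)/sum(V), with standard error 1/√(sum(V)); the per-study values below are hypothetical:

```python
import math

o_minus_e = [-4.2, -1.5, -6.8]   # observed minus expected events, one value per study (hypothetical)
v = [10.1, 4.4, 15.3]            # corresponding variances (hypothetical)

log_effect = sum(o_minus_e) / sum(v)   # pooled log odds ratio or log hazard ratio
se = 1 / math.sqrt(sum(v))
print(math.exp(log_effect), se)        # pooled ratio and its standard error on the log scale
```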

Alternatively, if estimates of log hazard ratios and standard errors have been obtained from results of Cox proportional hazards regression models, study results can be combined using generic inverse-variance methods (see Section 10.3.3 ).

If a mixture of log-rank and Cox model estimates are obtained from the studies, all results can be combined using the generic inverse-variance method, as the log-rank estimates can be converted into log hazard ratios and standard errors using the approaches discussed in Chapter 6, Section 6.8 .
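
A minimal sketch of the generic inverse-variance (fixed-effect) calculation applied to log hazard ratios and their standard errors (hypothetical values):

```python
import math

log_hr = [-0.22, -0.35, 0.05]    # log hazard ratios, one per study (hypothetical)
se = [0.15, 0.21, 0.18]          # their standard errors (hypothetical)

w = [1 / s**2 for s in se]                                    # inverse-variance weights
pooled = sum(wi * y for wi, y in zip(w, log_hr)) / sum(w)     # weighted average log hazard ratio
se_pooled = 1 / math.sqrt(sum(w))

low, high = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
print("pooled HR:", math.exp(pooled), "95% CI:", math.exp(low), math.exp(high))
```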

10.10 Heterogeneity

10.10.1 What is heterogeneity?

Inevitably, studies brought together in a systematic review will differ. Any kind of variability among studies in a systematic review may be termed heterogeneity. It can be helpful to distinguish between different types of heterogeneity. Variability in the participants, interventions and outcomes studied may be described as clinical diversity (sometimes called clinical heterogeneity), and variability in study design, outcome measurement tools and risk of bias may be described as methodological diversity (sometimes called methodological heterogeneity). Variability in the intervention effects being evaluated in the different studies is known as statistical heterogeneity , and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone. We will follow convention and refer to statistical heterogeneity simply as heterogeneity .

Clinical variation will lead to heterogeneity if the intervention effect is affected by the factors that vary across studies; most obviously, the specific interventions or patient characteristics. In other words, the true intervention effect will be different in different studies.

Differences between studies in terms of methodological factors, such as use of blinding and concealment of allocation sequence, or if there are differences between studies in the way the outcomes are defined and measured, may be expected to lead to differences in the observed intervention effects. Significant statistical heterogeneity arising from methodological diversity or differences in outcome assessments suggests that the studies are not all estimating the same quantity, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity would indicate that the studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the result of clinical trials, although this is not always the case. Further discussion appears in Chapter 7 and Chapter 8 .

The scope of a review will largely determine the extent to which studies included in a review are diverse. Sometimes a review will include studies addressing a variety of questions, for example when several different interventions for the same condition are of interest (see also Chapter 11 ) or when the differential effects of an intervention in different populations are of interest. Meta-analysis should only be considered when a group of studies is sufficiently homogeneous in terms of participants, interventions and outcomes to provide a meaningful summary (see MECIR Box 10.10.a. ). It is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. A common analogy is that systematic reviews bring together apples and oranges, and that combining these can yield a meaningless result. This is true if apples and oranges are of intrinsic interest on their own, but may not be if they are used to contribute to a wider question about fruit. For example, a meta-analysis may reasonably evaluate the average effect of a class of drugs by combining results from trials where each evaluates the effect of a different drug from the class.

MECIR Box 10.10.a Relevant expectations for conduct of intervention reviews


Meta-analyses of very diverse studies can be misleading, for example where studies use different forms of control. Clinical diversity does not indicate necessarily that a meta-analysis should not be performed. However, authors must be clear about the underlying question that all studies are addressing.

There may be specific interest in a review in investigating how clinical and methodological aspects of studies relate to their results. Where possible these investigations should be specified a priori (i.e. in the protocol for the systematic review). It is legitimate for a systematic review to focus on examining the relationship between some clinical characteristic(s) of the studies and the size of intervention effect, rather than on obtaining a summary effect estimate across a series of studies (see Section 10.11 ). Meta-regression may best be used for this purpose, although it is not implemented in RevMan (see Section 10.11.4 ).

10.10.2 Identifying and measuring heterogeneity

It is essential to consider the extent to which the results of studies are consistent with each other (see MECIR Box 10.10.b ). If confidence intervals for the results of individual studies (generally depicted graphically using horizontal lines) have poor overlap, this generally indicates the presence of statistical heterogeneity. More formally, a statistical test for heterogeneity is available. This Chi² (χ², or chi-squared) test is included in the forest plots in Cochrane Reviews. It assesses whether observed differences in results are compatible with chance alone. A low P value (or a large Chi² statistic relative to its degrees of freedom) provides evidence of heterogeneity of intervention effects (variation in effect estimates beyond chance).

MECIR Box 10.10.b Relevant expectations for conduct of intervention reviews

Assessing statistical heterogeneity

The presence of heterogeneity affects the extent to which generalizable conclusions can be formed. It is important to identify heterogeneity in case there is sufficient information to explain it and offer new insights. Authors should recognize that there is much uncertainty in measures such as I² and Tau when there are few studies. Thus, use of simple thresholds to diagnose heterogeneity should be avoided.

Care must be taken in the interpretation of the Chi² test, since it has low power in the (common) situation of a meta-analysis when studies have small sample size or are few in number. This means that while a statistically significant result may indicate a problem with heterogeneity, a non-significant result must not be taken as evidence of no heterogeneity. This is also why a P value of 0.10, rather than the conventional level of 0.05, is sometimes used to determine statistical significance. A further problem with the test, which seldom occurs in Cochrane Reviews, is that when there are many studies in a meta-analysis, the test has high power to detect a small amount of heterogeneity that may be clinically unimportant.

Some argue that, since clinical and methodological diversity always occur in a meta-analysis, statistical heterogeneity is inevitable (Higgins et al 2003). Thus, the test for heterogeneity is irrelevant to the choice of analysis; heterogeneity will always exist whether or not we happen to be able to detect it using a statistical test. Methods have been developed for quantifying inconsistency across studies that move the focus away from testing whether heterogeneity is present to assessing its impact on the meta-analysis. A useful statistic for quantifying inconsistency is:

I² = ((Q − df) / Q) × 100%

In this equation, Q is the Chi² statistic and df is its degrees of freedom (Higgins and Thompson 2002, Higgins et al 2003). I² describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance).
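
A minimal sketch of the calculation, using hypothetical effect estimates and standard errors:

```python
effects = [0.10, 0.32, 0.25, 0.60, 0.15]   # study effect estimates (hypothetical)
se = [0.12, 0.10, 0.18, 0.20, 0.15]        # their standard errors (hypothetical)

w = [1 / s**2 for s in se]
fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)    # fixed-effect summary estimate

q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q (the Chi² statistic)
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100                     # I², truncated at 0%

print(q, df, i_squared)
```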

Thresholds for the interpretation of the I² statistic can be misleading, since the importance of inconsistency depends on several factors. A rough guide to interpretation in the context of meta-analyses of randomized trials is as follows:

  • 0% to 40%: might not be important;
  • 30% to 60%: may represent moderate heterogeneity*;
  • 50% to 90%: may represent substantial heterogeneity*;
  • 75% to 100%: considerable heterogeneity*.

*The importance of the observed value of I² depends on (1) magnitude and direction of effects, and (2) strength of evidence for heterogeneity (e.g. P value from the Chi² test, or a confidence interval for I²: uncertainty in the value of I² is substantial when the number of studies is small).

10.10.3 Strategies for addressing heterogeneity

Review authors must take into account any statistical heterogeneity when interpreting results, particularly when there is variation in the direction of effect (see MECIR Box 10.10.c ). A number of options are available if heterogeneity is identified among a group of studies that would otherwise be considered suitable for a meta-analysis.

MECIR Box 10.10.c  Relevant expectations for conduct of intervention reviews

Considering statistical heterogeneity when interpreting the results

The presence of heterogeneity affects the extent to which generalizable conclusions can be formed. If a fixed-effect analysis is used, the confidence intervals ignore the extent of heterogeneity. If a random-effects analysis is used, the result pertains to the mean effect across studies. In both cases, the implications of notable heterogeneity should be addressed. It may be possible to understand the reasons for the heterogeneity if there are sufficient studies.

  • Check again that the data are correct. Severe apparent heterogeneity can indicate that data have been incorrectly extracted or entered into meta-analysis software. For example, if standard errors have mistakenly been entered as SDs for continuous outcomes, this could manifest itself in overly narrow confidence intervals with poor overlap and hence substantial heterogeneity. Unit-of-analysis errors may also be causes of heterogeneity (see Chapter 6, Section 6.2 ).  
  • Do not do a meta-analysis. A systematic review need not contain any meta-analyses. If there is considerable variation in results, and particularly if there is inconsistency in the direction of effect, it may be misleading to quote an average value for the intervention effect.  
  • Explore heterogeneity. It is clearly of interest to determine the causes of heterogeneity among results of studies. This process is problematic since there are often many characteristics that vary across studies from which one may choose. Heterogeneity may be explored by conducting subgroup analyses (see Section 10.11.3 ) or meta-regression (see Section 10.11.4 ). Reliable conclusions can only be drawn from analyses that are truly pre-specified before inspecting the studies’ results, and even these conclusions should be interpreted with caution. Explorations of heterogeneity that are devised after heterogeneity is identified can at best lead to the generation of hypotheses. They should be interpreted with even more caution and should generally not be listed among the conclusions of a review. Also, investigations of heterogeneity when there are very few studies are of questionable value.  
  • Ignore heterogeneity. Fixed-effect meta-analyses ignore heterogeneity. The summary effect estimate from a fixed-effect meta-analysis is normally interpreted as being the best estimate of the intervention effect. However, the existence of heterogeneity suggests that there may not be a single intervention effect but a variety of intervention effects. Thus, the summary fixed-effect estimate may be an intervention effect that does not actually exist in any population, and therefore have a confidence interval that is meaningless as well as being too narrow (see Section 10.10.4 ).  
  • Perform a random-effects meta-analysis. A random-effects meta-analysis may be used to incorporate heterogeneity among studies. This is not a substitute for a thorough investigation of heterogeneity. It is intended primarily for heterogeneity that cannot be explained. An extended discussion of this option appears in Section 10.10.4 .  
  • Reconsider the effect measure. Heterogeneity may be an artificial consequence of an inappropriate choice of effect measure. For example, when studies collect continuous outcome data using different scales or different units, extreme heterogeneity may be apparent when using the mean difference but not when the more appropriate standardized mean difference is used. Furthermore, choice of effect measure for dichotomous outcomes (odds ratio, risk ratio, or risk difference) may affect the degree of heterogeneity among results. In particular, when comparator group risks vary, homogeneous odds ratios or risk ratios will necessarily lead to heterogeneous risk differences, and vice versa. However, it remains unclear whether homogeneity of intervention effect in a particular meta-analysis is a suitable criterion for choosing between these measures (see also Section 10.4.3 ).  
  • Exclude studies. Heterogeneity may be due to the presence of one or two outlying studies with results that conflict with the rest of the studies. In general it is unwise to exclude studies from a meta-analysis on the basis of their results as this may introduce bias. However, if an obvious reason for the outlying result is apparent, the study might be removed with more confidence. Since usually at least one characteristic can be found for any study in any meta-analysis which makes it different from the others, this criterion is unreliable because it is all too easy to fulfil. It is advisable to perform analyses both with and without outlying studies as part of a sensitivity analysis (see Section 10.14 ). Whenever possible, potential sources of clinical diversity that might lead to such situations should be specified in the protocol.

10.10.4 Incorporating heterogeneity into random-effects models

The random-effects meta-analysis approach incorporates an assumption that the different studies are estimating different, yet related, intervention effects (DerSimonian and Laird 1986, Borenstein et al 2010). The approach allows us to address heterogeneity that cannot readily be explained by other factors. A random-effects meta-analysis model involves an assumption that the effects being estimated in the different studies follow some distribution. The model represents our lack of knowledge about why real, or apparent, intervention effects differ, by considering the differences as if they were random. The centre of the assumed distribution describes the average of the effects, while its width describes the degree of heterogeneity. The conventional choice of distribution is a normal distribution. It is difficult to establish the validity of any particular distributional assumption, and this is a common criticism of random-effects meta-analyses. The importance of the assumed shape for this distribution has not been widely studied.

To undertake a random-effects meta-analysis, the standard errors of the study-specific estimates (SE i in Section 10.3.1 ) are adjusted to incorporate a measure of the extent of variation, or heterogeneity, among the intervention effects observed in different studies (this variation is often referred to as Tau-squared, τ 2 , or Tau 2 ). The amount of variation, and hence the adjustment, can be estimated from the intervention effects and standard errors of the studies included in the meta-analysis.

In a heterogeneous set of studies, a random-effects meta-analysis will award relatively more weight to smaller studies than such studies would receive in a fixed-effect meta-analysis. This is because small studies are more informative for learning about the distribution of effects across studies than for learning about an assumed common intervention effect.
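
The following sketch continues the hypothetical data from the I² example above: it computes the DerSimonian and Laird moment-based estimate of Tau², the adjusted weights and the random-effects summary, and shows that the less precise studies gain relative weight compared with the fixed-effect analysis.

```python
import math

effects = [0.10, 0.32, 0.25, 0.60, 0.15]   # same hypothetical data as above
se = [0.12, 0.10, 0.18, 0.20, 0.15]

w = [1 / s**2 for s in se]
fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
df = len(effects) - 1

# DerSimonian-Laird moment-based estimate of the between-study variance (Tau²)
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights and summary: each study's variance is inflated by Tau²
w_re = [1 / (s**2 + tau2) for s in se]
re_mean = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
se_re = 1 / math.sqrt(sum(w_re))

print("Tau²:", tau2, "random-effects mean:", re_mean, "SE:", se_re)
print("fixed-effect relative weights:  ", [round(wi / sum(w), 3) for wi in w])
print("random-effects relative weights:", [round(wi / sum(w_re), 3) for wi in w_re])
```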

Note that a random-effects model does not ‘take account’ of the heterogeneity, in the sense that it is no longer an issue. It is always preferable to explore possible causes of heterogeneity, although there may be too few studies to do this adequately (see Section 10.11 ).

10.10.4.1 Fixed or random effects?

A fixed-effect meta-analysis provides a result that may be viewed as a ‘typical intervention effect’ from the studies included in the analysis. In order to calculate a confidence interval for a fixed-effect meta-analysis the assumption is usually made that the true effect of intervention (in both magnitude and direction) is the same value in every study (i.e. fixed across studies). This assumption implies that the observed differences among study results are due solely to the play of chance (i.e. that there is no statistical heterogeneity).

A random-effects model provides a result that may be viewed as an ‘average intervention effect’, where this average is explicitly defined according to an assumed distribution of effects across studies. Instead of assuming that the intervention effects are the same, we assume that they follow (usually) a normal distribution. The assumption implies that the observed differences among study results are due to a combination of the play of chance and some genuine variation in the intervention effects.

The random-effects method and the fixed-effect method will give identical results when there is no heterogeneity among the studies.

When heterogeneity is present, a confidence interval around the random-effects summary estimate is wider than a confidence interval around a fixed-effect summary estimate. This will happen whenever the I² statistic is greater than zero, even if the heterogeneity is not detected by the Chi² test for heterogeneity (see Section 10.10.2 ).

Sometimes the central estimate of the intervention effect is different between fixed-effect and random-effects analyses. In particular, if results of smaller studies are systematically different from results of larger ones, which can happen as a result of publication bias or within-study bias in smaller studies (Egger et al 1997, Poole and Greenland 1999, Kjaergard et al 2001), then a random-effects meta-analysis will exacerbate the effects of the bias (see also Chapter 13, Section 13.3.5.6 ). A fixed-effect analysis will be affected less, although strictly it will also be inappropriate.

The decision between fixed- and random-effects meta-analyses has been the subject of much debate, and we do not provide a universal recommendation. Some considerations in making this choice are as follows:

  • Many have argued that the decision should be based on an expectation of whether the intervention effects are truly identical, preferring the fixed-effect model if this is likely and a random-effects model if this is unlikely (Borenstein et al 2010). Since it is generally considered to be implausible that intervention effects across studies are identical (unless the intervention has no effect at all), this leads many to advocate use of the random-effects model.
  • Others have argued that a fixed-effect analysis can be interpreted in the presence of heterogeneity, and that it makes fewer assumptions than a random-effects meta-analysis. They then refer to it as a ‘fixed-effects’ meta-analysis (Peto et al 1995, Rice et al 2018).
  • Under any interpretation, a fixed-effect meta-analysis ignores heterogeneity. If the method is used, it is therefore important to supplement it with a statistical investigation of the extent of heterogeneity (see Section 10.10.2 ).
  • In the presence of heterogeneity, a random-effects analysis gives relatively more weight to smaller studies and relatively less weight to larger studies. If there is additionally some funnel plot asymmetry (i.e. a relationship between intervention effect magnitude and study size), then this will push the results of the random-effects analysis towards the findings in the smaller studies. In the context of randomized trials, this is generally regarded as an unfortunate consequence of the model.
  • A pragmatic approach is to plan to undertake both a fixed-effect and a random-effects meta-analysis, with an intention to present the random-effects result if there is no indication of funnel plot asymmetry. If there is an indication of funnel plot asymmetry, then both methods are problematic. It may be reasonable to present both analyses or neither, or to perform a sensitivity analysis in which small studies are excluded or addressed directly using meta-regression (see Chapter 13, Section 13.3.5.6 ).
  • The choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity.

10.10.4.2 Interpretation of random-effects meta-analyses

The summary estimate and confidence interval from a random-effects meta-analysis refer to the centre of the distribution of intervention effects, but do not describe the width of the distribution. Often the summary estimate and its confidence interval are quoted in isolation and portrayed as a sufficient summary of the meta-analysis. This is inappropriate. The confidence interval from a random-effects meta-analysis describes uncertainty in the location of the mean of systematically different effects in the different studies. It does not describe the degree of heterogeneity among studies, as may be commonly believed. For example, when there are many studies in a meta-analysis, we may obtain a very tight confidence interval around the random-effects estimate of the mean effect even when there is a large amount of heterogeneity. A solution to this problem is to consider a prediction interval (see Section 10.10.4.3 ).

Methodological diversity creates heterogeneity through biases variably affecting the results of different studies. The random-effects summary estimate will only correctly estimate the average intervention effect if the biases are symmetrically distributed, leading to a mixture of over-estimates and under-estimates of effect, which is unlikely to be the case. In practice it can be very difficult to distinguish whether heterogeneity results from clinical or methodological diversity, and in most cases it is likely to be due to both, so these distinctions are hard to draw in the interpretation.

When there is little information, either because there are few studies or if the studies are small with few events, a random-effects analysis will provide poor estimates of the amount of heterogeneity (i.e. of the width of the distribution of intervention effects). Fixed-effect methods such as the Mantel-Haenszel method will provide more robust estimates of the average intervention effect, but at the cost of ignoring any heterogeneity.

10.10.4.3 Prediction intervals from a random-effects meta-analysis

An estimate of the between-study variance in a random-effects meta-analysis is typically presented as part of its results. The square root of this number (i.e. Tau) is the estimated standard deviation of underlying effects across studies. Prediction intervals are a way of expressing this value in an interpretable way.

To motivate the idea of a prediction interval, note that for absolute measures of effect (e.g. risk difference, mean difference, standardized mean difference), an approximate 95% range of normally distributed underlying effects can be obtained by creating an interval from 1.96×Tau below the random-effects mean to 1.96×Tau above it. (For relative measures such as the odds ratio and risk ratio, an equivalent interval needs to be based on the natural logarithm of the summary estimate.) In reality, both the summary estimate and the value of Tau are associated with uncertainty. A prediction interval seeks to present the range of effects in a way that acknowledges this uncertainty (Higgins et al 2009). A simple 95% prediction interval can be calculated as:

M ± tk−2 × √(Tau² + SE(M)²)

where M is the summary mean from the random-effects meta-analysis, tk−2 is the 95% percentile of a t-distribution with k−2 degrees of freedom, k is the number of studies, Tau² is the estimated amount of heterogeneity and SE(M) is the standard error of the summary mean.
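
A minimal sketch of this calculation with hypothetical values; the critical value is taken as the two-sided 95% point of a t-distribution with k−2 degrees of freedom:

```python
import math
from scipy import stats

m, se_m = 0.28, 0.07    # random-effects summary mean and its standard error (hypothetical)
tau2 = 0.04             # estimated between-study variance, Tau² (hypothetical)
k = 12                  # number of studies (hypothetical)

t_crit = stats.t.ppf(0.975, df=k - 2)             # two-sided 95% critical value
half_width = t_crit * math.sqrt(tau2 + se_m**2)
print("95% prediction interval:", m - half_width, "to", m + half_width)
```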

The term ‘prediction interval’ relates to the use of this interval to predict the possible underlying effect in a new study that is similar to the studies in the meta-analysis. A more useful interpretation of the interval is as a summary of the spread of underlying effects in the studies included in the random-effects meta-analysis.

Prediction intervals have proved a popular way of expressing the amount of heterogeneity in a meta-analysis (Riley et al 2011). They are, however, strongly based on the assumption of a normal distribution for the effects across studies, and can be very problematic when the number of studies is small, in which case they can appear spuriously wide or spuriously narrow. Nevertheless, we encourage their use when the number of studies is reasonable (e.g. more than ten) and there is no clear funnel plot asymmetry.

10.10.4.4 Implementing random-effects meta-analyses

As introduced in Section 10.3.2 , the random-effects model can be implemented using an inverse-variance approach, incorporating a measure of the extent of heterogeneity into the study weights. RevMan implements a version of random-effects meta-analysis that is described by DerSimonian and Laird, making use of a ‘moment-based’ estimate of the between-study variance (DerSimonian and Laird 1986). The attraction of this method is that the calculations are straightforward, but it has a theoretical disadvantage in that the confidence intervals are slightly too narrow to encompass full uncertainty resulting from having estimated the degree of heterogeneity.

For many years, RevMan has implemented two random-effects methods for dichotomous data: a Mantel-Haenszel method and an inverse-variance method. Both use the moment-based approach to estimating the amount of between-studies variation. The difference between the two is subtle: the former estimates the between-study variation by comparing each study’s result with a Mantel-Haenszel fixed-effect meta-analysis result, whereas the latter estimates it by comparing each study’s result with an inverse-variance fixed-effect meta-analysis result. In practice, the difference is likely to be trivial.

There are alternative methods for performing random-effects meta-analyses that have better technical properties than the DerSimonian and Laird approach with a moment-based estimate (Veroniki et al 2016). Most notable among these is an adjustment to the confidence interval proposed by Hartung and Knapp and by Sidik and Jonkman (Hartung and Knapp 2001, Sidik and Jonkman 2002). This adjustment widens the confidence interval to reflect uncertainty in the estimation of between-study heterogeneity, and it should be used if available to review authors. An alternative option to encompass full uncertainty in the degree of heterogeneity is to take a Bayesian approach (see Section 10.13 ).

An empirical comparison of different ways to estimate between-study variation in Cochrane meta-analyses has shown that they can lead to substantial differences in estimates of heterogeneity, but seldom have major implications for estimating summary effects (Langan et al 2015). Several simulation studies have concluded that an approach proposed by Paule and Mandel should be recommended (Langan et al 2017); whereas a comprehensive recent simulation study recommended a restricted maximum likelihood approach, although noted that no single approach is universally preferable (Langan et al 2019). Review authors are encouraged to select one of these options if it is available to them.

10.11 Investigating heterogeneity

10.11.1 Interaction and effect modification

Does the intervention effect vary with different populations or intervention characteristics (such as dose or duration)? Such variation is known as interaction by statisticians and as effect modification by epidemiologists. Methods to search for such interactions include subgroup analyses and meta-regression. All methods have considerable pitfalls.

10.11.2 What are subgroup analyses?

Subgroup analyses involve splitting all the participant data into subgroups, often in order to make comparisons between them. Subgroup analyses may be done for subsets of participants (such as males and females), or for subsets of studies (such as different geographical locations). Subgroup analyses may be done as a means of investigating heterogeneous results, or to answer specific questions about particular patient groups, types of intervention or types of study.

Subgroup analyses of subsets of participants within studies are uncommon in systematic reviews based on published literature because sufficient details to extract data about separate participant types are seldom published in reports. By contrast, such subsets of participants are easily analysed when individual participant data have been collected (see Chapter 26 ). The methods we describe in the remainder of this chapter are for subgroups of studies.

Findings from multiple subgroup analyses may be misleading. Subgroup analyses are observational by nature and are not based on randomized comparisons. False negative and false positive significance tests increase in likelihood rapidly as more subgroup analyses are performed. If their findings are presented as definitive conclusions there is clearly a risk of people being denied an effective intervention or treated with an ineffective (or even harmful) intervention. Subgroup analyses can also generate misleading recommendations about directions for future research that, if followed, would waste scarce resources.

It is useful to distinguish between the notions of ‘qualitative interaction’ and ‘quantitative interaction’ (Yusuf et al 1991). Qualitative interaction exists if the direction of effect is reversed, that is if an intervention is beneficial in one subgroup but is harmful in another. Qualitative interaction is rare. This may be used as an argument that the most appropriate result of a meta-analysis is the overall effect across all subgroups. Quantitative interaction exists when the size of the effect varies but not the direction, that is if an intervention is beneficial to different degrees in different subgroups.

10.11.3 Undertaking subgroup analyses

Meta-analyses can be undertaken in RevMan both within subgroups of studies as well as across all studies irrespective of their subgroup membership. It is tempting to compare effect estimates in different subgroups by considering the meta-analysis results from each subgroup separately. This should only be done informally by comparing the magnitudes of effect. Noting that either the effect or the test for heterogeneity in one subgroup is statistically significant whilst that in the other subgroup is not statistically significant does not indicate that the subgroup factor explains heterogeneity. Since different subgroups are likely to contain different amounts of information and thus have different abilities to detect effects, it is extremely misleading simply to compare the statistical significance of the results.

10.11.3.1 Is the effect different in different subgroups?

Valid investigations of whether an intervention works differently in different subgroups involve comparing the subgroups with each other. It is a mistake to compare within-subgroup inferences such as P values. If one subgroup analysis is statistically significant and another is not, then the latter may simply reflect a lack of information rather than a smaller (or absent) effect. When there are only two subgroups, non-overlap of the confidence intervals indicates statistical significance, but note that the confidence intervals can overlap to a small degree and the difference still be statistically significant.

A formal statistical approach should be used to examine differences among subgroups (see MECIR Box 10.11.a ). A simple significance test to investigate differences between two or more subgroups can be performed (Borenstein and Higgins 2013). This procedure consists of undertaking a standard test for heterogeneity across subgroup results rather than across individual study results. When the meta-analysis uses a fixed-effect inverse-variance weighted average approach, the method is exactly equivalent to the test described by Deeks and colleagues (Deeks et al 2001). An I² statistic is also computed for subgroup differences. This describes the percentage of the variability in effect estimates from the different subgroups that is due to genuine subgroup differences rather than sampling error (chance). Note that these methods for examining subgroup differences should be used only when the data in the subgroups are independent (i.e. they should not be used if the same study participants contribute to more than one of the subgroups in the forest plot).
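
A minimal sketch of this test for two subgroups, treating the subgroup summary estimates and their standard errors (hypothetical values) as if they were individual studies in a heterogeneity test:

```python
from scipy import stats

subgroup_effects = [0.15, 0.42]   # pooled effect in each subgroup (hypothetical)
subgroup_se = [0.08, 0.10]        # standard error of each subgroup estimate (hypothetical)

w = [1 / s**2 for s in subgroup_se]
overall = sum(wi * y for wi, y in zip(w, subgroup_effects)) / sum(w)

q_between = sum(wi * (y - overall) ** 2 for wi, y in zip(w, subgroup_effects))
df = len(subgroup_effects) - 1
p_value = stats.chi2.sf(q_between, df)                       # test for subgroup differences
i2_between = max(0.0, (q_between - df) / q_between) * 100    # I² for subgroup differences

print(q_between, p_value, i2_between)
```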

If fixed-effect models are used for the analysis within each subgroup, then these statistics relate to differences in typical effects across different subgroups. If random-effects models are used for the analysis within each subgroup, then the statistics relate to variation in the mean effects in the different subgroups.

An alternative method for testing for differences between subgroups is to use meta-regression techniques, in which case a random-effects model is generally preferred (see Section 10.11.4 ). Tests for subgroup differences based on random-effects models may be regarded as preferable to those based on fixed-effect models, due to the high risk of false-positive results when a fixed-effect model is used to compare subgroups (Higgins and Thompson 2004).

MECIR Box 10.11.a Relevant expectations for conduct of intervention reviews

Comparing subgroups

Concluding that there is a difference in effect in different subgroups on the basis of differences in the level of statistical significance within subgroups can be very misleading.

10.11.4 Meta-regression

If studies are divided into subgroups (see Section 10.11.2 ), this may be viewed as an investigation of how a categorical study characteristic is associated with the intervention effects in the meta-analysis. For example, studies in which allocation sequence concealment was adequate may yield different results from those in which it was inadequate. Here, allocation sequence concealment, being either adequate or inadequate, is a categorical characteristic at the study level. Meta-regression is an extension to subgroup analyses that allows the effect of continuous, as well as categorical, characteristics to be investigated, and in principle allows the effects of multiple factors to be investigated simultaneously (although this is rarely possible due to inadequate numbers of studies) (Thompson and Higgins 2002). Meta-regression should generally not be considered when there are fewer than ten studies in a meta-analysis.

Meta-regressions are similar in essence to simple regressions, in which an outcome variable is predicted according to the values of one or more explanatory variables . In meta-regression, the outcome variable is the effect estimate (for example, a mean difference, a risk difference, a log odds ratio or a log risk ratio). The explanatory variables are characteristics of studies that might influence the size of intervention effect. These are often called ‘potential effect modifiers’ or covariates. Meta-regressions usually differ from simple regressions in two ways. First, larger studies have more influence on the relationship than smaller studies, since studies are weighted by the precision of their respective effect estimate. Second, it is wise to allow for the residual heterogeneity among intervention effects not modelled by the explanatory variables. This gives rise to the term ‘random-effects meta-regression’, since the extra variability is incorporated in the same way as in a random-effects meta-analysis (Thompson and Sharp 1999).

The regression coefficient obtained from a meta-regression analysis will describe how the outcome variable (the intervention effect) changes with a unit increase in the explanatory variable (the potential effect modifier). The statistical significance of the regression coefficient is a test of whether there is a linear relationship between intervention effect and the explanatory variable. If the intervention effect is a ratio measure, the log-transformed value of the intervention effect should always be used in the regression model (see Chapter 6, Section 6.1.2.1 ), and the exponential of the regression coefficient will give an estimate of the relative change in intervention effect with a unit increase in the explanatory variable.

Meta-regression can also be used to investigate differences for categorical explanatory variables as done in subgroup analyses. If there are J subgroups, membership of particular subgroups is indicated by using J minus 1 dummy variables (which can only take values of zero or one) in the meta-regression model (as in standard linear regression modelling). The regression coefficients will estimate how the intervention effect in each subgroup differs from a nominated reference subgroup. The P value of each regression coefficient will indicate the strength of evidence against the null hypothesis that the characteristic is not associated with the intervention effect.

Meta-regression may be performed using the ‘metareg’ macro available for the Stata statistical package, or using the ‘metafor’ package for R, as well as other packages.
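
For orientation only, the following numpy sketch fits a fixed-effect (inverse-variance weighted) meta-regression of hypothetical effect estimates on a hypothetical dose covariate; unlike the packages above, it does not model residual between-study heterogeneity, which a random-effects meta-regression would estimate.

```python
import numpy as np

y = np.array([0.10, 0.25, 0.40, 0.55])    # study effect estimates, e.g. log odds ratios (hypothetical)
se = np.array([0.12, 0.10, 0.15, 0.20])   # their standard errors (hypothetical)
dose = np.array([5.0, 10.0, 20.0, 40.0])  # potential effect modifier (hypothetical)

X = np.column_stack([np.ones_like(dose), dose])   # design matrix: intercept + covariate
W = np.diag(1 / se**2)                            # inverse-variance weights

xtwx_inv = np.linalg.inv(X.T @ W @ X)
beta = xtwx_inv @ X.T @ W @ y            # regression coefficients
se_beta = np.sqrt(np.diag(xtwx_inv))     # fixed-effect standard errors of the coefficients

print("change in effect per unit dose:", beta[1], "SE:", se_beta[1])
```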

10.11.5 Selection of study characteristics for subgroup analyses and meta-regression

Authors need to be cautious about undertaking subgroup analyses, and interpreting any that they do. Some considerations are outlined here for selecting characteristics (also called explanatory variables, potential effect modifiers or covariates) that will be investigated for their possible influence on the size of the intervention effect. These considerations apply similarly to subgroup analyses and to meta-regressions. Further details may be obtained elsewhere (Oxman and Guyatt 1992, Berlin and Antman 1994).

10.11.5.1 Ensure that there are adequate studies to justify subgroup analyses and meta-regressions

It is very unlikely that an investigation of heterogeneity will produce useful findings unless there is a substantial number of studies. A typical recommendation for simple regression analyses is that at least ten observations (i.e. ten studies in a meta-analysis) should be available for each characteristic modelled. However, even this will be too few when the covariates are unevenly distributed across studies.

10.11.5.2 Specify characteristics in advance

Authors should, whenever possible, pre-specify characteristics in the protocol that later will be subject to subgroup analyses or meta-regression. The plan specified in the protocol should then be followed (data permitting), without undue emphasis on any particular findings (see MECIR Box 10.11.b ). Pre-specifying characteristics reduces the likelihood of spurious findings, first by limiting the number of subgroups investigated, and second by preventing knowledge of the studies’ results influencing which subgroups are analysed. True pre-specification is difficult in systematic reviews, because the results of some of the relevant studies are often known when the protocol is drafted. If a characteristic was overlooked in the protocol, but is clearly of major importance and justified by external evidence, then authors should not be reluctant to explore it. However, such post-hoc analyses should be identified as such.

MECIR Box 10.11.b Relevant expectations for conduct of intervention reviews

Interpreting subgroup analyses

If subgroup analyses are conducted, selective reporting, or over-interpretation, of particular subgroups or particular subgroup analyses should be avoided. This is a problem especially when multiple subgroup analyses are performed. This does not preclude the use of sensible and honest post hoc subgroup analyses.

10.11.5.3 Select a small number of characteristics

The likelihood of a false-positive result among subgroup analyses and meta-regression increases with the number of characteristics investigated. It is difficult to suggest a maximum number of characteristics to look at, especially since the number of available studies is unknown in advance. If more than one or two characteristics are investigated it may be sensible to adjust the level of significance to account for making multiple comparisons.

10.11.5.4 Ensure there is scientific rationale for investigating each characteristic

Selection of characteristics should be motivated by biological and clinical hypotheses, ideally supported by evidence from sources other than the included studies. Subgroup analyses using characteristics that are implausible or clinically irrelevant are not likely to be useful and should be avoided. For example, a relationship between intervention effect and year of publication is seldom in itself clinically informative, and if identified runs the risk of initiating a post-hoc data dredge of factors that may have changed over time.

Prognostic factors are those that predict the outcome of a disease or condition, whereas effect modifiers are factors that influence how well an intervention works in affecting the outcome. Confusion between prognostic factors and effect modifiers is common in planning subgroup analyses, especially at the protocol stage. Prognostic factors are not good candidates for subgroup analyses unless they are also believed to modify the effect of intervention. For example, being a smoker may be a strong predictor of mortality within the next ten years, but there may not be reason for it to influence the effect of a drug therapy on mortality (Deeks 1998). Potential effect modifiers may include participant characteristics (age, setting), the precise interventions (dose of active intervention, choice of comparison intervention), how the study was done (length of follow-up) or methodology (design and quality).

10.11.5.5 Be aware that the effect of a characteristic may not always be identified

Many characteristics that might have important effects on how well an intervention works cannot be investigated using subgroup analysis or meta-regression. These are characteristics of participants that might vary substantially within studies, but that can only be summarized at the level of the study. An example is age. Consider a collection of clinical trials involving adults ranging from 18 to 60 years old. There may be a strong relationship between age and intervention effect that is apparent within each study. However, if the mean ages for the trials are similar, then no relationship will be apparent by looking at trial mean ages and trial-level effect estimates. The problem is one of aggregating individuals’ results and is variously known as aggregation bias, ecological bias or the ecological fallacy (Morgenstern 1982, Greenland 1987, Berlin et al 2002). It is even possible for the direction of the relationship across studies to be the opposite of the direction of the relationship observed within each study.

10.11.5.6 Think about whether the characteristic is closely related to another characteristic (confounded)

The problem of ‘confounding’ complicates interpretation of subgroup analyses and meta-regressions and can lead to incorrect conclusions. Two characteristics are confounded if their influences on the intervention effect cannot be disentangled. For example, if those studies implementing an intensive version of a therapy happened to be the studies that involved patients with more severe disease, then one cannot tell which aspect is the cause of any difference in effect estimates between these studies and others. In meta-regression, co-linearity between potential effect modifiers leads to similar difficulties (Berlin and Antman 1994). Computing correlations between study characteristics will give some information about which study characteristics may be confounded with each other.

10.11.6 Interpretation of subgroup analyses and meta-regressions

Appropriate interpretation of subgroup analyses and meta-regressions requires caution (Oxman and Guyatt 1992).

  • Subgroup comparisons are observational. It must be remembered that subgroup analyses and meta-regressions are entirely observational in their nature. These analyses investigate differences between studies. Even if individuals are randomized to one group or other within a clinical trial, they are not randomized to go in one trial or another. Hence, subgroup analyses suffer the limitations of any observational investigation, including possible bias through confounding by other study-level characteristics. Furthermore, even a genuine difference between subgroups is not necessarily due to the classification of the subgroups. As an example, a subgroup analysis of bone marrow transplantation for treating leukaemia might show a strong association between the age of a sibling donor and the success of the transplant. However, this probably does not mean that the age of donor is important. In fact, the age of the recipient is probably a key factor and the subgroup finding would simply be due to the strong association between the age of the recipient and the age of their sibling.  
  • Was the analysis pre-specified or post hoc? Authors should state whether subgroup analyses were pre-specified or undertaken after the results of the studies had been compiled (post hoc). More reliance may be placed on a subgroup analysis if it was one of a small number of pre-specified analyses. Performing numerous post-hoc subgroup analyses to explain heterogeneity is a form of data dredging. Data dredging is condemned because it is usually possible to find an apparent, but false, explanation for heterogeneity by considering lots of different characteristics.  
  • Is there indirect evidence in support of the findings? Differences between subgroups should be clinically plausible and supported by other external or indirect evidence, if they are to be convincing.  
  • Is the magnitude of the difference practically important? If the magnitude of a difference between subgroups will not result in different recommendations for different subgroups, then it may be better to present only the overall analysis results.  
  • Is there a statistically significant difference between subgroups? To establish whether there is a different effect of an intervention in different situations, the magnitudes of effects in different subgroups should be compared directly with each other. In particular, statistical significance of the results within separate subgroup analyses should not be compared (see Section 10.11.3.1); a minimal sketch of such a direct comparison appears after this list.
  • Are analyses looking at within-study or between-study relationships? For patient and intervention characteristics, differences in subgroups that are observed within studies are more reliable than analyses of subsets of studies. If such within-study relationships are replicated across studies then this adds confidence to the findings.
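
A direct comparison of two subgroup estimates can be made by taking the difference between the estimates and dividing by the standard error of that difference. The sketch below is a minimal illustration under a fixed-effect assumption, with invented subgroup summaries on the log odds ratio scale; it is not a substitute for the formal tests for subgroup differences described in Section 10.11.3.1.

    from math import sqrt
    from scipy.stats import norm

    est_a, se_a = -0.45, 0.12    # pooled log odds ratio and SE, subgroup A (invented)
    est_b, se_b = -0.15, 0.14    # pooled log odds ratio and SE, subgroup B (invented)

    diff = est_a - est_b
    se_diff = sqrt(se_a**2 + se_b**2)
    z = diff / se_diff
    p = 2 * norm.sf(abs(z))      # two-sided test of the subgroup difference

    print(f"difference in log OR = {diff:.2f}, z = {z:.2f}, p = {p:.3f}")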

10.11.7 Investigating the effect of underlying risk

One potentially important source of heterogeneity among a series of studies is when the underlying average risk of the outcome event varies between the studies. The underlying risk of a particular event may be viewed as an aggregate measure of case-mix factors such as age or disease severity. It is generally measured as the observed risk of the event in the comparator group of each study (the comparator group risk, or CGR). The notion is controversial in its relevance to clinical practice since underlying risk represents a summary of both known and unknown risk factors. Problems also arise because comparator group risk will depend on the length of follow-up, which often varies across studies. However, underlying risk has received particular attention in meta-analysis because the information is readily available once dichotomous data have been prepared for use in meta-analyses. Sharp provides a full discussion of the topic (Sharp 2001).

Intuition would suggest that participants are more or less likely to benefit from an effective intervention according to their risk status. However, the relationship between underlying risk and intervention effect is a complicated issue. For example, suppose an intervention is equally beneficial in the sense that for all patients it reduces the risk of an event, say a stroke, to 80% of the underlying risk. Then it is not equally beneficial in terms of absolute differences in risk in the sense that it reduces a 50% stroke rate by 10 percentage points to 40% (number needed to treat=10), but a 20% stroke rate by 4 percentage points to 16% (number needed to treat=25).
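
The arithmetic in this example can be reproduced directly: with a constant risk ratio, the absolute risk reduction (and hence the number needed to treat) depends on the underlying risk.

    # Reproducing the worked example above: a risk ratio of 0.8 applied to
    # different comparator group risks (CGRs).
    def absolute_benefit(comparator_risk, risk_ratio=0.8):
        treated_risk = comparator_risk * risk_ratio
        arr = comparator_risk - treated_risk       # absolute risk reduction
        nnt = 1 / arr                              # number needed to treat
        return treated_risk, arr, nnt

    for cgr in (0.50, 0.20):
        treated, arr, nnt = absolute_benefit(cgr)
        print(f"CGR {cgr:.0%}: treated risk {treated:.0%}, ARR {arr:.0%}, NNT {nnt:.0f}")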

Use of different summary statistics (risk ratio, odds ratio and risk difference) will demonstrate different relationships with underlying risk. Summary statistics that show close to no relationship with underlying risk are generally preferred for use in meta-analysis (see Section 10.4.3 ).

Investigating any relationship between effect estimates and the comparator group risk is also complicated by a technical phenomenon known as regression to the mean. This arises because the comparator group risk forms an integral part of the effect estimate. A high risk in a comparator group, observed entirely by chance, will on average give rise to a higher than expected effect estimate, and vice versa. This phenomenon results in a false correlation between effect estimates and comparator group risks. There are methods, which require sophisticated software, that correct for regression to the mean (McIntosh 1996, Thompson et al 1997). These should be used for such analyses, and statistical expertise is recommended.

10.11.8 Dose-response analyses

The principles of meta-regression can be applied to the relationships between intervention effect and dose (commonly termed dose-response), treatment intensity or treatment duration (Greenland and Longnecker 1992, Berlin et al 1993). Conclusions about differences in effect due to differences in dose (or similar factors) are on stronger ground if participants are randomized to one dose or another within a study and a consistent relationship is found across similar studies. While authors should consider these effects, particularly as a possible explanation for heterogeneity, they should be cautious about drawing conclusions based on between-study differences. Authors should be particularly cautious about claiming that a dose-response relationship does not exist, given the low power of many meta-regression analyses to detect genuine relationships.
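
A dose-response meta-regression can be sketched as an inverse-variance weighted regression of study-level effect estimates on dose. The example below uses invented doses, log odds ratios and standard errors; real analyses should use dedicated meta-analysis software, which accounts for residual heterogeneity and gives appropriate standard errors.

    import numpy as np
    import statsmodels.api as sm

    dose   = np.array([10, 20, 20, 40, 40, 80])                    # mg/day per trial (invented)
    log_or = np.array([-0.10, -0.22, -0.15, -0.35, -0.30, -0.55])  # invented effect estimates
    se     = np.array([0.20, 0.18, 0.25, 0.15, 0.22, 0.19])

    X = sm.add_constant(dose)                      # intercept plus dose slope
    fit = sm.WLS(log_or, X, weights=1 / se**2).fit()
    print(fit.params)                              # slope: change in log OR per mg/day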

10.12 Missing data

10.12.1 Types of missing data

There are many potential sources of missing data in a systematic review or meta-analysis (see Table 10.12.a ). For example, a whole study may be missing from the review, an outcome may be missing from a study, summary data may be missing for an outcome, and individual participants may be missing from the summary data. Here we discuss a variety of potential sources of missing data, highlighting where more detailed discussions are available elsewhere in the Handbook .

Whole studies may be missing from a review because they are never published, are published in obscure places, are rarely cited, or are inappropriately indexed in databases. Thus, review authors should always be aware of the possibility that they have failed to identify relevant studies. There is a strong possibility that such studies are missing because of their ‘uninteresting’ or ‘unwelcome’ findings (that is, in the presence of publication bias). This problem is discussed at length in Chapter 13 . Details of comprehensive search methods are provided in Chapter 4 .

Some studies might not report any information on outcomes of interest to the review. For example, there may be no information on quality of life, or on serious adverse effects. It is often difficult to determine whether this is because the outcome was not measured or because the outcome was not reported. Furthermore, failure to report that outcomes were measured may be dependent on the unreported results (selective outcome reporting bias; see Chapter 7, Section 7.2.3.3). Similarly, summary data for an outcome, in a form that can be included in a meta-analysis, may be missing. A common example is missing standard deviations (SDs) for continuous outcomes. This is often a problem when change-from-baseline outcomes are sought. We discuss imputation of missing SDs in Chapter 6, Section 6.5.2.8. Other examples of missing summary data are missing sample sizes (particularly those for each intervention group separately), numbers of events, standard errors, follow-up times for calculating rates, and sufficient details of time-to-event outcomes. Inappropriate analyses of studies, for example of cluster-randomized and crossover trials, can lead to missing summary data. It is sometimes possible to approximate the correct analyses of such studies, for example by imputing correlation coefficients or SDs, as discussed in Chapter 23, Section 23.1, for cluster-randomized studies and Chapter 23, Section 23.2, for crossover trials. As a general rule, most methodologists believe that missing summary data (e.g. ‘no usable data’) should not be used as a reason to exclude a study from a systematic review. It is more appropriate to include the study in the review, and to discuss the potential implications of its absence from a meta-analysis.
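
Two common back-calculations for a missing standard deviation of a single group mean, described in Chapter 6, are worth illustrating here: recovering the SD from a reported standard error, or from a 95% confidence interval using a large-sample normal approximation. The numbers below are invented, and small samples would call for the t distribution rather than 1.96.

    from math import sqrt

    def sd_from_se(se, n):
        # SE = SD / sqrt(n), so SD = SE * sqrt(n)
        return se * sqrt(n)

    def sd_from_ci(lower, upper, n, z=1.96):
        # the width of a 95% CI for a mean is approximately 2 * z * SE
        return sqrt(n) * (upper - lower) / (2 * z)

    print(sd_from_se(se=0.8, n=50))
    print(sd_from_ci(lower=2.5, upper=5.5, n=40))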

It is likely that in some, if not all, included studies, there will be individuals missing from the reported results. Review authors are encouraged to consider this problem carefully (see MECIR Box 10.12.a ). We provide further discussion of this problem in Section 10.12.3 ; see also Chapter 8, Section 8.5 .

Missing data can also affect subgroup analyses. If subgroup analyses or meta-regressions are planned (see Section 10.11 ), they require details of the study-level characteristics that distinguish studies from one another. If these are not available for all studies, review authors should consider asking the study authors for more information.

Table 10.12.a Types of missing data in a meta-analysis, with possible reasons

  • Missing studies: publication bias; search not sufficiently comprehensive.
  • Missing outcomes: outcome not measured; selective reporting bias.
  • Missing summary data: selective reporting bias; incomplete reporting.
  • Missing individuals: lack of intention-to-treat analysis; attrition from the study; selective reporting bias.
  • Missing study-level characteristics (for subgroup analysis or meta-regression): characteristic not measured; incomplete reporting.

MECIR Box 10.12.a Relevant expectations for conduct of intervention reviews

Addressing missing outcome data

Incomplete outcome data can introduce bias. In most circumstances, authors should follow the principles of intention-to-treat analyses as far as possible (this may not be appropriate for adverse effects or if trying to demonstrate equivalence). Risk of bias due to incomplete outcome data is addressed in the Cochrane risk-of-bias tool. However, statistical analyses and careful interpretation of results are additional ways in which the issue can be addressed by review authors. Imputation methods can be considered (accompanied by, or in the form of, sensitivity analyses).

10.12.2 General principles for dealing with missing data

There is a large literature of statistical methods for dealing with missing data. Here we briefly review some key concepts and make some general recommendations for Cochrane Review authors. It is important to think about why data may be missing. Statisticians often use the terms ‘missing at random’ and ‘not missing at random’ to represent different scenarios.

Data are said to be ‘missing at random’ if the fact that they are missing is unrelated to actual values of the missing data. For instance, if some quality-of-life questionnaires were lost in the postal system, this would be unlikely to be related to the quality of life of the trial participants who completed the forms. In some circumstances, statisticians distinguish between data ‘missing at random’ and data ‘missing completely at random’, although in the context of a systematic review the distinction is unlikely to be important. Data that are missing at random may not be important. Analyses based on the available data will often be unbiased, although based on a smaller sample size than the original data set.

Data are said to be ‘not missing at random’ if the fact that they are missing is related to the actual missing data. For instance, in a depression trial, participants who had a relapse of depression might be less likely to attend the final follow-up interview, and more likely to have missing outcome data. Such data are ‘non-ignorable’ in the sense that an analysis of the available data alone will typically be biased. Publication bias and selective reporting bias lead by definition to data that are ‘not missing at random’, and attrition and exclusions of individuals within studies often do as well.

The principal options for dealing with missing data are:

  1. analysing only the available data (i.e. ignoring the missing data);
  2. imputing the missing data with replacement values, and treating these as if they were observed (e.g. last observation carried forward, imputing an assumed outcome such as assuming all were poor outcomes, imputing the mean, imputing based on predicted values from a regression analysis);
  3. imputing the missing data and accounting for the fact that these were imputed with uncertainty (e.g. multiple imputation, simple imputation methods (as point 2) with adjustment to the standard error); and
  4. using statistical models to allow for missing data, making assumptions about their relationships with the available data.

Option 2 is practical in most circumstances and very commonly used in systematic reviews. However, it fails to acknowledge uncertainty in the imputed values and results, typically, in confidence intervals that are too narrow. Options 3 and 4 would require involvement of a knowledgeable statistician.
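
The contrast between options 1 and 2 can be illustrated for a dichotomous outcome in a single trial arm, using invented numbers: analysing only the observed participants, versus imputing all missing participants as having had the (poor) outcome. Because option 2 treats imputed values as if they had been observed, confidence intervals built on such counts will typically be too narrow.

    events, observed, randomized = 30, 80, 100     # invented counts; 20 participants missing

    risk_available_case = events / observed
    risk_missing_as_events = (events + (randomized - observed)) / randomized

    print(f"available-case risk (option 1):       {risk_available_case:.2f}")
    print(f"missing imputed as events (option 2): {risk_missing_as_events:.2f}")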

Five general recommendations for dealing with missing data in Cochrane Reviews are as follows:

  • Whenever possible, contact the original investigators to request missing data.
  • Make explicit the assumptions of any methods used to address missing data: for example, that the data are assumed missing at random, or that missing values were assumed to have a particular value such as a poor outcome.
  • Follow the guidance in Chapter 8 to assess risk of bias due to missing outcome data in randomized trials.
  • Perform sensitivity analyses to assess how sensitive results are to reasonable changes in the assumptions that are made (see Section 10.14 ).
  • Address the potential impact of missing data on the findings of the review in the Discussion section.

10.12.3 Dealing with missing outcome data from individual participants

Review authors may undertake sensitivity analyses to assess the potential impact of missing outcome data, based on assumptions about the relationship between missingness in the outcome and its true value. Several methods are available (Akl et al 2015). For dichotomous outcomes, Higgins and colleagues propose a strategy involving different assumptions about how the risk of the event among the missing participants differs from the risk of the event among the observed participants, taking account of uncertainty introduced by the assumptions (Higgins et al 2008a). Akl and colleagues propose a suite of simple imputation methods, including a similar approach to that of Higgins and colleagues based on relative risks of the event in missing versus observed participants. Similar ideas can be applied to continuous outcome data (Ebrahim et al 2013, Ebrahim et al 2014). Particular care is required to avoid double counting events, since it can be unclear whether reported numbers of events in trial reports apply to the full randomized sample or only to those who did not drop out (Akl et al 2016).

Although there is a tradition of implementing ‘worst case’ and ‘best case’ analyses clarifying the extreme boundaries of what is theoretically possible, such analyses may not be informative for the most plausible scenarios (Higgins et al 2008a).
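
A minimal sketch of this kind of sensitivity analysis for a dichotomous outcome is shown below. It assumes, as a simplification, that the risk of the event among participants with missing data is a fixed multiple of the risk among those who were followed up, applied equally to both arms; the counts and the range of multipliers are invented, and the published methods cited above (Higgins et al 2008a, Akl et al 2015) are more elaborate.

    def adjusted_risk(events, followed, randomized, rr_missing):
        # assume risk among missing participants = rr_missing * risk among observed
        observed_risk = events / followed
        missing = randomized - followed
        imputed_events = rr_missing * observed_risk * missing
        return (events + imputed_events) / randomized

    # intervention arm: 20/90 observed events, 100 randomized (invented)
    # comparator arm:   35/85 observed events, 100 randomized (invented)
    for rr_missing in (1.0, 1.5, 2.0):
        r1 = adjusted_risk(20, 90, 100, rr_missing)
        r0 = adjusted_risk(35, 85, 100, rr_missing)
        print(f"risk ratio in missing vs observed = {rr_missing}: trial RR = {r1 / r0:.2f}")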

10.13 Bayesian approaches to meta-analysis

Bayesian statistics is an approach to statistics based on a different philosophy from that which underlies significance tests and confidence intervals. It is essentially about updating of evidence. In a Bayesian analysis, initial uncertainty is expressed through a prior distribution about the quantities of interest. Current data and assumptions concerning how they were generated are summarized in the likelihood . The posterior distribution for the quantities of interest can then be obtained by combining the prior distribution and the likelihood. The likelihood summarizes both the data from studies included in the meta-analysis (for example, 2×2 tables from randomized trials) and the meta-analysis model (for example, assuming a fixed effect or random effects). The result of the analysis is usually presented as a point estimate and 95% credible interval from the posterior distribution for each quantity of interest, which look much like classical estimates and confidence intervals. Potential advantages of Bayesian analyses are summarized in Box 10.13.a . Bayesian analysis may be performed using WinBUGS software (Smith et al 1995, Lunn et al 2000), within R (Röver 2017), or – for some applications – using standard meta-regression software with a simple trick (Rhodes et al 2016).

A difference between Bayesian analysis and classical meta-analysis is that the interpretation is directly in terms of belief: a 95% credible interval for an odds ratio is that region in which we believe the odds ratio to lie with probability 95%. This is how many practitioners actually interpret a classical confidence interval, but strictly in the classical framework the 95% refers to the long-term frequency with which 95% intervals contain the true value. The Bayesian framework also allows a review author to calculate the probability that the odds ratio has a particular range of values, which cannot be done in the classical framework. For example, we can determine the probability that the odds ratio is less than 1 (which might indicate a beneficial effect of an experimental intervention), or that it is no larger than 0.8 (which might indicate a clinically important effect). It should be noted that these probabilities are specific to the choice of the prior distribution. Different meta-analysts may analyse the same data using different prior distributions and obtain different results. It is therefore important to carry out sensitivity analyses to investigate how the results depend on any assumptions made.

In the context of a meta-analysis, prior distributions are needed for the particular intervention effect being analysed (such as the odds ratio or the mean difference) and – in the context of a random-effects meta-analysis – on the amount of heterogeneity among intervention effects across studies. Prior distributions may represent subjective belief about the size of the effect, or may be derived from sources of evidence not included in the meta-analysis, such as information from non-randomized studies of the same intervention or from randomized trials of other interventions. The width of the prior distribution reflects the degree of uncertainty about the quantity. When there is little or no information, a ‘non-informative’ prior can be used, in which all values across the possible range are equally likely.

Most Bayesian meta-analyses use non-informative (or very weakly informative) prior distributions to represent beliefs about intervention effects, since many regard it as controversial to combine objective trial data with subjective opinion. However, prior distributions are increasingly used for the extent of among-study variation in a random-effects analysis. This is particularly advantageous when the number of studies in the meta-analysis is small, say fewer than five or ten. Libraries of data-based prior distributions are available that have been derived from re-analyses of many thousands of meta-analyses in the Cochrane Database of Systematic Reviews (Turner et al 2012).
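
For a single quantity with a normal likelihood and a normal prior, the Bayesian update can be written in closed form, which is enough to illustrate the probability statements described above. The sketch below is not a full Bayesian random-effects meta-analysis (there is no prior on the heterogeneity parameter); the prior and data values are invented, and real analyses would typically use WinBUGS, the bayesmeta R package or similar software.

    from math import sqrt, log
    from scipy.stats import norm

    y, se = -0.25, 0.10              # observed pooled log odds ratio and its SE (invented)
    mu0, sd0 = 0.0, 1.0              # weakly informative normal prior on the log OR (assumption)

    post_prec = 1 / sd0**2 + 1 / se**2
    post_sd = sqrt(1 / post_prec)
    post_mean = (mu0 / sd0**2 + y / se**2) / post_prec

    print(f"posterior log OR: {post_mean:.3f} (SD {post_sd:.3f})")
    print(f"P(OR < 1)   = {norm.cdf((0 - post_mean) / post_sd):.3f}")
    print(f"P(OR < 0.8) = {norm.cdf((log(0.8) - post_mean) / post_sd):.3f}")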

Box 10.13.a Some potential advantages of Bayesian meta-analysis

Some potential advantages of Bayesian approaches over classical methods for meta-analysis include: the ability to incorporate external evidence and prior beliefs through prior distributions; direct probability statements about quantities of interest, such as the probability that an odds ratio lies below a clinically important threshold; and the ability to use informative prior information about among-study heterogeneity, which is particularly valuable when the meta-analysis includes only a few studies.

Statistical expertise is strongly recommended for review authors who wish to carry out Bayesian analyses. There are several good texts (Sutton et al 2000, Sutton and Abrams 2001, Spiegelhalter et al 2004).

10.14 Sensitivity analyses

The process of undertaking a systematic review involves a sequence of decisions. Whilst many of these decisions are clearly objective and non-contentious, some will be somewhat arbitrary or unclear. For instance, if eligibility criteria involve a numerical value, the choice of value is usually arbitrary: for example, defining groups of older people may reasonably have lower limits of 60, 65, 70 or 75 years, or any value in between. Other decisions may be unclear because a study report fails to include the required information. Some decisions are unclear because the included studies themselves never obtained the information required: for example, the outcomes of those who were lost to follow-up. Further decisions are unclear because there is no consensus on the best statistical method to use for a particular problem.

It is highly desirable to prove that the findings from a systematic review are not dependent on such arbitrary or unclear decisions by using sensitivity analysis (see MECIR Box 10.14.a ). A sensitivity analysis is a repeat of the primary analysis or meta-analysis in which alternative decisions or ranges of values are substituted for decisions that were arbitrary or unclear. For example, if the eligibility of some studies in the meta-analysis is dubious because they do not contain full details, sensitivity analysis may involve undertaking the meta-analysis twice: the first time including all studies and, second, including only those that are definitely known to be eligible. A sensitivity analysis asks the question, ‘Are the findings robust to the decisions made in the process of obtaining them?’
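
The eligibility example can be sketched as two inverse-variance (fixed-effect) pooled estimates: one including every study and one restricted to the studies whose eligibility is certain. All study data below are invented, and a real review would usually run such comparisons in dedicated meta-analysis software.

    from math import sqrt, exp

    studies = {                       # study: (log risk ratio, standard error) - invented
        "A": (-0.30, 0.15),
        "B": (-0.10, 0.20),
        "C": (-0.40, 0.25),
        "D": ( 0.05, 0.30),           # eligibility uncertain
    }

    def pooled(data):
        w = {k: 1 / se**2 for k, (_, se) in data.items()}
        est = sum(w[k] * y for k, (y, _) in data.items()) / sum(w.values())
        pooled_se = sqrt(1 / sum(w.values()))
        return exp(est), exp(est - 1.96 * pooled_se), exp(est + 1.96 * pooled_se)

    print("all studies:         RR %.2f (%.2f to %.2f)" % pooled(studies))
    print("definitely eligible: RR %.2f (%.2f to %.2f)" % pooled(
        {k: v for k, v in studies.items() if k != "D"}))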

MECIR Box 10.14.a Relevant expectations for conduct of intervention reviews

Sensitivity analysis

It is important to be aware of whether results are robust, since conclusions may be strengthened or weakened accordingly.

There are many decision nodes within the systematic review process that can generate a need for a sensitivity analysis. Examples include:

Searching for studies:

  • Should abstracts whose results cannot be confirmed in subsequent publications be included in the review?

Eligibility criteria:

  • Characteristics of participants: where a majority but not all people in a study meet an age range, should the study be included?
  • Characteristics of the intervention: what range of doses should be included in the meta-analysis?
  • Characteristics of the comparator: what criteria are required to define usual care to be used as a comparator group?
  • Characteristics of the outcome: what time point or range of time points are eligible for inclusion?
  • Study design: should blinded and unblinded outcome assessment be included, or should study inclusion be restricted by other aspects of methodological criteria?

What data should be analysed?

  • Time-to-event data: what assumptions of the distribution of censored data should be made?
  • Continuous data: where standard deviations are missing, when and how should they be imputed? Should analyses be based on change scores or on post-intervention values?
  • Ordinal scales: what cut-point should be used to dichotomize short ordinal scales into two groups?
  • Cluster-randomized trials: what values of the intraclass correlation coefficient should be used when trial analyses have not been adjusted for clustering?
  • Crossover trials: what values of the within-subject correlation coefficient should be used when this is not available in primary reports?
  • All analyses: what assumptions should be made about missing outcomes? Should adjusted or unadjusted estimates of intervention effects be used?

Analysis methods:

  • Should fixed-effect or random-effects methods be used for the analysis?
  • For dichotomous outcomes, should odds ratios, risk ratios or risk differences be used?
  • For continuous outcomes, where several scales have assessed the same dimension, should results be analysed as a standardized mean difference across all scales or as mean differences individually for each scale?

Some sensitivity analyses can be pre-specified in the study protocol, but many issues suitable for sensitivity analysis are only identified during the review process where the individual peculiarities of the studies under investigation are identified. When sensitivity analyses show that the overall result and conclusions are not affected by the different decisions that could be made during the review process, the results of the review can be regarded with a higher degree of certainty. Where sensitivity analyses identify particular decisions or missing information that greatly influence the findings of the review, greater resources can be deployed to try and resolve uncertainties and obtain extra information, possibly through contacting trial authors and obtaining individual participant data. If this cannot be achieved, the results must be interpreted with an appropriate degree of caution. Such findings may generate proposals for further investigations and future research.

Reporting of sensitivity analyses in a systematic review may best be done by producing a summary table. Rarely is it informative to produce individual forest plots for each sensitivity analysis undertaken.

Sensitivity analyses are sometimes confused with subgroup analysis. Although some sensitivity analyses involve restricting the analysis to a subset of the totality of studies, the two methods differ in two ways. First, sensitivity analyses do not attempt to estimate the effect of the intervention in the group of studies removed from the analysis, whereas in subgroup analyses, estimates are produced for each subgroup. Second, in sensitivity analyses, informal comparisons are made between different ways of estimating the same thing, whereas in subgroup analyses, formal statistical comparisons are made across the subgroups.

10.15 Chapter information

Editors: Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Contributing authors: Douglas Altman, Deborah Ashby, Jacqueline Birks, Michael Borenstein, Marion Campbell, Jonathan Deeks, Matthias Egger, Julian Higgins, Joseph Lau, Keith O’Rourke, Gerta Rücker, Rob Scholten, Jonathan Sterne, Simon Thompson, Anne Whitehead

Acknowledgements: We are grateful to the following for commenting helpfully on earlier drafts: Bodil Als-Nielsen, Deborah Ashby, Jesse Berlin, Joseph Beyene, Jacqueline Birks, Michael Bracken, Marion Campbell, Chris Cates, Wendong Chen, Mike Clarke, Albert Cobos, Esther Coren, Francois Curtin, Roberto D’Amico, Keith Dear, Heather Dickinson, Diana Elbourne, Simon Gates, Paul Glasziou, Christian Gluud, Peter Herbison, Sally Hollis, David Jones, Steff Lewis, Tianjing Li, Joanne McKenzie, Philippa Middleton, Nathan Pace, Craig Ramsey, Keith O’Rourke, Rob Scholten, Guido Schwarzer, Jack Sinclair, Jonathan Sterne, Simon Thompson, Andy Vail, Clarine van Oel, Paula Williamson and Fred Wolf.

Funding: JJD received support from the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH is a member of the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

10.16 References

Agresti A. An Introduction to Categorical Data Analysis . New York (NY): John Wiley & Sons; 1996.

Akl EA, Kahale LA, Agoritsas T, Brignardello-Petersen R, Busse JW, Carrasco-Labra A, Ebrahim S, Johnston BC, Neumann I, Sola I, Sun X, Vandvik P, Zhang Y, Alonso-Coello P, Guyatt G. Handling trial participants with missing outcome data when conducting a meta-analysis: a systematic survey of proposed approaches. Systematic Reviews 2015; 4 : 98.

Akl EA, Kahale LA, Ebrahim S, Alonso-Coello P, Schünemann HJ, Guyatt GH. Three challenges described for identifying participants with missing data in trials reports, and potential solutions suggested to systematic reviewers. Journal of Clinical Epidemiology 2016; 76 : 147-154.

Altman DG, Bland JM. Detecting skewness from summary information. BMJ 1996; 313 : 1200.

Anzures-Cabrera J, Sarpatwari A, Higgins JPT. Expressing findings from meta-analyses of continuous outcomes in terms of risks. Statistics in Medicine 2011; 30 : 2967-2985.

Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology 1993; 4 : 218-228.

Berlin JA, Antman EM. Advantages and limitations of metaanalytic regressions of clinical trials data. Online Journal of Current Clinical Trials 1994; Doc No 134 .

Berlin JA, Santanna J, Schmid CH, Szczech LA, Feldman KA, Group A-LAITS. Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Statistics in Medicine 2002; 21 : 371-387.

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods 2010; 1 : 97-111.

Borenstein M, Higgins JPT. Meta-analysis and subgroups. Prev Sci 2013; 14 : 134-143.

Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Statistics in Medicine 2007; 26 : 53-77.

Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 2000; 19 : 3127-3131.

da Costa BR, Nuesch E, Rutjes AW, Johnston BC, Reichenbach S, Trelle S, Guyatt GH, Jüni P. Combining follow-up and change data is valid in meta-analyses of continuous outcomes: a meta-epidemiological study. Journal of Clinical Epidemiology 2013; 66 : 847-855.

Deeks JJ. Systematic reviews of published evidence: Miracles or minefields? Annals of Oncology 1998; 9 : 703-709.

Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context. 2nd ed. London (UK): BMJ Publication Group; 2001. p. 285-312.

Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine 2002; 21 : 1575-1600.

DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7 : 177-188.

DiGuiseppi C, Higgins JPT. Interventions for promoting smoke alarm ownership and function. Cochrane Database of Systematic Reviews 2001; 2 : CD002246.

Ebrahim S, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Johnston BC, Guyatt GH. Addressing continuous data for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2013; 66 : 1014-1021 e1011.

Ebrahim S, Johnston BC, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Guyatt GH. Addressing continuous data measured with different instruments for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2014; 67 : 560-570.

Efthimiou O. Practical guide to the meta-analysis of rare events. Evidence-Based Mental Health 2018; 21 : 72-76.

Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315 : 629-634.

Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Statistics in Medicine 2000; 19 : 1707-1728.

Greenland S, Robins JM. Estimation of a common effect parameter from sparse follow-up data. Biometrics 1985; 41 : 55-68.

Greenland S. Quantitative methods in the review of epidemiologic literature. Epidemiologic Reviews 1987; 9 : 1-30.

Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American Journal of Epidemiology 1992; 135 : 1301-1309.

Guevara JP, Berlin JA, Wolf FM. Meta-analytic methods for pooling rates when follow-up duration varies: a case study. BMC Medical Research Methodology 2004; 4 : 17.

Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Statistics in Medicine 2001; 20 : 3875-3889.

Hasselblad V, McCrory DC. Meta-analytic tools for medical decision making: A practical guide. Medical Decision Making 1995; 15 : 81-96.

Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 2002; 21 : 1539-1558.

Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003; 327 : 557-560.

Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression. Statistics in Medicine 2004; 23 : 1663-1682.

Higgins JPT, White IR, Wood AM. Imputation methods for missing outcome data in meta-analysis of clinical trials. Clinical Trials 2008a; 5 : 225-239.

Higgins JPT, White IR, Anzures-Cabrera J. Meta-analysis of skewed data: combining results reported on log-transformed or raw scales. Statistics in Medicine 2008b; 27 : 6072-6092.

Higgins JPT, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2009; 172 : 137-159.

Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Annals of Internal Medicine 2001; 135 : 982-989.

Langan D, Higgins JPT, Simmonds M. An empirical comparison of heterogeneity variance estimators in 12 894 meta-analyses. Research Synthesis Methods 2015; 6 : 195-205.

Langan D, Higgins JPT, Simmonds M. Comparative performance of heterogeneity variance estimators in meta-analysis: a review of simulation studies. Research Synthesis Methods 2017; 8 : 181-198.

Langan D, Higgins JPT, Jackson D, Bowden J, Veroniki AA, Kontopantelis E, Viechtbauer W, Simmonds M. A comparison of heterogeneity variance estimators in simulated random-effects meta-analyses. Research Synthesis Methods 2019; 10 : 83-98.

Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001; 322 : 1479-1480.

Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing 2000; 10 : 325-337.

Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 1959; 22 : 719-748.

McIntosh MW. The population risk as an explanatory variable in research synthesis of clinical trials. Statistics in Medicine 1996; 15 : 1713-1728.

Morgenstern H. Uses of ecologic analysis in epidemiologic research. American Journal of Public Health 1982; 72 : 1336-1344.

Oxman AD, Guyatt GH. A consumers guide to subgroup analyses. Annals of Internal Medicine 1992; 116 : 78-84.

Peto R, Collins R, Gray R. Large-scale randomized evidence: large, simple trials and overviews of trials. Journal of Clinical Epidemiology 1995; 48 : 23-40.

Poole C, Greenland S. Random-effects meta-analyses are not always conservative. American Journal of Epidemiology 1999; 150 : 469-475.

Rhodes KM, Turner RM, White IR, Jackson D, Spiegelhalter DJ, Higgins JPT. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data. Statistics in Medicine 2016; 35 : 5495-5511.

Rice K, Higgins JPT, Lumley T. A re-evaluation of fixed effect(s) meta-analysis. Journal of the Royal Statistical Society Series A (Statistics in Society) 2018; 181 : 205-227.

Riley RD, Higgins JPT, Deeks JJ. Interpretation of random effects meta-analyses. BMJ 2011; 342 : d549.

Röver C. Bayesian random-effects meta-analysis using the bayesmeta R package 2017. https://arxiv.org/abs/1711.08683 .

Rücker G, Schwarzer G, Carpenter J, Olkin I. Why add anything to nothing? The arcsine difference as a measure of treatment effect in meta-analysis with zero cells. Statistics in Medicine 2009; 28 : 721-738.

Sharp SJ. Analysing the relationship between treatment benefit and underlying risk: precautions and practical recommendations. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context. 2nd ed. London (UK): BMJ Publication Group; 2001. p. 176-188.

Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Statistics in Medicine 2002; 21 : 3153-3159.

Simmonds MC, Tierney J, Bowden J, Higgins JPT. Meta-analysis of time-to-event data: a comparison of two-stage methods. Research Synthesis Methods 2011; 2 : 139-149.

Sinclair JC, Bracken MB. Clinically useful measures of effect in binary analyses of randomized trials. Journal of Clinical Epidemiology 1994; 47 : 881-889.

Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: a comparative study. Statistics in Medicine 1995; 14 : 2685-2699.

Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation . Chichester (UK): John Wiley & Sons; 2004.

Spittal MJ, Pirkis J, Gurrin LC. Meta-analysis of incidence rate data in the presence of zero events. BMC Medical Research Methodology 2015; 15 : 42.

Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for Meta-analysis in Medical Research . Chichester (UK): John Wiley & Sons; 2000.

Sutton AJ, Abrams KR. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research 2001; 10 : 277-303.

Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine 2004; 23 : 1351-1375.

Thompson SG, Smith TC, Sharp SJ. Investigating underlying risk as a source of heterogeneity in meta-analysis. Statistics in Medicine 1997; 16 : 2741-2758.

Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Statistics in Medicine 1999; 18 : 2693-2708.

Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 2002; 21 : 1559-1574.

Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JPT. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. International Journal of Epidemiology 2012; 41 : 818-827.

Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, Kuss O, Higgins JPT, Langan D, Salanti G. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods 2016; 7 : 55-79.

Whitehead A, Jones NMB. A meta-analysis of clinical trials involving different classifications of response into ordered categories. Statistics in Medicine 1994; 13 : 2503-2515.

Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Progress in Cardiovascular Diseases 1985; 27 : 335-371.

Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991; 266 : 93-98.


Understanding the Basics of Meta-Analysis and How to Read a Forest Plot: As Simple as It Gets

Affiliation.

  • 1 Department of Clinical Psychopharmacology and Neurotoxicology, National Institute of Mental Health and Neurosciences, Bangalore, India. [email protected].
  • PMID: 33027562
  • DOI: 10.4088/JCP.20f13698

The results of research on a specific question differ across studies, some to a small extent and some to a large extent. Meta-analysis is a way to statistically combine and summarize the results of different studies so as to obtain a pooled or summary estimate that may better represent what is true in the population. Meta-analysis can be conducted for a variety of statistics, including means, mean differences, standardized mean differences, proportions, differences in proportions, relative risks, odds ratios, and others. The results of meta-analysis are presented in forest plots. This article explains why meta-analysis may be necessary, how a systematic review is conducted to identify studies for meta-analysis, and how to interpret the various elements in a forest plot. Brief discussions are provided about important concepts relevant to meta-analysis, including heterogeneity, subgroup analyses, sensitivity analyses, fixed effect and random effects meta-analyses, and the detection of publication bias. Other procedures briefly explained include meta-regression analysis, pooled analysis, individual participant data meta-analysis, and network meta-analysis. The limitations of meta-analysis are also discussed.

© Copyright 2020 Physicians Postgraduate Press, Inc.




How to Conduct a Meta-Analysis for Research

  • 6-minute read
  • 19th January 2024

Are you considering conducting a meta-analysis for your research paper? When applied to the right problem, meta-analyses can be useful. In this post, we will discuss what a meta-analysis is, when it’s the appropriate method to use, and how to perform one. Let’s jump in!

What is a Meta-Analysis?

Meta-analysis is a statistical technique that allows researchers to combine findings from multiple individual studies to reach more reliable and generalizable conclusions. It provides a systematic and objective way of synthesizing the results of different studies on a particular topic. There are several benefits of meta-analyses in academic research:

  • Synthesizing diverse evidence: Meta-analysis allows researchers to synthesize evidence from diverse studies, providing a more comprehensive understanding of a research question.
  • Statistical power enhancement: By pooling data from multiple studies, meta-analysis increases statistical power, enabling researchers to detect effects that may be missed in individual studies with smaller sample sizes.
  • Precision and reliability: Meta-analysis offers a more precise estimate of the true effect size, enhancing the reliability and precision of research findings.

When Should I Conduct a Meta-Analysis?

  Although some similarities exist between meta-analyses, literature reviews, and systematic reviews, these methods are distinct, and they serve different purposes. Here’s a breakdown of when to use each.

Meta-Analysis

Meta-analysis is a statistical method that combines and analyzes quantitative data from multiple independent studies to provide an overall estimate of an effect. You should conduct a meta-analysis: 

  • When you want to quantitatively synthesize the results of multiple studies that have measured similar outcomes
  • When there is a sufficient number of studies with compatible data and statistical methods
  • When you’re interested in obtaining a more precise and generalizable estimate of an effect size

Systematic Review

A systematic review is a comprehensive, structured review of existing literature that follows a predefined protocol to identify, select, and critically appraise relevant research studies. You should perform a systematic review: 

  • When you want to provide a comprehensive overview of the existing evidence on a particular research question
  • When you need to assess the quality of available studies and identify gaps or limitations in the literature
  • When a quantitative synthesis (meta-analysis) is not feasible due to variability in study designs or outcomes

Literature Review

A literature review is a broader examination and narrative summary of existing research that may not follow the strict methodology of a systematic review. You should utilize a literature review: 

  • When you want to familiarize yourself with the existing research on a topic without the rigorous methodology required for a systematic review 
  • When you’re exploring a new research area and want to understand the key concepts, theories, and findings 
  • When a more narrative and qualitative synthesis of the literature is sufficient for your purpose

The nature of your research question and the available evidence will guide your choice. If you’re interested in a quantitative summary of results, a meta-analysis might be appropriate. For a comprehensive overview, you could use a systematic review. In many cases, researchers use a combination of these methods. For instance, a systematic review may precede a meta-analysis to identify and evaluate relevant studies before their results are pooled quantitatively. Always consider the specific goals of your research and the nature of the available evidence when deciding which type of data analysis to employ.

Steps to Perform a Meta-Analysis

If you’ve decided that a meta-analysis is the best approach for your research, follow the steps below to guide you through the process.

  1. Define your research question and objective.

Clearly define the research objective of your meta-analysis. Doing this will help you narrow down your search and establish inclusion and exclusion criteria for selecting studies.

  2. Conduct a comprehensive literature search.

Thoroughly search electronic databases, such as PubMed, Google Scholar, or Scopus, to identify all relevant studies on your research question. Use a combination of keywords, subject heading terms, and search strategies to ensure a comprehensive search.

  3. Screen and select studies.

Carefully read the titles and abstracts of the identified studies to determine their relevance to your research question. Exclude studies that do not meet your inclusion criteria. Obtain the full text of potentially relevant studies and assess their eligibility based on predefined criteria.

  4. Extract data from selected studies.

Develop a standardized data extraction form to record relevant information from each selected study. Extract data such as study characteristics, sample size, outcomes, and statistical measures. Doing this ensures consistency and reliability in data extraction.


  5. Evaluate the study quality and biases.

Assess the quality and risk of bias in each study using established tools, such as the Cochrane Collaboration’s risk-of-bias tool. Consider factors such as study design, sample size, randomization, blinding, and the handling of missing data. This step helps identify potential sources of bias in the included studies.

  6. Perform a statistical analysis.

Choose appropriate statistical methods to combine the results from the selected studies. Commonly used measures include odds ratios, risk ratios, and mean differences. Calculate the effect sizes and their associated confidence intervals. You might consider using statistical software to help you with this step.
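
As a rough illustration of this step for dichotomous outcomes, the sketch below computes a log odds ratio and standard error from each study's 2×2 counts and pools them with inverse-variance (fixed-effect) weights. The counts are invented, and dedicated packages (for example, metafor in R) provide fuller implementations, including random-effects models.

    import numpy as np

    # (events_treatment, n_treatment, events_control, n_control) per study - invented
    data = np.array([
        [12, 100, 20, 100],
        [ 8,  60, 15,  62],
        [30, 150, 45, 148],
    ])

    a, n1, c, n2 = data.T
    b, d = n1 - a, n2 - c
    log_or = np.log((a * d) / (b * c))
    se = np.sqrt(1/a + 1/b + 1/c + 1/d)

    w = 1 / se**2
    pooled = np.sum(w * log_or) / np.sum(w)
    pooled_se = np.sqrt(1 / np.sum(w))
    low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
    print(f"pooled OR = {np.exp(pooled):.2f} (95% CI {np.exp(low):.2f} to {np.exp(high):.2f})")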

  7. Assess heterogeneity.

Assess the heterogeneity of the included studies to determine whether the results can be pooled. Use statistical tests, such as Cochran’s Q test or the I² statistic, to quantify the degree of heterogeneity.
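
Cochran's Q and the I² statistic can be computed directly from the study effect estimates and their standard errors, as in the sketch below (the effects and standard errors are invented log odds ratios).

    import numpy as np
    from scipy.stats import chi2

    y  = np.array([-0.62, -0.75, -0.41])     # per-study log odds ratios (invented)
    se = np.array([ 0.33,  0.42,  0.26])
    w  = 1 / se**2

    pooled = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - pooled)**2)          # Cochran's Q
    df = len(y) - 1
    p_value = chi2.sf(Q, df)
    I2 = max(0.0, (Q - df) / Q) * 100        # percentage of variation beyond chance

    print(f"Q = {Q:.2f} (df = {df}, p = {p_value:.2f}), I-squared = {I2:.0f}%")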

  8. Interpret and report the results.

Interpret the pooled effect size and its confidence interval in light of the research question. Provide a clear summary of the findings, including any limitations or caveats. Use forest plots or other graphical tools to present the results visually. Make sure to adhere to reporting guidelines, such as the PRISMA Statement.

  9. Assess the publication bias.

Publication bias occurs when studies with positive results are more likely to be published, leading to an overestimation of the effect size. Assess the publication bias using methods such as funnel plots, Egger’s test, or the Begg and Mazumdar test. Consider exploring potential publication bias through a sensitivity analysis.
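
Egger's test can be sketched as a simple regression of each study's standardized effect (the effect divided by its standard error) on its precision (one over the standard error); an intercept far from zero suggests funnel-plot asymmetry. The effects and standard errors below are invented, and the test has low power when few studies are available.

    import numpy as np
    import statsmodels.api as sm

    y  = np.array([-0.80, -0.55, -0.60, -0.30, -0.25, -0.20, -0.15])   # invented effects
    se = np.array([ 0.45,  0.38,  0.30,  0.22,  0.18,  0.15,  0.12])

    snd = y / se                       # standardized normal deviate
    precision = 1 / se
    fit = sm.OLS(snd, sm.add_constant(precision)).fit()

    intercept = fit.params[0]
    print(f"Egger intercept = {intercept:.2f}, p = {fit.pvalues[0]:.3f}")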

  10. Discuss the implications and limitations.

Finally, discuss the implications of the meta-analysis findings in the context of the existing literature. Identify any limitations or potential biases that may affect the validity of the results. You might also highlight areas for further research or recommendations for practice.

There you have it! Now that we’ve gone over what a meta-analysis is, when to use one in research, and what steps to take to conduct a robust meta-analysis, you’re well prepared to begin your research journey.

Finally, if you’d like any help proofreading your research paper, consider our research paper editing services. You can even try a free sample. Good luck with your meta-analysis!


  • Introduction
  • Conclusions
  • Article Information

Contributing studies for clinically elevated depression symptoms are presented in order of largest to smallest prevalence rate. Square data markers represent prevalence rates, with lines around the marker indicating 95% CIs. The diamond data marker represents the overall effect size based on included studies.

Contributing studies for clinically elevated anxiety symptoms are presented in order of largest to smallest prevalence rate. Square data markers represent prevalence rates, with lines around the marker indicating 95% CIs. The diamond data marker represents the overall effect size based on included studies.

Supplemental content:

eTable 1. Example Search Strategy from Medline

eTable 2. Study Quality Evaluation Criteria

eTable 3. Quality Assessment of Studies Included

eTable 4. Sensitivity analysis excluding low-quality studies (score = 2) for moderators of the prevalence of clinically elevated depressive symptoms in children and adolescents during COVID-19

eTable 5. Sensitivity analysis excluding low-quality studies (score = 2) for moderators of the prevalence of clinically elevated anxiety symptoms in children and adolescents during COVID-19

eFigure 1. PRISMA diagram of review search strategy

eFigure 2. Funnel plot for studies included in the analysis of clinically elevated depressive symptoms

eFigure 3. Funnel plot for studies included in the analysis of clinically elevated anxiety symptoms


Racine N, McArthur BA, Cooke JE, Eirich R, Zhu J, Madigan S. Global Prevalence of Depressive and Anxiety Symptoms in Children and Adolescents During COVID-19: A Meta-analysis. JAMA Pediatr. 2021;175(11):1142-1150. doi:10.1001/jamapediatrics.2021.2482


Global Prevalence of Depressive and Anxiety Symptoms in Children and Adolescents During COVID-19: A Meta-analysis

  • 1 Department of Psychology, University of Calgary, Calgary, Alberta, Canada
  • 2 Alberta Children’s Hospital Research Institute, Calgary, Alberta, Canada

Question   What is the global prevalence of clinically elevated child and adolescent anxiety and depression symptoms during COVID-19?

Findings   In this meta-analysis of 29 studies including 80 879 youth globally, the pooled prevalence estimates of clinically elevated child and adolescent depression and anxiety were 25.2% and 20.5%, respectively. The prevalence of depression and anxiety symptoms during COVID-19 has doubled compared with prepandemic estimates, and moderator analyses revealed that prevalence rates were higher when collected later in the pandemic, in older adolescents, and in girls.

Meaning   The global estimates of child and adolescent mental illness observed in the first year of the COVID-19 pandemic in this study indicate that the prevalence has significantly increased, remains high, and therefore warrants attention for mental health recovery planning.

Importance   Emerging research suggests that the global prevalence of child and adolescent mental illness has increased considerably during COVID-19. However, substantial variability in prevalence rates has been reported across the literature.

Objective   To ascertain more precise estimates of the global prevalence of child and adolescent clinically elevated depression and anxiety symptoms during COVID-19; to compare these rates with prepandemic estimates; and to examine whether demographic (eg, age, sex), geographical (ie, global region), or methodological (eg, pandemic data collection time point, informant of mental illness, study quality) factors explained variation in prevalence rates across studies.

Data Sources   Four databases were searched (PsycInfo, Embase, MEDLINE, and Cochrane Central Register of Controlled Trials) from January 1, 2020, to February 16, 2021, and unpublished studies were searched in PsycArXiv on March 8, 2021, for studies reporting on child/adolescent depression and anxiety symptoms. The search strategy combined search terms from 3 themes: (1) mental illness (including depression and anxiety), (2) COVID-19, and (3) children and adolescents (age ≤18 years). For PsycArXiv , the key terms COVID-19 , mental health , and child/adolescent were used.

Study Selection   Studies were included if they were published in English, had quantitative data, and reported prevalence of clinically elevated depression or anxiety in youth (age ≤18 years).

Data Extraction and Synthesis   A total of 3094 nonduplicate titles/abstracts were retrieved, and 136 full-text articles were reviewed. Data were analyzed from March 8 to 22, 2021.

Main Outcomes and Measures   Prevalence rates of clinically elevated depression and anxiety symptoms in youth.

Results   Random-effect meta-analyses were conducted. Twenty-nine studies including 80 879 participants met full inclusion criteria. Pooled prevalence estimates of clinically elevated depression and anxiety symptoms were 25.2% (95% CI, 21.2%-29.7%) and 20.5% (95% CI, 17.2%-24.4%), respectively. Moderator analyses revealed that the prevalence of clinically elevated depression and anxiety symptoms were higher in studies collected later in the pandemic and in girls. Depression symptoms were higher in older children.

Conclusions and Relevance   Pooled estimates obtained in the first year of the COVID-19 pandemic suggest that 1 in 4 youth globally are experiencing clinically elevated depression symptoms, while 1 in 5 youth are experiencing clinically elevated anxiety symptoms. These pooled estimates, which increased over time, are double the prepandemic estimates. An influx of mental health care utilization is expected, and allocation of resources to address child and adolescent mental health concerns is essential.

Prior to the COVID-19 pandemic, rates of clinically significant generalized anxiety and depressive symptoms in large youth cohorts were approximately 11.6% 1 and 12.9%, 2 respectively. Since COVID-19 was declared an international public health emergency, youth around the world have experienced dramatic disruptions to their everyday lives. 3 Youth are enduring pervasive social isolation and missed milestones, along with school closures, quarantine orders, increased family stress, and decreased peer interactions, all potential precipitants of psychological distress and mental health difficulties in youth. 4 - 7 Indeed, in both cross-sectional 8 , 9 and longitudinal studies 10 , 11 amassed to date, the prevalence of youth mental illness appears to have increased during the COVID-19 pandemic. 3 However, the data collected vary considerably: reported rates range from 2.2% 12 to 63.8% 13 for clinically elevated depression symptoms and from 1.8% 12 to 49.5% 13 for clinically elevated anxiety symptoms. As governments and policy makers deploy and implement recovery plans, precise estimates of the burden of mental illness in youth are urgently needed to inform service deployment and resource allocation.

Depression and generalized anxiety are 2 of the most common mental health concerns in youth. 14 Depressive symptoms, which include feelings of sadness, loss of interest and pleasure in activities, as well as disruption to regulatory functions such as sleep and appetite, 15 could be elevated during the pandemic as a result of social isolation due to school closures and physical distancing requirements. 6 Generalized anxiety symptoms in youth manifest as uncontrollable worry, fear, and hyperarousal. 15 Uncertainty, disruptions in daily routines, and concerns for the health and well-being of family and loved ones during the COVID-19 pandemic are likely associated with increases in generalized anxiety in youth. 16

When heterogeneity is observed across studies, as is the case with youth mental illness during COVID-19, it often points to the need to examine demographic, geographical, and methodological moderators. Moderator analyses can determine for whom and under what circumstances prevalence is higher vs lower. With regard to demographic factors, prevalence rates of mental illness both prior to and during the COVID-19 pandemic are differentially reported across child age and sex, with girls 17 , 18 and older children 17 , 19 being at greater risk for internalizing disorders. Studies have also shown that youth living in regions that experienced greater disease burden 2 and urban areas 20 had greater mental illness severity. Methodological characteristics of studies also have the potential to influence the estimated prevalence rates. For example, studies of poorer methodological quality may be more likely to overestimate prevalence rates. 21 The symptom reporter (ie, child vs parent) may also contribute to variability in the prevalence of mental illness across studies. Indeed, previous research prior to the pandemic has demonstrated that child and parent reports of internalizing symptoms vary, 22 with children/adolescents reporting more internalizing symptoms than parents. 23 Lastly, it is important to consider the influence of data collection timing on prevalence rates. While feelings of stress and overwhelm may have been greater in the early months of the pandemic compared with later, 24 extended social isolation and school closures may have exacerbated mental health concerns.

Although a narrative systematic review of 6 studies early in the pandemic was conducted, 8 to our knowledge, no meta-analysis of prevalence rates of child and adolescent mental illness during the pandemic has been undertaken. In the current study, we conducted a meta-analysis of the global prevalence of clinically elevated symptoms of depression and anxiety (ie, exceeding a clinical cutoff score on a validated measure or falling in the moderate to severe symptom range of anxiety and depression) in youth during the first year of the COVID-19 pandemic. While research has documented a worsening of symptoms for children and youth with a wide range of anxiety disorders, 25 including social anxiety, 26 clinically elevated symptoms of generalized anxiety are the focus of the current meta-analysis. In addition to deriving pooled prevalence estimates, we examined demographic, geographical, and methodological factors that may explain between-study differences. Given that there have been several precipitants of psychological distress for youth during COVID-19, we hypothesized that pooled prevalence rates would be higher compared with prepandemic estimates. We also hypothesized that child mental illness would be higher among studies with older children, a higher percentage of female individuals, studies conducted later in the pandemic, and that higher-quality studies would have lower prevalence rates.

This systematic review was registered as a protocol with PROSPERO (CRD42020184903) and the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline was followed. 27 Ethics review was not required for the study. Electronic searches were conducted in collaboration with a health sciences librarian in PsycInfo, Cochrane Central Register of Controlled Trials (CENTRAL), Embase, and MEDLINE from inception to February 16, 2021. The search strategy (eTable 1 in the Supplement) combined search terms from 3 themes: (1) mental illness (including depression and anxiety), (2) COVID-19, and (3) children and adolescents (age ≤18 years). Both keywords and database subject headings were used in the search. As a result of the rapidly evolving nature of research during the COVID-19 pandemic, we also searched a repository of unpublished preprints, PsycArXiv. The key terms COVID-19, mental health, and child/adolescent were used on March 8, 2021, and yielded 38 studies, of which 1 met inclusion criteria.

The following inclusion criteria were applied: (1) sample was drawn from a general population; (2) proportion of individuals meeting clinical cutoff scores or falling in the moderate to severe symptom range of anxiety or depression as predetermined by validated self-report measures were provided; (3) data were collected during COVID-19; (4) participants were 18 years or younger; (5) study was empirical; and (6) studies were written in English. Samples of participants who may be affected differently from a mental health perspective during COVID-19 were excluded (eg, children with preexisting psychiatric diagnoses, children with chronic illnesses, children diagnosed or suspected of having COVID-19). We also excluded case studies and qualitative analyses.

Five authors (N.R., B.A.M., J.E.C., R.E., and J.Z.) used Covidence software (Covidence Inc) to review all abstracts and determine whether each study met criteria for inclusion. Twenty percent of abstracts reviewed for inclusion were double-coded, and the mean random agreement probability was 0.89; disagreements were resolved via consensus with the first author (N.R.). Two authors (N.R. and B.A.M.) reviewed full-text articles to determine if they met all inclusion criteria, and the percent agreement was 0.80; discrepancies were resolved via consensus.

When studies met inclusion criteria, prevalence rates for anxiety and depression were extracted, as well as potential moderators. When more than 1 wave of data was provided, the wave with the largest sample size was selected. For 1 study in which both parent and youth reports were provided, 26 the youth report was selected, given research indicating that youth are reliable informants of their own behavior. 28 The following moderators were extracted: (1) study quality (see the next subsection); (2) participant age (continuously, as a mean); (3) sex (% female in a sample); (4) geographical region (eg, East Asia, Europe, North America); (5) informant (child, parent); and (6) month in 2020 when data were collected (range, 1-12). Data from all studies were extracted by 1 coder and the first author (N.R.). Discrepancies were resolved via consensus.
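
To make the extraction step concrete, a per-study extraction record might look like the sketch below; the field names simply mirror the moderators listed above and are illustrative rather than the authors' actual coding sheet.

    from dataclasses import dataclass
    from typing import Optional

    # One record per included study; the fields mirror the moderators described above
    # and are illustrative only (hypothetical names, not the authors' codebook).
    @dataclass
    class StudyRecord:
        study_id: str
        prevalence_depression: Optional[float]  # event rate, e.g., 0.25
        prevalence_anxiety: Optional[float]
        sample_size: int
        quality_score: int                      # 0-5, see the quality assessment below
        mean_age_years: float
        percent_female: float
        region: str                             # e.g., "East Asia", "Europe", "North America"
        informant: str                          # "child" or "parent"
        collection_month_2020: int              # 1-12

    example = StudyRecord("Study 12", 0.28, 0.22, 1245, 3, 14.2, 53.0, "Europe", "child", 7)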

Adapted from the National Institutes of Health Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies, a short 5-item questionnaire was used (eTable 2 in the Supplement). 29 Studies were given a score of 0 (no) or 1 (yes) for each of the 5 criteria (validated measure, peer-reviewed, response rate ≥50%, objective assessment, sufficient exposure time), which were summed to give a total score out of 5. When information was unclear or not provided by the study authors, the item was marked as 0 (no).
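
A small helper illustrating this scoring rule (unclear or missing items default to 0, and items are summed out of 5); the criterion keys are paraphrases, not the exact wording of eTable 2.

    # Illustrative scoring helper for a 5-item checklist; criterion names are paraphrases.
    CRITERIA = ["validated_measure", "peer_reviewed", "response_rate_ge_50",
                "objective_assessment", "sufficient_exposure_time"]

    def quality_score(answers: dict) -> int:
        # Missing or unclear items default to 0 ("no"), as described in the text
        return sum(1 if answers.get(criterion) else 0 for criterion in CRITERIA)

    print(quality_score({"validated_measure": True, "peer_reviewed": True,
                         "objective_assessment": True}))  # prints 3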

All included studies are from independent samples. Comprehensive Meta-Analysis version 3.0 (Biostat) software was used for data analysis. Pooled prevalence estimates with associated 95% confidence intervals were computed. Study estimates were weighted by the inverse of their variance, which gives greater weight to studies with larger sample sizes.
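
A minimal sketch of inverse-variance pooling of prevalence on the logit scale follows, with invented event counts and sample sizes; a common-effect calculation is shown here for clarity, and the random-effects version used in the article is sketched after the next paragraph.

    import numpy as np

    # Invented event counts and sample sizes for five studies
    events = np.array([210, 95, 340, 60, 150])
    n = np.array([800, 400, 1200, 300, 700])

    p = events / n
    logit = np.log(p / (1 - p))
    var_logit = 1 / events + 1 / (n - events)   # approximate variance of a logit proportion
    w = 1 / var_logit                           # inverse-variance weights

    pooled_logit = np.sum(w * logit) / np.sum(w)
    se_pooled = np.sqrt(1 / np.sum(w))
    ci = pooled_logit + np.array([-1.96, 1.96]) * se_pooled

    back = lambda x: 1 / (1 + np.exp(-x))       # back-transform logit to a proportion
    print(f"pooled prevalence = {back(pooled_logit):.3f} "
          f"(95% CI, {back(ci[0]):.3f}-{back(ci[1]):.3f})")

The logit transform keeps the pooled estimate and its confidence limits inside the 0-1 range; other transforms (eg, the Freeman-Tukey double arcsine) are also common for prevalence data.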

We used random-effects models to reflect the variation observed across studies and assessed between-study heterogeneity using the Q and I2 statistics. Pooled prevalence is reported as an event rate (ie, 0.30) but interpreted as a prevalence (ie, 30.0%). Significant Q statistics and I2 values greater than 75% suggest that moderator analyses should be explored. 30 As recommended by Borenstein et al, 30 we examined categorical moderators when k was 10 or higher and a minimum cell size of k greater than 3 was available. A P value of .05 was considered statistically significant. For continuous moderators, random-effects meta-regression analyses were conducted. Publication bias was examined using the Egger test 31 and by inspecting funnel plots for symmetry.
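
Continuing the sketch above, one way to compute Q, I2, and a DerSimonian-Laird random-effects pooled prevalence is shown below; the counts are the same invented values, and software such as Comprehensive Meta-Analysis may use different estimators and corrections.

    import numpy as np

    events = np.array([210, 95, 340, 60, 150])   # same invented studies as above
    n = np.array([800, 400, 1200, 300, 700])
    y = np.log((events / n) / (1 - events / n))  # logit prevalence
    v = 1 / events + 1 / (n - events)            # within-study variance
    w = 1 / v

    # Cochran's Q and I^2
    theta_fixed = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - theta_fixed) ** 2)
    df = len(y) - 1
    I2 = max(0.0, (Q - df) / Q) * 100

    # DerSimonian-Laird between-study variance and random-effects pooling
    C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - df) / C)
    w_re = 1 / (v + tau2)
    theta_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1 / np.sum(w_re))

    back = lambda x: 1 / (1 + np.exp(-x))
    print(f"Q = {Q:.1f} (df = {df}), I2 = {I2:.1f}%, tau2 = {tau2:.3f}")
    print(f"random-effects prevalence = {back(theta_re):.3f} "
          f"(95% CI, {back(theta_re - 1.96 * se_re):.3f}-{back(theta_re + 1.96 * se_re):.3f})")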

Our electronic search yielded 3094 nonduplicate records (eFigure 1 in the Supplement ). Based on the abstract review, a total of 136 full-text articles were retrieved to examine against inclusion criteria, and 29 nonoverlapping studies 10 , 12 , 13 , 17 , 19 , 20 , 26 , 32 - 53 met full inclusion criteria.

A total of 29 studies were included in the meta-analyses, of which 26 had youth symptom reports and 3 studies 39 , 42 , 48 had parent reports of child symptoms. As outlined in Table 1, across all 29 studies, 80 879 participants were included, of which the mean (SD) percentage of female individuals was 52.7% (12.3%), and the mean age was 13.0 years (range, 4.1-17.6 years). All studies provided binary reports of sex or gender. Sixteen studies (55.2%) were from East Asia, 4 were from Europe (13.8%), 6 were from North America (20.7%), 2 were from Central America and South America (6.9%), and 1 study was from the Middle East (3.4%). Eight studies (27.6%) reported having racial or ethnic minority participants, with the mean across studies being 36.9%. Examining study quality, the mean score was 3.10 (range, 2-4; eTable 3 in the Supplement).

A random-effects meta-analysis of 26 studies revealed a pooled prevalence rate of 0.25 (95% CI, 0.21-0.30; Figure 1), or 25.2%. The funnel plot was symmetrical (eFigure 2 in the Supplement); however, the Egger test was statistically significant (intercept, −9.5; 95% CI, −18.4 to −0.48; P = .02). The between-study heterogeneity statistic was significant (Q = 4675.91; P < .001; I2 = 99.47). Significant moderators are reported below, and all moderator analyses are presented in Table 2.

As the number of months in the year increased, so too did the prevalence of depressive symptoms ( b  = 0.26; 95% CI, 0.06-0.46). Prevalence rates were higher as child age increased ( b  = 0.08; 95% CI, 0.01-0.15), and as the percentage of female individuals ( b  = 0.03; 95% CI, 0.01-0.05) in samples increased. Sensitivity analyses removing low-quality studies were conducted (ie, scores of 2) 32 , 43 (eTable 4 in the Supplement ). Moderators remained significant, except for age, which became nonsignificant ( b  = 0.06; 95% CI, −0.02 to 0.13; P  = .14).
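
The meta-regression step behind slope estimates like these can be sketched as a weighted least squares fit with a single continuous moderator (month of data collection); the logit prevalences, variances, and the fixed tau-squared value below are invented, and real analyses estimate tau-squared jointly (eg, via restricted maximum likelihood).

    import numpy as np
    from scipy import stats

    # Invented study-level data: logit prevalence, within-study variance, month of collection
    y = np.array([-1.40, -1.10, -0.95, -0.70, -0.60])
    v = np.array([0.020, 0.030, 0.015, 0.025, 0.040])
    month = np.array([3.0, 4.0, 6.0, 8.0, 10.0])
    tau2 = 0.05                                  # assumed between-study variance

    w = 1 / (v + tau2)                           # random-effects weights
    X = np.column_stack([np.ones_like(month), month])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    cov = np.linalg.inv(X.T @ W @ X)
    se_slope = np.sqrt(cov[1, 1])
    z = beta[1] / se_slope
    p = 2 * stats.norm.sf(abs(z))
    print(f"slope b = {beta[1]:.3f} per month (SE {se_slope:.3f}, p = {p:.3f})")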

The overall pooled prevalence rate across 25 studies for elevated anxiety was 0.21 (95% CI, 0.17-0.24; Figure 2 ) or 20.5%. The funnel plot was symmetrical (eFigure 3 in the Supplement ) and the Egger test was nonsignificant (intercept, −6.24; 95% CI, −14.10 to 1.62; P  = .06). The heterogeneity statistic was significant ( Q  = 3300.17; P  < .001; I 2  = 99.27). Significant moderators are reported below, and all moderator analyses are presented in Table 3 .

As the number of months in the year increased, so too did the prevalence of anxiety symptoms (b = 0.27; 95% CI, 0.10-0.44). Prevalence rates of clinically elevated anxiety were higher as the percentage of female individuals in the sample increased (b = 0.04; 95% CI, 0.01-0.07) and were also higher in European countries (k = 4; rate = 0.34; 95% CI, 0.23-0.46; P = .01) compared with East Asian countries (k = 14; rate = 0.17; 95% CI, 0.13-0.21; P < .001). Lastly, the prevalence of clinically elevated anxiety was higher in studies deemed to be of poorer quality (k = 21; rate = 0.22; 95% CI, 0.18-0.27; P < .001) compared with studies with better quality scores (k = 4; rate = 0.12; 95% CI, 0.07-0.20; P < .001). Sensitivity analyses removing low-quality studies (ie, scores of 2) 32 , 43 yielded the same pattern of results (eTable 5 in the Supplement).
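
A rough sketch of a categorical (subgroup) moderator comparison, pooling each subgroup and contrasting the pooled estimates; the logit-prevalence values and tau-squared are invented, and in practice a single mixed-effects model is preferred.

    import numpy as np
    from scipy import stats

    def pool(y, v, tau2=0.05):
        # Random-effects pooling within one subgroup (tau2 assumed, not estimated)
        w = 1 / (v + tau2)
        return np.sum(w * y) / np.sum(w), 1 / np.sum(w)   # estimate, variance

    # Invented logit prevalences and variances for two regional subgroups
    y_a, v_a = np.array([-0.6, -0.8, -0.5, -0.7]), np.array([0.03, 0.04, 0.02, 0.05])
    y_b, v_b = np.array([-1.6, -1.5, -1.7, -1.4]), np.array([0.02, 0.03, 0.02, 0.04])

    est_a, var_a = pool(y_a, v_a)
    est_b, var_b = pool(y_b, v_b)
    z = (est_a - est_b) / np.sqrt(var_a + var_b)
    print(f"difference on logit scale = {est_a - est_b:.2f}, p = {2 * stats.norm.sf(abs(z)):.4f}")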

The current meta-analysis provides a timely estimate of clinically elevated depression and generalized anxiety symptoms globally among youth during the COVID-19 pandemic. Across 29 samples and 80 879 youth, the pooled prevalence of clinically elevated depression and anxiety symptoms was 25.2% and 20.5%, respectively. Thus, 1 in 4 youth globally are experiencing clinically elevated depression symptoms, while 1 in 5 youth are experiencing clinically elevated anxiety symptoms. A comparison of these findings to prepandemic estimates (12.9% for depression 2 and 11.6% for anxiety 1 ) suggests that youth mental health difficulties during the COVID-19 pandemic have likely doubled.

The COVID-19 pandemic, and its associated restrictions and consequences, appear to have taken a considerable toll on youth and their psychological well-being. Loss of peer interactions, social isolation, and reduced contact with buffering supports (eg, teachers, coaches) may have precipitated these increases. 3 In addition, schools are often a primary location for receiving psychological services, with 80% of children relying on school-based services to address their mental health needs. 54 For many children, these services were rendered unavailable owing to school closures.

As the month of data collection increased, rates of depression and anxiety increased correspondingly. One possibility is that ongoing social isolation, 6 family financial difficulties, 55 missed milestones, and school disruptions 3 are compounding over time for youth and having a cumulative association. However, longitudinal research supporting this possibility is currently scarce and urgently needed. A second possibility is that studies conducted in the earlier months of the pandemic (February to March 2020) 12 , 51 were more likely to be conducted in East Asia where self-reported prevalence of mental health symptoms tends to be lower. 56 Longitudinal trajectory research on youth well-being as the pandemic progresses and in pandemic recovery phases will be needed to confirm the long-term mental health implications of the COVID-19 pandemic on youth mental illness.

Prevalence rates for anxiety varied according to study quality, with lower-quality studies yielding higher prevalence rates. It is important to note that in sensitivity analyses removing lower-quality studies, other significant moderators (ie, child sex and data collection time point) remained significant. There has been a rapid proliferation of youth mental health research during the COVID-19 pandemic; however, the rapid execution of these studies has been criticized owing to the potential for some studies to sacrifice methodological quality and rigor. 21 , 57 Additionally, several studies estimating prevalence rates of mental illness during the pandemic have used nonprobability or convenience samples, which increases the likelihood of bias in reporting. 21 Studies with representative samples and/or longitudinal follow-up studies that have the potential to demonstrate changes in mental health symptoms from before to after the pandemic should be prioritized in future research.

In line with previous research on mental illness in childhood and adolescence, 58 female sex was associated with both increased depressive and anxiety symptoms. Biological susceptibility, lower baseline self-esteem, a higher likelihood of having experienced interpersonal violence, and exposure to stress associated with gender inequity may all be contributing factors. 59 Higher rates of depression in older children were observed and may be due to puberty and hormonal changes 60 in addition to the added effects of social isolation and physical distancing on older children who particularly rely on socialization with peers. 6 , 61 However, age was not a significant moderator for prevalence rates of anxiety. Although older children may be more acutely aware of the stress of their parents and the implications of the current global pandemic, younger children may be able to recognize changes to their routine, both of which may contribute to similar rates of anxiety with different underlying mechanisms.

In terms of practice implications, a routine touch point for many youth is the family physician or pediatrician’s office. Within this context, it is critical to inquire about or screen for youth mental health difficulties. Emerging research 42 suggests that in families using more routines during COVID-19, lower child depression and conduct problems are observed. Thus, a tangible solution to help mitigate the adverse effects of COVID-19 on youth is working with children and families to implement consistent and predictable routines around schoolwork, sleep, screen use, and physical activity. Additional resources should be made available, and clinical referrals should be placed when children experience clinically elevated mental distress. At a policy level, research suggests that social isolation may contribute to and confer risk for mental health concerns. 4 , 5 As such, the closure of schools and recreational activities should be considered a last resort. 62 In addition, methods of delivering mental health resources widely to youth, such as group and individual telemental health services, need to be adapted to increase scalability, while also prioritizing equitable access across diverse populations. 63

There are some limitations to the current study. First, although the current meta-analysis includes global estimates of child and adolescent mental illness, it will be important to reexamine cross-regional differences once additional data from underrepresented countries are available. Second, most study designs were cross-sectional in nature, which precluded an examination of the long-term association of COVID-19 with child mental health over time. To determine whether clinically elevated symptoms are sustained, exacerbated, or mitigated, longitudinal studies with baseline estimates of anxiety and depression are needed. Third, few studies included racial or ethnic minority participants (27.6%), and no studies included gender-minority youth. Given that racial and ethnic minority 64 and gender-diverse youth 65 , 66 may be at increased risk for mental health difficulties during the pandemic, future work should include and focus on these groups. Finally, all studies used self- or parent-reported questionnaires to examine the prevalence of clinically elevated (ie, moderate to high) symptoms. Thus, studies using criterion standard assessments of child depression and anxiety disorders via diagnostic interviews or multimethod approaches may supplement current findings and provide further details on changes beyond generalized anxiety symptoms, such as symptoms of social anxiety, separation anxiety, and panic.

Overall, this meta-analysis shows increased rates of clinically elevated anxiety and depression symptoms for youth during the COVID-19 pandemic. While this meta-analysis supports an urgent need for intervention and recovery efforts aimed at improving child and adolescent well-being, it also highlights that individual differences need to be considered when determining targets for intervention (eg, age, sex, exposure to COVID-19 stressors). Research on the long-term effect of the COVID-19 pandemic on mental health, including studies with pre– to post–COVID-19 measurement, is needed to augment understanding of the implications of this crisis on the mental health trajectories of today’s children and youth.

Corresponding Author: Sheri Madigan, PhD, RPsych, Department of Psychology University of Calgary, Calgary, AB T2N 1N4, Canada ( [email protected] ).

Accepted for Publication: May 19, 2021.

Published Online: August 9, 2021. doi:10.1001/jamapediatrics.2021.2482

Author Contributions: Drs Racine and Madigan had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Racine, Madigan.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Racine, McArthur, Eirich, Zhu, Madigan.

Critical revision of the manuscript for important intellectual content: Racine, Cooke, Eirich, Madigan.

Statistical analysis: Racine, McArthur.

Administrative, technical, or material support: Madigan.

Supervision: Racine, Madigan.

Conflict of Interest Disclosures: Dr Racine reported fellowship support from Alberta Innovates. Dr McArthur reported a postdoctoral fellowship award from the Alberta Children’s Hospital Research Institute. Ms Cooke reported graduate scholarship support from Vanier Canada and Alberta Innovates Health Solutions outside the submitted work. Ms Eirich reported graduate scholarship support from the Social Science and Humanities Research Council. No other disclosures were reported.

Additional Contributions: We acknowledge Nicole Dunnewold, MLIS (Research and Learning Librarian, Health Sciences Library, University of Calgary), for her assistance with the search strategy, for which they were not compensated outside of their salary. We also acknowledge the contribution of members of the Determinants of Child Development Laboratory at the University of Calgary, in particular, Julianna Watt, BA, and Katarina Padilla, BSc, for their contribution to data extraction, for which they were paid as research assistants.


Are Claims About Magnesium for Stress and Sleep Backed by Research?

Published on June 18, 2024

Findings from a new meta-analysis show that higher doses of magnesium were most effective in addressing anxiety and sleep issues, especially among those with low magnesium status

  • Magnesium is well-known for its calming and anti-anxiety effects, and research investigating magnesium supplementation and sleep has shown significant improvements in insomnia severity, sleep time, sleep efficiency, sleep onset latency, and the regulation of sleep-related hormones
  • A systematic review was recently conducted using data from 15 interventional trials (randomized controlled trials or observational trials); each study included a known dose of magnesium that was at least 50 mg/day and measured outcomes for sleep, anxiety, or both
  • The study found that higher doses of magnesium were more effective in addressing both issues, and magnesium combined with vitamin B6 appeared most effective for addressing anxiety symptoms; it also concluded that more research may be necessary to determine which forms and specific doses of magnesium might be best for these conditions


Magnesium has repeatedly been in the spotlight on many social media platforms as a natural way to fight stress and improve sleep – but how well is this claim backed by research? Magnesium is well-known for its calming and anti-anxiety effects, and research investigating magnesium supplementation and sleep has shown significant improvements in insomnia severity, sleep time, sleep efficiency, sleep onset latency, and the regulation of sleep related hormones. Magnesium supplementation has also been shown to decrease periodic limb movements during sleep.

Findings from a New Meta-Analysis on Magnesium, Sleep, and Anxiety

A systematic review was recently conducted and published by Rawji et al. on the clinical evidence supporting the use of magnesium for quality of sleep and anxiety symptoms. Only data from interventional trials (randomized controlled trials or observational trials) were used, and each study had to include a known dose of magnesium of at least 50 mg/day. Fifteen studies were included, with 8 measuring sleep outcomes, 7 measuring anxiety, and 1 measuring both.

Among the sleep-related studies, 5 found that magnesium supplementation resulted in improvements in sleep quality, while 2 reported negative results and 1 reported mixed results. Of those reporting negative results, one gave the lowest dose of magnesium oxide among the studies (250 mg), with supplementation given in the morning (the authors suggest that magnesium supplementation closer to bedtime may be more effective for sleep). The other negative study gave only 100 mg of magnesium chloride, the lowest dose of magnesium among the sleep studies evaluated.

Combining Magnesium with B Vitamins Resulted in Better Improvements for Anxiety Sufferers

Among the stress-related studies included in the review, several combined magnesium with other ingredients such as vitamin B6. Taking B vitamins, especially vitamin B6, is important for addressing stress – in fact, vitamin B6 has demonstrated anti-stress benefits itself, and it assists in the cellular uptake of magnesium. A study by Pouteau et al. found that those taking vitamin B6 with magnesium experienced a 24% greater improvement in stress compared to those taking magnesium alone; however, this difference was only significant among those with severe or extremely severe stress.


Five of the studies reviewed by Rawji et al. found that magnesium supplementation was related to positive stress outcomes, with higher doses of magnesium in combination with vitamin B6 related to the greatest reductions in anxiety scores. The only study showing clearly negative results used the lowest dose of magnesium among all studies included in the review.

Magnesium Found Most Helpful Among those with Low Magnesium Status

The study concluded,

“supplemental magnesium is likely useful in the treatment of mild anxiety and insomnia, particularly in those with low magnesium status at baseline.”

Higher doses of magnesium were more effective in addressing both issues, and magnesium combined with vitamin B6 appeared most effective for addressing anxiety symptoms. The study also concluded that more research may be necessary to determine which forms and specific doses of magnesium might be best for these conditions.

Adverse events were uncommon among all studies and were mild, with the most commonly reported side-effect being loose stool.



Social Media Fact Sheet

Many Americans use social media to connect with one another, engage with news content, share information and entertain themselves. Explore the patterns and trends shaping the social media landscape.

To better understand Americans’ social media use, Pew Research Center surveyed 5,733 U.S. adults from May 19 to Sept. 5, 2023. Ipsos conducted this National Public Opinion Reference Survey (NPORS) for the Center using address-based sampling and a multimode protocol that included both web and mail, so that nearly all U.S. adults have a chance of selection. The survey is weighted to be representative of the U.S. adult population by gender, race and ethnicity, education and other categories.

Polls from 2000 to 2021 were conducted via phone. For more on this mode shift, read our Q&A.

Here are the questions used for this analysis, along with responses, and its methodology.

A note on terminology: Our May-September 2023 survey was already in the field when Twitter changed its name to “X.” The terms Twitter and X are both used in this report to refer to the same platform.


YouTube and Facebook are the most-widely used online platforms. About half of U.S. adults say they use Instagram, and smaller shares use sites or apps such as TikTok, LinkedIn, Twitter (X) and BeReal.

Survey date: YouTube, Facebook, Instagram, Pinterest, TikTok, LinkedIn, WhatsApp, Snapchat, Twitter (X), Reddit, BeReal, Nextdoor (platforms tracked; not every platform was asked at every date)

8/5/2012: 54%, 9%, 10%, 16%, 13%
8/7/2012: 14%
12/9/2012: 11%, 13%, 13%
12/16/2012: 57%
5/19/2013: 15%
7/14/2013: 16%
9/16/2013: 57%, 14%, 17%, 17%, 14%
9/30/2013: 16%
1/26/2014: 16%
9/21/2014: 58%, 21%, 22%, 23%, 19%
4/12/2015: 62%, 24%, 26%, 22%, 20%
4/4/2016: 68%, 28%, 26%, 25%, 21%
1/10/2018: 73%, 68%, 35%, 29%, 25%, 22%, 27%, 24%
2/7/2019: 73%, 69%, 37%, 28%, 27%, 20%, 24%, 22%, 11%
2/8/2021: 81%, 69%, 40%, 31%, 21%, 28%, 23%, 25%, 23%, 18%, 13%
9/5/2023: 83%, 68%, 47%, 35%, 33%, 30%, 29%, 27%, 22%, 22%, 3%

Note: The vertical line indicates a change in mode. Polls from 2012-2021 were conducted via phone. In 2023, the poll was conducted via web and mail. For more details on this shift, please read our Q&A . Refer to the topline for more information on how question wording varied over the years. Pre-2018 data is not available for YouTube, Snapchat or WhatsApp; pre-2019 data is not available for Reddit; pre-2021 data is not available for TikTok; pre-2023 data is not available for BeReal. Respondents who did not give an answer are not shown.

Source: Surveys of U.S. adults conducted 2012-2023.


Usage of the major online platforms varies by factors such as age, gender and level of formal education.

% of U.S. adults who say they ever use __ by …

By age (18-29 / 30-49 / 50-64 / 65+)
Facebook: 67 / 75 / 69 / 58
Instagram: 78 / 59 / 35 / 15
LinkedIn: 32 / 40 / 31 / 12
Twitter (X): 42 / 27 / 17 / 6
Pinterest: 45 / 40 / 33 / 21
Snapchat: 65 / 30 / 13 / 4
YouTube: 93 / 92 / 83 / 60
WhatsApp: 32 / 38 / 29 / 16
Reddit: 44 / 31 / 11 / 3
TikTok: 62 / 39 / 24 / 10
BeReal: 12 / 3 / 1 / <1

By gender (Men / Women)
Facebook: 59 / 76
Instagram: 39 / 54
LinkedIn: 31 / 29
Twitter (X): 26 / 19
Pinterest: 19 / 50
Snapchat: 21 / 32
YouTube: 82 / 83
WhatsApp: 27 / 31
Reddit: 27 / 17
TikTok: 25 / 40
BeReal: 2 / 5

By race and ethnicity (White / Black / Hispanic / Asian*)
Facebook: 69 / 64 / 66 / 67
Instagram: 43 / 46 / 58 / 57
LinkedIn: 30 / 29 / 23 / 45
Twitter (X): 20 / 23 / 25 / 37
Pinterest: 36 / 28 / 32 / 30
Snapchat: 25 / 25 / 35 / 25
YouTube: 81 / 82 / 86 / 93
WhatsApp: 20 / 31 / 54 / 51
Reddit: 21 / 14 / 23 / 36
TikTok: 28 / 39 / 49 / 29
BeReal: 3 / 1 / 4 / 9

By household income (Less than $30,000 / $30,000-$69,999 / $70,000-$99,999 / $100,000+)
Facebook: 63 / 70 / 74 / 68
Instagram: 37 / 46 / 49 / 54
LinkedIn: 13 / 19 / 34 / 53
Twitter (X): 18 / 21 / 20 / 29
Pinterest: 27 / 34 / 35 / 41
Snapchat: 27 / 30 / 26 / 25
YouTube: 73 / 83 / 86 / 89
WhatsApp: 26 / 26 / 33 / 34
Reddit: 12 / 23 / 22 / 30
TikTok: 36 / 37 / 34 / 27
BeReal: 3 / 3 / 3 / 5

By education (High school or less / Some college / College graduate+)
Facebook: 63 / 71 / 70
Instagram: 37 / 50 / 55
LinkedIn: 10 / 28 / 53
Twitter (X): 15 / 24 / 29
Pinterest: 26 / 42 / 38
Snapchat: 26 / 32 / 23
YouTube: 74 / 85 / 89
WhatsApp: 25 / 23 / 39
Reddit: 14 / 23 / 30
TikTok: 35 / 38 / 26
BeReal: 3 / 4 / 4

By community type (Urban / Suburban / Rural)
Facebook: 66 / 68 / 70
Instagram: 53 / 49 / 38
LinkedIn: 31 / 36 / 18
Twitter (X): 25 / 26 / 13
Pinterest: 31 / 36 / 36
Snapchat: 29 / 26 / 27
YouTube: 85 / 85 / 77
WhatsApp: 38 / 30 / 20
Reddit: 29 / 24 / 14
TikTok: 36 / 31 / 33
BeReal: 4 / 4 / 2

By political affiliation (Rep/Lean Rep / Dem/Lean Dem)
Facebook: 70 / 67
Instagram: 43 / 53
LinkedIn: 29 / 34
Twitter (X): 20 / 26
Pinterest: 35 / 35
Snapchat: 27 / 28
YouTube: 82 / 84
WhatsApp: 25 / 33
Reddit: 20 / 25
TikTok: 30 / 36
BeReal: 4 / 4


This fact sheet was compiled by Research Assistant Olivia Sidoti, with help from Research Analyst Risa Gelles-Watnick, Research Analyst Michelle Faverio, Digital Producer Sara Atske, Associate Information Graphics Designer Kaitlyn Radde and Temporary Researcher Eugenie Park.

Follow these links for more in-depth analysis of the impact of social media on American life.

  • Americans’ Social Media Use  Jan. 31, 2024
  • Americans’ Use of Mobile Technology and Home Broadband  Jan. 31 2024
  • Q&A: How and why we’re changing the way we study tech adoption  Jan. 31, 2024

Find more reports and blog posts related to  internet and technology .



Meta-Analytic Methodology for Basic Research: A Practical Guide

Nicholas Mikolajewicz

1 Faculty of Dentistry, McGill University, Montreal, QC, Canada

2 Shriners Hospital for Children-Canada, Montreal, QC, Canada

Svetlana V. Komarova

Abstract

Basic life science literature is rich with information; however, methodical quantitative attempts to organize this information are rare. Unlike clinical research, where consolidation efforts are facilitated by systematic review and meta-analysis, the basic sciences seldom use such rigorous quantitative methods. The goal of this study is to present a brief theoretical foundation, computational resources and workflow outline along with a working example for performing systematic or rapid reviews of basic research followed by meta-analysis. Conventional meta-analytic techniques are extended to accommodate methods and practices found in basic research. Emphasis is placed on handling heterogeneity that is inherently prevalent in studies that use diverse experimental designs and models. We introduce MetaLab, a meta-analytic toolbox developed in MATLAB R2016b, which implements the methods described here and is provided for researchers and statisticians in a Git repository ( https://github.com/NMikolajewicz/MetaLab ). Through the course of the manuscript, a rapid review of intracellular ATP concentrations in osteoblasts is used as an example to demonstrate the workflow and the intermediate and final outcomes of basic research meta-analyses. In addition, the features pertaining to larger datasets are illustrated with a systematic review of mechanically-stimulated ATP release kinetics in mammalian cells. We discuss the criteria required to ensure outcome validity, as well as exploratory methods to identify influential experimental and biological factors. Thus, meta-analyses provide informed estimates for biological outcomes and the range of their variability, which are critical for hypothesis generation and evidence-driven design of translational studies, as well as development of computational models.

Introduction

Evidence-based medical practice aims to consolidate best research evidence with clinical and patient expertise. Systematic reviews and meta-analyses are essential tools for synthesizing evidence needed to inform clinical decision making and policy. Systematic reviews summarize available literature using specific search parameters followed by critical appraisal and logical synthesis of multiple primary studies (Gopalakrishnan and Ganeshkumar, 2013 ). Meta-analysis refers to the statistical analysis of the data from independent primary studies focused on the same question, which aims to generate a quantitative estimate of the studied phenomenon, for example, the effectiveness of the intervention (Gopalakrishnan and Ganeshkumar, 2013 ). In clinical research, systematic reviews and meta-analyses are a critical part of evidence-based medicine. However, in basic science, attempts to evaluate prior literature in such rigorous and quantitative manner are rare, and narrative reviews are prevalent. The goal of this manuscript is to provide a brief theoretical foundation, computational resources and workflow outline for performing a systematic or rapid review followed by a meta-analysis of basic research studies.

Meta-analyses can be a challenging undertaking, requiring tedious screening and statistical understanding. There are several guides available that outline how to undertake a meta-analysis in clinical research (Higgins and Green, 2011 ). Software packages supporting clinical meta-analyses include the Excel plugins MetaXL (Barendregt and Doi, 2009 ) and Mix 2.0 (Bax, 2016 ), Revman (Cochrane Collaboration, 2011 ), Comprehensive Meta-Analysis Software [CMA (Borenstein et al., 2005 )], JASP (JASP Team, 2018 ) and MetaFOR library for R (Viechtbauer, 2010 ). While these packages can be adapted to basic science projects, difficulties may arise due to specific features of basic science studies, such as large and complex datasets and heterogeneity in experimental methodology. To address these limitations, we developed a software package aimed to facilitate meta-analyses of basic research, MetaLab in MATLAB R2016b, with an intuitive graphical interface that permits users with limited statistical and coding background to proceed with a meta-analytic project. We organized MetaLab into six modules ( Figure 1 ), each focused on different stages of the meta-analytic process, including graphical-data extraction, model parameter estimation, quantification and exploration of heterogeneity, data-synthesis, and meta-regression.

Figure 1. General framework of MetaLab. The Data Extraction module assists with graphical data extraction from study figures. Fit Model module applies Monte-Carlo error propagation approach to fit complex datasets to model of interest. Prior to further analysis, reviewers have opportunity to manually curate and consolidate data from all sources. Prepare Data module imports datasets from a spreadsheet into MATLAB in a standardized format. Heterogeneity, Meta-analysis and Meta-regression modules facilitate meta-analytic synthesis of data.

In the present manuscript, we describe each step of the meta-analytic process with emphasis on specific considerations made when conducting a review of basic research. The complete workflow of parameter estimation using MetaLab is demonstrated for evaluation of intracellular ATP content in osteoblasts (OB [ATP] ic dataset) based on a rapid literature review. In addition, the features pertaining to larger datasets are explored with the ATP release kinetics from mechanically-stimulated mammalian cells (ATP release dataset) obtained as a result of a systematic review in our prior work (Mikolajewicz et al., 2018 ).

MetaLab can be freely accessed at Git repository ( https://github.com/NMikolajewicz/MetaLab ), and a detailed documentation of how to use MetaLab together with a working example is available in the Supporting materials .

Validity of Evidence in the Basic Sciences

To evaluate the translational potential of basic research, the validity of evidence must first be assessed, usually by examining the approach taken to collect and evaluate the data. Studies in the basic sciences are broadly grouped as hypothesis-generating and hypothesis-driven. The former tend to be small-sampled proof-of-principle studies and are typically exploratory and less valid than the latter. An argument can even be made that studies that report novel findings fall into this group as well, since their findings remain subject to external validation prior to being accepted by the broader scientific community. Alternatively, hypothesis-driven studies build upon what is known or strongly suggested by earlier work. These studies can also validate prior experimental findings with incremental contributions. Although such studies are often overlooked and even dismissed due to a lack of substantial novelty, their role in external validation of prior work is critical for establishing the translational potential of findings.

Another dimension to the validity of evidence in the basic sciences is the selection of experimental model. The human condition is near-impossible to recapitulate in a laboratory setting, therefore experimental models (e.g., cell lines, primary cells, animal models) are used to mimic the phenomenon of interest, albeit imperfectly. For these reasons, the best quality evidence comes from evaluating the performance of several independent experimental models. This is accomplished through systematic approaches that consolidate evidence from multiple studies, thereby filtering the signal from the noise and allowing for side-by-side comparison. While systematic reviews can be conducted to accomplish a qualitative comparison, meta-analytic approaches employ statistical methods which enable hypothesis generation and testing. When a meta-analysis in the basic sciences is hypothesis-driven, it can be used to evaluate the translational potential of a given outcome and provide recommendations for subsequent translational- and clinical-studies. Alternatively, if meta-analytic hypothesis testing is inconclusive, or exploratory analyses are conducted to examine sources of inconsistency between studies, novel hypotheses can be generated, and subsequently tested experimentally. Figure 2 summarizes this proposed framework.

Figure 2. Schematic of proposed hierarchy of translational potential in basic research.

Steps in Quantitative Literature Review

All meta-analytic efforts follow a similar workflow, outlined as follows:

  • Define primary and secondary objectives
  • Determine breadth of question
  • Construct search strategy: rapid or systematic search
  • Screen studies and determine eligibility
  • Extract data from relevant studies
  • Collect relevant study-level characteristics and experimental covariates
  • Evaluate quality of studies
  • Estimate model parameters for complex relationships (optional)
  • Compute appropriate outcome measure
  • Evaluate extent of between-study inconsistency (heterogeneity)
  • Perform relevant data transformations
  • Select meta-analytic model
  • Pool data and calculate summary measure and confidence interval
  • Explore potential sources of heterogeneity (ex. biological or experimental)
  • Subgroup and meta-regression analyses
  • Interpret findings
  • Provide recommendations for future work

Meta-Analysis Methodology

Search and Selection Strategies

The first stage of any review involves formulating a primary objective in the form of a research question or hypothesis. Reviewers must explicitly define the objective of the review before starting the project, which serves to reduce the risk of data dredging, where reviewers later assign meaning to significant findings. Secondary objectives may also be defined; however, precaution must be taken as the search strategies formulated for the primary objective may not entirely encompass the body of work required to address the secondary objective. Depending on the purpose of a review, reviewers may choose to undertake a rapid or systematic review. While the meta-analytic methodology is similar for systematic and rapid reviews, the scope of literature assessed tends to be significantly narrower for rapid reviews permitting the project to proceed faster.

Systematic Review and Meta-Analysis

Systematic reviews involve comprehensive search strategies that enable reviewers to identify all relevant studies on a defined topic (DeLuca et al., 2008 ). Meta-analytic methods then permit reviewers to quantitatively appraise and synthesize outcomes across studies to obtain information on statistical significance and relevance. Systematic reviews of basic research data have the potential of producing information-rich databases which allow extensive secondary analysis. To comprehensively examine the pool of available information, search criteria must be sensitive enough not to miss relevant studies. Key terms and concepts that are expressed as synonymous keywords and index terms, such as Medical Subject Headings (MeSH), must be combined using Boolean operators AND, OR and NOT (Ecker and Skelly, 2010 ). Truncations, wildcards, and proximity operators can also help refine a search strategy by including spelling variations and different wordings of the same concept (Ecker and Skelly, 2010 ). Search strategies can be validated using a selection of expected relevant studies. If the search strategy fails to retrieve even one of the selected studies, the search strategy requires further optimization. This process is iterated, updating the search strategy in each iterative step until the search strategy performs at a satisfactory level (Finfgeld-Connett and Johnson, 2013 ). A comprehensive search is expected to return a large number of studies, many of which are not relevant to the topic, commonly resulting in a specificity of <10% (McGowan and Sampson, 2005 ). Therefore, the initial stage of sifting through the library to select relevant studies is time-consuming (may take 6 months to 2 years) and prone to human error. At this stage, it is recommended to include at least two independent reviewers to minimize selection bias and related errors. Nevertheless, systematic reviews have a potential to provide the highest quality quantitative evidence synthesis to directly inform the experimental and computational basic, preclinical and translational studies.
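
As a small illustration of combining concept blocks with Boolean operators and checking a strategy against a hand-picked validation set, consider the Python sketch below; the concepts, truncated terms, and record identifiers are made up for the example.

    # Assemble a Boolean query from concept blocks (terms within a concept are OR'd,
    # concepts are AND'd); all terms and identifiers below are made up for illustration.
    concepts = {
        "mental illness": ["depression", "depressive symptoms", "anxiety"],
        "COVID-19": ["COVID-19", "SARS-CoV-2", "coronavirus"],
        "youth": ["child*", "adolescen*", "pediatric"],
    }
    query = " AND ".join("(" + " OR ".join(terms) + ")" for terms in concepts.values())
    print(query)

    # Validate the strategy against a hand-picked set of known relevant records
    known_relevant = {"PMID:111", "PMID:222", "PMID:333"}
    retrieved = {"PMID:111", "PMID:222", "PMID:444", "PMID:555"}
    missed = known_relevant - retrieved
    if missed:
        print("Strategy needs another iteration; missed:", sorted(missed))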

Rapid Review and Meta-Analysis

The goal of the rapid review, as the name implies, is to decrease the time needed to synthesize information. Rapid reviews are a suitable alternative to systematic approaches if reviewers prefer to get a general idea of the state of the field without an extensive time investment. Search strategies are constructed by increasing search specificity, thus reducing the number of irrelevant studies identified by the search at the expense of search comprehensiveness (Haby et al., 2016 ). The strength of a rapid review is in its flexibility to adapt to the needs of the reviewer, resulting in a lack of standardized methodology (Mattivi and Buchberger, 2016 ). Common shortcuts made in rapid reviews are: (i) narrowing search criteria, (ii) imposing date restrictions, (iii) conducting the review with a single reviewer, (iv) omitting expert consultation (i.e., librarian for search strategy development), (v) narrowing language criteria (ex. English only), (vi) foregoing the iterative process of searching and search term selection, (vii) omitting quality checklist criteria and (viii) limiting number of databases searched (Ganann et al., 2010 ). These shortcuts will limit the initial pool of studies returned from the search, thus expediting the selection process, but also potentially resulting in the exclusion of relevant studies and introduction of selection bias. While there is a consensus that rapid reviews do not sacrifice quality, or synthesize misrepresentative results (Haby et al., 2016 ), it is recommended that critical outcomes be later verified by systematic review (Ganann et al., 2010 ). Nevertheless, rapid reviews are a viable alternative when parameters for computational modeling need to be estimated. While systematic and rapid reviews rely on different strategies to select the relevant studies, the statistical methods used to synthesize data from the systematic and rapid review are identical.

Screening and Selection

When the literature search is complete (the date articles were retrieved from the databases needs to be recorded), articles are extracted and stored in a reference manager for screening. Before study screening, the inclusion and exclusion criteria must be defined to ensure consistency in study identification and retrieval, especially when multiple reviewers are involved. The critical steps in screening and selection are (1) removing duplicates, (2) screening for relevant studies by title and abstract, and (3) inspecting full texts to ensure they fulfill the eligibility criteria. There are several reference managers available including Mendeley and Rayyan, specifically developed to assist with screening systematic reviews. However, 98% of authors report using Endnote, Reference Manager or RefWorks to prepare their reviews (Lorenzetti and Ghali, 2013 ). Reference managers often have deduplication functions; however, these can be tedious and error-prone (Kwon et al., 2015 ). A protocol for faster and more reliable de-duplication in Endnote has been recently proposed (Bramer et al., 2016 ). The selection of articles should be sufficiently broad not to be dominated by a single lab or author. In basic research articles, it is common to find data sets that are reused by the same group in multiple studies. Therefore, additional precautions should be taken when deciding to include multiple studies published by a single group. At the end of the search, screening and selection process, the reviewer obtains a complete list of eligible full-text manuscripts. The entire screening and selection process should be reported in a PRISMA diagram, which maps the flow of information throughout the review according to prescribed guidelines published elsewhere (Moher et al., 2009 ). Figure 3 provides a summary of the workflow of search and selection strategies using the OB [ATP] ic rapid review and meta-analysis as an example.

Figure 3. Example of the rapid review literature search. (A) Development of the search parameters to find literature on the intracellular ATP content in osteoblasts. (B) PRISMA diagram for the information flow.

Data Extraction, Initial Appraisal, and Preparation

Identification of parameters to be extracted.

It is advised to predefine analytic strategies before data extraction and analysis. However, the availability of reported effect measures and study designs will often influence this decision. When reviewers aim to estimate the absolute mean difference (absolute effect), normalized mean difference, response ratio or standardized mean difference (ex. Hedges' g), they need to extract study-level means (θ_i), standard deviations (sd(θ_i)), and sample sizes (n_i) for the control (denoted θ_i^c, sd(θ_i^c), and n_i^c) and intervention (denoted θ_i^r, sd(θ_i^r), and n_i^r) groups of each study i. To estimate the absolute mean effect, only the mean (θ_i^r), standard deviation (sd(θ_i^r)), and sample size (n_i^r) are required. In basic research, it is common for a single study to present variations of the same observation (ex. measurements of the same entity using different techniques). In such cases, each point may be treated as an individual observation, or common outcomes within a study can be pooled by taking the mean weighted by the sample size. Another consideration is inconsistency between effect size units reported on the absolute scale; for example, protein concentrations can be reported as g/cell, mol/cell, g/g wet tissue or g/g dry tissue. In such cases, conversion to a common representation is required for comparison across studies, for which appropriate experimental parameters and calibrations need to be extracted from the studies. While some parameters can be approximated by reviewers, such as cell-related parameters found in the BioNumbers database (Milo et al., 2010) and equipment-related parameters presumed from manufacturer manuals, reviewers should exercise caution when making such approximations as they can introduce systematic errors that manifest throughout the analysis. When data conversion is judged to be difficult but negative/basal controls are available, scale-free measures (i.e., normalized, standardized, or ratio effects) can still be used in the meta-analysis without the need to convert effects to common units on the absolute scale. In many cases, reviewers may only be able to decide on a suitable effect size measure after data extraction is complete.

It is regrettably common to encounter unclear or incomplete reporting, especially for the sample sizes and uncertainties. Reviewers may choose to reject studies with such problems due to quality concerns or to employ conservative assumptions to estimate missing data. For example, if it is unclear if a study reports the standard deviation or standard error of the mean, it can be assumed to be a standard error, which provides a more conservative estimate. If a study does not report uncertainties but is deemed important because it focuses on a rare phenomenon, imputation methods have been proposed to estimate uncertainty terms (Chowdhry et al., 2016 ). If a study reports a range of sample sizes, reviewers should extract the lowest value. Strategies to handle missing data should be pre-defined and thoroughly documented.

In addition to identifying relevant primary parameters, a priori defined study-level characteristics that have the potential to influence the outcome, such as species, cell type, or specific methodology, should be identified and collected in parallel to data extraction. This information is valuable in subsequent exploratory analyses and can provide insight into influential factors through between-study comparison.

Quality Assessment

Formal quality assessment allows the reviewer to appraise the quality of identified studies and to make informed and methodical decisions regarding the exclusion of poorly conducted studies. In general, based on initial evaluation of full texts, each study is scored to reflect the study's overall quality and scientific rigor. Several quality-related characteristics have been described (Sena et al., 2007), such as: (i) publication in a peer-reviewed journal, (ii) complete statistical reporting, (iii) randomization of treatment or control, (iv) blinded analysis, (v) sample size calculation prior to the experiment, (vi) investigation of a dose-response relationship, and (vii) statement of compliance with regulatory requirements. We also suggest that reviewers of basic research studies assess (viii) objective alignment between the study in question and the meta-analytic project. This involves noting whether the outcome of interest was the primary study objective or was reported as a supporting or secondary outcome, which may not receive the same experimental rigor and is subject to expectation bias (Sheldrake, 1997). Additional quality criteria specific to experimental design may be included at the discretion of the reviewer. Once study scores have been assembled, aggregate study-level quality scores are determined by summing the number of satisfied criteria, and the reviewer then evaluates how outcome estimates and heterogeneity vary with study quality. Significant variation arising from poorer quality studies may justify study omission in subsequent analysis.

Extraction of Tabular and Graphical Data

The next step is to compile the meta-analytic data set, which reviewers will use in subsequent analysis. For each study, the complete dataset, which includes the parameters required to estimate the target outcome, study characteristics, as well as data necessary for unit conversion, needs to be extracted. Data reporting in basic research is commonly tabular or graphical. Reviewers can accurately extract tabular data from the text or tables. However, graphical data often must be extracted from the graph directly using time-consuming and error-prone methods. The Data Extraction Module in MetaLab was developed to facilitate systematic and unbiased data extraction; reviewers provide study figures as inputs, then specify the reference points that are used to calibrate the axes and extract the data (Figures 4A,B).

Figure 4. MetaLab data extraction procedure is accurate, unbiased and robust to quality of data presentation. (A,B) Example of graphical data extraction using MetaLab. (A) Original figure (Bodin et al., 1992) with axes, data points and corresponding errors marked by the reviewer. (B) Extracted data with error terms. (C–F) Validation of the MetaLab data-extraction module. (C) Synthetic datasets were constructed using randomly generated data coordinates and marker sizes. (D) Extracted values were consistent with true values, evaluated by linear regression with slope β_slope; red line: line of equality. (E) Data extraction was unbiased, evaluated with the distribution of percent errors between true and extracted values. E_mean, E_median, E_min, and E_max are the mean, median, minimum, and maximum % error, respectively. (F) The absolute errors of extracted data were independent of data marker size; red line: linear regression with slope β_slope.

To validate the performance of the MetaLab Data Extraction Module, we generated figures using 319 synthetic data points plotted with varying marker sizes (Figure 4C). Extracted and actual values were correlated (R² = 0.99), with the slope of the relationship estimated as 1.00 (95% CI: 0.99 to 1.01) (Figure 4D). Bias was absent, with a mean percent error of 0.00% (95% CI: −0.02 to 0.02%) (Figure 4E). The narrow range of errors between −2.00 and 1.37%, and the consistency between the median and mean error, indicated no skewness. Data marker size did not contribute to the extraction error, as 0.00% of the variation in absolute error was explained by marker size, and the slope of the relationship between marker size and extraction error was 0.000 (95% CI: −0.001, 0.002) (Figure 4F). These data demonstrate that graphical data can be reliably extracted using MetaLab.

Extracting Data From Complex Relationships

Basic science often focuses on natural processes and phenomena characterized by complex relationships between a series of inputs (e.g., exposures) and outputs (e.g., response). The results are commonly explained by an accepted model of the relationship, such as the Michaelis-Menten model of enzyme kinetics, which involves two parameters: V_max for the maximum rate and K_m for the substrate concentration at half of V_max. For meta-analysis, model parameters characterizing complex relationships are of interest as they allow direct comparison of different multi-observational datasets. However, study-level outcomes for complex relationships often (i) lack consistency in reporting, and (ii) lack estimates of uncertainties for model parameters. Therefore, reviewers wishing to perform a meta-analysis of complex relationships may need to fit study-level data to a unified model y = f(x, β) to estimate the parameter set β characterizing the relationship (Table 1), and assess the uncertainty in β.

Table 1. Commonly used models of complex relationships in basic sciences (parameters and typical applications).

  • Linear model: slope (magnitude of the relationship); intercept (response at x = 0). Applications: reaction rates.
  • Quadratic model (vertex form): curvature factor; x at the global max/min; global maximum/minimum. Applications: trajectory modeling.
  • Exponential model: intercept (response at x = 0); decay/growth constant. Applications: population decay/growth.
  • Michaelis-Menten (hyperbolic curve): maximum response; x at half-maximal response. Applications: enzyme kinetics, reaction rates, infection rates, drug clearance.
  • Sigmoidal E_max model (Hill function): maximum response; x at half-maximal response; slope-related term. Applications: dose-response relationships, pharmacodynamics.

The study-level data can be fitted to a model using conventional fitting methods, in which the model parameter error terms depend on the goodness of fit and the number of available observations. Alternatively, a Monte Carlo simulation approach (Cox et al., 2003) allows for the propagation of study-level variances (uncertainty in the model inputs) to the uncertainty in the model parameter estimates (Figure 5). Suppose that study i reported a set of k predictor variables x = {x_j | 1 ≤ j ≤ k} for a set of outcomes θ = {θ_j | 1 ≤ j ≤ k}, and that there is a corresponding set of standard deviations sd(θ) = {sd(θ_j) | 1 ≤ j ≤ k} and sample sizes n = {n_j | 1 ≤ j ≤ k} (Figure 5A). The Monte Carlo error propagation method assumes that outcomes are normally distributed, enabling pseudo-random observations to be sampled from a distribution approximated by N(θ_j, sd(θ_j)²). The pseudo-random observations are then averaged to obtain a Monte Carlo estimate θ*_j for each observation such that

θ*_j = (1/n_j) Σ_{m=1}^{n_j} θ*_(j,m)

Figure 5. Model parameter estimation with the Monte Carlo error propagation method. (A) Study-level data taken from the ATP release meta-analysis. (B) Assuming a sigmoidal model, parameters were estimated using the MetaLab Fit Model module by randomly sampling data from distributions defined by the study-level data. Model parameters were estimated for each set of sampled data. (C) Final model using parameters estimated from 400 simulations. (D) Distributions of parameters estimated for the given dataset are unimodal and symmetrical.

where θ*_(j,m) represents a pseudo-random variable sampled n_j times from N(θ_j, sd(θ_j)²). The relationship between x and θ* = {θ*_j | 1 ≤ j ≤ k} is then fitted with the model of interest using the least-squares method to obtain an estimate of the model parameters β (Figure 5B). After many iterations of resampling and fitting, a distribution of parameter estimates N(β̄, sd(β̄)²) is obtained, from which the parameter means β̄ and variances sd(β̄)² can be estimated (Figures 5C,D). As the number of iterations M tends to infinity, the parameter estimate converges to the expected value E(β).

It is critical for reviewers to ensure the data are consistent with the model, such that the estimated parameters sufficiently capture the information conveyed in the underlying study-level data. In general, reliable model fittings are characterized by normal parameter distributions (Figure 5D) and a high goodness of fit as quantified by R². The advantage of the Monte Carlo approach is that it works as a black-box procedure that does not require complex error propagation formulas, allowing correlated and independent parameters to be handled without additional consideration.
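As a minimal sketch of this resampling scheme (not MetaLab's implementation, which is MATLAB-based), the following Python fragment assumes a sigmoidal (Hill-type) model and hypothetical study-level means, standard deviations, and sample sizes; the names hill and mc_fit and the example numbers are illustrative only.

import numpy as np
from scipy.optimize import curve_fit

def hill(x, top, ec50, slope):
    # Sigmoidal (Hill) model: response rises from 0 toward 'top' around ec50
    return top * x**slope / (ec50**slope + x**slope)

def mc_fit(x, mean, sd, n, n_iter=400, seed=0):
    """Monte Carlo error propagation: resample study-level data, refit the model,
    and summarize the distribution of fitted parameters."""
    rng = np.random.default_rng(seed)
    params = []
    for _ in range(n_iter):
        # Draw n_j pseudo-observations per point from N(mean_j, sd_j^2) and average them
        theta_star = np.array([rng.normal(m, s, size=k).mean()
                               for m, s, k in zip(mean, sd, n)])
        beta, _ = curve_fit(hill, x, theta_star,
                            p0=[theta_star.max(), np.median(x), 1.0], maxfev=10000)
        params.append(beta)
    params = np.array(params)
    return params.mean(axis=0), params.std(axis=0, ddof=1)  # parameter means and SDs

# Hypothetical study-level data (x: predictor levels)
x = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
mean = np.array([5.0, 12.0, 35.0, 70.0, 85.0])
sd = np.array([2.0, 4.0, 8.0, 10.0, 9.0])
n = np.array([4, 4, 6, 6, 5])
beta_mean, beta_sd = mc_fit(x, mean, sd, n)
print(beta_mean, beta_sd)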

Study-Level Effect Sizes

Depending on the purpose of the review, study-level outcomes θ_i can be expressed as one of several effect size measures. The absolute effect size, computed as a mean outcome or absolute difference from baseline, is the simplest, is independent of variance, and retains information about the context of the data (Baguley, 2009). However, the use of absolute effect sizes requires authors to report on a common scale or provide conversion parameters. In cases where a common scale is difficult to establish, a scale-free measure, such as a standardized, normalized or relative measure, can be used. Standardized mean differences, such as Hedges' g or Cohen's d, report the outcome as the size of the effect (difference between the means of the experimental and control groups) relative to the overall variance (pooled and weighted standard deviation of the combined experimental and control groups). The standardized mean difference, in addition to odds or risk ratios, is widely used in meta-analysis of clinical studies (Vesterinen et al., 2014), since it allows reviewers to summarize metrics that do not have a unified meaning (e.g., a pain score) and takes into account the variability in the samples. However, the standardized measure is rarely used in basic science since study outcomes are commonly a defined measure, sample sizes are small, and variances are highly influenced by experimental and biological factors. Other measures that are more suited for basic science are the normalized mean difference, which expresses the difference between the outcome and baseline as a proportion of the baseline (alternatively called the percentage difference), and the response ratio, which reports the outcome as a proportion of the baseline. All discussed measures have been included in MetaLab (Table 2).

Table 2. Types of effect sizes: absolute, standardized (Hedges' g), normalized, and ratio, with formulas for the mean and standard error of each.
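For orientation, the sketch below computes standard textbook forms of these effect sizes and their standard errors (Hedges' g with a small-sample correction, the normalized mean difference with delta-method standard errors, and the log response ratio); these are common definitions and may differ in detail from the exact expressions implemented in MetaLab.

import numpy as np

def hedges_g(m_r, sd_r, n_r, m_c, sd_c, n_c):
    # Standardized mean difference with Hedges' small-sample correction
    df = n_r + n_c - 2
    s_pooled = np.sqrt(((n_r - 1) * sd_r**2 + (n_c - 1) * sd_c**2) / df)
    j = 1 - 3 / (4 * df - 1)                      # small-sample correction factor
    g = j * (m_r - m_c) / s_pooled
    se = np.sqrt((n_r + n_c) / (n_r * n_c) + g**2 / (2 * (n_r + n_c)))
    return g, se

def normalized_md(m_r, sd_r, n_r, m_c, sd_c, n_c):
    # Difference from baseline expressed as a proportion of the baseline
    nmd = (m_r - m_c) / m_c
    se = np.sqrt(sd_r**2 / (n_r * m_c**2) + (m_r**2 * sd_c**2) / (n_c * m_c**4))
    return nmd, se

def log_response_ratio(m_r, sd_r, n_r, m_c, sd_c, n_c):
    # Outcome as a proportion of baseline, analyzed on the log scale
    lnrr = np.log(m_r / m_c)
    se = np.sqrt(sd_r**2 / (n_r * m_r**2) + sd_c**2 / (n_c * m_c**2))
    return lnrr, se

print(hedges_g(12.0, 3.0, 8, 9.0, 2.5, 8))   # hypothetical intervention vs. control data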

Data Synthesis

The goal of any meta-analysis is to provide an outcome estimate that is representative of all study-level findings. One important feature of meta-analysis is its ability to incorporate information about the quality and reliability of the primary studies by weighing larger, better reported studies more heavily. The two quantities of interest are the overall estimate and the measure of the variability in this estimate. Study-level outcomes θ_i are synthesized as a weighted mean θ̂ according to the study-level weights w_i:

θ̂ = Σ_{i=1}^{N} w_i θ_i / Σ_{i=1}^{N} w_i

where N is the number of studies or datasets. The choice of weighting scheme dictates how study-level variances are pooled to estimate the variance of the weighted mean. The weighting scheme thus significantly influences the outcome of the meta-analysis and, if poorly chosen, risks over-weighing less precise studies and generating a less valid, non-generalizable outcome. Thus, the notion of defining an a priori analysis protocol has to be balanced with the need to ensure that the dataset is compatible with the chosen analytic strategy, which may be uncertain prior to data extraction. We provide strategies to compute and compare different study-level and global outcomes and their variances.

Weighting Schemes

To generate valid estimates of cumulative knowledge, studies are weighed according to their reliability. This conceptual framework, however, deteriorates if reported measures of precision are themselves flawed. The most commonly used measure of precision is the inverse variance which is a composite measure of total variance and sample size, such that studies with larger sample sizes and lower experimental errors are more reliable and more heavily weighed. Inverse variance weighting schemes are valid when (i) sampling error is random, (ii) the reported effects are homoscedastic, i.e., have equal variance and (iii) the sample size reflects the number of independent experimental observations. When assumptions (i) or (ii) are violated, sample size weighing can be used as an alternative. Despite sample size and sample variance being such critical parameters in the estimation of the global outcome, they are often prone to deficient reporting practices.

Potential problems with sample variance and sample size

The standard error se(θ_i) is required to compute inverse variance weights; however, the primary literature as well as meta-analysis reviewers often confuse standard errors with standard deviations sd(θ_i) (Altman and Bland, 2005). Additionally, many assays used in basic research have uneven error distributions, such that the variance component arising from experimental error depends on the magnitude of the effect (Bittker and Ross, 2016). Such uneven error distributions will lead to biased weighing that does not reflect the true precision of the measurement. Fortunately, the standard error and standard deviation have characteristic properties that can be assessed by the reviewer to determine whether inverse variance weights are appropriate for a given dataset. The study-level standard error se(θ_i) is a measure of precision and is estimated as the product of the sample standard deviation sd(θ_i) and the factor 1/√n_i for study i. Therefore, the standard error is expected to be approximately inversely proportional to the square root of the study-level sample size n_i.

Unlike the standard error, the standard deviation, which describes the spread of a random variable (its square being the variance sd(θ)²), is assumed to be independent of the sample size because it is a descriptive statistic rather than a precision statistic. Since the total observed study-level sample variance is the sum of natural variability (assumed to be constant for a phenomenon) and random error, no relationship is expected between reported standard deviations and sample sizes. These assumptions can be tested by correlation analysis and can be used to inform the reviewer about the reliability of the study-level uncertainty measures. For example, a relationship between sample size and sample variance was observed for the OB [ATP]ic dataset (Figure 6A), but not for the ATP release data (Figure 6B). Therefore, in the case of the OB [ATP]ic data set, lower variances are not associated with higher precision and inverse variance weighting is not appropriate. Sample sizes are also frequently misrepresented in the basic sciences, as experimental replicates and repeated experiments are often reported interchangeably (incorrectly) as sample sizes (Vaux et al., 2012). Repeated (independent) experiments refer to the number of randomly sampled observations, while replicates refer to repeated measurements of a sample from one experiment to improve measurement precision. Statistical inference theory assumes random sampling, which is satisfied by independent experiments but not by replicate measurements. Misrepresentative reporting of replicates as the sample size may artificially inflate the reliability of results. While this is difficult to identify, poor reporting may be reflected in the overall quality score of a study.

Figure 6. Assessment of study-level outcomes. (A,B) Reliability of study-level error measures. The study-level squared standard deviation sd(θ_i)² and sample size n_i are assumed to be independent when reliably reported. An association between sd(θ_i)² and n_i was present in the OB [ATP]ic data set (A) and absent in the ATP release data set (B); red line: linear regression. (C,D) Distributions of study-level outcomes. Assessment of unweighted (UW, black) and weighted (fixed effect, FE, blue; random effects, RE, red; sample-size weighting, N, green) study-level distributions of data from the OB [ATP]ic (C) and ATP release (D) data sets, before (left) and after (right) log10 transformation. Heterogeneity was quantified by the Q, I², and H² statistics. (E,F) After log10 transformation, the H² heterogeneity statistic increased for the OB [ATP]ic data set (E) and decreased for the ATP release data set (F).

Inverse variance weighting

The inverse variance is the most common measure of precision, representing a composite measure of total variance and sample size. Widely used weighting schemes based on the inverse variance are the fixed effect and random effects meta-analytic models. The fixed effect model assumes that all studies sample one true effect γ. The observed outcome θ_i for study i is then a function of a within-study error ε_i, θ_i = γ + ε_i, where ε_i is normally distributed, ε_i ~ N(0, se(θ_i)²). The standard error se(θ_i) is calculated from the sample standard deviation sd(θ_i) and sample size n_i as:

se(θ_i) = sd(θ_i)/√n_i

Alternatively, the random effects model supposes that each study samples a different true outcome μ_i, such that the combined effect μ is the mean of the population of true effects. The observed effect θ_i for study i is then influenced by the intrastudy error ε_i and the interstudy error ξ_i, θ_i = μ_i + ε_i with μ_i = μ + ξ_i, where ξ_i is also assumed to be normally distributed, ξ_i ~ N(0, τ²), with τ² representing the extent of heterogeneity, or between-study (interstudy) variance.

Study-level estimates for a fixed effect or random effects model are weighted using the inverse variance:

w_i = 1/se(θ_i)² (fixed effect) or w_i = 1/(se(θ_i)² + τ²) (random effects)

These weights are used to calculate the global outcome θ̂ (Equation 3) and the corresponding standard error se(θ̂):

se(θ̂) = √(1 / Σ_{i=1}^{N} w_i)

where N = number of datasets/studies. In practice, random effects models are favored over the fixed effect model, due to the prevalence of heterogeneity in experimental methods and biological outcomes. However, when there is no between-study variability (τ² = 0), the random effects model reduces to a fixed effect model. In contrast, when τ² is exceedingly large and interstudy variance dominates the weighting term (τ² ≫ se(θ_i)²), random effects estimates will tend to an unweighted mean.

Interstudy variance τ 2 estimators . Under the assumptions of a random effects model, the total variance is the sum of the intrastudy variance (experimental sampling error) and interstudy variance τ 2 (variability of true effects). Since the distribution of true effects is unknown, we must estimate the value of τ 2 based on study-level outcomes (Borenstein, 2009 ). The DerSimonian and Laird (DL) method is the most commonly used in meta-analyses (DerSimonian and Laird, 1986 ). Other estimators such as the Hunter and Schmidt (Hunter and Schmidt, 2004 ), Hedges (Hedges and Olkin, 1985 ), Hartung-Makambi (Hartung and Makambi, 2002 ), Sidik-Jonkman (Sidik and Jonkman, 2005 ), and Paule-Mandel (Paule and Mandel, 1982 ) estimators have been proposed as either alternatives or improvements over the DL estimator (Sanchez-Meca and Marin-Martinez, 2008 ) and have been implemented in MetaLab ( Table 3) . Negative values of τ 2 are truncated at zero. An overview of the various τ 2 estimators along with recommendations on their use can be found elsewhere (Veroniki et al., 2016 ).

Table 3. Interstudy variance τ² estimators implemented in MetaLab: DerSimonian-Laird (DL)*, Hunter-Schmidt (HS), Hedges (H), Hartung-Makambi (HM)*, Sidik-Jonkman (SJ), and Paule-Mandel (PM).

N = number of datasets/studies.
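As a compact illustration of inverse variance pooling under the fixed effect and random effects models, the sketch below uses the DerSimonian-Laird τ² estimator and hypothetical study-level effects and standard errors; it is a minimal sketch of the standard formulas, not MetaLab code.

import numpy as np

def pool(theta, se):
    """Fixed effect and DerSimonian-Laird random effects pooling."""
    w_fe = 1 / se**2                                   # fixed effect weights
    theta_fe = np.sum(w_fe * theta) / np.sum(w_fe)
    q = np.sum(w_fe * (theta - theta_fe)**2)           # Cochran's Q
    df = len(theta) - 1
    c = np.sum(w_fe) - np.sum(w_fe**2) / np.sum(w_fe)
    tau2 = max(0.0, (q - df) / c)                      # DL estimator, truncated at zero
    w_re = 1 / (se**2 + tau2)                          # random effects weights
    theta_re = np.sum(w_re * theta) / np.sum(w_re)
    return {
        "fixed": (theta_fe, np.sqrt(1 / np.sum(w_fe))),
        "random": (theta_re, np.sqrt(1 / np.sum(w_re))),
        "tau2": tau2,
        "Q": q,
    }

theta = np.array([0.8, 1.2, 0.5, 1.6, 1.0])    # hypothetical study-level effects
se = np.array([0.20, 0.25, 0.15, 0.40, 0.30])  # hypothetical standard errors
print(pool(theta, se))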

Sample-size weighting

Sample-size weighting is preferred in cases where variance estimates are unavailable or unreliable. Under this weighting scheme, study-level sample sizes are used in place of inverse variances as weights. The sampling error is then unaccounted for; however, since sampling error is random, larger sample sizes will effectively average out the error and produce more dependable results. This is contingent on reliable reporting of sample sizes, which is difficult to assess and can be erroneous as detailed above. For a sample-size weighted estimate, the study-level sample sizes n_i are used as the weights in calculating the global effect size θ̂, such that

θ̂ = Σ_{i=1}^{N} n_i θ_i / Σ_{i=1}^{N} n_i

The pooled standard error se(θ̂) for the global effect is then:

While sample size weighting is less affected by sampling variance, the performance of this estimator depends on the availability of studies (Marin-Martinez and Sanchez-Meca, 2010 ). When variances are reliably reported, sample-size weights should roughly correlate to inverse variance weights under the fixed effect model.

Meta-Analytic Data Distributions

One important consideration the reviewer should attend to is the normality of the study-level effects distributions assumed by most meta-analytic methods. Non-parametric methods that do not assume normality are available but are more computationally intensive and inaccessible to non-statisticians (Karabatsos et al., 2015 ). The performance of parametric meta-analytic methods has been shown to be robust to non-normally distributed effects (Kontopantelis and Reeves, 2012 ). However, this robustness is achieved by deriving artificially high estimates of heterogeneity for non-normally distributed data, resulting in conservatively wide confidence intervals and severely underpowered results (Jackson and Turner, 2017 ). Therefore, it is prudent to characterize the underlying distribution of study-level effects and perform transformations to normalize distributions to preserve the inferential integrity of the meta-analysis.

Assessing data distributions

Graphical approaches, such as the histogram, are commonly used to assess the distribution of data; however, in a meta-analysis they can misrepresent the true distribution of effect sizes, which may differ due to the unequal weights assigned to each study. To address this, we can use a weighted histogram to evaluate effect size distributions (Figure 6). A weighted histogram can be constructed by first binning studies according to their effect sizes. Each bin is then assigned a weighted frequency, calculated as the sum of the study-level weights within the given bin and normalized by the sum of all weights across all bins:

P_j = Σ_i w_ij / Σ_{j=1}^{nBins} Σ_i w_ij

where P_j is the weighted frequency for bin j, w_ij is the weight for the effect size in bin j from study i, and nBins is the total number of bins. If the distribution is found to deviate from normality, the most common explanations are that (i) the distribution is skewed due to inconsistencies between studies, (ii) subpopulations exist within the dataset giving rise to multimodal distributions, or (iii) the studied phenomenon is not normally distributed. The source of inconsistencies and multimodality can be explored during the analysis of heterogeneity (i.e., to determine whether study-level characteristics can explain observed discrepancies). Skewness may, however, be inherent to the data when values are small, variances are large, and values cannot be negative (Limpert et al., 2001), and has been credited as characteristic of natural processes (Grönholm and Annila, 2007). For sufficiently large sample sizes, the central limit theorem holds that the means of skewed data are approximately normally distributed. However, due to the common limitation in the number of studies available for meta-analyses, meta-analytic global estimates of skewed distributions are often sensitive to extreme values. In these cases, data transformation can be used to achieve a normal distribution on the logarithmic scale (i.e., a lognormal distribution).
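A weighted histogram of this kind can be produced, for example, by passing study-level weights to NumPy's histogram function and normalizing by the total weight; the sketch below uses simulated effects and weights purely for illustration.

import numpy as np

def weighted_histogram(theta, weights, n_bins=10):
    # Bin study-level effects and weight each bin by the summed study weights,
    # normalized so that the weighted frequencies sum to 1 across bins.
    counts, edges = np.histogram(theta, bins=n_bins, weights=weights)
    return counts / weights.sum(), edges

theta = np.random.default_rng(1).lognormal(mean=1.0, sigma=0.6, size=40)   # skewed effects
weights = np.random.default_rng(2).integers(3, 30, size=40)                # e.g., sample sizes
freq, edges = weighted_histogram(theta, weights)
print(np.round(freq, 3))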

Lognormal distributions

Since meta-analytic methods typically assume normality, the log transformation is a useful tool used to normalize skewed distributions ( Figures 6C–F ). In the ATP release dataset, we found that log transformation normalized the data distribution. However, in the case of the OB [ATP] ic dataset, log transformation revealed a bimodal distribution that was otherwise not obvious on the raw scale.

Data normalization by log transformation allows meta-analytic techniques to maintain their inferential properties. The outcomes synthesized on the logarithmic scale can then be transformed back to the original raw scale to obtain asymmetrical confidence intervals which further accommodate the skew in the data. Study-level effect sizes θ_i can be related to the logarithmic mean Θ_i through the forward log transformation, meta-analyzed on the logarithmic scale, and back-transformed to the original scale using one of the back-transformation methods (Table 4). We have implemented three different back-transformation methods in MetaLab: geometric approximation (anti-log), naïve approximation (rearrangement of the forward-transformation method), and Taylor series approximation (Higgins et al., 2008). The geometric back-transformation will yield an estimate of θ̂ that is approximately equal to the median of the study-level effects. The naïve and Taylor series approximations differ in how the standard errors are approximated, which are used to obtain a point estimate on the original raw scale. The naïve and Taylor series approximations were shown to maintain adequate inferential properties in the meta-analytic context (Higgins et al., 2008).

Table 4. Logarithmic transformation methods: geometric (anti-log), naïve, and Taylor series back-transformations.

Forward-transformation of study-level estimates θ_i to corresponding log-transformed estimates Θ_i, and back-transformation of the meta-analysis outcome Θ̂ to the corresponding outcome θ̂ on the raw scale (Higgins et al., 2008). v_(1−α/2): confidence interval critical value at significance level α.

Confidence Intervals

Once the meta-analysis global estimate and standard error have been computed, reviewers may proceed to construct the confidence interval (CI). The CI represents the range of values within which the true mean outcome is contained with probability 1 − α. In meta-analyses, the CI conveys information about the significance, magnitude and direction of an effect, and is used for inference and generalization of an outcome. Values that do not fall in the range of the CI may be interpreted as significantly different. In general, the CI is computed as the global estimate plus or minus the product of the standard error se(θ̂) and the critical value v_(1−α/2):

CI = θ̂ ± v_(1−α/2) · se(θ̂)

CI estimators

The critical value v_(1−α/2) is derived from a theoretical distribution and represents the significance threshold for level α. A theoretical distribution describes the probability of any given possible outcome occurring for a phenomenon. Extreme outcomes that lie furthest from the mean are known as the tails. The most commonly used theoretical distributions are the z-distribution and the t-distribution, which are both symmetrical and bell-shaped but differ in how far-reaching or "heavy" the tails are. Heavier tails result in larger critical values, which translate to wider confidence intervals, and vice versa. Critical values drawn from a z-distribution, known as z-scores (z), are used when data are normal and a sufficiently large number of studies is available (>30). The tails of a z-distribution are independent of the sample size and reflect those expected for a normal distribution. Critical values drawn from a t-distribution, known as t-scores (t), also assume normally distributed data but are used when fewer studies are available (<30) because the t-distribution tails are heavier. This produces more conservative (wider) CIs, which help ensure that the data are not misleading or misrepresentative when there is limited evidence available. The heaviness of the t-distribution tails is dictated by the degrees of freedom df, which are related to the number of available studies N (df = N − 1), such that fewer studies will result in heavier t-distribution tails and therefore larger critical values. Importantly, the t-distribution is asymptotically normal and will thus converge to a z-distribution for a sufficiently large number of studies, resulting in similar critical values. For example, for a significance level α = 0.05 (5% false positive rate), the z-distribution will always yield a critical value v = 1.96, regardless of how many studies are available. The t-distribution will however yield v = 2.78 for 5 studies, v = 2.26 for 10 studies, v = 2.05 for 30 studies and v = 1.98 for 100 studies, gradually converging to 1.96 as the number of studies increases. We have implemented the z-distribution and t-distribution CI estimators in MetaLab.
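The contrast between the two estimators can be reproduced with scipy.stats; the fragment below computes z-based and t-based 95% confidence intervals for a pooled estimate and its standard error (the numbers are illustrative only).

from scipy import stats

theta_hat, se_hat, n_studies, alpha = 1.05, 0.12, 8, 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)                   # 1.96 regardless of N
t_crit = stats.t.ppf(1 - alpha / 2, df=n_studies - 1)    # heavier tails when few studies
print("z-based CI:", (theta_hat - z_crit * se_hat, theta_hat + z_crit * se_hat))
print("t-based CI:", (theta_hat - t_crit * se_hat, theta_hat + t_crit * se_hat))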

Evaluating Meta-Analysis Performance

In general, 95% of study-level outcomes are expected to fall within the range of the 95% global CI. To determine whether the global 95% CI is consistent with the underlying study-level outcomes, the coverage of the CI can be computed as the proportion of study-level 95% CIs that overlap with the global 95% CI:

coverage = (number of study-level 95% CIs overlapping the global 95% CI) / N × 100%

The coverage is a performance measure used to determine whether inference made on the study-level is consistent with inference made on the meta-analytic level. Coverage that is less than expected for a specified significance level (i.e., <95% coverage for α = 0.05) may be indicative of inaccurate estimators, excessive heterogeneity or inadequate choice of meta-analytic model, while coverage exceeding 95% may indicate an inefficient estimator that results in insufficient statistical power.
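Coverage can be checked directly by counting how many study-level confidence intervals overlap the global interval; a minimal sketch under the same hypothetical inputs used above:

import numpy as np

def coverage(theta, se, global_ci, crit=1.96):
    lo, hi = theta - crit * se, theta + crit * se             # study-level 95% CIs
    overlap = (lo <= global_ci[1]) & (hi >= global_ci[0])     # any overlap with the global CI
    return 100 * overlap.mean()

theta = np.array([0.8, 1.2, 0.5, 1.6, 1.0])
se = np.array([0.20, 0.25, 0.15, 0.40, 0.30])
print(coverage(theta, se, global_ci=(0.85, 1.20)))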

Overall, the performance of a meta-analysis is heavily influenced by the choice of weighting scheme and data transformation (Figure 7). This is especially evident in smaller datasets, such as our OB [ATP]ic example, where both the global estimates and the confidence intervals are dramatically different under different weighting schemes (Figure 7A). Working with larger datasets, such as ATP release kinetics, somewhat reduces the influence of the assumed model (Figure 7B). However, normalizing the data distribution (by log transformation) produces much more consistent outcomes under different weighting schemes for both datasets, regardless of the number of available studies (Figures 7A,B, log10 synthesis).

Figure 7. Comparison of global effect estimates using different weighting schemes. (A,B) Global effect estimates for OB [ATP]ic (A) and ATP release (B) following synthesis of original data (raw, black) or of log10-transformed data followed by back-transformation to the original scale (log10, gray). Global effects ± 95% CI were obtained with unweighted data (UW), or using fixed effect (FE), random effects (RE), and sample-size (n) weighting schemes.

Analysis of Heterogeneity

Heterogeneity refers to inconsistency between studies. A large part of conducting a meta-analysis involves quantifying and accounting for sources of heterogeneity that may compromise the validity of meta-analysis. Basic research meta-analytic datasets are expected to be heterogeneous because ( i ) basic research literature searches tend to retrieve more studies than clinical literature searches and ( ii ) experimental methodologies used in basic research are more diverse and less standardized compared to clinical research. The presence of heterogeneity may limit the generalizability of an outcome due to the lack of study-level consensus. Nonetheless, exploration of heterogeneity sources can be insightful for the field in general, as it can identify biological or methodological factors that influence the outcome.

Quantifying of Heterogeneity

Higgins and Thompson emphasized that a heterogeneity metric should be (i) dependent on the magnitude of heterogeneity, (ii) independent of the measurement scale, (iii) independent of sample size and (iv) easily interpretable (Higgins and Thompson, 2002). Regrettably, the most commonly used test of heterogeneity is Cochran's Q test (Borenstein, 2009), which has been repeatedly shown to have undesirable statistical properties (Higgins et al., 2003). Nonetheless, we introduce it here, not because of its widespread use, but because it is an intermediary statistic used to obtain more useful measures of heterogeneity, H² and I². The measure of total variation, the Q_total statistic, is calculated as the sum of the weighted squared differences between the study-level means θ_i and the fixed effect estimate θ̂_FE:

Q_total = Σ_{i=1}^{N} w_i (θ_i − θ̂_FE)²

The Q_total statistic is compared to a chi-square (χ²) distribution (df = N − 1) to obtain a p-value which, if significant, supports the presence of heterogeneity. However, the Q-test has been shown to be inadequately powered when the number of studies is too low (N < 10) and excessively powered when the number of studies is too high (N > 50) (Gavaghan et al., 2000; Higgins et al., 2003). Additionally, the Q_total statistic is not a measure of the magnitude of heterogeneity due to its inherent dependence on the number of studies. To address this limitation, the H² heterogeneity statistic was developed as the relative excess of Q_total over its degrees of freedom df:

H² = Q_total / df

H² is independent of the number of studies in the meta-analysis and is indicative of the magnitude of heterogeneity (Higgins and Thompson, 2002). For values <1, H² is truncated at 1; therefore, values of H² can range from one to infinity, where H² = 1 indicates homogeneity. The corresponding confidence intervals for H² are

Intervals that do not overlap with 1 indicate significant heterogeneity. A more easily interpretable measure of heterogeneity is the I² statistic, which is a transformation of H²:

I² = (H² − 1) / H² × 100%

The corresponding 95% CI for I² is derived from the 95% CI for H² by applying the same transformation to the interval limits.

Values of I² range between 0 and 100% and describe the percentage of total variation that is attributed to heterogeneity. Like H², I² provides a measure of the magnitude of heterogeneity. Values of I² of 25, 50, and 75% are generally graded as low, moderate and high heterogeneity, respectively (Higgins and Thompson, 2002; Pathak et al., 2017). However, several limitations have been noted for the I² statistic. I² has a non-linear dependence on τ², thus I² will appear to saturate as it approaches 100% (Huedo-Medina et al., 2006). In cases of excessive heterogeneity, if heterogeneity is partially explained through subgroup analysis or meta-regression, residual unexplained heterogeneity may still be sufficient to keep I² near saturation. Therefore, I² will fail to convey the decline in overall heterogeneity, while the H² statistic, which has no upper limit, allows changes in heterogeneity to be tracked more meaningfully. In addition, a small number of studies (<10) will bias I² estimates, contributing to the uncertainties inevitably associated with small meta-analyses (von Hippel, 2015). Of the three heterogeneity statistics described, Q_total, H², and I², we recommend H², as it best satisfies the criteria for a heterogeneity statistic defined by Higgins and Thompson (2002).
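The three statistics can be computed from the fixed effect fit in a few lines; the sketch below follows the definitions above (Q as weighted squared deviations, H² = Q/df truncated at 1, I² = (H² − 1)/H² as a percentage) with hypothetical inputs.

import numpy as np
from scipy import stats

def heterogeneity(theta, se):
    w = 1 / se**2
    theta_fe = np.sum(w * theta) / np.sum(w)
    q = np.sum(w * (theta - theta_fe)**2)
    df = len(theta) - 1
    h2 = max(1.0, q / df)                 # truncated at 1 (homogeneity)
    i2 = 100 * (h2 - 1) / h2              # percent of variation due to heterogeneity
    p = stats.chi2.sf(q, df)              # Q-test p-value
    return q, h2, i2, p

theta = np.array([0.8, 1.2, 0.5, 1.6, 1.0])
se = np.array([0.20, 0.25, 0.15, 0.40, 0.30])
print(heterogeneity(theta, se))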

Identifying bias

Bias refers to distortions in the data that may result in misleading meta-analytic outcomes. In the presence of bias, meta-analysis outcomes are often contradicted by higher quality large sample-sized studies (Egger et al., 1997 ), thereby compromising the validity of the meta-analytic study. Sources of observed bias include publication bias, methodological inconsistencies and quality, data irregularities due to poor quality design, inadequate analysis or fraud, and availability or selection bias (Egger et al., 1997 ; Ahmed et al., 2012 ). At the level of study identification and inclusion for meta-analysis, systematic searches are preferred over rapid review search strategies, as narrow search strategies may omit relevant studies. Withholding negative results is also a common source of publication bias, which is further exacerbated by the small-study effect (the phenomenon by which smaller studies produce results with larger effect sizes than larger studies) (Schwarzer et al., 2015 ). By extension, smaller studies that produce negative results are more likely to not be published compared to larger studies that produce negative results. Identifying all sources of bias is unfeasible, however, tools are available to estimate the extent of bias present.

Funnel plots . Funnel plots have been widely used to assess the risk of bias and examine meta-analysis validity (Light and Pillemer, 1984 ; Borenstein, 2009 ). The logic underlying the funnel plot is that in the absence of bias, studies are symmetrically distributed around the fixed effect size estimate, due to sampling error being random. Moreover, precise study-level estimates are expected to be more consistent with the global effect size than less precise studies, where precision is inversely related to the study-level standard error. Thus, for an unbiased set of studies, study-level effects θ i plotted in relation to the inverse standard error 1/ se (θ i ) will produce a funnel shaped plot. Theoretical 95% CIs for the range of plotted standard errors are included as reference to visualize the expected distribution of studies in the absence of bias (Sterne and Harbord, 2004 ). When bias is present, study-level effects will be asymmetrically distributed around the global fixed-effect estimate. In the past, funnel plot asymmetries have been attributed solely to publication bias, however they should be interpreted more broadly as a general presence of bias or heterogeneity (Sterne et al., 2011 ). It should be noted that rapid reviews ( Figure 8A , left ) are far more subject to bias than systematic reviews ( Figure 8A , right ), due to the increased likelihood of relevant study omission.
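A basic funnel plot of effect size against precision, with pseudo 95% confidence limits around the fixed effect estimate, can be sketched with matplotlib as below; the data are hypothetical and the axis choices simply mirror the description above.

import numpy as np
import matplotlib.pyplot as plt

theta = np.array([0.8, 1.2, 0.5, 1.6, 1.0, 0.9, 1.4])
se = np.array([0.20, 0.25, 0.15, 0.40, 0.30, 0.10, 0.35])
w = 1 / se**2
theta_fe = np.sum(w * theta) / np.sum(w)          # fixed effect estimate

se_grid = np.linspace(se.min(), se.max(), 100)
plt.plot(theta, 1 / se, "ko")                               # studies: effect vs. precision
plt.plot(theta_fe + 1.96 * se_grid, 1 / se_grid, "g--")     # expected upper 95% limit
plt.plot(theta_fe - 1.96 * se_grid, 1 / se_grid, "g--")     # expected lower 95% limit
plt.axvline(theta_fe, color="b", linestyle=":")             # fixed effect estimate
plt.xlabel("Effect size")
plt.ylabel("Precision (1/SE)")
plt.show()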

Figure 8. Analysis of heterogeneity and identification of influential studies. (A) Bias and heterogeneity in the OB [ATP]ic (left) and ATP release (right) data sets were assessed with funnel plots. Log10-transformed study-level effect sizes (black markers) were plotted in relation to their precision, assessed as the inverse of the standard error (1/SE). Blue dashed line: fixed effect estimate; red dashed line: random effects estimate; gray lines: expected 95% confidence interval (95% CI) in the absence of bias/heterogeneity. (B) The OB [ATP]ic data were evaluated using a Baujat plot; inconsistent and influential studies were identified in the top right corner of the plot (arrows). (C,D) Effect of single-study exclusion (C) and cumulative sequential exclusion of the most inconsistent studies (D). Left: heterogeneity statistics H² (red line) and I² (black line). Right: 95% CI (red band) and Q-test p-value (black line). Arrows: influential studies contributing to heterogeneity (same as those identified on the Baujat plot). Dashed black line: homogeneity threshold T_H where the Q-test p = 0.05.

Heterogeneity sensitivity analyses

Inconsistencies between studies can arise for a number of reasons, including methodological or biological heterogeneity (Patsopoulos et al., 2008 ). Since accounting for heterogeneity is an essential part of any meta-analysis, it is of interest to identify influential studies that may contribute to the observed heterogeneity.

Baujat plot. The Baujat plot was proposed as a diagnostic tool to identify the studies that contribute most to heterogeneity and influence the global outcome (Baujat, 2002). The graph illustrates the contribution Q_i^inf of each study to heterogeneity on the x-axis

and the contribution θ_i^inf to the global effect on the y-axis

Studies that strongly influence the global outcome and contribute to heterogeneity are visualized in the upper right corner of the plot ( Figure 8B ). This approach has been used to identify outlying studies in the past (Anzures-Cabrera and Higgins, 2010 ).

Single-study exclusion sensitivity . Single-study exclusion analysis assesses the sensitivity of the global outcome and heterogeneity to exclusion of single studies. The global outcomes and heterogeneity statistics are computed for a dataset with a single omitted study; single study exclusion is iterated for all studies; and influential outlying studies are identified by observing substantial declines in observed heterogeneity, as determined by Q total , H 2 , or I 2 , and by significant differences in the global outcome ( Figure 8C ). Influential studies should not be blindly discarded, but rather carefully examined to determine the reason for inconsistency. If a cause for heterogeneity can be identified, such as experimental design flaw, it is appropriate to omit the study from the analysis. All reasons for omission must be justified and made transparent by reviewers.
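Single-study (leave-one-out) exclusion can be automated by recomputing the pooled heterogeneity with each study omitted in turn; the sketch below reports the drop in Q_total for each omission and uses hypothetical data in which the last study is deliberately outlying.

import numpy as np

def q_statistic(theta, se):
    w = 1 / se**2
    theta_fe = np.sum(w * theta) / np.sum(w)
    return np.sum(w * (theta - theta_fe)**2)

def leave_one_out(theta, se):
    # Change in total heterogeneity Q when each study is excluded in turn
    q_all = q_statistic(theta, se)
    drops = []
    for i in range(len(theta)):
        keep = np.arange(len(theta)) != i
        drops.append(q_all - q_statistic(theta[keep], se[keep]))
    return np.array(drops)   # large values flag influential, inconsistent studies

theta = np.array([0.8, 1.2, 0.5, 1.6, 1.0, 3.5])
se = np.array([0.20, 0.25, 0.15, 0.40, 0.30, 0.30])
print(np.round(leave_one_out(theta, se), 1))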

Cumulative-study exclusion sensitivity . Cumulative study exclusion sequentially removes studies to maximize the decrease in total variance Q total , such that a more homogenous set of studies with updated heterogeneity statistics is achieved with each iteration of exclusion ( Figure 8D ).

This method was proposed by Patsopoulos et al. to achieve desired levels of homogeneity (Patsopoulos et al., 2008 ), however, Higgins argued that its application should remain limited to (i) quantifying the extent to which heterogeneity permeates the set of studies and (ii) identifying sources of heterogeneity (Higgins, 2008 ). We propose the homogeneity threshold T H as a measure of heterogeneity that can be derived from cumulative-study exclusion sensitivity analysis. The homogeneity threshold describes the percentage of studies that need to be removed (by the maximal Q-reduction criteria) before a homogenous set of studies is achieved. For example, in the OB [ATP] ic dataset, the homogeneity threshold was 71%, since removal of 71% of the most inconsistent studies resulted in a homogeneous dataset ( Figure 8D , right ). After homogeneity is attained by cumulative exclusion, the global effect generally stabilizes with respect to subsequent study removal. This metric provides information about the extent of inconsistency present in the set of studies that is scale invariant (independent of the number of studies), and is easily interpretable.

Exploratory Analyses

The purpose of an exploratory analysis is to understand the data in ways that may not be represented by a pooled global estimate. This involves identifying sources of observed heterogeneity related to biological and experimental factors. Subgroup and meta-regression analyses are techniques used to explore known data groupings defined by study-level characteristics (i.e., covariates). Additionally, we introduce the cluster-covariate dependence analysis, an unsupervised exploratory technique used to identify covariates that coincide well with natural groupings within the data, and the intrastudy regression analysis, which is used to validate meta-regression outcomes.

Cluster-covariate dependence analysis

Natural groupings within the data can be informative and serve as a basis to guide further analysis. Using an unsupervised k-means clustering approach (Lloyd, 1982), we can identify natural groupings within the study-level data and assign cluster memberships to these data (Figure 9A). Reviewers then have two choices: either proceed directly to subgroup analysis (Figure 9B) or look for covariates that coincide with the cluster memberships (Figure 9C). In the latter case, dependencies between cluster memberships and known data covariates can be tested using Pearson's chi-squared test for independence. Covariates that coincide with clusters can be verified by subgroup analysis (Figure 9D). The dependence test is limited by the availability of studies and requires that at least 80% of covariate-cluster pairs are represented by at least 5 studies (McHugh, 2013). Clustering results should be considered exploratory and warrant further investigation due to several limitations. If subpopulations identified through clustering do not depend on the extracted covariates, reviewers risk assigning misrepresentative meaning to these clusters. Moreover, conventional clustering methods always converge to a result, therefore the data will be partitioned even in the absence of natural data groupings. Future adaptations of this method might involve using different clustering algorithms (e.g., hierarchical clustering) or independence tests (e.g., the G-test for independence), as well as introducing weighting terms to bias clustering to reflect study-level precisions.
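One way to sketch this cluster-covariate dependence analysis in Python is with scikit-learn's KMeans followed by a chi-squared test of independence between cluster labels and a categorical covariate; the simulated data, the covariate labels, and the choice of two clusters are purely illustrative.

import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
effects = np.concatenate([rng.normal(1, 0.3, 30), rng.normal(4, 0.5, 30)])  # two subpopulations
covariate = np.array(["method A"] * 30 + ["method B"] * 30)                 # study-level covariate

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(effects.reshape(-1, 1))

# Contingency table of cluster membership vs. covariate, then Pearson's chi-squared test
table = np.array([[np.sum((labels == k) & (covariate == c)) for c in np.unique(covariate)]
                  for k in np.unique(labels)])
chi2, p, dof, expected = chi2_contingency(table)
print(table, p)   # a small p-value suggests the covariate coincides with the natural grouping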

Figure 9. Exploratory subgroup analysis. (A) Exploratory k-means clustering was used to partition the OB [ATP]ic (left) and ATP release (right) data into potential clusters/subpopulations of interest. (B) Subgroup analysis of OB [ATP]ic data by differentiation status (immature, 0 to 3 day osteoblasts, vs. mature, 4 to 28 day osteoblasts). Subgroup outcomes (fmol ATP/cell) were estimated using the sample-size weighting scheme; black markers: study-level outcomes ± 95% CI, with marker sizes proportional to sample size n. Orange and green bands: 95% CI for the immature and mature osteoblast subgroups, respectively. (C) Dependence between ATP release cluster membership and known covariates/characteristics was assessed using Pearson's χ² independence test. Black bars: χ² test p-values for each covariate-cluster dependence test. Red line: α = 0.05 significance threshold. Arrow: most influential covariate (ex. recording method). (D) Subgroup analysis of ATP release by recording method. Subgroup outcomes (t_half) were estimated using random effects weighting, with τ² computed using the DerSimonian-Laird estimator. Round markers: subgroup estimates ± 95% CI, with marker sizes proportional to the number of studies per subgroup N. Gray band/diamond: global effect ± 95% CI.

Subgroup analysis

Subgroup analyses attempt to explain heterogeneity and explore differences in effects by partitioning studies into characteristic groups defined by study-level categorical covariates ( Figures 9B,D ; Table 5 ). Subgroup effects are estimated along with corresponding heterogeneity statistics. To evaluate the extent to which subgroup covariates contribute to observed inconsistencies, the explained heterogeneity Q between and unexplained heterogeneity Q within can be calculated.

Table 5. Exploratory subgroup analysis: effect and heterogeneity estimates of ATP release by recording method.

Subgroup effect estimates (effect, 95% CI, I², H², Q):
  • Total (N = 74): 101 (86, 117); I² = 94%; H² = 16; Q = 1,133
  • Method A (N = 22): 32 (16, 66); I² = 94%; H² = 17; Q = 358
  • Method B (N = 52): 136 (117, 159); I² = 92%; H² = 13; Q = 669

Heterogeneity (Q, df, p-value):
  • Total: Q = 1,133; df = 73; p < 0.001 (data are heterogeneous)
  • Method A: Q = 358; df = 21; p < 0.001 (data are heterogeneous)
  • Method B: Q = 669; df = 51; p < 0.001 (data are heterogeneous)
  • Between subgroups: Q = 106; df = 1; p < 0.001 (subgrouping explained significant heterogeneity)
  • Within subgroups: Q = 1,027; df = 72; p < 0.001 (significant heterogeneity remains)

The unexplained heterogeneity within subgroups is the sum of the subgroup-level heterogeneity statistics, Q_within = Σ_{j=1}^{S} Q_j, where S is the total number of subgroups for a given covariate and each subgroup j contains N_j studies. The explained heterogeneity Q_between is then the difference between the total and subgroup heterogeneity:

Q_between = Q_total − Q_within

If the p-value for the χ²-distributed statistic Q_between is significant, the subgrouping can be assumed to explain a significant amount of heterogeneity (Borenstein, 2009). Similarly, the Q_within statistic can be used to test whether any residual heterogeneity is present within the subgroups.
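A minimal decomposition of heterogeneity by subgroup, following the Q_total, Q_within, and Q_between relationships above, can be sketched as follows with hypothetical data and fixed effect weights.

import numpy as np

def q_stat(theta, se):
    w = 1 / se**2
    return np.sum(w * (theta - np.sum(w * theta) / np.sum(w))**2)

theta = np.array([0.8, 1.0, 0.9, 2.0, 2.3, 2.1])
se = np.array([0.2, 0.25, 0.2, 0.3, 0.35, 0.3])
group = np.array(["A", "A", "A", "B", "B", "B"])

q_total = q_stat(theta, se)
q_within = sum(q_stat(theta[group == g], se[group == g]) for g in np.unique(group))
q_between = q_total - q_within   # heterogeneity explained by the subgrouping
print(q_total, q_within, q_between)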

A related statistic, R²_explained, describes the percentage of the total heterogeneity that was explained by the covariate and is estimated as

R²_explained = (1 − τ²_within / τ²_total) × 100%

where the pooled heterogeneity within subgroups, τ²_within, represents the remaining unexplained variation (Borenstein, 2009):

Subgroup analysis of the ATP release dataset revealed that the recording method had a major influence on the ATP release outcome, such that method A produced significantly lower outcomes than method B (Figure 9D; Table 5, significance determined by non-overlapping 95% CIs). Additionally, the recording method accounted for a significant amount of heterogeneity (Q_between, p < 0.001); however, it represented only 4% (R²_explained) of the total observed heterogeneity. Needless to say, the remaining 96% of heterogeneity is significant (Q_within, p < 0.001). To explore the remaining heterogeneity, additional subgroup analysis can be conducted by further stratifying the method A and method B subgroups by other covariates. However, in many meta-analyses multi-level data stratification may be unfeasible if covariates are unavailable or if the number of studies within subgroups is low.

Multiple comparisons. When multiple subgroups are present for a given covariate and the reviewer wishes to investigate statistical differences between the subgroups, the problem of multiple comparisons should be addressed: the family-wise error rate grows rapidly as the number of subgroup comparisons increases. The Bonferroni correction has been advocated to control false-positive findings in meta-analyses (Hedges and Olkin, 1985), and involves adjusting the significance threshold:
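In its usual form,

\alpha^{*} = \frac{\alpha}{m}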

where α* is the adjusted significance threshold needed to attain the intended error rate α over m subgroup comparisons. Confidence intervals can then be computed using α* in place of α:
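For a subgroup estimate θ̂_j with standard error SE(θ̂_j), the adjusted interval takes the standard normal-approximation form

\hat{\theta}_j \pm z_{1-\alpha^{*}/2}\, SE(\hat{\theta}_j)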

Meta-regression

Meta-regression attempts to explain heterogeneity by examining the relationship between study-level outcomes and continuous covariates while incorporating the influence of categorical covariates (Figure 10A). The main differences between conventional linear regression and meta-regression are that (i) weights are incorporated and (ii) covariates are defined at the level of the study rather than the individual sample. The magnitude of the relationship β_n between covariate x_{n,i} and outcome y_i for study i and covariate n is of primary interest when conducting a meta-regression analysis. It should be noted that the intercept β_0 of a meta-regression with a negligible effect of covariates is equivalent to the estimate approximated by a weighted mean (Equation 3). The generalized meta-regression model is specified as
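In generic notation consistent with the text (η_i and ε_i are defined immediately below),

y_i = \beta_0 + \sum_{n} \beta_n x_{n,i} + \eta_i + \varepsilon_i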


Figure 10. Meta-regression analysis and validation. (A) The relationship between osteoblast differentiation day (covariate) and intracellular ATP content (outcome) investigated by meta-regression analysis. Outcomes are on a log10 scale; meta-regression marker sizes are proportional to weights. Red bands: 95% CI. Gray bands: 95% CI of the intercept-only model. Solid red lines: intrastudy regressions. (B) Meta-regression coefficient β_inter (black) compared to the intrastudy regression coefficient β_intra (red). Shown are regression coefficients ± 95% CI.

where the intrastudy (sampling) error ε_i is specified as
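In the standard formulation, with σ_i² denoting the within-study sampling variance of study i (notation introduced here for clarity),

\varepsilon_i \sim N(0, \sigma_i^2)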

and the deviation from the distribution of effects η_i depends on the chosen meta-analytic model:
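In the standard fixed- and random-effects formulations,

\eta_i = 0 \quad \text{(fixed-effect model)}, \qquad \eta_i \sim N(0, \tau^2) \quad \text{(random-effects model)}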

The residual Q statistic, which quantifies the dispersion of the studies about the regression line, is calculated as
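In standard notation, with w_i the study weights and k the number of studies,

Q_{residual} = \sum_{i=1}^{k} w_i \left(y_i - \hat{y}_i\right)^2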

where ŷ_i is the value predicted at x_i by the meta-regression model. Q_residual is analogous to the Q_within statistic computed during subgroup analysis and is used to test the degree of remaining unaccounted heterogeneity. Q_residual is also used to approximate the unexplained interstudy variance τ²_residual
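A common method-of-moments form, analogous to the DerSimonian-Laird estimator (here p is the number of covariates and C a weight-based scaling constant, symbols introduced for illustration), is

\tau^2_{residual} = \frac{\max\!\left(0,\; Q_{residual} - (k - p - 1)\right)}{C}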

which can be used to calculate R²_explained, estimated as
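In standard form,

R^2_{explained} = \left(1 - \frac{\tau^2_{residual}}{\tau^2_{total}}\right) \times 100\%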

Q_model quantifies the amount of heterogeneity explained by the regression model and is analogous to the Q_between statistic computed during subgroup analysis.
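Consistent with the partition used in subgroup analysis, the explained portion can be written as

Q_{model} = Q_{total} - Q_{residual}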

Intrastudy regression analysis. The challenge of interpreting results from a meta-regression is that relationships that exist within studies may not necessarily exist across studies, and vice versa. Such inconsistencies are known as aggregation bias and, in the context of meta-analyses, can arise from excess heterogeneity or from confounding factors at the level of the study. This problem has been acknowledged in clinical meta-analyses (Thompson and Higgins, 2002); however, it cannot be corrected without access to individual patient data. Fortunately, basic research studies often report outcomes at varying predictor levels (e.g., dose-response curves), permitting intrastudy (within-study) relationships to be evaluated by the reviewer. If study-level regression coefficients can be computed for several studies (Figure 10A, red lines), they can be pooled to estimate an overall effect β_intra. The meta-regression interstudy coefficient β_inter and the overall intrastudy regression coefficient β_intra can then be compared in terms of magnitude and sign, as sketched below. Similarity in both magnitude and sign validates the existence of the relationship and characterizes its strength, while similarity in sign but not magnitude still supports the presence of the relationship but calls for additional experiments to further characterize it. For the OB [ATP]i dataset, the magnitude of the relationship between osteoblast differentiation day and intracellular ATP concentration was inconsistent between the intrastudy and interstudy estimates; however, the estimates were of consistent sign (Figure 10B).
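A minimal MATLAB sketch of this comparison is shown below. It is an illustrative sketch rather than MetaLab code: every numeric value is synthetic, and the between-study variance tau2 is simply assumed rather than estimated from the data.

% Illustrative comparison of the interstudy (meta-regression) slope with a
% pooled intrastudy slope, as in Figure 10B. All values are synthetic.

% Study-level summary data (one row per study)
y    = [2.1; 2.6; 3.0; 3.4];      % outcomes (e.g., log10 intracellular ATP)
x    = [1;   4;   8;   14];       % covariate (e.g., differentiation day)
v    = [0.04; 0.09; 0.05; 0.08];  % within-study (sampling) variances
tau2 = 0.02;                      % assumed between-study variance

% Interstudy slope: weighted least-squares meta-regression
w  = 1 ./ (v + tau2);             % random-effects weights
X  = [ones(numel(x), 1) x];       % design matrix: intercept + covariate
W  = diag(w);
b  = (X' * W * X) \ (X' * W * y); % b(1) = intercept, b(2) = slope
betaInter = b(2);

% Intrastudy slopes: within-study regression coefficients and their SEs,
% pooled by inverse-variance weighting
bIntra  = [0.06; 0.05; 0.09];     % synthetic study-level slopes
seIntra = [0.02; 0.03; 0.04];     % synthetic standard errors
wi      = 1 ./ seIntra.^2;
betaIntra   = sum(wi .* bIntra) / sum(wi);
seBetaIntra = sqrt(1 / sum(wi));

fprintf('beta_inter = %.3f, beta_intra = %.3f (SE %.3f)\n', ...
    betaInter, betaIntra, seBetaIntra);
% Agreement in sign (and ideally magnitude) between the two estimates supports
% the relationship; discrepant magnitudes point to possible aggregation bias.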

Limitations of exploratory analyses

When performed with knowledge and care, exploratory analysis of meta-analytic data has enormous potential for hypothesis generation, cataloging current practices and trends, and identifying gaps in the literature. Nevertheless, we emphasize the inherent limitations of exploratory analyses:

Data dredging. A major pitfall in meta-analyses is data dredging (also known as p-hacking), which refers to searching for significant outcomes and only assigning meaning to them later. While exploring the dataset for potential patterns can identify outcomes of interest, reviewers must be wary of spurious patterns that can arise in any dataset. Therefore, if a relationship is observed, it should be used to generate hypotheses, which can then be tested on new datasets. Steps to avoid data dredging include defining an a priori analysis plan for study-level covariates, limiting exploratory analysis of rapid-review meta-analyses, and correcting for multiple comparisons.

Statistical power. Statistical power reflects the probability of rejecting the null hypothesis when the alternative is true. Meta-analyses are believed to have higher statistical power than the underlying primary studies; however, this is not always true (Hedges and Pigott, 2001; Jackson and Turner, 2017). Random-effects meta-analyses handle data heterogeneity by accounting for between-study variance, but this weakens the inference properties of the model. To maintain statistical power that exceeds that of the contributing studies in a random-effects meta-analysis, at least five studies are required (Jackson and Turner, 2017). This consequently limits subgroup analyses, which partition studies into smaller groups to isolate covariate-dependent effects. Thus, reviewers should ensure that subgroups are not under-represented in order to maintain statistical power. Another determinant of statistical power is the expected effect size: a small effect will be much more difficult to support with existing evidence than a large one. Thus, if reviewers find that there is insufficient evidence to conclude that a small effect exists, this should not be interpreted as evidence of no effect.

Causal inference. Meta-analyses are not a tool for establishing causal inference. However, several criteria for causality can be investigated through exploratory analyses, including consistency, strength of association, dose-dependence and plausibility (Weed, 2000, 2010). For example, consistency, strength of association and dose-dependence can help establish that the outcome depends on the exposure. However, reviewers still face the challenge of accounting for confounding factors and bias. Therefore, while meta-analyses can explore various criteria for causality, causal claims are inappropriate, and outcomes should remain associative.

Conclusions

Meta-analyses of basic research can offer critical insights into the current state of knowledge. In this manuscript, we have adapted meta-analytic methods to basic science applications and provided a theoretical foundation, using the OB [ATP]i and ATP release datasets to illustrate the workflow. Since the generalizability of any meta-analysis relies on transparent, unbiased and accurate methodology, we discussed the implications of deficient reporting practices and the limitations of meta-analytic methods. Emphasis was placed on the analysis and exploration of heterogeneity. Additionally, several alternative and supporting methods were proposed, including a method for validating meta-regression outcomes (intrastudy regression analysis) and a novel measure of heterogeneity (the homogeneity threshold). All analyses were conducted using MetaLab, a meta-analysis toolbox that we developed in MATLAB R2016b. MetaLab is freely available to promote meta-analyses in basic research (https://github.com/NMikolajewicz/MetaLab).

In its current state, the translational pipeline from benchtop to bedside is an inefficient process, in one case estimated to yield roughly one favorable clinical outcome per ~1,000 basic research studies (O'Collins et al., 2006). The methods described here serve as a general framework for comprehensive data consolidation, knowledge-gap identification, evidence-driven hypothesis generation and informed parameter estimation in computational modeling. We hope they will contribute to meta-analytic outcomes that better inform translational studies, thereby reducing current failures in translational research.

Author Contributions

Both authors contributed to the study conception and design, data acquisition and interpretation, and drafting and critical revision of the manuscript. NM developed MetaLab. Both authors approved the final version to be published.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council (NSERC, RGPIN-288253) and the Canadian Institutes of Health Research (CIHR MOP-77643). NM was supported by the Faculty of Dentistry, McGill University and le Réseau de Recherche en Santé Buccodentaire et Osseuse (RSBO). Special thanks to Ali Mohammed (McGill University) for help with validation of the MetaLab data extraction module.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphys.2019.00203/full#supplementary-material

References

• Ahmed I., Sutton A. J., Riley R. D. (2012). Assessment of publication bias, selection bias, and unavailable data in meta-analyses using individual participant data: a database survey. Br. Med. J. 344:d7762. 10.1136/bmj.d7762
• Altman D. G., Bland J. M. (2005). Standard deviations and standard errors. Br. Med. J. 331, 903. 10.1136/bmj.331.7521.903
• Anzures-Cabrera J., Higgins J. P. T. (2010). Graphical displays for meta-analysis: an overview with suggestions for practice. Res. Synth. Methods 1, 66–80. 10.1002/jrsm.6
• Baguley T. (2009). Standardized or simple effect size: what should be reported? Br. J. Soc. Psychol. 100, 603–617. 10.1348/000712608X377117
• Barendregt J., Doi S. (2009). MetaXL User Guide: Version 1.0. Wilston, QLD: EpiGear International Pty Ltd.
• Baujat B. (2002). A graphical method for exploring heterogeneity in meta-analyses: application to a meta-analysis of 65 trials. Stat. Med. 21:18. 10.1002/sim.1221
• Bax L. (2016). MIX 2.0 – Professional Software for Meta-analysis in Excel. Version 2.0.1.5. BiostatXL. Available online at: https://www.meta-analysis-made-easy.com
• Bittker J. A., Ross N. T. (2016). High Throughput Screening Methods: Evolution and Refinement. Cambridge: Royal Society of Chemistry. 10.1039/9781782626770
• Bodin P., Milner P., Winter R., Burnstock G. (1992). Chronic hypoxia changes the ratio of endothelin to ATP release from rat aortic endothelial cells exposed to high flow. Proc. Biol. Sci. 247, 131–135. 10.1098/rspb.1992.0019
• Borenstein M. (2009). Introduction to Meta-Analysis. Chichester: John Wiley & Sons. 10.1002/9780470743386
• Borenstein M., Hedges L., Higgins J. P. T., Rothstein H. R. (2005). Comprehensive Meta-Analysis (Version 2.2.027) [Computer software]. Englewood, CO.
• Bramer W. M., Giustini D., de Jonge G. B., Holland L., Bekhuis T. (2016). De-duplication of database search results for systematic reviews in EndNote. J. Med. Libr. Assoc. 104, 240–243. 10.3163/1536-5050.104.3.014
• Chowdhry A. K., Dworkin R. H., McDermott M. P. (2016). Meta-analysis with missing study-level sample variance data. Stat. Med. 35, 3021–3032. 10.1002/sim.6908
• Cochrane Collaboration (2011). Review Manager (RevMan) [Computer Program]. Copenhagen.
• Cox M., Harris P., Siebert B. R.-L. (2003). Evaluation of measurement uncertainty based on the propagation of distributions using Monte Carlo simulation. Measure. Techniq. 46, 824–833. 10.1023/B:METE.0000008439.82231.ad
• DeLuca J. B., Mullins M. M., Lyles C. M., Crepaz N., Kay L., Thadiparthi S. (2008). Developing a comprehensive search strategy for evidence based systematic reviews. Evid. Based Libr. Inf. Pract. 3, 3–32. 10.18438/B8KP66
• DerSimonian R., Laird N. (1986). Meta-analysis in clinical trials. Control. Clin. Trials 7, 177–188. 10.1016/0197-2456(86)90046-2
• Ecker E. D., Skelly A. C. (2010). Conducting a winning literature search. Evid. Based Spine Care J. 1, 9–14. 10.1055/s-0028-1100887
• Egger M., Smith G. D., Schneider M., Minder C. (1997). Bias in meta-analysis detected by a simple, graphical test. Br. Med. J. 315, 629–634. 10.1136/bmj.315.7109.629
• Finfgeld-Connett D., Johnson E. D. (2013). Literature search strategies for conducting knowledge-building and theory-generating qualitative systematic reviews. J. Adv. Nurs. 69, 194–204. 10.1111/j.1365-2648.2012.06037.x
• Ganann R., Ciliska D., Thomas H. (2010). Expediting systematic reviews: methods and implications of rapid reviews. Implementation Sci. 5, 56. 10.1186/1748-5908-5-56
• Gavaghan D. J., Moore R. A., McQuay H. J. (2000). An evaluation of homogeneity tests in meta-analyses in pain using simulations of individual patient data. Pain 85, 415–424. 10.1016/S0304-3959(99)00302-4
• Gopalakrishnan S., Ganeshkumar P. (2013). Systematic reviews and meta-analysis: understanding the best evidence in primary healthcare. J. Fam. Med. Prim. Care 2, 9–14. 10.4103/2249-4863.109934
• Grönholm T., Annila A. (2007). Natural distribution. Math. Biosci. 210, 659–667. 10.1016/j.mbs.2007.07.004
• Haby M. M., Chapman E., Clark R., Barreto J., Reveiz L., Lavis J. N. (2016). What are the best methodologies for rapid reviews of the research evidence for evidence-informed decision making in health policy and practice: a rapid review. Health Res. Policy Syst. 14:83. 10.1186/s12961-016-0155-7
• Hartung J., Makambi K. H. (2002). Positive estimation of the between-study variance in meta-analysis: theory and methods. S. Afr. Stat. J. 36, 55–76.
• Hedges L. V., Olkin I. (1985). Statistical Methods for Meta-Analysis. New York, NY: Academic Press.
• Hedges L. V., Pigott T. D. (2001). The power of statistical tests in meta-analysis. Psychol. Methods 6, 203–217. 10.1037/1082-989X.6.3.203
• Higgins J. P. (2008). Commentary: heterogeneity in meta-analysis should be expected and appropriately quantified. Int. J. Epidemiol. 37, 1158–1160. 10.1093/ije/dyn204
• Higgins J. P., Green S. (Eds.) (2011). Cochrane Handbook for Systematic Reviews of Interventions, Vol. 4. Oxford: John Wiley & Sons.
• Higgins J. P., Thompson S. G. (2002). Quantifying heterogeneity in a meta-analysis. Stat. Med. 21, 1539–1558. 10.1002/sim.1186
• Higgins J. P., Thompson S. G., Deeks J. J., Altman D. G. (2003). Measuring inconsistency in meta-analyses. Br. Med. J. 327, 557–560. 10.1136/bmj.327.7414.557
• Higgins J. P., White I. R., Anzures-Cabrera J. (2008). Meta-analysis of skewed data: combining results reported on log-transformed or raw scales. Stat. Med. 27, 6072–6092. 10.1002/sim.3427
• Huedo-Medina T. B., Sanchez-Meca J., Marin-Martinez F., Botella J. (2006). Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychol. Methods 11, 193–206. 10.1037/1082-989X.11.2.193
• Hunter J. E., Schmidt F. L. (2004). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Thousand Oaks, CA: Sage.
• Jackson D., Turner R. (2017). Power analysis for random-effects meta-analysis. Res. Synth. Methods 8, 290–302. 10.1002/jrsm.1240
• JASP Team (2018). JASP (Version 0.9) [Computer Software]. Amsterdam.
• Karabatsos G., Talbott E., Walker S. G. (2015). A Bayesian nonparametric meta-analysis model. Res. Synth. Methods 6, 28–44. 10.1002/jrsm.1117
• Kontopantelis E., Reeves D. (2012). Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: a simulation study. Stat. Methods Med. Res. 21, 409–426. 10.1177/0962280210392008
• Kwon Y., Lemieux M., McTavish J., Wathen N. (2015). Identifying and removing duplicate records from systematic review searches. J. Med. Libr. Assoc. 103, 184–188. 10.3163/1536-5050.103.4.004
• Light R. J., Pillemer D. B. (1984). Summing Up: The Science of Reviewing Research. Cambridge, MA: Harvard University Press.
• Limpert E., Stahel W. A., Abbt M. (2001). Log-normal distributions across the sciences: keys and clues. AIBS Bull. 51, 341–352.
• Lloyd S. (1982). Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137. 10.1109/TIT.1982.1056489
• Lorenzetti D. L., Ghali W. A. (2013). Reference management software for systematic reviews and meta-analyses: an exploration of usage and usability. BMC Med. Res. Methodol. 13, 141. 10.1186/1471-2288-13-141
• Marin-Martinez F., Sanchez-Meca J. (2010). Weighting by inverse variance or by sample size in random-effects meta-analysis. Educ. Psychol. Meas. 70, 56–73. 10.1177/0013164409344534
• Mattivi J. T., Buchberger B. (2016). Using the AMSTAR checklist for rapid reviews: is it feasible? Int. J. Technol. Assess. Health Care 32, 276–283. 10.1017/S0266462316000465
• McGowan J., Sampson M. (2005). Systematic reviews need systematic searchers. J. Med. Libr. Assoc. 93, 74–80.
• McHugh M. L. (2013). The Chi-square test of independence. Biochem. Med. 23, 143–149. 10.11613/BM.2013.018
• Mikolajewicz N., Mohammed A., Morris M., Komarova S. V. (2018). Mechanically-stimulated ATP release from mammalian cells: systematic review and meta-analysis. J. Cell Sci. 131:22. 10.1242/jcs.223354
• Milo R., Jorgensen P., Moran U., Weber G., Springer M. (2010). BioNumbers—the database of key numbers in molecular and cell biology. Nucleic Acids Res. 38, D750–D753. 10.1093/nar/gkp889
• Moher D., Liberati A., Tetzlaff J., Altman D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6:e1000097. 10.1371/journal.pmed.1000097
• O'Collins V. E., Macleod M. R., Donnan G. A., Horky L. L., van der Worp B. H., Howells D. W. (2006). 1,026 experimental treatments in acute stroke. Ann. Neurol. 59, 467–477. 10.1002/ana.20741
• Pathak M., Dwivedi S. N., Deo S. V. S., Sreenivas V., Thakur B. (2017). Which is the preferred measure of heterogeneity in meta-analysis and why? A revisit. Biostat. Biometrics Open Acc. 1, 1–7. 10.19080/BBOAJ.2017.01.555555
• Patsopoulos N. A., Evangelou E., Ioannidis J. P. A. (2008). Sensitivity of between-study heterogeneity in meta-analysis: proposed metrics and empirical evaluation. Int. J. Epidemiol. 37, 1148–1157. 10.1093/ije/dyn065
• Paule R. C., Mandel J. (1982). Consensus values and weighting factors. J. Res. Natl. Bur. Stand. 87, 377–385. 10.6028/jres.087.022
• Sanchez-Meca J., Marin-Martinez F. (2008). Confidence intervals for the overall effect size in random-effects meta-analysis. Psychol. Methods 13, 31–48. 10.1037/1082-989X.13.1.31
• Schwarzer G., Carpenter J. R., Rücker G. (2015). Small-study effects in meta-analysis, in Meta-Analysis with R, eds Schwarzer G., Carpenter J. R., Rücker G. (Cham: Springer International Publishing), 107–141.
• Sena E., van der Worp H. B., Howells D., Macleod M. (2007). How can we improve the pre-clinical development of drugs for stroke? Trends Neurosci. 30, 433–439. 10.1016/j.tins.2007.06.009
• Sheldrake R. (1997). Experimental effects in scientific research: how widely are they neglected? Bull. Sci. Technol. Soc. 17, 171–174. 10.1177/027046769701700405
• Sidik K., Jonkman J. N. (2005). Simple heterogeneity variance estimation for meta-analysis. J. R. Stat. Soc. Ser. C Appl. Stat. 54, 367–384. 10.1111/j.1467-9876.2005.00489.x
• Sterne J. A., Sutton A. J., Ioannidis J. P., Terrin N., Jones D. R., Lau J., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. Br. Med. J. 343:d4002. 10.1136/bmj.d4002
• Sterne J. A. C., Harbord R. (2004). Funnel plots in meta-analysis. Stata J. 4, 127–141. 10.1177/1536867X0400400204
• Thompson S. G., Higgins J. P. (2002). How should meta-regression analyses be undertaken and interpreted? Stat. Med. 21, 1559–1573. 10.1002/sim.1187
• Vaux D. L., Fidler F., Cumming G. (2012). Replicates and repeats—what is the difference and is it significant? A brief discussion of statistics and experimental design. EMBO Rep. 13, 291–296. 10.1038/embor.2012.36
• Veroniki A. A., Jackson D., Viechtbauer W., Bender R., Bowden J., Knapp G., et al. (2016). Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res. Synth. Methods 7, 55–79. 10.1002/jrsm.1164
• Vesterinen H. M., Sena E. S., Egan K. J., Hirst T. C., Churolov L., Currie G. L., et al. (2014). Meta-analysis of data from animal studies: a practical guide. J. Neurosci. Methods 221, 92–102. 10.1016/j.jneumeth.2013.09.010
• Viechtbauer W. (2010). Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48. 10.18637/jss.v036.i03
• von Hippel P. T. (2015). The heterogeneity statistic I² can be biased in small meta-analyses. BMC Med. Res. Methodol. 15:35. 10.1186/s12874-015-0024-z
• Weed D. L. (2000). Interpreting epidemiological evidence: how meta-analysis and causal inference methods are related. Int. J. Epidemiol. 29, 387–390. 10.1093/intjepid/29.3.387
• Weed D. L. (2010). Meta-analysis and causal inference: a case study of benzene and non-Hodgkin lymphoma. Ann. Epidemiol. 20, 347–355. 10.1016/j.annepidem.2010.02.001

