  • Comparative Analysis

What It Is and Why It's Useful

Comparative analysis asks writers to make an argument about the relationship between two or more texts. Beyond that, there's a lot of variation, but three overarching kinds of comparative analysis stand out:

  • Coordinate (A ↔ B): In this kind of analysis, two (or more) texts are read against each other in terms of a shared element, e.g., a memoir and a novel, both by Jesmyn Ward; two sets of data for the same experiment; a few op-ed responses to the same event; two YA books written in Chicago in the 2000s; a film adaptation of a play; etc.
  • Subordinate (A → B) or (B → A): Using a theoretical text (as a "lens") to explain a case study or work of art (e.g., how Anthony Jack's The Privileged Poor can help explain divergent experiences among students at elite four-year private colleges who are coming from similar socio-economic backgrounds) or using a work of art or case study as a "test" of a theory's usefulness or limitations (e.g., using coverage of recent incidents of gun violence or legislation in the U.S. to confirm or question the currency of Carol Anderson's The Second).
  • Hybrid [A → (B ↔ C)] or [(B ↔ C) → A], i.e., using coordinate and subordinate analysis together. For example, using Jack to compare or contrast the experiences of students at elite four-year institutions with students at state universities and/or community colleges; or looking at gun culture in other countries and/or other timeframes to contextualize or generalize Anderson's main points about the role of the Second Amendment in U.S. history.

"In the wild," these three kinds of comparative analysis represent increasingly complex—and scholarly—modes of comparison. Students can of course compare two poems in terms of imagery or two data sets in terms of methods, but in each case the analysis will eventually be richer if the students have had a chance to encounter other people's ideas about how imagery or methods work. At that point, we're getting into a hybrid kind of reading (or even into research essays), especially if we start introducing different approaches to imagery or methods that are themselves being compared along with a couple (or few) poems or data sets.

Why It's Useful

In the context of a particular course, each kind of comparative analysis has its place and can be a useful step up from single-source analysis. Intellectually, comparative analysis helps overcome the "n of 1" problem that can face single-source analysis. That is, a writer drawing broad conclusions about the influence of the Iranian New Wave based on one film is relying entirely, and almost certainly too much, on that film to support those findings. In the context of even just one more film, though, the analysis is suddenly more likely to arrive at one of the best features of any comparative approach: both films will be more richly experienced than they would have been in isolation, and the conclusions about the themes or questions in terms of which they're being explored (here the general question of the influence of the Iranian New Wave) will be less at risk of oversimplification.

For scholars working in comparative fields or through comparative approaches, these features of comparative analysis animate their work. To borrow from a stock example in Western epistemology, our concept of "green" isn't based on a single encounter with something we intuit or are told is "green." Not at all. Our concept of "green" is derived from a complex set of experiences of what others say is green or what's labeled green or what seems to be something that's neither blue nor yellow but kind of both, etc. Comparative analysis essays offer us the chance to engage with that process, even if only enough to help us see where a more in-depth exploration with a higher and/or more diverse "n" might lead. In that sense, comparative analysis forms a bridge of sorts between single-source analysis and research essays, both in the subject matter students are exploring through writing and in the complexity of the genre of writing they're using to explore it.

Typical learning objectives for single-source essays: formulate analytical questions and an arguable thesis, establish stakes of an argument, summarize sources accurately, choose evidence effectively, analyze evidence effectively, define key terms, organize argument logically, acknowledge and respond to counterargument, cite sources properly, and present ideas in clear prose.

Common types of comparative analysis essays and related types: two works in the same genre, two works from the same period (but in different places or in different cultures), a work adapted into a different genre or medium, two theories treating the same topic; a theory and a case study or other object, etc.

How to Teach It: Framing + Practice

Framing multi-source writing assignments (comparative analysis, research essays, multi-modal projects) is likely to overlap a great deal with "Why It's Useful" (see above), because the range of reasons why we might use these kinds of writing in academic or non-academic settings is itself the reason why they so often appear later in courses. In many courses, they're the best vehicles for exploring the complex questions that arise once we've been introduced to the course's main themes, core content, leading protagonists, and central debates.

For comparative analysis in particular, it's helpful to frame the assignment's process and how it will help students successfully navigate the challenges and pitfalls presented by the genre. Ideally, this will mean students have time to identify what each text seems to be doing, take note of apparent points of connection between different texts, and start to imagine how those points of connection (or the absence thereof)

  • complicate or upend their own expectations or assumptions about the texts
  • complicate or refute the expectations or assumptions about the texts presented by a scholar
  • confirm and/or nuance expectations and assumptions they themselves hold or scholars have presented
  • present entirely unforeseen ways of understanding the texts

—and all with implications for the texts themselves or for the axes along which the comparative analysis took place. If students know that this is where their ideas will be heading, they'll be ready to develop those ideas and engage with the challenges that comparative analysis presents in terms of structure (See "Tips" and "Common Pitfalls" below for more on these elements of framing).

Like single-source analyses, comparative essays have several moving parts, and giving students practice here means adapting the sample sequence laid out at the "Formative Writing Assignments" page. Three areas that have already been mentioned above are worth noting:

  • Gathering evidence: Depending on what your assignment is asking students to compare (or in terms of what), students will benefit greatly from structured opportunities to create inventories or data sets of the motifs, examples, trajectories, etc., shared (or not shared) by the texts they'll be comparing. See the sample exercises below for a basic example of what this might look like.
  • Why it Matters: Moving beyond "x is like y but also different" or even "x is more like y than we might think at first" is what moves an essay from being "compare/contrast" to being a comparative analysis. It's also a move that can be hard to make and that will often evolve over the course of an assignment. A great way to get feedback from students about where they're at on this front? Ask them to start considering early on why their argument "matters" to different kinds of imagined audiences (while they're just gathering evidence) and again as they develop their thesis and again as they're drafting their essays. (Cover letters, for example, are a great place to ask writers to imagine how a reader might be affected by reading their argument.)
  • Structure: Having two texts on stage at the same time can suddenly feel a lot more complicated for any writer who's used to having just one at a time. Giving students a sense of what the most common patterns (AAA / BBB, ABABAB, etc.) are likely to be can help them imagine, even if provisionally, how their argument might unfold over a series of pages. See "Tips" and "Common Pitfalls" below for more information on this front.
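
The "gathering evidence" inventory described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed exercise: the motif lists and the function name `build_inventory` are hypothetical, and a real inventory would come from students' own reading notes.

```python
# Hypothetical inventories: the motifs a student noted in each text.
motifs_a = {"water", "inheritance", "silence", "migration"}
motifs_b = {"water", "silence", "labor", "memory"}


def build_inventory(a, b):
    """Sort observations into shared and text-specific groups."""
    return {
        "shared": sorted(a & b),   # touch points between the two texts
        "only_a": sorted(a - b),   # present in text A, absent from text B
        "only_b": sorted(b - a),   # present in text B, absent from text A
    }


inventory = build_inventory(motifs_a, motifs_b)
for group, items in inventory.items():
    print(f"{group}: {', '.join(items)}")
```

Listing what is not shared alongside what is shared keeps the inventory from being shaped in advance by a thesis.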

Sample Exercises and Links to Other Resources

Tips
  • Try to keep students from thinking of a proposed thesis as a commitment. Instead, help them see it as more of a hypothesis that has emerged out of readings and discussion and analytical questions and that they'll now test through an experiment, namely, writing their essay. When students see writing as part of the process of inquiry—rather than just the result—and when that process is committed to acknowledging and adapting itself to evidence, it makes writing assignments more scientific, more ethical, and more authentic. 
  • Have students create an inventory of touch points between the two texts early in the process.
  • Ask students to make the case—early on and at points throughout the process—for the significance of the claim they're making about the relationship between the texts they're comparing.

Common Pitfalls

  • For coordinate kinds of comparative analysis, a common pitfall is tied to thesis and evidence. Basically, it's a thesis that tells the reader that there are "similarities and differences" between two texts, without telling the reader why it matters that these two texts have or don't have these particular features in common. This kind of thesis is stuck at the level of description or positivism, and it's not uncommon when a writer is grappling with the complexity that can in fact accompany the "taking inventory" stage of comparative analysis. The solution is to make the "taking inventory" stage part of the process of the assignment. When this stage comes before students have formulated a thesis, that formulation is then able to emerge out of a comparative data set, rather than the data set emerging in terms of their thesis (which can lead to confirmation bias, or frequency illusion, or—just for the sake of streamlining the process of gathering evidence—cherry picking). 
  • For subordinate kinds of comparative analysis, a common pitfall is tied to how much weight is given to each source. Having students apply a theory (in a "lens" essay) or weigh the pros and cons of a theory against case studies (in a "test a theory" essay) can be a great way to help them explore the assumptions, implications, and real-world usefulness of theoretical approaches. The pitfall of these approaches is that they can quickly lead to the same biases described above. Making sure that students know they should engage with counterevidence and counterargument, and that "lens" / "test a theory" approaches often balance each other out in any real-world application of theory, is a good way to get out in front of this pitfall.
  • For any kind of comparative analysis, a common pitfall is structure. Every comparative analysis asks writers to move back and forth between texts, and that can pose a number of challenges, including: what pattern the back and forth should follow and how to use transitions and other signposting to make sure readers can follow the overarching argument as the back and forth is taking place. Here's some advice from an experienced writing instructor to students about how to think about these considerations:

a quick note on STRUCTURE

     Most of us have encountered the question of whether to adopt what we might term the “A→A→A→B→B→B” structure or the “A→B→A→B→A→B” structure.  Do we make all of our points about text A before moving on to text B?  Or do we go back and forth between A and B as the essay proceeds?  As always, the answers to our questions about structure depend on our goals in the essay as a whole.  In a “similarities in spite of differences” essay, for instance, readers will need to encounter the differences between A and B before we offer them the similarities (A_d→B_d→A_s→B_s).  If, rather than subordinating differences to similarities you are subordinating text A to text B (using A as a point of comparison that reveals B’s originality, say), you may be well served by the “A→A→A→B→B→B” structure.  

     Ultimately, you need to ask yourself how many “A→B” moves you have in you.  Is each one identical?  If so, you may wish to make the transition from A to B only once (“A→A→A→B→B→B”), because if each “A→B” move is identical, the “A→B→A→B→A→B” structure will appear to involve nothing more than directionless oscillation and repetition.  If each is increasingly complex, however—if each AB pair yields a new and progressively more complex idea about your subject—you may be well served by the “A→B→A→B→A→B” structure, because in this case it will be visible to readers as a progressively developing argument.
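
For readers who find it easier to see the two patterns laid out mechanically, here is a minimal Python sketch of the block (A→A→A→B→B→B) and alternating (A→B→A→B→A→B) orderings. The point labels and function names are hypothetical placeholders, and the sketch only orders an outline; it says nothing about whether each A→B move develops the argument.

```python
def block_order(points_a, points_b):
    """A→A→A→B→B→B: every point about text A, then every point about text B."""
    return [("A", p) for p in points_a] + [("B", p) for p in points_b]


def alternating_order(points_a, points_b):
    """A→B→A→B→A→B: pair the points and move back and forth between texts."""
    ordered = []
    # zip stops at the shorter list, so unpaired points are left out.
    for point_a, point_b in zip(points_a, points_b):
        ordered.append(("A", point_a))
        ordered.append(("B", point_b))
    return ordered


# Hypothetical points of comparison, one per text for each axis.
points_a = ["imagery in A", "narration in A", "ending of A"]
points_b = ["imagery in B", "narration in B", "ending of B"]

block = block_order(points_a, points_b)
alternating = alternating_order(points_a, points_b)

print(" → ".join(label for label, _ in block))
print(" → ".join(label for label, _ in alternating))
```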

Advice on Timing

As we discussed in "Advice on Timing" at the page on single-source analysis, that timeline itself roughly follows the "Sample Sequence of Formative Assignments for a 'Typical' Essay" outlined under "Formative Writing Assignments," and it spans about 5–6 steps or 2–4 weeks.

Comparative analysis assignments have a lot of the same DNA as single-source essays, but they potentially bring more reading into play and ask students to engage in more complicated acts of analysis and synthesis during the drafting stages. With that in mind, closer to 4 weeks is probably a good baseline for many comparative analysis assignments. For sections that meet once per week, either the timeline will probably need to expand (ideally) a little past the 4-week mark, or some of the steps will need to be combined or done asynchronously.

What It Can Build Up To

Comparative analyses can build up to other kinds of writing in a number of ways. For example:

  • They can build toward other kinds of comparative analysis, e.g., students can be asked to choose an additional source to complicate their conclusions from a previous analysis, or they can be asked to revisit an analysis using a different axis of comparison, such as race instead of class. (These approaches are akin to moving from a coordinate or subordinate analysis to more of a hybrid approach.)
  • They can scaffold up to research essays, which in many instances are an extension of a "hybrid comparative analysis."
  • Like single-source analysis, in a course where students will take a "deep dive" into a source or topic for their capstone, they can allow students to "try on" a theoretical approach or genre or time period to see if it's indeed something they want to research more fully.
What is comparative analysis? A complete guide

Last updated: 18 April 2023

Reviewed by: Jean Kaluza

Comparative analysis is a valuable tool for acquiring deep insights into your organization’s processes, products, and services so you can continuously improve them. 

Similarly, if you want to streamline, price appropriately, and ultimately be a market leader, you’ll likely need to draw on comparative analyses quite often.

When faced with multiple options or solutions to a given problem, a thorough comparative analysis can help you compare and contrast your options and make a clear, informed decision.

If you want to get up to speed on conducting a comparative analysis or need a refresher, here’s your guide.

  • What exactly is comparative analysis?

A comparative analysis is a side-by-side comparison that systematically compares two or more things to pinpoint their similarities and differences. The focus of the investigation might be conceptual—a particular problem, idea, or theory—or perhaps something more tangible, like two different data sets.

For instance, you could use comparative analysis to investigate how your product features measure up to the competition.

After a successful comparative analysis, you should be able to identify strengths and weaknesses and clearly understand which product is more effective.

You could also use comparative analysis to examine different methods of producing that product and determine which way is most efficient and profitable.

The potential applications for using comparative analysis in everyday business are almost unlimited. That said, a comparative analysis is most commonly used to examine:

  • Emerging trends and opportunities (new technologies, marketing)
  • Competitor strategies
  • Financial health
  • Effects of trends on a target audience

  • Why is comparative analysis so important? 

Comparative analysis can help narrow your focus so your business pursues the most meaningful opportunities rather than attempting dozens of improvements simultaneously.

A comparative approach also helps frame up data to illuminate interrelationships. For example, comparative research might reveal nuanced relationships or critical contexts behind specific processes or dependencies that wouldn’t be well-understood without the research.

For instance, if your business compares the cost of producing several existing products relative to which ones have historically sold well, that should provide helpful information once you’re ready to look at developing new products or features.
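
As a rough illustration of the cost-versus-sales comparison just described, the sketch below ranks a few products by total profit. Every figure, product name, and field name here is hypothetical and chosen only to show the arithmetic; it is not drawn from any real data set.

```python
# Hypothetical figures for illustration only.
products = [
    {"name": "Product A", "unit_cost": 12.0, "unit_price": 20.0, "units_sold": 500},
    {"name": "Product B", "unit_cost": 30.0, "unit_price": 36.0, "units_sold": 900},
    {"name": "Product C", "unit_cost": 8.0, "unit_price": 10.0, "units_sold": 200},
]


def compare_products(items):
    """Put production cost and sales history on one scale: total profit."""
    rows = []
    for item in items:
        margin = item["unit_price"] - item["unit_cost"]
        rows.append({
            "name": item["name"],
            "margin_per_unit": margin,
            "total_profit": margin * item["units_sold"],
        })
    # Highest total profit first.
    return sorted(rows, key=lambda row: row["total_profit"], reverse=True)


ranked = compare_products(products)
for row in ranked:
    print(f'{row["name"]}: margin {row["margin_per_unit"]:.2f}, total {row["total_profit"]:.2f}')
```

In this made-up example the product with the best per-unit margin is not the one with the highest total profit, which is the kind of interrelationship a side-by-side comparison is meant to surface.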

  • Comparative vs. competitive analysis—what’s the difference?

Comparative analysis is generally divided into three subtypes, using quantitative or qualitative data and then extending the findings to a larger group. These include:

  • Pattern analysis: identifying patterns or recurrences of trends and behavior across large data sets.
  • Data filtering: analyzing large data sets to extract an underlying subset of information. It may involve rearranging, excluding, and apportioning comparative data to fit different criteria.
  • Decision tree: flowcharting to visually map and assess potential outcomes, costs, and consequences.
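
A minimal Python sketch of the first two subtypes, data filtering followed by a simple pattern count, might look like the following. The order records, field names, and the helper `filter_records` are all assumptions made up for illustration; a real analysis would work on far larger data sets.

```python
from collections import Counter

# Hypothetical order records for illustration only.
orders = [
    {"region": "north", "channel": "web", "returned": True},
    {"region": "north", "channel": "store", "returned": False},
    {"region": "south", "channel": "web", "returned": True},
    {"region": "south", "channel": "web", "returned": False},
    {"region": "north", "channel": "web", "returned": True},
]


def filter_records(records, **criteria):
    """Data filtering: keep only the records that match every criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Filter down to one subset, then look for a recurring pattern inside it.
web_orders = filter_records(orders, channel="web")
returns_by_region = Counter(
    record["region"] for record in web_orders if record["returned"]
)

print(len(web_orders), "web orders")
print(dict(returns_by_region))
```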

In contrast, competitive analysis is a type of comparative analysis in which you deeply research one or more of your industry competitors. In this case, you’re using qualitative research to explore what the competition is up to across one or more dimensions. For example:

  • Service delivery: metrics like Net Promoter Score indicate customer satisfaction levels.
  • Market position: the share of the market that the competition has captured.
  • Brand reputation: how well-known or recognized your competitors are within their target market.

  • Tips for optimizing your comparative analysis

Conduct original research

Thorough, independent research is a significant asset when doing comparative analysis. It provides evidence to support your findings and may present a perspective or angle not considered previously. 

Make analysis routine

To get the maximum benefit from comparative research, make it a regular practice, and establish a cadence you can realistically stick to. Some business areas you could plan to analyze regularly include:



Experiment with controlled and uncontrolled variables

In addition to simply comparing and contrasting, explore how different variables might affect your outcomes.

For example, a controllable variable would be offering a seasonal feature like a shopping bot to assist in holiday shopping or raising or lowering the selling price of a product.

Uncontrollable variables include weather, changing regulations, the current political climate, or global pandemics.

Put equal effort into each point of comparison

Most people enter into comparative research with a particular idea or hypothesis already in mind to validate. For instance, you might try to prove the value of launching a new service. So, you may be disappointed if your analysis results don’t support your plan.

However, in any comparative analysis, try to maintain an unbiased approach by spending equal time debating the merits and drawbacks of any decision. Ultimately, this will be a more practical and sustainable long-term approach for your business than focusing only on the evidence that favors your argument or strategy.

Writing a comparative analysis in five steps

To put together a coherent, insightful analysis that goes beyond a list of pros and cons or similarities and differences, try organizing the information into these five components:

1. Frame of reference

Here is where you provide context. First, what driving idea or problem is your research anchored in? Then, for added substance, cite existing research or insights from a subject matter expert, such as a thought leader in marketing, startup growth, or investment.

2. Grounds for comparison

Why have you chosen to examine the two things you’re analyzing instead of focusing on two entirely different things? What are you hoping to accomplish?

3. Thesis

What argument or choice are you advocating for? What will be the before and after effects of going with either decision? What do you anticipate happening with and without this approach?

For example, “If we release an AI feature for our shopping cart, we will have an edge over the rest of the market before the holiday season.” The finished comparative analysis will weigh all the pros and cons of choosing to build the new expensive AI feature including variables like how “intelligent” it will be, what it “pushes” customers to use, how much it takes off the plates of customer service etc.

Ultimately, you will gauge whether building an AI feature is the right plan for your e-commerce shop.

4. Organize the scheme

Typically, there are two ways to organize a comparative analysis report. First, you can discuss everything about comparison point “A” and then go into everything about comparison point “B.” Or, you can alternate back and forth between points “A” and “B,” sometimes referred to as point-by-point analysis.

Using the AI feature as an example again, you could cover all the pros and cons of building the AI feature, then discuss the benefits and drawbacks of going without it. Or you could compare and contrast each aspect of the AI feature, one at a time: for example, a side-by-side comparison of shopping with the AI feature and shopping without it, then proceeding to another point of differentiation.

5. Connect the dots

Tie it all together in a way that either confirms or disproves your hypothesis.

For instance, “Building the AI bot would allow our customer service team to save 12% on returns in Q3 while offering optimizations and savings in future strategies. However, it would also increase the product development budget by 43% in both Q1 and Q2. Our budget for product development won’t increase again until series 3 of funding is reached, so despite its potential, we will hold off building the bot until funding is secured and more opportunities and benefits can be proved effective.”

Writing a Paper: Comparing & Contrasting

A compare and contrast paper discusses the similarities and differences between two or more topics. The paper should contain an introduction with a thesis statement, a body where the comparisons and contrasts are discussed, and a conclusion.

Address Both Similarities and Differences

Because this is a compare and contrast paper, both the similarities and differences should be discussed. This will require analysis on your part, as some topics will appear to be quite similar, and you will have to work to find the differing elements.

Make Sure You Have a Clear Thesis Statement

Just like any other essay, a compare and contrast essay needs a thesis statement. The thesis statement should not only tell your reader what you will do, but it should also address the purpose and importance of comparing and contrasting the material.

Use Clear Transitions

Transitions are important in compare and contrast essays, where you will be moving frequently between different topics or perspectives.

  • Examples of transitions and phrases for comparisons: as well, similar to, consistent with, likewise, too
  • Examples of transitions and phrases for contrasts: on the other hand, however, although, differs, conversely, rather than.

For more information, check out our transitions page.

Structure Your Paper

Consider how you will present the information. You could present all of the similarities first and then present all of the differences. Or you could go point by point and show the similarity and difference of one point, then the similarity and difference for another point, and so on.

Include Analysis

It is tempting to just provide summary for this type of paper, but analysis will show the importance of the comparisons and contrasts. For instance, if you are comparing two articles on the topic of the nursing shortage, help us understand what this will achieve. Did you find consensus between the articles that will support a certain action step for people in the field? Did you find discrepancies between the two that point to the need for further investigation?

Make Analogous Comparisons

When drawing comparisons or making contrasts, be sure you are dealing with similar aspects of each item. To use an old cliché, are you comparing apples to apples?

  • Example of poor comparisons: Kubista studied the effects of a later start time on high school students, but Cook used a mixed methods approach. (This example does not compare similar items. It is not a clear contrast because the sentence does not discuss the same element of the articles. It is like comparing apples to oranges.)
  • Example of analogous comparisons: Cook used a mixed methods approach, whereas Kubista used only quantitative methods. (Here, methods are clearly being compared, allowing the reader to understand the distinction.)

How do I write a comparative analysis?

A comparative analysis is an essay in which two things are compared and contrasted. You may have done a "compare and contrast" paper in your English class, and a comparative analysis is the same general idea, but as a graduate student you are expected to produce a higher level of analysis in your writing. You can follow these guidelines to get started. 

  • Conduct your research. Need help? Ask a Librarian!
  • Brainstorm a list of similarities and differences. The Double Bubble document linked below can be helpful for this step.
  • Write your thesis. This will be based on what you have discovered regarding the weight of similarities and differences between the things you are comparing.
  • Choose a structure for your essay. Two common options are the alternating method and the block method:
  • Alternating (point-by-point) method: Find similar points between each subject and alternate writing about each of them.
  • Block (subject-by-subject) method: Discuss all of the first subject and then all of the second.
  • This page from the University of Toronto gives some great examples of when each of these is most effective.
  • Don't forget to cite your sources! 

Visvis, V., & Plotnik, J. (n.d.). The comparative essay. University of Toronto. https://advice.writing.utoronto.ca/types-of-writing/comparative-essay/

Walk, K. (1998). How to write a comparative analysis. Harvard University. https://writingcenter.fas.harvard.edu/pages/how-write-comparative-analysis

Links & Files

  • Double_Bubble_Map.docx
  • Last Updated Sep 06, 2023
  • Answered By Kerry Louvier

Sociology Group: Welcome to Social Sciences Blog

How to Do Comparative Analysis in Research (Examples)

Comparative analysis is a method that is widely used in social science. It is a method of comparing two or more items with the aim of uncovering and discovering new ideas about them. It often compares and contrasts social structures and processes around the world to grasp general patterns. Comparative analysis tries to understand and explain every element of the data being compared.

Comparative Analysis in Social Science Research

We often compare and contrast in our daily lives, so it is natural to compare and contrast cultures and human societies. We often hear that ‘our culture is better than theirs’ or ‘their lifestyle is better than ours’. In social science, the social scientist compares primitive, barbarian, civilized, and modern societies in order to understand and discover the evolutionary changes that happen to a society and its people. Comparison is used not only to understand evolutionary processes but also to identify the differences, changes, and connections between societies.

Most social scientists are involved in comparative analysis. Macfarlane observed that in history, comparisons are usually made across time, whereas in the other social sciences they are made predominantly across space. The historian takes their own society, compares it with a past society, and analyzes how far the two differ from each other.

The comparative method of social research is a product of 19th-century sociology and social anthropology. Sociologists like Emile Durkheim, Herbert Spencer, and Max Weber used comparative analysis in their works. For example, Max Weber compared Protestantism in Europe with Catholicism and also with other religions such as Islam, Hinduism, and Confucianism.

To do a systematic comparison we need to follow different elements of the method.

1. Methods of comparison

In social science, we can make comparisons in different ways, depending on the topic and the field of study. Emile Durkheim, for example, compared societies in terms of organic solidarity and mechanical solidarity. He also provides us with three different approaches to the comparative method:

  • The first approach is to identify and select one particular society in a fixed period. By doing that, we can identify and determine the relationships, connections, and differences that exist within that particular society alone. We can examine its religious practices, traditions, laws, norms, etc.
  • The second approach is to consider various societies that have common or similar characteristics but vary in some ways. We can select societies from a specific period, or societies from different periods that share characteristics but vary in some ways. For example, we can take European and American societies (which have broadly similar characteristics) in the 20th century and compare and contrast them in terms of law, custom, tradition, etc.
  • The third approach he envisaged is to take different societies from different times that may share some similar characteristics or may show revolutionary changes. For example, we can compare modern and primitive societies, which shows us revolutionary social changes.

2 . The unit of comparison

We cannot compare every aspect of society. As we know there are so many things that we cannot compare. The very success of the compare method is the unit or the element that we select to compare. We are only able to compare things that have some attributes in common. For example, we can compare the existing family system in America with the existing family system in Europe. But we are not able to compare the food habits in china with the divorce rate in America. It is not possible. So, the next thing you to remember is to consider the unit of comparison. You have to select it with utmost care.

3. The motive of comparison

The comparative method is one of several methods of study available to the social scientist. A researcher who adopts it must know on what grounds they are using it, must weigh its strengths, limitations, and weaknesses, and must know how to carry out the analysis.

Steps of the comparative method

1. Setting up of a unit of comparison

As mentioned earlier, the first step is to determine the unit of comparison for your study, considering all of its dimensions. This is where you set out the two things you need to compare so that you can analyze them properly. It is not an easy step: it has to be done systematically, with proper methods and techniques. You have to define your objectives and variables, and either ask yourself what you need to study or form a hypothesis for your analysis.

The best frames of reference are built from specific sources rather than from your own thoughts or observations. To build one, you can select attributes of society such as marriage, law, customs, and norms; doing so makes it easier to compare and contrast the two societies you selected. You can pose questions such as: Are the marriage practices of Catholics different from those of Protestants? Do men and women have an equal voice in choosing a partner? You can pose as many questions as you want, because they help uncover the truth about the topic. A comparative analysis needs such attributes to study, and a social scientist who wishes to compare must develop the research questions that come to mind. A study without them is unlikely to be fruitful.

2. Grounds of comparison

The grounds of comparison should be understandable to the reader. You must explain why you selected these units for comparison. It is natural for a reader to ask why you chose this one and not another, or what the reason is behind choosing this particular society. If a social scientist chooses primitive Asian society and primitive Australian society for comparison, they must state the grounds of comparison for the reader. The comparison should be self-explanatory, without complications.

If you choose two particular societies for your comparative analysis, you must tell the reader why you chose them.

3. Report or thesis

The main element of the comparative analysis is the thesis or report. It is the most important part because it must contain your entire frame of reference: your research questions, the objectives of your topic, the characteristics of your two units of comparison, the variables in your study, and, last but not least, the findings and conclusion. The findings must be self-explanatory, so that the reader understands to what extent the units are connected and how they differ. For example, in his theory of the division of labour, Emile Durkheim distinguished mechanical solidarity from organic solidarity, associating primitive society with mechanical solidarity and modern society with organic solidarity. In the same way, you have to state your findings in the thesis.

4. Relationship and linking one to another

Your paper must link each point in the argument. Otherwise, the reader cannot follow the logical progression of your analysis. In a comparative analysis, you need to relate 'x' and 'y' (the two units or things being compared) throughout the paper. To do that, you can use transitions such as likewise, similarly, and on the contrary. For example, in comparing primitive and modern society, we can say: 'In primitive society, the division of labour is based on gender and age; on the contrary (or on the other hand), in modern society, the division of labour is based on a person's skill and knowledge.'

Demerits of comparison

Comparative analysis is not always successful; it has some limitations. Its wide use can easily create the impression that it is a firmly established, smooth, and unproblematic method of investigation which, because of its apparent logical status, can produce reliable knowledge once certain technical preconditions are met.

Perhaps the most fundamental issue concerns the independence of the units chosen for comparison. When different kinds of entities are brought together to be analyzed, there is often an implicit assumption that they are independent, and a tacit tendency to ignore the mutual influences among the units.

Another basic issue with broad ramifications concerns the choice of the units being compared. Far from being an innocent or simple task, choosing comparison units is a critical and tricky matter. The problem is that, in such studies, the descriptions of the cases chosen for comparison with the principal one tend to become overly simplified, superficial, and stylised, which leads to distorted arguments and conclusions.

However, comparative analysis remains a method with considerable benefits, chiefly because it can make us aware of the limits of our own thinking and guard against the weaknesses and harmful consequences of localism and provincialism. We may nonetheless have something to learn from historians' hesitation in using comparison and from their respect for the uniqueness of contexts and the histories of peoples. Above all, by comparing we discover the underlying and previously unnoticed connections and differences that exist in society.

Sociology Group

What is Comparative Analysis and How to Conduct It? (+ Examples)

Appinio Research · 30.10.2023 · 36min read

Have you ever faced a complex decision, wondering how to make the best choice among multiple options? In a world filled with data and possibilities, the art of comparative analysis holds the key to unlocking clarity amidst the chaos.

In this guide, we'll demystify the power of comparative analysis, revealing its practical applications, methodologies, and best practices. Whether you're a business leader, researcher, or simply someone seeking to make more informed decisions, join us as we explore the intricacies of comparative analysis and equip you with the tools to chart your course with confidence.

What is Comparative Analysis?

Comparative analysis is a systematic approach used to evaluate and compare two or more entities, variables, or options to identify similarities, differences, and patterns. It involves assessing the strengths, weaknesses, opportunities, and threats associated with each entity or option to make informed decisions.

The primary purpose of comparative analysis is to provide a structured framework for decision-making by:

  • Facilitating Informed Choices: Comparative analysis equips decision-makers with data-driven insights, enabling them to make well-informed choices among multiple options.
  • Identifying Trends and Patterns: It helps identify recurring trends, patterns, and relationships among entities or variables, shedding light on underlying factors influencing outcomes.
  • Supporting Problem Solving: Comparative analysis aids in solving complex problems by systematically breaking them down into manageable components and evaluating potential solutions.
  • Enhancing Transparency: By comparing multiple options, comparative analysis promotes transparency in decision-making processes, allowing stakeholders to understand the rationale behind choices.
  • Mitigating Risks: It helps assess the risks associated with each option, allowing organizations to develop risk mitigation strategies and make risk-aware decisions.
  • Optimizing Resource Allocation: Comparative analysis assists in allocating resources efficiently by identifying areas where resources can be optimized for maximum impact.
  • Driving Continuous Improvement: By comparing current performance with historical data or benchmarks, organizations can identify improvement areas and implement growth strategies.

Importance of Comparative Analysis in Decision-Making

  • Data-Driven Decision-Making: Comparative analysis relies on empirical data and objective evaluation, reducing the influence of biases and subjective judgments in decision-making. It ensures decisions are based on facts and evidence.
  • Objective Assessment: It provides an objective and structured framework for evaluating options, allowing decision-makers to focus on key criteria and avoid making decisions solely based on intuition or preferences.
  • Risk Assessment: Comparative analysis helps assess and quantify risks associated with different options. This risk awareness enables organizations to make proactive risk management decisions.
  • Prioritization: By ranking options based on predefined criteria, comparative analysis enables decision-makers to prioritize actions or investments, directing resources to areas with the most significant impact.
  • Strategic Planning: It is integral to strategic planning, helping organizations align their decisions with overarching goals and objectives. Comparative analysis ensures decisions are consistent with long-term strategies.
  • Resource Allocation: Organizations often have limited resources. Comparative analysis assists in allocating these resources effectively, ensuring they are directed toward initiatives with the highest potential returns.
  • Continuous Improvement: Comparative analysis supports a culture of continuous improvement by identifying areas for enhancement and guiding iterative decision-making processes.
  • Stakeholder Communication: It enhances transparency in decision-making, making it easier to communicate decisions to stakeholders. Stakeholders can better understand the rationale behind choices when supported by comparative analysis.
  • Competitive Advantage: In business and competitive environments, comparative analysis can provide a competitive edge by identifying opportunities to outperform competitors or address weaknesses.
  • Informed Innovation: When evaluating new products, technologies, or strategies, comparative analysis guides the selection of the most promising options, reducing the risk of investing in unsuccessful ventures.

In summary, comparative analysis is a valuable tool that empowers decision-makers across various domains to make informed, data-driven choices, manage risks, allocate resources effectively, and drive continuous improvement. Its structured approach enhances decision quality and transparency, contributing to the success and competitiveness of organizations and research endeavors.

How to Prepare for Comparative Analysis?

1. Define Objectives and Scope

Before you begin your comparative analysis, clearly defining your objectives and the scope of your analysis is essential. This step lays the foundation for the entire process. Here's how to approach it:

  • Identify Your Goals: Start by asking yourself what you aim to achieve with your comparative analysis. Are you trying to choose between two products for your business? Are you evaluating potential investment opportunities? Knowing your objectives will help you stay focused throughout the analysis.
  • Define Scope: Determine the boundaries of your comparison. What will you include, and what will you exclude? For example, if you're analyzing market entry strategies for a new product, specify whether you're looking at a specific geographic region or a particular target audience.
  • Stakeholder Alignment: Ensure that all stakeholders involved in the analysis understand and agree on the objectives and scope. This alignment will prevent misunderstandings and ensure the analysis meets everyone's expectations.

2. Gather Relevant Data and Information

The quality of your comparative analysis heavily depends on the data and information you gather. Here's how to approach this crucial step:

  • Data Sources: Identify where you'll obtain the necessary data. Will you rely on primary sources, such as surveys and interviews, to collect original data? Or will you use secondary sources, like published research and industry reports, to access existing data? Consider the advantages and disadvantages of each source.
  • Data Collection Plan: Develop a plan for collecting data. This should include details about the methods you'll use, the timeline for data collection, and who will be responsible for gathering the data.
  • Data Relevance: Ensure that the data you collect is directly relevant to your objectives. Irrelevant or extraneous data can lead to confusion and distract from the core analysis.

3. Select Appropriate Criteria for Comparison

Choosing the right criteria for comparison is critical to a successful comparative analysis. Here's how to go about it:

  • Relevance to Objectives: Your chosen criteria should align closely with your analysis objectives. For example, if you're comparing job candidates, your criteria might include skills, experience, and cultural fit.
  • Measurability: Consider whether you can quantify the criteria. Measurable criteria are easier to analyze. If you're comparing marketing campaigns, you might measure criteria like click-through rates, conversion rates, and return on investment.
  • Weighting Criteria: Not all criteria are equally important. You'll need to assign weights to each criterion based on its relative importance. Weighting helps ensure that the most critical factors have a more significant impact on the final decision.

4. Establish a Clear Framework

Once you have your objectives, data, and criteria in place, it's time to establish a clear framework for your comparative analysis. This framework will guide your process and ensure consistency. Here's how to do it:

  • Comparative Matrix: Consider using a comparative matrix or spreadsheet to organize your data. Each row in the matrix represents an option or entity you're comparing, and each column corresponds to a criterion. This visual representation makes it easy to compare and contrast data.
  • Timeline: Determine the time frame for your analysis. Is it a one-time comparison, or will you conduct ongoing analyses? Having a defined timeline helps you manage the analysis process efficiently.
  • Define Metrics: Specify the metrics or scoring system you'll use to evaluate each criterion. For example, if you're comparing potential office locations, you might use a scoring system from 1 to 5 for factors like cost, accessibility, and amenities.
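A comparative matrix of this kind can be sketched in a few lines of Python. The office locations, criteria names, and scores below are made-up illustration values, and `validate` and `render` are hypothetical helper names rather than part of any library; the sketch only shows one way to keep options as rows, criteria as columns, and scores within the 1 to 5 scale.

```python
# Hypothetical options and criteria; every score is an invented example value.
CRITERIA = ["cost", "accessibility", "amenities"]

matrix = {
    "Downtown": {"cost": 2, "accessibility": 5, "amenities": 4},
    "Suburb": {"cost": 4, "accessibility": 3, "amenities": 3},
    "Business park": {"cost": 3, "accessibility": 2, "amenities": 5},
}


def validate(matrix, criteria, low=1, high=5):
    """Check that every option has an in-range score for every criterion."""
    for option, scores in matrix.items():
        for criterion in criteria:
            score = scores.get(criterion)
            if score is None:
                raise ValueError(f"{option} has no score for {criterion}")
            if not low <= score <= high:
                raise ValueError(
                    f"{option}/{criterion} score {score} is outside {low}-{high}"
                )


def render(matrix, criteria):
    """Return the matrix as aligned plain text, one row per option."""
    width = max(len(name) for name in matrix) + 2
    lines = ["Option".ljust(width) + "".join(c.ljust(15) for c in criteria)]
    for option, scores in matrix.items():
        row = "".join(str(scores[c]).ljust(15) for c in criteria)
        lines.append(option.ljust(width) + row)
    return "\n".join(lines)


validate(matrix, CRITERIA)
print(render(matrix, CRITERIA))
```

Validating before rendering catches a missing or out-of-range score early, which matters once several people fill in the same sheet.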

With your objectives, data, criteria, and framework established, you're ready to move on to the next phase of comparative analysis: data collection and organization.

Comparative Analysis Data Collection

Data collection and organization are critical steps in the comparative analysis process. We'll explore how to gather and structure the data you need for a successful analysis.

1. Utilize Primary Data Sources

Primary data sources involve gathering original data directly from the source. This approach offers unique advantages, allowing you to tailor your data collection to your specific research needs.

Some popular primary data sources include:

  • Surveys and Questionnaires: Design surveys or questionnaires and distribute them to collect specific information from individuals or groups. This method is ideal for obtaining firsthand insights, such as customer preferences or employee feedback.
  • Interviews: Conduct structured interviews with relevant stakeholders or experts. Interviews provide an opportunity to delve deeper into subjects and gather qualitative data, making them valuable for in-depth analysis.
  • Observations: Directly observe and record data from real-world events or settings. Observational data can be instrumental in fields like anthropology, ethnography, and environmental studies.
  • Experiments: In controlled environments, experiments allow you to manipulate variables and measure their effects. This method is common in scientific research and product testing.

When using primary data sources, consider factors like sample size, survey design, and data collection methods to ensure the reliability and validity of your data.

2. Harness Secondary Data Sources

Secondary data sources involve using existing data collected by others. These sources can provide a wealth of information and save time and resources compared to primary data collection.

Here are common types of secondary data sources:

  • Public Records: Government publications, census data, and official reports offer valuable information on demographics, economic trends, and public policies. They are often free and readily accessible.
  • Academic Journals: Scholarly articles provide in-depth research findings across various disciplines. They are helpful for accessing peer-reviewed studies and staying current with academic discourse.
  • Industry Reports: Industry-specific reports and market research publications offer insights into market trends, consumer behavior, and competitive landscapes. They are essential for businesses making strategic decisions.
  • Online Databases: Online platforms like Statista, PubMed, and Google Scholar provide a vast repository of data and research articles. They offer search capabilities and access to a wide range of data sets.

When using secondary data sources, critically assess the credibility, relevance, and timeliness of the data. Ensure that it aligns with your research objectives.

3. Ensure and Validate Data Quality

Data quality is paramount in comparative analysis. Poor-quality data can lead to inaccurate conclusions and flawed decision-making. Here's how to ensure data validation and reliability:

  • Cross-Verification: Whenever possible, cross-verify data from multiple sources. Consistency among different sources enhances the reliability of the data.
  • Sample Size: Ensure that your sample is large enough to support statistically meaningful analysis. A small sample may not accurately represent the population.
  • Data Integrity: Check for data integrity issues, such as missing values, outliers, or duplicate entries. Address these issues before analysis to maintain data quality.
  • Data Source Reliability: Assess the reliability and credibility of the data sources themselves. Consider factors like the reputation of the institution or organization providing the data.
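The data integrity checks above (missing values, duplicate entries, outliers) can be illustrated with a small standard-library sketch. The survey records are invented, the function names are hypothetical, and the 1.5 × IQR rule is just one common convention for flagging outliers, not something the text prescribes.

```python
from statistics import quantiles

# Hypothetical survey records: (respondent_id, satisfaction_score).
records = [
    ("r1", 7.0), ("r2", 8.0), ("r3", None), ("r4", 6.5),
    ("r2", 8.0), ("r5", 7.5), ("r6", 42.0), ("r7", 7.0),
]


def find_missing(rows):
    """Return the ids of rows whose value is missing."""
    return [rid for rid, value in rows if value is None]


def find_duplicates(rows):
    """Return the ids of rows that repeat an earlier row exactly."""
    seen, dupes = set(), []
    for row in rows:
        if row in seen:
            dupes.append(row[0])
        seen.add(row)
    return dupes


def find_outliers(rows, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention)."""
    values = [v for _, v in rows if v is not None]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [rid for rid, v in rows if v is not None and not low <= v <= high]


print("missing:", find_missing(records))
print("duplicates:", find_duplicates(records))
print("outliers:", find_outliers(records))
```

Whether a flagged value is an error or a genuine extreme is a judgment call; the sketch only surfaces candidates for review.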

4. Organize Data Effectively

Structuring your data for comparison is a critical step in the analysis process. Organized data makes it easier to draw insights and make informed decisions. Here's how to structure data effectively:

  • Data Cleaning: Before analysis, clean your data to remove inconsistencies, errors, and irrelevant information. Data cleaning may involve data transformation, imputation of missing values, and removing outliers.
  • Normalization: Standardize data to ensure fair comparisons. Normalization adjusts data to a standard scale, making it possible to compare variables with different units or ranges.
  • Variable Labeling: Clearly label variables and data points for easy identification. Proper labeling enhances the transparency and understandability of your analysis.
  • Data Organization: Organize data into a format that suits your analysis methods. For quantitative analysis, this might mean creating a matrix, while qualitative analysis may involve categorizing data into themes.
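As a rough illustration of putting variables with different units on a standard scale, the sketch below uses z-scores (subtract the mean, divide by the standard deviation). This is only one of several possible normalization choices, and the revenue and satisfaction figures are invented for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical figures for four business units, in very different units.
revenue_usd = [120_000, 95_000, 143_000, 110_000]
satisfaction = [4.1, 3.8, 4.6, 4.0]

for name, column in (("revenue", revenue_usd), ("satisfaction", satisfaction)):
    print(name, [round(z, 2) for z in z_scores(column)])
```

After rescaling, both columns express "how far above or below average" each unit is, so they can be read side by side.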

By paying careful attention to data collection, validation, and organization, you'll set the stage for a robust and insightful comparative analysis. Next, we'll explore various methodologies you can employ in your analysis, ranging from qualitative approaches to quantitative methods and examples.

Comparative Analysis Methods

When it comes to comparative analysis, various methodologies are available, each suited to different research goals and data types. In this section, we'll explore five prominent methodologies in detail.

Qualitative Comparative Analysis (QCA)

Qualitative Comparative Analysis (QCA) is a methodology often used when dealing with complex, non-linear relationships among variables. It seeks to identify patterns and configurations among factors that lead to specific outcomes.

  • Case-by-Case Analysis: QCA involves evaluating individual cases (e.g., organizations, regions, or events) rather than analyzing aggregate data. Each case's unique characteristics are considered.
  • Boolean Logic: QCA employs Boolean algebra to analyze data. Variables are categorized as either present or absent, allowing for the examination of different combinations and logical relationships.
  • Necessary and Sufficient Conditions: QCA aims to identify necessary and sufficient conditions for a specific outcome to occur. It helps answer questions like, "What conditions are necessary for a successful product launch?"
  • Fuzzy Set Theory: In some cases, QCA may use fuzzy set theory to account for degrees of membership in a category, allowing for more nuanced analysis.
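To make the Boolean logic concrete, here is a minimal crisp-set sketch. The product-launch cases and condition names are invented, and the checks are deliberately strict (every case must agree); full QCA practice typically also uses consistency and coverage measures and Boolean minimization, which this sketch omits.

```python
# Hypothetical product launches coded as crisp sets (True = present).
cases = [
    {"budget": True, "research": True, "partner": False, "success": True},
    {"budget": True, "research": False, "partner": True, "success": False},
    {"budget": False, "research": True, "partner": True, "success": True},
    {"budget": True, "research": True, "partner": True, "success": True},
    {"budget": False, "research": False, "partner": True, "success": False},
]
CONDITIONS = ["budget", "research", "partner"]


def truth_table(cases, conditions, outcome):
    """Map each observed configuration of conditions to the outcomes seen."""
    table = {}
    for case in cases:
        config = tuple(case[c] for c in conditions)
        table.setdefault(config, []).append(case[outcome])
    return table


def is_necessary(cases, condition, outcome):
    """Necessary: every case showing the outcome also shows the condition."""
    return all(case[condition] for case in cases if case[outcome])


def is_sufficient(cases, condition, outcome):
    """Sufficient: every case showing the condition also shows the outcome."""
    return all(case[outcome] for case in cases if case[condition])


for config, outcomes in truth_table(cases, CONDITIONS, "success").items():
    print(dict(zip(CONDITIONS, config)), "->", outcomes)

for condition in CONDITIONS:
    print(
        condition,
        "necessary:", is_necessary(cases, condition, "success"),
        "sufficient:", is_sufficient(cases, condition, "success"),
    )
```

Note that with no matching cases, `all()` returns True, so a claim of necessity or sufficiency from this sketch is only as strong as the set of cases behind it.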

QCA is particularly useful in fields such as sociology, political science, and organizational studies, where understanding complex interactions is essential.

Quantitative Comparative Analysis

Quantitative Comparative Analysis involves the use of numerical data and statistical techniques to compare and analyze variables. It's suitable for situations where data is quantitative, and relationships can be expressed numerically.

  • Statistical Tools: Quantitative comparative analysis relies on statistical methods like regression analysis, correlation, and hypothesis testing. These tools help identify relationships, dependencies, and trends within datasets.
  • Data Measurement: Ensure that variables are measured consistently using appropriate scales (e.g., ordinal, interval, ratio) for meaningful analysis. Variables may include numerical values like revenue, customer satisfaction scores, or product performance metrics.
  • Data Visualization: Create visual representations of data using charts, graphs, and plots. Visualization aids in understanding complex relationships and presenting findings effectively.
  • Statistical Significance: Assess the statistical significance of relationships. Statistical significance indicates whether observed differences or relationships would be unlikely to arise by chance alone.
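One way to illustrate a significance check without any statistics package is a permutation test on the difference between two group means. This is a swapped-in example technique, not one the text names; the function name, the satisfaction scores, and the default of 5,000 resamples are all assumptions made for the sketch.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=5_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Labels are shuffled repeatedly; the p-value is the share of shuffles whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical satisfaction scores for two product versions.
version_a = [10, 11, 12, 13, 14, 15]
version_b = [20, 21, 22, 23, 24, 25]
print("p-value:", permutation_p_value(version_a, version_b))
```

A small p-value suggests the gap between the groups would rarely appear if the group labels were arbitrary; it says nothing about how large or important the gap is.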

Quantitative comparative analysis is commonly applied in economics, social sciences, and market research to draw empirical conclusions from numerical data.

Case Studies

Case studies involve in-depth examinations of specific instances or cases to gain insights into real-world scenarios. Comparative case studies allow researchers to compare and contrast multiple cases to identify patterns, differences, and lessons.

  • Narrative Analysis: Case studies often involve narrative analysis, where researchers construct detailed narratives of each case, including context, events, and outcomes.
  • Contextual Understanding: In comparative case studies, it's crucial to consider the context within which each case operates. Understanding the context helps interpret findings accurately.
  • Cross-Case Analysis: Researchers conduct cross-case analysis to identify commonalities and differences across cases. This process can lead to the discovery of factors that influence outcomes.
  • Triangulation: To enhance the validity of findings, researchers may use multiple data sources and methods to triangulate information and ensure reliability.

Case studies are prevalent in fields like psychology, business, and sociology, where deep insights into specific situations are valuable.

SWOT Analysis

SWOT Analysis is a strategic tool used to assess the Strengths, Weaknesses, Opportunities, and Threats associated with a particular entity or situation. While it's commonly used in business, it can be adapted for various comparative analyses.

  • Internal and External Factors: SWOT Analysis examines both internal factors (Strengths and Weaknesses), such as organizational capabilities, and external factors (Opportunities and Threats), such as market conditions and competition.
  • Strategic Planning: The insights from SWOT Analysis inform strategic decision-making. By identifying strengths and opportunities, organizations can leverage their advantages. Likewise, addressing weaknesses and threats helps mitigate risks.
  • Visual Representation: SWOT Analysis is often presented as a matrix or a 2x2 grid, making it visually accessible and easy to communicate to stakeholders.
  • Continuous Monitoring: SWOT Analysis is not a one-time exercise. Organizations use it periodically to adapt to changing circumstances and make informed decisions.

SWOT Analysis is versatile and can be applied in business, healthcare, education, and any context where a structured assessment of factors is needed.


Benchmarking

Benchmarking involves comparing an entity's performance, processes, or practices to those of industry leaders or best-in-class organizations. It's a powerful tool for continuous improvement and competitive analysis.

  • Identify Performance Gaps: Benchmarking helps identify areas where an entity lags behind its peers or industry standards. These performance gaps highlight opportunities for improvement.
  • Data Collection: Gather data on key performance metrics from both internal and external sources. This data collection phase is crucial for meaningful comparisons.
  • Comparative Analysis: Compare your organization's performance data with that of benchmark organizations. This analysis can reveal where you excel and where adjustments are needed.
  • Continuous Improvement: Benchmarking is a dynamic process that encourages continuous improvement. Organizations use benchmarking findings to set performance goals and refine their strategies.
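Identifying performance gaps against a benchmark can be sketched as a simple signed difference per metric. The KPI names and values are invented, and `performance_gaps` is a hypothetical helper; the one design point worth noting is that each metric needs a direction, since a higher defect rate is worse while a higher delivery rate is better.

```python
# Hypothetical KPIs for our organization and a best-in-class benchmark.
benchmark = {"on_time_delivery_pct": 97.0, "defect_rate_pct": 1.2, "nps": 55.0}
ours = {"on_time_delivery_pct": 91.5, "defect_rate_pct": 2.0, "nps": 58.0}
higher_is_better = {
    "on_time_delivery_pct": True,
    "defect_rate_pct": False,
    "nps": True,
}


def performance_gaps(ours, benchmark, higher_is_better):
    """Return signed gaps; a negative value means we trail the benchmark."""
    gaps = {}
    for metric, target in benchmark.items():
        delta = ours[metric] - target
        gaps[metric] = delta if higher_is_better[metric] else -delta
    return gaps


gaps = performance_gaps(ours, benchmark, higher_is_better)
# Metrics where we lag, largest shortfall first (raw units, not normalized).
lagging = sorted((m for m, g in gaps.items() if g < 0), key=gaps.get)

for metric, gap in gaps.items():
    print(f"{metric}: {gap:+.1f}")
print("lagging:", lagging)
```

Because the gaps are in each metric's own units, the ordering of `lagging` is only a rough guide; normalizing the metrics first would give a fairer ranking.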

Benchmarking is widely used in business, manufacturing, healthcare, and customer service to drive excellence and competitiveness.

Each of these methodologies brings a unique perspective to comparative analysis, allowing you to choose the one that best aligns with your research objectives and the nature of your data. The choice between qualitative and quantitative methods, or a combination of both, depends on the complexity of the analysis and the questions you seek to answer.

How to Conduct Comparative Analysis?

Once you've prepared your data and chosen an appropriate methodology, it's time to dive into the process of conducting a comparative analysis. We will guide you through the essential steps to extract meaningful insights from your data.

1. Identify Key Variables and Metrics

Identifying key variables and metrics is the first crucial step in conducting a comparative analysis. These are the factors or indicators you'll use to assess and compare your options.

  • Relevance to Objectives: Ensure the chosen variables and metrics align closely with your analysis objectives. When comparing marketing strategies, relevant metrics might include customer acquisition cost, conversion rate, and retention.
  • Quantitative vs. Qualitative: Decide whether your analysis will focus on quantitative data (numbers) or qualitative data (descriptive information). In some cases, a combination of both may be appropriate.
  • Data Availability: Consider the availability of data. Ensure you can access reliable and up-to-date data for all selected variables and metrics.
  • KPIs: Key Performance Indicators (KPIs) are often used as the primary metrics in comparative analysis. These are metrics that directly relate to your goals and objectives.

2. Visualize Data for Clarity

Data visualization techniques play a vital role in making complex information more accessible and understandable. Effective data visualization allows you to convey insights and patterns to stakeholders. Consider the following approaches:

  • Charts and Graphs: Use various types of charts, such as bar charts, line graphs, and pie charts, to represent data. For example, a line graph can illustrate trends over time, while a bar chart can compare values across categories.
  • Heatmaps: Heatmaps are particularly useful for visualizing large datasets and identifying patterns through color-coding. They can reveal correlations, concentrations, and outliers.
  • Scatter Plots: Scatter plots help visualize relationships between two variables. They are especially useful for identifying trends, clusters, or outliers.
  • Dashboards: Create interactive dashboards that allow users to explore data and customize views. Dashboards are valuable for ongoing analysis and reporting.
  • Infographics: For presentations and reports, consider using infographics to summarize key findings in a visually engaging format.

Effective data visualization not only enhances understanding but also aids in decision-making by providing clear insights at a glance.

3. Establish Clear Comparative Frameworks

A well-structured comparative framework provides a systematic approach to your analysis. It ensures consistency and enables you to make meaningful comparisons. Here's how to create one:

  • Comparison Matrices: Consider using matrices or spreadsheets to organize your data. Each row represents an option or entity, and each column corresponds to a variable or metric. This matrix format allows for side-by-side comparisons.
  • Decision Trees: In complex decision-making scenarios, decision trees help map out possible outcomes based on different criteria and variables. They visualize the decision-making process.
  • Scenario Analysis: Explore different scenarios by altering variables or criteria to understand how changes impact outcomes. Scenario analysis is valuable for risk assessment and planning.
  • Checklists: Develop checklists or scoring sheets to systematically evaluate each option against predefined criteria. Checklists ensure that no essential factors are overlooked.

A well-structured comparative framework simplifies the analysis process, making it easier to draw meaningful conclusions and make informed decisions.

4. Evaluate and Score Criteria

Evaluating and scoring criteria is a critical step in comparative analysis, as it quantifies the performance of each option against the chosen criteria.

  • Scoring System: Define a scoring system that assigns values to each criterion for every option. Common scoring systems include numerical scales, percentage scores, or qualitative ratings (e.g., high, medium, low).
  • Consistency: Ensure consistency in scoring by defining clear guidelines for each score. Provide examples or descriptions to help evaluators understand what each score represents.
  • Data Collection: Collect data or information relevant to each criterion for all options. This may involve quantitative data (e.g., sales figures) or qualitative data (e.g., customer feedback).
  • Aggregation: Aggregate the scores for each option to obtain an overall evaluation. This can be done by summing the individual criterion scores or applying weighted averages.
  • Normalization: If your criteria have different measurement scales or units, consider normalizing the scores to create a level playing field for comparison.
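The scoring, normalization, and aggregation steps can be combined in a short sketch. It assumes min-max scaling and a weighted average, which are only two of the options mentioned above; the campaign names, metrics, and weights are invented, and `min_max` and `weighted_scores` are hypothetical helper names.

```python
def min_max(values, higher_is_better=True):
    """Scale values to the 0-1 range, flipping the scale for cost-like criteria."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # no spread: treat all options as equal
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def weighted_scores(raw, weights, higher_is_better):
    """Normalize each criterion, then aggregate with a weighted average."""
    options = list(raw)
    total_weight = sum(weights.values())
    totals = {option: 0.0 for option in options}
    for criterion, weight in weights.items():
        column = [raw[option][criterion] for option in options]
        normalized = min_max(column, higher_is_better[criterion])
        for option, score in zip(options, normalized):
            totals[option] += (weight / total_weight) * score
    return totals


# Hypothetical marketing campaigns measured on different scales.
raw = {
    "Campaign A": {"click_through_pct": 2.0, "conversion_pct": 4.5, "cost_per_lead": 30.0},
    "Campaign B": {"click_through_pct": 3.0, "conversion_pct": 3.0, "cost_per_lead": 20.0},
    "Campaign C": {"click_through_pct": 1.0, "conversion_pct": 5.0, "cost_per_lead": 40.0},
}
weights = {"click_through_pct": 0.2, "conversion_pct": 0.6, "cost_per_lead": 0.2}
higher_is_better = {"click_through_pct": True, "conversion_pct": True, "cost_per_lead": False}

totals = weighted_scores(raw, weights, higher_is_better)
best = max(totals, key=totals.get)
for option, score in totals.items():
    print(f"{option}: {score:.2f}")
print("best:", best)
```

Min-max scaling makes the result depend on which options are in the comparison set, so adding or removing an option can shift every score.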

5. Assign Importance to Criteria

Not all criteria are equally important in a comparative analysis. Weighting criteria allows you to reflect their relative significance in the final decision-making process.

  • Relative Importance: Assess the importance of each criterion in achieving your objectives. Criteria directly aligned with your goals may receive higher weights.
  • Weighting Methods: Choose a weighting method that suits your analysis. Common methods include expert judgment, analytic hierarchy process (AHP), or data-driven approaches based on historical performance.
  • Impact Analysis: Consider how changes in the weights assigned to criteria would affect the final outcome. This sensitivity analysis helps you understand the robustness of your decisions.
  • Stakeholder Input: Involve relevant stakeholders or decision-makers in the weighting process. Their input can provide valuable insights and ensure alignment with organizational goals.
  • Transparency: Clearly document the rationale behind the assigned weights to maintain transparency in your analysis.

By weighting criteria, you ensure that the most critical factors have a more significant influence on the final evaluation, aligning the analysis more closely with your objectives and priorities.
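The impact analysis described above can be approximated with a simple one-at-a-time check: shift each weight up and down, renormalize, and see whether the top-ranked option changes. The sketch below assumes scores already normalized to 0-1; the option names, scores, weights, and the `delta` step size are all hypothetical.

```python
def rank_options(scores, weights):
    """Rank options by weighted sum; scores are assumed already normalized to 0-1."""
    totals = {
        option: sum(weights[c] * value for c, value in criteria.items())
        for option, criteria in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


def weight_sensitivity(scores, weights, delta=0.1):
    """Shift each weight by +/- delta, renormalize, and record the top option."""
    results = {}
    for criterion in weights:
        for change in (-delta, delta):
            adjusted = dict(weights)
            adjusted[criterion] = max(0.0, adjusted[criterion] + change)
            total = sum(adjusted.values())
            adjusted = {c: w / total for c, w in adjusted.items()}
            results[(criterion, change)] = rank_options(scores, adjusted)[0]
    return results


# Hypothetical normalized scores (higher is better) and baseline weights.
scores = {
    "Option A": {"cost": 0.5, "quality": 0.6, "speed": 0.5},
    "Option B": {"cost": 1.0, "quality": 0.0, "speed": 0.0},
    "Option C": {"cost": 0.0, "quality": 1.0, "speed": 1.0},
}
weights = {"cost": 0.4, "quality": 0.4, "speed": 0.2}

baseline_winner = rank_options(scores, weights)[0]
outcomes = weight_sensitivity(scores, weights, delta=0.1)
is_robust = all(winner == baseline_winner for winner in outcomes.values())
print(baseline_winner, is_robust)
```

If the winner changes under small shifts, the decision rests heavily on the exact weights chosen, which is a good reason to document the weighting rationale and revisit it with stakeholders.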

With these steps in place, you're well-prepared to conduct a comprehensive comparative analysis. The next phase involves interpreting your findings, drawing conclusions, and making informed decisions based on the insights you've gained.

Comparative Analysis Interpretation

Interpreting the results of your comparative analysis is the phase that turns data into actionable insights. The points below cover the main aspects of interpretation and how to make sense of your findings.

  • Contextual Understanding: Before diving into the data, consider the broader context of your analysis. Understand the industry trends, market conditions, and any external factors that may have influenced your results.
  • Drawing Conclusions: Summarize your findings clearly and concisely. Identify trends, patterns, and significant differences among the options or variables you've compared.
  • Quantitative vs. Qualitative Analysis: Depending on the nature of your data and analysis, you may need to balance both quantitative and qualitative interpretations. Qualitative insights can provide context and nuance to quantitative findings.
  • Comparative Visualization: Visual aids such as charts, graphs, and tables can help convey your conclusions effectively. Choose visual representations that align with the nature of your data and the key points you want to emphasize.
  • Outliers and Anomalies: Identify and explain any outliers or anomalies in your data. Understanding these exceptions can provide valuable insights into unusual cases or factors affecting your analysis.
  • Cross-Validation: Validate your conclusions by comparing them with external benchmarks, industry standards, or expert opinions. Cross-validation helps ensure the reliability of your findings.
  • Implications for Decision-Making: Discuss how your analysis informs decision-making. Clearly articulate the practical implications of your findings and their relevance to your initial objectives.
  • Actionable Insights: Emphasize actionable insights that can guide future strategies, policies, or actions. Make recommendations based on your analysis, highlighting the steps needed to capitalize on strengths or address weaknesses.
  • Continuous Improvement: Encourage a culture of continuous improvement by using your analysis as a feedback mechanism. Suggest ways to monitor and adapt strategies over time based on evolving circumstances.

Comparative Analysis Applications

Comparative analysis is a versatile methodology that finds application in various fields and scenarios. Let's explore some of the most common and impactful applications.

Business Decision-Making

Comparative analysis is widely employed in business to inform strategic decisions and drive success. Key applications include:

Market Research and Competitive Analysis

  • Objective: To assess market opportunities and evaluate competitors.
  • Methods: Analyzing market trends, customer preferences, competitor strengths and weaknesses, and market share.
  • Outcome: Informed product development, pricing strategies, and market entry decisions.

Product Comparison and Benchmarking

  • Objective: To compare the performance and features of products or services.
  • Methods: Evaluating product specifications, customer reviews, and pricing.
  • Outcome: Identifying strengths and weaknesses, improving product quality, and setting competitive pricing.

Financial Analysis

  • Objective: To evaluate financial performance and make investment decisions.
  • Methods: Comparing financial statements, ratios, and performance indicators of companies.
  • Outcome: Informed investment choices, risk assessment, and portfolio management.

Healthcare and Medical Research

In the healthcare and medical research fields, comparative analysis is instrumental in understanding diseases, treatment options, and healthcare systems.

Clinical Trials and Drug Development

  • Objective: To compare the effectiveness of different treatments or drugs.
  • Methods: Analyzing clinical trial data, patient outcomes, and side effects.
  • Outcome: Informed decisions about drug approvals, treatment protocols, and patient care.

Health Outcomes Research

  • Objective: To assess the impact of healthcare interventions.
  • Methods: Comparing patient health outcomes before and after treatment or between different treatment approaches.
  • Outcome: Improved healthcare guidelines, cost-effectiveness analysis, and patient care plans.

Healthcare Systems Evaluation

  • Objective: To assess the performance of healthcare systems.
  • Methods: Comparing healthcare delivery models, patient satisfaction, and healthcare costs.
  • Outcome: Informed healthcare policy decisions, resource allocation, and system improvements.

Social Sciences and Policy Analysis

Comparative analysis is a fundamental tool in social sciences and policy analysis, aiding in understanding complex societal issues.

Educational Research

  • Objective: To compare educational systems and practices.
  • Methods: Analyzing student performance, curriculum effectiveness, and teaching methods.
  • Outcome: Informed educational policies, curriculum development, and school improvement strategies.

Political Science

  • Objective: To study political systems, elections, and governance.
  • Methods: Comparing election outcomes, policy impacts, and government structures.
  • Outcome: Insights into political behavior, policy effectiveness, and governance reforms.

Social Welfare and Poverty Analysis

  • Objective: To evaluate the impact of social programs and policies.
  • Methods: Comparing the well-being of individuals or communities with and without access to social assistance.
  • Outcome: Informed policymaking, poverty reduction strategies, and social program improvements.

Environmental Science and Sustainability

Comparative analysis plays a pivotal role in understanding environmental issues and promoting sustainability.

Environmental Impact Assessment

  • Objective: To assess the environmental consequences of projects or policies.
  • Methods: Comparing ecological data, resource use, and pollution levels.
  • Outcome: Informed environmental mitigation strategies, sustainable development plans, and regulatory decisions.

Climate Change Analysis

  • Objective: To study climate patterns and their impacts.
  • Methods: Comparing historical climate data, temperature trends, and greenhouse gas emissions.
  • Outcome: Insights into climate change causes, adaptation strategies, and policy recommendations.

Ecosystem Health Assessment

  • Objective: To evaluate the health and resilience of ecosystems.
  • Methods: Comparing biodiversity, habitat conditions, and ecosystem services.
  • Outcome: Conservation efforts, restoration plans, and ecological sustainability measures.

Technology and Innovation

Comparative analysis is crucial in the fast-paced world of technology and innovation.

Product Development and Innovation

  • Objective: To assess the competitiveness and innovation potential of products or technologies.
  • Methods: Comparing research and development investments, technology features, and market demand.
  • Outcome: Informed innovation strategies, product roadmaps, and patent decisions.

User Experience and Usability Testing

  • Objective: To evaluate the user-friendliness of software applications or digital products.
  • Methods: Comparing user feedback, usability metrics, and user interface designs.
  • Outcome: Improved user experiences, interface redesigns, and product enhancements.

Technology Adoption and Market Entry

  • Objective: To analyze market readiness and risks for new technologies.
  • Methods: Comparing market conditions, regulatory landscapes, and potential barriers.
  • Outcome: Informed market entry strategies, risk assessments, and investment decisions.

These diverse applications of comparative analysis highlight its flexibility and importance in decision-making across various domains. Whether in business, healthcare, social sciences, environmental studies, or technology, comparative analysis empowers researchers and decision-makers to make informed choices and drive positive outcomes.

Comparative Analysis Best Practices

Successful comparative analysis relies on following best practices and avoiding common pitfalls. Implementing these practices enhances the effectiveness and reliability of your analysis.

  • Clearly Defined Objectives: Start with well-defined objectives that outline what you aim to achieve through the analysis. Clear objectives provide focus and direction.
  • Data Quality Assurance: Ensure data quality by validating, cleaning, and normalizing your data. Poor-quality data can lead to inaccurate conclusions.
  • Transparent Methodologies: Clearly explain the methodologies and techniques you've used for analysis. Transparency builds trust and allows others to assess the validity of your approach.
  • Consistent Criteria: Maintain consistency in your criteria and metrics across all options or variables. Inconsistent criteria can lead to biased results.
  • Sensitivity Analysis: Conduct sensitivity analysis by varying key parameters, such as weights or assumptions, to assess the robustness of your conclusions.
  • Stakeholder Involvement: Involve relevant stakeholders throughout the analysis process. Their input can provide valuable perspectives and ensure alignment with organizational goals.
  • Critical Evaluation of Assumptions: Identify and critically evaluate any assumptions made during the analysis. Assumptions should be explicit and justifiable.
  • Holistic View: Take a holistic view of the analysis by considering both short-term and long-term implications. Avoid focusing solely on immediate outcomes.
  • Documentation: Maintain thorough documentation of your analysis, including data sources, calculations, and decision criteria. Documentation supports transparency and facilitates reproducibility.
  • Continuous Learning: Stay updated with the latest analytical techniques, tools, and industry trends. Continuous learning helps you adapt your analysis to changing circumstances.
  • Peer Review: Seek peer review or expert feedback on your analysis. External perspectives can identify blind spots and enhance the quality of your work.
  • Ethical Considerations: Address ethical considerations, such as privacy and data protection, especially when dealing with sensitive or personal data.

By adhering to these best practices, you'll not only improve the rigor of your comparative analysis but also ensure that your findings are reliable, actionable, and aligned with your objectives.

Comparative Analysis Examples

To illustrate the practical application and benefits of comparative analysis, let's explore several examples across different domains. They show how organizations and researchers use comparative analysis to make informed decisions, solve complex problems, and drive improvements.

Retail Industry - Price Competitiveness Analysis

Objective: A retail chain aims to assess its price competitiveness against competitors in the same market.


  • Collect pricing data for a range of products offered by the retail chain and its competitors.
  • Organize the data into a comparative framework, categorizing products by type and price range.
  • Calculate price differentials, averages, and percentiles for each product category.
  • Analyze the findings to identify areas where the retail chain's prices are higher or lower than competitors.

Outcome: The analysis reveals that the retail chain's prices are consistently lower in certain product categories but higher in others. This insight informs pricing strategies, allowing the retailer to adjust prices to remain competitive in the market.
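A minimal sketch of the price differential step in this example follows. All product names, prices, and category labels are hypothetical, and the gap is measured against the simple average of competitor prices, which is one assumption among several reasonable benchmarks.

```python
from statistics import mean, median


def price_gaps(own_prices, competitor_prices):
    """Percentage gap between the chain's price and the competitor average, per product."""
    gaps = {}
    for product, own in own_prices.items():
        benchmark = mean(competitor_prices[product])
        gaps[product] = (own - benchmark) / benchmark * 100
    return gaps


def summarize_by_category(gaps, categories):
    """Mean and median percentage gap for each product category."""
    grouped = {}
    for product, gap in gaps.items():
        grouped.setdefault(categories[product], []).append(gap)
    return {
        category: {"mean_gap_pct": mean(values), "median_gap_pct": median(values)}
        for category, values in grouped.items()
    }


# Hypothetical prices: negative gaps mean the chain is cheaper than competitors.
own = {"milk": 1.80, "bread": 2.50, "tv": 450.0, "laptop": 900.0}
competitors = {
    "milk": [2.00, 2.00],
    "bread": [2.50, 2.50],
    "tv": [400.0, 400.0],
    "laptop": [800.0, 1000.0],
}
categories = {"milk": "grocery", "bread": "grocery", "tv": "electronics", "laptop": "electronics"}

summary = summarize_by_category(price_gaps(own, competitors), categories)
print(summary)
```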

Healthcare - Comparative Effectiveness Research

Objective: Researchers aim to compare the effectiveness of two different treatment methods for a specific medical condition.

  • Recruit patients with the medical condition and randomly assign them to two treatment groups.
  • Collect data on treatment outcomes, including symptom relief, side effects, and recovery times.
  • Analyze the data using statistical methods to compare the treatment groups.
  • Consider factors like patient demographics and baseline health status as potential confounding variables.

Outcome: The comparative analysis reveals that one treatment method is significantly more effective than the other at relieving symptoms, by a margin unlikely to be due to chance, and has fewer side effects. This information guides medical professionals in recommending the more effective treatment to patients.
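The example does not say which statistical method is used. As one possible illustration, the sketch below runs an exact two-sided permutation test on the difference in mean recovery time between two small groups. The recovery times are hypothetical, and the test does not adjust for confounders such as demographics or baseline health status.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Returns the share of all possible regroupings whose mean difference is
    at least as large as the one actually observed (the p-value).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical recovery times in days for two randomly assigned groups.
treatment_a = [5, 6, 6, 7, 5]
treatment_b = [9, 8, 10, 9, 11]

p_value = permutation_test(treatment_a, treatment_b)
print(p_value)
```

Enumerating every regrouping is only practical for small samples; larger studies would typically sample random permutations or use a parametric test instead.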

Environmental Science - Carbon Emission Analysis

Objective: An environmental organization seeks to compare carbon emissions from various transportation modes in a metropolitan area.

  • Collect data on the number of vehicles, their types (e.g., cars, buses, bicycles), and fuel consumption for each mode of transportation.
  • Calculate the total carbon emissions for each mode based on fuel consumption and emission factors.
  • Create visualizations such as bar charts and pie charts to represent the emissions from each transportation mode.
  • Consider factors like travel distance, occupancy rates, and the availability of alternative fuels.

Outcome: The comparative analysis reveals that public transportation generates significantly lower carbon emissions per passenger mile compared to individual car travel. This information supports advocacy for increased public transit usage to reduce carbon footprint.
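The per-passenger-mile comparison in this example comes down to two calculations: total emissions equal fuel consumed times an emission factor, and passenger miles equal vehicle miles times average occupancy. The sketch below assumes hypothetical fuel, mileage, and occupancy figures, and the emission factors are illustrative placeholders rather than authoritative values.

```python
from dataclasses import dataclass


@dataclass
class ModeData:
    fuel_gallons: float                    # fuel burned over the study period
    emission_factor_kg_per_gallon: float   # placeholder factor, kg CO2 per gallon
    vehicle_miles: float                   # distance travelled by the vehicles
    avg_occupancy: float                   # average passengers per vehicle


def total_emissions_kg(mode):
    """Total CO2 for a mode: fuel consumed times its emission factor."""
    return mode.fuel_gallons * mode.emission_factor_kg_per_gallon


def emissions_per_passenger_mile(mode):
    """Divide total emissions by passenger miles (vehicle miles times occupancy)."""
    passenger_miles = mode.vehicle_miles * mode.avg_occupancy
    return total_emissions_kg(mode) / passenger_miles


# Hypothetical inputs for three transportation modes.
modes = {
    "car": ModeData(40.0, 8.9, 1000.0, 1.5),
    "bus": ModeData(250.0, 10.2, 1000.0, 20.0),
    "bicycle": ModeData(0.0, 0.0, 1000.0, 1.0),
}

per_passenger_mile = {name: emissions_per_passenger_mile(m) for name, m in modes.items()}
print(per_passenger_mile)
```

Note how the bus burns more fuel in total than the car yet scores lower per passenger mile, which is why occupancy rates belong in the comparison.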

Technology Industry - Feature Comparison for Software Development Tools

Objective: A software development team needs to choose the most suitable development tool for an upcoming project.

  • Create a list of essential features and capabilities required for the project.
  • Research and compile information on available development tools in the market.
  • Develop a comparative matrix or scoring system to evaluate each tool's features against the project requirements.
  • Assign weights to features based on their importance to the project.

Outcome: The comparative analysis highlights that Tool A excels in essential features critical to the project, such as version control integration and debugging capabilities. The development team selects Tool A as the preferred choice for the project.

Educational Research - Comparative Study of Teaching Methods

Objective: A school district aims to improve student performance by comparing the effectiveness of traditional classroom teaching with online learning.

  • Randomly assign students to two groups: one taught using traditional methods and the other through online courses.
  • Administer pre- and post-course assessments to measure knowledge gain.
  • Collect feedback from students and teachers on the learning experiences.
  • Analyze assessment scores and feedback to compare the effectiveness and satisfaction levels of both teaching methods.

Outcome: The comparative analysis reveals that online learning leads to knowledge gains similar to those of traditional classroom teaching. However, students report higher satisfaction and flexibility with the online approach. The school district considers incorporating online elements into its curriculum.
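Knowledge gain in a pre- and post-assessment design is usually summarized as the average per-student improvement in each group. A minimal sketch, with hypothetical scores on a 0-100 scale:

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average per-student improvement from pre- to post-course assessment."""
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


# Hypothetical assessment scores; each position is the same student before and after.
classroom_pre = [52, 60, 47, 71, 65]
classroom_post = [68, 74, 63, 82, 78]
online_pre = [55, 58, 49, 69, 63]
online_post = [70, 73, 64, 83, 77]

classroom_gain = mean_gain(classroom_pre, classroom_post)
online_gain = mean_gain(online_pre, online_post)
gain_difference = online_gain - classroom_gain
print(classroom_gain, online_gain, gain_difference)
```

A small difference in mean gain, as here, is consistent with the "similar knowledge gains" outcome, though a real study would also test whether the difference exceeds what chance alone would produce.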

These examples illustrate the diverse applications of comparative analysis across industries and research domains. Whether optimizing pricing strategies in retail, evaluating treatment effectiveness in healthcare, assessing environmental impacts, choosing the right software tool, or improving educational methods, comparative analysis empowers decision-makers with valuable insights for informed choices and positive outcomes.

Conclusion for Comparative Analysis

Comparative analysis is your compass in the world of decision-making. It helps you see the bigger picture, spot opportunities, and navigate challenges. By defining your objectives, gathering data, applying methodologies, and following best practices, you can use comparative analysis to make informed choices and drive positive outcomes.

Remember, comparative analysis is not just a tool; it's a mindset that helps you turn data into insights and uncertainty into clarity. Whether you're steering a business, conducting research, or facing life's choices, treat comparative analysis as a trusted guide to better decisions.

  • Published: 07 May 2021

The use of Qualitative Comparative Analysis (QCA) to address causality in complex systems: a systematic review of research on public health interventions

  • Benjamin Hanckel,
  • Mark Petticrew,
  • James Thomas &
  • Judith Green

BMC Public Health volume 21, Article number: 877 (2021)

Qualitative Comparative Analysis (QCA) is a method for identifying the configurations of conditions that lead to specific outcomes. Given its potential for providing evidence of causality in complex systems, QCA is increasingly used in evaluative research to examine the uptake or impacts of public health interventions. We map this emerging field, assessing the strengths and weaknesses of QCA approaches identified in published studies, and identify implications for future research and reporting.

PubMed, Scopus and Web of Science were systematically searched for peer-reviewed studies published in English up to December 2019 that had used QCA methods to identify the conditions associated with the uptake and/or effectiveness of interventions for public health. Data relating to the interventions studied (settings/level of intervention/populations), methods (type of QCA, case level, source of data, other methods used) and reported strengths and weaknesses of QCA were extracted and synthesised narratively.

The search identified 1384 papers, of which 27 (describing 26 studies) met the inclusion criteria. Interventions evaluated ranged across: nutrition/obesity ( n  = 8); physical activity ( n  = 4); health inequalities ( n  = 3); mental health ( n  = 2); community engagement ( n  = 3); chronic condition management ( n  = 3); vaccine adoption or implementation ( n  = 2); programme implementation ( n  = 3); breastfeeding ( n  = 2), and general population health ( n  = 1). The majority of studies ( n  = 24) were of interventions solely or predominantly in high income countries. Key strengths reported were that QCA provides a method for addressing causal complexity; and that it provides a systematic approach for understanding the mechanisms at work in implementation across contexts. Weaknesses reported related to data availability limitations, especially on ineffective interventions. The majority of papers demonstrated good knowledge of cases, and justification of case selection, but other criteria of methodological quality were less comprehensively met.

QCA is a promising approach for addressing the role of context in complex interventions, and for identifying causal configurations of conditions that predict implementation and/or outcomes when there is sufficiently detailed understanding of a series of comparable cases. As the use of QCA in evaluative health research increases, there may be a need to develop advice for public health researchers and journals on minimum criteria for quality and reporting.

Interest in the use of Qualitative Comparative Analysis (QCA) arises in part from growing recognition of the need to broaden methodological capacity to address causality in complex systems [ 1 , 2 , 3 ]. Guidance for researchers for evaluating complex interventions suggests process evaluations [ 4 , 5 ] can provide evidence on the mechanisms of change, and the ways in which context affects outcomes. However, this does not address the more fundamental problems with trial and quasi-experimental designs arising from system complexity [ 6 ]. As Byrne notes, the key characteristic of complex systems is ‘emergence’ [ 7 ]: that is, effects may accrue from combinations of components, in contingent ways, which cannot be reduced to any one level. Asking about ‘what works’ in complex systems is not to ask a simple question about whether an intervention has particular effects, but rather to ask: “how the intervention works in relation to all existing components of the system and to other systems and their sub-systems that intersect with the system of interest” [ 7 ]. Public health interventions are typically attempts to effect change in systems that are themselves dynamic; approaches to evaluation are needed that can deal with emergence [ 8 ]. In short, understanding the uptake and impact of interventions requires methods that can account for the complex interplay of intervention conditions and system contexts.

To build a useful evidence base for public health, evaluations thus need to assess not just whether a particular intervention (or component) causes specific change in one variable, in controlled circumstances, but whether those interventions shift systems, and how specific conditions of interventions and setting contexts interact to lead to anticipated outcomes. There have been a number of calls for the development of methods in intervention research to address these issues of complex causation [ 9 , 10 , 11 ], including calls for the greater use of case studies to provide evidence on the important elements of context [ 12 , 13 ]. One approach for addressing causality in complex systems is Qualitative Comparative Analysis (QCA): a systematic way of comparing the outcomes of different combinations of system components and elements of context (‘conditions’) across a series of cases.

The potential of qualitative comparative analysis

QCA is an approach developed by Charles Ragin [ 14 , 15 ], originating in comparative politics and macrosociology to address questions of comparative historical development. Using set theory, QCA methods explore the relationships between ‘conditions’ and ‘outcomes’ by identifying configurations of necessary and sufficient conditions for an outcome. The underlying logic is different from probabilistic reasoning, as the causal relationships identified are not inferred from the (statistical) likelihood of them being found by chance, but rather from comparing sets of conditions and their relationship to outcomes. It is thus more akin to the generative conceptualisations of causality in realist evaluation approaches [ 16 ]. QCA is a non-additive and non-linear method that emphasises diversity, acknowledging that different paths can lead to the same outcome. For evaluative research in complex systems [ 17 ], QCA therefore offers a number of benefits, including: that QCA can identify more than one causal pathway to an outcome (equifinality); that it accounts for conjunctural causation (where the presence or absence of conditions in relation to other conditions might be key); and that it is asymmetric with respect to the success or failure of outcomes. That is, that specific factors explain success does not imply that their absence leads to failure (causal asymmetry).

QCA was designed, and is typically used, to compare data from a medium N (10–50) series of cases that include those with and those without the (dichotomised) outcome. Conditions can be dichotomised in ‘crisp sets’ (csQCA) or represented in ‘fuzzy sets’ (fsQCA), where set membership is calibrated (either continuously or with cut offs) between two extremes representing fully in (1) or fully out (0) of the set. A third version, multi-value QCA (mvQCA), infrequently used, represents conditions as ‘multi-value sets’, with multinomial membership [ 18 ]. In calibrating set membership, the researcher specifies the critical qualitative anchors that capture differences in kind (full membership and full non-membership), as well as differences in degree in fuzzy sets (partial membership) [ 15 , 19 ]. Data on outcomes and conditions can come from primary or secondary qualitative and/or quantitative sources. Once data are assembled and coded, truth tables are constructed which “list the logically possible combinations of causal conditions” [ 15 ], collating the number of cases where those configurations occur to see if they share the same outcome. Analysis of these truth tables assesses first whether any conditions are individually necessary or sufficient to predict the outcome, and then whether any configurations of conditions are necessary or sufficient. Necessary conditions are assessed by examining causal conditions shared by cases with the same outcome, whilst identifying sufficient conditions (or combinations of conditions) requires examining cases with the same causal conditions to identify if they have the same outcome [ 15 ]. However, as Legewie argues, the presence of a condition, or a combination of conditions in actual datasets, are likely to be “‘quasi-necessary’ or ‘quasi-sufficient’ in that the causal relation holds in a great majority of cases, but some cases deviate from this pattern” [ 20 ]. 
Following reduction of the complexity of the model, the final model is tested for coverage (the degree to which a configuration accounts for instances of an outcome in the empirical cases; the proportion of cases belonging to a particular configuration) and consistency (the degree to which the cases sharing a combination of conditions align with a proposed subset relation). The result is an analysis of complex causation, “defined as a situation in which an outcome may follow from several different combinations of causal conditions” [ 15 ] illuminating the ‘causal recipes’, the causally relevant conditions or configuration of conditions that produce the outcome of interest.
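The truth table, consistency, and coverage ideas described above can be made concrete with a small crisp-set sketch. The cases, condition names, and outcome below are hypothetical. The formulas used, consistency as the sum of min(x, y) divided by the sum of x and coverage as the same numerator divided by the sum of y, are the commonly used set-theoretic measures for sufficiency and are stated here as an assumption, not taken from the review itself. The sketch also omits the logical minimization step that dedicated QCA software performs.

```python
from itertools import product


def build_truth_table(cases, conditions, outcome):
    """List every logically possible configuration with its observed case counts."""
    table = {
        config: {"n_cases": 0, "n_outcome": 0, "consistency": None}
        for config in product((0, 1), repeat=len(conditions))
    }
    for case in cases:
        row = table[tuple(case[c] for c in conditions)]
        row["n_cases"] += 1
        row["n_outcome"] += case[outcome]
    for row in table.values():
        if row["n_cases"]:  # rows with no cases stay as unobserved remainders
            row["consistency"] = row["n_outcome"] / row["n_cases"]
    return table


def consistency(x, y):
    """Degree to which membership in configuration x is a subset of outcome y."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)


def coverage(x, y):
    """Share of outcome y that configuration x accounts for."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(y)


# Hypothetical cases: did funding and community engagement lead to uptake?
cases = [
    {"funding": 1, "engagement": 1, "uptake": 1},
    {"funding": 1, "engagement": 1, "uptake": 1},
    {"funding": 1, "engagement": 0, "uptake": 0},
    {"funding": 0, "engagement": 1, "uptake": 1},
    {"funding": 1, "engagement": 0, "uptake": 0},
    {"funding": 1, "engagement": 1, "uptake": 0},
]

truth_table = build_truth_table(cases, ["funding", "engagement"], "uptake")
outcome = [c["uptake"] for c in cases]
both = [min(c["funding"], c["engagement"]) for c in cases]  # set intersection (AND)
recipe_consistency = consistency(both, outcome)
recipe_coverage = coverage(both, outcome)
print(truth_table[(1, 1)], recipe_consistency, recipe_coverage)
```

In this toy data the configuration with both conditions present contains one case without the outcome, so its consistency falls below 1. That is the ‘quasi-sufficient’ pattern the paragraph above describes. Because the functions take the minimum of membership scores, they also accept fuzzy-set values between 0 and 1.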

QCA, then, has promise for addressing questions of complex causation, and recent calls for the greater use of QCA methods have come from a range of fields related to public health, including health research [ 17 ], studies of social interventions [ 7 ], and policy evaluation [ 21 , 22 ]. In making arguments for the use of QCA across these fields, researchers have also indicated some of the considerations that must be taken into account to ensure robust and credible analyses. There is a need, for instance, to ensure that ‘contradictions’, where cases with the same configurations show different outcomes, are resolved and reported [ 15 , 23 , 24 ]. Additionally, researchers must consider the ratio of cases to conditions, and limit the number of conditions relative to the number of cases to ensure the validity of models [ 25 ]. Marx and Dusa, examining crisp set QCA, have provided some guidance to the ‘ceiling’ number of conditions which can be included relative to the number of cases to increase the probability of models being valid (that is, with a low probability of being generated through random data) [ 26 ].

There is now a growing body of published research in public health and related fields drawing on QCA methods. This is therefore a timely point to map the field and assess the potential of QCA as a method for contributing to the evidence base for what works in improving public health. To inform future methodological development of robust methods for addressing complexity in the evaluation of public health interventions, we undertook a systematic review to map existing evidence, identify gaps in, and strengths and weaknesses of, the QCA literature to date, and identify the implications of these for conducting and reporting future QCA studies for public health evaluation. We aimed to address the following specific questions [ 27 ]:

1. How is QCA used for public health evaluation? What populations, settings, methods used in source case studies, unit/s and level of analysis (‘cases’), and ‘conditions’ have been included in QCA studies?

2. What strengths and weaknesses have been identified by researchers who have used QCA to understand complex causation in public health evaluation research?

3. What are the existing gaps in, and strengths and weaknesses of, the QCA literature in public health evaluation, and what implications do these have for future research and reporting of QCA studies for public health?

This systematic review was registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 29 April 2019 ( CRD42019131910 ). A protocol was prepared in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols (PRISMA-P) 2015 statement [ 28 ], and published in 2019 [ 27 ], where the methods are explained in detail. EPPI-Reviewer 4 was used to manage the process and undertake screening of abstracts [ 29 ].

Search strategy

We searched for peer-reviewed published papers in English, which used QCA methods to examine causal complexity in evaluating the implementation, uptake and/or effects of a public health intervention, in any region of the world, for any population. ‘Public health interventions’ were defined as those which aim to promote or protect health, or prevent ill health, in the population. No date exclusions were made, and papers published up to December 2019 were included.

Search strategies used the following phrases “Qualitative Comparative Analysis” and “QCA”, which were combined with the keywords “health”, “public health”, “intervention”, and “wellbeing”. See Additional file  1 for an example. Searches were undertaken on the following databases: PubMed, Web of Science, and Scopus. Additional searches were undertaken on Microsoft Academic and Google Scholar in December 2019, where the first pages of results were checked for studies that may have been missed in the initial search. No additional studies were identified. The list of included studies was sent to experts in QCA methods in health and related fields, including authors of included studies and/or those who had published on QCA methodology. This generated no additional studies within scope, but a suggestion to check the COMPASSS (Comparative Methods for Systematic Cross-Case Analysis) database; this was searched, identifying one further study that met the inclusion criteria [ 30 ]. COMPASSS ( https://compasss.org/ ) collates publications of studies using comparative case analysis.

We excluded studies where no intervention was evaluated, which included studies that used QCA to examine public health infrastructure (e.g. staff training) without a specific health outcome, and papers that report on prevalence of health issues (e.g. prevalence of child mortality). We also excluded studies of health systems or services interventions where there was no public health outcome.

After retrieval, and removal of duplicates, titles and abstracts were screened by one of two authors (BH or JG). Double screening of all records was assisted by EPPI Reviewer 4’s machine learning function. Of the 1384 papers identified after duplicates were removed, we excluded 820 after review of titles and abstracts (Fig.  1 ). The excluded studies included: a large number of papers relating to ‘quantitative coronary angioplasty’ and some which referred to the Queensland Criminal Code (both of which are also abbreviated to ‘QCA’); papers that reported methodological issues but not empirical studies; protocols; and papers that used the phrase ‘qualitative comparative analysis’ to refer to qualitative studies that compared different sub-populations or cases within the study, but did not include formal QCA methods.

Fig. 1 Flow diagram

Full texts of the 51 remaining studies were screened by BH and JG for inclusion, with 10 papers double coded by both authors, with complete agreement. Uncertain inclusions were checked by the third author (MP). Of the full texts, 24 were excluded because: they did not report a public health intervention ( n  = 18); had used a methodology inspired by QCA, but had not undertaken a QCA ( n  = 2); were protocols or methodological papers only ( n  = 2); or were not published in peer-reviewed journals ( n  = 2) (see Fig.  1 ).
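The full-text screening counts reported above can be checked with a few lines of arithmetic. This is a minimal sketch: the numbers are those given in the text, while the variable and function names are illustrative.

```python
# Counts taken from the full-text screening stage described in the text.
FULL_TEXTS_SCREENED = 51
EXCLUSIONS = {
    "did not report a public health intervention": 18,
    "QCA-inspired methodology, but no QCA undertaken": 2,
    "protocol or methodological paper only": 2,
    "not published in a peer-reviewed journal": 2,
}


def remaining_after_exclusions(screened, exclusions):
    """Number of papers left once every exclusion reason is applied."""
    return screened - sum(exclusions.values())


excluded_total = sum(EXCLUSIONS.values())
included_papers = remaining_after_exclusions(FULL_TEXTS_SCREENED, EXCLUSIONS)

print(f"excluded at full text: {excluded_total}")
print(f"papers retained: {included_papers}")
```

The four exclusion reasons sum to the 24 exclusions reported, leaving 27 papers for data extraction.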

Data were extracted manually from the 27 remaining full texts by BH and JG. Two papers relating to the same research question and dataset were combined, such that analysis was by study ( n  = 26) not by paper. We retrieved data relating to: publication (journal, first author country affiliation, funding reported); the study setting (country/region setting, population targeted by the intervention(s)); intervention(s) studied; methods (aims, rationale for using QCA, crisp or fuzzy set QCA, other analysis methods used); data sources drawn on for cases (source [primary data, secondary data, published analyses], qualitative/quantitative data, level of analysis, number of cases, final causal conditions included in the analysis); outcome explained; and claims made about strengths and weaknesses of using QCA (see Table  1 ). Data were synthesised narratively, using thematic synthesis methods [ 31 , 32 ], with interventions categorised by public health domain and level of intervention.

Quality assessment

There are no reporting guidelines for QCA studies in public health, but there are a number of discussions of best practice in the methodological literature [ 25 , 26 , 33 , 34 ]. These discussions suggest several criteria for strengthening QCA methods that we used as indicators of methodological and/or reporting quality: evidence of familiarity of cases; justification for selection of cases; discussion and justification of set membership score calibration; reporting of truth tables; reporting and justification of solution formula; and reporting of consistency and coverage measures. For studies using csQCA, and claiming an explanatory analysis, we additionally identified whether the number of cases was sufficient for the number of conditions included in the model, using a pragmatic cut-off in line with Marx & Dusa’s guideline thresholds, which indicate how many cases are sufficient for given numbers of conditions to reject a 10% probability that models could be generated with random data [ 26 ].

Overview of scope of QCA research in public health

Twenty-seven papers reporting 26 studies were included in the review (Table  1 ). The earliest was published in 2005, and 17 were published after 2015. The majority ( n  = 19) were published in public health/health promotion journals, with the remainder published in other health science ( n  = 3) or in social science/management journals ( n  = 4). The public health domain(s) addressed by each study were broadly coded by the main area of focus. They included nutrition/obesity ( n  = 8); physical activity (PA) (n = 4); health inequalities ( n  = 3); mental health ( n  = 2); community engagement ( n  = 3); chronic condition management ( n  = 3); vaccine adoption or implementation (n = 2); programme implementation ( n  = 3); breastfeeding ( n  = 2); or general population health ( n  = 1). The majority ( n  = 24) of studies were conducted solely or predominantly in high-income countries (systematic reviews in general searched global sources, but commented that the overwhelming majority of studies were from high-income countries). Country settings included: any ( n  = 6); OECD countries ( n  = 3); USA ( n  = 6); UK ( n  = 6) and one each from Nepal, Austria, Belgium, Netherlands and Africa. These largely reflected the first author’s country affiliations in the UK ( n  = 13); USA ( n  = 9); and one each from South Africa, Austria, Belgium, and the Netherlands. All three studies primarily addressing health inequalities [ 35 , 36 , 37 ] were from the UK.

Eight of the interventions evaluated were individual-level behaviour change interventions (e.g. weight management interventions, case management, self-management for chronic conditions); eight evaluated policy/funding interventions; five explored settings-based health promotion/behaviour change interventions (e.g. schools-based physical activity intervention, store-based food choice interventions); three evaluated community empowerment/engagement interventions, and two studies evaluated networks and their impact on health outcomes.

Methods and data sets used

Fifteen studies used crisp sets (csQCA), 11 used fuzzy sets (fsQCA). No study used mvQCA. Eleven studies included additional analyses of the datasets drawn on for the QCA, including six that used qualitative approaches (narrative synthesis, case comparisons), typically to identify cases or conditions for populating the QCA; and four reporting additional statistical analyses (meta-regression, linear regression) to either identify differences overall between cases prior to conducting a QCA (e.g. [ 38 ]) or to explore correlations in more detail (e.g. [ 39 ]). One study used an additional Boolean configurational technique to reduce the number of conditions in the QCA analysis [ 40 ]. No studies reported aiming to compare the findings from the QCA with those from other techniques for evaluating the uptake or effectiveness of interventions, although some [ 41 , 42 ] were explicitly using the study to showcase the possibilities of QCA compared with other approaches in general. Twelve studies drew on primary data collected specifically for the study, with five of those additionally drawing on secondary data sets; five drew only on secondary data sets, and nine used data from systematic reviews of published research. Seven studies drew primarily on qualitative data, generally derived from interviews or observations.
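The difference between crisp and fuzzy sets can be illustrated with a minimal calibration sketch. It assumes a hypothetical condition ("high programme uptake", measured as a percentage) and hypothetical anchors. The fuzzy function follows the widely used logistic transformation anchored at log-odds of -3, 0 and +3, which is offered here as one common choice and not as the calibration method of any included study.

```python
import math


def crisp_membership(value, threshold):
    """csQCA: a case is either in the set (1) or out of it (0)."""
    return 1 if value >= threshold else 0


def fuzzy_membership(value, full_out, crossover, full_in):
    """fsQCA: degree of membership in [0, 1].

    Logistic transform with three qualitative anchors: full non-membership
    (log-odds -3), the crossover point (log-odds 0) and full membership
    (log-odds +3).
    """
    if value >= crossover:
        log_odds = 3.0 * (value - crossover) / (full_in - crossover)
    else:
        log_odds = 3.0 * (value - crossover) / (crossover - full_out)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical uptake percentages for four cases (illustrative values only).
uptake = {"site_a": 20, "site_b": 50, "site_c": 65, "site_d": 90}

for site, value in uptake.items():
    crisp = crisp_membership(value, threshold=50)
    fuzzy = fuzzy_membership(value, full_out=20, crossover=50, full_in=80)
    print(f"{site}: crisp={crisp}, fuzzy={fuzzy:.2f}")
```

The crisp score discards the distance from the threshold, whereas the fuzzy score retains it. The choice of anchors is a substantive decision that needs to be justified from case knowledge or theory.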

Many studies were undertaken in the context of one or more trials, which provided evidence of effect. Within single trials, this was generally for a process evaluation, with cases being trial sites. Fernald et al’s study, for instance, was in the context of a trial of a programme to support primary care teams in identifying and implementing self-management support tools for their patients, which measured patient and health care provider level outcomes [ 43 ]. The QCA reported here used qualitative data from the trial to identify a set of necessary conditions for health care provider practices to implement the tools successfully. In studies drawing on data from systematic reviews, cases were always at the level of intervention or intervention component, with data included from multiple trials. Harris et al., for instance, undertook a mixed-methods systematic review of school-based self-management interventions for asthma, using meta-analysis methods to identify effective interventions and QCA methods to identify which intervention features were aligned with success [ 44 ].

The largest number of studies ( n  = 10), including all the systematic reviews, analysed cases at the level of the intervention, or a component of the intervention; seven analysed organisational level cases (e.g. school class, network, primary care practice); five analysed sub-national region level cases (e.g. state, local authority area), and two each analysed country or individual level cases. Sample sizes ranged from 10 to 131, with no study having small N (< 10) sample sizes, four having large N (> 50) sample sizes, and the majority (22) being medium N studies (in the range 10–50).

Rationale for using QCA

Most papers reported a rationale for using QCA that mentioned ‘complexity’ or ‘context’, including: noting that QCA is appropriate for addressing causal complexity or multiple pathways to outcome [ 37 , 43 , 45 , 46 , 47 , 48 , 49 , 50 , 51 ]; noting the appropriateness of the method for providing evidence on how context impacts on interventions [ 41 , 50 ]; or the need for a method that addressed causal asymmetry [ 52 ]. Three stated that the QCA was an ‘exploratory’ analysis [ 53 , 54 , 55 ]. In addition to the empirical aims, several papers (e.g. [ 42 , 48 ]) sought to demonstrate the utility of QCA, or to develop QCA methods for health research (e.g. [ 47 ]).

Reported strengths and weaknesses of approach

There was general agreement about the strengths of QCA: specifically, that it was a useful tool to address complex causality, providing a systematic approach to understand the mechanisms at work in implementation across contexts [ 38 , 39 , 43 , 45 , 46 , 47 , 55 , 56 , 57 ], particularly as they relate to (in)effective intervention implementation [ 44 , 51 ] and the evaluation of interventions [ 58 ], or “where it is not possible to identify linearity between variables of interest and outcomes” [ 49 ]. Authors highlighted the strengths of QCA as providing possibilities for examining complex policy problems [ 37 , 59 ]; for testing existing as well as new theory [ 52 ]; and for identifying aspects of interventions which had not been previously perceived as critical [ 41 ] or which may have been missed when drawing on statistical methods that use, for instance, linear additive models [ 42 ]. The strengths of QCA in terms of providing useful evidence for policy were flagged in a number of studies, particularly where the causal recipes suggested that conventional assumptions about effectiveness were not confirmed. Blackman et al., for instance, in a series of studies exploring why unequal health outcomes had narrowed in some areas of the UK and not others, identified poorer outcomes in settings with ‘better’ contracting [ 35 , 36 , 37 ]; Harting found, contrary to theoretical assumptions about the necessary conditions for successful implementation of public health interventions, that a multisectoral network was not a necessary condition [ 30 ].

Weaknesses reported included the limitations of QCA in general for addressing complexity, as well as specific limitations with either the csQCA or the fsQCA methods employed. One general concern discussed across a number of studies was the problem of limited empirical diversity, which resulted in: limitations in the possible number of conditions included in each study, particularly with small N studies [ 58 ]; missing data on important conditions [ 43 ]; or limited reported diversity (where, for instance, data were drawn from systematic reviews, reflecting publication biases which limit reporting of ineffective interventions) [ 41 ]. Reported methodological limitations in small and intermediate N studies included concerns about the potential that case selection could bias findings [ 37 ].

In terms of potential for addressing causal complexity, the limitations of QCA for identifying unintended consequences, tipping points, and/or feedback loops in complex adaptive systems were noted [ 60 ], as were the potential limitations (especially in csQCA studies) of reducing complex conditions, drawn from detailed qualitative understanding, to binary conditions [ 35 ]. The impossibility of doing this was a rationale for using fsQCA in one study [ 57 ], where detailed knowledge of conditions is needed to make theoretically justified calibration decisions. However, others [ 47 ] make the case that csQCA provides more appropriate findings for policy: dichotomisation forces a focus on meaningful distinctions, including those related to decisions that practitioners/policy makers can action. There is, then, a potential trade-off in providing ‘interpretable results’, but ones which preclude potential for utilising more detailed information [ 45 ]. That QCA does not deal with probabilistic causation was noted [ 47 ].

Quality of published studies

Assessment of ‘familiarity with cases’ was made subjectively on the basis of study authors’ reports of their knowledge of the settings (empirical or theoretical) and the descriptions they provided in the published paper: overall, 14 were judged as sufficient, and 12 less than sufficient. Studies which included primary data were more likely to be judged as demonstrating familiarity ( n  = 10) than those drawing on secondary sources or systematic reviews, of which only two were judged as demonstrating familiarity. All studies justified how the selection of cases had been made; for those not using the full available population of cases, this was in general (appropriately) done theoretically: following previous research [ 52 ]; purposively to include a range of positive and negative outcomes [ 41 ]; or to include a diversity of cases [ 58 ]. In identifying conditions leading to effective/not effective interventions, one purposive strategy was to include a specified percentage or number of the most effective and least effective interventions (e.g. [ 36 , 40 , 51 , 52 ]). Discussion of calibration of set membership scores was judged adequate in 15 cases, and inadequate in 11; 10 reported raw data matrices in the paper or supplementary material; 21 reported truth tables in the paper or supplementary material. The majority ( n  = 21) reported at least some detail on the coverage (the number of cases with a particular configuration) and consistency (the percentage of similar causal configurations which result in the same outcome). The majority ( n  = 21) included truth tables (or explicitly provided details of how to obtain them); fewer ( n  = 10) included raw data. Only five studies met all six of these quality criteria (evidence of familiarity with cases, justification of case selection, discussion of calibration, reporting truth tables, reporting raw data matrices, reporting coverage and consistency); a further six met at least five of them.
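To make the reporting items concrete, the following minimal sketch builds a crisp-set truth table from a hypothetical data matrix and computes consistency and coverage for each configuration. The data and the `truth_table` helper are invented for illustration. Consistency is taken as the share of cases with a configuration that show the outcome, and coverage as the share of all outcome cases accounted for by that configuration. These are the usual crisp-set definitions, assumed here, and are somewhat more specific than the glosses given in the text.

```python
from collections import defaultdict

# Hypothetical crisp-set data matrix: (condition A, condition B, outcome).
CASES = {
    "site1": (1, 1, 1),
    "site2": (1, 1, 1),
    "site3": (1, 0, 0),
    "site4": (0, 1, 1),
    "site5": (0, 0, 0),
    "site6": (1, 1, 0),
}


def truth_table(cases):
    """Group cases by configuration and summarise each truth table row."""
    rows = defaultdict(lambda: {"n": 0, "with_outcome": 0})
    for *conditions, outcome in cases.values():
        row = rows[tuple(conditions)]
        row["n"] += 1
        row["with_outcome"] += outcome

    total_with_outcome = sum(outcome for *_, outcome in cases.values())
    table = {}
    for config, row in sorted(rows.items(), reverse=True):
        table[config] = {
            "n": row["n"],
            # Share of cases with this configuration that show the outcome.
            "consistency": row["with_outcome"] / row["n"],
            # Share of all outcome cases that have this configuration.
            "coverage": (row["with_outcome"] / total_with_outcome
                         if total_with_outcome else 0.0),
        }
    return table


table = truth_table(CASES)
for config, stats in table.items():
    print(config, stats)
```

Reporting the raw data matrix alongside the truth table allows readers to reproduce each row and to check how contradictory configurations, such as the first row in this example, were handled.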

Of the csQCA studies which were not reporting an exploratory analysis, four appeared to have insufficient cases for the large number of conditions entered into at least one of the models reported, with a consequent risk to the validity of the QCA models [ 26 ].

QCA has been widely used in public health research over the last decade to advance understanding of causal inference in complex systems. In this review of published evidence to date, we have identified studies using QCA to examine the configurations of conditions that lead to particular outcomes across contexts. As noted by most study authors, QCA methods have promised advantages over probabilistic statistical techniques for examining causation where systems and/or interventions are complex, providing public health researchers with a method to test the multiple pathways (configurations of conditions), and necessary and sufficient conditions that lead to desired health outcomes.

The origins of QCA approaches are in comparative policy studies. Rihoux et al’s review of peer-reviewed journal articles using QCA methods published up to 2011 found the majority of published examples were from political science and sociology, with fewer than 5% of the 313 studies they identified coming from health sciences [ 61 ]. They also reported few examples of the method being used in policy evaluation and implementation studies [ 62 ]. In the decade since their review of the field [ 61 ], there has been an emerging body of evaluative work in health: we identified 26 studies in the field of public health alone, with the majority published in public health journals. Across these studies, QCA has been used for evaluative questions in a range of settings and public health domains to identify the conditions under which interventions are implemented and/or have evidence of effect for improving population health. All studies included a series of cases that included some with and some without the outcome of interest (such as behaviour change, successful programme implementation, or good vaccination uptake). The dominance of high-income countries in both intervention settings and author affiliations is disappointing, but reflects the disproportionate location of public health research in the global north more generally [ 63 ].

The largest single group of studies included were systematic reviews, using QCA to compare interventions (or intervention components) to identify successful (and non-successful) configurations of conditions across contexts. Here, the value of QCA lies in its potential for synthesis with quantitative meta-synthesis methods to identify the particular conditions or contexts in which interventions or components are effective. As Parrott et al. note, for instance, their meta-analysis could identify probabilistic effects of weight management programmes, and the QCA analysis enabled them to address the “role that the context of the [paediatric weight management] intervention has in influencing how, when, and for whom an intervention mix will be successful” [ 50 ]. However, using QCA to identify configurations of conditions that lead to effective or non-effective interventions across particular areas of population health is an application that does move away in some significant respects from the origins of the method. First, researchers drawing on evidence from systematic reviews for their data are reliant largely on published evidence for information on conditions (such as the organisational contexts in which interventions were implemented, or the types of behaviour change theory utilised). Although guidance for describing interventions [ 64 ] advises key aspects of context are included in reports, this may not include data on the full range of conditions that might be causally important, and review research teams may have limited knowledge of these ‘cases’ themselves. Second, less successful interventions are less likely to be published, potentially limiting the diversity of cases, particularly of cases with unsuccessful outcomes. A strength of QCA is the separate analysis of conditions leading to positive and negative outcomes: this is precluded where there is insufficient evidence on negative outcomes [ 50 ].
Third, when including a range of types of intervention, it can be unclear whether the cases included are truly comparable. A QCA study requires a high degree of theoretical and pragmatic case knowledge on the part of the researcher to calibrate conditions to qualitative anchors: it is reliant on deep understanding of complex contexts, and a familiarity with how conditions interact within and across contexts. Perhaps surprising is that only seven of the studies included here clearly drew on qualitative data, given that QCA is primarily seen as a method that requires thick, detailed knowledge of cases, particularly when the aim is to understand complex causation [ 8 ]. Whilst research teams conducting QCA in the context of systematic reviews may have detailed understanding in general of interventions within their spheres of expertise, they are unlikely to have this for the whole range of cases, particularly where a diverse set of contexts (countries, organisational settings) are included. Making a theoretical case for the valid comparability of such a case series is crucial. There may, then, be limitations in the portability of QCA methods for conducting studies entirely reliant on data from published evidence.

QCA was developed for small and medium N series of cases, and (as in the field more broadly, [ 61 ]), the samples in our studies predominantly had between 10 and 50 cases. However, there is increasing interest in the method as an alternative or complementary technique to regression-oriented statistical methods for larger samples [ 65 ], such as from surveys, where detailed knowledge of cases is likely to be replaced by theoretical knowledge of relationships between conditions (see [ 23 ]). The two larger N (> 100 cases) studies in our sample were an individual level analysis of survey data [ 46 , 47 ] and an analysis of intervention arms from a systematic review [ 50 ]. Larger sample sizes allow more conditions to be included in the analysis [ 23 , 26 ], although for evaluative research, where the aim is developing a causal explanation, rather than simply exploring patterns, there remains a limit to the number of conditions that can be included. As the number of conditions included increases, so too does the number of possible configurations, increasing the chance of unique combinations and of generating spurious solutions with a high level of consistency. As a rule of thumb, once the number of conditions exceeds 6–8 (with up to 50 cases) or 10 (for larger samples), the credibility of solutions may be severely compromised [ 23 ].
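The growth in possible configurations can be made explicit with a short calculation. This sketch assumes each condition contributes two states to the truth table, so that k conditions give 2^k logically possible rows; the function names are illustrative.

```python
def configurations(n_conditions):
    """Number of logically possible truth table rows for dichotomous conditions."""
    return 2 ** n_conditions


def max_share_of_rows_observed(n_conditions, n_cases):
    """Upper bound on the share of rows that n_cases could populate.

    The bound is reached only if every case has a different configuration.
    """
    return min(1.0, n_cases / configurations(n_conditions))


N_CASES = 50  # upper end of the medium-N range discussed in the text
for k in (4, 6, 8, 10):
    rows = configurations(k)
    share = max_share_of_rows_observed(k, N_CASES)
    print(f"{k} conditions: {rows} rows, at most {share:.1%} observable")
```

With 50 cases, eight conditions already imply 256 possible rows, so at most about a fifth of the truth table can be populated and most configurations remain logical remainders.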

Strengths and weaknesses of the study

A systematic review has the potential advantages of transparency and rigour and, if not exhaustive, our search is likely to be representative of the body of research using QCA for evaluative public health research up to 2020. However, a limitation is the inevitable difficulty in operationalising a ‘public health’ intervention. Exclusions on scope are not straightforward, given that most social, environmental and political conditions impact on public health, and arguably a greater range of policy and social interventions (such as fiscal or trade policies) that have been the subject of QCA analyses could have been included, or a greater range of more clinical interventions. However, to enable a manageable number of papers to review, and restrict our focus to those papers that were most directly applicable to (and likely to be read by) those in public health policy and practice, we operationalised ‘public health interventions’ as those which were likely to be directly impacting on population health outcomes, or on behaviours (such as increased physical activity) where there was good evidence for causal relationships with public health outcomes, and where the primary research question of the study examined the conditions leading to those outcomes. This review has, of necessity, therefore excluded a considerable body of evidence likely to be useful for public health practice in terms of planning interventions, such as studies on how to better target smoking cessation [ 66 ] or foster social networks [ 67 ] where the primary research question was on conditions leading to these outcomes, rather than on conditions for outcomes of specific interventions. 
Similarly, there is a growing number of descriptive epidemiological studies using QCA to explore factors predicting outcomes across such diverse areas as lupus and quality of life [ 68 ]; length of hospital stay [ 69 ]; constellations of factors predicting injury [ 70 ]; or the role of austerity, crisis and recession in predicting public health outcomes [ 71 ]. Whilst there is undoubtedly useful information to be derived from studying the conditions that lead to particular public health problems, these studies were not directly evaluating interventions, so they were also excluded.

Restricting our search to publications in English and to peer reviewed publications may have missed bodies of work from many regions, and has excluded research from non-governmental organisations using QCA methods in evaluation. As this is a rapidly evolving field, with relatively recent uptake in public health (all our included studies were after 2005), our studies may not reflect the most recent advances in the area.

Implications for conducting and reporting QCA studies

This systematic review has reviewed studies that deployed an emergent methodology, which has no reporting guidelines and has had, to date, a relatively low level of awareness among many potential evidence users in public health. For this reason, many of the studies reviewed were relatively detailed on the methods used, and the rationale for utilising QCA.

We did not assess quality directly, but used indicators of good practice discussed in QCA methodological literature, largely written for policy studies scholars, and often post-dating the publication dates of studies included in this review. It is also worth noting that, given the relatively recent development of QCA methods, methodological debate is still thriving on issues such as the reliability of causal inferences [ 72 ], alongside more general critiques of the usefulness of the method for policy decisions (see, for instance, [ 73 ]). The authors of studies included in this review also commented directly on methodological development: for instance, Thomas et al. suggest that QCA may benefit from methods development for sensitivity analyses around calibration decisions [ 42 ].

However, we selected quality criteria that, we argue, are relevant for public health research. Justifying the selection of cases, discussing and justifying the calibration of set membership, making data sets available, and reporting truth tables, consistency and coverage are all good practice in line with the usual requirements of transparency and credibility in methods. When QCA studies aim to provide explanation of outcomes (rather than exploring configurations), it is also vital that they are reported in ways that enhance the credibility of claims made, including justifying the number of conditions included relative to cases. Few of the studies published to date met all these criteria, at least in the papers included here (although additional material may have been provided in other publications). To improve the future discoverability and uptake of QCA methods in public health, and to strengthen the credibility of findings from these methods, we therefore suggest the following criteria should be considered by authors and reviewers for reporting QCA studies which aim to provide causal evidence about the configurations of conditions that lead to implementation or outcomes:

The paper title and abstract state the QCA design;

The sampling unit for the ‘case’ is clearly defined (e.g.: patient, specified geographical population, ward, hospital, network, policy, country);

The population from which the cases have been selected is defined (e.g.: all patients in a country with X condition, districts in X country, tertiary hospitals, all hospitals in X country, all health promotion networks in X province, European policies on smoking in outdoor places, OECD countries);

The rationale for selection of cases from the population is justified (e.g.: whole population, random selection, purposive sample);

There are sufficient cases to provide credible coverage across the number of conditions included in the model, and the rationale for the number of conditions included is stated;

Cases are comparable;

There is a clear justification for how choices of relevant conditions (or ‘aspects of context’) have been made;

There is sufficient transparency for replicability: in line with open science expectations, datasets should be available where possible; truth tables should be reported in publications, and reports of coverage and consistency provided.

Implications for future research

In reviewing methods for evaluating natural experiments, Craig et al. focus on statistical techniques for enhancing causal inference, noting only that what they call ‘qualitative’ techniques (the cited references for these are all QCA studies) require “further studies … to establish their validity and usefulness” [ 2 ]. The studies included in this review have demonstrated that QCA is a feasible method when there are sufficient (comparable) cases for identifying configurations of conditions under which interventions are effective (or not), or are implemented (or not). Given ongoing concerns in public health about how best to evaluate interventions across complex contexts and systems, this is promising. This review has also demonstrated the value of adding QCA methods to the tool box of techniques for evaluating interventions such as public policies, health promotion programmes, and organisational changes - whether they are implemented in a randomised way or not. Many of the studies in this review have clearly generated useful evidence: whether this evidence has had more or less impact, in terms of influencing practice and policy, or is more valid, than evidence generated by other methods is not known. Validating the findings of a QCA study is perhaps as challenging as validating the findings from any other design, given the absence of any gold standard comparators. Comparisons of the findings of QCA with those from other methods are also typically constrained by the rather different research questions asked, and the different purposes of the analysis. In our review, QCAs were typically used alongside other methods to address different questions, rather than to compare methods. However, as the field develops, follow-up studies, which evaluate outcomes of interventions designed in line with conditions identified as causal in prior QCAs, might be useful for contributing to validation.

This review was limited to public health evaluation research: other domains that would be useful to map include health systems/services interventions and studies used to design or target interventions. There is also an opportunity to broaden the scope of the field, particularly for addressing some of the more intractable challenges for public health research. Given the limitations in the evidence base on what works to address inequalities in health, for instance [ 74 ], QCA has potential here, to help identify the conditions under which interventions do or do not exacerbate unequal outcomes, or the conditions that lead to differential uptake or impacts across sub-population groups. It is perhaps surprising that relatively few of the studies in this review included cases at the level of country or region, the traditional level for QCA studies. There may be scope for developing international comparisons for public health policy, and using QCA methods at the case level (nation, sub-national region) of classic policy studies in the field. In the light of debate around COVID-19 pandemic response effectiveness, comparative studies across jurisdictions might shed light on issues such as differential population responses to vaccine uptake or mask use, for example, and these might in turn be considered as conditions in causal configurations leading to differential morbidity or mortality outcomes.

When should QCA be considered?

Public health evaluations typically assess the efficacy, effectiveness or cost-effectiveness of interventions and the processes and mechanisms through which they effect change. There is no perfect evaluation design for achieving these aims. As in other fields, the choice of design will in part depend on the availability of counterfactuals, the extent to which the investigator can control the intervention, and the range of potential cases and contexts [ 75 ], as well as political considerations, such as the credibility of the approach with key stakeholders [ 76 ]. There are inevitably ‘horses for courses’ [ 77 ]. The evidence from this review suggests that QCA evaluation approaches are feasible when there is a sufficient number of comparable cases with and without the outcome of interest, and when the investigators have, or can generate, sufficiently in-depth understanding of those cases to make sense of connections between conditions, and to make credible decisions about the calibration of set membership. QCA may be particularly relevant for understanding multiple causation (that is, where different configurations might lead to the same outcome), and for understanding the conditions associated with both lack of effect and effect. As a stand-alone approach, QCA might be particularly valuable for national and regional comparative studies of the impact of policies on public health outcomes. Alongside cluster randomised trials of interventions, or alongside systematic reviews, QCA approaches are especially useful for identifying core combinations of causal conditions for success and lack of success in implementation and outcome.


QCA is a relatively new approach for public health research, with promise for contributing to much-needed methodological development for addressing causation in complex systems. This review has demonstrated the large range of evaluation questions that have been addressed to date using QCA, including contributions to process evaluations of trials and for exploring the conditions leading to effectiveness (or not) in systematic reviews of interventions. There is potential for QCA to be more widely used in evaluative research, to identify the conditions under which interventions across contexts are implemented or not, and the configurations of conditions associated with effect or lack of evidence of effect. However, QCA will not be appropriate for all evaluations, and cannot be the only answer to addressing complex causality. For explanatory questions, the approach is most appropriate when there are enough comparable cases with and without the outcome of interest, and where the researchers have a detailed understanding of those cases and conditions. To improve the credibility of findings from QCA for public health evidence users, we recommend that studies are reported with the usual attention to methodological transparency and data availability, with key details that allow readers to judge the credibility of causal configurations reported. If the use of QCA continues to expand, it may be useful to develop more comprehensive consensus guidelines for conduct and reporting.

Availability of data and materials

Full search strategies and extraction forms are available by request from the first author.


Abbreviations

COMPASSS: Comparative Methods for Systematic Cross-Case Analysis

csQCA: crisp set QCA

fsQCA: fuzzy set QCA

mvQCA: multi-value QCA

MRC: Medical Research Council

QCA: Qualitative Comparative Analysis

RCT: randomised control trial

PA: Physical Activity

Green J, Roberts H, Petticrew M, Steinbach R, Goodman A, Jones A, et al. Integrating quasi-experimental and inductive designs in evaluation: a case study of the impact of free bus travel on public health. Evaluation. 2015;21(4):391–406. https://doi.org/10.1177/1356389015605205 .

Craig P, Katikireddi SV, Leyland A, Popham F. Natural experiments: an overview of methods, approaches, and contributions to public health intervention research. Annu Rev Public Health. 2017;38(1):39–56. https://doi.org/10.1146/annurev-publhealth-031816-044327 .

Shiell A, Hawe P, Gold L. Complex interventions or complex systems? Implications for health economic evaluation. BMJ. 2008;336(7656):1281–3. https://doi.org/10.1136/bmj.39569.510521.AD .

Craig P, Dieppe P, Macintyre S, Michie S, Nazareth I, Petticrew M. Developing and evaluating complex interventions: the new Medical Research Council guidance. BMJ. 2008;337:a1655.

Moore GF, Audrey S, Barker M, Bond L, Bonell C, Hardeman W, et al. Process evaluation of complex interventions: Medical Research Council guidance. BMJ. 2015;350(mar19 6):h1258. https://doi.org/10.1136/bmj.h1258 .

Pattyn V, Álamos-Concha P, Cambré B, Rihoux B, Schalembier B. Policy effectiveness through configurational and mechanistic lenses: lessons for concept development. J Comp Policy Anal Res Pract. 2020;0:1–18.

Byrne D. Evaluating complex social interventions in a complex world. Evaluation. 2013;19(3):217–28. https://doi.org/10.1177/1356389013495617 .

Gerrits L, Pagliarin S. Social and causal complexity in qualitative comparative analysis (QCA): strategies to account for emergence. Int J Soc Res Methodol. 2020;0:1–14. https://doi.org/10.1080/13645579.2020.1799636 .

Grant RL, Hood R. Complex systems, explanation and policy: implications of the crisis of replication for public health research. Crit Public Health. 2017;27(5):525–32. https://doi.org/10.1080/09581596.2017.1282603 .

Rutter H, Savona N, Glonti K, Bibby J, Cummins S, Finegood DT, et al. The need for a complex systems model of evidence for public health. Lancet. 2017;390(10112):2602–4. https://doi.org/10.1016/S0140-6736(17)31267-9 .

Greenhalgh T, Papoutsi C. Studying complexity in health services research: desperately seeking an overdue paradigm shift. BMC Med. 2018;16(1):95. https://doi.org/10.1186/s12916-018-1089-4 .

Craig P, Di Ruggiero E, Frohlich KL, Mykhalovskiy E and White M, on behalf of the Canadian Institutes of Health Research (CIHR)–National Institute for Health Research (NIHR) Context Guidance Authors Group. Taking account of context in population health intervention research: guidance for producers, users and funders of research. Southampton: NIHR Evaluation, Trials and Studies Coordinating Centre; 2018.

Paparini S, Green J, Papoutsi C, Murdoch J, Petticrew M, Greenhalgh T, et al. Case study research for better evaluations of complex interventions: rationale and challenges. BMC Med. 2020;18(1):301. https://doi.org/10.1186/s12916-020-01777-6 .

Ragin CC. The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press; 1987.

Ragin CC. Redesigning social inquiry: fuzzy sets and beyond. Chicago: The University of Chicago Press; 2008. https://doi.org/10.7208/chicago/9780226702797.001.0001 .

Befani B, Ledermann S, Sager F. Realistic evaluation and QCA: conceptual parallels and an empirical application. Evaluation. 2007;13(2):171–92. https://doi.org/10.1177/1356389007075222 .

Kane H, Lewis MA, Williams PA, Kahwati LC. Using qualitative comparative analysis to understand and quantify translation and implementation. Transl Behav Med. 2014;4(2):201–8. https://doi.org/10.1007/s13142-014-0251-6 .

Cronqvist L, Berg-Schlosser D. Chapter 4: Multi-Value QCA (mvQCA). In: Rihoux B, Ragin C, editors. Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques. Thousand Oaks: SAGE Publications; 2009. p. 69–86. https://doi.org/10.4135/9781452226569 .

Ragin CC. Using qualitative comparative analysis to study causal complexity. Health Serv Res. 1999;34(5 Pt 2):1225–39.

Legewie N. An introduction to applied data analysis with qualitative comparative analysis (QCA). Forum Qual Soc Res. 2013;14.  https://doi.org/10.17169/fqs-14.3.1961 .

Varone F, Rihoux B, Marx A. A new method for policy evaluation? In: Rihoux B, Grimm H, editors. Innovative comparative methods for policy analysis: beyond the quantitative-qualitative divide. Boston: Springer US; 2006. p. 213–36. https://doi.org/10.1007/0-387-28829-5_10 .

Gerrits L, Verweij S. The evaluation of complex infrastructure projects: a guide to qualitative comparative analysis. Cheltenham: Edward Elgar Pub; 2018. https://doi.org/10.4337/9781783478422 .

Greckhamer T, Misangyi VF, Fiss PC. The two QCAs: from a small-N to a large-N set theoretic approach. In: Configurational Theory and Methods in Organizational Research. Emerald Group Publishing Ltd.; 2013. p. 49–75. https://pennstate.pure.elsevier.com/en/publications/the-two-qcas-from-a-small-n-to-a-large-n-set-theoretic-approach . Accessed 16 Apr 2021.

Rihoux B, Ragin CC. Configurational comparative methods: qualitative comparative analysis (QCA) and related techniques. Thousand Oaks: SAGE; 2009. https://doi.org/10.4135/9781452226569 .

Marx A. Crisp-set qualitative comparative analysis (csQCA) and model specification: benchmarks for future csQCA applications. Int J Mult Res Approaches. 2010;4(2):138–58. https://doi.org/10.5172/mra.2010.4.2.138 .

Marx A, Dusa A. Crisp-set qualitative comparative analysis (csQCA), contradictions and consistency benchmarks for model specification. Methodol Innov Online. 2011;6(2):103–48. https://doi.org/10.4256/mio.2010.0037 .

Hanckel B, Petticrew M, Thomas J, Green J. Protocol for a systematic review of the use of qualitative comparative analysis for evaluative questions in public health research. Syst Rev. 2019;8(1):252. https://doi.org/10.1186/s13643-019-1159-5 .

Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;349(1):g7647. https://doi.org/10.1136/bmj.g7647 .

EPPI-Reviewer 4.0: Software for research synthesis. UK: University College London; 2010.

Harting J, Peters D, Grêaux K, van Assema P, Verweij S, Stronks K, et al. Implementing multiple intervention strategies in Dutch public health-related policy networks. Health Promot Int. 2019;34(2):193–203. https://doi.org/10.1093/heapro/dax067 .

Thomas J, Harden A. Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Med Res Methodol. 2008;8(1):45. https://doi.org/10.1186/1471-2288-8-45 .

Popay J, Roberts H, Sowden A, Petticrew M, Arai L, Rodgers M, et al. Guidance on the conduct of narrative synthesis in systematic reviews: a product from the ESRC methods Programme. 2006.

Wagemann C, Schneider CQ. Qualitative comparative analysis (QCA) and fuzzy-sets: agenda for a research approach and a data analysis technique. Comp Sociol. 2010;9:376–96.

Schneider CQ, Wagemann C. Set-theoretic methods for the social sciences: a guide to qualitative comparative analysis: Cambridge University Press; 2012. https://doi.org/10.1017/CBO9781139004244 .

Blackman T, Dunstan K. Qualitative comparative analysis and health inequalities: investigating reasons for differential progress with narrowing local gaps in mortality. J Soc Policy. 2010;39(3):359–73. https://doi.org/10.1017/S0047279409990675 .

Blackman T, Wistow J, Byrne D. A Qualitative Comparative Analysis of factors associated with trends in narrowing health inequalities in England. Soc Sci Med 1982. 2011;72:1965–74.

Blackman T, Wistow J, Byrne D. Using qualitative comparative analysis to understand complex policy problems. Evaluation. 2013;19(2):126–40. https://doi.org/10.1177/1356389013484203 .

Glatman-Freedman A, Cohen M-L, Nichols KA, Porges RF, Saludes IR, Steffens K, et al. Factors affecting the introduction of new vaccines to poor nations: a comparative study of the haemophilus influenzae type B and hepatitis B vaccines. PLoS One. 2010;5(11):e13802. https://doi.org/10.1371/journal.pone.0013802 .

Ford EW, Duncan WJ, Ginter PM. Health departments’ implementation of public health’s core functions: an assessment of health impacts. Public Health. 2005;119(1):11–21. https://doi.org/10.1016/j.puhe.2004.03.002 .

Lucidarme S, Cardon G, Willem A. A comparative study of health promotion networks: configurations of determinants for network effectiveness. Public Manag Rev. 2016;18(8):1163–217. https://doi.org/10.1080/14719037.2015.1088567 .

Melendez-Torres GJ, Sutcliffe K, Burchett HED, Rees R, Richardson M, Thomas J. Weight management programmes: re-analysis of a systematic review to identify pathways to effectiveness. Health Expect Int J Public Particip Health Care Health Policy. 2018;21:574–84.

Thomas J, O’Mara-Eves A, Brunton G. Using qualitative comparative analysis (QCA) in systematic reviews of complex interventions: a worked example. Syst Rev. 2014;3(1):67. https://doi.org/10.1186/2046-4053-3-67 .

Fernald DH, Simpson MJ, Nease DE, Hahn DL, Hoffmann AE, Michaels LC, et al. Implementing community-created self-management support tools in primary care practices: multimethod analysis from the INSTTEPP study. J Patient-Centered Res Rev. 2018;5(4):267–75. https://doi.org/10.17294/2330-0698.1634 .

Harris K, Kneale D, Lasserson TJ, McDonald VM, Grigg J, Thomas J. School-based self-management interventions for asthma in children and adolescents: a mixed methods systematic review. Cochrane Database Syst Rev. 2019. https://doi.org/10.1002/14651858.CD011651.pub2 .

Kahwati LC, Lewis MA, Kane H, Williams PA, Nerz P, Jones KR, et al. Best practices in the Veterans Health Administration’s MOVE! weight management program. Am J Prev Med. 2011;41(5):457–64. https://doi.org/10.1016/j.amepre.2011.06.047 .

Warren J, Wistow J, Bambra C. Applying qualitative comparative analysis (QCA) to evaluate a public health policy initiative in the north east of England. Polic Soc. 2013;32(4):289–301. https://doi.org/10.1016/j.polsoc.2013.10.002 .

Warren J, Wistow J, Bambra C. Applying qualitative comparative analysis (QCA) in public health: a case study of a health improvement service for long-term incapacity benefit recipients. J Public Health. 2014;36(1):126–33. https://doi.org/10.1093/pubmed/fdt047 .

Brunton G, O’Mara-Eves A, Thomas J. The “active ingredients” for successful community engagement with disadvantaged expectant and new mothers: a qualitative comparative analysis. J Adv Nurs. 2014;70(12):2847–60. https://doi.org/10.1111/jan.12441 .

McGowan VJ, Wistow J, Lewis SJ, Popay J, Bambra C. Pathways to mental health improvement in a community-led area-based empowerment initiative: evidence from the Big Local ‘Communities in Control’ study, England. J Public Health. 2019;41(4):850–7. https://doi.org/10.1093/pubmed/fdy192 .

Parrott JS, Henry B, Thompson KL, Ziegler J, Handu D. Managing Complexity in Evidence Analysis: A Worked Example in Pediatric Weight Management. J Acad Nutr Diet. 2018;118:1526–1542.e3.

Kien C, Grillich L, Nussbaumer-Streit B, Schoberberger R. Pathways leading to success and non-success: a process evaluation of a cluster randomized physical activity health promotion program applying fuzzy-set qualitative comparative analysis. BMC Public Health. 2018;18(1):1386. https://doi.org/10.1186/s12889-018-6284-x .

Lubold AM. The effect of family policies and public health initiatives on breastfeeding initiation among 18 high-income countries: a qualitative comparative analysis research design. Int Breastfeed J. 2017;12(1):34. https://doi.org/10.1186/s13006-017-0122-0 .

Bianchi F, Garnett E, Dorsel C, Aveyard P, Jebb SA. Restructuring physical micro-environments to reduce the demand for meat: a systematic review and qualitative comparative analysis. Lancet Planet Health. 2018;2(9):e384–97. https://doi.org/10.1016/S2542-5196(18)30188-8 .

The authors would like to thank and acknowledge the support of Sara Shaw, PI of MR/S014632/1 and the rest of the Triple C project team, the experts who were consulted on the final list of included studies, and the reviewers who provided helpful feedback on the original submission.

This study was funded by MRC: MR/S014632/1 'Case study, context and complex interventions (Triple C): development of guidance and publication standards to support case study research'. The funder played no part in the conduct or reporting of the study. JG is supported by a Wellcome Trust Centre grant 203109/Z/16/Z.

BH - research design, data acquisition, data extraction and coding, data interpretation, paper drafting; JT – research design, data interpretation, contributing to paper; MP – funding acquisition, research design, data interpretation, contributing to paper; JG – funding acquisition, research design, data extraction and coding, data interpretation, paper drafting. All authors approved the final version.

Original Research Article

A Comparative Analysis of Student Performance in an Online vs. Face-to-Face Environmental Science Course From 2009 to 2016

  • Department of Biology, Fort Valley State University, Fort Valley, GA, United States

A growing number of students are now opting for online classes. They find the traditional classroom modality restrictive, inflexible, and impractical. In this age of technological advancement, schools can now provide effective classroom teaching via the Web. This shift in pedagogical medium is forcing academic institutions to rethink how they want to deliver their course content. The overarching purpose of this research was to determine which teaching method proved more effective over the 8-year period. The scores of 548 students, 401 traditional students and 147 online students, in an environmental science class were used to determine which instructional modality generated better student performance. In addition to the overarching objective, we also examined score variabilities between genders and classifications to determine if teaching modality had a greater impact on specific groups. No significant difference in student performance between online and face-to-face (F2F) learners was found overall, with respect to gender, or with respect to class rank. These data demonstrate that environmental science concepts can be translated similarly well for non-STEM majors on both traditional and online platforms, irrespective of gender or class rank. A potential exists for increasing the number of non-STEM majors engaged in citizen science by using the flexibility of online learning to teach environmental science core concepts.


The advent of online education has made it possible for students with busy lives and limited flexibility to obtain a quality education. As opposed to traditional classroom teaching, Web-based instruction has made it possible to offer classes worldwide through a single Internet connection. Although it boasts several advantages over traditional education, online instruction still has its drawbacks, including limited communal synergies. Still, online education seems to be the path many students are taking to secure a degree.

This study compared the effectiveness of online vs. traditional instruction in an environmental studies class. Using a single indicator, we attempted to see if student performance was affected by instructional medium. This study sought to compare online and F2F teaching on three levels: pure modality, gender, and class rank. Through these comparisons, we investigated whether one teaching modality was significantly more effective than the other. Although there were limitations to the study, this examination was conducted to provide us with additional measures to determine if students performed better in one environment over another (Mozes-Carmel and Gold, 2009).

The methods, procedures, and operationalization tools used in this assessment can be expanded upon in future quantitative, qualitative, and mixed method designs to further analyze this topic. Moreover, the results of this study serve as a backbone for future meta-analytical studies.

Origins of Online Education

Computer-assisted instruction is changing the pedagogical landscape as an increasing number of students are seeking online education. Colleges and universities are now touting the efficiencies of Web-based education and are rapidly implementing online classes to meet student needs worldwide. One study reported “increases in the number of online courses given by universities have been quite dramatic over the last couple of years” ( Lundberg et al., 2008 ). Think tanks are also disseminating statistics on Web-based instruction. “In 2010, the Sloan Consortium found a 17% increase in online students from the years before, beating the 12% increase from the previous year” ( Keramidas, 2012 ).

Contrary to popular belief, online education is not a new phenomenon. The first correspondence and distance learning educational programs were initiated in the mid-1800s by the University of London. This model of educational learning was dependent on the postal service and therefore wasn't seen in America until the late nineteenth century. In 1873, what is considered the first official correspondence educational program, known as the “Society to Encourage Home Studies,” was established in Boston, Massachusetts. Since then, non-traditional study has grown into what is today considered a more viable online instructional modality. Technological advancement indubitably helped improve the speed and accessibility of distance learning courses; students worldwide can now attend classes from the comfort of their own homes.

Qualities of Online and Traditional Face to Face (F2F) Classroom Education

Online and traditional education share many qualities. Students are still required to attend class, learn the material, submit assignments, and complete group projects. Teachers, meanwhile, still have to design curriculums, maximize instructional quality, answer class questions, motivate students to learn, and grade assignments. Despite these basic similarities, there are many differences between the two modalities. Traditionally, classroom instruction is known to be teacher-centered and requires passive learning by the student, while online instruction is often student-centered and requires active learning.

In teacher-centered, or passive learning, the instructor usually controls classroom dynamics. The teacher lectures and comments, while students listen, take notes, and ask questions. In student-centered, or active learning, the students usually determine classroom dynamics as they independently analyze the information, construct questions, and ask the instructor for clarification. In this scenario, the teacher, not the student, is listening, formulating, and responding ( Salcedo, 2010 ).

In education, change comes with questions. Despite all current reports championing online education, researchers are still questioning its efficacy. Research is still being conducted on the effectiveness of computer-assisted teaching. Cost-benefit analysis, student experience, and student performance are now being carefully considered when determining whether online education is a viable substitute for classroom teaching. This decision process will most probably carry into the future as technology improves and as students demand better learning experiences.

Thus far, “literature on the efficacy of online courses is expansive and divided” (Driscoll et al., 2012). Some studies favor traditional classroom instruction, stating “online learners will quit more easily” and “online learning can lack feedback for both students and instructors” (Atchley et al., 2013). Because of these shortcomings, student retention, satisfaction, and performance can be compromised. Like traditional teaching, distance learning also has its apologists, who aver that online education produces students who perform as well as or better than their traditional classroom counterparts (Westhuis et al., 2006).

The advantages and disadvantages of both instructional modalities need to be fully fleshed out and examined to truly determine which medium generates better student performance. Both modalities have been proven to be relatively effective, but, as mentioned earlier, the question to be asked is if one is truly better than the other.

Student Need for Online Education

With technological advancement, learners now want quality programs they can access from anywhere and at any time. Because of these demands, online education has become a viable, alluring option to business professionals, stay-at-home parents, and other similar populations. In addition to flexibility and access, multiple other face value benefits, including program choice and time efficiency, have increased the attractiveness of distance learning (Wladis et al., 2015).

First, prospective students want to be able to receive a quality education without having to sacrifice work time, family time, and travel expense. Instead of having to be at a specific location at a specific time, online educational students have the freedom to communicate with instructors, address classmates, study materials, and complete assignments from any Internet-accessible point ( Richardson and Swan, 2003 ). This type of flexibility grants students much-needed mobility and, in turn, helps make the educational process more enticing. According to Lundberg et al. (2008) “the student may prefer to take an online course or a complete online-based degree program as online courses offer more flexible study hours; for example, a student who has a job could attend the virtual class watching instructional film and streaming videos of lectures after working hours.”

Moreover, more study time can lead to better class performance—more chapters read, better quality papers, and more group project time. Studies on the relationship between study time and performance are limited; however, it is often assumed the online student will use any surplus time to improve grades ( Bigelow, 2009 ). It is crucial to mention the link between flexibility and student performance as grades are the lone performance indicator of this research.

Second, online education also offers more program choices. With traditional classroom study, students are forced to take courses only at universities within feasible driving distance or move. Web-based instruction, on the other hand, grants students electronic access to multiple universities and course offerings ( Salcedo, 2010 ). Therefore, students who were once limited to a few colleges within their immediate area can now access several colleges worldwide from a single convenient location.

Third, with online teaching, students who usually don't participate in class may now voice their opinions and concerns. As they are not in a classroom setting, quieter students may feel more comfortable partaking in class dialogue without being recognized or judged. This, in turn, may increase average class scores ( Driscoll et al., 2012 ).

Benefits of Face-to-Face (F2F) Education via Traditional Classroom Instruction

The other modality, classroom teaching, is a well-established instructional medium in which teaching style and structure have been refined over several centuries. Face-to-face instruction has numerous benefits not found in its online counterpart ( Xu and Jaggars, 2016 ).

First, and perhaps most importantly, classroom instruction is extremely dynamic. Traditional classroom teaching provides real-time face-to-face instruction and sparks innovative questions. It also allows for immediate teacher response and more flexible content delivery. Online instruction dampens the learning process because students must limit their questions to blurbs, then grant the teacher and fellow classmates time to respond (Salcedo, 2010). Over time, online teaching will probably improve, enhancing classroom dynamics and bringing students face-to-face with their peers and instructors. For now, however, face-to-face instruction provides dynamic learning attributes not found in Web-based teaching (Kemp and Grieve, 2014).

Second, traditional classroom learning is a well-established modality. Some students are opposed to change and view online instruction negatively. These students may be technophobes, more comfortable with sitting in a classroom taking notes than sitting at a computer absorbing data. Other students may value face-to-face interaction, pre- and post-class discussions, communal learning, and organic student-teacher bonding ( Roval and Jordan, 2004 ). They may see the Internet as an impediment to learning. If not comfortable with the instructional medium, some students may shun classroom activities; their grades might slip and their educational interest might vanish. Students, however, may eventually adapt to online education. With more universities employing computer-based training, students may be forced to take only Web-based courses. Even so, some students will still prefer classroom intimacy.

Third, face-to-face instruction doesn't rely upon networked systems. In online learning, the student is dependent upon access to an unimpeded Internet connection. If technical problems occur, online students may not be able to communicate, submit assignments, or access study material. This problem, in turn, may frustrate the student, hinder performance, and discourage learning.

Fourth, campus education provides students with both accredited staff and research libraries. Students can rely upon administrators to aid in course selection and provide professorial recommendations. Library technicians can help learners edit their papers, locate valuable study material, and improve study habits. Research libraries may provide materials not accessible by computer. In all, the traditional classroom experience gives students important auxiliary tools to maximize classroom performance.

Fifth, traditional classroom degrees trump online educational degrees in terms of hiring preferences. Many academic and professional organizations do not consider online degrees on par with campus-based degrees ( Columbaro and Monaghan, 2009 ). Often, prospective hiring bodies think Web-based education is a watered-down, simpler means of attaining a degree, often citing poor curriculums, unsupervised exams, and lenient homework assignments as detriments to the learning process.

Finally, research shows online students are more likely to quit class if they do not like the instructor, the format, or the feedback. Because they work independently, relying almost wholly upon self-motivation and self-direction, online learners may be more inclined to withdraw from class if they do not get immediate results.

The classroom setting provides more motivation, encouragement, and direction. Even if a student wanted to quit during the first few weeks of class, he/she may be deterred by the instructor and fellow students. F2F instructors may be able to adjust the structure and teaching style of the class to improve student retention (Kemp and Grieve, 2014). With online teaching, instructors are limited to electronic correspondence and may not pick up on verbal and non-verbal cues.

Both F2F and online teaching have their pros and cons. More studies comparing the two modalities to achieve specific learning outcomes in participating learner populations are required before well-informed decisions can be made. This study examined the two modalities over eight (8) years on three different levels. Based on the aforementioned information, the following research questions resulted.

RQ1: Are there significant differences in academic performance between online and F2F students enrolled in an environmental science course?

RQ2: Are there gender differences between online and F2F student performance in an environmental science course?

RQ3: Are there significant differences between the performance of online and F2F students in an environmental science course with respect to class rank?

The results of this study are intended to edify teachers, administrators, and policymakers on which medium may work best.



The study sample consisted of 548 FVSU students who completed the Environmental Science class between 2009 and 2016. The final course grades of the participants served as the primary comparative factor in assessing performance differences between online and F2F instruction. Of the 548 total participants, 147 were online students while 401 were traditional students. This disparity was considered a limitation of the study. Of the 548 total students, 246 were male, while 302 were female. The study also used students from all four class ranks. There were 187 freshmen, 184 sophomores, 76 juniors, and 101 seniors. This was a convenience, non-probability sample so the composition of the study set was left to the discretion of the instructor. No special preferences or weights were given to students based upon gender or rank. Each student was considered a single, discrete entity or statistic.
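The reported breakdowns can be checked with a little arithmetic. The sketch below uses only the counts stated above (the variable and function names are illustrative) to confirm that each breakdown sums to 548 and to show the size of the online/F2F imbalance that the study flags as a limitation.

```python
# Counts reported in the study sample description.
TOTAL = 548
BREAKDOWNS = {
    "modality": {"online": 147, "F2F": 401},
    "gender": {"male": 246, "female": 302},
    "class rank": {"freshman": 187, "sophomore": 184, "junior": 76, "senior": 101},
}


def shares(groups, total):
    """Return each group's share of the total as a percentage (one decimal)."""
    return {name: round(100 * count / total, 1) for name, count in groups.items()}


for label, groups in BREAKDOWNS.items():
    # Every breakdown should partition the same 548 students.
    assert sum(groups.values()) == TOTAL, label
    print(label, shares(groups, TOTAL))
```

Online students make up roughly 27% of the sample, so any comparison between modalities rests on groups of quite different sizes.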

All sections of the course were taught by a full-time biology professor at FVSU. The professor had over 10 years of teaching experience in both online and F2F modalities. The professor was considered an outstanding tenured instructor with strong communication and management skills.

The F2F class met twice weekly in an on-campus classroom. Each class lasted 1 h and 15 min. The online class covered the same material as the F2F class, but was conducted wholly online using the Desire2Learn (D2L) e-learning system. Online students were expected to spend as much time studying as their F2F counterparts; however, no tracking measure was implemented to gauge e-learning study time. The professor combined textbook learning, lecture and class discussion, collaborative projects, and assessment tasks to engage students in the learning process.

This study did not differentiate between part-time and full-time students. Therefore, many part-time students may have been included in this study. This study also did not differentiate between students registered primarily at FVSU or at another institution. Therefore, many students included in this study may have used FVSU as an auxiliary institution to complete their environmental science class requirement.

Test Instruments

In this study, student performance was operationalized by final course grades. The final course grade was derived from test, homework, class participation, and research project scores. The four aforementioned assessments were valid and relevant; they were useful in gauging student ability and generating objective performance measurements. The final grades were converted from numerical scores to traditional GPA letters.

Data Collection Procedures

The 548 student grades in the sample were obtained from FVSU's Office of Institutional Research Planning and Effectiveness (OIRPE). The OIRPE released the grades to the instructor with the expectation that the instructor would maintain confidentiality and not disclose the information to third parties. After the data were obtained, the instructor processed and analyzed them through SPSS software to calculate specific values. These values were subsequently used to draw conclusions and test the hypotheses.

Summary of the Results: The chi-square analysis showed no significant difference in student performance between online and face-to-face (F2F) learners [χ 2 (4, N = 548) = 6.531, p > 0.05]. The independent sample t -test showed no significant difference in student performance between online and F2F learners with respect to gender [ t (145) = 1.42, p = 0.122]. The 2-way ANOVA showed no significant difference in student performance between online and F2F learners with respect to class rank ( Girard et al., 2016 ).

Research question #1 was to determine if there was a statistically significant difference between the academic performance of online and F2F students.

Research Question 1

The first research question investigated if there was a difference in student performance between F2F and online learners.

To investigate the first research question, we used a traditional chi-square method to analyze the data. The chi-square analysis is particularly useful for this type of comparison because it allows us to determine if the relationship between teaching modality and performance in our sample set can be extended to the larger population. The chi-square method provides us with a numerical result which can be used to determine if there is a statistically significant difference between the two groups.

Table 1 shows the mean and SD for modality and for gender. It is a general breakdown of numbers to make any differences between scores and deviations visible. The mean score for both modalities is similar, with F2F learners scoring 69.35 and online learners scoring 68.64. Both groups had fairly similar SDs. A stronger difference can be seen between the GPAs earned by men and women. Men had a 3.23 mean GPA while women had a 2.9 mean GPA. The SDs for both groups were almost identical. Even though the 0.33 numerical difference may look fairly insignificant, it must be noted that a 3.23 is approximately a B+ while a 2.9 is approximately a B. Given a categorical range of only A to F, a plus differential can be considered significant.


Table 1 . Means and standard deviations for the 8-semester “Environmental Science” data set.

The mean grade for men in the environmental online classes ( M = 3.23, N = 246, SD = 1.19) was higher than the mean grade for women in the classes ( M = 2.9, N = 302, SD = 1.20) (see Table 1 ).

First, a chi-square analysis was performed using SPSS to determine if there was a statistically significant difference in grade distribution between online and F2F students. F2F students earned a larger share of the A's awarded (63.60%) than online students (36.40%). Table 2 displays grade distribution by course delivery modality. The difference in student performance was not statistically significant, χ 2 (4, N = 548) = 6.531, p > 0.05. Table 3 shows the gender difference on student performance between online and F2F students.


Table 2 . Contingency table for student's academic performance ( N = 548).


Table 3 . Gender * performance crosstabulation.

Table 2 shows the performance measures of online and F2F students by grade category. As can be seen, F2F students generated the highest performance numbers for each grade category. However, this disparity was mostly due to the higher number of F2F students in the study. There were 401 F2F students as opposed to just 147 online students. When viewing grades with respect to modality, there are smaller percentage differences between respective learners ( Tanyel and Griffin, 2014 ). For example, F2F learners earned 28 A's (63.60% of total A's earned) while online learners earned 16 A's (36.40% of total A's earned). However, when viewing the A grade with respect to total learners in each modality, it can be seen that 28 of the 401 F2F students (7.0%) earned A's as compared to 16 of the 147 online learners (10.9%). In this case, online learners scored relatively higher in this grade category. The latter measure (grade total as a percent of modality total) is a better reflection of respective performance levels.
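The two ways of reading the A-grade counts can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the counts reported above; the variable names are illustrative and are not part of the original analysis, which was carried out in SPSS.

```python
# Counts reported in the text: A grades and enrollment by modality.
a_grades = {"F2F": 28, "online": 16}
enrolled = {"F2F": 401, "online": 147}

total_a = sum(a_grades.values())

# Share of all A's earned by each modality (sensitive to group size).
share_of_all_a = {m: 100 * a_grades[m] / total_a for m in a_grades}

# A's as a percentage of each modality's own enrollment (size-adjusted).
rate_within_modality = {m: 100 * a_grades[m] / enrolled[m] for m in a_grades}

for m in a_grades:
    print(f"{m}: {share_of_all_a[m]:.1f}% of all A's, "
          f"{rate_within_modality[m]:.1f}% of {m} students")
```

The first measure favors the F2F group largely because that group is almost three times larger; the second removes the effect of group size, which is why it is the more informative comparison here.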

Given a critical value of 7.7 and 4 degrees of freedom, we obtained a chi-square statistic of 6.531. The corresponding p-value of 0.163 was greater than our significance level of 0.05. We therefore failed to reject the null hypothesis: there is no statistically significant difference between the two groups in terms of performance scores.
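The reported p-value can be checked directly from the test statistic. The sketch below is an illustration rather than the SPSS procedure used in the study; it relies on the fact that, for 4 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2)(1 + x/2). The function name `chi2_sf_df4` is our own.

```python
import math


def chi2_sf_df4(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom.

    For df = 4 the survival function has the closed form exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2) * (1 + x / 2)


chi2_stat = 6.531  # statistic reported in the text
alpha = 0.05       # significance level reported in the text

p_value = chi2_sf_df4(chi2_stat)
reject_null = p_value < alpha

print(f"p = {p_value:.3f}, reject null: {reject_null}")
```

Evaluating the expression gives p ≈ 0.163, which agrees with the value reported above and is larger than 0.05, so the null hypothesis is not rejected.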

Research Question 2

The second research question asked whether the difference between online and F2F student performance varied with gender. Table 3 shows the gender difference in student performance between online and F2F students. We used a chi-square test, with an alpha of 0.05 as the criterion for significance, to determine whether there were differences in online and F2F student performance with respect to gender. The chi-square result shows that there is no statistically significant difference between men and women in terms of performance.

Research Question 3

The third research question asked whether the difference between online and F2F student performance varied with respect to class rank.

Table 4 shows the mean scores and standard deviations of freshman, sophomore, junior, and senior students for both online and F2F student performance. To test the third hypothesis, we used a two-way ANOVA. The ANOVA is a useful appraisal tool for this particular hypothesis as it tests the differences between multiple means. Instead of testing specific differences, the ANOVA generates a much broader picture of average differences. As can be seen in Table 4 , the ANOVA for this hypothesis shows no significant difference between online and F2F learners with respect to class rank. Therefore, we fail to reject the null hypothesis.


Table 4 . Descriptive analysis of student performance by class rankings gender.

The results of the ANOVA show there is no significant difference in performance between online and F2F students with respect to class rank. The results of the ANOVA are presented in Table 5 .


Table 5 . Analysis of variance (ANOVA) for online and F2F of class rankings.


Discussion and Social Implications

The results of the study show there is no significant difference in performance between online and traditional classroom students with respect to modality, gender, or class rank in a science concepts course for non-STEM majors. Although there were sample size issues and study limitations, this assessment shows both online learners and classroom learners perform at the same level. This conclusion indicates teaching modality may not matter as much as other factors. Given the relatively sparse data comparing pedagogical modalities for specific student population characteristics, this study could be considered innovative. In the current literature, we have not found a study of this nature comparing online and F2F non-STEM majors with respect to three separate factors (medium, gender, and class rank) and the ability to learn science concepts and achieve learning outcomes. Previous studies have compared online learning vs. F2F learning for other factors, including specific courses, costs, and qualitative analysis, but rarely regarding outcomes relevant to population characteristics of learning for a specific science concepts course over many years ( Liu, 2005 ).

In a study evaluating the transformation of a graduate level course for teachers, academic quality of the online course and learning outcomes were evaluated. The study evaluated the ability of course instructors to design the course for online delivery and develop various interactive multimedia models at a cost-savings to the respective university. The online learning platform proved effective in translating information where tested students successfully achieved learning outcomes comparable to students taking the F2F course ( Herman and Banister, 2007 ).

Another study evaluated the similarities and differences in F2F and online learning in a non-STEM course, “Foundations of American Education,” and overall course satisfaction by students enrolled in either of the two modalities. F2F and online course satisfaction was analyzed both qualitatively and quantitatively. In the quantitative feedback, online course satisfaction was lower than F2F satisfaction. When qualitative data was used, course satisfaction was similar between modalities ( Werhner, 2010 ). The course satisfaction data and feedback were used to suggest a number of posits for effective online learning in the specific course. The researcher concluded that there was no difference in the learning success of students enrolled in the online vs. F2F course, stating that “in terms of learning, students who apply themselves diligently should be successful in either format” ( Dell et al., 2010 ). The author's conclusion presumes that the “issues surrounding class size are under control and that the instructor has a course load that makes the intensity of the online course workload feasible,” where the authors conclude that the workload for online courses is greater than for F2F courses ( Stern, 2004 ).

In “A Meta-Analysis of Three Types of Interaction Treatments in Distance Education,” Bernard et al. (2009) conducted a meta-analysis evaluating three types of instructional and/or media conditions designed into distance education (DE) courses known as interaction treatments (ITs)—student–student (SS), student–teacher (ST), or student–content (SC) interactions—to other DE instructional/interaction treatments. The researchers found that a strong association existed between the integration of these ITs into distance education courses and achievement compared with blended or F2F modalities of learning. The authors speculated that this was due to increased cognitive engagement based in these three interaction treatments ( Larson and Sung, 2009 ).

Other studies evaluating students' preferences (but not efficacy) for online vs. F2F learning found that students prefer online learning when it is offered, depending on course topic and online course technology platform ( Ary and Brune, 2011 ). F2F learning was preferred when courses were offered late morning or early afternoon 2–3 days/week. A significant preference for online learning resulted across all undergraduate course topics (American history and government, humanities, natural sciences, social and behavioral sciences, diversity, and international dimension) except English composition and oral communication. A preference for analytical and quantitative thought courses was also expressed by students, though not with statistically significant results ( Mann and Henneberry, 2014 ).

In this research study, we looked at three hypotheses comparing online and F2F learning. In each case, the null hypothesis was not rejected. Therefore, at no level of examination did we find a significant difference between online and F2F learners. This finding is important because it tells us traditional-style teaching, with its heavy emphasis on interpersonal classroom dynamics, may one day be replaced by online instruction. According to Daymont and Blau (2008) , online learners, regardless of gender or class rank, learn as much from electronic interaction as they do from personal interaction. Kemp and Grieve (2014) also found that both online and F2F learning for psychology students led to similar academic performance. Given the cost efficiencies and flexibility of online education, the use of Web-based instructional systems may rise rapidly.

A number of studies support the economic benefits of online vs. F2F learning, despite differences in social constructs and educational support provided by governments. In a study by Li and Chen (2012) higher education institutions benefit the most from two of four outputs—research outputs and distance education—with teaching via distance education at both the undergraduate and graduate levels more profitable than F2F teaching at higher education institutions in China. Zhang and Worthington (2017) reported an increasing cost benefit for the use of distance education over F2F instruction as seen at 37 Australian public universities over 9 years from 2003 to 2012. Maloney et al. (2015) and Kemp and Grieve (2014) also found significant savings in higher education when using online learning platforms vs. F2F learning. In the West, the cost efficiency of online learning has been demonstrated by several research studies ( Craig, 2015 ). Studies by Agasisti and Johnes (2015) and Bartley and Golek (2004) both found the cost benefits of online learning significantly greater than that of F2F learning at U.S. institutions.

Knowing there is no significant difference in student performance between the two mediums, institutions of higher education may make the gradual shift away from traditional instruction; they may implement Web-based teaching to capture a larger worldwide audience. If administered correctly, this shift to Web-based teaching could lead to a larger buyer population, more cost efficiencies, and more university revenue.

The social implications of this study should be touted; however, several concerns regarding generalizability need to be taken into account. First, this study focused solely on students from an environmental studies class for non-STEM majors. The ability to effectively prepare students for scientific professions without hands-on experimentation has been contended. As a course that functions to communicate scientific concepts, but does not require a laboratory based component, these results may not translate into similar performance of students in an online STEM course for STEM majors or an online course that has an online laboratory based co-requisite when compared to students taking traditional STEM courses for STEM majors. There are few studies that suggest the landscape may be changing with the ability to effectively train students in STEM core concepts via online learning. Biel and Brame (2016) reported successfully translating the academic success of F2F undergraduate biology courses to online biology courses. However, researchers reported that of the large-scale courses analyzed, two F2F sections outperformed students in online sections, and three found no significant difference. A study by Beale et al. (2014) comparing F2F learning with hybrid learning in an embryology course found no difference in overall student performance. Additionally, the bottom quartile of students showed no differential effect of the delivery method on examination scores. Further, a study from Lorenzo-Alvarez et al. (2019) found that radiology education in an online learning platform resulted in similar academic outcomes as F2F learning. Larger scale research is needed to determine the effectiveness of STEM online learning and outcomes assessments, including workforce development results.

In our research study, it is possible the study participants may have been more knowledgeable about environmental science than about other subjects. Therefore, it should be noted this study focused solely on students taking this one particular class. Given the results, this course presents a unique potential for increasing the number of non-STEM majors engaged in citizen science using the flexibility of online learning to teach environmental science core concepts.

Second, the operationalization measure of “grade” or “score” to determine performance level may be lacking in scope and depth. The grades received in a class may not necessarily show actual ability, especially if the weights were adjusted to heavily favor group tasks and writing projects. Other performance indicators may be better suited to properly assess student performance. A single exam containing both multiple choice and essay questions may be a better operationalization indicator of student performance. This type of indicator would provide both a quantitative and a qualitative measure of subject matter comprehension.

Third, the nature of the student sample must be further dissected. It is possible the online students in this study may have had more time than their counterparts to learn the material and generate better grades ( Summers et al., 2005 ). The inverse holds true, as well. Because this was a convenience non-probability sampling, the chances of actually getting a fair cross section of the student population were limited. In future studies, greater emphasis must be placed on selecting proper study participants, those who truly reflect proportions, types, and skill levels.

This study was relevant because it addressed an important educational topic; it compared two student groups on multiple levels using a single operationalized performance measure. More studies of this nature, however, need to be conducted before truly positing that online and F2F teaching generate the same results. Future studies need to eliminate spurious causal relationships and increase generalizability. This will maximize the chances of generating definitive, untainted results. This scientific inquiry and comparison into online and traditional teaching will undoubtedly garner more attention in the coming years.

Our study compared learning via F2F vs. online learning modalities in teaching an environmental science course additionally evaluating factors of gender and class rank. These data demonstrate the ability to similarly translate environmental science concepts for non-STEM majors in both traditional and online platforms irrespective of gender or class rank. The social implications of this finding are important for advancing access to and learning of scientific concepts by the general population, as many institutions of higher education allow an online course to be taken without enrolling in a degree program. Thus, the potential exists for increasing the number of non-STEM majors engaged in citizen science using the flexibility of online learning to teach environmental science core concepts.

Limitations of the Study

The limitations of the study centered around the nature of the sample group, student skills/abilities, and student familiarity with online instruction. First, because this was a convenience, non-probability sample, the independent variables were not adjusted for real-world accuracy. Second, student intelligence and skill level were not taken into consideration when separating out comparison groups. There exists the possibility that the F2F learners in this study may have been more capable than the online students and vice versa. This limitation also applies to gender and class rank differences ( Friday et al., 2006 ). Finally, there may have been ease of familiarity issues between the two sets of learners. Experienced traditional classroom students now taking Web-based courses may be daunted by the technical aspect of the modality. They may not have had the necessary preparation or experience to efficiently e-learn, thus leading to lowered scores ( Helms, 2014 ). In addition to comparing online and F2F instructional efficacy, future research should also analyze blended teaching methods for the effectiveness of courses for non-STEM majors to impart basic STEM concepts and see if the blended style is more effective than any one pure style.

Data Availability Statement

The datasets generated for this study are available on request to the corresponding author.

Ethics Statement

The studies involving human participants were reviewed and approved by Fort Valley State University Human Subjects Institutional Review Board. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author Contributions

JP provided substantial contributions to the conception of the work, acquisition and analysis of data for the work, and is the corresponding author on this paper who agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. FJ provided substantial contributions to the design of the work, interpretation of the data for the work, and revised it critically for intellectual content.

This research was supported in part by funding from the National Science Foundation, Awards #1649717, 1842510, 1900572, and 1939739 to FJ.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


Comparative analysis of deep learning image detection algorithms

  • Shrey Srivastava 1 ,
  • Amit Vishvas Divekar 1 ,
  • Chandu Anilkumar 1 ,
  • Ishika Naik 1 ,
A computer views all kinds of visual media as an array of numerical values. As a consequence, computers require image processing algorithms to inspect the contents of images. This project compares three major image processing algorithms: Single Shot Detection (SSD), Faster Region based Convolutional Neural Networks (Faster R-CNN), and You Only Look Once (YOLO), to find the fastest and most efficient of the three. In this comparative analysis, using the Microsoft COCO (Common Objects in Context) dataset, the performance of these three algorithms is evaluated and their strengths and limitations are analysed based on parameters such as accuracy, precision and F1 score. From the results of the analysis, it can be concluded that the suitability of any of the algorithms over the other two is dictated to a great extent by the use cases they are applied in. In an identical testing environment, YOLO-v3 outperforms SSD and Faster R-CNN, making it the best of the three algorithms.


In recent times, industry has come to rely on computer vision for its work. The automation, robotics, medical, and surveillance sectors make extensive use of deep learning [ 1 ]. Deep learning has become the most talked-about technology owing to its results, which are mainly seen in applications involving language processing, object detection and image classification. Market forecasts predict outstanding growth in the coming years. The main reasons cited for this are the accessibility of both powerful Graphics Processing Units (GPUs) and many datasets [ 1 ]. In recent times, both of these requirements have become easily available [ 1 ].

Image classification and detection are the most important pillars of object detection. There is a plethora of datasets available. Microsoft COCO is one such widely used dataset and serves as a benchmark for object detection. It provides a large-scale dataset for image detection and classification [ 2 ].

This review article aims to make a comparative analysis of SSD, Faster R-CNN, and YOLO. The first algorithm in the comparison is SSD, which adds several feature layers to the end of the network to facilitate detection [ 3 ]. Faster R-CNN is a unified, faster, and accurate method of object detection that uses a convolutional neural network. YOLO, developed by Joseph Redmon, offers an end-to-end network [ 3 ].

In this paper, by using the Microsoft COCO dataset as a common factor of the analysis and measuring the same metrics across all the implementations mentioned, the respective performances of the three above mentioned algorithms, which use different architectures, have been made comparable to each other. The results obtained by comparing the effectiveness of these algorithms on the same dataset can help gain an insight on the unique attributes of each algorithm, understand how they differ from one another and determine which method of object recognition is most effective for any given scenario.
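To make concrete what measuring the same metrics across implementations involves, the sketch below computes precision, recall, and F1 score from detection counts. It is illustrative only: the function name and the counts are hypothetical and are not results from this work.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one detector on one dataset (not taken from the paper).
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a detector cannot score well on it by maximizing one of the two at the expense of the other, which makes it a convenient single number for comparing models evaluated on the same dataset.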

Literature survey

Object detection has been an important topic of research in recent times. With powerful learning tools available deeper features can be easily detected and studied. This work is an attempt to compile information on various object detection tools and algorithms used by different researchers so that a comparative analysis can be done and meaningful conclusions can be drawn to apply them in object detection. Literature survey serves the purpose of getting an insight regarding our work.

The work done by Ross Girshick introduced the Fast R-CNN model as a method of object detection [ 3 ]. It makes use of the CNN method in the target detection field. The novelty of the method is that it uses a window extraction algorithm instead of a conventional sliding window extraction procedure. In the R-CNN model, the deep convolutional network for feature extraction and the support vector machines for classification are trained separately [ 4 ]. In the Fast R-CNN method, feature extraction and classification are combined into a single framework [ 3 ]. Training is nine times faster in Fast R-CNN than in R-CNN. In the Faster R-CNN method, region proposal extraction and part of Fast R-CNN are placed into a network component referred to as the region proposal network (RPN). The accuracy of Fast R-CNN and Faster R-CNN is the same. The research concludes that the method is a combined, deep learning-based object detection system that works at 5–7 fps (frames per second) [ 4 ]. Basic knowledge about R-CNN, Fast R-CNN and Faster R-CNN was acquired from this paper. The training of the respective model in our work was also inspired by this paper.

Another research work done by Kim et al is discussed here. This research work uses CNN with background subtraction to build a framework that detects and recognizes moving objects using CCTV (Closed Circuit Television) cameras. It is based on the application of the background subtraction algorithm applied to each frame [ 5 ]. An architecture similar to the one in this paper was used in our work.

Another detection network is YOLO. Joseph Redmon et al. proposed You Only Look Once (YOLO), a single-pass convolutional neural network that predicts the frame positions and classes of multiple candidates. End-to-end target detection can be achieved this way. It treats object detection as a regression problem. A single end-to-end system maps the original image directly to output categories and positions [ 6 ]. The bounding box prediction and feature extraction of the YOLO architecture in our work were inspired by the technique discussed in this paper.

Tanvir Ahmed et al. proposed a modified method that uses an advanced YOLO v1 network model. It optimizes the loss function of YOLO v1, has a new inception model structure and a specialized pooling pyramid layer, and achieves better performance. The advanced application of YOLO is taken from this research paper. It is also an end-to-end process, and the authors carried out extensive experiments on the PASCAL VOC (Visual Object Classes) dataset. The network is an improved version and also shows high effectiveness [ 7 ]. The training of the YOLO model using PASCAL VOC was done using the technique proposed in this paper.

Wei Liu et al. introduced a new method of detecting objects in images using a single deep neural network. They named this procedure the Single Shot MultiBox Detector (SSD). According to the team, SSD is a simple method that requires no object proposals, as it completely eliminates the process that generates proposals. It also eliminates the subsequent pixel or feature resampling stages, so it combines everything into a single step. SSD is also easy to train and straightforward to integrate into a system. This makes detection easier. The primary feature of SSD is its use of multiscale convolutional bounding box outputs attached to several feature maps [ 8 ]. The training and model analysis of the SSD model in our work were inspired by the work discussed here.

Another paper is based on an advanced type of SSD. In this paper, the authors introduce Tiny SSD, a single shot detection deep convolutional neural network. Tiny SSD aims to ease real-time embedded object detection. It comprises greatly enhanced layers consisting of a non-uniform Fire subnetwork stack and a non-uniform subnetwork stack of SSD-based auxiliary convolutional feature layers. The best feature of Tiny SSD is its size of 2.3 MB, which is even smaller than Tiny YOLO. The results of this work have shown that Tiny SSD is well suited for embedded detection [ 9 ]. A similar model of SSD was used for the purpose of comparison.

The paper by Pathak et al. describes the role of deep learning techniques that use CNNs for object detection. The paper also assesses some deep learning techniques for object detection systems. It states that deep CNNs work on the principle of weight sharing, and it gives us information about some crucial points in CNNs.

The features of CNNs described in this paper are: [ 1 ]

Convolution is an integration that involves the multiplication of two overlapping functions.

Feature maps are abstracted to reduce their spatial complexity.

Repetition of the process is done to produce the feature maps using filters.

CNN utilizes different types of pooling layers.
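The points listed above (sliding a filter over the input to produce a feature map, then pooling to reduce its spatial size) can be illustrated with a small pure-Python sketch. This is not code from the cited paper or from our implementation; the function names, the toy image, and the filter are assumptions chosen for illustration. As in most deep learning libraries, the "convolution" here is implemented as cross-correlation, without flipping the kernel.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; divides each spatial dimension by `size`."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, vertical_edge)  # 4x4 map, peaks where the edge lies
pooled = max_pool2d(feature_map)            # 2x2 map after pooling

print(feature_map)
print(pooled)
```

The filter produces a non-zero response only where the window straddles the edge, and pooling keeps that response while halving each spatial dimension, which is the reduction in spatial complexity referred to above.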

This paper was used as the basis for understanding Convolutional Neural Networks and their role in deep learning.

In a recent research work, Chen et al. used anchor boxes for face detection together with a more exact regression loss function. They proposed a face detector termed YOLO face, which is based on YOLOv3 and aims to resolve detection problems caused by varying face scales. The authors concluded that their algorithm outperformed previous YOLO versions and their variants [ 10 ]. YOLOv3 was used in our work for comparison with other models.

In the research work by Fan et al., the authors proposed an improved system for the detection of pedestrians based on the SSD model of object detection. In this multi-layered system, they introduced the Squeeze-and-Excitation model as an additional layer to the SSD model. The improved model employed self-learning that further enhanced the accuracy of the system for small scale pedestrian detection. Experiments on the INRIA dataset showed high accuracy [ 11 ]. This paper was used for the purpose of understanding the SSD model.

In a recent survey, Mittal et al. discussed the algorithms Faster RCNN, Cascade RCNN, R-FCN, YOLO and its variants, SSD, RetinaNet, CornerNet and Objects as Points under advanced phases in detectors based on deep learning. This paper provides a comprehensive summary of low-altitude datasets and the algorithms used for the respective work [ 12 ]. Our comparison was done using COCO metrics, similar to the comparison in this paper. The paper also discusses several other techniques for comparison, which were considered in our work.

Artificial Intelligence (AI): It is a system's ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation [ 13 ].

Machine Learning (ML): It is the study of algorithms that improve automatically through experience [ 14 ]. ML algorithms build a training model based on sample data, and using it, make predictions or decisions without being ‘explicitly programmed to do so’.

Deep Learning (DL): It is the most used and most preferred approach to machine learning. It is inspired by the working of the biological brain—how individual neurons firing on receiving input only see a very small part of the total input/processed data. It has multiple layers. Upper layers build on the outputs from lower layers. Thus, the higher the layer, the more complex is the data it processes [ 15 ].

Identify more complex patterns—animals, faces, objects, skies, etc. A CNN consists of alternating convolutional and pooling layers with at least one fully connected layer at the end.
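The alternation of convolutional and pooling layers can be illustrated with a minimal pure-Python sketch. The 5 × 5 toy image and the 2 × 2 edge filter below are hypothetical values chosen only for illustration, and, as in most deep learning libraries, the "convolution" is computed as a sliding dot product without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out


# Toy 5x5 image with a vertical edge between columns 1 and 2 (illustrative).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
edge_filter = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, edge_filter)  # 4x4 map, peaks at the edge
pooled = max_pool2d(feature_map, size=2)        # 2x2 map after pooling
```

The filter responds only where the edge lies, and pooling halves the spatial size while keeping that response, which is the sense in which pooling reduces the spatial complexity of a feature map.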

Evolution of CNNs

Convolutional Neural Network (CNN): It is a type of artificial neural network that is mainly used to analyse images. It was inspired by the neurological experiments conducted by Hubel and Wiesel on the visual cortex [ 17 ]. The visual cortex is the primary region processing visual sensory information in the brain. A CNN extracts features from images and detects patterns and structures in order to detect objects in the images. Its distinct feature is the presence of hidden convolutional layers. These layers apply filters to extract patterns from images. The filter moves over the image to generate the output. Different filters recognize different patterns. Initial layers have filters that recognize simple patterns, and the filters become more complex through the layers. CNNs have evolved over time as follows:

Origin (Late 1980s–1990s): The first popular CNN was LeNet-5 developed in 1998 by LeCun et al. [ 18 ]. It was in development for almost a decade. Its purpose was to detect handwritten digits. It is credited for sparking R&D of efficient CNNs in the field of deep learning. Banks started using it in ATMs.

Stagnation (Early 2000s): The internal working of CNNs was not yet understood during this period. Also, there was no dataset with a variety of images like Google’s Open Images or Microsoft’s COCO. Hence, most CNNs focused only on optical character recognition (OCR). CNNs also required high computational time, which increased operating cost. Support Vector Machine (SVM), a machine learning model, was showing better results than CNNs.

Revival (2006–2011): Ranzato et al. in their paper demonstrated that using the max-pooling algorithm for feature extraction instead of the sub-sampling algorithm used earlier results in significant improvement [ 19 ]. Researchers had started using GPUs to accelerate training of CNNs. Around the same time, NVIDIA introduced the CUDA platform that allowed and facilitated parallel processing, thus speeding up CNN training and validation [ 20 ]. This re-sparked research. In 2010, Stanford University established a large image dataset called Pattern Analysis, Statistical modelling and Computational Learning Visual Object Classes (PASCAL VOC), removing yet another hurdle.

Rise (2012–2013): AlexNet was a major breakthrough for the accuracy of CNNs. It achieved an error rate of just 15.3% in the 2012 ILSVRC challenge, while the second-place network had an error rate of 26.2% [ 21 ]. So, AlexNet was better than any other network known at the time by a large margin of 10.9 percentage points. AlexNet achieved this accuracy by having a total of 8 layers [ 21 ], thus truly realizing ‘deep’ learning. This required greater computational power, but the advances in GPU technology made it possible. The AlexNet paper, like the LeNet paper, is one of the most influential ever published on CNNs.

Architectural Innovations (2014–2020): The well-known and widely used VGG architecture was developed in 2014 [ 22 ]. RCNN, based on VGG like many others, introduced the idea that objects are located in certain regions of the image; hence the name: region-based CNN [ 23 ]. Improved versions of RCNN—Fast RCNN [ 24 ] and Faster RCNN [ 3 ] came out in the subsequent years. Both of these reduced computation time, while maintaining the accuracy that RCNN is known for. Single Shot Multibox Detector (SSD), also based on VGG was developed around 2016 [ 8 ]. Another algorithm, You Only Look Once (YOLO), based on an architecture called DarkNet was first published in 2016 [ 6 ]. It is in active development; its third version was released in 2018 [ 25 ].

Existing methodologies

Other object detection models such as YOLO or Faster R-CNN perform their operations at a much lower speed than SSD, making SSD a much more favourable object detection method.

Before the development of SSD, several attempts had been made to design a faster detector by modifying each stage of the detection pipeline. However, any significant increase in speed by such modifications only resulted in a decrease in the detection’s accuracy and hence researchers concluded that rather than altering an existing model, they would have to come up with a fundamentally different object detection model, and hence, the creation of the SSD model [ 8 ].

SSD does not resample pixels or features for bounding box hypotheses and is as accurate as models that do. In addition, it is quite straightforward compared to methods that require object proposals, because it completely eliminates proposal generation and the subsequent pixel or feature resampling stages by encompassing all computation in a single network. Therefore, SSD is very simple to train and can be easily integrated into systems that perform detection as one of their functions [ 8 ].

Its architecture depends heavily on the extraction of feature maps and the generation of bounding boxes, which are also known as default bounding boxes. The network calculates the loss by comparing the predicted classes and the offsets of the default bounding boxes with the ground truth values of the training samples, using different filters for every iteration. Using the back-propagation algorithm and the calculated loss value, all the parameters are updated. This way, SSD is able to learn the filter structures that most accurately identify the object features and generalize the given training samples in order to minimize the loss value, resulting in high accuracy during the evaluation phase [ 26 ].

Analysis of the functions

SSD is built on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and, for each box, scores for the presence of object class instances. After score generation, non-maximum suppression is used to generate the final detection results. The early network layers are based on a standard architecture used for high-quality image classification (truncated before any classification layers), which is a VGG-16 network. An auxiliary structure, such as conv6, is added to the truncated base network to produce detections.
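The non-maximum suppression step mentioned above can be sketched as follows. This is a minimal greedy version, assuming boxes in (x1, y1, x2, y2) corner format and a hypothetical overlap threshold of 0.5; it is meant as an illustration rather than the exact procedure of the SSD implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Illustrative detections: the first two boxes cover the same object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In this toy input the second box overlaps the first with an IoU of about 0.82, so it is suppressed and only the first and third boxes survive.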

Extracting feature maps: SSD uses the VGG-16 architecture to extract feature maps because it shows very good performance for the classification of high-quality images. Auxiliary layers are used because they allow us to extract the required features at multiple scales and to reduce the size of the input with each layer that is traversed [ 8 ]. For each cell in the image, the layer makes a certain number of predictions. Each prediction consists of a boundary box and scores for all the classes, including a score for no object at all. The algorithm makes a ‘guess’ as to what is in the boundary box by choosing the class with the highest score. These scores are called ‘confidence scores’, and making such predictions is called ‘MultiBox’. Figure  1 depicts the SSD model with the extra feature layers.

Convolutional predictors for object detection: Every feature layer produces a fixed number of predictions by utilising convolutional filters. For a feature layer of size x × y with n channels, the basic element for predicting the parameters of a potential detection is a small 3 × 3 × n kernel that produces either a confidence score for a class or a shape offset relative to the default bounding box coordinates, which are provided by the COCO Dataset. The kernel is applied at every one of the x × y locations [ 8 ].

Default boxes and aspect ratios: Every feature map cell is associated with a set of default bounding boxes, for multiple feature maps in the network. The default boxes tile the feature map in a convolutional manner, so that the position of each box relative to its corresponding cell is fixed. At each feature map cell, we predict the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the class of object present inside each bounding box. In more detail, for each of the b boxes at a given location, s class scores and 4 offsets relative to the original default box shape are calculated. This results in a total of (s + 4) × b filters that are applied at every location in the feature map, giving (s + 4) × b × x × y outputs for an x × y feature map [ 8 ].
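The output count above is simple arithmetic, and a short sketch makes it concrete. The feature map size (38 × 38), the number of boxes per location (4) and the number of class scores (21) are illustrative assumptions, not values reported in this text.

```python
def ssd_outputs_per_feature_map(x, y, b, s):
    """Return (filters per location, total outputs) for one feature map.

    x, y: feature map size; b: default boxes per location;
    s: class scores per box. Each box also has 4 offsets.
    """
    filters_per_location = (s + 4) * b
    total_outputs = filters_per_location * x * y
    return filters_per_location, total_outputs


# Hypothetical configuration used only to illustrate the formula.
filters, outputs = ssd_outputs_per_feature_map(x=38, y=38, b=4, s=21)
```

With these assumed values, each location needs (21 + 4) × 4 = 100 filters, and the feature map yields 100 × 38 × 38 = 144,400 outputs.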

figure 1

Deep Learning Layers illustration [ 15 ]

SSD Training Process

Matching Process: All SSD predictions are divided into two types; negative matches or positive matches. Positive matches are only used by SSD to calculate the localization cost which is the misalignment of the boundary box with the default box. The match is positive only if the corresponding default boundary box’s IoU is greater than 0.5 with the ground truth. In any other case, it is negative. IoU stands for the ‘intersection over the union’. It is the ratio between the intersected area over the joined area for two regions. IoU is also referred to as the Jaccard index and using this condition makes the learning process much easier [ 8 ].
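The matching rule can be sketched as below, assuming boxes in (x1, y1, x2, y2) corner format. The function names are hypothetical, and the sketch covers only the IoU > 0.5 rule described above; any additional matching rules in the original method are omitted.

```python
def jaccard(a, b):
    """Jaccard index (IoU) of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_default_boxes(default_boxes, ground_truth, threshold=0.5):
    """For each default box, return the index of the ground-truth box it
    matches (IoU > threshold, best overlap wins) or None for a negative."""
    matches = []
    for d in default_boxes:
        best_idx, best_iou = None, threshold
        for g_idx, g in enumerate(ground_truth):
            overlap = jaccard(d, g)
            if overlap > best_iou:
                best_idx, best_iou = g_idx, overlap
        matches.append(best_idx)
    return matches


# Illustrative boxes: only the first default box overlaps enough.
default_boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
ground_truth = [(1, 1, 11, 11)]
matches = match_default_boxes(default_boxes, ground_truth)
```

Here the first default box has an IoU of about 0.68 with the ground truth and becomes a positive match; the other two fall below 0.5 and are negatives.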

Hard negative mining: After the matching step, almost all of the default boxes are negatives, especially when the total count of possible default boxes is high. This causes a large imbalance between the positive and negative training examples. Rather than using all the negative examples, SSD sorts them by the highest confidence loss for each default box and picks the top ones, so that at any point the ratio of negatives to positives is at most 3:1. This leads to faster optimization and better training [ 8 ].
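A minimal sketch of this selection step is given below. It assumes that a confidence loss has already been computed for every default box and keeps at most three negatives per positive; the function name and the loss values are hypothetical.

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Return the (sorted) indices of the negatives kept for training.

    conf_losses: confidence loss per default box.
    is_positive: True where the default box is a positive match.
    """
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Hardest negatives first: highest confidence loss.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(negatives[: neg_pos_ratio * num_pos])


# Illustrative losses for 8 default boxes, one of which is a positive.
conf_losses = [0.2, 2.5, 0.1, 1.7, 0.9, 0.05, 3.0, 0.4]
is_positive = [True, False, False, False, False, False, False, False]
kept_negatives = hard_negative_mining(conf_losses, is_positive)
```

With one positive, only the three negatives with the largest losses (indices 1, 3 and 6) contribute to the loss; the remaining four easy negatives are ignored.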

Data augmentation: This is crucial for increasing accuracy. There are several data augmentation techniques that we may employ, such as color distortion, flipping, and cropping. To deal with a variety of object sizes and shapes, each training image is randomly sampled using one of the options listed below [ 8 ]:

Use the entire original input image,

Sample a patch so that the minimum IoU with the objects is 0.1, 0.3, 0.5, 0.7 or 0.9,

Sample a patch randomly.

Final detection: The results are generated by performing NMS on multi-scale refined bounding boxes. Using the above-mentioned methods, such as hard negative mining and data augmentation, along with a number of other methods, SSD is considerably more accurate than Faster R-CNN on the PASCAL VOC and COCO datasets, while being three times faster [ 26 ]. SSD300, where the size of the input image is 300 × 300, runs at 59 FPS, which is much more efficient and accurate than YOLO. However, SSD is not as effective at detecting smaller objects. This can be addressed by using a more efficient feature extractor backbone (e.g., ResNet101), by adding deconvolution layers along with skip connections to create additional large-scale context, and by designing a better network structure [ 27 ].

Complexity analysis

For most algorithms, time complexity depends on the size of the input and can be expressed in big-O notation. However, for deep-learning models, time complexity is evaluated in terms of the total time taken to train the model (here, SSD) and the inference time when the model is run on specific hardware (Fig. 2).

figure 2

Evolution of CNNs from 1979 through 2018 [ 16 ]

Deep learning models are required to carry out millions of calculations, which can prove to be quite expensive computationally. However, most of these calculations end up being performed in parallel by the thousands of identical neurons in each layer of the artificial neural network. Due to this parallel nature, it has been observed that training an SSD model on a Nvidia GeForce GTX 1070i GPU reduces the training time by a factor of ten [ 28 ].

When it comes to time complexity, matrix multiplication in the forward pass of the base CNN takes up the most time. The total number of multiplications depends on the number of layers in the CNN, along with more specific details such as the number of neurons per layer, the number of filters and their respective sizes, the size of the feature extraction map and the image’s resolution. The activation function used at each layer is a ReLU function that has been found to run in quadratic time for each neuron in each layer. Hence, taking all these factors into account, we can determine the time complexity of the forward pass at the base CNN:

Here, b denotes the index of the CNN layer, B is the total number of CNN layers, x_b is the number of filters in the b-th layer, h is the filter width and height, x_c is the number of neurons, x_(b-1) is the total number of input channels of the b-th layer, and s_b is the size of the output feature map.
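The expression itself does not appear in the text as it stands. One form that is consistent with the symbols defined above is the usual per-layer cost of a convolution summed over all layers, O(sum over b = 1..B of x_(b-1) × h² × x_b × s_b²). This form is an assumption on our part, not necessarily the authors' exact expression, and the sketch below simply evaluates that sum for a hypothetical layer configuration.

```python
def conv_forward_cost(layers, in_channels):
    """Approximate multiplication count of the convolutional forward pass.

    layers: list of (x_b, h, s_b) tuples, i.e. number of filters,
    filter width/height, and output feature map size for layer b.
    in_channels: x_0, the number of channels of the input image.
    Assumed cost per layer: x_(b-1) * h^2 * x_b * s_b^2.
    """
    total = 0
    prev_channels = in_channels
    for num_filters, h, s in layers:
        total += prev_channels * h * h * num_filters * s * s
        prev_channels = num_filters  # x_b feeds the next layer as x_(b-1)
    return total


# Hypothetical two-layer network on a 3-channel input.
layers = [(4, 3, 8), (8, 3, 4)]
cost = conv_forward_cost(layers, in_channels=3)
```

For this toy configuration the first layer costs 3 × 9 × 4 × 64 = 6,912 multiplications and the second 4 × 9 × 8 × 16 = 4,608, for a total of 11,520.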

It should be noted that five to ten percent of the training time is also taken up by operations such as dropout, regression, batch normalisation and classification.

As for SSD’s accuracy, it is determined by Mean Average Precision or mAP which is simply the average of APs over all classes from the area under the precision-recall curve. A higher mAP is an indication of a more accurate model [ 28 ].
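A minimal sketch of this metric is shown below. It assumes that each detection has already been labelled as a true or false positive, and it uses the all-point interpolated area under the precision-recall curve; benchmark suites such as COCO use their own sampling and averaging conventions, so this is an illustration rather than the official evaluation code.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing from left to right (interpolation).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP: the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Illustrative detections for one class with 4 ground-truth objects.
ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, True],
    num_ground_truth=4,
)
m_ap = mean_average_precision([ap, 0.375])  # second AP value is made up
```

In this example the class AP is 0.625, and averaging it with a second, made-up class AP of 0.375 gives an mAP of 0.5.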

Faster R-CNN

R-CNN stands for Region-based Convolutional Neural Networks. This method combines region proposals for object segmentation and high capacity CNNs for object detection [ 28 ].

The algorithm of the original R-CNN technique is as follows: [ 29 ]

Using a Selective Search Algorithm, several candidate region proposals are extracted from the input image. In this algorithm, numerous candidate regions are generated in initial sub-segmentation. Then, regions which are similar are combined to form bigger regions using a greedy algorithm. These regions make up the final region proposals.

The CNN component warps the proposals and extracts distinct features as a vector output.

The features which are extracted are fed into an SVM (Support Vector Machine) for recognizing objects of interest in the proposal.

Figure 4 given below explains the features and working of R-CNN.

This technique was plagued by many drawbacks. The requirement to classify ~2000 region proposals makes the training of the CNN a very time-consuming process. It also makes real-time implementation impossible, as each test image would take close to 47 seconds to process.

Furthermore, no learning could take place at the proposal stage, as the Selective Search Algorithm is a fixed algorithm. This could result in non-ideal candidate region proposals being generated [ 29 ].

Fast R-CNN is an algorithm for object detection that solves some of the drawbacks of R-CNN. It uses an approach similar to that of its predecessor, but instead of feeding the region proposals to the CNN, it feeds the image itself to the CNN to create a convolutional feature map, from which the region proposals are then determined and warped. An RoI (Region of Interest) pooling layer is employed to reshape the warped squares to a predefined size so that a fully connected layer can accept them. The region class is then predicted from the RoI vector with the help of a SoftMax layer [ 24 ].

Fast R-CNN is faster than its predecessor because feeding ~2000 proposals as input to the CNN per execution is not required. The convolution operation that generates the feature map is done only once per image [ 24 ]. Fig. 3 given below describes the features and working of Fast RCNN.

figure 3

SSD model [ 8 ]

This algorithm shows a significant reduction in time required for both training and testing when compared to R-CNN. But it was noticed that including region proposals significantly bottlenecks the algorithm, reducing its performance [ 3 ].

Both Fast R-CNN and its predecessor used Selective Search as the algorithm for determining the region proposals. This being a very time-sapping algorithm, Faster R-CNN eliminated the need for its implementation and instead let the proposals be learned by the network. Just as in the case of Fast R-CNN, a convolutional map is obtained from the image. But a separate network replaces the Selective Search algorithm to predict proposals. These proposals are then reshaped and classified using RoI (Region of Interest) pooling. Refer to the Fig. 4 for the working of Faster R-CNN.

figure 4

R-CNN model [ 15 ]

Faster R-CNN offers an improvement over its predecessors so significant that it is now capable of being implemented for real-time object detection.

Architecture of faster R-CNN

The original implementation of the Faster Region-based Convolutional Neural Network (Faster R-CNN) algorithm was tested on two convolutional network architectures: the ZF (Zeiler and Fergus) model, with 5 convolutional layers that a Fast R-CNN network shares with it; and the VGG-16 (Simonyan and Zisserman) model, with 13 shared convolutional layers [ 3 ].

The ZF model is based on an earlier model of a Convolutional Network (made by Krizhevsky, Sutskever and Hinton) [ 30 ] . This model consisted of eight layers, of which five were convolutional and the remaining three were fully connected [ 21 ] .

This architecture exhibited quite a few problems. The first-layer filters had negligible coverage of medium-frequency information compared to that of the very extremes, and the large stride of 4 used in the first layer caused aliasing artifacts in the second layer. The ZF model fixed these issues by reducing the size of the first and second layers and making the convolution stride 2, allowing it to hold more information in the first and second layers and to improve classification performance [ 30 ].

Region-based Convolutional Neural Network (RCNN) and Fast-RCNN both use Selective Search. Selective Search is a greedy algorithm, and greedy algorithms don’t always return the best result [ 31 ]. It also needs to run multiple times: RCNN runs selective search about 2000 times on the image, whereas Fast-RCNN extracts all the regions first and runs selective search just once. This way it reduces time complexity by a large factor [ 3 ]. Faster RCNN (FRCNN) removes the final bottleneck, Selective Search, by using the Region Proposal Network (RPN) instead. RPN fixes the regions as a grid of n × n. It needs to run fewer times than selective search [ 3 ].

As shown in the diagram above, FRCNN consists of Deep Fully Convolutional Network (DFCN), Region Proposal Network, ROI pooling, Fully Connected (FC) networks, Bounding Box Regressor and Classifier.

We will consider the DFCN to be ZF-5 for consistent calculation [ 30 ]. First, a feature map M of dimensions 256 × n × n is extracted from the input image P [ 33 ]. Then it is fed to the RPN and the ROI.

RPN: There are ‘k’ anchors for each point on M. Hence, total anchors = n × n × k. Anchors are ranked according to score; 2000 anchors are obtained through Non-Maximum Suppression [ 3 ]. The complexity comes out to be O(N²/2).

ROI: Anchors get divided into H × W grid of sub-windows based on M. Output grid is obtained by max-pooling values in corresponding sub-windows. ROI is special case of spatial pyramid pooling layer used in SPP-net, with just one pyramid layer [ 24 ]. Hence, complexity becomes O(1) .
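The RoI pooling step described above can be sketched as follows. The sketch assumes a single-channel feature map stored as a list of lists and an RoI given in feature-map cells with exclusive end coordinates; the integer bin edges are one simple choice and may differ from the rounding used in a particular implementation.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region `roi` of `feature_map` into an out_h x out_w grid.

    roi = (row0, col0, row1, col1) in feature-map cells, end-exclusive.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Integer bin edges; each sub-window covers at least one cell.
        rs = r0 + (i * h) // out_h
        re = r0 + max(((i + 1) * h) // out_h, (i * h) // out_h + 1)
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + max(((j + 1) * w) // out_w, (j * w) // out_w + 1)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# Illustrative 4x6 feature map whose value at (r, c) is r * 6 + c.
feature_map = [[r * 6 + c for c in range(6)] for r in range(4)]
pooled = roi_max_pool(feature_map, roi=(0, 0, 4, 6), out_h=2, out_w=2)
```

Whatever the size of the RoI, the output always has the fixed size H × W (here 2 × 2), which is what allows the following fully connected layers to accept it.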

At present, YOLO (You Only Look Once) is one of the most precise and accurate object detection algorithms available. It is built on a custom architecture named Darknet [ 25 ]. The first version was inspired by GoogLeNet; it downsampled the image and made its predictions from the resulting tensor. This tensor is generated by a procedure similar to the Region of Interest pooling used in the Faster R-CNN network, in which regions are pooled to decrease the number of individual computations and make the analysis swifter. The following generation used an architecture with just 30 convolutional layers: 19 layers from DarkNet-19 and an extra 11 for the detection of objects in their natural context, as the COCO dataset and metrics were used. It provided more precise detection at good speed, although it struggled with pictures of small objects and small pixels. Version 3 has been the most accurate version of YOLO and has been widely used because of its high precision. Its architecture with multiple layers has also made detection more precise [ 26 ].

YOLOv3 makes use of the latest Darknet features, such as a 53-layer network, and it has been trained on ImageNet, one of the most reliable datasets. The layers come from Darknet-53, an architecture that is convolutional in nature. For detection, these 53 layers replaced the pre-existing 19, and the enhanced architecture was trained with PASCAL VOC. Even with so many additional layers, the architecture maintains one of the best response times for the accuracy it offers. It is also very helpful in analysing live video feeds because of its swift upsampling and object detection techniques. This version is one of the best enhancements in ML (Machine Learning) using neural networks. The previous version did not work well with images of small pixels, but the updates in v3 have made it very useful for analysing satellite imagery, even for the defence departments of some countries. The architecture performs detection at 3 different layers, which makes it more efficient; the process is a little slower, yet state-of-the-art. To understand the framework, refer to Fig. 5 given below.

figure 5

Fast R-CNN [ 16 ]

Feature extraction and analysis [ 34 ]

1. Forecasting: This model uses boxes of different widths and heights (anchor boxes) to produce the weights and frames that establish a strong foundation. The network determines the objectness and the allocation of each box independently. YOLOv3 uses logistic regression to predict the objectness score. The score is predicted for the selection frame that initially overlaps the object established as the ground truth in the picture by pre-trained models [ 35 ]. This gives a single bounding box, and any error in this part would cause mistakes both in the allocation of these boxes and their accuracy and in the detection that follows. The bounding box forecasting is depicted in the equation given below and Fig.  6 .

figure 6

Faster R-CNN [ 3 ]

Equations for bounding box forecasting [ 34 ]
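The equations themselves are not reproduced in the text as it stands. As we understand the published YOLOv3 parameterization, the network outputs t_x, t_y, t_w, t_h are turned into a box by b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w · e^(t_w) and b_h = p_h · e^(t_h), where (c_x, c_y) is the offset of the grid cell and (p_w, p_h) is the size of the anchor (prior) box. The sketch below follows that reading; the input values are hypothetical.

```python
import math


def sigmoid(t):
    """Logistic function, squashing t into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Convert raw network outputs into a box centre and size.

    (c_x, c_y): top-left offset of the grid cell, in cell units.
    (p_w, p_h): width and height of the anchor (prior) box.
    """
    b_x = sigmoid(t_x) + c_x   # centre stays inside the predicting cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # anchor size scaled by a positive factor
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# Illustrative call: zero outputs give the cell centre and the anchor size.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=5, p_w=2.0, p_h=4.0)
```

With all raw outputs equal to zero, the predicted centre lies in the middle of cell (3, 5) and the box has exactly the anchor's width and height.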

2. Class Prediction: A soft-max function rescales the class scores so that they sum to 1, which assumes that the classes are mutually exclusive. YOLOv3 instead uses multi-label classification by tag. These tags are custom and non-exclusive; e.g., ‘man’ and ‘woman’ are not treated as exclusive. The architecture therefore replaces the soft-max function with independent logistic classifiers and uses a binary loss function for the class predictions. This leads to a reduction in complexity by avoiding the soft-max function [ 36 ].
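The difference between a soft-max and independent logistic classifiers can be seen in a small sketch. The three raw scores are hypothetical and stand for two overlapping tags and one unrelated tag.

```python
import math


def softmax(scores):
    """Scores rescaled to sum to 1, so the classes compete with each other."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def independent_sigmoids(scores):
    """One logistic classifier per class; several classes can be 'on'."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


# Hypothetical raw scores for three tags; the first two both apply.
logits = [2.0, 2.0, -3.0]
exclusive = softmax(logits)
multi_label = independent_sigmoids(logits)
```

Under the soft-max, the two applicable tags have to share the probability mass and each ends up just below 0.5, whereas the independent logistic classifiers assign about 0.88 to both, which suits non-exclusive tags.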

3. Predictions: Bounding boxes are predicted at three distinct scales, in combination with the feature extractor, DarkNet-53. The last layers perform detection and categorization into object classes. For the COCO dataset, 3 boxes are predicted at each scale. That leads to more than 70 class predictions in the output tensor. These features follow a classic encoder-decoder design introduced in the Single-Shot Detector. K-means clustering is also used for finding the best bounding boxes. Finally, for the COCO dataset, dimensions like 10 × 13, 62 × 45 and others are used. In total there are 9 distinct dimensions, including the aforementioned.

4. DarkNet-53 - The feature extractor: YOLOv2 used DarkNet-19, but the recently modified YOLO model uses Darknet-53, where the 53 refers to 53 convolutional layers. Both speed and accuracy are enhanced in Darknet-53, making it 1.5 times quicker. When this architecture is compared with ResNet-152, it shows almost the same performance in terms of accuracy and precision, but it is twice as fast [ 37 ]. The following Fig. 7 shows the YOLO model.

figure 7

CNN of the Krizhevsky model [ 21 ]

The YOLO network is based on a systematic division of the given image into a grid. The grids are of 3 types, which will be mentioned later. Each grid cell serves as a separate image for the algorithm and undergoes further division. YOLO uses boundaries called bounding boxes, which are the anchors for the analysis of an image. Only some of these boxes are accepted as results; thousands of others are ignored because of their low probability scores and are treated as false positives. These boxes result from the rigorous breaking down of an image into grids of cells [ 38 , 39 , 40 ].

To determine suitable anchor box sizes, YOLO uses K-means clustering to cluster the boxes in the training data. These prior boxes are the guidelines for the algorithm. After receiving this data, the algorithm looks for objects of similar shape and size. YOLO uses 3 boxes as anchors, so each grid cell puts out 3 boxes. Further predictions and analysis are based on these 3 anchor boxes. Some cases and studies use 2 anchor boxes, leading to 2 boxes per grid cell [ 39 ].
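Anchor selection by K-means can be sketched as below. The sketch clusters (width, height) pairs and assigns each box to the centroid with which it has the highest IoU, which is a common choice for this task; the box sizes, the deterministic initialisation and the function names are assumptions made for illustration.

```python
def wh_iou(box, centroid):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=20):
    """Cluster (width, height) pairs into k anchor sizes."""
    centroids = list(boxes[:k])  # deterministic start, for this sketch only
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda c: wh_iou(box, centroids[c]))
            clusters[best].append(box)
        new_centroids = []
        for c in range(k):
            if clusters[c]:
                ws = [b[0] for b in clusters[c]]
                hs = [b[1] for b in clusters[c]]
                new_centroids.append((sum(ws) / len(ws), sum(hs) / len(hs)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return sorted(centroids)


# Illustrative ground-truth box sizes: three small and three large objects.
boxes = [(10, 12), (100, 90), (12, 10), (90, 110), (11, 11), (110, 100)]
anchors = kmeans_anchors(boxes, k=2)
```

For this toy data the procedure converges after one update to one small anchor of 11 × 11 and one large anchor of 100 × 100.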

In Fig. 8, the anchor box is shown as the dashed box, and the forecast of the ground truth, i.e. the bounding box, is the box with the highlighted borders. Images of many different sizes are in use, each with a distinctive grid cell size and shape. For our model we have taken the standard 448 × 448 image size. Other sizes used for analysis are 416 × 416 and 608 × 608, for which the grid sizes are 13 × 13, 26 × 26 & 52 × 52 and 19 × 19, 38 × 38 & 76 × 76 respectively [ 40 , 41 ].

figure 8

Bounding box forecasting [ 34 ]

In the first step, the image is resized to 448 × 448 and then divided into a 7 × 7 grid. This implies that each grid cell is of size 64 × 64. Every one of these grid cells produces a certain number of bounding boxes. The number may vary from version to version (there are multiple versions of YOLOv3). For our model we are using 2 boxes per grid cell. This gives us 4 coordinates per bounding box: x center, y center, width and height. There is also a corresponding confidence value [ 32 ].
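The grid arithmetic above can be checked with a short sketch. The helper names and the sample object centre are hypothetical; the image size, grid size and number of boxes per cell are the values quoted in the text.

```python
def responsible_cell(x_center, y_center, image_size=448, grid=7):
    """Return (row, col) of the grid cell containing an object's centre."""
    cell = image_size / grid  # 448 / 7 = 64 pixels per cell
    col = min(int(x_center // cell), grid - 1)
    row = min(int(y_center // cell), grid - 1)
    return row, col


def values_per_image(grid=7, boxes_per_cell=2):
    """Each box carries x centre, y centre, width, height and confidence."""
    return grid * grid * boxes_per_cell * 5


# Illustrative object centred at pixel (200, 100) in a 448 x 448 image.
cell = responsible_cell(200, 100)
box_values = values_per_image()
```

An object centred at (200, 100) falls in row 1, column 3 of the grid, and the 7 × 7 grid with 2 boxes per cell produces 7 × 7 × 2 × 5 = 490 box values per image, before any class scores are counted.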

Use of the K-means clustering algorithm gives exponential time complexity O(n^(kd)), where k is the number of images and d is the dimension of the images. After a thorough and stable optimisation technique, the creators have made YOLOv3 the fastest image detection algorithm among the ones mentioned in the paper.


In the recent search for the best combination of algorithm and data set, contenders have used the top-rated deep learning architectures and data sets in order to arrive at the best possible precision and accuracy. The most commonly used data sets are PASCAL VOC and Microsoft COCO. For this review, COCO is used both as the dataset and as the evaluation metric. Contenders applied different ways of analysis, tweaking and calibrating the base networks and adjusting the software, which leads not only to better precision but also to improved accuracy, speed, and local split performance [ 26 ].

For object detection, computationally costly architectures and algorithms such as RCNN and SPP-NET (Spatial Pyramid Pooling Network), together with smart data sets containing varied objects and images of different dimensions, have become a necessity. Moreover, given the extreme scope of live video feed monitoring, the cost of detection becomes too high. Recently, advances in deep learning architectures have led algorithms like YOLO and SSD to detect objects using a single NN (neural network). The introduction of the latest architectures has increased the competition between various techniques [ 26 ]. Recently, COCO has emerged as the most used data set for training and classification. Further developments have also made it possible to alter it by adding classes [ 2 ].

Furthermore, according to some research papers, COCO is better than other popular, widely used data sets [ 2 ], namely Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC), ImageNet and SUN (Scene Understanding). These data sets vary hugely in size, categories, and types. ImageNet was made to target a larger number of categories, many of which are fine-grained. SUN took a more modular approach, where the regions of interest were based on how frequently they occur in the data set. Finally, PASCAL VOC was similar to COCO, yet different in approach. It used a wide range of images taken from the environment and nature. Microsoft Common Objects in Context is made for the detection and classification of objects in their classic natural context [ 2 ].

Annotation pipeline [ 2 ]

As seen in the following Fig. 9 an annotation pipeline explains the identification and categorization of a particular image.

figure 9

The ZF model [ 30 ]

This type of annotation pipeline gives a better perspective to object detection algorithms, which are trained using these diverse images and advanced concepts like crowd scheduling and visual segmentation. The following Fig. 10 gives the detailed categories that are available in MS COCO. The 11 super-categories are Person and Accessories, Animal, Vehicle, Outdoor Objects, Sports, Kitchenware, Food, Furniture, Appliance, Electronics, and Indoor Objects [ 42 ].

figure 10

FRCNN Architecture [ 32 ]

Pascal VOC (Visual Object Classes)

The challenge.

The Pascal VOC (Visual Object Classes) Challenges were a series of challenges that took place from 2005 to 2012 which consisted of two components: A public dataset which contained images from the Flickr website, their annotations and software for evaluation; and a yearly event consisting of a competition and a workshop. The main objectives of the challenge were classification, detection, and segmentation of the images. There were also two additional challenges of action classification and person layout [ 43 ].

The Datasets

The datasets used in the Pascal VOC Challenges consist of two subsets: a trainval dataset, which was further classified into separate sets for training and validation; and a test dataset. All the contained images are fully annotated with the help of bounding boxes for all instances of the following objects for the classification and detection challenges: [ 43 ]

Along with these annotations, attributes such as viewpoint, truncation, difficult, consistent, accurate and exhaustive were specified, some of which were added in later editions of the challenge [ 44 ].

Experimental set up

The hardware comprised 8 GB of DDR5 Random Access Memory, a 1 TB Hard Disk Drive, a 256 GB Solid State Drive and an 8th Generation Intel Core i5 processor clocked at 1.8 GHz (Figs. 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , and 20 ).

figure 11

YOLO architecture [ 26 ]

figure 12

YOLO model ConvNet [ 37 ]

figure 13

Categories of images [ 42 ]

figure 15

The classes of objects considered in the challenge [ 43 ]

figure 16

Statistics of the VOC2012 datasets [ 43 ]

figure 17

Graph for SSD [ 26 ]

figure 18

Graph for faster RCNN [ 26 ]

figure 19

Graph for YOLO [ 26 ]

figure 20

Compared with YOLOv3, the new version of AP (accuracy) and FPS (frame rate per second) are improved by 10% and 12%, respectively [ 46 ]

The software configuration used was Google Colab with its inbuilt engine, the Python 3 Google Compute Engine backend. It provides 12.72 GB of RAM, of which 3.54 GB was used on average. It also provides 107.77 GB of disk space, of which 74.41 GB was used, including the training and validation datasets. The hardware accelerator used was the synthetic GPU offered by Google Colab (Tables 1 and 2 ).

Results and discussions

Two performance metrics are applied to the object detection models for testing: ‘Average Precision’ and the F1 score. The detector's predicted bounding boxes are compared with the ground truth bounding boxes according to IoU (Intersection over Union). The ‘True Positive’ (TP), ‘False Negative’ (FN), and ‘False Positive’ (FP) counts are defined and then used to calculate precision and recall, which in turn are used to calculate the F1 score. The formulae for these are as follows [ 42 ].

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

Using these, F1 score = 2 × Precision × Recall / (Precision + Recall)
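As a minimal sketch of the formulae above, the following Python function computes precision, recall, and the F1 score from raw counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not results from the paper. Returning 0.0 when a denominator is zero is an assumption made here to keep the sketch safe to run.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true positive, false positive
    and false negative counts. Empty denominators yield 0.0 (an assumption
    of this sketch, not something stated in the text)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it stays low whenever either of the two is low, which is why it is reported alongside them.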

Apart from these two, the performance of the models is also measured using the following metrics given by the COCO metrics API. [ 42 ]

Using all these, the outcomes for all three algorithms were compared to assess their performance. The outcomes were as follows:

Results comparison

The following limitations were observed in the three models.

When it comes to smaller objects, SSD’s performance is much worse than that of Faster R-CNN. The main reason for this drawback is that in SSD, higher resolution layers are responsible for detecting small objects. However, these layers are less useful for classification as they contain lower-level features such as colour patches or edges, thereby reducing the overall performance of SSD [ 8 ].

Another limitation of this method, which can be inferred from the complexity of SSD’s data augmentation, is that SSD requires a large amount of data for training. This can be quite expensive and time-consuming depending on the application [ 8 ].

The accuracy of Faster R-CNN comes at the cost of time complexity. It is significantly slower than the likes of YOLO.

Despite improvements over RCNN and Fast RCNN, it still requires multiple passes over a single image, unlike YOLO [ 3 ].

FRCNN has many components—the convolutional network, Regions of Interest (ROI) pooling layer and Region Proposal Network (RPN). Any of these can serve as a bottleneck for the others [ 3 ].

YOLOv3 was one of the best modifications made to an object detection system since the introduction of Darknet 53. The update was received very well by critics and industry professionals, but it had its own shortcomings. Though YOLOv3 is still considered a veteran, complexity analysis showed flaws, and the loss function lacked an optimal solution. This was later rectified in an optimized version of the model, which was then used and tested for functionality enhancements [ 45 ].

A newer version of a piece of software is often the best place to look for the faults of its predecessor. After analysing the paper on YOLOv4, we can see that version 3 tended to fail when an image contained multiple features to be analysed that were not the main focus of the image. Lack of accuracy was always an issue with smaller images: using version 3 to analyse small images was of little use because the accuracy was around 16% (according to our data). Another point is the use of Darknet-53. YOLOv4 introduced CSPDarknet-53, which improves on Darknet-53: it uses only 66% of the parameters that version 3 used but gives better results, with enhanced speed and accuracy [ 46 ].

The precision-recall curves plotted using the COCO metrics API allowed us to form proper deductions about the efficiency with which these three models perform object detection. Graphs were plotted for each model based on different object sizes.

The area shaded in orange indicates the precision-recall curve without any errors, the area shaded in violet indicates the objects that were falsely detected, and the area shaded in blue indicates the localisation errors (Loc). Lastly, the areas under the precision-recall curve that are white indicate an IoU value greater than 0.75, and the area shaded in grey indicates an IoU value greater than 0.5.
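To make the IoU thresholds of 0.5 and 0.75 concrete, the sketch below computes Intersection over Union for two axis-aligned boxes and checks the result against both thresholds. The box format `(x_min, y_min, x_max, y_max)`, the function name, and the example coordinates are assumptions made for illustration; the COCO metrics API has its own box conventions and is not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as
    (x_min, y_min, x_max, y_max). Returns 0.0 for an empty union."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes.
predicted = (1.0, 0.0, 11.0, 10.0)
ground_truth = (0.0, 0.0, 10.0, 10.0)
score = iou(predicted, ground_truth)
for threshold in (0.5, 0.75):
    print(f"IoU={score:.3f} exceeds {threshold}: {score > threshold}")
```

A stricter threshold such as 0.75 counts only tightly localised boxes as correct, so a detector's measured precision typically drops as the threshold rises.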

From the graphs of the three models, it is evident that both F R-CNN and SSD have low accuracy owing to their relatively larger violet areas. However, between the two, F R-CNN is more accurate than SSD, while SSD is more efficient for real-time processing applications due to its higher mAP values. YOLO is clearly the most efficient of all, as is evident from its almost non-existent violet regions.

This review article compared the latest and most advanced CNN-based object detection algorithms. Without object detection, it would be impossible to analyse the hundreds of thousands of images that are uploaded to the internet every day [ 42 ]. Technologies like self-driving vehicles that depend on real-time analysis are also impossible to realize without object detection. All the networks were trained with the open-source COCO dataset by Microsoft to ensure a homogeneous baseline. It was found that YOLOv3 is the fastest, with SSD following closely and Faster RCNN coming in last place. However, the use case influences which algorithm should be picked: if you are dealing with a relatively small dataset and do not need real-time results, it is best to go with Faster RCNN. YOLOv3 is the one to pick if you need to analyse a live video feed. Meanwhile, SSD provides a good balance between speed and accuracy. Additionally, YOLOv3 is the most recently released of the three and is actively being contributed to by the vast open-source community. Hence, in conclusion, out of the three object detection convolutional neural networks analysed, YOLOv3 shows the best overall performance. This result is similar to what some previous reports have obtained.

A great deal of work can still be done in this field in the future. Every year, either new algorithms or updates to existing ones are published. Also, each field, whether aviation, autonomous vehicles (aerial and terrestrial), or industrial machinery, is suited to different algorithms.

These subjects can be explored in detail in the future.

Availability of data and materials

The COCO dataset used in the paper is available from the website https://cocodataset.org/#explore .


Abbreviations

FRCNN: Faster Region based Convolutional Neural Network

SSD: Single Shot Detector

YOLOv3: You Only Look Once version 3

COCO: Common Objects in Context

VGG16: Visual Geometry Group 16

Pathak AR, Pandey M, Rautaray S. Application of deep learning for object detection. Procedia Comput Sci. 2018;132:1706–17.

Palop JJ, Mucke L, Roberson ED. Quantifying biomarkers of cognitive dysfunction and neuronal network hyperexcitability in mouse models of Alzheimer’s disease: depletion of calcium-dependent proteins and inhibitory hippocampal remodeling. In: Alzheimer's Disease and Frontotemporal Dementia. Humana Press, Totowa, NJ; 2010, p. 245–262.

Ren S, He K, Girshick R, Sun J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2016;39(6):1137–49.

Ding S, Zhao K. Research on daily objects detection based on deep neural network. IOP Conf Ser Mater Sci Eng. 2018;322(6):062024.

Kim C, Lee J, Han T, Kim YM. A hybrid framework combining background subtraction and deep neural networks for rapid person detection. J Big Data. 2018;5(1):22.

Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016, pp. 779–788.

Ahmad T, Ma Y, Yahya M, Ahmad B, Nazir S. Object detection through modified YOLO neural network. Scientific Programming, 2020.

Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC. Ssd: single shot multibox detector. In: European conference on computer vision. Cham: Springer; 2016, p. 21–37.

Wong A, Shafiee MJ, Li F, Chwyl B. Tiny SSD: a tiny single-shot detection deep convolutional neural network for real-time embedded object detection. In: 2018 15th conference on computer and robot vision (CRV). IEEE; 2018, p. 95–101.

Chen W, Huang H, Peng S, Zhou C, Zhang C. YOLO-face: a real-time face detector. The Visual Computer 2020:1–9.

Fan D, Liu D, Chi W, Liu X, Li Y. Improved SSD-based multi-scale pedestrian detection algorithm. In: Advances in 3D image and graphics representation, analysis, computing and information technology. Springer, Singapore; 2020, p. 109–118.

Mittal P, Sharma A, Singh R. Deep learning-based object detection in low-altitude UAV datasets: a survey. Image and Vision Computing 2020:104046.

Kaplan A, Haenlein M. Siri, Siri, in my hand: Who’s the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence. Bus Horiz. 2019;62(1):15–25.

Mitchell T. Machine learning. New York: McGraw Hill; 1997.

Schulz H, Behnke S. Deep learning. KI-Künstliche Intelligenz. 2012;26(4):357–63.

Khan A, Sohail A, Zahoora U, Qureshi AS. A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev. 2020;53(8):5455–516.

Hubel DH, Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol. 1962;160(1):106–54.

LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

Ranzato MA, Huang FJ, Boureau YL, LeCun Y. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: 2007 IEEE conference on computer vision and pattern recognition. IEEE; 2007, p. 1–8.

Nickolls J, Buck I, Garland M, Skadron K. Scalable parallel programming with cuda: Is cuda the parallel programming model that application developers have been waiting for? Queue. 2008;6(2):40–53.

Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25:1097–105.

Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556; 2014.

Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2014, p. 580–7.

Girshick R. Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision; 2015, p. 1440–8.

Redmon J, Farhadi A. Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767; 2018.

Alganci U, Soydas M, Sertel E. Comparative research on deep learning approaches for airplane detection from very high-resolution satellite images. Remote Sensing. 2020;12(3):458.

Zhao ZQ, Zheng P, Xu ST, Wu X. Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst. 2019;30(11):3212–32.

Reza Z. N. (2019). Real-time automated weld quality analysis from ultrasonic B-scan using deep learning (Doctoral dissertation, University of Windsor (Canada)).

Shen X, Wu Y. A unified approach to salient object detection via low rank matrix recovery. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE; 2012, p. 853–60.

Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: European conference on computer vision. Cham: Springer, 2014, p. 818–33.

Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW. Selective search for object recognition. Int J Comput Vision. 2013;104(2):154–71. https://doi.org/10.1007/s11263-013-0620-5 .

Wu J. Complexity and accuracy analysis of common artificial neural networks on pedestrian detection. In MATEC Web of Conferences, Vol. 232. EDP Science; 2018, p. 01003.

He K, Zhang X, Ren S, Sun J. Identity mappings in deep residual networks. In: European conference on computer vision. Cham: Springer; 2016, p. 630–45.

Xu D, Wu Y. Improved YOLO-V3 with DenseNet for multi-scale remote sensing target detection. Sensors. 2020;20(15):4276.

Butt UA, Mehmood M, Shah SBH, Amin R, Shaukat MW, Raza SM, Piran M. A review of machine learning algorithms for cloud computing security. Electronics. 2020;9(9):1379.

Ketkar N, Santana E. Deep learning with Python, vol. 1. Berkeley: Apress; 2017.

Jiang R, Lin Q, Qu S. Let blind people see: real-time visual recognition with results converted to 3D audio. Report No. 218, Stanford University, Stanford, USA; 2016.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015, p. 1–9.

Zhao L, Li S. Object detection algorithm based on improved YOLOv3. Electronics. 2020;9(3):537.

Syed NR. A PyTorch implementation of YOLOv3 for real time object detection (Part 1). [Internet] [Updated Jun 30 2020]. https://nrsyed.com/2020/04/28/a-pytorch-implementation-of-yolov3-for-real-time-object-detection-part-1/ . Accessed 02 Feb 2021.

Ethan Yanjia Li. Dive really deep into YOLOv3: a beginner’s guide. [Internet][Posted on December 30 2019] Available at https://yanjia.li/dive-really-deep-into-yolo-v3-a-beginners-guide/ . Accessed 31 Jan 2021.

COCO. [Internet]. https://cocodataset.org/#explore . Accessed 28 Oct 2020.

Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A. The pascal visual object classes challenge: a retrospective. Int J Comput Vision. 2015;111(1):98–136.

Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A. The pascal visual object classes (voc) challenge. Int J Comput Vision. 2010;88(2):303–38.

Huang YQ, Zheng JC, Sun SD, Yang CF, Liu J. Optimized YOLOv3 algorithm and its application in traffic flow detections. Appl Sci. 2020;10(9):3079.

Bochkovskiy A, Wang CY, Liao HYM. Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.

Not applicable.

SS: Research and Implementation of YOLO Algorithm. Comparative Analysis. AVD: Research and Implementation of Faster RCNN Algorithm. Comparative Analysis. CA: Research and Implementation on Faster RCNN Algorithm. Comparative Analysis. IN: Research and Implementation of SSD Algorithm. Comparative Analysis. VK: Research and Implementation on SSD Algorithm. Comparative Analysis. VP: Verification of results obtained through implementations. Approval of final manuscript.

A Comparative Analysis of Two Published Research Papers

Bex Ferriday

Related Papers

Ann T Conway

This paper highlights the necessary application of mixed methods in educational research within the field of adult learners and lifelong learning in higher education to provide adult learners with a “voice”. The paper will introduce the reader to educational research through focusing on mixed methods and the theoretical frameworks within. Critical hermeneutic epistemology or transformative research (as espoused by Freire

Mignonne Breier

Abstract This paper presents an analytic framework whereby the relationship between formal and informal knowledge can be viewed in a pedagogic context. The framework is the result of a dialectical conversation between the theories of Bernstein and Dowling, primarily, and the empirical data from case studies of two university courses in Labour Law.

Christopher Wibberley

At Manchester Metropolitan University, the concept of the independent autonomous learner is at the heart of institution changes in the learning, teaching and assessment processes and the implementation of an institutional Managed Learning Environment (MLE). In the Physiotherapy programme team we have conducted a mixed methods evaluation of the provision of online resources that aim to facilitate autonomy from the beginning of the programme and are delivered via the WebCT VISTA component of our MLE. Primarily, we investigated “facilitators” and “barriers” to uptake and use of these resources by students. Overall, students reported a very positive experience of online activities, with a broad range of factors influencing uptake and engagement. Extrinsic factors related mainly to technical (e.g. home PC setup) and administrative (student enrolment, network access and support) difficulties. These had less impact on our study’s metrics than intrinsic factors such as autonomy, motivation and IT skills. Our evaluations have also highlighted a mismatch between the programme’s aspirations and student perspectives of autonomy. We have made links between the levels of autonomy, motivation and IT skills of our students and considered ways of addressing these issues within the Physiotherapy curriculum. As a result we are in the process of devising a new induction programme which aims to provide “scaffolding” that will motivate our students and assist their development as independent autonomous learners.

Title: Bridging the Gap in Online Hate Speech Detection: A Comparative Analysis of BERT and Traditional Models for Homophobic Content Identification on X/Twitter

Abstract: Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, and traditional machine learning methods, we developed a nuanced approach to identify homophobic content on X/Twitter. This research is pivotal due to the persistent underrepresentation of homophobia in detection models. Our findings reveal that while BERT outperforms traditional methods, the choice of validation technique can impact model performance. This underscores the importance of contextual understanding in detecting nuanced hate speech. By releasing the largest open-source labelled English dataset for homophobia detection known to us, an analysis of various models' performance and our strongest BERT-based model, we aim to enhance online safety and inclusivity. Future work will extend to broader LGBTQIA+ hate speech detection, addressing the challenges of sourcing diverse datasets. Through this endeavour, we contribute to the larger effort against online hate, advocating for a more inclusive digital landscape. Our study not only offers insights into the effective detection of homophobic content by improving on previous research results, but it also lays groundwork for future advancements in hate speech analysis.

Crypto-asset regulatory landscape: a comparative analysis of the crypto-asset regulation in the UK and Germany

  • Original Article
  • Open access
  • Published: 21 May 2024

  • Christoph Wronka (ORCID: orcid.org/0000-0001-5074-433X)

The purpose of this research paper is to compare and analyse how crypto-assets are regulated in the UK and Germany. The aim is to understand and highlight the approaches taken by these two countries in terms of regulating crypto-assets and to explore the potential impact that their regulatory frameworks could have on the market for these crypto-assets. The research employs a doctrinal research design to examine the crypto-asset regulatory regimes in the UK and Germany. A comprehensive review of existing literature, official regulatory documents and relevant legal frameworks is conducted to understand the core components of each country's crypto-asset regulations. The findings of this study reveal divergences in the regulatory approaches of the UK and Germany towards crypto-assets. While the UK has embraced a principles-based regulatory framework, fostering innovation and industry growth, Germany has adopted a more prescriptive and cautious approach, focusing on investor protection and market stability. The research identifies that the UK's flexible approach has attracted a flourishing crypto-asset ecosystem, while Germany's conservative stance has offered greater investor confidence. However, certain regulatory gaps and challenges persist in both jurisdictions, such as ambiguities in classification and tax treatment, requiring further attention.

In today's financial environment, “crypto-asset” denotes a digital or virtual asset that leverages encryption technology to protect and authenticate transactions and manage ownership changes. Blockchain or distributed ledger technologies serve as the underlying framework for the decentralization and immutability of crypto-assets. Crypto-assets stand out due to their cryptographic characteristics, which offer exceptional security and confidence without the requirement for conventional intermediaries like banks or financial organizations (Alaassar et al. 2023 ). These crypto methods facilitate safe Peer-to-Peer (P2P) exchanges and protect the authenticity of the underpinning blockchain or decentralized ledger.

From simple stores of value to sophisticated smart contract platforms, crypto-assets offer a multifaceted array of features. While Bitcoin (BTC) and Ethereum (ETH) are prominent cryptocurrencies, others serve diverse roles, including acting as utility tokens or security tokens. Decentralized applications enable utility tokens to unlock access to services or goods for users. In comparison, security tokens operate similarly to conventional financial securities by representing ownership stakes in tangible assets. Therefore, these tokens offer investors a novel means of investing (Allen et al. 2022 ; Ariesmansyah 2022 ). Crypto-assets have ushered in a new era for finance, with both exciting prospects and daunting challenges. By leveraging technology, they have simplified global commerce, broadened financial inclusiveness and created new investment avenues. Certain crypto-assets' programmability paved the way for innovative automated agreements. For example, smart contracts can execute themselves with minimal human oversight.

The dynamic nature of the crypto space has led to worries regarding compliance and protection. Globally, governments and regulators face the task of overseeing these assets carefully. Protecting investors and halting unlawful activities such as fraud, money laundering (ML) and terrorist financing (TF) are pressing challenges (Baiod et al. 2021 ). Moreover, the relatively nascent and highly speculative nature of some crypto-assets has resulted in significant price volatility, sparking debates over their long-term viability and suitability as stable stores of value or mediums of exchange.

The paper focuses on the regulatory frameworks of Germany and the UK. Both countries have recently adopted or implemented approaches for regulating cryptocurrencies. However, while the two frameworks share many similarities, they are heterogeneous. This is consistent with a recent study by the Cambridge Centre for Alternative Finance (Cambridge CAF), which reviewed cryptocurrency regulations in different countries and found heterogeneity as well as a lack of clarity in the regulatory frameworks for crypto-assets across jurisdictions (Cambridge CAF 2019 ). Germany has adopted a proactive approach to regulating cryptocurrencies and passed a law in 2020 that requires all cryptocurrency exchanges operating in Germany to acquire a license from the Federal Financial Supervisory Authority (BaFin) (Armata 2023 ). This means that Germany’s regulatory framework is technology neutral. On the other hand, the UK government has no plans to design a special set of regulations for crypto-assets. Rather, it stated that it would regulate some crypto-assets by classifying them as “specified investments”, which are already regulated under current rules (Ross and Cavill 2023 ). This means that the UK regulatory framework is principle-based. In its consultation, the UK stated that it would be guided by the “same risk, same regulatory outcome” principle in establishing the regulatory framework for crypto-assets (HM Treasury 2023 ). Whereas the principle-based approach is applauded for fostering innovation and technology, a technology-neutral approach may pose barriers for entrepreneurs.

Since the bill to regulate and recognize crypto as a regulated financial activity was passed in June 2023, the market for crypto-assets has grown significantly. Based on raw cryptocurrency transaction volume, the UK has become the world's third-largest crypto economy after the USA and India (Vardai 2023 ). Before the regulatory changes in 2022, the revenue obtained from cryptocurrencies in the UK averaged $0.89 billion. After the regulatory changes, revenue from the UK crypto-asset market grew to $1.94 billion. Revenue in the UK’s cryptocurrency market is projected to reach $2.53 billion by the close of 2024, a 30.1% change in revenue (Statista 2023 ). A report by Chainalysis shows that the UK ranked 14th globally on the overall adoption index in 2023 (Chainalysis 2023 ). In 2021, before the regulatory changes, the country ranked 21st (de Best 2023 ). Germany also experienced a surge in the adoption of crypto-assets following the adoption of its regulatory framework in 2020. However, its growth in crypto-asset adoption is lower than that of the UK. According to a bi-annual poll conducted by Ding, a top fintech business, Germany's cryptocurrency adoption was 8% by 2021. This was far lower than the global average, which was 14% by late 2021 (Ngari 2023 ).

Research methodology

The researcher adopted a qualitative approach involving doctrinal research (as defined by Mann 2017 ) to compare and explore the frameworks for crypto-assets in both the UK and Germany. This approach allowed a thorough examination of the regulations in each country, including their strengths, weaknesses, similarities and differences. To gather data, the researcher reviewed legal documents such as statutes, regulations, guidelines, official reports and policy papers related to crypto-asset regulation in the UK and Germany. The collected data were thematically analysed to identify recurring themes, patterns and key regulatory elements. To ensure the accuracy and dependability of the findings, a triangulation method was employed, incorporating data sources such as documents and case studies. This analysis aided in understanding the factors that influence crypto-asset regulations in each country. However, as the crypto industry is constantly changing, the regulatory environments in both countries may evolve during the research process, and some recent developments might not be fully captured.

Importance of crypto-asset regulation

Regulations surrounding crypto-assets are important, especially considering the recent rapid growth and development of these assets. These regulations aim to tackle the risks and obstacles associated with crypto-assets. At the same time, they strive to create a transparent market environment for all participants involved. Firstly, crypto-asset regulation is instrumental in mitigating systemic risks and safeguarding financial stability. This is consistent with van der Linden and Shiraz ( 2023 ), who identify four goals that the Regulation on Markets in Crypto-Assets (MiCA Regulation), alongside other legislative frameworks, aims to accomplish. According to the authors, the first goal is to offer legal certainty: to grow crypto-asset markets, a solid legal framework is required that clearly outlines the rules that apply to all crypto-assets not covered by present financial legislation. The second goal entails establishing a legal framework that is not only safe but also proportionate, one that encourages innovation and fair competition (van der Linden and Shiraz 2023 ). The third goal is to put in place sufficient levels of consumer and investor protection in order to eliminate the risks that crypto-assets may pose to the internal market. The fourth goal is to achieve market stability (van der Linden and Shiraz 2023 ). In this regard, the European Commission stated that stablecoins have the potential to be widely accepted and lead to systemic risks (European Commission 2020 ).

The decentralized nature of crypto-assets and their borderless nature can amplify the potential impact of market fluctuations, leading to heightened volatility and potential contagion effects in the broader financial system (Baker et al. 2023 ). Appropriate regulation can help monitor and manage these risks, establishing mechanisms to prevent market manipulations, fraud and the abuse of crypto-assets for illicit activities such as ML and TF.

Furthermore, it is crucial to establish frameworks that aim to safeguard investors from harm. Given the limited investor protections and oversight in the market compared to traditional financial markets, individuals face significant risks such as cyber theft, scams and fraudulent schemes. To address these concerns, it becomes imperative for authorities to implement regulations that promote transparency during offerings, strengthen disclosure requirements and enforce robust cybersecurity measures to protect users’ assets and personal information. In its recently finalized global regulatory framework for crypto-asset activities, the Financial Stability Board (FSB) recommended high-level regulation, oversight and supervision of crypto-asset activities and markets. The framework is founded on the principle of “same activity, same risk, same regulation” and provides a solid foundation for ensuring that stablecoin and crypto-asset activities are subject to continuous and thorough regulation that is proportionate to the risks they pose, while also encouraging responsible innovations that may be brought about by technological advancement (FSB 2023). Existing literature notes that the nature of cryptocurrencies makes it difficult to distinguish between financial and technological risks (Dumas et al. 2021). Because the exchange rate between a cryptocurrency and a fiat currency is governed by supply and demand, it is very volatile and unpredictable (Woebbeking 2021). This makes “investing” in cryptocurrencies a risky business, fuelling calls for regulations to safeguard people from deceptive ads and scams.

Besides, regulatory oversight fosters investor confidence and market trust. As crypto-assets continue to gain mainstream attention, attracting institutional investors and retail participants, establishing clear rules and guidelines can reduce uncertainty and foster confidence in the market. A well-regulated crypto-asset ecosystem can attract greater institutional interest, leading to increased liquidity and more mature and stable markets (Bellavitis et al. 2021 ).

Another critical aspect of crypto-asset regulation is the prevention of illicit financial activities. Cryptocurrencies' pseudonymous nature can facilitate illicit transactions, raising concerns for law enforcement and financial intelligence units. Regulators can effectively fight against the misuse of crypto-assets by implementing measures that prevent ML and TF, which is crucial for maintaining the integrity of the system.

In addition, regulatory frameworks play a role in promoting innovation and responsible growth within the crypto-asset industry. Clear guidelines can provide entrepreneurs, start-ups and established companies with a conducive environment for creating innovative applications and services. A balanced approach to regulation can foster a competitive marketplace while safeguarding against excessive risk-taking and speculative behaviour.

Nevertheless, it is essential to strike a delicate balance in crafting crypto-asset regulations. Overly burdensome or restrictive measures could stifle innovation and deter legitimate businesses from participating in the sector (Bellucci et al. 2022 ). Striking the right balance between regulation and innovation is crucial for nurturing the potential benefits that blockchain technology and crypto-assets can bring to various industries.

In today's ever-evolving global financial landscape, crypto-assets have gained remarkable popularity and significance. Understanding how various jurisdictions handle their regulation has become crucial. This analysis delves into the regulatory approaches of the UK and Germany, aiming to unveil similarities, differences, strengths and areas for potential improvement in their frameworks. Ultimately, the study provides valuable insights into the effectiveness and adaptability of crypto-asset regulation in these two prominent European markets.

Regulatory frameworks

Overview of the UK's approach to crypto-asset regulation

Amidst the dynamic world of crypto-assets, the UK has tried to distinguish itself as an innovator in tackling their complexities by creating a robust regulatory structure. The UK's regulatory framework for crypto-assets necessitates cooperation among government entities, financial watchdogs and industry participants (Bellucci et al. 2022). Each crypto-asset's distinct traits and intended function determine how it is regulated in the UK. The Financial Conduct Authority (FCA) has taken a prominent position in defining the categorization of crypto-assets and the level of regulatory supervision needed. The UK focuses strongly on tackling illegal activities within the crypto-asset sphere. Parties involved in crypto-asset-related pursuits are required to abide by AML and CFT rules. Crypto-asset firms must be registered with the FCA, which includes following rigorous customer due diligence (CDD) processes (Blandin et al. 2019). These measures aim to minimize the risk of unlawful activities such as ML and TF. The UK's proactive stance in this domain aligns with global efforts to strengthen the integrity of the financial system and prevent the illicit use of crypto-assets.

Protecting investors is a pivotal aspect of the UK's approach to crypto-asset regulation. The FCA consistently works towards enhancing transparency and promoting fair practices within the sector. Crypto companies are obligated to disclose information so that investors have all the details about the risks involved in their investments (FCA 2023a, 2023b). Moreover, the FCA has the power to step in swiftly if there’s suspected harm to investors and take actions to enforce compliance.

Acknowledging the importance of fostering innovation in the crypto-asset space, the UK has established a regulatory sandbox (FCA 2022 ). This initiative allows crypto-asset firms to test new products and services in a controlled environment, exempt from certain regulatory requirements. The sandbox provides a safe space for businesses to experiment and refine their offerings while closely engaging with regulators to address potential risks and compliance challenges (FCA 2022 ).

The taxation of crypto-assets in the UK is well defined and thorough. Both individuals and businesses involved in crypto-related activities are required to meet tax obligations. For individuals, this includes Capital Gains Tax (CGT), while companies are subject to Corporation Tax (Bollaert et al. 2021). To provide clarity and promote compliance in tax reporting, HM Revenue & Customs (HMRC) has released precise guidelines. These efforts contribute significantly to establishing crypto-assets as assets and integrating them into financial practices.

Classification of crypto-assets under UK law

The UK's regulatory landscape for crypto-assets has evolved significantly to accommodate the growing popularity of cryptocurrencies and tokens. To foster a clear understanding and provide sufficient investor protection, the UK has classified crypto-assets into three main categories: exchange tokens, security tokens and utility tokens (Draganidis 2023). Each classification holds distinct characteristics and regulatory considerations, enabling stakeholders to navigate the crypto-asset space with greater clarity and confidence.

Cryptocurrencies are classified as exchange tokens. These tokens are primarily designed to be used as a means of conducting transactions and holding value. Known examples of exchange tokens include BTC and ETH.

Security tokens represent the second category of crypto-assets in the UK. Unlike exchange tokens, these tokens are classified as securities, as they derive their value from underlying assets or investment contracts (Ferreira and Sandner 2021). Security tokens can represent interests such as ownership rights, equity shares, debt obligations or even profit sharing in a company or project. Because of this, they are governed by the regulations pertaining to securities. The issuance and trading of security tokens must adhere to compliance requirements, such as providing prospectus disclosures and obtaining authorization from the FCA.

The third category of crypto-assets in the UK comprises utility tokens. These tokens serve as access keys or units of account within a specific digital ecosystem or platform. Unlike security tokens, utility tokens do not possess inherent investment characteristics, and their primary purpose is to provide access to services or functionalities within a decentralized network. For instance, some utility tokens enable users to access features, obtain discounts or pay for services on blockchain-based platforms. Since they do not fall under the definition of securities, utility tokens generally have less stringent regulatory requirements in comparison with security tokens (Garanina et al. 2022). The UK's classification of crypto-assets aims to strike a balance between promoting innovation, protecting investors and maintaining stability. The FCA plays a central role in overseeing and enforcing regulations in the crypto-asset market. However, regulating this fast-paced and ever-evolving space poses challenges for regulators, as they need to adapt their frameworks to address emerging risks and technologies.

Registration and licensing requirements

For crypto-asset businesses to operate legally, they are required to register with the FCA. Registration is mandatory for any activities involving the exchange or conversion of cryptocurrencies, encompassing the facilitation of buying, selling and trading these crypto-assets (Gundur et al. 2021 ).

The services offered by a crypto-asset company may dictate additional licensing requirements beyond the initial registration process. Companies involved in safeguarding or managing cryptocurrencies should secure additional approvals from regulatory bodies. This enhanced monitoring is intended to foster investor confidence in the crypto-asset environment, which is vital to its long-term viability. A thorough assessment of the registration and licensing process for crypto-asset businesses involves evaluating their compliance with AML and CFT standards. The FCA meticulously evaluates the efficacy of an organization's rules and processes, including those for identifying and preventing economic crime and its protocols for customer due diligence (Huang 2021). In addition to AML and CFT compliance, crypto companies need to adhere to other applicable laws, such as data security legislation and consumer protection rules. These regulations ensure that the personal and financial information of users is handled with the utmost security, which provides them with adequate protection against potential risks associated with crypto-asset transactions. Operators who disregard registration or licensing regulations risk facing legal ramifications. The possible consequences range from modest fines to severe sanctions, such as suspension of business. It is therefore crucial for crypto-asset businesses to stay informed about the ever-changing regulatory landscape and to maintain a robust compliance framework. To operate responsibly and lawfully in the UK market, considering the rapidly evolving nature of the crypto industry, businesses in this sector must continuously monitor updates from the FCA and other regulatory authorities. By doing so, they can ensure ongoing compliance with the evolving regulations.
Seeking legal and professional advice can also be instrumental in navigating the complexities of the registration and licensing process, thus allowing businesses to meet the requisite regulatory standards effectively. By adhering to these requirements, crypto-asset businesses contribute to the overall stability and legitimacy of the crypto market, fostering a conducive environment for both businesses and investors in the UK.

AML and CFT regulations

The foundations of AML and CFT laws can be traced back to the 1980s, with groups like the FATF being set up to combat ML and related financial offenses (Tiwari et al. 2020). In the UK, the Proceeds of Crime Act 2002 (PoCA) and the Money Laundering Regulations (MLR) function as the primary legislation governing AML responsibilities for financial services providers (Preller 2008). These laws were originally designed for conventional fiat transactions, but the transition to the digital era necessitated their modification for application in the crypto-asset market. In response to the rapid expansion of crypto-assets and the concomitant transformation of the global financial landscape in recent times, the UK adjusted its AML and CFT framework to encompass crypto-related operations. These regulations mandate that crypto-asset companies conduct CDD procedures. This involves verifying the identities of their users and assessing the risk of potential illicit activities (Kim 2023).

Despite the UK's concerted efforts to incorporate AML and CFT regulations into the crypto-asset sector, several challenges persist. Notably, the pseudonymous nature of many cryptocurrencies presents difficulties in ascertaining the true identity of users engaged in transactions (Kostoula 2023 ). Furthermore, the global and decentralized nature of crypto-asset exchanges can result in discrepancies in regulatory enforcement across jurisdictions. Overcoming such challenges may require enhanced international cooperation to achieve more comprehensive results in combating financial crime.

The effectiveness of AML and CFT regulations in the crypto-asset industry is constantly under scrutiny. The related regulatory measures have contributed to improving transparency and reporting standards. Since crypto-asset technologies are always evolving, regulatory frameworks need to adapt. It is important to strike a balance between encouraging innovation and stopping illegal activities to ensure the long-term existence of the crypto-asset market in the UK (Kutera 2022). Additionally, technologies such as advanced analytics have the potential to make AML compliance more efficient and to streamline investigations. While the UK's approach to crypto-asset regulation emphasizes investor protection and market stability, there are varying perspectives on its effectiveness and implications.

Overview of Germany's approach to crypto-asset regulation

Germany has adopted a tech-agnostic approach. This means recognizing that the regulatory treatment of crypto-assets should be tailored to specific situations instead of employing a uniform approach (Ferreira and Sandner 2021). Distinct categories of crypto-assets are recognized, with corresponding regulatory frameworks adapted to suit each one. Payment tokens are commonly recognized as a form of “currency”. Others, like security tokens, are under tighter legal restrictions: for example, if security tokens are sold to private investors, companies need a securities prospectus, which must be approved by BaFin.

However, Germany has demonstrated a cautious yet supportive approach towards decentralized finance (DeFi). While acknowledging the transformative potential of DeFi, regulators have emphasized the need to ensure compliance with existing laws and to address concerns related to investor protection and financial stability.

In the future, Germany's approach to regulating crypto-assets is expected to remain responsive to the evolving technology landscape and global advancements. The country aims to find an equilibrium between embracing the benefits of blockchain and crypto-assets and ensuring financial stability and safeguarding the interests of investors. As the market for crypto-assets continues to develop, Germany is likely to engage in discussions with industry experts and partners to establish a strong and efficient regulatory framework for this emerging sector (Ferreira and Sandner 2021).

Classification of crypto-assets under German law

To address the regulatory challenges presented by crypto-assets, German authorities have taken steps to establish a comprehensive legal framework. In Germany, most crypto-assets are regulated as financial instruments. The dynamic and evolving nature of these crypto-assets has led authorities and legal scholars to grapple with the appropriate categorization and regulatory framework. In essence, crypto-assets in Germany can be grouped into three main categories:

Payment tokens (cryptocurrencies), like BTC and ETH, are seen as a type of currency. They are commonly used for transactions and investments and are acknowledged as a form of payment. In principle, these tokens do not constitute securities within the meaning of the Securities Prospectus Act (Wertpapierprospektgesetz—WpPG) or investments within the meaning of the Investment Act (Vermögensanlagengesetz—VermAnlG), but they are financial instruments under the Banking Act (Kreditwesengesetz—KWG) (BaFin 2023). Additionally, they are not considered legal tender, and no value-added tax (VAT) is imposed when they are used for payments (IHK Munich 2023).

Security tokens might be categorized as financial instruments according to the German Securities Trading Act (Gesetz über den Wertpapierhandel—WpHG) (BaFin 2023). These tokens typically represent ownership rights or other financial interests in an underlying asset, such as company shares or debt securities. Consequently, they are subject to specific regulations, including prospectus requirements, custody regulations and measures to protect investors (Quamara and Singh 2022). In addition, there are some special forms, such as Kryptowertpapiere (crypto securities) according to the German Electronic Securities Act (Gesetz zur Einführung elektronischer Wertpapiere—eWpG) and Kryptofondsanteile (crypto fund units) according to the German Regulation on Crypto Fund Units (Verordnung über Kryptofondsanteile—KryptoFAV) (BaFin 2022).

Other crypto-assets are categorized as electronic money (E-Money) under the German Payment Services Supervision Act (Zahlungsdiensteaufsichtsgesetz—ZAG). This classification applies when a token represents a claim against the issuer that can be used for payment transactions and is issued against the receipt of funds. To safeguard users' interests and ensure financial stability, E-Money is subject to certain prudential requirements, including capital and liquidity rules.

Classifying crypto-assets under German law is not always straightforward, as the legal status of individual tokens depends on their specific features and functionalities. Sometimes, hybrid forms of crypto-assets blur the lines between these categories, leading to further complexities in their treatment.

Additionally, the German government has been actively participating in discussions at the European Union level to establish regulations for crypto-assets across all member states. The goal is to foster investor protection, ensure market integrity and maintain financial stability while promoting innovation and technological advancements in the crypto-asset space.

In Germany, companies providing crypto-asset services must acquire licenses and follow operational regulations, which are designed to protect investors, maintain financial stability and comply with measures against ML and TF. These regulations demonstrate the country’s dedication to creating a transparent environment for the growing crypto-asset industry while minimizing risks related to crypto-assets.

BaFin oversees the licensing process for crypto-asset service providers in Germany, acting as the competent authority for financial regulation. Under the updated regulatory framework, providers offering services related to crypto-assets, such as cryptocurrency exchange platforms, wallet services and custody solutions, must seek authorization from BaFin before commencing operations. This authorization process ensures that only legitimate and trustworthy providers are permitted to engage in crypto-asset-related activities, safeguarding investors from fraudulent practices (Renduchintala et al. 2022 ).

To acquire the required license, providers must have their company's headquarters located within the EU. Furthermore, sufficient capital must also be available, and liquidity must be proven. Besides, a detailed business plan must be submitted, as well as the organizational structure and a detailed description of the planned internal control procedures (BaFin 2020).

In addition, companies need to show their capability to meet AML and CFT obligations. This involves setting up procedures to verify investor identities through know your customer (KYC) protocols, as well as implementing advanced systems for monitoring transactions to detect and report any suspicious activities (BaFin 2020). By following these regulations, providers of crypto-asset services play a significant role in preventing illegal financial activities and ML, thus strengthening the integrity of the overall financial system.

Furthermore, crypto-asset service providers are not only required to obtain licenses but are also expected to uphold stringent IT-/cybersecurity measures. These measures aim to safeguard both their platforms and the assets of their investors against cyber threats. Maintaining the confidentiality, integrity and availability of information is of high importance in an industry where crypto-assets can be vulnerable to cyberattacks (BaFin 2020 ).

In addition to security measures, crypto-asset service providers must adopt adequate risk management practices. This involves embracing transparent and sound business models that minimize operational risks and regularly conducting risk assessments to identify potential vulnerabilities (BaFin 2020 ). Proactive risk management helps service providers to mitigate the impact of unforeseen events and promotes market stability.

Germany has emphasized safeguarding the interests of investors, making it mandatory for crypto-asset service providers to offer precise information regarding the risks involved in crypto investments. This requirement allows investors to make informed choices and protects them from financial risks caused by deceitful practices. To foster accountability and transparency, crypto-asset service providers are required to maintain comprehensive records of transactions and financial activities. This meticulous record-keeping is essential for regulatory oversight and audit purposes, enabling authorities to monitor compliance with established regulations.

The German Criminal Code (Strafgesetzbuch—StGB), the Code of Criminal Procedure (Strafprozessordnung—StPO), the Money Laundering Act (Geldwäschegesetz—GwG) and supplementary legislation form the basis of the AML legislation. As Germany is an EU member state, German law is influenced by EU standards and regulations for AML and CFT (Meyer et al. 2022). In 2020, the implementation of the 5th AMLD extended the reach of AML and CFT regulations to include providers of crypto-asset services (Racetin et al. 2022). This introduced KYC procedures and mandatory reporting obligations to the industry (Renduchintala et al. 2022).

There is an emphasis on enforcing regulations to ensure that the crypto-asset industry adheres to strict AML and CFT practices. These regulations aim to prevent activities such as ML, TF and other illegal actions that may be associated with crypto-assets.

CDD protocols are an indispensable element of AML guidelines for German crypto-asset firms. To adhere to the new requirements, virtual asset service providers (VASPs) must obtain and examine identification documents from their clients, as well as the sources of their funds. Furthermore, these firms need to monitor and report any suspicious transactions to the German Financial Intelligence Unit (Zentralstelle für Finanztransaktionsuntersuchungen—FIU). This obligation holds great significance in detecting and preventing unlawful transactions involving cryptocurrencies. These enterprises must adopt strict documentation methods to fulfil legal obligations. All relevant data points must be duly documented, enabling effective reviews and sample testing as required. However, regulatory requirements for crypto-based ventures in Germany are subject to continuous change. Therefore, these entities must stay up-to-date with the latest AML and CFT requirements and any changes in the regulatory framework that may impact their operations. In particular, EU regulations such as the Markets in Crypto-Assets Regulation (MiCAR) and the amendment of the Transfer of Funds Regulation (ToFR) will have a significant impact on the regulatory frameworks within the EU member states, and thus also on Germany.

The domain of crypto-assets witnesses remarkable distinctions in regulatory scope and definition between the UK and Germany. In the UK, crypto-assets fall under the purview of the FCA and are categorized as specified investments. This classification encompasses various crypto-asset types, including security tokens, utility tokens and cryptocurrencies (payment tokens) like BTC and ETH. Such a systematic approach provides much-needed clarity and enables effective oversight (Özelli 2021 ).

Conversely, Germany adopts a technology-neutral standpoint, defining crypto-assets mostly as financial instruments governed by the German Banking Act. While this broad coverage facilitates comprehensive regulation of diverse crypto-assets, it also introduces challenges in differentiating between crypto-assets and traditional financial instruments, potentially giving rise to ambiguity.

Licensing and registration requirements

Divergent paths are evident in the licensing and registration requirements of crypto-asset businesses in the UK and Germany. In the UK, entities like exchanges and custodian wallet providers must only register with the FCA to adhere to AML regulations. Additionally, the registration process is relatively straightforward, paving the way for swifter market entry (Tello-Gamarra et al. 2022 ).

In contrast, Germany requires crypto-asset businesses to obtain a license from BaFin to operate legally. The licensing process entails rigorous adherence to various regulatory requirements, prioritizing the fight against ML and investor protection. This more demanding approach may discourage smaller businesses from entering the market due to the associated costs and complexities.

Effectively combating ML and TF represents a paramount concern within the crypto-asset industry. In the UK, AML and CFT measures are woven into the registration process under the MLR. The FCA meticulously monitors crypto-asset businesses to ensure their adherence to these vital regulations, fostering a safe environment for all stakeholders.

Germany addresses AML and CFT concerns through its strict licensing process for crypto-asset businesses. BaFin meticulously assesses and enforces robust AML and CFT protocols among licensed entities. However, the comprehensive requirements may make compliance challenging for small and innovative crypto-asset start-ups (També Bearpark 2022).

The two countries take different approaches to the taxation of crypto-assets. In the UK, HM Revenue and Customs (HMRC) provides clear guidelines on crypto-asset taxation, considering them as taxable assets subject to capital gains tax. This clarity simplifies the reporting process for taxpayers and therefore promotes compliance. In contrast, Germany treats the sale of cryptocurrencies as a private sales transaction for tax purposes if the earnings exceed Euro 600 per year or if they have been sold before 1 year of holding has elapsed (see, for example, Sect. 23 of the German Income Tax Act (Einkommensteuergesetz—EStG)). It must be noted that a lot is happening in the field of crypto-asset taxation in Germany at the moment. Nevertheless, discrepancies and ambiguities in the German tax code remain, posing challenges in accurate reporting and potentially fostering tax evasion (Zainutdinova 2023).

Investor protection

The safeguarding of investors is of paramount importance in any regulatory framework concerning crypto-assets. In the UK, the FCA diligently enforces rules to protect investors from fraudulent activities, market manipulation and misleading information. Additionally, the FCA's regulatory sandbox fosters responsible innovation by allowing businesses to experiment with the latest crypto-asset solutions within a controlled environment.

Germany, on the other hand, emphasizes investor protection through BaFin's stringent regulations. Striving to minimize risks for private investors, Germany has implemented strict rules on advertising and disclosure requirements to prevent scams and fraudulent schemes. Nevertheless, the highly administrative nature of the German regulatory system may impede swift responses to emerging challenges.

In conclusion, the UK and Germany have laid the groundwork for comprehensive regulatory frameworks concerning crypto-assets, each boasting its own set of merits and challenges. The UK's categorization system and user-friendly registration process foster innovation and market participation. Conversely, Germany's technology-neutral approach and rigorous licensing requirements prioritize investor protection but may pose barriers for smaller enterprises. As the crypto-asset industry continues its evolution, both countries must remain vigilant in adapting their regulations to address emerging challenges while nurturing innovation. Collaborative efforts and harmonization of regulations on an international level could play a pivotal role in establishing a global framework that balances innovation, investor protection and financial stability. Ultimately, an optimal regulatory landscape should encourage responsible growth and instil confidence in the crypto-asset industry.

Alaassar, A., A.L. Mention, and T.H. Aas. 2023. Facilitating innovation in FinTech: a review and research agenda. Review of Managerial Science 17 (1): 33–66.

Allen, J.G., Wells, H. and Mauer, M., 2022. Crypto-assets in Private Law: Emerging Trends and Open Questions from the First 10 Years. SMU Centre for AI & Data Governance Research Paper, (06).

Ariesmansyah, A., 2022. Creativity to innovation: What lesson learned from digital transformation in financial accountability in government practices? Budapest International Research and Critics Institute-Journal (BIRCI-Journal), 4 (4): 14061–14072.

Armata, R., 2023. Cryptocurrency in Germany: Is it regulated and safe? Idnow. Available at: https://www.idnow.io/blog/cryptocurrency-germany-regulations/#:~:text=On%20a%20national%20level%2C%20although,related%20to%20securities%20and%20investments (Accessed: 11 January 2024).

Baiod, W., J. Light, and A. Mahanti. 2021. Blockchain technology and its applications across multiple domains: A survey. Journal of International Technology and Information Management 29 (4): 78–119.

Baker, H.K., Benedetti, H., Nikbakht, E. and Smith, S.S. eds., 2023. The Emerald Handbook on Crypto-assets: Investment Opportunities and Challenges. Emerald Publishing Limited.

Bellavitis, C., C. Fisch, and J. Wiklund. 2021. A comprehensive review of the global development of initial coin offerings (ICOs) and their regulation. Journal of Business Venturing Insights 15: e00213.

Bellucci, M., D. Cesa Bianchi, and G. Manetti. 2022. Blockchain in accounting practice and research: systematic literature review. Meditari Accountancy Research 30 (7): 121–146.

Blandin, A., Cloots, A.S., Hussain, H., Rauchs, M., Saleuddin, R., Allen, J.G., Zhang, B.Z. and Cloud, K., 2019. Global crypto-asset regulatory landscape study. University of Cambridge Faculty of Law Research Paper, (23).

Bollaert, H., F. Lopez-de-Silanes, and A. Schwienbacher. 2021. Fintech and access to finance. Journal of Corporate Finance 68: 101941.

Bundesanstalt für Finanzdienstleistungsaufsicht (BaFin) 2023. Zweites Hinweisschreiben zu Prospekt- und Erlaubnispflichten im Zusammenhang mit der Ausgabe sogenannter Krypto-Token , available at: https://www.bafin.de/SharedDocs/Downloads/DE/Merkblatt/WA/dl_wa_merkblatt_ICOs.pdf?__blob=publicationFile&v=1

Bundesanstalt für Finanzdiestleistungsaufsicht (BaFin), 2022. Kryptotoken. Available at: https://www.bafin.de/DE/Aufsicht/FinTech/Geschaeftsmodelle/DLT_Blockchain_Krypto/Kryptotoken/Kryptotoken_node.html

Bundesanstalt für Finzanzdienstleistungsaufsicht (BaFin), 2020. Hinweise zum Erlaubnisantrag für das Kryptoverwahrgeschäft. Available at: https://www.bafin.de/SharedDocs/Veroeffentlichungen/DE/Merkblatt/BA/mb_Hinweise_zum_Erlaubnisantrag_fuer_das_Kryptoverwahrgeschaeft.html;jsessionid=A4C6EF3A91661D7A10A671AA3A8ECE6E.2_cid502?nn=13733456

Cambridge Center for Alternative Finance, 2019. Global crypto-asset regulatory landscape study. University of Cambridge.

Chainalysis, 2023. The 2023 global crypto adoption index: Central & southern Asia are leading the way in grassroots crypto adoption. Chainalysis. Available at: https://www.chainalysis.com/blog/2023-global-crypto-adoption-index/ (Accessed: 10 January 2024).

de Best, R., 2023. Crypto Adoption Index ranking of the United Kingdom (UK) from 2020 to 2023, by metric. Statista. Available at: https://www.statista.com/statistics/1362086/cryptocurrency-adoption-index-uk/ (Accessed: 11 January 2024).

Ding, 2021. Global survey reveals that cryptocurrency adoption penetration is at 14%. Available at: https://www.banklesstimes.com/cryptocurrency/crypto-adoption-stats-in-germany-data-shows-surging-interest-especially-among-institutional-investors/ (Accessed: 11 January 2024).

Draganidis, S. 2023. Jurisdictional arbitrage: combatting an inevitable by-product of crypto-asset regulation. Journal of Financial Regulation and Compliance 31 (2): 170–185.

Dumas, J.G., Jimenez-Garcès, S. and Șoiman, F., 2021, March. Blockchain technology and crypto-assets market analysis: vulnerabilities and risk assessment. In 12th International Conference on Complexity, Informatics and Cybernetics (Vol. 1, pp. 30-37).

European Commission., 2020. Proposal for a Regulation of the European Parliament and the Council on Markets in Cryptoassets, and amending Directive (EU) 2019/1937. Available at: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52020PC0593 (Accessed: 11 January 2024).

Ferreira, A., and P. Sandner. 2021. Eu search for regulatory answers to crypto-assets and their place in the financial markets’ infrastructure. Computer Law & Security Review 43: 105632.

Financial Conduct Authority (FCA), 2022. Regulatory Sandbox . Available at: https://www.fca.org.uk/firms/innovation/regulatory-sandbox

Financial Conduct Authority (FCA), 2023a. FCA introduces tough new rules for marketing cryptoassets. Available at: https://www.fca.org.uk/news/press-releases/fca-introduces-tough-new-rules-marketing-cryptoassets

Financial Conduct Authority (FCA), 2023b. Regulation of Digital Assets in the UK. Available at: https://www.fca.org.uk/news/speeches/regulation-digital-assets-uk

Financial Stability Board (FSB), 2023. FSB global regulatory framework for crypto-asset activities: Umbrella public note to accompany final framework. Financial Stability Board. Available at: https://www.fsb.org/wp-content/uploads/P170723-1.pdf (Accessed 14 January 2024).

Garanina, T., M. Ranta, and J. Dumay. 2022. Blockchain in accounting research: current trends and emerging topics. Accounting, Auditing & Accountability Journal 35 (7): 1507–1533.

Gundur, R.V., Levi, M., Topalli, V., Ouellet, M., Stolyarova, M., Chang, L.Y.C. and Mejía, D.D., 2021. Evaluating criminal transactional methods in cyberspace as understood in an international context.

Huang, S.S. 2021. Crypto-assets regulation in the UK: an assessment of the regulatory effectiveness and consistency. Journal of Financial Regulation and Compliance 29 (3): 336–351.

Industrie und Handelskammer Muenchen und Oberbayern (IHK), 2023. Blockchain - Besteuerung von Kryptowährungen [online]. Available at: https://www.ihk-muenchen.de/de/Service/Recht-und-Steuern/Blockchain-Kryptow%C3%A4hrung/#:~:text=UStG%20umsatzsteuerfrei%20ist.-,Entgelt%20und%20Umsatzsteuer,somit%20f%C3%BCr%20Umsatzsteuerzwecke%20nicht%20steuerbar .

Kim, S.J. ed., 2023.  Fintech, Pandemic, and the Financial System: Challenges and Opportunities . Emerald Publishing Limited.

Kostoula, T., 2023. Valuation of crypto-assets in EU insolvency: Challenges and prospects.  International Insolvency Review .

Kutera, M. 2022. Cryptocurrencies as a subject of financial fraud. Journal of Entrepreneurship, Management and Innovation 18 (4): 45–77.

Mann, T. 2017. Australian Law Dictionary 3 Oxford University Press Oxford

Meyer, E., Welpe, I.M. and Sandner, P.G. 2022. Decentralized finance—A systematic literature review and research directions. ECIS.

Ngari, S. (2023). Cryptocurrency adoption statistics in the UK. Available at https://www.banklesstimes.com/uk/buy-cryptocurrency/crypto-adoption . Accessed on 10 Jan 2024.

Özelli, T. 2021. The financial and conceptual foundations of intangible asset manager capitalism. Journal of Ekonomi 3 (1): 29–100.

Google Scholar  

Preller, S.F. 2008. Comparing AML legislation of the UK, Switzerland and Germany. Journal of Money Laundering Control 11 (3): 234–250.

Quamara, S., and A.K. Singh. 2022. A systematic survey on security concerns in cryptocurrencies: State-of-the-art and perspectives. Computers & Security 113: 102548.

Racetin, I., J. KilićPamuković, M. Zrinjski, and M. Peko. 2022. Blockchain-based land management for sustainable development. Sustainability 14 (17): 10649.

Renduchintala, T., H. Alfauri, Z. Yang, R.D. Pietro, and R. Jain. 2022. A survey of blockchain applications in the fintech sector. Journal of Open Innovation: Technology, Market, and Complexity 8 (4): 185.

Ross H. and Cavill, J., 2023. UK to legislate for crypto-asset regulatory regime. Available at: https://www.pinsentmasons.com/out-law/news/uk-legislate-cryptoasset-regulatory-regime (Accessed: 9 January 2024).

Statista, 2023. Cryptocurrencies—United Kingdom. Available at: https://www.statista.com/outlook/dmo/fintech/digital-assets/cryptocurrencies/united-kingdom (Accessed: 12 January 2024).

També Bearpark, N., 2022. Introduction to Anti-Money Laundering.  Deconstructing Money Laundering Risk: De-risking, the Risk-based Approach and Risk Communication , pp.1–43.

Tello-Gamarra, J., D. Campos-Teixeira, A.A. Longaray, J. Reis, and M. Hernani-Merino. 2022. Fintechs and Institutions: A Systematic Literature Review and Future Research Agenda. Journal of Theoretical and Applied Electronic Commerce Research 17 (2): 722–750.

Tiwari, M., A. Gepp, and K. Kumar. 2020. A review of money laundering literature: the state of research in key areas. Pacific Accounting Review 32 (2): 271–303.

Treasury, H. M. 2023. Future Financial Services Regulatory Regime for Cryptoassets: Consultation and Call for Evidence. Available at: https://assets.publishing.service.gov.uk/media/653bd1a180884d0013f71cca/Future_financial_services_regulatory_regime_for_cryptoassets_RESPONSE.pdf (Accessed: 12 January 2024).

van der Linden, T., and T. Shirazi. 2023. Markets in crypto-assets regulation: Does it provide legal certainty and increase adoption of crypto-assets?. Financial Innovation 9 (1): 22.

Vardai, Z., 2023. UK emerges as world’s third-largest economy in terms of crypto transaction volume: Chainalysis. Forkast. Available at: https://forkast.news/uk-emerges-worlds-third-largest-economy-crypto-transaction-volume-chainalysis/#:~:text=The%20U.K.%20is%20also%20listed,Chainalysis'%20Global%20Crypto%20Adoption%20Index.&text=The%20United%20Kingdom%20is%20the,on%2Dchain%20intelligence%20firm%20Chainalysis . (Accessed: 12 January 2024).

Woebbeking, F. 2021. Cryptocurrency volatility markets. Digital Finance 3 (3–4): 273–298.

Zainutdinova, E. 2023. Models of legal regulation of digital rights and digital currency turnover. Legal Issues in the Digital Age 4 (1): 93–122.

Author information

Authors and affiliations.

Deloitte Wirtschaftspruefungsgesellschaft GmbH, Dammtorstrasse 12, 20354, Hamburg, Germany

Christoph Wronka

Corresponding author

Correspondence to Christoph Wronka .

Ethics declarations

Conflict of interest.

The author declares no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Wronka, C. Crypto-asset regulatory landscape: a comparative analysis of the crypto-asset regulation in the UK and Germany. J Asset Manag (2024). https://doi.org/10.1057/s41260-024-00358-z

Revised : 17 April 2024

Accepted : 23 April 2024

Published : 21 May 2024

DOI : https://doi.org/10.1057/s41260-024-00358-z

The Attractiveness of European HE Systems: A Comparative Analysis of Faculty Remuneration and Career Paths. Research & Occasional Paper Series: CSHE.1.2023

Original Paper

  • Mikaël Chelli 1, MSc, MD
  • Jules Descamps 2, MSc, MD
  • Vincent Lavoué 1, MD
  • Christophe Trojani 1, MD, PhD
  • Michel Azar 1, MD
  • Marcel Deckert 3, PhD
  • Jean-Luc Raynier 1, MSc, MD
  • Gilles Clowez 1, MD
  • Pascal Boileau 1, MD, PhD
  • Caroline Ruetsch-Chelli, MSc, MD

Background: Large language models (LLMs) have raised both interest and concern in the academic community. They offer the potential for automating literature search and synthesis for systematic reviews but raise concerns regarding their reliability, as the tendency to generate unsupported (hallucinated) content persists.

Objective: The aim of the study is to assess the performance of LLMs such as ChatGPT and Bard (subsequently rebranded Gemini) in producing references in the context of scientific writing.

Methods: The performance of ChatGPT and Bard in replicating the results of human-conducted systematic reviews was assessed. Using systematic reviews pertaining to shoulder rotator cuff pathology, these LLMs were tested by providing the same inclusion criteria and comparing the results with original systematic review references, serving as gold standards. The study used 3 key performance metrics: recall, precision, and F1-score, alongside the hallucination rate. Papers were considered “hallucinated” if any 2 of the following pieces of information were wrong: title, first author, or year of publication.

Results: In total, 11 systematic reviews across 4 fields yielded 33 prompts to LLMs (3 LLMs×11 reviews), with 471 references analyzed. Precision rates for GPT-3.5, GPT-4, and Bard were 9.4% (13/139), 13.4% (16/119), and 0% (0/104) respectively ( P <.001). Recall rates were 11.9% (13/109) for GPT-3.5 and 13.8% (15/109) for GPT-4, with Bard failing to retrieve any relevant papers ( P <.001). Hallucination rates stood at 39.6% (55/139) for GPT-3.5, 28.6% (34/119) for GPT-4, and 91.3% (95/104) for Bard ( P <.001). Further analysis of nonhallucinated papers retrieved by GPT models revealed significant differences in identifying various criteria, such as randomized studies, participant criteria, and intervention criteria. The study also noted the geographical and open-access biases in the papers retrieved by the LLMs.

Conclusions: Given their current performance, it is not recommended for LLMs to be deployed as the primary or exclusive tool for conducting systematic reviews. Any references generated by such models warrant thorough validation by researchers. The high occurrence of hallucinations in LLMs highlights the necessity for refining their training and functionality before confidently using them for rigorous academic purposes.


The advent of artificial intelligence (AI) has led to significant advancements in various fields, including medical research. Large language models (LLMs), such as ChatGPT (OpenAI), could assist academic researchers in a variety of tasks, including writing scientific papers. These models have the potential to streamline the way researchers conduct literature searches, synthesize findings, and draft systematic reviews [ 1 ]. However, there is ongoing debate surrounding their reliability, ethical considerations, and appropriate use in academic publishing.

Recently, editorials and opinion papers have been published addressing the use of LLMs in the scientific community. One such example is an editorial in The Lancet Digital Health , which discusses the potential benefits and challenges of implementing AI in medical research [ 2 ]. As the application of LLMs such as ChatGPT in research settings grows, concerns have arisen regarding their accuracy, the potential for generating misleading or false information, and the ethical implications of using AI-generated content without proper disclosure.

While it is known that ChatGPT can help researchers write papers [ 3 - 5 ], controversy exists about whether it should be used at all, whether its use should be disclosed, and whether it should be listed as an author or not [ 6 ]. These debates raise important questions about the role of AI in scientific research and the potential consequences of using LLMs in generating systematic reviews and other research outputs [ 7 ].

In this study, we aim to address these concerns by systematically evaluating the reliability of ChatGPT and Bard (subsequently rebranded Gemini; Google AI) [ 8 ] in the context of searching for and synthesizing peer-reviewed literature for systematic reviews. We will compare their performance to that of traditional methods used by researchers, investigate the extent of the “hallucination” phenomenon, and discuss potential ethical and practical considerations for using ChatGPT and Bard in academic publishing. By providing evidence-based insights into the capabilities and limitations of LLMs in medical research, we hope to contribute to the ongoing debate about the role of AI in the research ecosystem and guide researchers in making informed decisions about using LLMs in their work.

Ethical Considerations

Ethics approval was not required, as human participants were not involved in this research. Consent for publication has been provided by all identifiable persons in the figures.

Study Design

This study followed a sequential design, chosen for its ability to progressively build on each preceding phase, thus ensuring a comprehensive evaluation of the LLMs in the context of a systematic review. The process began with a systematic review search on PubMed, followed by the retrieval of selected papers. Subsequently, the methodology of these papers served as input to the LLMs, which were tasked with searching for papers using the same inclusion criteria as the systematic reviews. The final phase involved a comparison of the LLM results with the systematic review references, which acted as the ground truth, thus providing a robust evaluation of the LLMs’ ability to replicate the results of human-conducted systematic reviews. The ethical considerations of using AI, specifically LLMs, in research were carefully evaluated.

Systematic Review Search on PubMed

On July 27, 2023, a literature search was performed on PubMed to find literature published in the English language during 2020. The selected year aligns with ChatGPT’s training cut-off point in September 2021, ensuring that the AI model has access to the comprehensive scope of literature for the given year. The focus was directed toward systematic reviews of randomized clinical trials pertaining to shoulder rotator cuff pathology. This prevalent condition spans multiple disciplines inclusive of surgery, anesthesiology, sports medicine, and physical therapy, thereby positioning it as an optimal candidate for this multidisciplinary appraisal. In addition, the collective clinical and scientific experience of the research team on the topic furnished a critical review of the references obtained from the PubMed search and the LLMs [ 9 - 12 ].

An electronic search of PubMed was conducted using a combination of keywords, including “shoulder,” “rotator cuff,” and “randomized” ( Multimedia Appendix 1 ). The search was restricted to papers published in 2020 and filtered to retrieve only systematic reviews and meta-analyses. Titles and abstracts were scrutinized, and papers indicating a systematic review of randomized studies on rotator cuff pathology were selected for further analysis.

Exclusion criteria were applied to eliminate papers that did not meet our study focus. Papers were excluded if they were not systematic reviews, if their primary concern did not pertain to rotator cuff pathology, if written in a language other than English, or if they included nonrandomized clinical studies.

Two independent reviewers (MC and PB) screened titles, abstracts, and full texts retrieved by this query. Differences between reviewers were reconciled with a third reviewer (JD). To ensure the selection of relevant systematic reviews, the reviewers applied exclusion criteria that consisted of systematic reviews including nonrandomized studies and papers that were not systematic reviews. The eligibility of the selected systematic reviews was further validated by assessing their adherence to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [ 13 ]. Additionally, the registration status of these reviews was verified in the PROSPERO database [ 14 ].

For each paper referenced in the systematic reviews, information on the paper title, author list, country (based on the first author’s affiliation on PubMed), journal name, journal date and issue, DOI, and open access status was collected. We assessed the hypothesis that LLMs may favor publicly available papers in their results by using a broad definition of “open access.” This definition included open access through the journal or any full-text PDF available on another server and accessible through a Google search (eg, ResearchGate or university website).

Systematic Review on LLMs: Paper Retrieval

For each new request, a fresh chatbot session was initiated to prevent any carryover effect from previous queries, ensuring the validity of the results. We prompted ChatGPT and Bard with a precise query to identify papers that could be included in the systematic review. The structure of the prompt consisted of a statement about the physician’s and researcher’s current work, followed by the inclusion criteria for the studies in the review ( Figure 1 ). The criteria specified randomized controlled trials with specific participant criteria and interventions comparing 2 different treatments. LLMs were asked to provide references to randomized studies on the topic, excluding papers published after 2020 and systematic reviews or meta-analyses. To assess the impact of the prompt’s specificity on the search results of LLMs, we tested 2 versions of the prompt for each request: one specifying the minimum number of papers to be found and the other omitting it. This allowed us to assess whether the presence or absence of a target number influences the LLMs’ search results. The query that led to the largest number of results was retained for this study.

For each paper provided by LLMs, information on the existence or hallucination status of the paper, authors’ list, country (based on the first author’s affiliation on PubMed), open-access status, inclusion in the original systematic review, randomization status, participant criteria adherence, intervention criteria adherence, exclusion of systematic reviews (as requested in the prompt), and accuracy of the provided information (authors’ list, journal, year and issue, title, and DOI) was collected. We also verified if the paper was published before 2021, as requested in the prompt.

Papers were considered hallucinated if any 2 of the following pieces of information were wrong: title, first author, or year of publication. The hallucination rate was calculated to quantify the proportion of LLM-generated references that were irrelevant, incorrect, or unsupported by the available literature, offering insights into the extent of spurious or inaccurate information production by the LLMs.
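The 2-of-3 rule above can be illustrated with a minimal Python sketch. The `Reference` type, the helper names, and the exact-match comparison after whitespace and case normalization are assumptions made for illustration; the study does not describe how matches were judged in practice, and the sample values are placeholders rather than real papers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    """Hypothetical container for the three fields named in the rule."""
    title: str
    first_author: str
    year: int


def _normalize(text: str) -> str:
    # Assumption: compare case-insensitively and ignore extra whitespace.
    return " ".join(text.lower().split())


def count_wrong_fields(proposed: Reference, actual: Reference) -> int:
    """Count how many of title, first author, and year disagree."""
    return sum([
        _normalize(proposed.title) != _normalize(actual.title),
        _normalize(proposed.first_author) != _normalize(actual.first_author),
        proposed.year != actual.year,
    ])


def is_hallucinated(proposed: Reference, actual: Optional[Reference]) -> bool:
    """Apply the rule: hallucinated if at least 2 of the 3 fields are wrong.

    Assumption: a reference with no candidate match at all counts as hallucinated.
    """
    if actual is None:
        return True
    return count_wrong_fields(proposed, actual) >= 2


# Placeholder records, not real papers.
real = Reference("Example trial title", "Doe", 2018)
wrong_year_only = Reference("Example trial title", "Doe", 2019)
wrong_title_and_year = Reference("Another title", "Doe", 2019)

print(is_hallucinated(wrong_year_only, real))
print(is_hallucinated(wrong_title_and_year, real))
```

Under this sketch, a reference with a single wrong field (for example, only the year) is still counted as an existing paper, which matches the threshold stated in the text.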

For noncomparative studies, the intervention criteria were considered adequate if at least 1 of the 2 interventions was studied in the proposed reference. For comparative studies, the intervention criteria were considered adequate if both interventions were studied in the proposed reference.

Comparison of LLMs Results

The sample size was determined based on an anticipated 10% rate of systematic review references overlooked by LLMs, with an assumed power of 90% and an α of .05. This calculation yielded a requisite of 80 references for the comparison. The PubMed search yielded 11 systematic reviews ( Figure 2 ), each with an average of 9.9 (SD 6.6; range 3-23) references. The evaluation of the LLMs was predicated on three widely used metrics: (1) recall, representing the proportion of genuinely pertinent papers from the original systematic reviews accurately identified and retrieved by the LLMs; (2) precision, quantifying the proportion of papers retrieved by the LLMs that are verifiably present in the original systematic reviews; and (3) F1-score, which serves as an aggregate metric encapsulating both the recall and precision values ( Table 1 ).

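$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$F_1\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP represents true positive, FN represents false negative, and FP represents false positive.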
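As a worked illustration of the three metrics, the following Python sketch computes them from raw counts. The function name is hypothetical, and the F1-score uses the standard harmonic-mean definition. The example counts are derived from the GPT-3.5 figures reported in the Results (13 of 139 returned references matched the original reviews, which contained 109 references in total), so FP = 139 − 13 and FN = 109 − 13.

```python
from typing import Tuple


def retrieval_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# GPT-3.5 counts as reported: 13 matches among 139 returned, 109 gold-standard references.
precision, recall, f1 = retrieval_metrics(tp=13, fp=139 - 13, fn=109 - 13)
print(f"precision={precision:.1%} recall={recall:.1%} F1={f1:.1%}")

# Bard retrieved none of the gold-standard references, so all three metrics are zero.
print(retrieval_metrics(tp=0, fp=104, fn=109))
```

The zero-denominator guards matter here because a model that returns no matching papers would otherwise raise a division error when the F1-score is computed.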

The LLMs incorporated in this study included GPT-3.5 (text-davinci-002-render-sha, July 19 version; OpenAI), GPT-4 (gpt-4-32k-0314, July 19 version; OpenAI) [ 15 ], and Bard (PaLM version 2.0, released on July 13, 2023; Google AI). We conducted chi-square tests to compare each piece of information extracted from LLMs’ responses, including authors’ nationalities and the open-access status of the retrieved papers. The significance threshold used was P <.05. Statistical analysis was performed with EasyMedStat (version 3.24).
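To make the chi-square comparison concrete, here is a minimal pure-Python sketch of Pearson's chi-square test of independence, applied to the hallucinated versus nonhallucinated counts reported in the Results. It is an illustration under stated assumptions, not the EasyMedStat procedure: the function names are hypothetical, no continuity correction is applied, and the closed-form P value exp(−χ²/2) is valid only for 2 degrees of freedom, which is the case for a 3×2 table.

```python
import math
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_two_df(stat: float) -> float:
    """Upper-tail P value of a chi-square distribution with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Rows: GPT-3.5, GPT-4, Bard. Columns: hallucinated, not hallucinated.
counts = [
    [55, 139 - 55],
    [34, 119 - 34],
    [95, 104 - 95],
]
stat, df = chi_square_independence(counts)
p = p_value_two_df(stat)
print(f"chi2={stat:.2f}, df={df}, P<.001: {p < 0.001}")
```

For tables with other dimensions, a statistics library would be needed to obtain the P value, since the closed form used here does not generalize beyond 2 degrees of freedom.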

In total, 11 systematic reviews were identified in 4 fields ( Table 2 ): physiotherapy (3 papers), sports medicine (3 papers), orthopedic surgery (3 papers), and anesthesiology (2 papers), leading to 33 prompts to LLMs (3 tested LLMs×11 systematic reviews). LLM prompts returned references in 32 of 33 cases: Bard did not return any result for the systematic review about “subacromial analgesia via continuous infusion catheter.” In most cases, the number of references returned by LLMs was greater than or equal to that of the original papers ( Table 2 ). Overall, 471 references were included in this study and analyzed.

Papers identified by LLMs were present in the original systematic reviews (precision) in 9.4% (13/139), 13.4% (16/119), and 0% (0/104) of cases for GPT-3.5, GPT-4, and Bard ( P <.001), respectively. Conversely, 11.9% (13/109) of papers from the systematic reviews (recall) were retrieved by GPT-3.5, and 13.8% (15/109) by GPT-4. No paper from the systematic reviews was retrieved by Bard ( P <.001; Table 3 ).

The hallucination rates were, respectively, 39.6% (55/139), 28.6% (34/119), and 91.3% (95/104) for GPT-3.5, GPT-4, and Bard ( P <.001). When analyzing the papers retrieved by GPT that were not hallucinated (n=84 for GPT-3.5 and n=85 for GPT-4), the following criteria were successfully identified ( Figure 3 ): randomized studies (33/84, 39% vs 42/85, 49%; P =.24), participant criteria (49/84, 57% vs 57/85, 67%; P =.24), intervention criteria (58/84, 69% vs 72/85, 85%; P =.03), not a systematic review (69/84, 81% vs 66/85, 78%; P =.73), and published before 2021 (84/84, 100% vs 85/85, 100%; P >.99). In total, 9 papers retrieved by Bard were not hallucinated. This sample was too small for further inferential statistics.

Regarding the same nonhallucinated papers retrieved by GPT, the following bibliographic information was considered accurate ( Figure 4 ): authors list (73/84, 87% vs 74/85, 87%; P >.99), journal title (81/84, 96% vs 85/85, 100%; P =.12), date and issue (71/84, 84% vs 81/85, 95%; P =.02), paper title (83/84, 99% vs 84/85, 99%; P >.99), and DOI (13/82, 16% vs 17/84, 20%; P =.59).

Open-access papers were selected in 27.5% (30/109) of original systematic reviews, 38% (32/84) of GPT-3.5 papers, and 36% (31/85) of GPT-4 papers ( P =.24). Papers from American authors were selected in 16.5% (18/109) of original systematic reviews, 44% (37/84) of GPT-3.5 papers, and 33% (28/85) of GPT-4 papers ( P <.001).

Principal Findings

The most important finding of this study is that using LLMs such as ChatGPT and Bard to conduct systematic reviews for a common condition such as rotator cuff disease can generate misleading or “hallucinated” references at rates exceeding 25%.

This concern has been broached in previous literature [ 26 - 29 ], but our study provides an experimental design to probe the matter more deeply. OpenAI, the developer of ChatGPT, acknowledges this issue, stating that their model “occasionally generates plausible but incorrect or nonsensical responses” [ 30 ]. As LLMs increasingly assist academic researchers in producing scientific literature, this phenomenon warrants careful scrutiny.

When comparing the 3 models tested, GPT-4 was the most effective at retrieving nonhallucinated references, while 39.6% (55/139) of the references produced by GPT-3.5 did not exist. Bard, however, appears ill-suited for conducting systematic reviews in the selected areas, with 91.3% (95/104) of its references failing to correspond to legitimate papers. Bard seemed to take a try-and-repeat approach, providing multiple versions of hallucinated papers with similar titles and journal names ( Figure 5 ).

Despite this, LLMs typically encouraged users to conduct their own systematic reviews, recognizing the necessity of human involvement. However, in none of our queries did the LLMs ask users to verify the authenticity of the produced citations. The convincing verisimilitude of the references generated by LLMs therefore presents a risk for incautious researchers, potentially undermining the quality of scientific bibliographies if improperly used ( Figures 5 and 6 ). Moreover, the efficiency of LLMs in retrieving original papers from systematic reviews ranged from negligible to modest (0/109, 0% to 15/109, 13.8%), emphasizing that researchers should not rely too heavily on these tools for systematic reviews. Nevertheless, in numerous instances, both ChatGPT and Bard “encouraged [users] to conduct their own research” ( Figure 5 ), a suggestion that appears crucial considering the findings of this study.

It could be expected that LLMs would not be able to retrieve the same references as the authors of systematic reviews. However, this study also reveals that LLMs, despite being provided with the same eligibility criteria as those in the original systematic reviews, were not able to consistently apply them. For instance, the criterion of “randomized study” was adhered to in only 39% (33/84) to 49% (42/85) of nonhallucinated papers generated by ChatGPT, even when the term “randomized” appeared in the title or abstract of the papers from the original systematic reviews. The same finding was observed for the “not a systematic review” criterion, which was not respected in 20.1% (34/169) of cases, even though the publicly available information on the produced papers clearly states the nature of these studies.

These discrepancies could potentially stem from the underlying statistical nature of these LLMs, which predict subsequent text (tokens) based on a model reinforced by human feedback [ 31 ]. However, as human supervision does not extend to validating the accuracy of LLM outputs, especially in specialized fields like medicine, inaccuracies can prevail.

In the case of nonhallucinated papers, however, ChatGPT demonstrated significant efficiency in retrieving accurate bibliographic information like the exact paper title, the authors’ list, and the journal title.

Potential biases in LLMs due to training on biased data sets and the risk of perpetuating stereotypes have been highlighted [ 2 ]. Our findings suggest that American authors were more frequently represented in ChatGPT references. However, further investigation across diverse medical fields is warranted to determine definitively whether these LLMs introduce such biases.

Strengths and Limitations

This investigation, by virtue of its specific and circumscribed parameters, comes with several inherent limitations. The scope of the study was exclusively focused on systematic reviews related to shoulder rotator cuff pathology. Consequently, it must be recognized that the findings might not be universally applicable across diverse medical specialties or disciplines. The examination was also restricted to 3 LLMs, specifically GPT-3.5, GPT-4, and Bard. The landscape of available language models is vast and continually evolving, and it is conceivable that different models might yield divergent results. In addition, the field lacks established guidelines for leveraging LLMs to optimize accuracy. Notwithstanding rigorous attempts to devise specific, comprehensive prompts, it remains plausible that alternative queries could generate more precise outcomes. This fact underscores the multifaceted nature of the challenge and the need for further research in this domain.

The choice of prompt plays a crucial role in determining the output generated by LLMs. During the exploratory phase of our study, various prompt versions were tested. While our study did not focus on identifying the optimal prompts, several techniques used in our prompts appeared to enhance output quality: specifying a minimum number of papers (a minimum of 9 papers); using bullet points to delineate criteria such as “type of studies,” “participants,” and “interventions”; and explicitly instructing to “exclude systematic reviews and meta-analyses.” Introducing prompts by specifying the researcher’s profession provides additional context, aligning with recommendations from LLM providers. Finally, enforcing a specific reference style format facilitated the retrieval of vital information, including authors’ names, journal titles, publication dates, and DOIs when available.
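The prompt techniques listed above can be sketched as a small Python helper. This is a hypothetical reconstruction for illustration only: the wording of the generated prompt, the function name `build_prompt`, its parameters, and the example values are all assumptions and do not reproduce the actual prompt used in the study.

```python
from typing import Optional


def build_prompt(profession: str,
                 topic: str,
                 study_type: str,
                 participants: str,
                 interventions: str,
                 min_papers: Optional[int] = None,
                 latest_year: int = 2020) -> str:
    """Assemble an illustrative literature-search prompt (wording is hypothetical)."""
    # Optional target number of papers, one of the techniques described in the text.
    quantity = f"at least {min_papers} " if min_papers is not None else ""
    lines = [
        # Context about the requester's profession.
        f"I am a {profession} preparing a systematic review on {topic}.",
        f"Please list {quantity}references that meet all of these criteria:",
        # Bullet points delineating the eligibility criteria.
        f"- Type of studies: {study_type}",
        f"- Participants: {participants}",
        f"- Interventions: {interventions}",
        # Explicit exclusions.
        "Exclude systematic reviews and meta-analyses.",
        f"Exclude papers published after {latest_year}.",
        # Enforced reference format to ease later verification.
        "Format each reference as: authors, title, journal, year, issue, and DOI if available.",
    ]
    return "\n".join(lines)


# Two prompt variants, with and without a minimum number of papers (placeholder criteria).
with_minimum = build_prompt(
    profession="physician and researcher",
    topic="rotator cuff pathology",
    study_type="randomized controlled trials",
    participants="placeholder participant criteria",
    interventions="treatment A versus treatment B",
    min_papers=9,
)
without_minimum = build_prompt(
    profession="physician and researcher",
    topic="rotator cuff pathology",
    study_type="randomized controlled trials",
    participants="placeholder participant criteria",
    interventions="treatment A versus treatment B",
)
print(with_minimum)
```

Keeping the criteria as separate parameters makes it straightforward to generate both prompt variants from the same inputs, so that only the target number differs between them.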

Our decision not to provide the initial PubMed results list to LLMs for assessing paper eligibility was deliberate, aimed at preserving study integrity and interpretability. While providing the list might enhance LLM accuracy, it introduces bias by guiding models toward replicating the provided set rather than autonomously identifying relevant studies. Our study design, though sacrificing some precision, ensures that LLM results reflect genuine capabilities in navigating scientific literature independently.

Future Directions

LLMs present a highly efficient instrument that may aid academics in the drafting of research papers. However, upon analyzing the findings of this study, it becomes imperative to emphasize that the bibliographic references proposed by the AI are not intrinsically trustworthy. These citations necessitate human validation, focusing on the authors, the title, and the subject matter.

We thereby deduce that, in the context of GPT iterations, user verification is indispensable for preserving the scientific integrity and relevance of the output. A statement or a scholarly usage guideline should be prominently featured before the tool is used or should be integrated into the software itself to outline its lack of liability for any inaccuracies in the citation of papers. This is paramount as such errors could potentially mislead a considerable number of users. We also propose that the application of GPT-based chatbots for tasks such as spelling correction, proofreading, or text restructuring ought to be explicitly mentioned within the materials and methods section of academic writings.
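The human validation described above could be supported, though not replaced, by a simple field-by-field comparison between an AI-proposed citation and a record the user has already verified in a bibliographic database. The sketch below is one possible approach using Python's standard `difflib` module. The function names and the 0.9 similarity threshold are arbitrary assumptions, the "generated" record is invented for the example, and subject-matter relevance still has to be judged by a person.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and reduce punctuation and whitespace to single spaces."""
    kept = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(kept.split())


def fields_needing_review(generated, verified, threshold=0.9):
    """Return the fields whose similarity to the verified record is below threshold.

    The threshold is an assumed value for illustration, not a validated cutoff.
    """
    flagged = []
    for field in ("authors", "title"):
        a = normalize(generated.get(field, ""))
        b = normalize(verified.get(field, ""))
        if SequenceMatcher(None, a, b).ratio() < threshold:
            flagged.append(field)
    return flagged


# Verified record taken from a bibliographic database by the user.
verified = {
    "authors": "Else H",
    "title": "Abstracts written by ChatGPT fool scientists",
}
# Invented AI output for illustration: real title, wrong authors.
generated = {
    "authors": "Smith J, Doe A",
    "title": "Abstracts written by ChatGPT fool scientists",
}
print(fields_needing_review(generated, verified))
```

A check like this only narrows down where to look; a citation that passes it can still be irrelevant to the question being reviewed.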


ChatGPT and Bard exhibit the capacity to generate convincingly authentic references for systematic reviews but also yield hallucinated papers in 28.6% (34/119) to 91.3% (95/104) of cases. Among the models tested, GPT-4 displayed superior performance in generating legitimate and relevant references but, like the other models, largely failed to respect the established eligibility criteria. Given their current state, LLMs such as ChatGPT and Bard should not be used as the sole or primary means for conducting systematic reviews of literature, and it is crucial that references generated by these tools undergo rigorous validation by the authors of scientific papers.
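The two percentages in the range above follow directly from the counts given in parentheses. A minimal Python check of that arithmetic is shown below; the function name is a hypothetical label for this example.

```python
def hallucination_rate(hallucinated, total):
    """Percentage of generated references that were hallucinated, to one decimal."""
    return round(100 * hallucinated / total, 1)


print(hallucination_rate(34, 119))  # 28.6
print(hallucination_rate(95, 104))  # 91.3
```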


The entirety of this paper was composed by the authors of this research. As nonnative English speakers, the authors used ChatGPT to refine the English language used in the paper [32]. Importantly, all modifications suggested by ChatGPT underwent meticulous evaluation and approval by the authors to ensure accuracy and clarity. ChatGPT was not used for bibliographic reference retrieval.

Data Availability

References

  1. Lund BD, Wang T. Chatting about ChatGPT: how may AI and GPT impact academia and libraries? Lib Hi Tech News. 2023;40(3):26-29.
  2. The Lancet Digital Health. ChatGPT: friend or foe? Lancet Digit Health. 2023;5(3):e102.
  3. Else H. Abstracts written by ChatGPT fool scientists. Nature. 2023;613(7944):423.
  4. Biswas S. ChatGPT and the future of medical writing. Radiology. 2023;307(2):e223312.
  5. Salvagno M, Taccone FS, Gerli AG. Can artificial intelligence help for scientific writing? Crit Care. 2023;27(1):75.
  6. Stokel-Walker C. ChatGPT listed as author on research papers: many scientists disapprove. Nature. 2023;613(7945):620-621.
  7. Zheng H, Zhan H. ChatGPT in scientific writing: a cautionary tale. Am J Med. 2023;136(8):725-726.e6.
  8. Belk JW, Kraeutler MJ, Houck DA, Chrisman AN, Scillia AJ, McCarty EC. Biceps tenodesis versus tenotomy: a systematic review and meta-analysis of level I randomized controlled trials. J Shoulder Elbow Surg. 2021;30(5):951-960.
  9. Azar M, Van der Meijden O, Pireau N, Chelli M, Gonzalez JF, Boileau P. Arthroscopic revision cuff repair: do tendons have a second chance to heal? J Shoulder Elbow Surg. 2022;31(12):2521-2531.
  10. Boileau P, Andreani O, Schramm M, Baba M, Barret H, Chelli M. The effect of tendon delamination on rotator cuff healing. Am J Sports Med. 2019;47(5):1074-1081.
  11. Muccioli C, Chelli M, Caudal A, Andreani O, Elhor H, Gauci MO, et al. Rotator cuff integrity and shoulder function after intra-medullary humerus nailing. Orthop Traumatol Surg Res. 2020;106(1):17-23.
  12. Boileau P, Baqué F, Valerio L, Ahrens P, Chuinard C, Trojani C. Isolated arthroscopic biceps tenotomy or tenodesis improves symptoms in patients with massive irreparable rotator cuff tears. J Bone Joint Surg Am. 2007;89(4):747-757.
  13. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;339:b2700.
  14. Booth A, Clarke M, Dooley G, Ghersi D, Moher D, Petticrew M, et al. The nuts and bolts of PROSPERO: an international prospective register of systematic reviews. Syst Rev. 2012;1:2.
  15. OpenAI. GPT-4 technical report. ArXiv. Preprint posted online on March 15, 2023.
  16. Lähdeoja T, Karjalainen T, Jokihaara J, Salamh P, Kavaja L, Agarwal A, et al. Subacromial decompression surgery for adults with shoulder pain: a systematic review with meta-analysis. Br J Sports Med. 2020;54(11):665-673.
  17. Catapano M, Zhang K, Mittal N, Sangha H, Onishi K, de Sa D. Effectiveness of dextrose prolotherapy for rotator cuff tendinopathy: a systematic review. PM R. 2020;12(3):288-300.
  18. Gutiérrez-Espinoza H, Araya-Quintanilla F, Cereceda-Muriel C, Álvarez-Bueno C, Martínez-Vizcaíno V, Cavero-Redondo I. Effect of supervised physiotherapy versus home exercise program in patients with subacromial impingement syndrome: a systematic review and meta-analysis. Phys Ther Sport. 2020;41:34-42.
  19. Chen X, Jones IA, Togashi R, Park C, Vangsness CT. Use of platelet-rich plasma for the improvement of pain and function in rotator cuff tears: a systematic review and meta-analysis with bias assessment. Am J Sports Med. 2020;48(8):2028-2041.
  20. An VVG, Farey JE, Karunaratne S, Smithers CJ, Petchell JF. Subacromial analgesia via continuous infusion catheter vs. placebo following arthroscopic shoulder surgery: a systematic review and meta-analysis of randomized trials. J Shoulder Elbow Surg. 2020;29(3):471-482.
  21. Craig RS, Goodier H, Singh JA, Hopewell S, Rees JL. Shoulder replacement surgery for osteoarthritis and rotator cuff tear arthropathy. Cochrane Database Syst Rev. 2020;4(4):CD012879.
  22. Naunton J, Street G, Littlewood C, Haines T, Malliaras P. Effectiveness of progressive and resisted and non-progressive or non-resisted exercise in rotator cuff related shoulder pain: a systematic review and meta-analysis of randomized controlled trials. Clin Rehabil. 2020;34(9):1198-1216.
  23. Malliaras P, Johnston R, Street G, Littlewood C, Bennell K, Haines T, et al. The efficacy of higher versus lower dose exercise in rotator cuff tendinopathy: a systematic review of randomized controlled trials. Arch Phys Med Rehabil. 2020;101(10):1822-1834.
  24. Simpson M, Pizzari T, Cook T, Wildman S, Lewis J. Effectiveness of non-surgical interventions for rotator cuff calcific tendinopathy: a systematic review. J Rehabil Med. 2020;52(10):jrm00119.
  25. Belk JW, McCarty EC, Houck DA, Dragoo JL, Savoie FH, Thon SG. Tranexamic acid use in knee and shoulder arthroscopy leads to improved outcomes and fewer hemarthrosis-related complications: a systematic review of level I and II studies. Arthroscopy. 2021;37(4):1323-1333.
  26. Hillier M. Why does ChatGPT generate fake references? TECHE. 2023. URL: https://teche.mq.edu.au/2023/02/why-does-chatgpt-generate-fake-references/ [accessed 2023-05-17]
  27. Gravel J, D’Amours-Gravel M, Osmanlliu E. Learning to fake it: limited responses and fabricated references provided by ChatGPT for medical questions. Mayo Clin Proc Digit Health. 2023;1(3):226-234.
  28. Day T. A preliminary investigation of fake peer-reviewed citations and references generated by ChatGPT. Prof Geogr. 2023;75(6):1024-1027.
  29. Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023;15(2):e35179.
  30. Introducing ChatGPT. OpenAI. 2022. URL: https://openai.com/blog/chatgpt [accessed 2023-05-18]
  31. Stiennon N, Ouyang L, Wu J, Ziegler DM, Lowe R, Voss C, et al. Learning to summarize from human feedback. ArXiv. Preprint posted online on September 2, 2020.
  32. Li H, Moon JT, Purkayastha S, Celi LA, Trivedi H, Gichoya JW. Ethics of large language models in medicine and medical research. Lancet Digit Health. 2023;5(6):e333-e335.




  1. Comparative Analysis

    Framing. Framing multi-source writing assignments (comparative analysis, research essays, multi-modal projects) is likely to overlap a great deal with "Why It's Useful" (see above), because the range of reasons why we might use these kinds of writing in academic or non-academic settings is itself the reason why they so often appear later in courses.

  2. (PDF) A Short Introduction to Comparative Research

    Comparative research or analysis is a broad term that includes both quantitative and . ... his more widely quoted essay, "In Comparison a Magic Dwells" (Smith, 1982). In .

  3. PDF How to Write a Comparative Analysis

    Determine the focus of your piece. Determine if you will focus on the similarities, the differences, or both. Be sure you treat each individual the same; each person deserves the same amount of focus-meaning, do not place most of the emphasis on you or the other person. Find a balance.

  4. PDF How to Write a Comparative Analysis

    There are two basic ways to organize the body of your paper. In text-by-text, you discuss all of A, then all of B. In point-by-point, you alternate points about A with comparable points about B. If you think that B extends A, you'll probably use a text-by-text scheme; if you see A and B engaged in debate, a point-by-point scheme will draw ...

  5. What is Comparative Analysis? Guide with Examples

    A comparative analysis is a side-by-side comparison that systematically compares two or more things to pinpoint their similarities and differences. The focus of the investigation might be conceptual—a particular problem, idea, or theory—or perhaps something more tangible, like two different data sets. For instance, you could use comparative ...

  6. Comparative Research Methods

    A recent synthesis by Esser and Hanitzsch ( 2012a) concluded that comparative communication research involves comparisons between a minimum of two macro-level cases (systems, cultures, markets, or their sub-elements) in which at least one object of investigation is relevant to the field of communication.

  7. PDF Title of Module: Comparative Analysis

    Comparative analysis is a common way of engaging with those sources that builds upon the basic components of an academic argument by asking you to identify and join in a conversation with multiple other writers. Comparative analysis goes much further than a traditional compare/contrast essay, where you are mainly identifying similarities and ...

  8. Academic Guides: Writing a Paper: Comparing & Contrasting

    Use Clear Transitions. Transitions are important in compare and contrast essays, where you will be moving frequently between different topics or perspectives. Examples of transitions and phrases for comparisons: as well, similar to, consistent with, likewise, too. Examples of transitions and phrases for contrasts: on the other hand, however ...

  9. Comparative Analysis

    Comparative analysis is a multidisciplinary method, which spans a wide cross-section of disciplines (Azarian, 2011). It is the process of comparing multiple units of study for the purpose of scientific discovery and for informing policy decisions (Rogers, 2014). Even though there has been a renewed interest in comparative analysis as a research method over the last decade in fields such as ...

  10. Comparative Analysis

    Definition. The goal of comparative analysis is to search for similarity and variance among units of analysis. Comparative research commonly involves the description and explanation of similarities and differences of conditions or outcomes among large-scale social units, usually regions, nations, societies, and cultures.

  11. How do I write a comparative analysis?

    A comparative analysis is an essay in which two things are compared and contrasted. You may have done a "compare and contrast" paper in your English class, and a comparative analysis is the same general idea, but as a graduate student you are expected to produce a higher level of analysis in your writing.

  12. Comparing and Contrasting in an Essay

    Making effective comparisons. As the name suggests, comparing and contrasting is about identifying both similarities and differences. You might focus on contrasting quite different subjects or comparing subjects with a lot in common—but there must be some grounds for comparison in the first place. For example, you might contrast French ...

  13. Comparative Research Methods

    Research goals. Comparative communication research is a combination of substance (specific objects of investigation studied in different macro-level contexts) and method (identification of differences and similarities following established rules and using equivalent concepts).

  14. (PDF) Four Varieties of Comparative Analysis

    Comparative analysis methods consist of four types: individualizing, universalizing, variation-finding, and encompassing. According to Pickvance, C. (2001 ...

  15. A Step-by-Step Guide to Writing a Comparative Analysis

    Organize information. It is important to structure your comments for your readers to want to read your comparative analysis. The idea is to make it easy for your readers to navigate your paper and get them to find the information that interests them quickly. 5. End with a conclusion.

  16. How to Do Comparative Analysis in Research ( Examples )

    Comparative analysis is a method that is widely used in social science. It is a method of comparing two or more items with an idea of uncovering and discovering new ideas about them. It often compares and contrasts social structures and processes around the world to grasp general patterns. Comparative analysis tries to understand the study and ...


    What makes a study comparative is not the particular techniques employed but the theoretical orientation and the sources of data. All the tools of the social scientist, including historical analysis, fieldwork, surveys, and aggregate data analysis, can be used to achieve the goals of comparative research. So, there is plenty of room for the ...

  18. What is Comparative Analysis and How to Conduct It?

    Comparative analysis is a systematic approach used to evaluate and compare two or more entities, variables, or options to identify similarities, differences, and patterns. It involves assessing the strengths, weaknesses, opportunities, and threats associated with each entity or option to make informed decisions.

  19. (PDF) Methods of comparative analysis

    The one-parameter comparison allows for obtaining stable general qualitative and quantitative comparison estimates. The multi-parameter comparison allows for obtaining general qualitative ...

  20. The use of Qualitative Comparative Analysis (QCA) to address causality

    Qualitative Comparative Analysis (QCA) is a method for identifying the configurations of conditions that lead to specific outcomes. Given its potential for providing evidence of causality in complex systems, QCA is increasingly used in evaluative research to examine the uptake or impacts of public health interventions. We map this emerging field, assessing the strengths and weaknesses of QCA ...

  21. Frontiers

    The overarching purpose of this research was to determine which teaching method proved more effective over the eight-year period. ... A Comparative Analysis of Student Performance in an Online vs. Face-to-Face Environmental Science Course From 2009 to 2016 ... A single exam containing both multiple choice and essay questions may be a better ...

  22. Comparative analysis of deep learning image detection algorithms

    A computer views all kinds of visual media as an array of numerical values. As a consequence of this approach, they require image processing algorithms to inspect contents of images. This project compares 3 major image processing algorithms: Single Shot Detection (SSD), Faster Region based Convolutional Neural Networks (Faster R-CNN), and You Only Look Once (YOLO) to find the fastest and most ...

  23. A Comparative Analysis of Two Published Research Papers

    The following assignment will present a comparative analysis of two published research papers. It will examine the approaches used; theoretical and philosophical assumptions and the wider socio-political context of each piece, and provide a balanced and informed judgement regarding the strengths and weaknesses of available research approaches by way of ethical analysis.

  24. Comparative analysis of modern technologies of additive production

    In the work, a comparative analysis of these technologies was carried out according to various criteria, such as principle of operation, materials, resolution, surface finish, accuracy, speed, strength, application, cost, complexity of parts, and post-processing. For each technology, the advantages and disadvantages of its use are determined ...

  25. Bridging the gap in online hate speech detection: a comparative

    Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, and traditional machine learning methods, we developed a nuanced approach to identify homophobic content on X/Twitter. This research is pivotal due to the persistent ...

  26. Crypto-asset regulatory landscape: a comparative analysis of ...

    The purpose of this research paper is to compare and analyse how crypto-assets are regulated in the UK and Germany. The aim is to understand and highlight the approaches taken by these two countries in terms of regulating crypto-assets and to explore the potential impact that their regulatory frameworks could have on the market for these crypto-assets. The research employs a doctrinal research ...

  27. The Attractiveness of European HE Systems: A Comparative Analysis of

    This paper investigates the salaries as well as the recruitment and retention procedures in public higher education institutions from a cross country perspective. The UK, Germany, France, and Italy are adopted as case studies to determine the attractiveness of European higher education systems.

  28. Journal of Medical Internet Research

    Further analysis of nonhallucinated papers retrieved by GPT models revealed significant differences in identifying various criteria, such as randomized studies, participant criteria, and intervention criteria. The study also noted the geographical and open-access biases in the papers retrieved by the LLMs.