random assignment to groups is a critical part of the methodology

The Definition of Random Assignment According to Psychology

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Emily is a board-certified science editor who has worked with top digital publishing brands like Voices for Biodiversity, Study.com, GoodTherapy, Vox, and Verywell.

Random assignment refers to the use of chance procedures in psychology experiments to ensure that each participant has the same opportunity to be assigned to any given group in a study. This helps remove potential bias from the experiment at the outset. Participants are randomly assigned to different groups, such as the treatment group versus the control group. In clinical research, randomized clinical trials are known as the gold standard for meaningful results.

Simple random assignment techniques might involve tactics such as flipping a coin, drawing names out of a hat, rolling dice, or assigning random numbers to a list of participants. It is important to note that random assignment differs from random selection.

While random selection refers to how participants are randomly chosen from a target population as representatives of that population, random assignment refers to how those chosen participants are then assigned to experimental groups.
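The difference between the two steps can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 people, the sample size of 20, the fixed seed, and the function names are all assumptions made for the example, not part of any particular study.

```python
import random

def select_sample(population, sample_size, rng):
    """Random selection: draw participants from the population without replacement."""
    return rng.sample(population, sample_size)

def assign_to_groups(participants, rng):
    """Random assignment: shuffle the selected participants and split them in half."""
    shuffled = participants[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"treatment": shuffled[:midpoint], "control": shuffled[midpoint:]}

rng = random.Random(42)                 # fixed seed only so the sketch is repeatable
population = [f"person_{i}" for i in range(1000)]   # hypothetical target population
sample = select_sample(population, 20, rng)         # step 1: who takes part
groups = assign_to_groups(sample, rng)              # step 2: which group each person joins
print(len(groups["treatment"]), len(groups["control"]))  # prints: 10 10
```

Selection decides who enters the study, while assignment decides what happens to them once they are in it.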

Random Assignment in Research

To determine if changes in one variable will cause changes in another variable, psychologists must perform an experiment. Random assignment is a critical part of the experimental design that helps ensure the reliability of the study outcomes.

Researchers often begin by forming a testable hypothesis predicting that one variable of interest will have some predictable impact on another variable.

The variable that the experimenters will manipulate in the experiment is known as the independent variable, while the variable that they will then measure for different outcomes is known as the dependent variable. While there are different ways to look at relationships between variables, an experiment is the best way to get a clear idea of whether there is a cause-and-effect relationship between two or more variables.

Once researchers have formulated a hypothesis, conducted background research, and chosen an experimental design, it is time to find participants for their experiment. How exactly do researchers decide who will be part of an experiment? As mentioned previously, this is often accomplished through something known as random selection.

Random Selection

In order to generalize the results of an experiment to a larger group, it is important to choose a sample that is representative of the qualities found in that population. For example, if the total population is 60% female and 40% male, then the sample should reflect those same percentages.

Choosing a representative sample is often accomplished by randomly picking people from the population to be participants in a study. Random selection means that everyone in the group stands an equal chance of being chosen to minimize any bias. Once a pool of participants has been selected, it is time to assign them to groups.

By randomly assigning the participants into groups, the experimenters can be fairly sure that each group will have the same characteristics before the independent variable is applied.

Participants might be randomly assigned to the control group, which does not receive the treatment in question. The control group may receive a placebo or receive the standard treatment. Participants may also be randomly assigned to the experimental group, which receives the treatment of interest. In larger studies, there can be multiple treatment groups for comparison.

There are simple methods of random assignment, like rolling a die. However, there are also more complex techniques that use random number generators to reduce the risk of human error.

There can also be random assignment to groups with pre-established rules or parameters. For example, if you want to have an equal number of men and women in each of your study groups, you might separate your sample into two groups (by sex) before randomly assigning the members of each of those groups to the treatment group and control group.
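Assignment with a pre-established rule like this can be sketched as follows. The participant records, the "sex" field, and the function name are hypothetical, and the sketch assumes two study groups. If a subgroup has an odd number of members, the extra person goes to the control group here, which is an arbitrary choice made for the illustration.

```python
import random
from collections import defaultdict

def stratified_assignment(participants, stratum_key, rng):
    """Split participants by a characteristic, then randomize within each subgroup."""
    strata = defaultdict(list)
    for person in participants:
        strata[person[stratum_key]].append(person)

    groups = {"treatment": [], "control": []}
    for members in strata.values():
        rng.shuffle(members)                 # chance decides the order within the subgroup
        half = len(members) // 2
        groups["treatment"].extend(members[:half])
        groups["control"].extend(members[half:])
    return groups

# Hypothetical sample: 10 women and 10 men.
participants = [{"id": i, "sex": "female" if i < 10 else "male"} for i in range(20)]
groups = stratified_assignment(participants, "sex", random.Random(1))

for name, members in groups.items():
    women = sum(1 for person in members if person["sex"] == "female")
    print(name, "women:", women, "men:", len(members) - women)
```

Because the shuffle happens inside each subgroup, both study groups end up with the same number of men and women, while chance still decides which individuals land in which group.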

Random assignment is essential because it increases the likelihood that the groups are the same at the outset. With all characteristics being equal between groups, other than the application of the independent variable, any differences found between group outcomes can be more confidently attributed to the effect of the intervention.
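A small simulation can make this idea concrete. The sketch below is only an illustration under made-up assumptions: participant ages are drawn from a normal distribution with a mean of 30 and a standard deviation of 10, and the function name is hypothetical. It repeatedly assigns participants to two groups at random and reports how far apart the groups' average ages are before any treatment is given.

```python
import random
import statistics

def mean_baseline_gap(sample_size, repetitions, rng):
    """Average absolute difference in mean age between two randomly formed groups."""
    gaps = []
    for _ in range(repetitions):
        ages = [rng.gauss(30, 10) for _ in range(sample_size)]  # hypothetical ages
        rng.shuffle(ages)                                       # random assignment
        half = sample_size // 2
        gap = abs(statistics.mean(ages[:half]) - statistics.mean(ages[half:]))
        gaps.append(gap)
    return statistics.mean(gaps)

rng = random.Random(0)   # fixed seed only so the sketch is repeatable
for size in (10, 100, 1000):
    print(size, round(mean_baseline_gap(size, 200, rng), 2))
```

Under these assumptions the typical gap shrinks as the sample grows, which is why random assignment makes groups similar on average rather than guaranteeing identical groups in any single small study.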

Example of Random Assignment

Imagine that a researcher is interested in learning whether or not drinking caffeinated beverages prior to an exam will improve test performance. After randomly selecting a pool of participants, each person is randomly assigned to either the control group or the experimental group.

The participants in the control group consume a placebo drink prior to the exam that does not contain any caffeine. Those in the experimental group, on the other hand, consume a caffeinated beverage before taking the test.

Participants in both groups then take the test, and the researcher compares the results to determine if the caffeinated beverage had any impact on test performance.
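The logic of this example can be sketched with simulated data. Everything numeric below is invented for illustration: the 40 participants, the average score of 70, the spread of 8 points, and the assumed 3-point caffeine effect are not results from any real study.

```python
import random
import statistics

rng = random.Random(7)                 # fixed seed only so the sketch is repeatable

participants = list(range(40))         # hypothetical participant IDs
rng.shuffle(participants)              # random assignment
control_ids, caffeine_ids = participants[:20], participants[20:]

ASSUMED_EFFECT = 3.0                   # made-up benefit of caffeine, in test points

def simulated_score(gets_caffeine):
    """Draw a made-up test score; caffeine adds the assumed effect."""
    base = rng.gauss(70, 8)
    return base + (ASSUMED_EFFECT if gets_caffeine else 0.0)

control_scores = [simulated_score(False) for _ in control_ids]
caffeine_scores = [simulated_score(True) for _ in caffeine_ids]

difference = statistics.mean(caffeine_scores) - statistics.mean(control_scores)
print(f"Mean difference (caffeine - control): {difference:.2f}")
```

In a real experiment the scores would come from the participants' actual tests, and the researcher would follow the comparison with a statistical test before drawing conclusions.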

A Word From Verywell

Random assignment plays an important role in the psychology research process. Not only does this process help eliminate possible sources of bias, but, when paired with random selection, it also makes it easier to generalize the results of a tested sample of participants to a larger population.

Random assignment helps ensure that the groups in the experiment are comparable to one another, which means that the groups are also likely more representative of what is present in the larger population of interest. Through the use of this technique, psychology researchers are able to study complex phenomena and contribute to our understanding of the human mind and behavior.



Random Assignment in Psychology (Definition + 40 Examples)

Have you ever wondered how researchers discover new ways to help people learn, make decisions, or overcome challenges? A hidden hero in this adventure of discovery is a method called random assignment, a cornerstone in psychological research that helps scientists uncover the truths about the human mind and behavior.

Random Assignment is a process used in research where each participant has an equal chance of being placed in any group within the study. This technique is essential in experiments as it helps to eliminate biases, ensuring that the different groups being compared are similar in all important aspects.

By doing so, researchers can be confident that any differences observed are likely due to the variable being tested, rather than other factors.

In this article, we’ll explore the intriguing world of random assignment, diving into its history, principles, real-world examples, and the impact it has had on the field of psychology.

History of Random Assignment

Stepping back in time, we delve into the origins of random assignment, which finds its roots in the early 20th century.

The pioneering mind behind this innovative technique was Sir Ronald A. Fisher, a British statistician and biologist. Fisher introduced the concept of random assignment in the 1920s, aiming to improve the quality and reliability of experimental research.

His contributions laid the groundwork for the method's evolution and its widespread adoption in various fields, particularly in psychology.

Fisher’s groundbreaking work on random assignment was motivated by his desire to control for confounding variables – those pesky factors that could muddy the waters of research findings.

By assigning participants to different groups purely by chance, he realized that the influence of these confounding variables could be minimized, paving the way for more accurate and trustworthy results.

Early Studies Utilizing Random Assignment

Following Fisher's initial development, random assignment started to gain traction in the research community. Early studies adopting this methodology focused on a variety of topics, from agriculture (which was Fisher’s primary field of interest) to medicine and psychology.

The approach allowed researchers to draw stronger conclusions from their experiments, bolstering the development of new theories and practices.

One notable early study utilizing random assignment was conducted in the field of educational psychology. Researchers were keen to understand the impact of different teaching methods on student outcomes.

By randomly assigning students to various instructional approaches, they were able to isolate the effects of the teaching methods, leading to valuable insights and recommendations for educators.

Evolution of the Methodology

As the decades rolled on, random assignment continued to evolve and adapt to the changing landscape of research.

Advances in technology introduced new tools and techniques for implementing randomization, such as computerized random number generators, which offered greater precision and ease of use.

The application of random assignment expanded beyond the confines of the laboratory, finding its way into field studies and large-scale surveys.

Researchers across diverse disciplines embraced the methodology, recognizing its potential to enhance the validity of their findings and contribute to the advancement of knowledge.

From its humble beginnings in the early 20th century to its widespread use today, random assignment has proven to be a cornerstone of scientific inquiry.

Its development and evolution have played a pivotal role in shaping the landscape of psychological research, driving discoveries that have improved lives and deepened our understanding of the human experience.

Principles of Random Assignment

Delving into the heart of random assignment, we uncover the theories and principles that form its foundation.

The method is steeped in the basics of probability theory and statistical inference, ensuring that each participant has an equal chance of being placed in any group, thus fostering fair and unbiased results.

Basic Principles of Random Assignment

Understanding the core principles of random assignment is key to grasping its significance in research. There are three principles: equal probability of selection, reduction of bias, and ensuring representativeness.

The first principle, equal probability of selection, ensures that every participant has an identical chance of being assigned to any group in the study. This randomness is crucial as it mitigates the risk of bias and establishes a level playing field.

The second principle focuses on the reduction of bias. Random assignment acts as a safeguard, ensuring that the groups being compared are alike in all essential aspects before the experiment begins.

This similarity between groups allows researchers to attribute any differences observed in the outcomes directly to the independent variable being studied.

Lastly, ensuring representativeness is a vital principle. When participants are assigned randomly, the resulting groups are more likely to be representative of the larger population.

This characteristic is crucial for the generalizability of the study’s findings, allowing researchers to apply their insights broadly.

Theoretical Foundation

The theoretical foundation of random assignment lies in probability theory and statistical inference.

Probability theory deals with the likelihood of different outcomes, providing a mathematical framework for analyzing random phenomena. In the context of random assignment, it helps in ensuring that each participant has an equal chance of being placed in any group.
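The equal-chance idea can be checked with a quick simulation. When a fixed number of treatment slots is filled at random, each participant's chance of landing in the treatment group is the number of slots divided by the number of participants. The sketch below assumes 10 participants, 5 treatment slots, and 4,000 repeated assignments; these numbers and the function name are illustrative choices only.

```python
import random

def treatment_frequencies(n_participants, n_treatment, trials, seed=0):
    """Share of trials in which each participant ends up in the treatment group."""
    rng = random.Random(seed)
    counts = [0] * n_participants
    ids = list(range(n_participants))
    for _ in range(trials):
        chosen = rng.sample(ids, n_treatment)   # fill the treatment slots by chance
        for participant in chosen:
            counts[participant] += 1
    return [count / trials for count in counts]

frequencies = treatment_frequencies(n_participants=10, n_treatment=5, trials=4000)
print([round(f, 2) for f in frequencies])
```

With these settings every value should land close to 0.5, since 5 of the 10 participants are placed in the treatment group each time.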

Statistical inference, on the other hand, allows researchers to draw conclusions about a population based on a sample of data drawn from that population. It is the mechanism through which the results of a study can be generalized to a broader context.

Random assignment enhances the reliability of statistical inferences by reducing biases and ensuring that the sample is representative.

Differentiating Random Assignment from Random Selection

It’s essential to distinguish between random assignment and random selection, as the two terms, while related, have distinct meanings in the realm of research.

Random assignment refers to how participants are placed into different groups in an experiment, aiming to control for confounding variables and help determine causes.

In contrast, random selection pertains to how individuals are chosen to participate in a study. This method is used to ensure that the sample of participants is representative of the larger population, which is vital for the external validity of the research.

While both methods are rooted in randomness and probability, they serve different purposes in the research process.

Understanding the theories, principles, and distinctions of random assignment illuminates its pivotal role in psychological research.

This method, anchored in probability theory and statistical inference, serves as a beacon of reliability, guiding researchers in their quest for knowledge and ensuring that their findings stand the test of validity and applicability.

Methodology of Random Assignment

Implementing random assignment in a study is a meticulous process that involves several crucial steps.

The initial step is participant selection, where individuals are chosen to partake in the study. This stage is critical to ensure that the pool of participants is diverse and representative of the population the study aims to generalize to.

Once the pool of participants has been established, the actual assignment process begins. In this step, each participant is allocated randomly to one of the groups in the study.

Researchers use various tools, such as random number generators or computerized methods, to ensure that this assignment is genuinely random and free from biases.
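One way a computerized assignment might look is sketched below. It follows the "random number for each participant" approach: every person gets a random number, the list is sorted by that number, and people are dealt into the groups in turn. The function name, the group labels, and the seed are hypothetical, and real studies often rely on dedicated software instead.

```python
import random

def assign_by_random_numbers(participants, group_names, seed=None):
    """Give each participant a random number, sort by it, and deal into groups in turn."""
    rng = random.Random(seed)                        # a seed makes the allocation reproducible
    keyed = [(rng.random(), person) for person in participants]
    keyed.sort(key=lambda pair: pair[0])             # order is now determined by chance alone
    allocation = {name: [] for name in group_names}
    for position, (_, person) in enumerate(keyed):
        group = group_names[position % len(group_names)]
        allocation[group].append(person)
    return allocation

participants = [f"P{i:02d}" for i in range(1, 13)]   # 12 hypothetical participants
groups = ["treatment A", "treatment B", "control"]   # hypothetical study arms
allocation = assign_by_random_numbers(participants, groups, seed=2024)
for name, members in allocation.items():
    print(name, members)
```

Dealing participants out in turn keeps the group sizes as equal as possible, and recording the seed lets someone else reproduce and audit the allocation later.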

Monitoring and adjusting form the final step in the implementation of random assignment. Researchers need to continuously observe the groups to ensure that they remain comparable in all essential aspects throughout the study.

If any significant discrepancies arise, adjustments might be necessary to maintain the study’s integrity and validity.
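One simple way to check whether groups look comparable on a baseline characteristic is the standardized mean difference. The sketch below is a minimal illustration: the ages are made up, the function name is hypothetical, and the commonly cited rule of thumb that absolute values under about 0.1 indicate good balance is a convention, not a strict rule.

```python
import statistics

def standardized_mean_difference(group_a, group_b):
    """Difference in group means, expressed in pooled standard deviation units."""
    mean_gap = statistics.mean(group_a) - statistics.mean(group_b)
    pooled_sd = ((statistics.variance(group_a) + statistics.variance(group_b)) / 2) ** 0.5
    return mean_gap / pooled_sd   # note: fails if both groups have zero spread

# Hypothetical baseline ages recorded for each group.
treatment_ages = [23, 31, 28, 35, 40, 26, 33, 29]
control_ages = [25, 30, 27, 36, 38, 24, 34, 31]

smd = standardized_mean_difference(treatment_ages, control_ages)
print(f"Standardized mean difference in age: {smd:.3f}")
```

A value near zero suggests the groups are similar on that characteristic, while a large value is a signal to look more closely at how the groups were formed.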

Tools and Techniques Used

The evolution of technology has introduced a variety of tools and techniques to facilitate random assignment.

Random number generators, both manual and computerized, are commonly used to assign participants to different groups. These generators ensure that each individual has an equal chance of being placed in any group, upholding the principle of equal probability of selection.

In addition to random number generators, researchers often use specialized computer software designed for statistical analysis and experimental design.

These software programs offer advanced features that allow for precise and efficient random assignment, minimizing the risk of human error and enhancing the study’s reliability.

Ethical Considerations

The implementation of random assignment is not devoid of ethical considerations. Informed consent is a fundamental ethical principle that researchers must uphold.

Informed consent means that every participant should be fully informed about the nature of the study, the procedures involved, and any potential risks or benefits, ensuring that they voluntarily agree to participate.

Beyond informed consent, researchers must conduct a thorough risk and benefit analysis. The potential benefits of the study should outweigh any risks or harms to the participants.

Safeguarding the well-being of participants is paramount, and any study employing random assignment must adhere to established ethical guidelines and standards.

Conclusion of Methodology

The methodology of random assignment, while seemingly straightforward, is a multifaceted process that demands precision, fairness, and ethical integrity. From participant selection to assignment and monitoring, each step is crucial to ensure the validity of the study’s findings.

The tools and techniques employed, coupled with a steadfast commitment to ethical principles, underscore the significance of random assignment as a cornerstone of robust psychological research.

Benefits of Random Assignment in Psychological Research

The impact and importance of random assignment in psychological research cannot be overstated. It is fundamental for ensuring the study is accurate, allowing the researchers to determine whether the variable they changed actually caused the results they saw, and helping the findings apply to the real world.

Facilitating Causal Inferences

When participants are randomly assigned to different groups, researchers can be more confident that the observed effects are due to the independent variable being changed, and not other factors.

This ability to determine the cause is called causal inference.

This confidence allows for the drawing of causal relationships, which are foundational for theory development and application in psychology.
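Because chance alone decided who went into each group, researchers can ask how often chance alone would produce a difference as large as the one they observed. A randomization test (also called a permutation test) does this by reshuffling the group labels many times. The sketch below uses made-up scores, and the function name and the choice of 5,000 reshuffles are assumptions for illustration.

```python
import random
import statistics

def randomization_test(treatment, control, n_permutations=5000, seed=0):
    """Estimate how often reshuffled group labels give a gap at least as large as observed."""
    rng = random.Random(seed)
    observed = statistics.mean(treatment) - statistics.mean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                              # pretend the labels were assigned differently
        diff = statistics.mean(pooled[:n_treat]) - statistics.mean(pooled[n_treat:])
        if abs(diff) >= abs(observed):                   # two-sided comparison
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical test scores for each group.
treatment_scores = [78, 85, 90, 72, 88, 81, 79, 94]
control_scores = [70, 75, 82, 68, 77, 73, 80, 71]

observed, p_value = randomization_test(treatment_scores, control_scores)
print(f"Observed difference: {observed:.2f}, approximate p-value: {p_value:.4f}")
```

A small p-value means a gap this large would rarely appear if the treatment made no difference, which supports, but does not prove, a causal interpretation.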

Ensuring Internal Validity

One of the foremost impacts of random assignment is its ability to enhance the internal validity of an experiment.

Internal validity refers to the extent to which a researcher can assert that changes in the dependent variable are solely due to manipulations of the independent variable, and not due to confounding variables.

By ensuring that each participant has an equal chance of being in any condition of the experiment, random assignment helps control for participant characteristics that could otherwise complicate the results.

Enhancing Generalizability

Beyond internal validity, random assignment also plays a crucial role in enhancing the generalizability of research findings.

When done correctly and combined with a representative pool of participants, it helps ensure that the sample groups reflect the larger population, which can allow researchers to apply their findings more broadly.

This representative nature is essential for the practical application of research, impacting policy, interventions, and psychological therapies.

Limitations of Random Assignment

Potential for Implementation Issues

While the principles of random assignment are robust, the method can face implementation issues.

One of the most common problems is logistical constraints. Some studies, due to their nature or the specific population being studied, find it challenging to implement random assignment effectively.

For instance, in educational settings, logistical issues such as class schedules and school policies might prevent the random allocation of students to different teaching methods.

Ethical Dilemmas

Random assignment, while methodologically sound, can also present ethical dilemmas.

In some cases, withholding a potentially beneficial treatment from one of the groups of participants can raise serious ethical questions, especially in medical or clinical research where participants' well-being might be directly affected.

Researchers must navigate these ethical waters carefully, balancing the pursuit of knowledge with the well-being of participants.

Generalizability Concerns

Even when implemented correctly, random assignment does not always guarantee generalizable results.

The types of people in the participant pool, the specific context of the study, and the nature of the variables being studied can all influence the extent to which the findings can be applied to the broader population.

Researchers must be cautious in making broad generalizations from studies, even those employing strict random assignment.

Practical and Real-World Limitations

In the real world, many variables cannot be manipulated for ethical or practical reasons, limiting the applicability of random assignment.

For instance, researchers cannot randomly assign individuals to different levels of intelligence, socioeconomic status, or cultural backgrounds.

This limitation necessitates the use of other research designs, such as correlational or observational studies, when exploring relationships involving such variables.

Response to Critiques

In response to these critiques, proponents of random assignment argue that the method, despite its limitations, remains one of the most reliable ways to establish cause and effect in experimental research.

They acknowledge the challenges and ethical considerations but emphasize the rigorous frameworks in place to address them.

The ongoing discussion around the limitations and critiques of random assignment contributes to the evolution of the method, making sure it is continuously relevant and applicable in psychological research.

While random assignment is a powerful tool in experimental research, it is not without its critiques and limitations. Implementation issues, ethical dilemmas, generalizability concerns, and real-world limitations can pose significant challenges.

However, the continued discourse and refinement around these issues underline the method's enduring significance in the pursuit of knowledge in psychology.

With careful implementation and attention to ethics, random assignment remains a really important part of studying how people act and think.

Real-World Applications and Examples

Random assignment has been employed in many studies across various fields of psychology, leading to significant discoveries and advancements.

Here are some real-world applications and examples illustrating the diversity and impact of this method:

Medicine and Health Psychology: Randomized Controlled Trials (RCTs) are the gold standard in medical research. In these studies, participants are randomly assigned to either the treatment or control group to test the efficacy of new medications or interventions.
Educational Psychology: Studies in this field have used random assignment to explore the effects of different teaching methods, classroom environments, and educational technologies on student learning and outcomes.
Cognitive Psychology: Researchers have employed random assignment to investigate various aspects of human cognition, including memory, attention, and problem-solving, leading to a deeper understanding of how the mind works.
Social Psychology: Random assignment has been instrumental in studying social phenomena, such as conformity, aggression, and prosocial behavior, shedding light on the intricate dynamics of human interaction.

Let's get into some specific examples. You'll need to know one term though, and that is "control group." A control group is a set of participants in a study who do not receive the treatment or intervention being tested, serving as a baseline to compare with the group that does, in order to assess the effectiveness of the treatment.

Smoking Cessation Study: Researchers used random assignment to put participants into two groups. One group received a new anti-smoking program, while the other did not. This helped determine if the program was effective in helping people quit smoking.
Math Tutoring Program: A study on students used random assignment to place them into two groups. One group received additional math tutoring, while the other continued with regular classes, to see if the extra help improved their grades.
Exercise and Mental Health: Adults were randomly assigned to either an exercise group or a control group to study the impact of physical activity on mental health and mood.
Diet and Weight Loss: A study randomly assigned participants to different diet plans to compare their effectiveness in promoting weight loss and improving health markers.
Sleep and Learning: Researchers randomly assigned students to either a sleep extension group or a regular sleep group to study the impact of sleep on learning and memory.
Classroom Seating Arrangement: Teachers used random assignment to place students in different seating arrangements to examine the effect on focus and academic performance.
Music and Productivity: Employees were randomly assigned to listen to music or work in silence to investigate the effect of music on workplace productivity.
Medication for ADHD: Children with ADHD were randomly assigned to receive either medication, behavioral therapy, or a placebo to compare treatment effectiveness.
Mindfulness Meditation for Stress: Adults were randomly assigned to a mindfulness meditation group or a waitlist control group to study the impact on stress levels.
Video Games and Aggression: A study randomly assigned participants to play either violent or non-violent video games and then measured their aggression levels.
Online Learning Platforms: Students were randomly assigned to use different online learning platforms to evaluate their effectiveness in enhancing learning outcomes.
Hand Sanitizers in Schools: Schools were randomly assigned to use hand sanitizers or not to study the impact on student illness and absenteeism.
Caffeine and Alertness: Participants were randomly assigned to consume caffeinated or decaffeinated beverages to measure the effects on alertness and cognitive performance.
Green Spaces and Well-being: Neighborhoods were randomly assigned to receive green space interventions to study the impact on residents’ well-being and community connections.
Pet Therapy for Hospital Patients: Patients were randomly assigned to receive pet therapy or standard care to assess the impact on recovery and mood.
Yoga for Chronic Pain: Individuals with chronic pain were randomly assigned to a yoga intervention group or a control group to study the effect on pain levels and quality of life.
Flu Vaccines Effectiveness: Different groups of people were randomly assigned to receive either the flu vaccine or a placebo to determine the vaccine’s effectiveness.
Reading Strategies for Dyslexia: Children with dyslexia were randomly assigned to different reading intervention strategies to compare their effectiveness.
Physical Environment and Creativity: Participants were randomly assigned to different room setups to study the impact of physical environment on creative thinking.
Laughter Therapy for Depression: Individuals with depression were randomly assigned to laughter therapy sessions or control groups to assess the impact on mood.
Financial Incentives for Exercise: Participants were randomly assigned to receive financial incentives for exercising to study the impact on physical activity levels.
Art Therapy for Anxiety: Individuals with anxiety were randomly assigned to art therapy sessions or a waitlist control group to measure the effect on anxiety levels.
Natural Light in Offices: Employees were randomly assigned to workspaces with natural or artificial light to study the impact on productivity and job satisfaction.
School Start Times and Academic Performance: Schools were randomly assigned different start times to study the effect on student academic performance and well-being.
Horticulture Therapy for Seniors: Older adults were randomly assigned to participate in horticulture therapy or traditional activities to study the impact on cognitive function and life satisfaction.
Hydration and Cognitive Function: Participants were randomly assigned to different hydration levels to measure the impact on cognitive function and alertness.
Intergenerational Programs: Seniors and young people were randomly assigned to intergenerational programs to study the effects on well-being and cross-generational understanding.
Therapeutic Horseback Riding for Autism: Children with autism were randomly assigned to therapeutic horseback riding or traditional therapy to study the impact on social communication skills.
Active Commuting and Health: Employees were randomly assigned to active commuting (cycling, walking) or passive commuting to study the effect on physical health.
Mindful Eating for Weight Management: Individuals were randomly assigned to mindful eating workshops or control groups to study the impact on weight management and eating habits.
Noise Levels and Learning: Students were randomly assigned to classrooms with different noise levels to study the effect on learning and concentration.
Bilingual Education Methods: Schools were randomly assigned different bilingual education methods to compare their effectiveness in language acquisition.
Outdoor Play and Child Development: Children were randomly assigned to different amounts of outdoor playtime to study the impact on physical and cognitive development.
Social Media Detox: Participants were randomly assigned to a social media detox or regular usage to study the impact on mental health and well-being.
Therapeutic Writing for Trauma Survivors: Individuals who experienced trauma were randomly assigned to therapeutic writing sessions or control groups to study the impact on psychological well-being.
Mentoring Programs for At-risk Youth: At-risk youth were randomly assigned to mentoring programs or control groups to assess the impact on academic achievement and behavior.
Dance Therapy for Parkinson’s Disease: Individuals with Parkinson’s disease were randomly assigned to dance therapy or traditional exercise to study the effect on motor function and quality of life.
Aquaponics in Schools: Schools were randomly assigned to implement aquaponics programs to study the impact on student engagement and environmental awareness.
Virtual Reality for Phobia Treatment: Individuals with phobias were randomly assigned to virtual reality exposure therapy or traditional therapy to compare effectiveness.
Gardening and Mental Health: Participants were randomly assigned to engage in gardening or other leisure activities to study the impact on mental health and stress reduction.

Each of these studies exemplifies how random assignment is utilized in various fields and settings, shedding light on the multitude of ways it can be applied to glean valuable insights and knowledge.
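In several of the examples above, whole schools or neighborhoods, not individual people, are assigned by chance. A minimal sketch of that kind of group-level assignment is shown below. The school names, student IDs, condition labels, and function name are all hypothetical.

```python
import random

def assign_clusters(cluster_members, conditions, seed=None):
    """Randomly assign whole clusters (such as schools); members inherit the condition."""
    rng = random.Random(seed)
    cluster_names = sorted(cluster_members)   # sort so a given seed reproduces the same result
    rng.shuffle(cluster_names)
    cluster_condition = {
        name: conditions[index % len(conditions)]
        for index, name in enumerate(cluster_names)
    }
    member_condition = {
        member: cluster_condition[name]
        for name, members in cluster_members.items()
        for member in members
    }
    return cluster_condition, member_condition

# Hypothetical schools and students.
schools = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2"],
    "East": ["e1", "e2", "e3"],
    "West": ["w1", "w2"],
}
school_condition, student_condition = assign_clusters(
    schools, ["hand sanitizer", "no hand sanitizer"], seed=5
)
print(school_condition)
```

Every student in a school receives the same condition, so the school, not the student, is the unit that was randomly assigned.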

Real-world Impact of Random Assignment

Random assignment is a key tool for learning about people's minds and behaviors. It’s super important and helps in many different areas of our everyday lives. It helps make better rules, creates new ways to help people, and is used in lots of different fields.

Health and Medicine

In health and medicine, random assignment has helped doctors and scientists make lots of discoveries. It’s a big part of tests that help create new medicines and treatments.

By putting people into different groups by chance, scientists can really see if a medicine works.

This has led to new ways to help people with all sorts of health problems, like diabetes, heart disease, and mental health issues like depression and anxiety.

Education

Schools and education have also learned a lot from random assignment. Researchers have used it to look at different ways of teaching, what kind of classrooms are best, and how technology can help learning.

This knowledge has helped make better school rules, develop what we learn in school, and find the best ways to teach students of all ages and backgrounds.

Workplace and Organizational Behavior

Random assignment helps us understand how people act at work and what makes a workplace good or bad.

Studies have looked at different kinds of workplaces, how bosses should act, and how teams should be put together. This has helped companies make better rules and create places to work that are helpful and make people happy.

Environmental and Social Changes

Random assignment is also used to see how changes in the community and environment affect people. Studies have looked at community projects, changes to the environment, and social programs to see how they help or hurt people’s well-being.

This has led to better community projects, efforts to protect the environment, and programs to help people in society.

Technology and Human Interaction

In our world where technology is always changing, studies with random assignment help us see how tech like social media, virtual reality, and online stuff affect how we act and feel.

This has helped make better and safer technology and rules about using it so that everyone can benefit.

The effects of random assignment go far and wide, way beyond just a science lab. It helps us understand lots of different things, leads to new and improved ways to do things, and really makes a difference in the world around us.

From making healthcare and schools better to creating positive changes in communities and the environment, the real-world impact of random assignment shows just how important it is in helping us learn and make the world a better place.

So, what have we learned? Random assignment is like a super tool in learning about how people think and act. It's like a detective helping us find clues and solve mysteries in many parts of our lives.

From creating new medicines to helping kids learn better in school, and from making workplaces happier to protecting the environment, it’s got a big job!

This method isn’t just something scientists use in labs; it reaches out and touches our everyday lives. It helps make positive changes and teaches us valuable lessons.

Whether we are talking about technology, health, education, or the environment, random assignment is there, working behind the scenes, making things better and safer for all of us.

In the end, the simple act of putting people into groups by chance helps us make big discoveries and improvements. It’s like throwing a small stone into a pond and watching the ripples spread out far and wide.

Thanks to random assignment, we are always learning, growing, and finding new ways to make our world a happier and healthier place for everyone!


Random Assignment in Psychology: Definition & Examples

Julia Simkus

Editor at Simply Psychology

BA (Hons) Psychology, Princeton University

Julia Simkus is a graduate of Princeton University with a Bachelor of Arts in Psychology. She is currently studying for a Master's Degree in Counseling for Mental Health and Wellness in September 2023. Julia's research has been published in peer reviewed journals.


Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

In psychology, random assignment refers to the practice of allocating participants to different experimental groups in a study in a completely unbiased way, ensuring each participant has an equal chance of being assigned to any group.

In experimental research, random assignment, or random placement, organizes participants from your sample into different groups using randomization.

Random assignment uses chance procedures to ensure that each participant has an equal opportunity of being assigned to either a control or experimental group.

The control group does not receive the treatment in question, whereas the experimental group does receive the treatment.

When using random assignment, neither the researcher nor the participant can choose the group to which the participant is assigned. This ensures that any differences between and within the groups are not systematic at the onset of the study.

In a study to test the success of a weight-loss program, investigators randomly assigned a pool of participants to one of two groups.

Group A took part in the weight-loss program for 10 weeks and attended a class where they learned about the benefits of healthy eating and exercise.

Group B read a 200-page book that explains the benefits of weight loss.

The researchers found that those who participated in the program and took the class were more likely to lose weight than those in the other group that received only the book.

Importance

Random assignment helps ensure that the groups in an experiment are comparable before the independent variable is applied.

In experiments, researchers will manipulate an independent variable to assess its effect on a dependent variable, while controlling for other variables. Random assignment increases the likelihood that the treatment groups are the same at the onset of a study.

Thus, any changes that result from the independent variable can be assumed to be a result of the treatment of interest. This is particularly important for eliminating sources of bias and strengthening the internal validity of an experiment.

Random assignment is the best method for inferring a causal relationship between a treatment and an outcome.

Random Selection vs. Random Assignment

Random selection (also called probability sampling or random sampling) is a way of randomly selecting members of a population to be included in your study.

On the other hand, random assignment is a way of sorting the sample participants into control and treatment groups.

Random selection ensures that everyone in the population has an equal chance of being selected for the study. Once the pool of participants has been chosen, experimenters use random assignment to assign participants into groups.

Random assignment is only used in between-subjects experimental designs, while random selection can be used in a variety of study designs.

Random Assignment vs Random Sampling

Random sampling refers to selecting participants from a population so that each individual has an equal chance of being chosen. This method enhances the representativeness of the sample.

Random assignment, on the other hand, is used in experimental designs once participants are selected. It involves allocating these participants to different experimental groups or conditions randomly.

This helps ensure that any differences in results across groups are due to manipulating the independent variable, not preexisting differences among participants.
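The two steps can be kept apart in a short Python sketch. The population of 1,000 labels, the sample size of 20, and the seed are invented purely for illustration and are not taken from any study mentioned here.

```python
import random

# Hypothetical setup: a population of 1,000 labelled people and a fixed
# seed so the illustration is repeatable. None of this comes from a real study.
rng = random.Random(7)
population = [f"person_{i}" for i in range(1000)]

# Step 1, random sampling: chance decides WHO takes part in the study.
sample = rng.sample(population, k=20)

# Step 2, random assignment: chance decides WHICH GROUP each sampled
# participant joins. Shuffle a copy of the sample, then split it in half.
shuffled = sample[:]
rng.shuffle(shuffled)
control, treatment = shuffled[:10], shuffled[10:]

print("control:  ", control)
print("treatment:", treatment)
```

Sampling affects how well the results generalize to the population, while assignment affects how comparable the groups are to each other.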

When to Use Random Assignment

Random assignment is used in experiments with a between-groups or independent measures design.

In these research designs, researchers will manipulate an independent variable to assess its effect on a dependent variable, while controlling for other variables.

There is usually a control group and one or more experimental groups. Random assignment helps ensure that the groups are comparable at the onset of the study.

How to Use Random Assignment

There are a variety of ways to assign participants into study groups randomly. Here are a handful of popular methods:

Random Number Generator: Give each member of the sample a unique number; use a computer program to randomly generate a number from the list for each group.
Lottery: Give each member of the sample a unique number. Place all numbers in a hat or bucket and draw numbers at random for each group.
Flipping a Coin: Flip a coin for each participant to decide if they will be in the control group or experimental group (this method can only be used when you have just two groups).
Roll a Die: For each number on the list, roll a die to decide which of the groups they will be in. For example, assume that rolling 1, 2, or 3 places them in a control group and rolling 4, 5, or 6 lands them in an experimental group.
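As a rough illustration of the computer-based version of the lottery method, the Python sketch below shuffles a list of participant IDs and deals them into groups in turn. The function name `assign_to_groups`, the participant labels, and the seed are hypothetical choices made for this example only.

```python
import random

def assign_to_groups(participants, group_names, seed=None):
    """Shuffle the participants, then deal them into groups one at a time.

    Dealing in turn (like cards) keeps group sizes as equal as possible.
    """
    rng = random.Random(seed)          # seed only to make the demo repeatable
    shuffled = list(participants)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)              # the "draw names from a hat" step
    groups = {name: [] for name in group_names}
    for index, person in enumerate(shuffled):
        groups[group_names[index % len(group_names)]].append(person)
    return groups

# Hypothetical sample of 12 participants, labelled P01 to P12.
participants = [f"P{n:02d}" for n in range(1, 13)]
groups = assign_to_groups(participants, ["control", "experimental"], seed=42)

for name, members in groups.items():
    print(name, sorted(members))
```

Unlike flipping a coin for each person, which can leave the groups unequal in size, shuffling and dealing always produces groups that differ by at most one participant. The same function works for three or more groups.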

When is Random Assignment not used?

When it is not ethically permissible: Randomization is only ethical if the researcher has no evidence that one treatment is superior to the other or that one treatment might have harmful side effects.
When answering non-causal questions: If the researcher is just interested in predicting the probability of an event, the causal relationship between the variables is not important and observational designs would be more suitable than random assignment.
When studying the effect of variables that cannot be manipulated: Some risk factors cannot be manipulated and so it would not make any sense to study them in a randomized trial. For example, we cannot randomly assign participants into categories based on age, gender, or genetic factors.

Drawbacks of Random Assignment

While randomization assures an unbiased assignment of participants to groups, it does not guarantee the equality of these groups. There could still be extraneous variables that differ between groups or group differences that arise from chance. Additionally, there is still an element of luck with random assignments.

Thus, researchers cannot produce perfectly equal groups for each specific study. Differences between the treatment group and control group might still exist, and the results of a randomized trial may sometimes be wrong, but this is absolutely okay.

Building scientific evidence is a long and continuous process, and the groups will tend to be equal in the long run when data is aggregated in a meta-analysis.
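The role of chance can be seen in a small simulation. The Python sketch below assumes a made-up participant characteristic (age, drawn from a normal distribution with mean 40 and standard deviation 10) and measures how far apart the two group averages land after random assignment. The function name and all numbers are illustrative assumptions, not data from any study.

```python
import random
from statistics import mean

def average_imbalance(n_participants, n_trials=200, seed=0):
    """Average absolute gap in mean age between two randomly assigned groups."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Hypothetical ages; the distribution is an assumption for this demo.
        ages = [rng.gauss(40, 10) for _ in range(n_participants)]
        rng.shuffle(ages)                      # random assignment
        half = n_participants // 2
        group_a, group_b = ages[:half], ages[half:]
        gaps.append(abs(mean(group_a) - mean(group_b)))
    return mean(gaps)

small = average_imbalance(20)
large = average_imbalance(2000)
print(f"20 participants:   average gap {small:.2f} years")
print(f"2000 participants: average gap {large:.2f} years")
```

Under these assumptions, the small studies show noticeably larger gaps between groups than the large ones, which is the sense in which randomization balances groups on average rather than in every single study.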

Additionally, external validity (i.e., the extent to which the researcher can use the results of the study to generalize to the larger population) can be compromised in studies that rely on random assignment.

Random assignment is challenging to implement outside of controlled laboratory conditions and might not represent what would happen in the real world at the population level.

Random assignment can also be more costly than simple observational studies, where an investigator is just observing events without intervening with the population.

Randomization also can be time-consuming and challenging, especially when participants refuse to receive the assigned treatment or do not adhere to recommendations.

What is the difference between random sampling and random assignment?

Random sampling refers to randomly selecting a sample of participants from a population. Random assignment refers to randomly assigning participants to treatment groups from the selected sample.

Does random assignment increase internal validity?

Yes, random assignment ensures that there are no systematic differences between the participants in each group, enhancing the study’s internal validity.

Does random assignment reduce sampling error?

No. With random assignment, participants have an equal chance of being assigned to either a control group or an experimental group, but this only determines how people already in the sample are split into groups. It does not make the sample more representative of the population.

Sampling error arises because a sample only approximates the population from which it is drawn. Random sampling, not random assignment, is the way to minimize sampling errors.

When is random assignment not possible?

Random assignment is not possible when the experimenters cannot control the treatment or independent variable.

For example, if you want to compare how men and women perform on a test, you cannot randomly assign subjects to these groups.

Participants are not randomly assigned to different groups in this study, but instead assigned based on their characteristics.

Does random assignment eliminate confounding variables?

Largely, yes. Random assignment minimizes the influence of confounding variables on the treatment because it distributes them at random among the study groups. Randomization breaks any systematic relationship between a confounding variable and the treatment, although chance differences between groups can still occur in any single study.

Why is random assignment of participants to treatment conditions in an experiment used?

Random assignment is used to ensure that all groups are comparable at the start of a study. This allows researchers to conclude that the outcomes of the study can be attributed to the intervention at hand and to rule out alternative explanations for study results.

Random Assignment in Experiments

By Jim Frost

Random assignment uses chance to assign subjects to the control and treatment groups in an experiment. This process helps ensure that the groups are equivalent at the beginning of the study, which makes it safer to assume the treatments caused any differences between groups that the experimenters observe at the end of the study.


Statistical analysis by itself cannot establish that a treatment caused an effect. That might be a big surprise! At this point, you might be wondering about all of those studies that use statistics to assess the effects of different treatments. There’s a critical separation between significance and causality:

Statistical procedures determine whether an effect is significant.
Experimental designs determine how confidently you can assume that a treatment causes the effect.

In this post, learn how using random assignment in experiments can help you identify causal relationships.

Correlation, Causation, and Confounding Variables

Random assignment helps you separate causation from correlation and rule out confounding variables. As a critical component of the scientific method, experiments typically set up contrasts between a control group and one or more treatment groups. The idea is to determine whether the effect, which is the difference between a treatment group and the control group, is statistically significant. If the effect is significant, group assignment correlates with different outcomes.

However, as you have no doubt heard, correlation does not necessarily imply causation. In other words, the experimental groups can have different mean outcomes, but the treatment might not be causing those differences even though the differences are statistically significant.

The difficulty in definitively stating that a treatment caused the difference is due to potential confounding variables or confounders. Confounders are alternative explanations for differences between the experimental groups. Confounding variables correlate with both the experimental groups and the outcome variable. In this situation, confounding variables can be the actual cause for the outcome differences rather than the treatments themselves. As you’ll see, if an experiment does not account for confounding variables, they can bias the results and make them untrustworthy.


Example of Confounding in an Experiment

Consider a hypothetical study of vitamin supplements with two experimental groups:

Control group: Does not consume vitamin supplements
Treatment group: Regularly consumes vitamin supplements.

Imagine we measure a specific health outcome. After the experiment is complete, we perform a 2-sample t-test to determine whether the mean outcomes for these two groups are different. Assume the test results indicate that the mean health outcome in the treatment group is significantly better than the control group.

Why can’t we assume that the vitamins improved the health outcomes? After all, only the treatment group took the vitamins.


Alternative Explanations for Differences in Outcomes

The answer to that question depends on how we assigned the subjects to the experimental groups. If we let the subjects decide which group to join based on their existing vitamin habits, it opens the door to confounding variables. It’s reasonable to assume that people who take vitamins regularly also tend to have other healthy habits. These habits are confounders because they correlate with both vitamin consumption (experimental group) and the health outcome measure.

Random assignment prevents this self-sorting of participants and reduces the likelihood that the groups start with systematic differences.

In fact, studies have found that supplement users are more physically active, have healthier diets, have lower blood pressure, and so on compared to those who don’t take supplements. If subjects who already take vitamins regularly join the treatment group voluntarily, they bring these healthy habits disproportionately to the treatment group. Consequently, these habits will be much more prevalent in the treatment group than the control group.

The healthy habits are the confounding variables—the potential alternative explanations for the difference in our study’s health outcome. It’s entirely possible that these systematic differences between groups at the start of the study might cause the difference in the health outcome at the end of the study—and not the vitamin consumption itself!

If our experiment doesn’t account for these confounding variables, we can’t trust the results. While we obtained statistically significant results with the 2-sample t-test for health outcomes, we don’t know for sure whether the vitamins, the systematic difference in habits, or some combination of the two caused the improvements.


Experiments Must Account for Confounding Variables

Your experimental design must account for confounding variables to avoid their problems. Scientific studies commonly use the following methods to handle confounders:

Use control variables to keep them constant throughout an experiment.
Statistically control for them in an observational study.
Use random assignment to reduce the likelihood that systematic differences exist between experimental groups when the study begins.

Let’s take a look at how random assignment works in an experimental design.

Random Assignment Can Reduce the Impact of Confounding Variables

Note that random assignment is different than random sampling. Random sampling is a process for obtaining a sample that accurately represents a population.


Random assignment uses a chance process to assign subjects to experimental groups. Using random assignment requires that the experimenters can control the group assignment for all study subjects. For our study, we must be able to assign our participants to either the control group or the supplement group. Clearly, if we don’t have the ability to assign subjects to the groups, we can’t use random assignment!

Additionally, the process must have an equal probability of assigning a subject to any of the groups. For example, in our vitamin supplement study, we can use a coin toss to assign each subject to either the control group or supplement group. For more complex experimental designs, we can use a random number generator or even draw names out of a hat.

Random Assignment Distributes Confounders Equally

The random assignment process tends to distribute confounding properties roughly equally amongst your experimental groups. In other words, randomness helps eliminate systematic differences between groups. For our study, flipping the coin tends to equalize the distribution of subjects with healthier habits between the control and treatment group. Consequently, these two groups should start roughly equal for all confounding variables, including healthy habits!

Random assignment is a simple, elegant solution to a complex problem. For any given study area, there can be a long list of confounding variables that you could worry about. However, using random assignment, you don’t need to know what they are, how to detect them, or even measure them. Instead, use random assignment to equalize them across your experimental groups so they’re not a problem.

Because random assignment helps ensure that the groups are comparable when the experiment begins, you can be more confident that the treatments caused the post-study differences. Random assignment helps increase the internal validity of your study.

Comparing the Vitamin Study With and Without Random Assignment

Let’s compare two scenarios involving our hypothetical vitamin study. We’ll assume that the study obtains statistically significant results in both cases.

Scenario 1: We don’t use random assignment and, unbeknownst to us, subjects with healthier habits disproportionately end up in the supplement treatment group. The experimental groups differ by both healthy habits and vitamin consumption. Consequently, we can’t determine whether it was the habits or vitamins that improved the outcomes.

Scenario 2: We use random assignment and, consequently, the treatment and control groups start with roughly equal levels of healthy habits. The intentional introduction of vitamin supplements in the treatment group is the primary difference between the groups. Consequently, we can more confidently assert that the supplements caused an improvement in health outcomes.

For both scenarios, the statistical results could be identical. However, the methodology behind the second scenario makes a stronger case for a causal relationship between vitamin supplement consumption and health outcomes.
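A small simulation can make the contrast between the two scenarios concrete. The Python sketch below is built entirely on invented assumptions: half of the subjects have healthy habits, healthy habits add 10 points to a health score, the vitamins themselves have no true effect, and self-selecting subjects with healthy habits join the supplement group 80% of the time. None of these numbers come from a real study.

```python
import random
from statistics import mean

def simulate(assign, n=10_000, seed=1):
    """Return mean outcome of the treatment group minus the control group.

    `assign(healthy, rng)` returns True if the subject joins the treatment group.
    Assumption for this demo: vitamins have NO true effect on the outcome.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        healthy = rng.random() < 0.5                      # confounder
        outcome = 50 + (10 if healthy else 0) + rng.gauss(0, 5)
        (treated if assign(healthy, rng) else control).append(outcome)
    return mean(treated) - mean(control)

def self_selection(healthy, rng):
    # Scenario 1: subjects with healthy habits mostly choose the supplements.
    return rng.random() < (0.8 if healthy else 0.2)

def coin_flip(healthy, rng):
    # Scenario 2: a fair coin decides, regardless of habits.
    return rng.random() < 0.5

self_selected_gap = simulate(self_selection)
randomized_gap = simulate(coin_flip)
print(f"Scenario 1 (self-selection):    gap = {self_selected_gap:.2f}")
print(f"Scenario 2 (random assignment): gap = {randomized_gap:.2f}")
```

Under these assumptions, the self-selected groups show a sizable gap in outcomes even though the vitamins do nothing, because the habits travel with group membership. With the coin flip, the gap stays close to zero.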

How important is it to use the correct methodology? Well, if the relationship between vitamins and health outcomes is not causal, then consuming vitamins won’t cause your health outcomes to improve regardless of what the study indicates. Instead, it’s probably all the other healthy habits!

Randomized controlled trials (RCTs) are considered the gold standard for identifying causal relationships because they use random assignment.

Drawbacks of Random Assignment

Random assignment helps reduce the chances of systematic differences between the groups at the start of an experiment and, thereby, mitigates the threats of confounding variables and alternative explanations. However, the process does not always equalize all of the confounding variables. Its random nature tends to eliminate systematic differences, but it doesn’t always succeed.

Sometimes random assignment is impossible because the experimenters cannot control the treatment or independent variable. For example, if you want to determine how individuals with and without depression perform on a test, you cannot randomly assign subjects to these groups. The same difficulty occurs when you’re studying differences between genders.

In other cases, there might be ethical issues. For example, in a randomized experiment, the researchers would want to withhold treatment for the control group. However, if the treatments are vaccinations, it might be unethical to withhold the vaccinations.

Other times, random assignment might be possible, but it is very challenging. For example, with vitamin consumption, it’s generally thought that if vitamin supplements cause health improvements, it’s only after very long-term use. It’s hard to enforce random assignment with a strict regimen for usage in one group and non-usage in the other group over the long run. Or imagine a study about smoking. The researchers would find it difficult to assign subjects to the smoking and non-smoking groups randomly!

Fortunately, if you can’t use random assignment to help reduce the problem of confounding variables, there are different methods available. The other primary approach is to perform an observational study and incorporate the confounders into the statistical model itself. For more information, read my post Observational Studies Explained .

Read About Real Experiments that Used Random Assignment

I’ve written several blog posts about studies that have used random assignment to make causal inferences. Read studies about the following:

Flu Vaccinations
COVID-19 Vaccinations


Reader Interactions

November 13, 2019 at 4:59 am

Hi Jim, I have a question of randomly assigning participants to one of two conditions when it is an ongoing study and you are not sure of how many participants there will be. I am using this random assignment tool for factorial experiments. http://methodologymedia.psu.edu/most/rannumgenerator It asks you for the total number of participants but at this point, I am not sure how many there will be. Thanks for any advice you can give me, Floyd

May 28, 2019 at 11:34 am

Jim, can you comment on the validity of using the following approach when we can’t use random assignments. I’m in education, we have an ACT prep course that we offer. We can’t force students to take it and we can’t keep them from taking it either. But we want to know if it’s working. Let’s say that by senior year all students who are going to take the ACT have taken it. Let’s also say that I’m only including students who have taking it twice (so I can show growth between first and second time taking it). What I’ve done to address confounders is to go back to say 8th or 9th grade (prior to anyone taking the ACT or the ACT prep course) and run an analysis showing the two groups are not significantly different to start with. Is this valid? If the ACT prep students were higher achievers in 8th or 9th grade, I could not assume my prep course is effecting greater growth, but if they were not significantly different in 8th or 9th grade, I can assume the significant difference in ACT growth (from first to second testing) is due to the prep course. Yes or no?

May 26, 2019 at 5:37 pm

Nice post! I think the key to understanding scientific research is to understand randomization. And most people don’t get it.

May 27, 2019 at 9:48 pm

Thank you, Anoop!

I think randomness in an experiment is a funny thing. The issue of confounding factors is a serious problem. You might not even know what they are! But, use random assignment and, voila, the problem usually goes away! If you can’t use random assignment, suddenly you have a whole host of issues to worry about, which I’ll be writing about in more detail in my upcoming post about observational experiments!


Random Assignment

Reference work entry
First Online: 01 January 2020
pp 4260–4262

Sven Hilbert

Random assignment defines the assignment of participants of a study to their respective group strictly by chance.

Introduction

Statistical inference is based on the theory of probability, and effects investigated in psychological studies are defined by measures that are treated as random variables. The inference about the probability of a given result with regard to an assumed population and the popular term “significance” are only meaningful and without bias if the measure of interest is really a random variable. To achieve the creation of a random variable in form of a measure derived from a sample of participants, these participants have to be randomly drawn. In an experimental study involving different groups of participants, these participants have to additionally be randomly assigned to one of the groups.

Why Is Random Assignment Crucial for Statistical Inference?

Many psychological investigations, such as clinical treatment studies or neuropsychological training...



Author information

Authors and affiliations.

Department of Psychology, Psychological Methods and Assessment, Münich, Germany

Sven Hilbert

Faculty of Psychology, Educational Science, and Sport Science, University of Regensburg, Regensburg, Germany

Psychological Methods and Assessment, LMU Munich, Munich, Germany


Corresponding author

Correspondence to Sven Hilbert .

Editor information

Editors and affiliations.

Oakland University, Rochester, MI, USA

Virgil Zeigler-Hill

Todd K. Shackelford

Section Editor information

Humboldt University, Germany, Berlin, Germany

Matthias Ziegler


Cite this entry.

Hilbert, S. (2020). Random Assignment. In: Zeigler-Hill, V., Shackelford, T.K. (eds) Encyclopedia of Personality and Individual Differences. Springer, Cham. https://doi.org/10.1007/978-3-319-24612-3_1343


DOI : https://doi.org/10.1007/978-3-319-24612-3_1343

Published : 22 April 2020

Publisher Name : Springer, Cham

Print ISBN : 978-3-319-24610-9

Online ISBN : 978-3-319-24612-3

eBook Packages : Behavioral Science and Psychology Reference Module Humanities and Social Sciences Reference Module Business, Economics and Social Sciences


8.1 Experimental design: What is it and when should it be used?

Learning objectives.

Define experiment
Identify the core features of true experimental designs
Describe the difference between an experimental group and a control group
Identify and describe the various types of true experimental designs

Experiments are an excellent data collection strategy for social workers wishing to observe the effects of a clinical intervention or social welfare program. Understanding what experiments are and how they are conducted is useful for all social scientists, whether they actually plan to use this methodology or simply aim to understand findings from experimental studies. An experiment is a method of data collection designed to test hypotheses under controlled conditions. In social scientific research, the term experiment has a precise meaning and should not be used to describe all research methodologies.

Experiments have a long and important history in social science. Behaviorists such as John Watson, B. F. Skinner, Ivan Pavlov, and Albert Bandura used experimental design to demonstrate the various types of conditioning. Using strictly controlled environments, behaviorists were able to isolate a single stimulus as the cause of measurable differences in behavior or physiological responses. The foundations of social learning theory and behavior modification are found in experimental research projects. Moreover, behaviorist experiments brought psychology and social science away from the abstract world of Freudian analysis and towards empirical inquiry, grounded in real-world observations and objectively-defined variables. Experiments are used at all levels of social work inquiry, including agency-based experiments that test therapeutic interventions and policy experiments that test new programs.

Several kinds of experimental designs exist. In general, designs considered to be true experiments contain three key features:

random assignment of participants into experimental and control groups
a “treatment” (or intervention) provided to the experimental group
measurement of the effects of the treatment in a post-test administered to both groups

Some true experiments are more complex. Their designs can also include a pre-test and can have more than two groups, but these are the minimum requirements for a design to be a true experiment.

Experimental and control groups

In a true experiment, the effect of an intervention is tested by comparing two groups: one that is exposed to the intervention (the experimental group, also known as the treatment group) and another that does not receive the intervention (the control group). Importantly, participants in a true experiment need to be randomly assigned to either the control or experimental groups. Random assignment uses a random number generator or some other random process to assign people into experimental and control groups. Random assignment is important in experimental research because it helps to ensure that the experimental group and control group are comparable and that any pre-existing differences between the groups are due to random chance rather than to how participants were placed. We will address more of the logic behind random assignment in the next section.
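As a rough illustration of assignment by a random process, the sketch below shuffles a list of participants and splits it in half. The participant IDs, the function name `randomly_assign`, and the fixed seed are assumptions made for this example only; a real study would follow its own documented randomization protocol.

```python
import random

def randomly_assign(participants, seed=None):
    """Split participants into two (nearly) equal groups purely by chance."""
    rng = random.Random(seed)        # a seed is used here only to make the demo repeatable
    shuffled = list(participants)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)            # every ordering is equally likely
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs, for illustration only.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=42)
print(len(groups["experimental"]), len(groups["control"]))
```

Because the split depends only on the shuffle, neither the researcher nor the participant influences who ends up in which group.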

Treatment or intervention

In an experiment, the independent variable is receiving the intervention being tested, such as a therapeutic technique, prevention program, or access to some service or support. It is less common in social work research, but social science research may also use a stimulus, rather than an intervention, as the independent variable. For example, an electric shock or a reading about death might be used as a stimulus to provoke a response.

In some cases, it may be immoral to withhold treatment completely from a control group within an experiment. If you recruited two groups of people with severe addiction and only provided treatment to one group, the other group would likely suffer. For these cases, researchers use a control group that receives “treatment as usual.” Experimenters must clearly define what treatment as usual means. For example, a standard treatment in substance abuse recovery is attending Alcoholics Anonymous or Narcotics Anonymous meetings. A substance abuse researcher conducting an experiment may use twelve-step programs in their control group and use their experimental intervention in the experimental group. The results would show whether the experimental intervention worked better than normal treatment, which is useful information.

The dependent variable is usually the intended effect the researcher wants the intervention to have. If the researcher is testing a new therapy for individuals with binge eating disorder, their dependent variable may be the number of binge eating episodes a participant reports. The researcher likely expects her intervention to decrease the number of binge eating episodes reported by participants. Thus, she must, at a minimum, measure the number of episodes that occur after the intervention, which is the post-test . In a classic experimental design, participants are also given a pretest to measure the dependent variable before the experimental treatment begins.

Types of experimental design

Let’s put these concepts in chronological order so we can better understand how an experiment runs from start to finish. Once you’ve collected your sample, you’ll need to randomly assign your participants to the experimental group and control group. In a common type of experimental design, you will then give both groups your pretest, which measures your dependent variable, to see what your participants are like before you start your intervention. Next, you will provide your intervention, or independent variable, to your experimental group, but not to your control group. Many interventions, particularly therapeutic treatments, take a few weeks or months to complete. Finally, you will administer your post-test to both groups to observe any changes in your dependent variable. What we’ve just described is known as the classical experimental design and is the simplest type of true experimental design. All of the designs we review in this section are variations on this approach. Figure 8.1 visually represents these steps.

Figure 8.1 Steps in classic experimental design: sampling, assignment, pretest, intervention, posttest

An interesting example of experimental research can be found in Shannon K. McCoy and Brenda Major’s (2003) study of people’s perceptions of prejudice. In one portion of this multifaceted study, all participants were given a pretest to assess their levels of depression. No significant differences in depression were found between the experimental and control groups during the pretest. Participants in the experimental group were then asked to read an article suggesting that prejudice against their own racial group is severe and pervasive, while participants in the control group were asked to read an article suggesting that prejudice against a racial group other than their own is severe and pervasive. Clearly, these were not meant to be interventions or treatments to help depression, but were stimuli designed to elicit changes in people’s depression levels. Upon measuring depression scores during the post-test period, the researchers discovered that those who had received the experimental stimulus (the article citing prejudice against their same racial group) reported greater depression than those in the control group. This is just one of many examples of social scientific experimental research.

In addition to classic experimental design, there are two other ways of designing experiments that are considered to fall within the purview of “true” experiments (Babbie, 2010; Campbell & Stanley, 1963). The posttest-only control group design is almost the same as classic experimental design, except it does not use a pretest. Researchers who use posttest-only designs want to eliminate testing effects , in which participants’ scores on a measure change because they have already been exposed to it. If you took multiple SAT or ACT practice exams before you took the real one you sent to colleges, you’ve taken advantage of testing effects to get a better score. Considering the previous example on racism and depression, participants who are given a pretest about depression before being exposed to the stimulus would likely assume that the intervention is designed to address depression. That knowledge could cause them to answer differently on the post-test than they otherwise would. In theory, as long as the control and experimental groups have been determined randomly and are therefore comparable, no pretest is needed. However, most researchers prefer to use pretests in case randomization did not result in equivalent groups and to help assess change over time within both the experimental and control groups.

Researchers wishing to account for testing effects but also gather pretest data can use a Solomon four-group design. In the Solomon four-group design , the researcher uses four groups. Two groups are treated as they would be in a classic experiment—pretest, experimental group intervention, and post-test. The other two groups do not receive the pretest, though one receives the intervention. All groups are given the post-test. Table 8.1 illustrates the features of each of the four groups in the Solomon four-group design. By having one set of experimental and control groups that complete the pretest (Groups 1 and 2) and another set that does not complete the pretest (Groups 3 and 4), researchers using the Solomon four-group design can account for testing effects in their analysis.

Table 8.1 Solomon four-group design

Group	Pretest	Intervention	Posttest
Group 1	X	X	X
Group 2	X		X
Group 3		X	X
Group 4			X

Solomon four-group designs are challenging to implement in the real world because they are time- and resource-intensive. Researchers must recruit enough participants to create four groups and implement interventions in two of them.
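To make the four-group structure concrete, here is a minimal sketch that encodes which steps each group completes and deals randomly shuffled participants into the four groups. The dictionary layout, the function name `assign_solomon`, and the participant IDs are illustrative assumptions, not part of any standard tool.

```python
import random

# Which steps each group completes, following Table 8.1.
SOLOMON_GROUPS = {
    "Group 1": {"pretest": True,  "intervention": True,  "posttest": True},
    "Group 2": {"pretest": True,  "intervention": False, "posttest": True},
    "Group 3": {"pretest": False, "intervention": True,  "posttest": True},
    "Group 4": {"pretest": False, "intervention": False, "posttest": True},
}

def assign_solomon(participants, seed=None):
    """Randomly assign participants to the four Solomon groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    names = list(SOLOMON_GROUPS)
    assignment = {name: [] for name in names}
    # After shuffling, deal participants out in turn so group sizes stay balanced.
    for i, person in enumerate(shuffled):
        assignment[names[i % len(names)]].append(person)
    return assignment

# Hypothetical sample of 40 participants.
sample = [f"P{i:02d}" for i in range(40)]
assignment = assign_solomon(sample, seed=7)
for name, members in assignment.items():
    print(name, len(members), SOLOMON_GROUPS[name])
```

Comparing posttest scores of Groups 1 and 2 against Groups 3 and 4 is what would let a researcher see whether taking the pretest itself changed the results.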

Overall, true experimental designs are sometimes difficult to implement in a real-world practice environment. It may be impossible to withhold treatment from a control group or randomly assign participants in a study. In these cases, pre-experimental and quasi-experimental designs–which we will discuss in the next section–can be used. However, the differences in rigor from true experimental designs leave their conclusions more open to critique.

Experimental design in macro-level research

You can imagine that social work researchers may be limited in their ability to use random assignment when examining the effects of governmental policy on individuals. For example, it is unlikely that a researcher could randomly assign some states to implement decriminalization of recreational marijuana and some states not to in order to assess the effects of the policy change. There are, however, important examples of policy experiments that use random assignment, including the Oregon Medicaid experiment. In Oregon, the wait list for Medicaid was so long that state officials conducted a lottery to decide who from the wait list would receive Medicaid (Baicker et al., 2013). Researchers used the lottery as a natural experiment that included random assignment. People selected to receive Medicaid were the experimental group and those who remained on the wait list were the control group. There are some practical complications with macro-level experiments, just as with other experiments. For example, the ethical concern with using people on a wait list as a control group exists in macro-level research just as it does in micro-level research.

Key Takeaways

True experimental designs require random assignment.
Control groups do not receive an intervention, and experimental groups receive an intervention.
The basic components of a classic experimental design include a pretest, posttest, control group, and experimental group.
Testing effects may cause researchers to use variations on the classic experimental design.

Glossary

Classic experimental design- uses random assignment, an experimental and control group, as well as pre- and posttesting
Control group- the group in an experiment that does not receive the intervention
Experiment- a method of data collection designed to test hypotheses under controlled conditions
Experimental group- the group in an experiment that receives the intervention
Posttest- a measurement taken after the intervention
Posttest-only control group design- a type of experimental design that uses random assignment, and an experimental and control group, but does not use a pretest
Pretest- a measurement taken prior to the intervention
Random assignment- using a random process to assign people into experimental and control groups
Solomon four-group design- uses random assignment, two experimental and two control groups, pretests for half of the groups, and posttests for all
Testing effects- when a participant’s scores on a measure change because they have already been exposed to it
True experiments- a group of experimental designs that contain independent and dependent variables, pretesting and post testing, and experimental and control groups

Foundations of Social Work Research Copyright © 2020 by Rebecca L. Mauldin is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Elements of Research

Random assignment is a procedure used in experiments to create multiple study groups that include participants with similar characteristics so that the groups are equivalent at the beginning of the study. The procedure involves assigning individuals to an experimental treatment or program at random, or by chance (like the flip of a coin). This means that each individual has an equal chance of being assigned to either group. Usually in studies that involve random assignment, participants will receive a new treatment or program, will receive nothing at all or will receive an existing treatment. When using random assignment, neither the researcher nor the participant can choose the group to which the participant is assigned.

The benefit of using random assignment is that it “evens the playing field.” This means that the groups will differ only in the program or treatment to which they are assigned. If both groups are equivalent except for the program or treatment that they receive, then any change that is observed after comparing information collected about individuals at the beginning of the study and again at the end of the study can be attributed to the program or treatment. This way, the researcher has more confidence that any changes that might have occurred are due to the treatment under study and not to the characteristics of the group.

A potential problem with random assignment is the temptation to ignore the random assignment procedures. For example, it may be tempting to assign an overweight participant to the treatment group that includes participation in a weight-loss program. Ignoring random assignment procedures in this study limits the ability to determine whether or not the weight loss program is effective because the groups will not be randomized. Research staff must follow random assignment protocol, if that is part of the study design, to maintain the integrity of the research. Failure to follow procedures used for random assignment prevents the study outcomes from being meaningful and applicable to the groups represented.

Establishing Equivalence: Methodological Progress in Group-Matching Design and Analysis

Sara T. Kover

University of Wisconsin-Madison, Waisman Center, 1500 Highland Avenue, Madison, WI 53705

Amy K. Atwood

University of Wisconsin-Madison

This methodological review draws attention to the challenges faced by intellectual and developmental disabilities researchers in the appropriate design and analysis of group comparison studies. We provide a brief overview of matching methodologies in the field, emphasizing group-matching designs utilized in behavioral research on cognition and language in neurodevelopmental disorders, including autism spectrum disorder, fragile X syndrome, Down syndrome, and Williams syndrome. The limitations of relying on p -values to establish group equivalence are discussed in the context of other existing methods: equivalence tests, propensity scores, and regression-based analyses. Our primary recommendation for advancing research on intellectual and developmental disabilities is the use of descriptive indices of adequate group matching: effect sizes (i.e., standardized mean differences) and variance ratios.

With the ultimate goal of understanding their causal effects on development, much of behavioral research on intellectual and developmental disabilities (IDDs) is designed to (1) characterize phenotypic strengths and weaknesses in behavior and cognition and/or (2) identify syndrome-specific aspects of these profiles. Such aims are often addressed with group-matching designs, in which statistical comparisons between nonrandomized groups (e.g., autism spectrum disorder [ASD] versus typical development) provide the basis for conclusions. Despite the considerable attention matching has received ( Abbeduto, 2010 ; Burack, 2004 ), methodological issues in group matching remain at the forefront of concerns regarding the progress of behavioral research on neurodevelopmental disorders ( Beeghly, 2006 ; Eigsti, de Marchena, Schuh, & Kelley, 2011 ).

The purpose of this article is to introduce methodological improvements to group-matching designs frequently used in IDD research. To that end, we discuss the pitfalls of common group-matching strategies and suggest metrics for establishing adequate group equivalence that are not novel, but are new to the field: effect sizes and variance ratios. Because our primary goal is to provide a foundation from which informed decisions on research design, analysis, and interpretation can be made, we highlight several other study designs worthy of consideration. We conclude by emphasizing the need for thoughtful research questions and responsible use of equivalence thresholds.

Challenges of Group Matching in IDD Research

Frameworks for Causality

The ability to draw conclusions about causality has traditionally hinged upon random assignment of participants to the target group (e.g., treatment, intervention, diagnosis—in our case) and comparison group. Properly implemented, random assignment allows estimation of causal effects because it ensures, in the long run, that any differences between the target and comparison groups (i.e., bias or selection bias) aside from group assignment prior to the study are due to chance. One approach to causality, the Rubin Causal Model, defines the causal effect—that is, the effect of a manipulable treatment—in terms of potential outcomes: what the outcome would have been for participants in the comparison group had they received the treatment and what the outcome would have been for those in the treatment group had they not received it ( Holland, 1986 ; Rubin, 1974 ). In quasi-experimental designs (e.g., regression discontinuity, interrupted time series), it is possible to test hypotheses about the effects of causes without random assignment. A nonequivalent control group design is one that seeks to remove the bias associated with nonrandom assignment by matching the target and comparison groups to establish equivalence, or balance ( Shadish, Cook, & Campbell, 2002 ).
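The potential-outcomes idea can be written compactly. The notation below is a common convention rather than something taken from the text: $Y_i(1)$ and $Y_i(0)$ are assumed to denote participant $i$'s outcome with and without the treatment, and $T$ is the group indicator.

```latex
% Individual causal effect and average treatment effect (ATE)
\tau_i = Y_i(1) - Y_i(0), \qquad
\mathrm{ATE} = \mathbb{E}\left[ Y(1) - Y(0) \right]

% Under random assignment, T is independent of (Y(0), Y(1)), so the
% observed difference in group means estimates the ATE:
\mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0]
  = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)] = \mathrm{ATE}
```

Only one of the two potential outcomes is ever observed for a given participant, which is why the comparability of groups matters so much when assignment is not random.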

Methods in IDD Research

Although IDDs are attributable to neurodevelopmental disorders, those disorders can scarcely be considered manipulable causes. Research on IDDs is further constrained by ethical parameters (e.g., inability to randomly assign the circumstances that lead to a diagnosis of fetal alcohol spectrum disorder) and relatively small samples due to low prevalence. As such, the use of more desirable techniques, such as random assignment or sophisticated matching that relies on large datasets, is precluded. Instead, in the simplest and perhaps most common group-matching design in the field, two groups composed of participants with preexisting diagnoses are matched on a single variable, such as nonverbal cognitive ability, and then compared on some dependent variable of interest, such as vocabulary ability. These groups are selected in such a manner as to presume they are equivalent on a dimension of ability thought to be relevant to the dependent variable of interest. Differences between groups on the dependent variable are taken to indicate strengths or weaknesses on the construct of interest relative to the matching construct. How to select constructs and variables on which to match is discussed elsewhere and is beyond the current scope (see Burack, Iarocci, Bowler, & Mottron, 2002 ; Mervis & Klein-Tasman, 2004 ; Mervis & Robinson, 1999 ; Strauss, 2001 ). We focus here on a specific aspect of matching: establishing when groups are equivalent.

A customary group-matching procedure is to iteratively exclude participants from one or both groups until an independent samples t-test of the group mean difference on the matching variable yields a sufficiently high p-value, showing that the groups do not significantly differ. The process begins by testing the difference between the groups on the matching variable. For example, a hypothetical target group of 30 participants with a mean score of 60.10 on the matching variable would not be considered matched to a comparison group of 30 participants with a mean score of 71.70 because the p-value for the t-test on the matching variable is less than .05 (hypothetical data are given in the Appendix). Two matched groups might be attained by next removing all participants outside of the overlapping range of the groups, or according to some other criterion, and then testing the group difference again (Mervis & John, 2008). A researcher might repeat this procedure an unreported number of times, and it usually yields higher p-values as participants are removed.

P-value Thresholds

The most persuasive standard in the field for group matching has been generated by Mervis and colleagues (Mervis & Klein-Tasman, 2004; Mervis & Robinson, 1999), who drew important attention to the matching procedures used to study individuals with IDDs. Mervis and colleagues highlighted that accepting groups as matched when a t-test on the matching variable yields a p-value greater than .05 is not sufficient (Mervis & Klein-Tasman, 2004; Mervis & Robinson, 1999). As such, they substantially improved upon the common practice of accepting the null hypothesis that population means are equal given any nonsignificant p-value. Mervis and Klein-Tasman (2004) suggested that when considering a p-value threshold for matching groups, “…it is important to show that the group distributions on the matching variable overlap strongly, as evidenced, we suggest, by a p-level of at least .50 on the test of mean differences for the control variable(s),” (p. 9). While p-values below .05 are taken as clear indication of a group difference, Mervis and colleagues (2004, 1999) proposed that p-values between .20 and .50 are ambiguous; p-values of .50 and above are sufficient evidence of equivalence. The .50 p-value threshold was based on Frick’s (1995) “good-effort criterion” for accepting the null hypothesis, which included p-value thresholds in combination with a small effect.

According to Mervis’ guideline, groups might be considered matched on a measure of cognitive ability only when the p -value for the test of the group difference on the matching variable is greater than or equal to .50. In our hypothetical example, a subset of participants ( n = 20 from each group) could be selected to improve the overlap of the groups on the matching variable by removing the lowest scoring participants of the target group and the highest scoring participants of the comparison group, yielding a mean of 68.10 for the target group and a mean of 67.10 for the comparison group. The t -test on the matching variable for these subgroups is not significant, but instead gives a p -value of .55, which is a value that might be taken as evidence that the groups are matched.

The Trouble with P-value Thresholds

Mervis and colleagues (2004 , 1999 ) were correct in emphasizing that groups ought not be considered matched solely on the basis of a p -value greater than .05. Most importantly, their recommendations increased awareness in the field that some p -values should lead to a conclusion of failing to reject the null hypothesis that the population means are equal. Nonetheless the p -value threshold proposed by Mervis et al. (2004 , 1999 ) and Frick (1995) is not without limitations ( Edgell, 1995 ; Imai, King, & Stuart, 2008 ). Difficulties hinge around the interpretation of p -values and the role of power in hypothesis testing.

Interpretation of a P-value

A p-value is defined as the probability of observing the sampled data or data more extreme, given that the null hypothesis is true. The hypotheses in question in the traditional matching procedure are $H_0: \Delta = 0$ and $H_1: \Delta \neq 0$, where $\Delta$ is the population mean difference on the matching variable. The p-value represents the probability of observing the sample mean difference (or one more extreme) when the population mean difference is zero. When there is no difference in the population on the matching variable, one would expect a p-value to be less than or equal to .05 exactly 5% of the time. As such, using p > .05 as a threshold for declaring groups to be matched will result in groups being considered matched 95% of the time when there is no true population mean difference. This can be seen in the first panel of Figure 1. Likewise, one would expect a p-value to be less than or equal to .20 exactly 20% of the time; a p-value of less than or equal to .50 would be expected 50% of the time when the populations are equivalent. Thus, using a matching threshold of p ≥ .50, one would conclude that the group samples are matched just 50% of the time on average when the groups are truly matched in the population.

Figure 1. Proportion of samples considered to be adequately matched according to p-value thresholds by sample size and population effect size (Cohen’s d).

Failure to reject $H_0: \Delta = 0$ does not allow one to conclude that the groups come from populations with the same mean because a p-value denotes nothing about the probability of the truth of $H_0$ given the observed data (Schervish, 1996). Null hypothesis significance testing allows for rejecting or failing to reject the null hypothesis; the option of accepting the null hypothesis simply does not exist. Thus, observing a p-value of .50 leads only to a conclusion that the groups are not significantly different (i.e., a failure to reject the null hypothesis).

Power and P-values

It is possible to observe a large p -value due to a lack of effect, due to a small effect, and/or due to the lack of power to detect that effect because of a small sample size. Eliminating participants until a test of the mean difference results in a large enough p -value may decrease the difference between the observed sample means for the matching variable, but also decreases the power (due to the decreasing sample size) to detect any effect at all, including on the dependent variable of interest. In other words, increasing p -values on the matching variable may have less to do with achieving equivalence between groups and more to do with a reduction in power, particularly when sample sizes are initially small. Mervis and John (2008) provide a substantive example of this for a sample of participants with Williams syndrome.

Impact of sample size on p-value threshold matching

To further illustrate these difficulties, we simulated the process of sampling groups of various sizes from populations with known mean differences in a Monte Carlo simulation using R version 2.10.1 ( R Development Core Team, 2010 ). Over 10,000 iterations, we tracked the frequency with which various p -value matching thresholds resulted in concluding that the groups were matched. We defined groups to be matched when a t -test comparing group means on the matching variable resulted in a p -value greater than p -value thresholds of .05, .20, and .50. We based the matching variable on a standardized assessment of cognitive ability, such as IQ ( M = 100, SD = 15), because it is often used in this context and is readily interpretable. The true difference in populations was calculated in terms of the standardized mean difference effect size, ranging from 0 (i.e., no difference) to 0.5 (i.e., a medium effect size using Cohen's [1988] guidelines).

The impact of sample size on the p-value threshold method for determining when groups are matched becomes apparent when the population mean difference is truly greater than zero. As seen in Figure 1, when the population mean difference is a medium effect of d = 0.50, using a threshold of p ≥ .50, one concludes that the groups are matched between 5% and 35% of the time across sample sizes of 10 to 50. This notable discrepancy in the rate of meeting the p-value threshold and concluding that the groups are matched is due to variability in sample size alone.
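The logic of this simulation can be sketched in a few lines of Python. This is not the authors' R code: to stay within the standard library it swaps the t-test for a two-sided z-test that treats the population SD of 15 as known, uses 2,000 rather than 10,000 iterations, and all function names are invented for the example, so the proportions it prints will only approximate those in Figure 1.

```python
import random
from statistics import NormalDist, mean

SD = 15.0  # population SD of the IQ-like matching variable (M = 100, SD = 15)

def p_value_known_sd(group_a, group_b, sd=SD):
    """Two-sided z-test p-value for a mean difference, treating sd as known.
    Simplification: the source describes t-tests."""
    se = sd * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def proportion_matched(n, d, threshold, iterations=2000, seed=0):
    """Share of simulated samples declared 'matched' (p >= threshold)
    when the true standardized mean difference is d."""
    rng = random.Random(seed)
    matched = 0
    for _ in range(iterations):
        target = [rng.gauss(100.0, SD) for _ in range(n)]
        comparison = [rng.gauss(100.0 + d * SD, SD) for _ in range(n)]
        if p_value_known_sd(target, comparison) >= threshold:
            matched += 1
    return matched / iterations

for n in (10, 50):
    for d in (0.0, 0.5):
        print(f"n={n:2d} d={d:.1f} matched at p>=.50: "
              f"{proportion_matched(n, d, threshold=0.50):.2f}")
```

With no true difference the matched proportion should hover near one half regardless of sample size, whereas with a true difference it drops as the groups get larger, which is the dependence on sample size that the text describes.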

Inferential statistics should not be used in isolation for establishing equivalence because the results of a t -test hinge on both the observed mean difference between groups and the statistical power of the test, which has a direct relation to sample sizes—dropping participants reduces power and increases a p -value without respect to the mean difference ( Imai et al., 2008 ).

Improved Equivalence Thresholds: Recommendations

Descriptive Statistics for Group Matching

Contemporary reviews of matching methodologies highlight descriptive statistics pre- and post-matching as an alternative to inferential statistics for determining the adequacy of group equivalence ( Steiner & Cook, 2012 ; Stuart, 2010 ). As the basis for equivalence thresholds—a term from Stegner et al. (1996) —for IDD research, we suggest two descriptive metrics: effect sizes (i.e., standardized mean differences) and variance ratios. These metrics are used widely in quasi-experimental designs with propensity score analysis, which we describe below. Importantly, effect sizes and variance ratios yield interpretable estimates of group matching adequacy and reduce the influence of sample size ( Breaugh & Arnold, 2007 ; Imai et al., 2008 ).

Sometimes referred to as standardized bias, standardized mean differences are a simple and effective index of matching (Rosenbaum & Rubin, 1985; Rubin, 2001). Where $\bar{x}_t$ and $\bar{x}_c$ are the means of the target and comparison groups on the matching variable, respectively, and $s_t^2$ and $s_c^2$ are the corresponding variances, Cohen’s d should be calculated as $d = (\bar{x}_t - \bar{x}_c) / \sqrt{(s_t^2 + s_c^2)/2}$ when population variances are assumed equal a priori and sample sizes are equal. Note that Cohen’s d is calculated as $d = (\bar{x}_t - \bar{x}_c) / \sqrt{[(n_t - 1)s_t^2 + (n_c - 1)s_c^2] / [n_t + n_c - 2]}$ for equal or unequal sample sizes, but the formula simplifies as above when sample sizes are equal. When variances are not assumed equal and/or interpreting the mean difference with respect to the variance of the comparison group is preferred, Cohen’s d can be calculated as $d = (\bar{x}_t - \bar{x}_c) / \sqrt{s_c^2}$. For our hypothetical example of the n = 20 groups (see Appendix), Cohen’s d is $(68.10 - 67.10) / \sqrt{(36 + 20)/2} = .19$. Effect sizes should be reported as best practice for tests of the dependent variable (American Psychological Association, 2010; Bakeman, 2006), but also reporting standardized mean differences alongside p-values on the matching variable provides context to the comparison of groups. The strategy of reporting effect sizes has been utilized in investigations of language and cognitive abilities in boys with ASD to aid the reader in interpreting the equivalence between the target and comparison samples (Brown, Aczel, Jimenez, Kaufman, & Grant, 2010; Kover, McDuffie, Hagerman, & Abbeduto, under revision).
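The three versions of Cohen's d described above can be checked numerically. The sketch below is illustrative only: the function names are invented, and the inputs are the summary statistics quoted for the hypothetical n = 20 groups (means of 68.10 and 67.10, variances of 36.00 and 20.20), not the Appendix data themselves.

```python
from math import sqrt

def cohens_d_equal_n(mean_t, mean_c, var_t, var_c):
    """Cohen's d with the simple average of the two variances (equal n)."""
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2)

def cohens_d_pooled(mean_t, mean_c, var_t, var_c, n_t, n_c):
    """Cohen's d with the pooled variance (equal or unequal n)."""
    pooled_var = ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)

def cohens_d_comparison_sd(mean_t, mean_c, var_c):
    """Cohen's d scaled by the comparison group's standard deviation only."""
    return (mean_t - mean_c) / sqrt(var_c)

d_simple = cohens_d_equal_n(68.10, 67.10, 36.00, 20.20)
d_pooled = cohens_d_pooled(68.10, 67.10, 36.00, 20.20, 20, 20)
d_comparison = cohens_d_comparison_sd(68.10, 67.10, 20.20)
print(round(d_simple, 2), round(d_pooled, 2), round(d_comparison, 2))
```

The first two values agree because the pooled formula reduces to the simple average when the group sizes are equal; the third is somewhat larger because the comparison group's variance is the smaller of the two.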

The weaknesses of matching groups on just one aspect of their distributions (i.e., means) have been noted (Facon, Magis, & Belmont, 2011); however, using p-value threshold tests on variance, skewness, and kurtosis may exacerbate the issues associated with using p-value thresholds for means alone. We do not recommend p-value threshold matching for variances in addition to means. Rather, we favor Rubin’s (2001) guideline, which avoids use of an inferential statistic: reporting the ratio of the variance of the target group to the variance of the comparison group. The variance ratio should be calculated as $s_t^2 / s_c^2$. For our hypothetical example of the n = 20 groups, the variance ratio is 36.00/20.20 = 1.78.
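The two descriptive metrics can be checked with a few lines of code. The following is a minimal sketch in Python, using only the summary statistics reported for the hypothetical n = 20 groups (means of 68.10 and 67.10, variances of 36.00 and 20.20); the function names are illustrative and do not come from the article.

```python
import math


def cohens_d_equal_n(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference using the average of the two variances.

    Appropriate when sample sizes are equal and variances are assumed equal.
    """
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)


def cohens_d_pooled(mean_t, mean_c, var_t, var_c, n_t, n_c):
    """Standardized mean difference using the pooled variance.

    Works for equal or unequal sample sizes; reduces to the equal-n
    version when n_t == n_c.
    """
    pooled = ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled)


def variance_ratio(var_t, var_c):
    """Ratio of the target-group variance to the comparison-group variance."""
    return var_t / var_c


# Summary statistics for the hypothetical n = 20 groups in the text.
d = cohens_d_equal_n(68.10, 67.10, 36.00, 20.20)
d_pooled = cohens_d_pooled(68.10, 67.10, 36.00, 20.20, 20, 20)
ratio = variance_ratio(36.00, 20.20)

print(f"Cohen's d = {d:.2f}, variance ratio = {ratio:.2f}")
```

With equal group sizes the pooled and equal-n versions agree, which is the simplification the text describes.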

Thresholds for Effect Sizes and Variance Ratios

Of course, the issue of where to set the equivalence thresholds for effect sizes and variance ratios remains ( Shadish & Steiner, 2010 ). Researchers will need to decide on meaningful thresholds based on seminal substantive studies because general guidelines are not universally applicable and should be used only when other references are not available ( Cohen, 1988 ):

"The terms 'small,' 'medium,' and 'large' are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation…In the face of this relativity, there is a certain risk inherent in offering conventional operational definitions for these terms for use in power analysis in as diverse a field of inquiry as behavioral science. This risk is nevertheless accepted in the belief that more is to be gained than lost by supplying a common conventional frame of reference which is recommended for use only when no better bases for estimating the ES [effect size] index is available" (p. 25).

An adequately small effect size for matched groups might be defined as the smallest value at which a difference in groups would be clinically meaningful ( Piaggio et al., 2006 ). Rubin (2001) proposed that the standardized mean difference be close to zero (less than half a standard deviation apart; d ≤ .5) and that the ratio of variances be near 1 (0.5 and 2 serve as endpoints that indicate very poor variance matches). Others have been more specific in defining equivalence as a standardized mean difference near zero such that it is within .1 standard deviation and a variance ratio greater than .8 and less than 1.25 ( Steiner, Cook, Shadish, & Clark, 2010 ). In research on ASD, a Cohen’s d of less than .20 has been described as trivial, but this threshold has yet to be evaluated in terms of group-matching adequacy ( Cicchetti et al., 2011 ). Steiner and Cook (2012) point out that a given effect size on the matching variable must be interpreted together with the expected effect size of the variable of interest (e.g., a Cohen’s d of 0.15 on the matching variable would not be sufficiently small if the effect of interest was expected to be 0.20).

We suggest that groups be considered adequately matched when they fall within the field’s standards for both the absolute value of Cohen’s d and the variance ratio. Table 1 lists a variety of effect sizes and variance ratios with illustrative corresponding means and variances on a matching variable for two groups. A Cohen’s d of 0.00 reflects well-matched group means; a Cohen’s d of 1.00 reflects poorly matched groups. A variance ratio of 1 indicates no difference in variances; a ratio of 2 reflects an unacceptable magnitude of difference in the spread of the distributions. For our hypothetical example, the effect size of .19 might be sufficiently small in some contexts for some researchers to conclude that the groups are matched, but taken together with the variance ratio of 1.78, it is unlikely that these two groups should be considered matched in most fields of study.

Table 1. Example Standardized Mean Differences (Cohen’s d) and Variance Ratios as Thresholds

Cohen’s d	Target Group Mean	Comparison Group Mean	Adequacy (Means)	Variance Ratio	Target Group Variance	Comparison Group Variance	Adequacy (Variances)
0.00	100	100	Matched	1.00	225	225	Matched
0.05	99.25	100	Matched	1.10	247.5	225	Matched
0.13	98	100	Matched	1.20	270	225	Matched
0.20	97	100	Not Matched	1.33	299.25	225	Not Matched
0.33	95	100	Not Matched	1.50	337.5	225	Not Matched
0.50	92.5	100	Not Matched	1.75	394	225	Not Matched
1.00	85	100	Not Matched	2.00	450	225	Not Matched

Note. Matched-group adequacy should be evaluated with respect to both means (i.e., absolute value of the effect size) and variances. We emphasize that decisions regarding the adequacy of group matches must be reached through consensus within individual fields; we merely provide starting points based on Rubin (2001) and Steiner and Cook (2012). Sample statistics reflect a matching variable with M = 100 and SD = 15.
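The joint decision rule (both the absolute effect size and the variance ratio must fall within the chosen thresholds) can be sketched as a small function. This is an illustrative sketch only: the default thresholds are the values attributed above to Steiner, Cook, Shadish, and Clark (2010), namely an absolute d within .1 and a variance ratio between .8 and 1.25, and the function name and return format are assumptions rather than anything defined in the article. Fields that settle on other thresholds would pass different arguments.

```python
def is_adequately_matched(d, var_ratio, d_max=0.1, ratio_low=0.8, ratio_high=1.25):
    """Apply descriptive equivalence thresholds to a pair of groups.

    d          -- standardized mean difference on the matching variable
    var_ratio  -- target-group variance divided by comparison-group variance
    d_max, ratio_low, ratio_high -- thresholds; the defaults are one published
        suggestion and should be replaced by field-specific values.
    Returns (means_ok, variances_ok, matched).
    """
    means_ok = abs(d) <= d_max
    variances_ok = ratio_low < var_ratio < ratio_high
    return means_ok, variances_ok, means_ok and variances_ok


# Hypothetical n = 20 example from the text: d = .19, variance ratio = 1.78.
print(is_adequately_matched(0.19, 1.78))   # fails on both criteria
# A row from the table: d = 0.05, variance ratio = 1.10.
print(is_adequately_matched(0.05, 1.10))   # passes both criteria
```

Returning the two criteria separately makes it visible which aspect of the distributions caused a failed match.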

Although negotiating appropriate equivalence thresholds will be far from a trivial feat, these descriptive indices of group matching have several strengths. First, effect sizes are less directly affected by sample size than are p-values. Second, effect sizes and variance ratios can be used in combination with other metrics of equivalence, including visual inspection of plots and p-values from the t-test on the matching variable. Furthermore, because means and standard deviations are usually reported for matching variables in published studies, an interested reader can calculate effect sizes and variance ratios to aid in interpreting extant findings. In Table 2, we summarize the strengths and weaknesses of the indices of equivalence discussed, as well as the methods described in the next section.

Table 2. Brief Summary of Strengths and Weaknesses of Methodologies for IDD Research

Method	Strength	Weakness
Single-variable matching
p-value threshold	Widely used in IDD research	Violates underlying logic of hypothesis testing; matching conclusions depend on sample size
Effect size threshold	Avoids direct influence of sample size on matching decision	Not sufficient without other evidence (e.g., variance ratio, plot)
Variance ratio threshold	Descriptive criterion for width of distributions	Not sufficient without other evidence (e.g., effect size, plot)
Equivalence testing	Allows appropriate logical use of p-values to indicate statistical equivalence	Large sample sizes usually required
Propensity score matching	Sophisticated analysis for causal inference	Many participants and many measures usually required; establishing balance is still subjective
ANCOVA	Simple implementation	Assumptions may be violated for IDD participant samples; complex interpretations of findings
Developmental trajectories	Accessible, theoretically motivated analysis	Large comparison group required

Existing Methodologies Applied to IDD Research

Simple group-matching designs are ubiquitous in research on IDDs; however, other methodological options are available. We briefly describe three classes of methodologies with strengths and weaknesses that may be unfamiliar to the reader: equivalence tests, propensity score matching, and regression-based techniques.

Equivalence Tests

Often used in medical studies to demonstrate that the difference between two treatments does not exceed some clinically meaningful equivalence threshold, equivalence tests can also be applied to behavioral research (Rogers, Howard, & Vessey, 1993; Serlin & Lapsley, 1985; Stegner et al., 1996). Schuirmann (1987) suggested a “two one-sided tests” procedure wherein one may conclude that $\Delta$ lies within the equivalence bounds $(-\Delta_B, \Delta_B)$ by simultaneously rejecting both $H_{01}: \Delta \le -\Delta_B$ and $H_{02}: \Delta \ge \Delta_B$. For Westlake’s (1979) confidence interval method, equivalence is established if the confidence interval (constructed in the usual manner, but with coverage of 0.90) for $\hat{\Delta}$, the observed mean difference, falls entirely within the equivalence bounds $(-\Delta_B, \Delta_B)$. Finally, the range equivalence confidence intervals proposed by Serlin and Lapsley (1985; 1993) stem from a good-enough principle and provide an additional alternative to "strawman" point null hypothesis testing. It is important to note that limited sample sizes may prevent equivalence methods from having the necessary power to detect a truly ‘trivial’ effect, or else triviality may need to be set at a higher magnitude than would be desired. For example, Brown et al. (2010) concluded based on equivalence testing that implicit learning is unimpaired in individuals with ASD relative to typical development, though their choice of threshold value may have been unusually large.
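The logic of the two one-sided tests and of the 0.90 confidence interval can be illustrated with a short sketch. To stay within the Python standard library, the sketch below substitutes a large-sample normal (z) approximation for the t distribution that the published procedures use, so its p-values are only approximate for small groups. The equivalence bounds of 3 and 5 points are hypothetical values chosen for illustration, and the function name is not from the cited sources.

```python
import math
from statistics import NormalDist


def tost_z(mean_diff, se, bound, alpha=0.05):
    """Two one-sided tests for equivalence, normal approximation.

    Tests H01: delta <= -bound and H02: delta >= +bound; equivalence is
    concluded only if both are rejected at level alpha. Also returns the
    (1 - 2 * alpha) confidence interval used in the interval approach.
    """
    z = NormalDist()  # standard normal
    p_lower = 1 - z.cdf((mean_diff + bound) / se)  # evidence against H01
    p_upper = z.cdf((mean_diff - bound) / se)      # evidence against H02
    z_crit = z.inv_cdf(1 - alpha)
    ci = (mean_diff - z_crit * se, mean_diff + z_crit * se)
    return {
        "p_lower": p_lower,
        "p_upper": p_upper,
        "ci": ci,
        "equivalent": max(p_lower, p_upper) < alpha,
    }


# Hypothetical n = 20 groups from the text: mean difference of 1.00,
# variances of 36.00 and 20.20.
se = math.sqrt(36.00 / 20 + 20.20 / 20)
narrow = tost_z(1.00, se, bound=3.0)  # hypothetical bound of 3 points
wide = tost_z(1.00, se, bound=5.0)    # hypothetical bound of 5 points
print(narrow["equivalent"], wide["equivalent"])
```

The two runs show how strongly the conclusion depends on where the bounds are set: the same data fail to establish equivalence under the narrower bound and establish it under the wider one.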

Propensity Scores

The state-of-the-art for matching nonequivalent control groups in quasi-experimental design is propensity score analysis. With the goal of removing selection bias by modeling the probability of being in the target group, propensity scores are aggregated variables that predict group membership using logistic regression ( Fraser & Guo, 2009 ; Shadish et al., 2002 ). Propensity score analysis involves creating a single score from many variables that could be related to group membership and then matching the groups on those propensity scores ( Fraser & Guo, 2009 ). The nonequivalent control groups are often matched utilizing algorithms that, for example, select comparison participants who have scores within a defined absolute difference from a given target participant (i.e., caliper matching) or minimize the total difference between pairs of target and comparison participants (i.e., optimal matching; Rosenbaum, 1989 ).
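Caliper matching can be sketched briefly. The example below assumes the propensity scores have already been estimated (for instance, by a logistic regression that is not shown) and uses a simple greedy, one-to-one, without-replacement strategy; this is one common simplification and is not the optimal matching algorithm of Rosenbaum (1989). The scores, caliper width, and function name are hypothetical.

```python
def caliper_match(target_scores, comparison_scores, caliper):
    """Greedy 1:1 caliper matching on propensity scores, without replacement.

    For each target participant (in the order given), pick the closest
    unused comparison participant; keep the pair only if the absolute
    difference in scores is within the caliper.
    Returns a list of (target_index, comparison_index) pairs.
    """
    available = dict(enumerate(comparison_scores))
    pairs = []
    for t_idx, t_score in enumerate(target_scores):
        if not available:
            break
        c_idx = min(available, key=lambda j: abs(available[j] - t_score))
        if abs(available[c_idx] - t_score) <= caliper:
            pairs.append((t_idx, c_idx))
            del available[c_idx]  # each comparison participant is used once
    return pairs


# Hypothetical propensity scores for three target and four comparison participants.
target = [0.62, 0.35, 0.80]
comparison = [0.32, 0.60, 0.58, 0.95]
pairs = caliper_match(target, comparison, caliper=0.05)
print(pairs)
```

The third target participant has no comparison participant within the caliper and is dropped, which illustrates how matching can shrink and reshape the analyzed sample when the groups do not overlap.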

Propensity scores are best suited to the analysis of large datasets in which it is reasonable to assume that all variables relevant to group membership have been measured and those with complete overlap between the groups on the range of propensity scores ( Shadish et al., 2002 ). In addition, propensity score analysis may be no better than regression techniques unless the primary concern is the large number of matching variables included in the analysis ( Shadish & Steiner, 2010 ). Despite the fact that these conditions are rarely met in IDD research, there are cases in which propensity score matching has been applied. For example, Blackford (2009) used propensity score matching for data from State of Tennessee administrative databases to test whether infants with Down syndrome have lower birth weight than those without. Unfortunately, such large databases are yet unavailable to answer many research questions relevant to neurodevelopmental disorders.

Importantly, matching groups on propensity scores escapes neither the problem of having a satisfactory way to determine when groups are adequately matched nor other problems associated with matching groups on a single variable. Even when using large samples and sophisticated matching algorithms, matching can be problematic when the populations of interest do not completely overlap in range. As such, group-matching procedures can lead researchers to analyze data from samples of participants that are not representative of the populations from which they are drawn or to which the researcher wishes to generalize ( Shadish et al., 2002 ). Furthermore, when participants are chosen from the ends of their distributions due to matching criteria and when matching variables are measured with error, regression to the mean is of concern because participants selected for their extreme, apparently nonrepresentative scores are likely to have less extreme scores on the dependent variable and/or over time ( Breaugh & Arnold, 2007 ; Marsh, 1998 ; Shadish et al., 2002 ). Thus, propensity scores are not a panacea for researchers interested in a single matching construct or those with limited resources to collect large samples with all measurable variables relevant to group membership.

Regression-based Methods

Analysis of covariance (ANCOVA) is sometimes used as an alternative to group matching. ANCOVA is ideal for reducing sampling bias due to variability between groups in experimental designs when imbalance occurs by chance. Assumptions of ANCOVA include: group membership independent of the covariate, linearly related predictor and outcome, and identical slopes for the groups between the covariate and the dependent variable. When used with preexisting groups, a researcher can expect difficult interpretation, at best, and spurious findings, at worst, because ANCOVA attempts to adjust or control for part of what the group effect is thought to be (Brock, Jarrold, Farran, Laws, & Riby, 2007; Miller & Chapman, 2001). For neurodevelopmental disorders, the “selection bias” being removed is often integrally related to the causal effect of interest (e.g., background genes, maternal interaction styles, family stress, world experiences; see Newcombe, 2003 for an example related to children's socioemotional adjustment). In these cases, statistical adjustments between groups diminish true population differences that are attributes of the disorder, yielding uninterpretable results (Dennis et al., 2009; Miller & Chapman, 2001; Tupper & Rosenblood, 1984). A strong argument has been made in particular against the use of IQ as a covariate in studies of neurodevelopmental disorders because it is inseparable from the disorder itself (Dennis et al., 2009).

More generally, the process of choosing a matching variable or covariate should be deliberate. Preliminary tests of significance—including tests on the matching variable to decide whether it should be used as a covariate—are not recommended ( Atwood, Swoboda, & Serlin, 2010 ; Zimmerman, 2004 ). Above all, the choice of covariate or matching variable is likely to have a greater impact on the conclusions drawn than the choice of analytic method and, thus, should be carefully theoretically justified ( Breaugh & Arnold, 2007 ; Steiner et al., 2010 ).

Developmental Trajectories and Residuals

Distinct from ANCOVA, Thomas and colleagues (2009) have put forth a regression-based approach, termed cross-sectional developmental trajectories analysis, that allows testing within-group slope differences with respect to theoretically motivated predictors. From this perspective, trajectories are estimated for the dependent variable of interest relative to age and other predictors, such as nonverbal cognitive ability, and these trajectories are compared between a target group and a large comparison group. Conclusions can be drawn about group differences in intercepts (i.e., level of ability) and slopes (i.e., the relationship between a given predictor and the variable of interest). This approach has been applied to aspects of cognitive development in individuals with Williams syndrome ( Karmiloff-Smith et al., 2004 ) and vocabulary ability in individuals with ASD ( Kover et al., under revision ). We refer the interested reader to the detailed substantive examples and thorough characterization of the approach provided by Thomas et al. (2009) , which includes an online worksheet that walks through trajectory analyses step-by-step.

A special case of this type of analysis involves standardizing the performance of the target group based on the residual score (the difference between observed and predicted) from the trajectory of the comparison group ( Jarrold & Brock, 2004 ). The z -scores (or alternatively, scores divided by the standard error of the regression estimate) of these residuals can be used to assess relative deficits on multiple tasks of interest that have been standardized using the same predictor ( Jarrold, Baddeley, & Phillips, 2007 ; Jarrold & Brock, 2004 ). For example, Jarrold and colleagues (2007) examined the performance of individuals with Down syndrome and Williams syndrome on memory tasks with respect to multiple control variables (e.g., age, vocabulary ability), standardized against the performance of 110 typically developing children. By standardizing performance relative to these constructs, Jarrold et al. (2007) identified distinct relationships among abilities relative to the comparison group and differentiated the nature of the deficits in long-term memory in individuals with Down syndrome from those with Williams syndrome.
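The residual-standardization idea can be sketched in a few lines: fit a regression line to the comparison group alone, then express each target participant's score as a residual from that line divided by the standard error of the regression estimate. The data below are invented solely to make the arithmetic easy to follow, a single predictor is assumed, and the function names are illustrative rather than taken from the cited studies.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for one predictor.

    Returns (slope, intercept, standard error of the estimate).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_est = math.sqrt(sse / (n - 2))
    return slope, intercept, se_est


def standardized_residuals(x, y, slope, intercept, se_est):
    """Observed minus predicted, in units of the comparison group's error."""
    return [(yi - (intercept + slope * xi)) / se_est for xi, yi in zip(x, y)]


# Invented comparison-group data: predictor (e.g., a control measure) and task score.
comp_x = [1, 2, 3, 4, 5]
comp_y = [3, 2, 7, 8, 10]
slope, intercept, se_est = fit_line(comp_x, comp_y)

# Invented target-group participant, scored against the comparison trajectory.
z_scores = standardized_residuals([3], [3], slope, intercept, se_est)
print(slope, intercept, round(z_scores[0], 2))
```

A negative standardized residual indicates performance below what the comparison group's trajectory predicts at that level of the predictor; repeating the procedure for several tasks with the same predictor puts them on a common scale.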

The developmental trajectories method carries fewer assumptions than ANCOVA because the regression with the matching variable is done for the comparison group alone, avoiding the potential to violate the assumption of independence between the covariate and group ( Brock et al., 2007 ). While allowing simultaneous analysis and “comparison” of disparate participant groups, this procedure is not without limitations. First, a very large comparison group is required. Second, transformations of matching and dependent variables limit the extent to which results can be transparently interpreted. Finally, like other methods, this technique still requires linearity and complete overlap between the groups on the matching variable. As data sharing and access to national datasets (e.g., the National Database for Autism Research; NDAR) become more common, analytic techniques like the developmental trajectories approach will only become more valuable because of the availability of larger samples.

Summary of Recommendations for Researchers

Having brought attention to some of the methodological challenges in research on IDDs, we close with comments on the relationship between research questions and design, and on the responsible use of effect sizes and variance ratios as descriptive equivalence thresholds.

Choose Productive Research Questions

Thoughtful research questions that yield interpretable results should drive study design. We have focused on the simplest type of group-matching design (i.e., two groups and one matching variable); however, many research questions call for other applications of nonequivalent comparison designs. For example, pair-wise matching on one or more control variables might ensure more closely matched groups, but it might also call into question the generalizability of the findings ( Mervis & Robinson, 1999 ). In some cases, studies might be strengthened by including multiple comparison groups ( Burack et al., 2002 ; Eigsti et al., 2011 ) or by matching that is conducted on control tasks that very closely align with the skill of interest ( Jarrold & Brock, 2004 ). Another alternative is creating individual profiles of ability (e.g., case-study analysis), rather than group-level profiles that might fail to represent any individuals from the population from which the sample was drawn ( Mervis & Robinson, 1999 ; Towgood, Meuwese, Gilbert, Turner, & Burgess, 2009 ). Regardless of the research question, reporting results based on multiple matching and analysis techniques will leave the reader informed and free to draw conclusions based on maximal information ( Breaugh & Arnold, 2007 ; Brock et al., 2007 ; Kover et al., under revision ; Mervis & John, 2008 ).

Shifting focus towards understanding individual variability avoids some difficulties associated with group matching, while also leading researchers closer to understanding the sources of difficulty that result in phenotypic strengths and weaknesses. Comparing unrepresentative samples provides little advantage over studying the entire range of variability within a given phenotype and identifying foundational cognitive skills that account for individual variation ( Beeghly, 2006 ; Eigsti et al., 2011 ). Adopting an individual differences approach can highlight phenotypic variability and emphasizes the prerequisite skills necessary for development, ultimately supporting research that emphasizes learning mechanisms rather than outcomes. Of course, some research questions will nonetheless necessitate group comparisons.

Use Effect Sizes and Variance Ratios for Equivalence Thresholds

Group-matching studies that appropriately compare groups presumed to be equivalent on a single matching variable have the potential to provide the groundwork for stronger, well-controlled studies of greater scope. Researchers will benefit from including as many sources of information as possible for establishing group equivalence: plots of the distributions, effect sizes, variance ratios, etc. Given the complexities faced by IDD researchers, our recommendation is that groups be considered adequately matched when both the effect size (e.g., Cohen’s d ) and variance ratio fall within acceptable ranges for a particular area of research. We have provided a table of effect sizes and variance ratios that demonstrates how this technique can be applied to decision making regarding group matching adequacy; however, this table is meant to be thought-provoking, not prescriptive. In published reports, best practice would be to report effect sizes and variance ratios in all cases—for the matching variable and the dependent variable of interest—to allow the reader to interpret where meaningful differences exist.

Conclusions and Future Directions

We have discussed the limitations of p-value thresholds and ways in which using descriptive diagnostics (effect sizes and variance ratios) as equivalence thresholds will benefit research on neurodevelopmental disorders. Drawing the interest of methodological specialists to the study of IDDs will also be key to advancing the field. Open dialogue concerning current practices, paired with the development of improved methods for defining and testing meaningful differences, will significantly improve the design and implementation of research on IDDs.

Acknowledgments

This work was supported in part by NIH P30 HD03352 to the Waisman Center and NIH T32 DC05359. We thank Peter Steiner for his comments on an earlier draft. Following Strauss (2001), we chose to maintain a methodological focus and avoided citing substantive studies as examples, with the exception of those that have utilized methodologies likely to be lesser known to the reader.

Appendix: Hypothetical Scores on a Matching Variable from Two Groups

Participant Count	Target Group Scores	Comparison Group Scores
1	36	92
2	36	88
3	36	87
4	42	85
5	46	82
6	47	76
7	48	75
8	49	75
9	50	75
10	51	74
11	52	72
12	60	72
13	61	72
14	62	72
15	64	71
16	67	71
17	67	70
18	68	70
19	68	70
20	69	69
21	69	67
22	70	67
23	70	66
24	71	65
25	72	64
26	72	63
27	73	62
28	74	61
29	75	60
30	78	58

n = 30 Mean (SD)	60.10 (12.94)	71.70 (8.42)
n = 20 Mean (SD)	68.10 (6.00)	67.10 (4.49)

Note. The shaded cells (participants 11–30 in each group) show the subset of 20 participants in each group who remained in the analysis during the process of obtaining a higher p-value. They were chosen simply to demonstrate the calculation of effect size and variance ratio, not to demonstrate adequate equivalence.

A preliminary paper was presented at the 2011 annual meeting of the American Educational Research Association in New Orleans.

Contributor Information

Sara T. Kover, University of Wisconsin-Madison, Waisman Center, 1500 Highland Avenue, Madison, WI 53705.

Amy K. Atwood, University of Wisconsin-Madison.

References

Abbeduto L. Editorial. American Journal on Intellectual and Developmental Disabilities. 2010; 115(1):1–2.
American Psychological Association. Publication manual of the American Psychological Association. 6th ed. Washington, DC: Author; 2010.
Atwood AK, Swoboda CM, Serlin RC. The impact of selection procedures for nonnormal covariates on the Type I error rate and power of ANCOVA. Paper presented at the annual meeting of the American Educational Research Association; Denver, CO. 2010.
Bakeman R. VII. The practical importance of findings. Monographs of the Society for Research in Child Development. 2006; 71(3):127–145.
Beeghly M. Translational research on early language development: Current challenges and future directions. Development and Psychopathology. 2006; 18(3):737–757.
Blackford JU. Propensity Scores: Method for Matching on Multiple Variables in Down Syndrome Research. Intellectual and Developmental Disabilities. 2009; 47(5):348–357.
Breaugh JA, Arnold J. Controlling nuisance variables by using a matched-groups design. Organizational Research Methods. 2007; 10(3):523–541.
Brock J, Jarrold C, Farran EK, Laws G, Riby DM. Do children with Williams syndrome really have good vocabulary knowledge? Methods for comparing cognitive and linguistic abilities in developmental disorders. Clinical Linguistics & Phonetics. 2007; 21(9):673–688.
Brown J, Aczel B, Jimenez L, Kaufman SB, Grant KP. Intact implicit learning in autism spectrum conditions. Quarterly Journal of Experimental Psychology (Hove). 2010; 63(9):1789–1812.
Burack J. Editorial Preface. Journal of Autism and Developmental Disorders. 2004; 34(1):3–5.
Burack JA, Iarocci G, Bowler D, Mottron L. Benefits and pitfalls in the merging of disciplines: The example of developmental psychopathology and the study of persons with autism. Development and Psychopathology. 2002; 14(2):225–237.
Cicchetti DV, Koenig K, Klin A, Volkmar FR, Paul R, Sparrow S. From Bayes through marginal utility to effect sizes: a guide to understanding the clinical and statistical significance of the results of autism research findings. Journal of Autism and Developmental Disorders. 2011; 41(2):168–174.
Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: L. Erlbaum Associates; 1988.
Dennis M, Francis DJ, Cirino PT, Schachar R, Barnes MA, Fletcher JM. Why IQ is not a covariate in cognitive studies of neurodevelopmental disorders. Journal of the International Neuropsychological Society. 2009; 15(3):331–343.
Edgell SE. Commentary on "Accepting the null hypothesis". Memory & Cognition. 1995; 23(4):525–526.
Eigsti IM, de Marchena AB, Schuh JM, Kelley E. Language acquisition in autism spectrum disorders: A developmental review. Research in Autism Spectrum Disorders. 2011; 5(2):681–691.
Facon B, Magis D, Belmont JM. Beyond matching on the mean in developmental disabilities research. Research in Developmental Disabilities. 2011; 32(6):2134–2147.
Fraser MW, Guo S. Propensity Score Analysis: Statistical Methods and Applications. SAGE Publications; 2009.
Frick RW. Accepting the Null Hypothesis. Memory & Cognition. 1995; 23(1):132–138.
Holland PW. Statistics and Causal Inference. Journal of the American Statistical Association. 1986; 81(396):945–960.
Imai K, King G, Stuart EA. Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2008; 171:481–502.
Jarrold C, Baddeley AD, Phillips C. Long-term memory for verbal and visual information in Down syndrome and Williams syndrome: performance on the Doors and People test. Cortex. 2007; 43(2):233–247.
Jarrold C, Brock J. To match or not to match? Methodological issues in autism-related research. Journal of Autism and Developmental Disorders. 2004; 34(1):81–86.
Karmiloff-Smith A, Thomas M, Annaz D, Humphreys K, Ewing S, Brace N, Campbell R. Exploring the Williams syndrome face-processing debate: the importance of building developmental trajectories. Journal of Child Psychology and Psychiatry. 2004; 45(7):1258–1274.
Kover S, McDuffie A, Hagerman R, Abbeduto L. Receptive vocabulary in boys with autism spectrum disorder: Cross-sectional developmental trajectories (under revision).
Marsh HW. Simulation study of nonequivalent group-matching and regression-discontinuity designs: Evaluations of gifted and talented programs. Journal of Experimental Education. 1998; 66(2):163–192.
Mervis CB, John AE. Vocabulary abilities of children with Williams syndrome: strengths, weaknesses, and relation to visuospatial construction ability. Journal of Speech, Language, and Hearing Research. 2008; 51(4):967–982.
Mervis CB, Klein-Tasman BP. Methodological Issues in Group-Matching Designs: α Levels for Control Variable Comparisons and Measurement Characteristics of Control and Target Variables. Journal of Autism and Developmental Disorders. 2004; 34(1):7–17.
Mervis CB, Robinson BF. Methodological issues in cross-syndrome comparisons: Matching procedures, sensitivity (Se) and specificity (Sp). Monographs of the Society for Research in Child Development. 1999; 64(1):115–130.
Miller GA, Chapman JP. Misunderstanding analysis of covariance. Journal of Abnormal Psychology. 2001; 110(1):40–48.
Newcombe NS. Some controls control too much. Child Development. 2003; 74(4):1050–1052.
Piaggio G, Elbourne DR, Altman DG, Pocock SJ, Evans SW, for the CONSORT Group. Reporting of noninferiority and equivalence randomized trials: An extension of the CONSORT statement. Journal of the American Medical Association. 2006; 295(10):1152–1160.
R Development Core Team. R: A Language and Environment for Statistical Computing (Version 2.10.1). Vienna, Austria: R Foundation for Statistical Computing; 2010. Retrieved from http://www.R-project.org.
Rogers JL, Howard KI, Vessey JT. Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin. 1993; 113(3):553–565.
Rosenbaum PR. Optimal Matching for Observational Studies. Journal of the American Statistical Association. 1989; 84(408):1024–1032.
Rosenbaum PR, Rubin DB. Constructing a Control-Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. American Statistician. 1985; 39(1):33–38.
Rubin DB. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology. 1974; 66(5):688–701.
Rubin DB. Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation. Health Services and Outcomes Research Methodology. 2001; 2(3):169–188.
Schervish MJ. P values: What they are and what they are not. American Statistician. 1996; 50(3):203–206.
Schuirmann DJ. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 1987; 15(6):657–680.
Serlin RC, Lapsley DK. Rationality in psychological research: The good-enough principle. American Psychologist. 1985; 40(1):73–83.
Serlin RC, Lapsley DK. Rational appraisal of psychological research and the good-enough principle. In: Keren G, Lewis C, editors. A Handbook for Data Analysis in the Behavioral Sciences: Methodological Issues. Hillsdale, NJ: Erlbaum; 1993. pp. 199–228.
Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin; 2002.
Shadish WR, Steiner PM. A Primer on Propensity Score Analysis. Newborn and Infant Nursing Reviews. 2010; 10(1):19–26.
Stegner BL, Bostrom AG, Greenfield TK. Equivalence testing for use in psychosocial and services research: An introduction with examples. Evaluation and Program Planning. 1996; 19(3):193–198.
Steiner PM, Cook DL. Matching and propensity scores. In: Little TD, editor. Oxford Handbook of Quantitative Methods. New York: Oxford University Press; 2012.
Steiner PM, Cook TD, Shadish WR, Clark MH. The Importance of Covariate Selection in Controlling for Selection Bias in Observational Studies. Psychological Methods. 2010; 15(3):250–267.
Strauss ME. Demonstrating specific cognitive deficits: A psychometric perspective. Journal of Abnormal Psychology. 2001; 110(1):6–14.
Stuart EA. Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science. 2010; 25(1):1–21.
Thomas MS, Annaz D, Ansari D, Scerif G, Jarrold C, Karmiloff-Smith A. Using developmental trajectories to understand developmental disorders. Journal of Speech, Language, and Hearing Research. 2009; 52(2):336–358.
Towgood KJ, Meuwese JDI, Gilbert SJ, Turner MS, Burgess PW. Advantages of the multiple case series approach to the study of cognitive deficits in autism spectrum disorder. Neuropsychologia. 2009; 47(13):2981–2988.
Tupper DE, Rosenblood LK. Methodological considerations in the use of attribute variables in neuropsychological research. Journal of Clinical Neuropsychology. 1984; 6(4):441–453.
Westlake WJ. Statistical Aspects of Comparative Bioavailability Trials. Biometrics. 1979; 35(1):273–280.
Zimmerman DW. A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology. 2004; 57(1):173–181.

Frequently asked questions

What is random assignment?

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.
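As a minimal sketch of the idea, random assignment can be done by shuffling the sample and dealing participants into groups in turn, which gives every participant the same chance of landing in either group. The function name `randomly_assign`, the participant IDs, and the seed are illustrative assumptions, not part of any standard procedure.

```python
import random

def randomly_assign(participants, groups=("control", "experimental"), seed=None):
    """Shuffle participants, then deal them into groups round-robin."""
    rng = random.Random(seed)        # seed only makes the demo reproducible
    shuffled = list(participants)    # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, person in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(person)
    return assignment

# Hypothetical participant IDs P01..P20
sample = [f"P{i:02d}" for i in range(1, 21)]
result = randomly_assign(sample, seed=42)
print({group: len(members) for group, members in result.items()})
```

Dealing round-robin after the shuffle keeps group sizes balanced. Flipping a coin for each participant would also be random assignment, but it can produce unequal group sizes.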

Frequently asked questions: Methodology

Attrition refers to participants leaving a study. It always happens to some extent—for example, in randomized controlled trials for medical research.

Differential attrition occurs when attrition or dropout rates differ systematically between the intervention and the control group . As a result, the characteristics of the participants who drop out differ from the characteristics of those who stay in the study. Because of this, study results may be biased .
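A first check for differential attrition is to compare dropout rates between groups. The sketch below uses made-up enrollment counts and a hypothetical helper, `attrition_rate`. A gap in rates is only a warning sign: you would still need to compare the characteristics of those who left and those who stayed.

```python
def attrition_rate(enrolled, completed):
    """Share of enrolled participants who left before the study ended."""
    return (enrolled - completed) / enrolled

# Hypothetical counts, for illustration only
groups = {
    "intervention": {"enrolled": 100, "completed": 70},
    "control": {"enrolled": 100, "completed": 90},
}

rates = {name: attrition_rate(**counts) for name, counts in groups.items()}
gap = abs(rates["intervention"] - rates["control"])

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} dropped out")
print(f"difference between groups: {gap:.0%}")
```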

Action research is conducted in order to solve a particular issue immediately, while case studies are often conducted over a longer period of time and focus more on observing and analyzing a particular ongoing phenomenon.

Action research is focused on solving a problem or informing individual and community-based knowledge in a way that impacts teaching, learning, and other related processes. It is less focused on contributing theoretical input, instead producing actionable input.

Action research is particularly popular with educators as a form of systematic inquiry because it prioritizes reflection and bridges the gap between theory and practice. Educators are able to simultaneously investigate an issue as they solve it, and the method is very iterative and flexible.

A cycle of inquiry is another name for action research . It is usually visualized in a spiral shape following a series of steps, such as “planning → acting → observing → reflecting.”

To make quantitative observations , you need to use instruments that are capable of measuring the quantity you want to observe. For example, you might use a ruler to measure the length of an object or a thermometer to measure its temperature.

Criterion validity and construct validity are both types of measurement validity . In other words, they both show you how accurately a method measures something.

While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test can predictively (in the future) or concurrently (in the present) measure something.

Construct validity is often considered the overarching type of measurement validity . You need to have face validity , content validity , and criterion validity in order to achieve construct validity.

Convergent validity and discriminant validity are both subtypes of construct validity . Together, they help you evaluate whether a test measures the concept it was designed to measure.

Convergent validity indicates whether a test that is designed to measure a particular construct correlates with other tests that assess the same or similar construct.
Discriminant validity indicates whether two tests that should not be highly related to each other are indeed not related. This type of validity is also called divergent validity .

You need to assess both in order to demonstrate construct validity. Neither one alone is sufficient for establishing construct validity.

Content validity shows you how accurately a test or other measurement method taps into the various aspects of the specific construct you are researching.

In other words, it helps you answer the question: “does the test measure all aspects of the construct I want to measure?” If it does, then the test has high content validity.

The higher the content validity, the more accurate the measurement of the construct.

If the test fails to include parts of the construct, or irrelevant parts are included, the validity of the instrument is threatened, which brings your results into question.

Face validity and content validity are similar in that they both evaluate how suitable the content of a test is. The difference is that face validity is subjective, and assesses content at surface level.

When a test has strong face validity, anyone would agree that the test’s questions appear to measure what they are intended to measure.

For example, looking at a 4th grade math test consisting of problems in which students have to add and multiply, most people would agree that it has strong face validity (i.e., it looks like a math test).

On the other hand, content validity evaluates how well a test represents all the aspects of a topic. Assessing content validity is more systematic and relies on expert evaluation of each question, analyzing whether each one covers the aspects that the test was designed to cover.

A 4th grade math test would have high content validity if it covered all the skills taught in that grade. Experts (in this case, math teachers) would have to evaluate the content validity by comparing the test to the learning objectives.

Snowball sampling is a non-probability sampling method . Unlike probability sampling (which involves some form of random selection ), the initial individuals selected to be studied are the ones who recruit new participants.

Because not every member of the target population has an equal chance of being recruited into the sample, selection in snowball sampling is non-random.

Snowball sampling is a non-probability sampling method , where there is not an equal chance for every member of the population to be included in the sample .

This means that you cannot use inferential statistics and make generalizations —often the goal of quantitative research . As such, a snowball sample is not representative of the target population and is usually a better fit for qualitative research .

Snowball sampling relies on the use of referrals. Here, the researcher recruits one or more initial participants, who then recruit the next ones.

Participants share similar characteristics and/or know each other. Because of this, not every member of the population has an equal chance of being included in the sample, giving rise to sampling bias .

Snowball sampling is best used in the following cases:

If there is no sampling frame available (e.g., people with a rare disease)
If the population of interest is hard to access or locate (e.g., people experiencing homelessness)
If the research focuses on a sensitive topic (e.g., extramarital affairs)

The reproducibility and replicability of a study can be ensured by writing a transparent, detailed method section and using clear, unambiguous language.

Reproducibility and replicability are related terms.

Reproducing research entails reanalyzing the existing data in the same manner.
Replicating (or repeating) the research entails reconducting the entire analysis, including the collection of new data.
A successful reproduction shows that the data analyses were conducted in a fair and honest manner.
A successful replication shows that the reliability of the results is high.

Stratified sampling and quota sampling both involve dividing the population into subgroups and selecting units from each subgroup. The purpose in both cases is to select a representative sample and/or to allow comparisons between subgroups.

The main difference is that in stratified sampling, you draw a random sample from each subgroup ( probability sampling ). In quota sampling you select a predetermined number or proportion of units, in a non-random manner ( non-probability sampling ).
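The contrast can be sketched in a few lines: both approaches group units by subgroup, but only the stratified version draws at random within each subgroup. The population, the `year` attribute, and the function names are hypothetical. The quota version simply takes the first units it meets, which stands in for any non-random recruitment.

```python
import random
from collections import defaultdict

def group_by(units, key):
    """Split units into subgroups based on one attribute."""
    strata = defaultdict(list)
    for unit in units:
        strata[unit[key]].append(unit)
    return strata

def stratified_sample(units, key, n_per_stratum, seed=None):
    """Probability sampling: random draw within every subgroup."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in group_by(units, key).items()}

def quota_sample(units, key, n_per_stratum):
    """Non-probability sampling: take units as they come until quotas fill."""
    filled = defaultdict(list)
    for unit in units:
        if len(filled[unit[key]]) < n_per_stratum:
            filled[unit[key]].append(unit)
    return dict(filled)

# Hypothetical population of 30 students in two subgroups
population = [{"id": i, "year": "first" if i % 3 else "final"} for i in range(30)]

strat = stratified_sample(population, "year", n_per_stratum=4, seed=0)
quota = quota_sample(population, "year", n_per_stratum=4)
print([u["id"] for u in quota["final"]])  # always the first four final-year IDs
```

Rerunning `quota_sample` always returns the same units, while `stratified_sample` changes with the seed. That is the practical meaning of non-random versus random selection within subgroups.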

Purposive and convenience sampling are both sampling methods that are typically used in qualitative data collection.

A convenience sample is drawn from a source that is conveniently accessible to the researcher. Convenience sampling does not distinguish characteristics among the participants. On the other hand, purposive sampling focuses on selecting participants possessing characteristics associated with the research study.

The findings of studies based on either convenience or purposive sampling can only be generalized to the (sub)population from which the sample is drawn, and not to the entire population.

Random sampling or probability sampling is based on random selection. This means that each unit has an equal chance (i.e., equal probability) of being included in the sample.
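A simple random sample can be sketched with Python's standard `random.sample`, which draws without replacement so that every unit in the list is equally likely to be chosen. The sampling frame of 100 unit names and the seed are illustrative assumptions.

```python
import random

# Hypothetical sampling frame listing every unit in the population
sampling_frame = [f"unit_{i}" for i in range(1, 101)]

rng = random.Random(7)                      # seed only for a reproducible demo
sample = rng.sample(sampling_frame, 10)     # 10 distinct units, equal chance each
print(sample)
```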

On the other hand, convenience sampling involves recruiting whoever is easiest to reach (for example, people who happen to pass by), which means that not everyone has an equal chance of being selected: who ends up in your sample depends on the place, time, or day you are collecting your data.

Convenience sampling and quota sampling are both non-probability sampling methods. They both use non-random criteria like availability, geographical proximity, or expert knowledge to recruit study participants.

However, in convenience sampling, you continue to sample units or cases until you reach the required sample size.

In quota sampling, you first need to divide your population of interest into subgroups (strata) and estimate their proportions (quota) in the population. Then you can start your data collection, using convenience sampling to recruit participants, until the proportions in each subgroup coincide with the estimated proportions in the population.

A sampling frame is a list of every member in the entire population . It is important that the sampling frame is as complete as possible, so that your sample accurately reflects your population.

Stratified and cluster sampling may look similar, but bear in mind that groups created in cluster sampling are heterogeneous , so the individual characteristics in the cluster vary. In contrast, groups created in stratified sampling are homogeneous , as units share characteristics.

Relatedly, in cluster sampling you randomly select entire groups and include all units of each group in your sample. However, in stratified sampling, you select some units of all groups and include them in your sample. In this way, both methods can ensure that your sample is representative of the target population .
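The cluster side of this comparison can be sketched as follows: whole groups are drawn at random, and every unit inside a chosen group enters the sample. The school names, pupil labels, and the `cluster_sample` helper are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters: schools and their pupils
schools = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2", "d3"],
}

chosen_schools, pupils = cluster_sample(schools, n_clusters=2, seed=1)
print(chosen_schools, len(pupils))
```

A stratified design would instead visit all four schools and draw some pupils from each.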

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment .

An observational study is a great choice for you if your research question is based purely on observations. If there are ethical, logistical, or practical concerns that prevent you from conducting a traditional experiment , an observational study may be a good choice. In an observational study, there is no interference or manipulation of the research subjects, as well as no control or treatment groups .

It’s often best to ask a variety of people to review your measurements. You can ask experts, such as other researchers, or laypeople, such as potential participants, to judge the face validity of tests.

While experts have a deep understanding of research methods , the people you’re studying can provide you with valuable insights you may have missed otherwise.

Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.

Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.

Face validity is about whether a test appears to measure what it’s supposed to measure. This type of validity is concerned with whether a measure seems relevant and appropriate for what it’s assessing only on the surface.

Statistical analyses are often applied to test validity with data from your measures. You test convergent validity and discriminant validity with correlations to see if results from your test are positively or negatively related to those of other established tests.

You can also use regression analyses to assess whether your measure is actually predictive of outcomes that you expect it to predict theoretically. A regression analysis that supports your expectations strengthens your claim of construct validity .
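The correlation check described above can be sketched with a hand-written Pearson coefficient. All scores below are invented for illustration. `new_scale` stands for the measure being validated, `established_scale` for an existing test of the same construct, and `unrelated_scale` for a test of a distinct construct.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six participants
new_scale = [10, 12, 15, 18, 20, 24]
established_scale = [11, 13, 14, 19, 21, 23]   # same construct
unrelated_scale = [5, 9, 4, 8, 6, 7]           # distinct construct

r_convergent = pearson_r(new_scale, established_scale)
r_discriminant = pearson_r(new_scale, unrelated_scale)
print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
```

A high first value together with a near-zero second value is the pattern you would hope to see. With only six data points, this is a toy illustration rather than evidence of validity.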

When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.

Construct validity is often considered the overarching type of measurement validity , because it covers all of the other types. You need to have face validity , content validity , and criterion validity to achieve construct validity.

Construct validity is about how well a test measures the concept it was designed to evaluate. It’s one of four types of measurement validity; the other three are face validity, content validity, and criterion validity.

There are two subtypes of construct validity.

Convergent validity : The extent to which your measure corresponds to measures of related constructs
Discriminant validity : The extent to which your measure is unrelated or negatively related to measures of distinct constructs

Naturalistic observation is a valuable tool because of its flexibility, external validity , and suitability for topics that can’t be studied in a lab setting.

The downsides of naturalistic observation include its lack of scientific control , ethical considerations , and potential for bias from observers and subjects.

Naturalistic observation is a qualitative research method where you record the behaviors of your research subjects in real world settings. You avoid interfering or influencing anything in a naturalistic observation.

You can think of naturalistic observation as “people watching” with a purpose.

A dependent variable is what changes as a result of the independent variable manipulation in experiments . It’s what you’re interested in measuring, and it “depends” on your independent variable.

In statistics, dependent variables are also called:

Response variables (they respond to a change in another variable)
Outcome variables (they represent the outcome you want to measure)
Left-hand-side variables (they appear on the left-hand side of a regression equation)

An independent variable is the variable you manipulate, control, or vary in an experimental study to explore its effects. It’s called “independent” because it’s not influenced by any other variables in the study.

Independent variables are also called:

Explanatory variables (they explain an event or outcome)
Predictor variables (they can be used to predict the value of a dependent variable)
Right-hand-side variables (they appear on the right-hand side of a regression equation).

As a rule of thumb, questions related to thoughts, beliefs, and feelings work well in focus groups. Take your time formulating strong questions, paying special attention to phrasing. Be careful to avoid leading questions , which can bias your responses.

Overall, your focus group questions should be:

Open-ended and flexible
Impossible to answer with “yes” or “no” (questions that start with “why” or “how” are often best)
Unambiguous, getting straight to the point while still stimulating discussion
Unbiased and neutral

A structured interview is a data collection method that relies on asking questions in a set order to collect data on a topic. They are often quantitative in nature. Structured interviews are best used when:

You already have a very clear understanding of your topic. Perhaps significant research has already been conducted, or you have done some prior research yourself, so you already possess a baseline for designing strong structured questions.
You are constrained in terms of time or resources and need to analyze your data quickly and efficiently.
Your research question depends on strong parity between participants, with environmental conditions held constant.

More flexible interview options include semi-structured interviews , unstructured interviews , and focus groups .

Social desirability bias is the tendency for interview participants to give responses that will be viewed favorably by the interviewer or other participants. It occurs in all types of interviews and surveys , but is most common in semi-structured interviews , unstructured interviews , and focus groups .

Social desirability bias can be mitigated by ensuring participants feel at ease and comfortable sharing their views. Make sure to pay attention to your own body language and any physical or verbal cues, such as nodding or widening your eyes.

This type of bias can also occur in observations if the participants know they’re being observed. They might alter their behavior accordingly.

The interviewer effect is a type of bias that emerges when a characteristic of an interviewer (race, age, gender identity, etc.) influences the responses given by the interviewee.

There is a risk of an interviewer effect in all types of interviews , but it can be mitigated by writing really high-quality interview questions.

A semi-structured interview is a blend of structured and unstructured types of interviews. Semi-structured interviews are best used when:

You have prior interview experience. Spontaneous questions are deceptively challenging, and it’s easy to accidentally ask a leading question or make a participant uncomfortable.
Your research question is exploratory in nature. Participant answers can guide future research questions and help you develop a more robust knowledge base for future research.

An unstructured interview is the most flexible type of interview, but it is not always the best fit for your research topic.

Unstructured interviews are best used when:

You are an experienced interviewer and have a very strong background in your research topic, since it is challenging to ask spontaneous, colloquial questions.
Your research question is exploratory in nature. While you may have developed hypotheses, you are open to discovering new or shifting viewpoints through the interview process.
You are seeking descriptive data, and are ready to ask questions that will deepen and contextualize your initial thoughts and hypotheses.
Your research depends on forming connections with your participants and making them feel comfortable revealing deeper emotions, lived experiences, or thoughts.

The four most common types of interviews are:

Structured interviews : The questions are predetermined in both topic and order.
Semi-structured interviews : A few questions are predetermined, but other questions aren’t planned.
Unstructured interviews : None of the questions are predetermined.
Focus group interviews : The questions are presented to a group instead of one individual.

Deductive reasoning is commonly used in scientific research, and it’s especially associated with quantitative research .

In research, you might have come across something called the hypothetico-deductive method . It’s the scientific method of testing hypotheses to check whether your predictions are substantiated by real-world data.

Deductive reasoning is a logical approach where you progress from general ideas to specific conclusions. It’s often contrasted with inductive reasoning , where you start with specific observations and form general conclusions.

Deductive reasoning is also called deductive logic.

There are many different types of inductive reasoning that people use formally or informally.

Here are a few common types:

Inductive generalization : You use observations about a sample to come to a conclusion about the population it came from.
Statistical generalization: You use specific numbers about samples to make statements about populations.
Causal reasoning: You make cause-and-effect links between different things.
Sign reasoning: You make a conclusion about a correlational relationship between different things.
Analogical reasoning: You make a conclusion about something based on its similarities to something else.

Inductive reasoning is a bottom-up approach, while deductive reasoning is top-down.

Inductive reasoning takes you from the specific to the general, while in deductive reasoning, you make inferences by going from general premises to specific conclusions.

In inductive research , you start by making observations or gathering data. Then, you take a broad scan of your data and search for patterns. Finally, you make general conclusions that you might incorporate into theories.

Inductive reasoning is a method of drawing conclusions by going from the specific to the general. It’s usually contrasted with deductive reasoning, where you proceed from general information to specific conclusions.

Inductive reasoning is also called inductive logic or bottom-up reasoning.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Triangulation can help:

Reduce research bias that comes from using a single method, theory, or investigator
Enhance validity by approaching the same topic with different tools
Establish credibility by giving you a complete picture of the research problem

But triangulation can also pose problems:

It’s time-consuming and labor-intensive, often involving an interdisciplinary team.
Your results may be inconsistent or even contradictory.

There are four main types of triangulation :

Data triangulation : Using data from different times, spaces, and people
Investigator triangulation : Involving multiple researchers in collecting or analyzing data
Theory triangulation : Using varying theoretical perspectives in your research
Methodological triangulation : Using different methodologies to approach the same topic

Many academic fields use peer review , largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the published manuscript.

However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure.

Peer assessment is often used in the classroom as a pedagogical tool. Both receiving feedback and providing it are thought to enhance the learning process, helping students think critically and collaboratively.

Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. It also represents an excellent opportunity to get feedback from renowned experts in your field. It acts as a first defense, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process.

Peer-reviewed articles are considered a highly credible source due to this stringent process they go through before publication.

In general, the peer review process includes the following steps:

First, the author submits the manuscript to the editor.
The editor can then either:
Reject the manuscript and send it back to the author, or
Send it onward to the selected peer reviewer(s)
Next, the peer review process occurs. The reviewer provides feedback, addressing any major or minor issues with the manuscript, and gives their advice regarding what edits should be made.
Lastly, the edited manuscript is sent back to the author. They input the edits, and resubmit it to the editor for publication.

Exploratory research is often used when the issue you’re studying is new or when the data collection process is challenging for some reason.

You can use exploratory research if you have a general idea or a specific question that you want to study but there is no preexisting knowledge or paradigm with which to study it.

Exploratory research is a methodological approach that explores research questions that have not previously been studied in depth. It is often used when the issue you’re studying is new, or the data collection process is challenging in some way.

Explanatory research is used to investigate how or why a phenomenon occurs. Therefore, this type of research is often one of the first stages in the research process , serving as a jumping-off point for future research.

Exploratory research aims to explore the main aspects of an under-researched problem, while explanatory research aims to explain the causes and consequences of a well-defined problem.

Explanatory research is a research method used to investigate how or why something occurs when only a small amount of information is available pertaining to that topic. It can help you increase your understanding of a given topic.

Clean data are valid, accurate, complete, consistent, unique, and uniform. Dirty data include inconsistencies and errors.

Dirty data can come from any part of the research process, including poor research design , inappropriate measurement materials, or flawed data entry.

Data cleaning takes place between data collection and data analyses. But you can use some methods even before collecting data.

For clean data, you should start by designing measures that collect valid data. Data validation at the time of data entry or collection helps you minimize the amount of data cleaning you’ll need to do.

After data collection, you can use data standardization and data transformation to clean your data. You’ll also deal with any missing values, outliers, and duplicate values.

Every dataset requires different techniques to clean dirty data , but you need to address these issues in a systematic way. You focus on finding and resolving data points that don’t agree or fit with the rest of your dataset.

These data points might be missing, outlying, duplicated, incorrectly formatted, or irrelevant. You’ll start with screening and diagnosing your data. Then, you’ll often standardize and accept or remove data to make your dataset consistent and valid.
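A screening pass of this kind might look like the sketch below, which flags duplicates, missing values, and implausible values instead of silently dropping them. The records, the `weight_kg` field, and the plausible range are assumptions chosen for illustration; real thresholds depend on your measure.

```python
# Hypothetical raw records
records = [
    {"id": 1, "weight_kg": 70.5},
    {"id": 2, "weight_kg": None},    # missing value
    {"id": 3, "weight_kg": 68.0},
    {"id": 3, "weight_kg": 68.0},    # duplicate entry
    {"id": 4, "weight_kg": 705.0},   # likely a data entry error
    {"id": 5, "weight_kg": 72.3},
]

PLAUSIBLE_RANGE = (30.0, 250.0)      # assumed limits for adult body weight

def screen(rows):
    """Separate rows that look clean from rows that need a decision."""
    seen, clean, flagged = set(), [], []
    low, high = PLAUSIBLE_RANGE
    for row in rows:
        key = (row["id"], row["weight_kg"])
        if key in seen:
            flagged.append((row, "duplicate"))
        elif row["weight_kg"] is None:
            flagged.append((row, "missing"))
        elif not low <= row["weight_kg"] <= high:
            flagged.append((row, "out of range"))
        else:
            clean.append(row)
        seen.add(key)
    return clean, flagged

clean, flagged = screen(records)
print([row["id"] for row in clean])
print([(row["id"], reason) for row, reason in flagged])
```

Keeping the flagged rows with a reason attached lets you decide case by case whether to correct, accept, or remove each one.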

Data cleaning is necessary for valid and appropriate analyses. Dirty data contain inconsistencies or errors , but cleaning your data helps you minimize or resolve these.

Without data cleaning, you could end up with a Type I or II error in your conclusion. These types of erroneous conclusions can be practically significant with important consequences, because they lead to misplaced investments or missed opportunities.

Data cleaning involves spotting and resolving potential data inconsistencies or errors to improve your data quality. An error is any value (e.g., recorded weight) that doesn’t reflect the true value (e.g., actual weight) of something that’s being measured.

In this process, you review, analyze, detect, modify, or remove “dirty” data to make your dataset “clean.” Data cleaning is also called data cleansing or data scrubbing.

Research misconduct means making up or falsifying data, manipulating data analyses, or misrepresenting results in research reports. It’s a form of academic fraud.

These actions are committed intentionally and can have serious consequences; research misconduct is not a simple mistake or a point of disagreement but a serious ethical failure.

Anonymity means you don’t know who the participants are, while confidentiality means you know who they are but remove identifying information from your research report. Both are important ethical considerations .

You can only guarantee anonymity by not collecting any personally identifying information—for example, names, phone numbers, email addresses, IP addresses, physical characteristics, photos, or videos.

You can keep data confidential by using aggregate information in your research report, so that you only refer to groups of participants rather than individuals.

Research ethics matter for scientific integrity, human rights and dignity, and collaboration between science and society. These principles make sure that participation in studies is voluntary, informed, and safe.

Ethical considerations in research are a set of principles that guide your research designs and practices. These principles include voluntary participation, informed consent, anonymity, confidentiality, potential for harm, and results communication.

Scientists and researchers must always adhere to a certain code of conduct when collecting data from others .

These considerations protect the rights of research participants, enhance research validity , and maintain scientific integrity.

In multistage sampling , you can use probability or non-probability sampling methods .

For a probability sample, you have to conduct probability sampling at every stage.

You can mix it up by using simple random sampling , systematic sampling , or stratified sampling to select units at different stages, depending on what is applicable and relevant to your study.

Multistage sampling can simplify data collection when you have large, geographically spread samples, and you can obtain a probability sample without a complete sampling frame.

But multistage sampling may not lead to a representative sample, and larger samples are needed for multistage samples to achieve the statistical properties of simple random samples .

These are four of the most common mixed methods designs :

Convergent parallel: Quantitative and qualitative data are collected at the same time and analyzed separately. After both analyses are complete, compare your results to draw overall conclusions.
Embedded: Quantitative and qualitative data are collected at the same time, but within a larger quantitative or qualitative design. One type of data is secondary to the other.
Explanatory sequential: Quantitative data is collected and analyzed first, followed by qualitative data. You can use this design if you think your qualitative data will explain and contextualize your quantitative findings.
Exploratory sequential: Qualitative data is collected and analyzed first, followed by quantitative data. You can use this design if you think the quantitative data will confirm or validate your qualitative findings.

Triangulation in research means using multiple datasets, methods, theories and/or investigators to address a research question. It’s a research strategy that can help you enhance the validity and credibility of your findings.

Triangulation is mainly used in qualitative research, but it’s also commonly applied in quantitative research. Mixed methods research always uses triangulation.

In multistage sampling, or multistage cluster sampling, you draw a sample from a population using smaller and smaller groups at each stage.

This method is often used to collect data from a large, geographically spread group of people in national surveys, for example. You take advantage of hierarchical groupings (e.g., from state to city to neighborhood) to create a sample that’s less expensive and time-consuming to collect data from.

No, the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient only tells you how closely your data fit on a line, so two datasets with the same correlation coefficient can have very different slopes.

To find the slope of the line, you’ll need to perform a regression analysis.
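The point about slope and correlation can be checked numerically. The sketch below is illustrative only: the data values and the helper names `pearson_r` and `ols_slope` are made up for this example. It builds two datasets that share the same correlation coefficient but differ tenfold in slope.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(xs, ys):
    """Slope of the least-squares regression line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: the second response is the first one scaled by 10.
xs = [1, 2, 3, 4, 5]
ys_shallow = [1.1, 1.9, 3.2, 3.8, 5.0]
ys_steep = [10 * y for y in ys_shallow]

print("r (shallow):", pearson_r(xs, ys_shallow))
print("r (steep):  ", pearson_r(xs, ys_steep))
print("slope (shallow):", ols_slope(xs, ys_shallow))
print("slope (steep):  ", ols_slope(xs, ys_steep))
```

Rescaling the response changes the regression slope but leaves the correlation coefficient untouched, which is why the slope has to come from a regression rather than from r.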

Correlation coefficients always range between -1 and 1.

The sign of the coefficient tells you the direction of the relationship: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.

The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.

These are the assumptions your data must meet if you want to use Pearson’s r:

Both variables are on an interval or ratio level of measurement
Data from both variables follow normal distributions
Your data have no outliers
Your data is from a random or representative sample
You expect a linear relationship between the two variables

Quantitative research designs can be divided into two main categories:

Correlational and descriptive designs are used to investigate characteristics, averages, trends, and associations between variables.
Experimental and quasi-experimental designs are used to test causal relationships .

Qualitative research designs tend to be more flexible. Common types of qualitative design include case study , ethnography , and grounded theory designs.

A well-planned research design helps ensure that your methods match your research aims, that you collect high-quality data, and that you use the right kind of analysis to answer your questions, utilizing credible sources . This allows you to draw valid , trustworthy conclusions.

The priorities of a research design can vary depending on the field, but you usually have to specify:

Your research questions and/or hypotheses
Your overall approach (e.g., qualitative or quantitative )
The type of design you’re using (e.g., a survey , experiment , or case study )
Your sampling methods or criteria for selecting subjects
Your data collection methods (e.g., questionnaires , observations)
Your data collection procedures (e.g., operationalization , timing and data management)
Your data analysis methods (e.g., statistical tests or thematic analysis )

A research design is a strategy for answering your research question . It defines your overall approach and determines how you will collect and analyze data.

Questionnaires can be self-administered or researcher-administered.

Self-administered questionnaires can be delivered online or in paper-and-pen formats, in person or through mail. All questions are standardized so that all respondents receive the same questions with identical wording.

Researcher-administered questionnaires are interviews that take place by phone, in-person, or online between researchers and respondents. You can gain deeper insights by clarifying questions for respondents or asking follow-up questions.

You can organize the questions logically, with a clear progression from simple to complex, or randomize their order between respondents. A logical flow helps respondents process the questionnaire more easily and quickly, but it may lead to bias. Randomization can minimize the bias from order effects.

Closed-ended, or restricted-choice, questions offer respondents a fixed set of choices to select from. These questions are easier to answer quickly.

Open-ended or long-form questions allow respondents to answer in their own words. Because there are no restrictions on their choices, respondents can answer in ways that researchers may not have otherwise considered.

A questionnaire is a data collection tool or instrument, while a survey is an overarching research method that involves collecting and analyzing data from people using questionnaires.

The third variable and directionality problems are two main reasons why correlation isn’t causation.

The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.

The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.
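The third variable problem described above can be illustrated with a small simulation. This is a sketch under assumed numbers, not an analysis of real data: a hypothetical confounder `z` drives both `a` and `b`, which have no causal link to each other, yet they end up strongly correlated.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical confounder and the independent noise added to each variable.
z = [rng.gauss(0, 1) for _ in range(n)]
noise_a = [rng.gauss(0, 0.5) for _ in range(n)]
noise_b = [rng.gauss(0, 0.5) for _ in range(n)]

# a and b both depend on z, but never on each other.
a = [zi + e for zi, e in zip(z, noise_a)]
b = [zi + e for zi, e in zip(z, noise_b)]

r_ab = pearson_r(a, b)                  # inflated by the shared confounder
r_resid = pearson_r(noise_a, noise_b)   # what remains once z is removed

print("correlation of a and b:", round(r_ab, 3))
print("correlation after removing z:", round(r_resid, 3))
```

The apparent association between `a` and `b` comes entirely from `z`; once the confounder's contribution is taken out, the remaining correlation is close to zero.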

Correlation describes an association between variables : when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables.

Causation means that changes in one variable bring about changes in the other (i.e., there is a cause-and-effect relationship between variables). The two variables are correlated with each other, and there’s also a causal link between them.

While causation and correlation can exist simultaneously, correlation does not imply causation. In other words, correlation is simply a relationship where A relates to B, but A doesn’t necessarily cause B to happen (or vice versa). Mistaking correlation for causation is a common error and can lead to the false cause fallacy.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

In an experimental design , you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
In a correlational design , you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity .

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions . The Pearson product-moment correlation coefficient (Pearson’s r ) is commonly used to assess a linear relationship between two quantitative variables.

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research .

A correlation reflects the strength and/or direction of the association between two or more variables.

A positive correlation means that both variables change in the same direction.
A negative correlation means that the variables change in opposite directions.
A zero correlation means there’s no relationship between the variables.

Random error is almost always present in scientific studies, even in highly controlled settings. While you can’t eradicate it completely, you can reduce random error by taking repeated measurements, using a large sample, and controlling extraneous variables .

You can avoid systematic error through careful design of your sampling , data collection , and analysis procedures. For example, use triangulation to measure your variables using multiple methods; regularly calibrate instruments or procedures; use random sampling and random assignment ; and apply masking (blinding) where possible.

Systematic error is generally a bigger problem in research.

With random error, multiple measurements will tend to cluster around the true value. When you’re collecting data from a large sample , the errors in different directions will cancel each other out.

Systematic errors are much more problematic because they can skew your data away from the true value. This can lead you to false conclusions (Type I and II errors) about the relationship between the variables you’re studying.

Random and systematic error are two types of measurement error.

Random error is a chance difference between the observed and true values of something (e.g., a researcher misreading a weighing scale records an incorrect measurement).

Systematic error is a consistent or proportional difference between the observed and true values of something (e.g., a miscalibrated scale consistently records weights as higher than they actually are).
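The difference between the two kinds of error can be seen in a short simulation. The numbers below (a true weight of 70, noise with a standard deviation of 0.5, a scale that reads 2 units too high) and the function `measure` are assumptions chosen for illustration.

```python
import random
from statistics import mean


def measure(true_value, n, noise_sd=0.0, bias=0.0, seed=0):
    """Simulate n readings with random noise and an optional constant bias."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


true_weight = 70.0

# Random error only: readings scatter around the true value.
random_only = measure(true_weight, 10_000, noise_sd=0.5)

# Random error plus systematic error: a miscalibrated scale adds 2.0 each time.
biased = measure(true_weight, 10_000, noise_sd=0.5, bias=2.0)

print("mean with random error only:", round(mean(random_only), 2))
print("mean with systematic error: ", round(mean(biased), 2))
```

Averaging many readings cancels most of the random error, but the systematic offset survives in the mean no matter how many measurements are taken.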

On graphs, the explanatory variable is conventionally placed on the x-axis, while the response variable is placed on the y-axis.

If you have quantitative variables , use a scatterplot or a line graph.
If your response variable is categorical, use a scatterplot or a line graph.
If your explanatory variable is categorical, use a bar graph.

The term “ explanatory variable ” is sometimes preferred over “ independent variable ” because, in real world contexts, independent variables are often influenced by other variables. This means they aren’t totally independent.

Multiple independent variables may also be correlated with each other, so “explanatory variables” is a more appropriate term.

The difference between explanatory and response variables is simple:

An explanatory variable is the expected cause, and it explains the results.
A response variable is the expected effect, and it responds to other variables.

In a controlled experiment , all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:

A control group that receives a standard treatment, a fake treatment, or no treatment.
Random assignment of participants to ensure the groups are equivalent.

Depending on your study topic, there are various other methods of controlling variables .

There are 4 main types of extraneous variables :

Demand characteristics : environmental cues that encourage participants to conform to researchers’ expectations.
Experimenter effects : unintentional actions by researchers that influence study outcomes.
Situational variables : environmental variables that alter participants’ behaviors.
Participant variables : any characteristic or aspect of a participant’s background that could affect study results.

An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.

A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.

In a factorial design, multiple independent variables are tested.

If you test two variables, each level of one independent variable is combined with each level of the other independent variable to create different conditions.
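As a rough illustration of how levels combine into conditions, the sketch below crosses two hypothetical independent variables (the variable names and levels are invented for this example) using `itertools.product`.

```python
from itertools import product

# Hypothetical independent variables and their levels.
caffeine_dose = ["none", "low", "high"]
sleep = ["rested", "sleep-deprived"]

# Each level of one variable is paired with each level of the other.
conditions = list(product(caffeine_dose, sleep))

for number, (dose, sleep_state) in enumerate(conditions, start=1):
    print(f"Condition {number}: caffeine={dose}, sleep={sleep_state}")
```

With three levels of one variable and two of the other, this 3 x 2 design yields six conditions.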

Within-subjects designs have many potential threats to internal validity , but they are also very statistically powerful .

Advantages:

Only requires small samples
Statistically powerful
Removes the effects of individual differences on the outcomes

Disadvantages:

Internal validity threats reduce the likelihood of establishing a direct relationship between variables
Time-related effects, such as growth, can influence the outcomes
Carryover effects mean that the specific order of different treatments affects the outcomes

While a between-subjects design has fewer threats to internal validity, it also requires more participants for high statistical power than a within-subjects design.

Advantages:

Prevents carryover effects of learning and fatigue.
Shorter study duration.

Disadvantages:

Needs larger samples for high power.
Uses more resources to recruit participants, administer sessions, cover costs, etc.
Individual differences may be an alternative explanation for results.

Yes. Between-subjects and within-subjects designs can be combined in a single study when you have two or more independent variables (a factorial design). In a mixed factorial design, one variable is altered between subjects and another is altered within subjects.

In a between-subjects design , every participant experiences only one condition, and researchers assess group differences between participants in various conditions.

In a within-subjects design , each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.

The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.

Random assignment is used in experiments with a between-groups or independent measures design. In this research design, there’s usually a control group and one or more experimental groups. Random assignment helps ensure that the groups are comparable.

In general, you should always use random assignment in this type of experimental design when it is ethically possible and makes sense for your study topic.

To implement random assignment, assign a unique number to every member of your study’s sample.

Then, you can use a random number generator or a lottery method to randomly assign each number to a control or experimental group. You can also do so manually, by flipping a coin or rolling a die to randomly assign participants to groups.
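The random number generator approach can be sketched in a few lines. This is a minimal example, not a prescribed procedure: the function name `randomly_assign`, the group labels, and the choice to shuffle and then deal participants out in turn (which keeps group sizes equal) are all assumptions made here.

```python
import random


def randomly_assign(participant_ids, groups=("control", "experimental"), seed=None):
    """Shuffle participant numbers, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)  # chance alone decides the order

    assignment = {group: [] for group in groups}
    for position, pid in enumerate(shuffled):
        group = groups[position % len(groups)]
        assignment[group].append(pid)
    return assignment


# Twenty participants, numbered 1 to 20 as described above.
assignment = randomly_assign(range(1, 21), seed=42)
for group, members in assignment.items():
    print(group, sorted(members))
```

Passing a seed makes the allocation reproducible for auditing; leave it as `None` for a fresh random allocation each run.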

Random selection, or random sampling , is a way of selecting members of a population for your study’s sample.

In contrast, random assignment is a way of sorting the sample into control and experimental groups.

Random sampling enhances the external validity or generalizability of your results, while random assignment improves the internal validity of your study.

“Controlling for a variable” means measuring extraneous variables and accounting for them statistically to remove their effects on other variables.

Researchers often model control variable data along with independent and dependent variable data in regression analyses and ANCOVAs . That way, you can isolate the control variable’s effects from the relationship between the variables of interest.

Control variables help you establish a correlational or causal relationship between variables by enhancing internal validity .

If you don’t control relevant extraneous variables , they may influence the outcomes of your study, and you may not be able to demonstrate that your results are really an effect of your independent variable .

A control variable is any variable that’s held constant in a research study. It’s not a variable of interest in the study, but it’s controlled because it could influence the outcomes.

Including mediators and moderators in your research helps you go beyond studying a simple relationship between two variables for a fuller picture of the real world. They are important to consider when studying complex correlational or causal relationships.

Mediators are part of the causal pathway of an effect, and they tell you how or why an effect takes place. Moderators usually help you judge the external validity of your study by identifying the limitations of when the relationship between variables holds.

If something is a mediating variable:

It’s caused by the independent variable.
It influences the dependent variable.
When it’s taken into account, the statistical correlation between the independent and dependent variables is higher than when it isn’t considered.

A confounder is a third variable that affects variables of interest and makes them seem related when they are not. In contrast, a mediator is the mechanism of a relationship between two variables: it explains the process by which they are related.

A mediator variable explains the process through which two variables are related, while a moderator variable affects the strength and direction of that relationship.

There are three key steps in systematic sampling :

Define and list your population, ensuring that it is not ordered in a cyclical or periodic order.
Decide on your sample size and calculate your interval, k, by dividing your population size by your target sample size.
Choose every kth member of the population as your sample.
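The three steps above can be sketched as follows. Two details are assumptions added for this example rather than part of the steps: the interval is rounded down with integer division, and the starting position is drawn at random from the first k members. The function name `systematic_sample` is also hypothetical.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Select every k-th member of a listed population."""
    k = len(population) // sample_size  # sampling interval
    if k < 1:
        raise ValueError("sample_size cannot exceed the population size")

    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed: random start within the first interval
    return population[start::k][:sample_size]


# Hypothetical listed population of 1,000 people and a target sample of 50.
population = list(range(1, 1001))
sample = systematic_sample(population, 50, seed=7)

print("interval k:", len(population) // 50)
print("first five selected:", sample[:5])
```

Here k is 1000 / 50 = 20, so the sample contains every 20th person on the list.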

Systematic sampling is a probability sampling method where researchers select members of the population at a regular interval – for example, by selecting every 15th person on a list of the population. If the population is in a random order, this can imitate the benefits of simple random sampling .

Yes, you can create a stratified sample using multiple characteristics, but you must ensure that every participant in your study belongs to one and only one subgroup. In this case, you multiply the numbers of subgroups for each characteristic to get the total number of groups.

For example, if you were stratifying by location with three subgroups (urban, rural, or suburban) and marital status with five subgroups (single, divorced, widowed, married, or partnered), you would have 3 x 5 = 15 subgroups.
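The subgroup count in this example can be verified by enumerating every combination. The sketch below uses the location and marital status categories from the example; the variable names are chosen here for illustration.

```python
from itertools import product

locations = ["urban", "rural", "suburban"]
marital_statuses = ["single", "divorced", "widowed", "married", "partnered"]

# Every participant falls into exactly one (location, marital status) pair.
strata = list(product(locations, marital_statuses))

print(len(strata), "subgroups")
for location, status in strata:
    print(f"{location} / {status}")
```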

You should use stratified sampling when your sample can be divided into mutually exclusive and exhaustive subgroups that you believe will take on different mean values for the variable that you’re studying.

Using stratified sampling will allow you to obtain more precise (with lower variance ) statistical estimates of whatever you are trying to measure.

For example, say you want to investigate how income differs based on educational attainment, but you know that this relationship can vary based on race. Using stratified sampling, you can ensure you obtain a large enough sample from each racial group, allowing you to draw more precise conclusions.

In stratified sampling, researchers divide subjects into subgroups called strata based on characteristics that they share (e.g., race, gender, educational attainment).

Once divided, each subgroup is randomly sampled using another probability sampling method.
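A minimal sketch of this two-part procedure is shown below. It assumes simple random sampling within each stratum and the same sampling fraction for every stratum; the function name `stratified_sample` and the made-up population of 300 people are illustrative only.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)

    # Step 1: divide subjects into strata based on a shared characteristic.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Step 2: randomly sample within each stratum (at least one unit each).
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical population: 300 people, 100 per education level.
levels = ["high school", "bachelor", "graduate"]
people = [{"id": i, "education": levels[i % 3]} for i in range(300)]

sample = stratified_sample(people, lambda p: p["education"], fraction=0.1, seed=3)
print(len(sample), "people sampled")
```

Sampling the same fraction from every stratum keeps each subgroup represented in proportion to its size in the population.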

Cluster sampling is more time- and cost-efficient than other probability sampling methods , particularly when it comes to large samples spread across a wide geographical area.

However, it provides less statistical certainty than other methods, such as simple random sampling , because it is difficult to ensure that your clusters properly represent the population as a whole.

There are three types of cluster sampling: single-stage, double-stage and multi-stage clustering. In all three types, you first divide the population into clusters, then randomly select clusters for use in your sample.

In single-stage sampling, you collect data from every unit within the selected clusters.
In double-stage sampling, you select a random sample of units from within the clusters.
In multi-stage sampling, you repeat the procedure of randomly sampling elements from within the clusters until you have reached a manageable sample.
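The single-stage and double-stage variants can be sketched with one function. The cluster names, sizes, and the function `cluster_sample` are hypothetical; multi-stage sampling would repeat the within-cluster step on nested groupings and is not shown.

```python
import random


def cluster_sample(clusters, n_clusters, units_per_cluster=None, seed=None):
    """Randomly pick clusters, then take all units or a random subset of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)

    sample = []
    for name in chosen:
        units = clusters[name]
        if units_per_cluster is None:
            sample.extend(units)  # single-stage: every unit in the cluster
        else:
            sample.extend(rng.sample(units, units_per_cluster))  # double-stage
    return sample


# Hypothetical population: 10 schools with 30 students each.
schools = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(30)] for i in range(10)
}

single_stage = cluster_sample(schools, n_clusters=3, seed=1)
double_stage = cluster_sample(schools, n_clusters=3, units_per_cluster=5, seed=1)

print(len(single_stage), "students in the single-stage sample")
print(len(double_stage), "students in the double-stage sample")
```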

Cluster sampling is a probability sampling method in which you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample.

The clusters should ideally each be mini-representations of the population as a whole.

If properly implemented, simple random sampling is usually the best sampling method for ensuring both internal and external validity. However, it can sometimes be impractical and expensive to implement, depending on the size of the population to be studied.

If you have a list of every member of the population and the ability to reach whichever members are selected, you can use simple random sampling.

The American Community Survey is an example of simple random sampling . In order to collect detailed data on the population of the US, the Census Bureau officials randomly select 3.5 million households per year and use a variety of methods to convince them to fill out the survey.

Simple random sampling is a type of probability sampling in which the researcher randomly selects a subset of participants from a population . Each member of the population has an equal chance of being selected. Data is then collected from as large a percentage as possible of this random subset.

Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment .

Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity as they can use real-world interventions instead of artificial laboratory settings.

A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference with a true experiment is that the groups are not randomly assigned.

Blinding is important to reduce research bias (e.g., observer bias , demand characteristics ) and ensure a study’s internal validity .

If participants know whether they are in a control or treatment group , they may adjust their behavior in ways that affect the outcome that researchers are trying to measure. If the people administering the treatment are aware of group assignment, they may treat participants differently and thus directly or indirectly influence the final results.

In a single-blind study , only the participants are blinded.
In a double-blind study , both participants and experimenters are blinded.
In a triple-blind study , the assignment is hidden not only from participants and experimenters, but also from the researchers analyzing the data.

Blinding means hiding who is assigned to the treatment group and who is assigned to the control group in an experiment .

A true experiment (a.k.a. a controlled experiment) always includes at least one control group that doesn’t receive the experimental treatment.

However, some experiments use a within-subjects design to test treatments without a control group. In these designs, you usually compare one group’s outcomes before and after a treatment (instead of comparing outcomes between different groups).

For strong internal validity , it’s usually best to include a control group if possible. Without a control group, it’s harder to be certain that the outcome was caused by the experimental treatment and not by other variables.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.

Individual Likert-type questions are generally considered ordinal data, because the items have clear rank order, but don’t have an even distribution.

Overall Likert scale scores are sometimes treated as interval data. These scores are considered to have directionality and even spacing between them.

The type of data determines what statistical tests you should use to analyze your data.

A Likert scale is a rating scale that quantitatively assesses opinions, attitudes, or behaviors. It is made up of 4 or more questions that measure a single attitude or trait when response scores are combined.

To use a Likert scale in a survey , you present participants with Likert-type questions or statements, and a continuum of items, usually with 5 or 7 possible responses, to capture their degree of agreement.

In scientific research, concepts are the abstract ideas or phenomena that are being studied (e.g., educational achievement). Variables are properties or characteristics of the concept (e.g., performance at school), while indicators are ways of measuring or quantifying variables (e.g., yearly grade reports).

The process of turning abstract concepts into measurable variables and indicators is called operationalization .

There are various approaches to qualitative data analysis , but they all share five steps in common:

Prepare and organize your data.
Review and explore your data.
Develop a data coding system.
Assign codes to the data.
Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .

There are five common approaches to qualitative research :

Grounded theory involves collecting data in order to develop new theories.
Ethnography involves immersing yourself in a group or organization to understand its culture.
Narrative research involves interpreting stories to understand how people make sense of their experiences and perceptions.
Phenomenological research involves investigating phenomena through people’s lived experiences.
Action research links theory and practice in several cycles to drive innovative changes.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data , it’s important to consider how you will operationalize the variables that you want to measure.

When conducting research, collecting original data has significant advantages:

You can tailor data collection to your specific research aims (e.g. understanding the needs of your consumers or user testing your website)
You can control and standardize the process for high reliability and validity (e.g. choosing appropriate measurements and sampling methods )

However, there are also some drawbacks: data collection can be time-consuming, labor-intensive and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control and randomization.

In restriction , you restrict your sample by only including certain subjects that have the same values of potential confounding variables.

In matching , you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable .

In statistical control , you include potential confounders as variables in your regression .

In randomization , you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.

A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause , while the dependent variable is the supposed effect . A confounding variable is a third variable that influences both the independent and dependent variables.

Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.

To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables , or even find a causal relationship where none exists.

Yes, but including more than one of either type requires multiple research questions .

For example, if you are interested in the effect of a diet on health, you can use multiple measures of health: blood sugar, blood pressure, weight, pulse, and many more. Each of these is its own dependent variable with its own research question.

You could also choose to look at the effect of exercise levels as well as diet, or even the additional effect of the two combined. Each of these is a separate independent variable .

To ensure the internal validity of an experiment , you should only change one independent variable at a time.

No. The value of a dependent variable depends on an independent variable, so a variable cannot be both independent and dependent at the same time. It must be either the cause or the effect, not both!

You want to find out how blood sugar levels are affected by drinking diet soda and regular soda, so you conduct an experiment .

The type of soda – diet or regular – is the independent variable .
The level of blood sugar that you measure is the dependent variable – it changes depending on the type of soda.

Determining cause and effect is one of the most important parts of scientific research. It’s essential to know which is the cause – the independent variable – and which is the effect – the dependent variable.

In non-probability sampling , the sample is selected based on non-random criteria, and not every member of the population has a chance of being included.

Common non-probability sampling methods include convenience sampling , voluntary response sampling, purposive sampling , snowball sampling, and quota sampling .

Probability sampling means that every member of the target population has a known chance of being included in the sample.

Probability sampling methods include simple random sampling , systematic sampling , stratified sampling , and cluster sampling .

Using careful research design and sampling procedures can help you avoid sampling bias . Oversampling can be used to correct undercoverage bias .

Some common types of sampling bias include self-selection bias , nonresponse bias , undercoverage bias , survivorship bias , pre-screening or advertising bias, and healthy user bias.

Sampling bias is a threat to external validity – it limits the generalizability of your findings to a broader group of people.

A sampling error is the difference between a population parameter and a sample statistic.

A statistic refers to measures about the sample, while a parameter refers to measures about the population.
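To make the distinction concrete, the sketch below simulates a hypothetical population of 100,000 scores, draws one random sample of 500, and compares the sample mean (a statistic) with the population mean (a parameter). All numbers are invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of scores.
population = [rng.gauss(50, 10) for _ in range(100_000)]
parameter = mean(population)       # measure about the population

sample = rng.sample(population, 500)
statistic = mean(sample)           # measure about the sample

sampling_error = statistic - parameter

print("population mean (parameter):", round(parameter, 2))
print("sample mean (statistic):    ", round(statistic, 2))
print("sampling error:             ", round(sampling_error, 2))
```

Drawing a different sample would give a different statistic and therefore a different sampling error, while the parameter stays fixed.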

Populations are used when a research question requires data from every member of the population. This is usually only feasible when the population is small and easily accessible.

Samples are used to make inferences about populations . Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

There are seven threats to external validity : selection bias , history, experimenter effect, Hawthorne effect , testing effect, aptitude-treatment and situation effect.

The two types of external validity are population validity (whether you can generalize to other groups of people) and ecological validity (whether you can generalize to other situations and settings).

The external validity of a study is the extent to which you can generalize your findings to different groups of people, situations, and measures.

Cross-sectional studies cannot establish a cause-and-effect relationship or analyze behavior over a period of time. To investigate cause and effect, you need to do a longitudinal study or an experimental study.

Cross-sectional studies are less expensive and time-consuming than many other types of study. They can provide useful insights into a population’s characteristics and identify correlations for further research.

Sometimes only cross-sectional data is available for analysis; other times your research question may only require a cross-sectional study to answer it.

Longitudinal studies can last anywhere from weeks to decades, although they tend to be at least a year long.

The 1970 British Cohort Study, which has collected data on the lives of 17,000 Brits since their births in 1970, is one well-known example of a longitudinal study.

Longitudinal studies are better for establishing the correct sequence of events, identifying changes over time, and providing insight into cause-and-effect relationships, but they also tend to be more expensive and time-consuming than other types of studies.

Longitudinal studies and cross-sectional studies are two different types of research design. In a cross-sectional study you collect data from a population at a specific point in time; in a longitudinal study you repeatedly collect data from the same sample over an extended period of time.

Longitudinal study	Cross-sectional study
Repeated observations	Observations at a single point in time
Observes the same sample multiple times	Observes different groups (a “cross-section”) in the population
Follows changes in participants over time	Provides a snapshot of society at a given point

There are eight threats to internal validity: history, maturation, instrumentation, testing, selection bias, regression to the mean, social interaction and attrition.

Internal validity is the extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

The research methods you use depend on the type of data you need to answer your research question.

If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts and meanings, use qualitative methods.
If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

A confounding variable, also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design, it’s important to identify potential confounding variables and plan how you will reduce their impact.

Discrete and continuous variables are two types of quantitative variables:

Discrete variables represent counts (e.g. the number of objects in a collection).
Continuous variables represent measurable amounts (e.g. water volume or weight).

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.

You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause, while a dependent variable is the effect.

In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth:

The independent variable is the amount of nutrients added to the crop field.
The dependent variable is the biomass of the crops at harvest time.

Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design.

Experimental design means planning a set of procedures to investigate a relationship between variables. To design a controlled experiment, you need:

A testable hypothesis
At least one independent variable that can be precisely manipulated
At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

How you will manipulate the variable(s)
How you will control for any potential confounding variables
How many subjects or samples will be included in the study
How subjects will be assigned to treatment levels

Experimental design is essential to the internal and external validity of your experiment.

Internal validity is the degree of confidence that the causal relationship you are testing is not influenced by other factors or variables.

External validity is the extent to which your results can be generalized to other contexts.

The validity of your experiment depends on your experimental design.

Reliability and validity are both about how well a method measures something:

Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

In statistics, sampling allows you to test a hypothesis about the characteristics of a population.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

Methodology refers to the overarching strategy and rationale of your research project. It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyze data (for example, experiments, surveys, and statistical tests).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section.

In a longer or more complex research project, such as a thesis or dissertation, you will probably include a methodology section, where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.


Random Assignment in Experiments | Introduction & Examples

Published on 6 May 2022 by Pritha Bhandari. Revised on 13 February 2023.

In experimental research, random assignment is a way of placing participants from your sample into different treatment groups using randomisation.

With simple random assignment, every member of the sample has a known or equal chance of being placed in a control group or an experimental group. Studies that use simple random assignment are also called completely randomised designs.

Random assignment is a key part of experimental design. It helps you ensure that all groups are comparable at the start of a study: any differences between them are due to random factors.

Why does random assignment matter?
Random sampling vs random assignment
How do you use random assignment?
When is random assignment not used?
Frequently asked questions about random assignment

Random assignment is an important part of control in experimental research, because it helps strengthen the internal validity of an experiment.

In experiments, researchers manipulate an independent variable to assess its effect on a dependent variable, while controlling for other variables. To do so, they often use different levels of an independent variable for different groups of participants.

This is called a between-groups or independent measures design.

You use three groups of participants that are each given a different level of the independent variable:

A control group that’s given a placebo (no dosage)
An experimental group that’s given a low dosage
A second experimental group that’s given a high dosage

Random assignment helps you make sure that the treatment groups don’t differ in systematic or biased ways at the start of the experiment.

If you don’t use random assignment, you may not be able to rule out alternative explanations for your results.

Participants recruited from pubs are placed in the control group
Participants recruited from local community centres are placed in the low-dosage experimental group
Participants recruited from gyms are placed in the high-dosage group

With this type of assignment, it’s hard to tell whether the participant characteristics are the same across all groups at the start of the study. Gym users may tend to engage in more healthy behaviours than people who frequent pubs or community centres, and this would introduce a healthy user bias in your study.

Although random assignment helps even out baseline differences between groups, it doesn’t always make them completely equivalent. There may still be extraneous variables that differ between groups, and there will always be some group differences that arise from chance.

Most of the time, the random variation between groups is low, and, therefore, it’s acceptable for further analysis. This is especially true when you have a large sample. In general, you should always use random assignment in experiments when it is ethically possible and makes sense for your study topic.
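The claim that chance differences between groups shrink as the sample grows can be checked with a small simulation. The sketch below is only an illustration under made-up assumptions: participant ages are simulated from a normal distribution (mean 40, standard deviation 10), the sample is split at random into two equal groups, and the function name `average_baseline_gap` is hypothetical.

```python
import random
import statistics

def average_baseline_gap(sample_size, repetitions=200, seed=0):
    """Average absolute difference in mean age between two randomly formed groups."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(repetitions):
        # Hypothetical baseline characteristic: simulated participant ages.
        ages = [rng.gauss(40, 10) for _ in range(sample_size)]
        rng.shuffle(ages)                      # random assignment by shuffling
        half = sample_size // 2
        control, treatment = ages[:half], ages[half:]
        gaps.append(abs(statistics.mean(control) - statistics.mean(treatment)))
    return statistics.mean(gaps)

small = average_baseline_gap(20)      # 10 participants per group
large = average_baseline_gap(2000)    # 1,000 participants per group
print(f"average gap with n=20: {small:.2f} years; with n=2000: {large:.2f} years")
```

The groups are never exactly identical, but the typical gap in the larger sample is a small fraction of the gap in the smaller one.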

Random sampling and random assignment are both important concepts in research, but it’s important to understand the difference between them.

Random sampling (also called probability sampling or random selection) is a way of selecting members of a population to be included in your study. In contrast, random assignment is a way of sorting the sample participants into control and experimental groups.

While random sampling is used in many types of studies, random assignment is only used in between-subjects experimental designs.

Some studies use both random sampling and random assignment, while others use only one or the other.

Random sampling enhances the external validity or generalisability of your results, because it helps to ensure that your sample is unbiased and representative of the whole population. This allows you to make stronger statistical inferences.

You use a simple random sample to collect data. Because you have access to the whole population (all employees), you can assign all 8,000 employees a number and use a random number generator to select 300 employees. These 300 employees are your full sample.
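The sampling step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming employees are simply numbered 1 to 8,000; the function name `draw_simple_random_sample` and the seed are hypothetical choices for the example.

```python
import random

def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Return a sorted list of employee numbers drawn without replacement."""
    rng = random.Random(seed)  # the seed is optional and only makes the draw repeatable
    employee_numbers = range(1, population_size + 1)  # employees numbered 1..population_size
    # rng.sample gives every employee the same chance of being selected.
    return sorted(rng.sample(employee_numbers, sample_size))

sample = draw_simple_random_sample(population_size=8000, sample_size=300, seed=42)
print(len(sample), "employees selected; first five numbers:", sample[:5])
```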

Random assignment enhances the internal validity of the study, because it ensures that there are no systematic differences between the participants in each group. This helps you conclude that the outcomes can be attributed to the independent variable.

A control group that receives no intervention
An experimental group that has a remote team-building intervention every week for a month

You use random assignment to place participants into the control or experimental group. To do so, you take your list of participants and assign each participant a number. Again, you use a random number generator to place each participant in one of the two groups.

To use simple random assignment, you start by giving every member of the sample a unique number. Then, you can use computer programs or manual methods to randomly assign each participant to a group.

Random number generator: Use a computer program to generate random numbers from the list for each group.
Lottery method: Place all numbers individually into a hat or a bucket, and draw numbers at random for each group.
Flip a coin: When you only have two groups, for each number on the list, flip a coin to decide if they’ll be in the control or the experimental group.
Roll a die: When you have three groups, for each number on the list, roll a die to decide which of the groups they will be in. For example, assume that rolling 1 or 2 lands them in a control group; 3 or 4 in an experimental group; and 5 or 6 in a second control or experimental group.
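The methods listed above fall into two computational patterns, sketched here under stated assumptions. Shuffling the numbered list and dealing it out resembles the lottery method and yields equal group sizes; making one independent draw per participant resembles the coin flip or die roll, so group sizes can come out unequal. The function names, the 30 participants, and the group labels are hypothetical.

```python
import random

def assign_by_shuffle(participant_ids, group_names, seed=None):
    """Shuffle the list, then deal participants out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for position, participant in enumerate(shuffled):
        groups[group_names[position % len(group_names)]].append(participant)
    return groups

def assign_by_die_roll(participant_ids, group_names, seed=None):
    """One independent random draw per participant; group sizes may differ."""
    rng = random.Random(seed)
    return {participant: rng.choice(group_names) for participant in participant_ids}

group_names = ["control", "low dosage", "high dosage"]
groups = assign_by_shuffle(range(1, 31), group_names, seed=1)
rolls = assign_by_die_roll(range(1, 31), group_names, seed=1)
for name in group_names:
    rolled_size = sum(1 for group in rolls.values() if group == name)
    print(name, "shuffle size:", len(groups[name]), "die-roll size:", rolled_size)
```

If equal group sizes matter for the analysis, the shuffle approach is the safer choice; the per-participant draw is closer to what a physical coin or die does.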

This type of random assignment is the most powerful method of placing participants in conditions, because each individual has an equal chance of being placed in any one of your treatment groups.

Random assignment in block designs

In more complicated experimental designs, random assignment is only used after participants are grouped into blocks based on some characteristic (e.g., test score or demographic variable). These groupings mean that you need a larger sample to achieve high statistical power.

For example, a randomised block design involves placing participants into blocks based on a shared characteristic (e.g., college students vs graduates), and then using random assignment within each block to assign participants to every treatment condition. This helps you assess whether the characteristic affects the outcomes of your treatment.
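A randomised block design of this kind might be sketched as follows. This is an illustration only: the twelve participants, their block labels, the two conditions, and the function name `randomised_block_assignment` are all hypothetical.

```python
import random
from collections import defaultdict

def randomised_block_assignment(block_of, conditions, seed=None):
    """Randomly assign participants to conditions separately within each block.

    block_of maps each participant to a block label (for example an education level).
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for participant, block_label in block_of.items():
        blocks[block_label].append(participant)

    assignment = {}
    for block_label in sorted(blocks):
        members = blocks[block_label]
        rng.shuffle(members)  # randomise order inside this block only
        for position, participant in enumerate(members):
            # Deal the shuffled members out so every condition appears in the block.
            assignment[participant] = conditions[position % len(conditions)]
    return assignment

# Hypothetical sample: six college students and six graduates.
block_of = {f"P{i}": ("student" if i <= 6 else "graduate") for i in range(1, 13)}
assignment = randomised_block_assignment(block_of, ["control", "treatment"], seed=7)
for participant in sorted(block_of, key=lambda name: int(name[1:])):
    print(participant, block_of[participant], assignment[participant])
```

Because the shuffle happens inside each block, both conditions end up with the same mix of students and graduates.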

In an experimental matched design, you use blocking and then match up individual participants from each block based on specific characteristics. Within each matched pair or group, you randomly assign each participant to one of the conditions in the experiment and compare their outcomes.
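For the two-condition case, the within-pair randomisation step can be sketched as below. The sketch assumes the matching has already been done and covers only the random assignment inside each pair; the participant names and the function name `assign_matched_pairs` are hypothetical.

```python
import random

def assign_matched_pairs(pairs, seed=None):
    """Within each matched pair, randomly pick who gets the treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        # A fair virtual coin flip decides the order within the pair.
        if rng.random() < 0.5:
            first, second = second, first
        assignment[first] = "treatment"
        assignment[second] = "control"
    return assignment

# Hypothetical pairs, already matched on a characteristic such as age.
pairs = [("Ana", "Ben"), ("Chloe", "Dev"), ("Elif", "Femi")]
pair_assignment = assign_matched_pairs(pairs, seed=3)
for first, second in pairs:
    print(first, pair_assignment[first], "|", second, pair_assignment[second])
```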

Sometimes, it’s not relevant or ethical to use simple random assignment, so groups are assigned in a different way.

When comparing different groups

Sometimes, differences between participants are the main focus of a study, for example, when comparing children and adults or people with and without health conditions. Participants are not randomly assigned to different groups, but instead assigned based on their characteristics.

In this type of study, the characteristic of interest (e.g., gender) is an independent variable, and the groups differ based on the different levels (e.g., men, women). All participants are tested the same way, and then their group-level outcomes are compared.

When it’s not ethically permissible

When studying unhealthy or dangerous behaviours, it’s not possible to use random assignment. For example, if you’re studying heavy drinkers and social drinkers, it’s unethical to randomly assign participants to one of the two groups and ask them to drink large amounts of alcohol for your experiment.

When you can’t assign participants to groups, you can also conduct a quasi-experimental study. In a quasi-experiment, you study the outcomes of pre-existing groups who receive treatments that you may not have any control over (e.g., heavy drinkers and social drinkers).

These groups aren’t randomly assigned, but may be considered comparable when some other variables (e.g., age or socioeconomic status) are controlled for.

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomisation. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.

Random selection, or random sampling, is a way of selecting members of a population for your study’s sample.

In contrast, random assignment is a way of sorting the sample into control and experimental groups.

Random sampling enhances the external validity or generalisability of your results, while random assignment improves the internal validity of your study.

In general, you should always use random assignment in this type of experimental design when it is ethically possible and makes sense for your study topic.

To implement random assignment, assign a unique number to every member of your study’s sample.

Cite this Scribbr article

Bhandari, P. (2023, February 13). Random Assignment in Experiments | Introduction & Examples. Scribbr. Retrieved 24 June 2024, from https://www.scribbr.co.uk/research-methods/random-assignment-experiments/

Random Sampling vs. Random Assignment

Random sampling and random assignment are fundamental concepts in the realm of research methods and statistics. However, many students struggle to differentiate between these two concepts, and very often use these terms interchangeably. Here we will explain the distinction between random sampling and random assignment.

Random sampling refers to the method you use to select individuals from the population to participate in your study. In other words, random sampling means that you are randomly selecting individuals from the population to participate in your study. This type of sampling is typically done to help ensure the representativeness of the sample (i.e., external validity). It is worth noting that a sample is only truly random if all individuals in the population have an equal probability of being selected to participate in the study. In practice, very few research studies use “true” random sampling because it is usually not feasible to ensure that all individuals in the population have an equal chance of being selected. For this reason, it is especially important to avoid using the term “random sample” if your study uses a nonprobability sampling method (such as convenience sampling).

Random assignment refers to the method you use to place participants into groups in an experimental study. For example, say you are conducting a study comparing the blood pressure of patients after taking aspirin or a placebo. You have two groups of patients to compare: patients who will take aspirin (the experimental group) and patients who will take the placebo (the control group). Ideally, you would want to randomly assign the participants to be in the experimental group or the control group, meaning that each participant has an equal probability of being placed in the experimental or control group. This helps ensure that there are no systematic differences between the groups before the treatment (e.g., the aspirin or placebo) is given to the participants. Random assignment is a fundamental part of a “true” experiment because it helps ensure that any differences found between the groups are attributable to the treatment, rather than a confounding variable.

So, to summarize, random sampling refers to how you select individuals from the population to participate in your study. Random assignment refers to how you place those participants into groups (such as experimental vs. control). Knowing this distinction will help you clearly and accurately describe the methods you use to collect your data and conduct your study.

The Definition of Random Assignment According to Psychology

Random Assignment In Research

Random Selection

Example of Random Assignment

A Word From Verywell

Random Assignment in Psychology (Definition + 40 Examples)

History of Random Assignment

Early Studies Utilizing Random Assignment

Evolution of the Methodology

Principles of Random Assignment

Basic Principles of Random Assignment

Theoretical Foundation

Differentiating Random Assignment from Random Selection

Methodology of Random Assignment

Tools and Techniques Used

Ethical Considerations

Conclusion of Methodology

Benefits of Random Assignment in Psychological Research

Facilitating Causal Inferences

Ensuring Internal Validity

Enhancing Generalizability

Limitations of Random Assignment

Ethical Dilemmas

Generalizability Concerns

Practical and Real-World Limitations

Response to Critiques

Real-World Applications and Examples

Real-world Impact of Random Assignment

Health and Medicine

Workplace and Organizational Behavior

Environmental and Social Changes

Technology and Human Interaction

Random Assignment in Psychology: Definition & Examples

Importance

Random Selection vs. Random Assignment

Random Assignment vs Random Sampling

When to Use Random Assignment

How to Use Random Assignment

When is Random Assignment not used?

Drawbacks of Random Assignment

What is the difference between random sampling and random assignment?

Does random assignment increase internal validity?

Does random assignment reduce sampling error?

When is random assignment not possible?

Does random assignment eliminate confounding variables?

Why is random assignment of participants to treatment conditions in an experiment used?

Further Reading

Random Assignment in Experiments

Correlation, Causation, and Confounding Variables

Example of Confounding in an Experiment

Alternative Explanations for Differences in Outcomes

Experiments Must Account for Confounding Variables

Random Assignment Can Reduce the Impact of Confounding Variables

Random Assignment Distributes Confounders Equally

Comparing the Vitamin Study With and Without Random Assignment

Drawbacks of Random Assignment

Read About Real Experiments that Used Random Assignment

Random Assignment

Introduction

Why Is Random Assignment Crucial for Statistical Inference?
