statistics research papers free

Statistical Papers

Statistical Papers is a forum for presentation and critical assessment of statistical methods encouraging the discussion of methodological foundations and potential applications.

The Journal stresses statistical methods that have broad applications, giving special attention to those relevant to the economic and social sciences.
Covers all topics of modern data science, such as frequentist and Bayesian design and inference as well as statistical learning.
Contains original research papers (regular articles), survey articles, short communications, reports on statistical software, and book reviews.
High author satisfaction with 90% likely to publish in the journal again.
Werner G. Müller,
Carsten Jentsch,
Shuangzhe Liu,
Ulrike Schneider

Latest issue

Volume 65, Issue 6

Latest articles

Inference on Weibull inverted exponential distribution under progressive first-failure censoring with constant-stress partially accelerated life test

Abdullah Fathi
Al-Wageh A. Farghal
Ahmed A. Soliman

Multiple testing of interval composite null hypotheses using randomized p-values

Daniel Ochieng

On the statistical analysis of high-dimensional factor models

Jianhua Guo

A new integrated discrimination improvement index via odds

Kenichi Hayashi
Shinto Eguchi

Exact distribution of change-point MLE for a Multivariate normal sequence

Mohammad Esmail Dehghan Monfared

Journal updates

Write & submit: Overleaf LaTeX template

Journal information

Australian Business Deans Council (ABDC) Journal Quality List
Current Index to Statistics
Google Scholar
Japanese Science and Technology Agency (JST)
Mathematical Reviews
Norwegian Register for Scientific Journals and Series
OCLC WorldCat Discovery Service
Research Papers in Economics (RePEc)
Science Citation Index Expanded (SCIE)
TD Net Discovery Service
UGC-CARE List (India)


arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv.

Mathematics

Mathematics (math) includes: Algebraic Geometry; Algebraic Topology; Analysis of PDEs; Category Theory; Classical Analysis and ODEs; Combinatorics; Commutative Algebra; Complex Variables; Differential Geometry; Dynamical Systems; Functional Analysis; General Mathematics; General Topology; Geometric Topology; Group Theory; History and Overview; Information Theory; K-Theory and Homology; Logic; Mathematical Physics; Metric Geometry; Number Theory; Numerical Analysis; Operator Algebras; Optimization and Control; Probability; Quantum Algebra; Representation Theory; Rings and Algebras; Spectral Theory; Statistics Theory; Symplectic Geometry

Computer Science

Computing Research Repository (CoRR) includes: Artificial Intelligence; Computation and Language; Computational Complexity; Computational Engineering, Finance, and Science; Computational Geometry; Computer Science and Game Theory; Computer Vision and Pattern Recognition; Computers and Society; Cryptography and Security; Data Structures and Algorithms; Databases; Digital Libraries; Discrete Mathematics; Distributed, Parallel, and Cluster Computing; Emerging Technologies; Formal Languages and Automata Theory; General Literature; Graphics; Hardware Architecture; Human-Computer Interaction; Information Retrieval; Information Theory; Logic in Computer Science; Machine Learning; Mathematical Software; Multiagent Systems; Multimedia; Networking and Internet Architecture; Neural and Evolutionary Computing; Numerical Analysis; Operating Systems; Other Computer Science; Performance; Programming Languages; Robotics; Social and Information Networks; Software Engineering; Sound; Symbolic Computation; Systems and Control

Quantitative Biology

Quantitative Biology (q-bio) includes: Biomolecules; Cell Behavior; Genomics; Molecular Networks; Neurons and Cognition; Other Quantitative Biology; Populations and Evolution; Quantitative Methods; Subcellular Processes; Tissues and Organs

Quantitative Finance

Quantitative Finance (q-fin) includes: Computational Finance; Economics; General Finance; Mathematical Finance; Portfolio Management; Pricing of Securities; Risk Management; Statistical Finance; Trading and Market Microstructure

Statistics

Statistics (stat) includes: Applications; Computation; Machine Learning; Methodology; Other Statistics; Statistics Theory

Electrical Engineering and Systems Science

Electrical Engineering and Systems Science (eess) includes: Audio and Speech Processing; Image and Video Processing; Signal Processing; Systems and Control

Economics

Economics (econ) includes: Econometrics; General Economics; Theoretical Economics


Statistics articles from across Nature Portfolio

Statistics is the application of mathematical concepts to understanding and analysing large collections of data. A central tenet of statistics is to describe the variations in a data set or population using probability distributions. This analysis aids understanding of what underlies these variations and enables predictions of future changes.

Latest Research and Reviews

Employees’ pro-environmental behavior in an organization: a case study in the UAE

Nadin Alherimi
Ayman Alzaaterh

The predictive capability of several anthropometric indices for identifying the risk of metabolic syndrome and its components among industrial workers

Ekaterina D. Konstantinova
Tatiana A. Maslakova
Svetlana Yu. Ogorodnikova

A scalable synergy-first backbone decomposition of higher-order structures in complex systems

Thomas F. Varley

A Bayesian spatio-temporal dynamic analysis of food security in Africa

Adusei Bofa
Temesgen Zewotir

Research on the influencing factors of promoting flipped classroom teaching based on the integrated UTAUT model and learning engagement theory

Peak response regularization for localization

Jinzhen Yao

News and Comment

Efficient learning of many-body systems

The Hamiltonian describing a quantum many-body system can be learned using measurements in thermal equilibrium. Now, a learning algorithm applicable to many natural systems has been found that requires exponentially fewer measurements than existing methods.

Fudging the volcano-plot without dredging the data

Selecting omic biomarkers using both their effect size and their differential status significance (i.e., selecting the “volcano-plot outer spray”) has long been equally biologically relevant and statistically troublesome. However, recent proposals are paving the way to resolving this dilemma.

Thomas Burger

Disentangling truth from bias in naturally occurring data

A technique that leverages duplicate records in crowdsourcing data could help to mitigate the effects of biases in research and services that are dependent on government records.

Daniel T. O’Brien

Sciama’s argument on life in a random universe and distinguishing apples from oranges

Dennis Sciama has argued that the existence of life depends on many quantities—the fundamental constants—so in a random universe life should be highly unlikely. However, without full knowledge of these constants, his argument implies a universe that could appear to be ‘intelligently designed’.

Zhi-Wei Wang
Samuel L. Braunstein

A method for generating constrained surrogate power laws

A paper in Physical Review X presents a method for numerically generating data sequences that are as likely to be observed under a power law as a given observed dataset.

Zoe Budrikis

Connected climate tipping elements

Tipping elements are regions that are vulnerable to climate change and capable of sudden drastic changes. Now research establishes long-distance linkages between tipping elements, with the network analysis offering insights into their interactions on a global scale.

Valerie N. Livina


The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Step 1: Write your hypotheses and plan your research design
Step 2: Collect data from a sample
Step 3: Summarize your data with descriptive statistics
Step 4: Test hypotheses or make estimates with inferential statistics
Step 5: Interpret your results
Other interesting articles

Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
Null hypothesis: Parental income and GPA have no relationship with each other in college students.
Alternative hypothesis: Parental income and GPA are positively correlated in college students.
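
The two example pairs above can also be written symbolically. This is only a sketch under assumed notation that does not appear in the original: \mu_D stands for the population mean change in math score (posttest minus pretest), and \rho stands for the population correlation between parental income and GPA.

```latex
\begin{align*}
\text{Experiment:}\quad & H_0: \mu_D = 0 & H_1: \mu_D > 0 \\
\text{Correlational study:}\quad & H_0: \rho = 0 & H_1: \rho > 0
\end{align*}
```

Both alternatives are directional (an improvement, a positive correlation), which is why a one-tailed test is a natural fit for them.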

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

In an experimental design, you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
In a correlational design, you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
In a descriptive design, you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

In a between-subjects design, you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
In a within-subjects design, you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
In a mixed (factorial) design, one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g., level of language ability).
Quantitative data represents amounts. These may be on an interval scale (e.g., test score) or a ratio scale (e.g., age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.
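
As a rough illustration of how the measurement level limits which summary makes sense, the sketch below uses Python's standard `statistics` module. All data values and variable names are invented for the example.

```python
from statistics import mean, median, mode

# Hypothetical data, invented for illustration only.
ages = [15, 16, 16, 17, 18]                                     # quantitative (ratio)
genders = ["female", "male", "female", "nonbinary", "female"]   # categorical (nominal)
agreement = [1, 3, 3, 4, 5]                                     # ordinal codes from a 1-5 scale

mean_age = mean(ages)                  # a mean is meaningful because ages are amounts
most_common_gender = mode(genders)     # nominal categories support only the mode
median_agreement = median(agreement)   # the median respects the order of ordinal codes

print(mean_age, most_common_gender, median_agreement)
```

Averaging the gender labels is not even possible, and averaging the agreement codes would treat the gaps between the codes as equal amounts, which an ordinal scale does not guarantee.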

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Experimental study
Variable	Type of data
Age	Quantitative (ratio)
Gender	Categorical (nominal)
Race or ethnicity	Categorical (nominal)
Baseline test scores	Quantitative (interval)
Final test scores	Quantitative (interval)

Correlational study
Variable	Type of data
Parental income	Quantitative (ratio)
GPA	Quantitative (interval)


Step 2: Collect data from a sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

Probability sampling: every member of the population has a chance of being selected for the study through random selection.
Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias, like sampling bias, and makes it more likely that data from your sample is typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.
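
A simple random sample is the most basic probability sampling method. The sketch below draws one with Python's standard `random` module. The sampling frame of 1,000 student ID numbers, the sample size of 50, and the seed are all assumptions made up for the example.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 1,000 students.
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw can be reproduced

# Simple random sample without replacement:
# every student has the same chance of being selected.
sample = rng.sample(population, k=50)

print(sorted(sample)[:5])
```

In a real study, the hard part is usually obtaining a complete sampling frame, not the random draw itself.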

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be at risk of biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

your sample is representative of the population you’re generalizing your findings to.
your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section.

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

Will you have resources to advertise your study widely, including outside of your university setting?
Will you have the means to recruit a diverse sample that represents a broad population?
Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, aim for at least 30 units per subgroup.

To use these calculators, you have to understand and input these key components:

Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
Statistical power: the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
Expected effect size: a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
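
To show how these components combine, here is a minimal sketch of one common textbook approximation, not the formula of any particular online calculator. It assumes a two-sided comparison of the means of two independent groups and a normal approximation. The function name `n_per_group` is hypothetical, and the population standard deviation enters through the standardized effect size (the expected difference divided by the standard deviation).

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-group comparison of means."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the significance level
    z_power = z.inv_cdf(power)            # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n_medium = n_per_group(0.5)   # standardized effect size of 0.5
print(n_medium)
```

Smaller expected effects need far more participants, because the required sample size grows with the inverse square of the effect size. Calculators based on the t distribution will give slightly larger numbers than this approximation.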

Step 3: Summarize your data with descriptive statistics

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

Organizing data from each variable in frequency distribution tables.
Displaying data from a key variable in a bar chart to view the distribution of responses.
Visualizing the relationship between two variables using a scatter plot.

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.
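
A frequency distribution table can be built in a few lines. The sketch below uses `collections.Counter` on invented survey responses and prints a text bar next to each count as a rough stand-in for a bar chart.

```python
from collections import Counter

# Hypothetical survey responses on a 1-5 agreement scale.
responses = [3, 4, 2, 5, 4, 3, 4, 1, 4, 3]

frequency_table = Counter(responses)

# One row per value: the value, its count, and a text bar of the same length.
for value in sorted(frequency_table):
    count = frequency_table[value]
    print(f"{value}\t{count}\t{'#' * count}")
```

Even this crude display shows where responses cluster and whether any value is missing or unexpectedly rare.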

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

Mode: the most popular response or value in the data set.
Median: the value in the exact middle of the data set when ordered from low to high.
Mean: the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.
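
The three measures can be computed with Python's standard `statistics` module. The scores below are invented, with one very high value included on purpose to show how skew affects the mean more than the median.

```python
from statistics import mean, median, multimode

# Hypothetical test scores; the value 98 skews the data to the right.
scores = [62, 65, 65, 68, 70, 71, 98]

modes = multimode(scores)      # list of the most frequent value(s)
middle = median(scores)        # middle value of the ordered data
average = mean(scores)         # sum of values divided by their count

print("mode:", modes)
print("median:", middle)
print("mean:", round(average, 2))
```

The single high score pulls the mean above the median, which is one reason the median is often preferred for skewed data.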

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

Range: the highest value minus the lowest value of the data set.
Interquartile range: the range of the middle half of the data set.
Standard deviation: the typical distance between each value in your data set and the mean, calculated as the square root of the variance.
Variance: the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
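
The four measures can be sketched with the standard `statistics` module as well. The scores are invented, and the quartiles use the module's default "exclusive" method; other software may use a slightly different quartile rule and give a slightly different interquartile range.

```python
from statistics import quantiles, stdev, variance

# Hypothetical test scores with one extreme value.
scores = [62, 65, 65, 68, 70, 71, 98]

data_range = max(scores) - min(scores)   # highest minus lowest value
q1, _, q3 = quantiles(scores, n=4)       # the three quartile cut points
iqr = q3 - q1                            # spread of the middle half
sd = stdev(scores)                       # sample standard deviation (n - 1 denominator)
var = variance(scores)                   # sample variance

print(data_range, iqr, round(sd, 2), round(var, 2))
```

The extreme score inflates the range and the standard deviation, while the interquartile range is unaffected by it.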

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

	Pretest scores	Posttest scores
Mean	68.44	75.25
Standard deviation	9.43	9.88
Variance	88.96	97.96
Range	36.25	45.12
N	30

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

	Parental income (USD)	GPA
Mean	62,100	3.12
Standard deviation	15,000	0.45
Variance	225,000,000	0.16
Range	8,000–378,000	2.64–4.00
N	653

Step 4: Test hypotheses or make estimates with inferential statistics

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

Estimation: calculating population parameters based on sample statistics.
Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

A point estimate: a value that represents your best guess of the exact parameter.
An interval estimate: a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
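
A minimal sketch of that calculation for a mean is shown below. The function name `z_confidence_interval` and the scores are invented for the example. It follows the z-based description above; for small samples, a critical value from the t distribution is generally more appropriate and gives a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) bounds of a z-based confidence interval for the mean."""
    n = len(data)
    point_estimate = mean(data)                  # sample mean as the point estimate
    standard_error = stdev(data) / sqrt(n)       # estimated standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # e.g. about 1.96 for 95%
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical posttest scores.
scores = [72, 85, 78, 90, 66, 81, 75, 88, 79, 83]
low, high = z_confidence_interval(scores)
print(round(low, 2), round(high, 2))
```

Raising the confidence level widens the interval, and collecting a larger sample narrows it.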

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

A test statistic tells you how much your data differs from the null hypothesis of the test.
A p value tells you the probability of obtaining results at least as extreme as yours if the null hypothesis is actually true in the population.
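
To make the two outputs concrete, here is a sketch of a one-sample z test, chosen only because it needs nothing beyond the standard library. The function name and all the numbers are hypothetical, and the population standard deviation is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic and the two-tailed p value."""
    # Test statistic: distance of the sample mean from the null value, in standard errors.
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # p value: probability of a z at least this far from zero if the null is true.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z_stat, p_value = one_sample_z_test(sample_mean=103, pop_mean=100, pop_sd=15, n=100)
print(round(z_stat, 2), round(p_value, 4))
```

Here the sample mean lies two standard errors above the null value, which gives a two-tailed p value just under 0.05.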

Statistical tests come in three main varieties:

Comparison tests assess group differences in outcomes.
Regression tests assess cause-and-effect relationships between variables.
Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in the outcome variable(s).

A simple linear regression includes one predictor variable and one outcome variable.
A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
A z test is for exactly 1 or 2 groups when the sample is large.
An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

If you have only one sample that you want to compare to a population mean, use a one-sample test.
If you have paired measurements (within-subjects design), use a dependent (paired) samples test.
If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test.
If you expect a difference between groups in a specific direction, use a one-tailed test.
If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test.
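
As an illustration of the dependent (paired) samples case, the sketch below computes the t statistic from invented pretest and posttest scores for eight participants. The standard library has no t distribution, so the sketch stops at the test statistic and degrees of freedom; the p value would come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired scores: one pretest and one posttest per participant.
pretest = [61, 70, 58, 75, 66, 80, 72, 64]
posttest = [66, 74, 57, 82, 71, 85, 75, 70]

# A paired test works on the within-participant differences.
differences = [post - pre for pre, post in zip(pretest, posttest)]
n = len(differences)

# t statistic: mean difference divided by its standard error.
t_stat = mean(differences) / (stdev(differences) / sqrt(n))
df = n - 1   # degrees of freedom for a paired t test

print(round(t_stat, 2), df)
```

For a one-tailed test at the 0.05 level with 7 degrees of freedom, the tabled critical value is about 1.895, so a t statistic this large would count as statistically significant.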

The only parametric correlation test is Pearson’s r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
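
The sketch below computes Pearson's r by hand and converts it to a t statistic with the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2). The function names and the six data points are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing whether r differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical data: parental income in thousands of USD and GPA.
income = [30, 45, 60, 75, 90, 120]
gpa = [2.8, 3.0, 2.9, 3.4, 3.3, 3.7]

r = pearson_r(income, gpa)
t_stat = t_from_r(r, len(income))
print(round(r, 3), round(t_stat, 2))
```

Because the sample size appears in the formula, even a weak correlation can produce a large t statistic when the sample is big.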

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

a t value (test statistic) of 3.00
a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

a t value of 3.08
a p value of 0.001

Receive feedback on language, structure, and formatting

Professional editors proofread and edit your paper by focusing on:

Academic style
Vague sentences
Style consistency

See an example

Step 5: Interpret your results

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)
You compare your p value of 0.0028 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper.
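
One widely used effect size for a difference between two means is Cohen's d: the difference in means divided by a pooled standard deviation. The sketch below shows the version for two independent groups. The function name and both groups of scores are invented, and other variants of d exist for paired designs.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)

# Hypothetical test scores for a control group and a treatment group.
control = [60, 65, 70, 75, 80]
treatment = [66, 71, 76, 81, 86]

d = cohens_d(control, treatment)
print(round(d, 2))
```

Cohen's conventional benchmarks treat values around 0.2 as small, 0.5 as medium, and 0.8 as large, though what counts as practically important depends on the field.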

Example: Effect size (experimental study)
With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power. However, there’s a trade-off between the two errors, so a fine balance is necessary.
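The trade-off can be made concrete with a small sketch. Assuming a two-sided one-sample z-test with known standard deviation (a simplification chosen here for illustration; the effect size, SD and sample size are hypothetical), lowering the significance level reduces the Type I risk but raises the Type II risk:

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def z_test_power(effect, sigma, n, alpha):
    """Power of a two-sided one-sample z-test for a true mean shift `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * sqrt(n) / sigma
    # Probability that the test statistic lands in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

# Hypothetical scenario: true shift 0.5, SD 1.0, n = 30
for alpha in (0.10, 0.05, 0.01):
    power = z_test_power(effect=0.5, sigma=1.0, n=30, alpha=alpha)
    print(f"alpha={alpha:.2f}  beta={1 - power:.3f}  power={power:.3f}")
```

With the sample size held fixed, the only way to lower both error risks at once is to collect more data or to study a larger effect.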

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

The Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.
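To give a feel for what a Bayes factor is, the sketch below works through one of the simplest possible cases: binomial data, a point null of θ = 0.5, and an alternative that places a uniform prior on θ. The uniform prior and the function name `bayes_factor_10` are assumptions made for this example; real analyses usually need a more carefully chosen prior.

```python
from math import comb

def bayes_factor_10(successes, trials):
    """Evidence for H1 (theta ~ Uniform(0, 1)) relative to H0 (theta = 0.5)."""
    # Likelihood of the data under the point null
    like_h0 = comb(trials, successes) * 0.5 ** trials
    # Marginal likelihood under a uniform prior: integrates to 1 / (trials + 1)
    like_h1 = 1 / (trials + 1)
    return like_h1 / like_h0

# Hypothetical outcomes out of 20 trials
for k in (10, 15):
    print(f"{k}/20 successes: BF10 = {bayes_factor_10(k, 20):.2f}")
```

A value above 1 favours the alternative and a value below 1 favours the null, so the same quantity can express support in either direction.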

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Student’s t -distribution
Normal distribution
Null and Alternative Hypotheses
Chi square tests
Confidence interval

Methodology

Cluster sampling
Stratified sampling
Data cleansing
Reproducibility vs Replicability
Peer review
Likert scale

Research bias

Implicit bias
Framing effect
Cognitive bias
Placebo effect
Hawthorne effect
Hostile attribution bias
Affect heuristic


Free Databases (all subjects): Statistical Sources


Statistical Sources

DES (Data Access Tools): A number of different databases from the U.S. Census Bureau.
Ersys: Detailed statistics on nearly every metropolitan area in the US, based on 2000 Census data.
Explore Census Data: Databases provided by the Bureau of the Census, including Annual Survey of Manufactures, Census Tract Street Locator, USA Counties 1996, and ZIP Code Business Patterns.
Data and Statistics About the U.S.: Find data about the U.S., such as demographic and economic data, population, and maps. Get information about the 2020 U.S. Census.
Google Data Set Search: A search engine for finding data repositories on the web, including those of local and national governments around the world. It leads to data sets wherever they’re hosted, whether on a publisher's site, in a digital library, or on an author's personal web page.
Pew Research Center: Free access to data and reports which document the impact of the Internet and technology on American life. The site also includes government information and policy resources related to the spread of the Internet.
Statistical Sources: From Cornell University.
Last Updated: Jun 10, 2024 12:44 PM
URL: https://csulb.libguides.com/freedatabases

Indian J Anaesth
v.60(9); 2016 Sep

Basic statistical tools in research and data analysis

Zulfiqar Ali

Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India

S Bala Bhaskar

1 Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India

Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

INTRODUCTION

Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]

A variable is a characteristic that varies from one individual member of a population to another.[ 3 ] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called quantitative variables. Sex and eye colour give qualitative information and are called qualitative variables[ 3 ] [ Figure 1 ].

[Figure 1: Classification of variables]

Quantitative variables

Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording the data which is based on categorical, ordinal, interval and ratio scales [ Figure 1 ].

Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender male and female), the data are called dichotomous (or binary). The various causes of re-intubation in an intensive care unit, such as upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment, are examples of categorical variables.

Ordinal variables have a clear ordering among their categories. However, the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status or Richmond agitation-sedation scale.

Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.

Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. For example, the system of centimetres is an example of a ratio scale. There is a true zero point and the value of 0 cm means a complete absence of length. The thyromental distance of 6 cm in an adult may be twice that of a child in whom it may be 3 cm.

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics[ 4 ] try to describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[ 4 ] use a random sample of data taken from a population to describe and make inferences about the whole population. They are valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1.

[Table 1: Example of descriptive and inferential statistics]

Descriptive statistics

The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.

Measures of central tendency

The measures of central tendency are mean, median and mode.[ 6 ] Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores. The mean may be influenced profoundly by extreme values. For example, the average stay of organophosphorus poisoning patients in ICU may be influenced by a single patient who stays in ICU for around 5 months because of septicaemia. The extreme values are called outliers. The formula for the mean is

$$\bar{x} = \frac{\sum x}{n}$$

where x = each observation and n = number of observations. Median[ 6 ] is defined as the middle of a distribution in ranked data (with half of the observations in the sample above and half below the median value), while mode is the most frequently occurring value in a distribution.

Range defines the spread, or variability, of a sample.[ 7 ] It is described by the minimum and maximum values of the variables. If we rank the data and, after ranking, group the observations into percentiles, we can get better information about the pattern of spread of the variables. In percentiles, we divide the ranked observations into 100 equal parts. We can then describe the 25th, 50th, 75th or any other percentile. The median is the 50th percentile. The interquartile range covers the middle 50% of the observations about the median (25th to 75th percentile).

Variance[ 7 ] is a measure of how spread out the distribution is. It gives an indication of how closely the individual observations cluster about the mean value. The variance of a population is defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

where $\sigma^2$ is the population variance, $\bar{X}$ is the population mean, $X_i$ is the i-th element from the population and N is the number of elements in the population. The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the i-th element from the sample and n is the number of elements in the sample. The formula for the variance of a population has the value ‘N’ as the denominator, whereas the sample formula uses ‘n − 1’. The expression ‘n − 1’ is known as the degrees of freedom and is one less than the number of observations. Once the sample mean is fixed, each observation is free to vary except the last one, which must take a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of variance is used. The square root of the variance is the standard deviation (SD).[ 8 ] The SD of a population is defined by the following formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

where $\sigma$ is the population SD, $\bar{X}$ is the population mean, $X_i$ is the i-th element from the population and N is the number of elements in the population. The SD of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where s is the sample SD, $\bar{x}$ is the sample mean, $x_i$ is the i-th element from the sample and n is the number of elements in the sample. An example for calculation of variance and SD is illustrated in Table 2.

[Table 2: Example of mean, variance, standard deviation]
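The measures described in this section can be tried out with Python's standard `statistics` module. The sketch below uses invented lengths of ICU stay (in days), including one extreme value, to show how an outlier pulls the mean but leaves the median almost untouched, and how the population (divide by N) and sample (divide by n − 1) formulas differ. The data are hypothetical and are not those of Table 2.

```python
from statistics import (mean, median, mode, pvariance, variance,
                        pstdev, stdev, quantiles)

# Hypothetical ICU lengths of stay in days; 150 is a deliberate outlier
stay_days = [3, 4, 4, 5, 6, 7, 150]
without_outlier = [d for d in stay_days if d != 150]

# Quartiles: the 25th, 50th and 75th percentiles
q1, q2, q3 = quantiles(stay_days, n=4)

print("mean with outlier    :", round(mean(stay_days), 2))
print("mean without outlier :", round(mean(without_outlier), 2))
print("median               :", median(stay_days))
print("mode                 :", mode(stay_days))
print("range                :", min(stay_days), "to", max(stay_days))
print("interquartile range  :", q1, "to", q3)
print("population variance  :", round(pvariance(stay_days), 2))  # divides by N
print("sample variance      :", round(variance(stay_days), 2))   # divides by n - 1
print("population SD        :", round(pstdev(stay_days), 2))
print("sample SD            :", round(stdev(stay_days), 2))
```

Because the sample formula divides by a smaller number, the sample variance is always somewhat larger than the population variance computed on the same values.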

Normal distribution or Gaussian distribution

Most biological variables cluster around a central value, with symmetrical positive and negative deviations about this point.[ 1 ] The standard normal distribution curve is a symmetrical, bell-shaped curve. In a normal distribution curve, about 68% of the scores are within 1 SD of the mean. Around 95% of the scores are within 2 SDs of the mean and about 99.7% within 3 SDs of the mean [ Figure 2 ].

[Figure 2: Normal distribution curve]
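The coverage figures quoted for the normal curve can be checked numerically. This small sketch uses `NormalDist` from Python's standard library; the helper name `coverage_within` is introduced here only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def coverage_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.2%}")
```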

Skewed distribution

It is a distribution with an asymmetry of the variables about its mean. In a negatively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the right of the figure, leading to a longer left tail. In a positively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the left of the figure, leading to a longer right tail.

[Figure 3: Curves showing negatively skewed and positively skewed distribution]

Inferential statistics

In inferential statistics, data are analysed from a sample to make inferences in the larger collection of the population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.

Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).

In inferential statistics, the term ‘null hypothesis’ (H0, ‘H-naught’ or ‘H-null’) denotes that there is no relationship (difference) between the population variables in question.[ 9 ]

The alternative hypothesis (H1 or Ha) denotes that a relationship (difference) between the variables is expected to exist.[ 9 ]

The P value (or the calculated probability) is the probability of obtaining a result at least as extreme as the one observed, by chance alone, if the null hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [ Table 3 ].

[Table 3: P values with interpretation]

If the P value is less than the arbitrarily chosen value (known as α or the significance level), the null hypothesis (H0) is rejected [ Table 4 ]. However, if the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.[ 11 ] Further details regarding alpha error, beta error and sample size calculation and factors influencing them are dealt with in another section of this issue by Das S et al .[ 12 ]

[Table 4: Illustration for null hypothesis]
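The decision rule described above (compare the P value with α) can be sketched in a few lines. The example assumes a two-sided z-test for a mean with a known population SD, which is a simplification chosen so that only the standard library is needed; all numbers and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_p_value(sample_mean, mu0, sigma, n):
    """P value of a two-sided z-test for a mean with known population SD."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Reject H0 when the P value falls below the significance level."""
    return "reject H0" if p_value < alpha else "retain H0"

# Hypothetical study: sample mean 103 against a reference mean of 100
p = two_sided_z_p_value(sample_mean=103, mu0=100, sigma=15, n=100)
print(f"P = {p:.4f}")
print("alpha = 0.05:", decide(p, 0.05))
print("alpha = 0.01:", decide(p, 0.01))
```

The same data lead to different decisions at different significance levels, which is why α should be fixed before the data are analysed.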

PARAMETRIC AND NON-PARAMETRIC TESTS

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]

Two most basic prerequisites for parametric statistical analysis are:

The assumption of normality which specifies that the means of the sample group are normally distributed
The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.

However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

Parametric tests

The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t -test, analysis of variance (ANOVA) and repeated measures ANOVA.

Student's t -test

Student's t -test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:

To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean (the one-sample t -test):

$$t = \frac{\bar{X} - \mu}{SE}$$

where $\bar{X}$ = sample mean, $\mu$ = population mean and SE = standard error of the mean

To test if the population means estimated by two independent samples differ significantly (the unpaired t -test):

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$

where $\bar{X}_1 - \bar{X}_2$ is the difference between the means of the two groups and SE denotes the standard error of the difference.

To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for paired t -test is when measurements are made on the same subjects before and after a treatment.

The formula for paired t -test is:

$$t = \frac{\bar{d}}{SE}$$

where $\bar{d}$ is the mean difference and SE denotes the standard error of this difference.
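The three t statistics follow the same pattern: a difference divided by its standard error. The sketch below computes each one with the standard library; it stops at the t statistic, since converting t to a P value requires the t distribution, which is left to a statistics package. The two-sample version assumes equal variances (pooled SD), and all data and function names are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance

def one_sample_t(sample, mu):
    """t = (sample mean - population mean) / SE of the mean."""
    se = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - mu) / se

def independent_t(a, b):
    """Student's two-sample t, assuming equal variances (pooled)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se

def paired_t(before, after):
    """t = mean of the paired differences / SE of those differences."""
    diffs = [y - x for x, y in zip(before, after)]
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se

# Hypothetical data
treated = [78, 82, 85, 88, 90, 91, 84, 86]
control = [75, 80, 79, 83, 85, 86, 78, 82]
before = [120, 132, 128, 140, 135]   # e.g. a measurement before treatment
after = [115, 130, 121, 134, 133]    # the same subjects after treatment

print("one-sample t (mu = 80):", round(one_sample_t(treated, 80), 3))
print("independent t         :", round(independent_t(treated, control), 3))
print("paired t              :", round(paired_t(before, after), 3))
```

The paired test is simply a one-sample test applied to the within-subject differences, which is why it is appropriate when the two sets of measurements are not independent.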

The group variances can be compared using the F -test. The F -test is the ratio of variances (var 1/var 2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.

Analysis of variance

The Student's t -test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.

In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.

However, the between-group variability (or effect variance) is the result of our treatment. These two estimates of variances are compared using the F-test.

A simplified formula for the F statistic is:

$$F = \frac{MS_b}{MS_w}$$

where $MS_b$ is the mean squares between the groups and $MS_w$ is the mean squares within groups.
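A short sketch of how the F statistic for a one-way ANOVA might be assembled from the between-group and within-group sums of squares follows. The three groups are made-up numbers and `one_way_anova_f` is a name chosen for this example.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = mean squares between groups / mean squares within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group variability: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical measurements in three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on {len(groups) - 1} and "
      f"{sum(map(len, groups)) - len(groups)} degrees of freedom")
```

An F close to 1 suggests the group means differ no more than random variation would predict; larger values point towards a treatment effect.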

Repeated measures analysis of variance

As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, a repeated measures ANOVA is used when all members of a sample are measured under different conditions or at different points in time.

As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.

Non-parametric tests

When the assumptions of normality are not met, and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric tests (distribution-free tests) are used in such situations as they do not require the normality assumption.[ 15 ] Non-parametric tests may fail to detect a significant difference when compared with a parametric test. That is, they usually have less power.

As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .

[Table 5: Analogue of parametric and non-parametric tests]

Median test for one sample: The sign test and Wilcoxon's signed rank test

The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.

The sign test examines the hypothesis about the median θ0 of a population. It tests the null hypothesis H0: θ = θ0. When the observed value (Xi) is greater than the reference value (θ0), it is marked with a + sign. If the observed value is smaller than the reference value, it is marked with a − sign. If the observed value is equal to the reference value (θ0), it is eliminated from the sample.

If the null hypothesis is true, the numbers of + signs and − signs are expected to be approximately equal.

The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.
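The sign test described above reduces to a binomial calculation: under the null hypothesis each retained observation is equally likely to be a + or a −. The sketch below counts the signs, drops ties and computes an exact two-sided P value. The sample values and the function name `sign_test` are hypothetical.

```python
from math import comb

def sign_test(sample, theta0):
    """Exact two-sided sign test of H0: median = theta0."""
    plus = sum(x > theta0 for x in sample)
    minus = sum(x < theta0 for x in sample)
    n = plus + minus              # observations equal to theta0 are dropped
    k = min(plus, minus)
    # P(at most k of n signs) under a fair-coin model, doubled for two sides
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

# Hypothetical observations tested against a reference median of 10
sample = [12, 15, 9, 18, 20, 14, 16, 10, 17, 19]
plus, minus, p_value = sign_test(sample, theta0=10)
print(f"+ signs: {plus}, - signs: {minus}, P = {p_value:.4f}")
```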

Wilcoxon's signed rank test

A major limitation of the sign test is that we lose the quantitative information of the given data and merely use the + or − signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration the relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.
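To show how the signed rank test uses the sizes of the differences as well as their signs, the sketch below ranks the absolute differences from θ0 (averaging the ranks of tied values, a common convention assumed here) and sums the ranks separately for positive and negative differences. Only the rank sums are computed; turning them into a P value needs tables or a statistics package. Data and names are invented.

```python
def signed_rank_sums(sample, theta0):
    """Return (W+, W-): rank sums of positive and negative differences."""
    # Drop observations equal to the reference value, as in the sign test
    diffs = [x - theta0 for x in sample if x != theta0]
    ordered = sorted(diffs, key=abs)

    w_plus = w_minus = 0.0
    i = 0
    while i < len(ordered):
        # Find the block of differences tied in absolute value
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        for d in ordered[i:j]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j
    return w_plus, w_minus

# Hypothetical observations tested against a reference median of 10
sample = [12, 15, 9, 18, 20, 14, 16, 10, 17, 19]
w_plus, w_minus = signed_rank_sums(sample, theta0=10)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

A large imbalance between the two rank sums suggests that the median differs from the reference value.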

Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

Mann-Whitney test

It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.

The Mann–Whitney test compares all data (xi) belonging to the X group and all data (yi) belonging to the Y group and calculates the probability of xi being greater than yi: P (xi > yi). The null hypothesis states that P (xi > yi) = P (xi < yi) = 1/2 while the alternative hypothesis states that P (xi > yi) ≠ 1/2.
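The pairwise comparison behind the Mann–Whitney test can be written out directly. This sketch counts, over every (xi, yi) pair, how often the X value exceeds the Y value (ties count as one half, a common convention assumed here) and reports the U statistics together with the estimated P (xi > yi). The two samples are made up for illustration.

```python
def mann_whitney_u(xs, ys):
    """Return (U_x, U_y, estimated P(x > y)) from all pairwise comparisons."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    n_pairs = len(xs) * len(ys)
    u_x = greater + 0.5 * ties
    u_y = n_pairs - u_x
    return u_x, u_y, u_x / n_pairs

# Hypothetical samples
group_x = [7, 9, 12, 15]
group_y = [3, 5, 8, 10]
u_x, u_y, prob = mann_whitney_u(group_x, group_y)
print(f"U_x = {u_x}, U_y = {u_y}, estimated P(x > y) = {prob}")
```

An estimate near 0.5 is what the null hypothesis predicts; values far from 0.5 indicate that one group tends to have larger observations.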

Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.
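The KS statistic described above, the maximum absolute difference between the two empirical cumulative curves, can be computed with a few lines of standard-library Python. The samples and the name `ks_statistic` are assumptions for this sketch, and no P value is derived.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Maximum absolute gap between the empirical CDFs of two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))   # the ECDFs only change at data values
    return max(
        abs(bisect_right(a_sorted, v) / len(a) - bisect_right(b_sorted, v) / len(b))
        for v in points
    )

# Hypothetical samples
sample_a = [1, 2, 3, 4]
sample_b = [3, 4, 5, 6]
print("D =", ks_statistic(sample_a, sample_b))
```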

Kruskal-Wallis test

The Kruskal–Wallis test is a non-parametric test to analyse the variance.[ 14 ] It analyses if there is any difference in the median values of three or more independent samples. The data values are ranked in an increasing order, and the rank sums calculated followed by calculation of the test statistic.
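The ranking procedure behind the Kruskal–Wallis test can be sketched as follows. All observations are ranked together (tied values share their average rank), the ranks are summed within each group, and the statistic H = 12/(N(N+1)) · Σ(R²/n) − 3(N+1) is computed. No correction for ties is applied in this simplified version, and the data are invented.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    return rank_of

def kruskal_wallis_h(groups):
    """H statistic from the rank sums of each group (no tie correction)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    rank_of = average_ranks(pooled)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical measurements in three independent samples
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(groups)
print(f"H = {h_stat:.2f}")
```

H is usually referred to a chi-square distribution with (number of groups − 1) degrees of freedom, an approximation that works best when the groups are not very small.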

Jonckheere test

In contrast to the Kruskal–Wallis test, in the Jonckheere test there is an a priori ordering of the groups, which gives it more statistical power than the Kruskal–Wallis test.[ 14 ]

Friedman test

The Friedman test is a non-parametric test for testing the difference between several related samples. The Friedman test is an alternative to repeated measures ANOVA and is used when the same parameter has been measured under different conditions on the same subjects.[ 13 ]

Tests to analyse the categorical data

The Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables. The Chi-square test compares the frequencies and tests whether the observed data differ significantly from the data expected if there were no differences between groups (i.e., the null hypothesis). It is calculated as the sum of the squared differences between the observed ( O ) and the expected ( E ) data (or the deviation, d ), each divided by the expected data, by the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine if there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data. It is applied to a 2 × 2 table with paired-dependent samples. It is used to determine whether the row and column frequencies are equal (that is, whether there is ‘marginal homogeneity’). The null hypothesis is that the paired proportions are equal. The Mantel-Haenszel Chi-square test is a multivariate test as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affect the primary outcome variable. If the outcome variable is dichotomous, then logistic regression is used.
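As an illustration of the calculations described above, the sketch below computes the Chi-square statistic for a contingency table (expected counts derived from the row and column totals), optionally with the Yates continuity correction, and the McNemar statistic from the two discordant cells of a paired 2 × 2 table. The counts are invented, the function names are chosen for this example, and only the test statistics (not P values) are produced.

```python
def chi_square(table, yates=False):
    """Sum of (O - E)^2 / E over all cells; optional Yates correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)   # continuity correction
            stat += diff ** 2 / expected
    return stat

def mcnemar(b, c):
    """McNemar statistic from the discordant pair counts b and c."""
    return (b - c) ** 2 / (b + c)

# Hypothetical 2 x 2 table: rows = groups, columns = outcome yes / no
table = [[10, 20], [20, 10]]
print("chi-square           :", round(chi_square(table), 3))
print("chi-square with Yates:", round(chi_square(table, yates=True), 3))
print("McNemar (b=5, c=15)  :", mcnemar(5, 15))
```

The corrected statistic is always the smaller of the two, which makes the test more conservative when counts are low.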

SOFTWARES AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Numerous statistical software systems are currently available. The commonly used software systems are Statistical Package for the Social Sciences (SPSS, manufactured by IBM Corporation), Statistical Analysis System (SAS, developed by SAS Institute, North Carolina, United States of America), R (designed by Ross Ihaka and Robert Gentleman of the R core team), Minitab (developed by Minitab Inc), Stata (developed by StataCorp) and MS Excel (developed by Microsoft).

There are a number of web resources which are related to statistical power analyses. A few are:

StatPages.net – provides links to a number of online power calculators
G-Power – provides a downloadable power analysis program that runs under DOS
Power analysis for ANOVA designs – an interactive site that calculates power or sample size needed to attain a given power for one effect in a factorial ANOVA design
SPSS makes a program called SamplePower. It gives an output of a complete report on the computer screen which can be cut and pasted into another document.
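For readers who want to see what such calculators do internally, here is a minimal sketch of one textbook sample size formula: the number of subjects per group needed to compare two means, using the normal approximation n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². It is not the method of any of the tools listed above; the function name and inputs are hypothetical, and dedicated software (which typically uses the t distribution) may give slightly larger answers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the Type I risk
    z_beta = z.inv_cdf(power)            # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical planning inputs: detect a difference of 0.5 SD
print("80% power:", n_per_group(delta=0.5, sigma=1.0, power=0.80), "per group")
print("90% power:", n_per_group(delta=0.5, sigma=1.0, power=0.90), "per group")
```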

It is important that a researcher knows the concepts of the basic statistical methods used for conduct of a research study. This will help to conduct an appropriately well-designed study leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, an adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge about the basic statistical methods will go a long way in improving the research designs and producing quality medical research which can be utilised for formulating the evidence-based guidelines.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.


ASA Journals Online

Journal of the American Statistical Association
The American Statistician
Journal of Agricultural, Biological, and Environmental Statistics
Journal of Business & Economic Statistics
Journal of Computational and Graphical Statistics
Journal of Nonparametric Statistics
Statistical Analysis and Data Mining: The ASA Data Science Journal
Statistics in Biopharmaceutical Research
Technometrics

ASA Open-Access Journals

Data Science in Science
Journal of Statistics and Data Science Education
Statistics and Public Policy
Statistics Surveys

ASA Co-Published Journals

Journal of Educational and Behavioral Statistics
Journal of Quantitative Analysis in Sports
SIAM/ASA Journal on Uncertainty Quantification
Journal of Survey Statistics and Methodology

Journal of Statistical Distributions and Applications


A generalization to the log-inverse Weibull distribution and its applications in cancer research

In this paper we consider a generalization of a log-transformed version of the inverse Weibull distribution. Several theoretical properties of the distribution are studied in detail including expressions for i...


Approximations of conditional probability density functions in Lebesgue spaces via mixture of experts models

Mixture of experts (MoE) models are widely applied for conditional probability density estimation problems. We demonstrate the richness of the class of MoE models by proving denseness results in Lebesgue space...

Structural properties of generalised Planck distributions

A family of generalised Planck (GP) laws is defined and its structural properties explored. Sometimes subject to parameter restrictions, a GP law is a randomly scaled gamma law; it arises as the equilibrium la...

New class of Lindley distributions: properties and applications

A new generalized class of Lindley distribution is introduced in this paper. This new class is called the T -Lindley{ Y } class of distributions, and it is generated by using the quantile functions of uniform, expon...

Tolerance intervals in statistical software and robustness under model misspecification

A tolerance interval is a statistical interval that covers at least 100 ρ % of the population of interest with a 100(1− α ) % confidence, where ρ and α are pre-specified values in (0, 1). In many scientific fields, su...

Combining assumptions and graphical network into gene expression data analysis

Analyzing gene expression data rigorously requires taking assumptions into consideration but also relies on using information about network relations that exist among genes. Combining these different elements ...

A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

Counts data with excessive zeros are frequently encountered in practice. For example, the number of health services visits often includes many zeros representing the patients with no utilization during a follo...

A general stochastic model for bivariate episodes driven by a gamma sequence

We propose a new stochastic model describing the joint distribution of (X, N), where N is a counting variable while X is the sum of N independent gamma random variables. We present the main properties of this gene...

A flexible multivariate model for high-dimensional correlated count data

We propose a flexible multivariate stochastic model for over-dispersed count data. Our methodology is built upon mixed Poisson random vectors (Y_1, …, Y_d), where the {Y_i} are conditionally independent Poisson random...

Generalized fiducial inference on the mean of zero-inflated Poisson and Poisson hurdle models

Zero-inflated and hurdle models are widely applied to count data possessing excess zeros, where they can simultaneously model the process from how the zeros were generated and potentially help mitigate the eff...

Multivariate distributions of correlated binary variables generated by pair-copulas

Correlated binary data are prevalent in a wide range of scientific disciplines, including healthcare and medicine. The generalized estimating equations (GEEs) and the multivariate probit (MP) model are two of ...

On two extensions of the canonical Feller–Spitzer distribution

We introduce two extensions of the canonical Feller–Spitzer distribution from the class of Bessel densities, which comprise two distinct stochastically decreasing one-parameter families of positive absolutely ...

A new trivariate model for stochastic episodes

We study the joint distribution of stochastic events described by (X, Y, N), where N has a 1-inflated (or deflated) geometric distribution and X, Y are the sum and the maximum of N exponential random variables. Mod...

A flexible univariate moving average time-series model for dispersed count data

Al-Osh and Alzaid (1988) consider a Poisson moving average (PMA) model to describe the relation among integer-valued time series data; this model, however, is constrained by the underlying equi-dispersion assumpt...

Spatio-temporal analysis of flood data from South Carolina

To investigate the relationship between flood gage height and precipitation in South Carolina from 2012 to 2016, we built a conditional autoregressive (CAR) model using a Bayesian hierarchical framework. This ...

Affine-transformation invariant clustering models

We develop a cluster process which is invariant with respect to unknown affine transformations of the feature space without knowing the number of clusters in advance. Specifically, our proposed method can iden...

Distributions associated with simultaneous multiple hypothesis testing

We develop the distribution for the number of hypotheses found to be statistically significant using the rule from Simes (Biometrika 73: 751–754, 1986) for controlling the family-wise error rate (FWER). We fin...

New families of bivariate copulas via unit Weibull distortion

This paper introduces a new family of bivariate copulas constructed using a unit Weibull distortion. Existing copulas play the role of the base or initial copulas that are transformed or distorted into a new f...

Generalized logistic distribution and its regression model

A new generalized asymmetric logistic distribution is defined. In some cases, existing three parameter distributions provide poor fit to heavy tailed data sets. The proposed new distribution consists of only t...

The spherical-Dirichlet distribution

Today, data mining and gene expressions are at the forefront of modern data analysis. Here we introduce a novel probability distribution that is applicable in these fields. This paper develops the proposed sph...

Item fit statistics for Rasch analysis: can we trust them?

To compare fit statistics for the Rasch model based on estimates of unconditional or conditional response probabilities.

Exact distributions of statistics for making inferences on mixed models under the default covariance structure

At this juncture when mixed models are heavily employed in applications ranging from clinical research to business analytics, the purpose of this article is to extend the exact distributional result of Wald (A...

A new discrete pareto type (IV) model: theory, properties and applications

A discrete analogue of a continuous distribution (especially in the univariate domain) is not new in the literature. The work of discretizing continuous distributions began with the paper by Nakagawa and Osaki (197...

Density deconvolution for generalized skew-symmetric distributions

The density deconvolution problem is considered for random variables assumed to belong to the generalized skew-symmetric (GSS) family of distributions. The approach is semiparametric in that the symmetric comp...

The unifed distribution

We introduce a new distribution with support on (0,1) called unifed. It can be used as the response distribution for a GLM and it is suitable for data aggregation. We make a comparison to the beta regression. ...

On Burr III Marshal Olkin family: development, properties, characterizations and applications

In this paper, a flexible family of distributions with unimodal, bimodal, increasing, increasing and decreasing, inverted bathtub and modified bathtub hazard rate called Burr III-Marshal Olkin-G (BIIIMO-G) fam...

The linearly decreasing stress Weibull (LDSWeibull): a new Weibull-like distribution

Motivated by an engineering pullout test applied to a steel strip embedded in earth, we show how the resulting linearly decreasing force leads naturally to a new distribution, if the force under constant stress i...

Meta analysis of binary data with excessive zeros in two-arm trials

We present a novel Bayesian approach to random effects meta analysis of binary data with excessive zeros in two-arm trials. We discuss the development of likelihood accounting for excessive zeros, the prior, a...

On (p_1, …, p_k)-spherical distributions

The class of (p_1, …, p_k)-spherical probability laws and a method of simulating random vectors following such distributions are introduced using a new stochastic vector representation. A dynamic geometric disintegra...

A new class of survival distribution for degradation processes subject to shocks

Many systems experience gradual degradation while simultaneously being exposed to a stream of random shocks of varying magnitudes that eventually cause failure when a shock exceeds the residual strength of the...

A new extended normal regression model: simulations and applications

Various applications in natural science require models more accurate than well-known distributions. In this context, several generators of distributions have been recently proposed. We introduce a new four-par...

Multiclass analysis and prediction with network structured covariates

Technological advances associated with data acquisition are leading to the production of complex structured data sets. The recent development on classification with multiclass responses makes it possible to in...

High-dimensional star-shaped distributions

Stochastic representations of star-shaped distributed random vectors having heavy or light tail density generating function g are studied for increasing dimensions along with corresponding geometric measure repre...

A unified complex noncentral Wishart type distribution inspired by massive MIMO systems

The eigenvalue distributions from a complex noncentral Wishart matrix S = X^H X have been the subject of interest in various real world applications, where X is assumed to be complex matrix variate normally distribute...

Particle swarm based algorithms for finding locally and Bayesian D-optimal designs

When a model-based approach is appropriate, an optimal design can guide how to collect data judiciously for making reliable inference at minimal cost. However, finding optimal designs for a statistical model w...

Admissible Bernoulli correlations

A multivariate symmetric Bernoulli distribution has marginals that are uniform over the pair {0,1}. Consider the problem of sampling from this distribution given a prescribed correlation between each pair of v...

On p-generalized elliptical random processes

We introduce rank-k-continuous axis-aligned p-generalized elliptically contoured distributions and study their properties such as stochastic representations, moments, and density-like representations. Applying th...

Parameters of stochastic models for electroencephalogram data as biomarkers for child’s neurodevelopment after cerebral malaria

The objective of this study was to test statistical features from the electroencephalogram (EEG) recordings as predictors of neurodevelopment and cognition of Ugandan children after coma due to cerebral malari...

A new generalization of generalized half-normal distribution: properties and regression models

In this paper, a new extension of the generalized half-normal distribution is introduced and studied. We assess the performance of the maximum likelihood estimators of the parameters of the new distribution vi...

Analytical properties of generalized Gaussian distributions

The family of Generalized Gaussian (GG) distributions has received considerable attention from the engineering community, due to the flexible parametric form of its probability density function, in modeling ma...

A new Weibull-X family of distributions: properties, characterizations and applications

We propose a new family of univariate distributions generated from the Weibull random variable, called a new Weibull-X family of distributions. Two special sub-models of the proposed family are presented and t...

The transmuted geometric-quadratic hazard rate distribution: development, properties, characterizations and applications

We propose a five parameter transmuted geometric quadratic hazard rate (TG-QHR) distribution derived from mixture of quadratic hazard rate (QHR), geometric and transmuted distributions via the application of t...

A nonparametric approach for quantile regression

Quantile regression estimates conditional quantiles and has wide applications in the real world. Estimating high conditional quantiles is an important problem. The regular quantile regression (QR) method often...

Mean and variance of ratios of proportions from categories of a multinomial distribution

Ratio distribution is a probability distribution representing the ratio of two random variables, each usually having a known distribution. Currently, there are results when the random variables in the ratio fo...

The power-Cauchy negative-binomial: properties and regression

We propose and study a new compounded model to extend the half-Cauchy and power-Cauchy distributions, which offers more flexibility in modeling lifetime data. The proposed model is analytically tractable and c...

Families of distributions arising from the quantile of generalized lambda distribution

In this paper, the class of T-R{generalized lambda} families of distributions based on the quantile of generalized lambda distribution has been proposed using the T-R{Y} framework. In the development of the T-R{

Risk ratios and Scanlan’s HRX

Risk ratios are distribution function tail ratios and are widely used in health disparities research. Let A and D denote advantaged and disadvantaged populations with cdfs F ...

Joint distribution of k-tuple statistics in zero-one sequences of Markov-dependent trials

We consider a sequence of n, n ≥ 3, zero (0) - one (1) Markov-dependent trials. We focus on k-tuples of 1s; i.e. runs of 1s of length at least equal to a fixed integer number k, 1 ≤ k ≤ n. The statistics denoting the n...

Quantile regression for overdispersed count data: a hierarchical method

Generalized Poisson regression is commonly applied to overdispersed count data, and focused on modelling the conditional mean of the response. However, conditional mean regression models may be sensitive to re...

Describing the Flexibility of the Generalized Gamma and Related Distributions

The generalized gamma (GG) distribution is a widely used, flexible tool for parametric survival analysis. Many alternatives and extensions to this family have been proposed. This paper characterizes the flexib...

ISSN: 2195-5832 (electronic)

How to Write Statistics Research Paper | Easy Guide

A statistics research paper is an academic document presenting original findings or analyses derived from the collection, organization, analysis, and interpretation of data. It addresses research questions or hypotheses within the field of statistics.

As a rule, college students get such papers assigned during a semester to assess their knowledge of statistics. However, any statistician can also write research papers and publish them in academic journals, thus developing and promoting this field.

Want to master the art of statistics research paper writing?

We’ve got expert tips from a professional research paper writing service on crafting such studies. In this article, you’ll find a step-by-step guide on writing a statistics research paper that your educators, colleagues, or clients will approve.

How to write a statistics research paper: Steps

Table of Contents

State the problem
Collect the data
Write an introductory paragraph
Craft an abstract
Describe your methodology
Present your findings: evaluate and illustrate
Revise and proofread

Research papers aren’t about describing the existing knowledge on the topic. You should state your intellectual concern with it, indicating why it’s worth studying. When choosing the problem you’ll research in the paper, emphasize its ongoing nature:

What have other researchers already studied about it? Cite at least one previous publication related to your research and provide your statistical motivation to continue researching the topic. (You’ll refer to those studies in footnotes or within the text of your paper.)

Once you have the topic (problem) to research in your paper, it’s time to collect sources you’ll use as evidence and references. For statistics papers, consider the following:

Published research from experts in Statistics (academic journals, newspapers, books, online publications, etc.)
Statistical data from reliable sources (Google’s Public Data and Scholar, FedStats, and others)
Your personal hypothesis, experiments, and info-gathering activities

The last one is a must-have! Your statistics research paper requires new information gathered by you as a researcher and not previously published anywhere. A large part of your research paper will be about the data collection methods you used to investigate the problem and reach the conclusions you’ll provide.

Some underestimate the introductory paragraphs of research papers, but they are wrong. The introduction is the first thing a reader sees to understand if your research is worth their attention and time. With that in mind, ensure your introductory paragraph is intriguing yet informative enough for the audience to continue reading.

Start with a writing hook, a sentence grabbing a reader’s attention. Also, an intro needs background information: your topic and the scientific motivation for the new research methods. (What’s wrong with existing ones? Or, what do they miss?) Finally, move on to your thesis statement: 1-2 sentences summarizing the primary idea behind your research paper.

An abstract is an overview of your statistics research paper where you establish notation and outline the methods and the results. Abstracts are integral for all academic studies and research, giving readers enough details to decide whether your paper is relevant to them.

What do you include in an abstract?

Introduce your topic and explain why it’s significant in your field. State the gap present in the research at the moment and reveal the aim of your paper. Then, briefly describe your research methods and approach, summarize your findings, and explain their contribution to the field.

The methods section is the most extensive one in your research paper. Here, you provide sufficient information about how you collected data for your research, what methodologies you used, and how you evaluated the results.

Be specific; describe everything so the audience can repeat your research (experiments) and reconstruct your results. It’s the value your paper brings to the academic community.
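One way to make a simulation-based result repeatable is to fix and report the random seed. The sketch below is only an illustration: the Exponential(1) population, the sample sizes, the seed value, and the function names are all assumptions, not part of any particular study.

```python
import random
import statistics


def simulate_sample_means(seed, n_samples=200, sample_size=30):
    """Draw repeated samples from an Exponential(1) population; return their means."""
    # A local generator means the seed alone determines every draw.
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


def describe(values):
    """Summary statistics a methods section might report."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
    }


SEED = 20240101  # hypothetical value; report the seed you actually used

first_run = describe(simulate_sample_means(SEED))
second_run = describe(simulate_sample_means(SEED))

print(first_run)
print("identical on rerun:", first_run == second_run)
```

Reporting the seed, the software version, and the exact sampling scheme alongside the results lets a reader rerun the analysis and obtain the same numbers.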

Further paragraphs of your research paper present your findings. Try to stick to one idea per paragraph to make it clear and easy for readers to consume.

Prepare and add supporting materials that will help you illustrate findings: graphs, diagrams, charts, tables, etc. You can use them in a paper’s body or add them as an appendix; these elements will support your points and serve as extra proof for your findings.
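As a rough sketch of how such a supporting table might be produced, the example below computes descriptive statistics per group and prints them as an aligned plain-text table. The group names and the numbers are made-up placeholders, and the helper functions are hypothetical; substitute your own data and layout.

```python
import statistics


def summary_rows(groups):
    """Return (name, n, mean, sd, median) for each group of observations."""
    rows = []
    for name, values in groups.items():
        rows.append((
            name,
            len(values),
            statistics.fmean(values),
            statistics.stdev(values),   # sample standard deviation
            statistics.median(values),
        ))
    return rows


def format_table(rows):
    """Render the summary rows as a fixed-width text table."""
    header = f"{'Group':<10}{'n':>4}{'Mean':>8}{'SD':>8}{'Median':>8}"
    lines = [header, "-" * len(header)]
    for name, n, mean, sd, median in rows:
        lines.append(f"{name:<10}{n:>4}{mean:>8.2f}{sd:>8.2f}{median:>8.2f}")
    return "\n".join(lines)


# Placeholder data for illustration only; not real measurements.
placeholder_groups = {
    "control": [4.1, 5.0, 4.7, 5.3, 4.9],
    "treated": [5.8, 6.1, 5.5, 6.4, 6.0],
}

rows = summary_rows(placeholder_groups)
table = format_table(rows)
print(table)
```

A table like this fits in the body when it supports a specific claim; longer versions are usually better placed in an appendix.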

Last but not least:

Write a concluding paragraph for your statistics research paper. Repeat your thesis, summarize your findings, and conclude whether they have proved or contradicted your initial theory (hypothesis). Also, you can make suggestions for further research in the same area.

Re-read your paper several times before publishing or submitting it for review. Ensure all the information is logical and coherent, all the terms are correct, and all the elements are present and accurately placed.

Also, proofread your final draft: Spelling, grammar, and punctuation mistakes are a no-no here! Re-check the list of references again; ensure you follow the required citation style and use the proper format.

So, now you know seven easy steps for writing a statistical research paper. Whether you’re a college student or a statistician willing to make a scientific contribution to a niche, follow them to craft a professionally structured academic document:

State a problem, choose methods of analyzing it, evaluate your findings, and illustrate them to engage the audience in discussion.

If you still need clarification or have questions about writing a statistics paper, don’t hesitate to ask for assistance. Professional writers with experience in statistics are ready to help you improve your writing skills.

COMMENTS

Research Papers / Publications
Department of Statistics and Data Science. Research Papers / Publications. Behrad Moniri, Seyed Hamed Hassani, Edgar Dobriban, ...
Statistics
Identification of CT radiomic features robust to acquisition and segmentation variations for improved prediction of radiotherapy-treated lung cancer patient recurrence. Thomas Louis, François ...
arXiv.org e-Print archive
arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv.
Introduction to Research Statistical Analysis: An Overview of the
Introduction. Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, while not diving too deep into any specific methodology.
Statistics
Statistics is the application of mathematical concepts to understanding and analysing large collections of data. A central tenet of statistics is to describe the variations in a data set or ...
The Beginner's Guide to Statistical Analysis
Table of contents. Step 1: Write your hypotheses and plan your research design. Step 2: Collect data from a sample. Step 3: Summarize your data with descriptive statistics. Step 4: Test hypotheses or make estimates with inferential statistics.
Statistics for Research Students
The textbook covers all necessary areas and topics for students who want to conduct research in statistics. It includes foundational concepts, application methods, and advanced statistical techniques relevant to research methodologies. read more. Reviewed by Zhuanzhuan Ma, Assistant Professor, University of Texas Rio Grande Valley on 3/7/24 ...
ResearchGate
Access 160+ million publications and connect with 25+ million researchers. Join for free and gain visibility by uploading your research.
Journal of Probability and Statistics
Journal of Probability and Statistics publishes papers on the theory and application of probability and statistics that consider new methods and approaches to their ... Most Cited; Research Article. Open access. Evaluate Group Sequential Design Sample Sizes for Reference‐Scaled Average Bioequivalence Based on Monte Carlo Simulations in Highly ...
(PDF) Data Science: the impact of statistics
In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and ...
Research in Statistics
Taylor & Francis are currently supporting a 100% APC discount for all authors. Research in Statistics is a broad open access journal publishing original research in all areas of statistics and probability. The journal focuses on broadening existing research fields, and in facilitating international collaboration, and is devoted to the international advancement of the theory and application of ...
Statistical Sources
A listing of databases free on the web for anyone. All of the databases listed below were selected by CSULB subject librarians. ... Includes detailed statistics on nearly every metropolitan area in the US. Based on 2000 Census data. Explore Census Data. Databases provided by the Bureau of the Census including: Annual Survey of Manufactures ...
Basic statistical tools in research and data analysis
Abstract. Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise ...
Statistics
Statistics is a leading international research journal that publishes high-quality research articles which develop new theory, methods and applications in any active field of statistics and statistical learning. Papers submitted for consideration should provide novel contributions to statistical theory, with rigorous mathematical proofs; or relevant statistical applications, with well ...
Journal of Applied Statistics
The Journal publishes original research papers, review articles, and short application notes. ... The Journal of Applied Statistics Best Paper Prize is awarded annually, as decided by the Editor-in-Chief with the support of the Associate Editors. The winning article receives a £500 prize, and their paper will be made free to view for the ...
Journals
Statistics in Biopharmaceutical Research. Papers in this journal discuss appropriate statistical methodology and information regarding the use of statistics in all phases of research, development, and practice in the pharmaceutical, biopharmaceutical, device, and diagnostics industries. ...
Articles
Several theoretical properties of the distribution are studied in detail including expressions for i... C. Satheesh Kumar and Subha R. Nair. Journal of Statistical Distributions and Applications 2021 8 :14. Research Published on: 12 December 2021. Full Text. PDF.
These are the statistics papers you just have to read
While none of these papers actually need to be read, I really think it might help statistics Ph.D. students to get a sense of the gap between research practice and statistical theory and the problems and efforts of communication between statisticians and applied scientists.
Statistics: Vol 58, No 2 (Current issue)
Explore the current issue of Statistics, Volume 58, Issue 2, 2024. ... Free Access. Two-step online estimation and inference for expected shortfall regression with streaming data ...

Research Papers / Publications

Mathematics

Computer Science

Quantitative Biology

Quantitative Finance

Electrical Engineering and Systems Science

About arXiv

Statistics articles from across Nature Portfolio

Latest Research and Reviews

Employees’ pro-environmental behavior in an organization: a case study in the UAE

The predictive capability of several anthropometric indices for identifying the risk of metabolic syndrome and its components among industrial workers

A scalable synergy-first backbone decomposition of higher-order structures in complex systems

A bayesian spatio-temporal dynamic analysis of food security in Africa

Research on the influencing factors of promoting flipped classroom teaching based on the integrated UTAUT model and learning engagement theory

Peak response regularization for localization

News and Comment

Efficient learning of many-body systems

Fudging the volcano-plot without dredging the data

Disentangling truth from bias in naturally occurring data

Sciama’s argument on life in a random universe and distinguishing apples from oranges

A method for generating constrained surrogate power laws

Connected climate tipping elements

The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Table of contents

Writing statistical hypotheses

Planning your research design

Measuring variables

Sampling for statistical analysis

Create an appropriate sampling procedure

Calculate sufficient sample size

Inspect your data

Calculate measures of central tendency

Calculate measures of variability

Hypothesis testing

Parametric tests

Statistical significance

Effect size

Decision errors

Frequentist versus Bayesian statistics

Free Databases (all subjects): Statistical Sources

Basic statistical tools in research and data analysis

S Bala Bhaskar

INTRODUCTION

Quantitative variables

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics

Measures of central tendency

Normal distribution or Gaussian distribution

Skewed distribution

Inferential statistics

PARAMETRIC AND NON-PARAMETRIC TESTS

Parametric tests

Non-parametric tests

Tests to analyse the categorical data

SOFTWARES AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Financial support and sponsorship

ASA Journals Online

Data Science in Science

Statistics and Public Policy
