ORIGINAL RESEARCH article

Statistical analysis of complex problem-solving process data: an event history analysis approach.

Yunxiao Chen*

  • 1 Department of Statistics, London School of Economics and Political Science, London, United Kingdom
  • 2 School of Statistics, University of Minnesota, Minneapolis, MN, United States
  • 3 Department of Statistics, Columbia University, New York, NY, United States

Complex problem-solving (CPS) ability has been recognized as a central 21st century skill. Individuals' processes of solving complex problems may contain substantial information about their CPS ability. In this paper, we consider the prediction of the duration and the final outcome (i.e., success/failure) of solving a complex problem during the task-completion process, by making use of process data recorded in computer log files. Solving this problem may help answer questions like “how much information about an individual's CPS ability is contained in the process data?,” “what CPS patterns will yield a higher chance of success?,” and “what CPS patterns predict the remaining time for task completion?” We propose an event history analysis model for this prediction problem. The trained prediction model may provide a better understanding of individuals' problem-solving patterns, which may eventually lead to good designs of automated interventions (e.g., providing hints) for the training of CPS ability. A real data example from the 2012 Programme for International Student Assessment (PISA) is provided for illustration.

1. Introduction

Complex problem-solving (CPS) ability has been recognized as a central 21st century skill of high importance for several outcomes including academic achievement ( Wüstenberg et al., 2012 ) and workplace performance ( Danner et al., 2011 ). It encompasses a set of higher-order thinking skills that require strategic planning, carrying out multi-step sequences of actions, reacting to a dynamically changing system, testing hypotheses, and, if necessary, adaptively coming up with new hypotheses. Thus, there is little doubt that an individual's problem-solving process data contain a substantial amount of information about his/her CPS ability and are therefore worth analyzing. Meaningful information extracted from CPS process data may lead to better understanding, measurement, and even training of individuals' CPS ability.

Problem-solving process data typically have a more complex structure than that of panel data, which are more commonly encountered in statistics. Specifically, individuals may take different strategies toward solving the same problem. Even for individuals who take the same strategy, their actions and the time stamps of those actions may differ substantially. Due to such heterogeneity and complexity, classical regression and multivariate data analysis methods cannot be straightforwardly applied to CPS process data.

Possibly due to the lack of suitable analytic tools, research on CPS process data is limited. Among the existing works, none took a prediction perspective. Specifically, Greiff et al. (2015) presented a case study, showcasing the strong association between a specific strategic behavior (identified by expert knowledge) in a CPS task from the 2012 Programme for International Student Assessment (PISA) and performance both in this specific task and in the overall PISA problem-solving score. He and von Davier (2015 , 2016) proposed an N-gram method from natural language processing for analyzing problem-solving items in technology-rich environments, focusing on identifying feature sequences that are important to task completion. Vista et al. (2017) developed methods for the visualization and exploratory analysis of students' behavioral pathways, aiming to detect action sequences that are potentially relevant for establishing particular paths as meaningful markers of complex behaviors. Halpin and De Boeck (2013) and Halpin et al. (2017) adopted a Hawkes process approach to analyzing collaborative problem-solving items, focusing on the psychological measurement of collaboration. Xu et al. (2018) proposed a latent class model that analyzes CPS patterns by classifying individuals into latent classes based on their problem-solving processes.

In this paper, we propose to analyze CPS process data from a prediction perspective. As suggested in Yarkoni and Westfall (2017) , an increased focus on prediction can ultimately lead us to greater understanding of human behavior. Specifically, we consider the simultaneous prediction of the duration and the final outcome (i.e., success/failure) of solving a complex problem based on CPS process data. Instead of a single prediction, we hope to predict at any time during the problem-solving process. Such a data-driven prediction model may bring us insights about individuals' CPS behavioral patterns. First, features that contribute most to the prediction may correspond to important strategic behaviors that are key to succeeding in a task. In this sense, the proposed method can be used as an exploratory data analysis tool for extracting important features from process data. Second, the prediction accuracy may also serve as a measure of the strength of the signal contained in process data that reflects one's CPS ability, and thus speaks to the reliability of CPS tasks from a prediction perspective. Third, for low-stakes assessments, the predicted chance of success may be used to give partial credit when scoring task takers. Fourth, speed is another important dimension of complex problem solving that is closely associated with the final outcome of task completion ( MacKay, 1982 ). The prediction of the duration throughout the problem-solving process may provide insights into the relationship between CPS behavioral patterns and CPS speed. Finally, the prediction model also enables us to design suitable interventions during individuals' problem-solving processes. For example, a hint may be provided when a student is predicted to have a high chance of failing after having made sufficient effort.

More precisely, we model the conditional distribution of duration time and final outcome given the event history up to any time point. This model can be viewed as a special event history analysis model, a general statistical framework for analyzing the expected duration of time until one or more events happen (see e.g., Allison, 2014 ). The proposed model can be regarded as an extension to the classical regression approach. The major difference is that the current model is specified over a continuous-time domain. It consists of a family of conditional models indexed by time, while the classical regression approach does not deal with continuous-time information. As a result, the proposed model supports prediction at any time during one's problem-solving process, while the classical regression approach does not. The proposed model is also related to, but substantially different from response time models (e.g., van der Linden, 2007 ) which have received much attention in psychometrics in recent years. Specifically, response time models model the joint distribution of response time and responses to test items, while the proposed model focuses on the conditional distribution of CPS duration and final outcome given the event history.

Although the proposed method learns regression-type models from data, it is worth emphasizing that we do not try to make statistical inference, such as testing whether a specific regression coefficient is significantly different from zero. Rather, the selection and interpretation of the model are mainly justified from a prediction perspective. This is because statistical inference tends to draw strong conclusions based on strong assumptions about the data generation mechanism. Due to the complexity of CPS process data, a statistical model may be severely misspecified, making valid statistical inference a big challenge. On the other hand, the prediction framework requires fewer assumptions and thus is more suitable for exploratory analysis. More precisely, the prediction framework admits a discrepancy between the underlying complex data generation mechanism and the prediction model ( Yarkoni and Westfall, 2017 ). A prediction model aims at achieving a balance between the bias due to this discrepancy and the variance due to a limited sample size. As a price, findings from the predictive framework are preliminary and only suggest hypotheses for future confirmatory studies.

The rest of the paper is organized as follows. In Section 2, we describe the structure of complex problem-solving process data and then motivate our research questions, using a CPS item from PISA 2012 as an example. In Section 3, we formulate the research questions under a statistical framework, propose a model, and then provide details of estimation and prediction. The introduced model is illustrated through an application to an example item from PISA 2012 in Section 4. We discuss limitations and future directions in Section 5.

2. Complex Problem-Solving Process Data

2.1. A Motivating Example

We use a specific CPS item, CLIMATE CONTROL (CC) 1 , to demonstrate the data structure and to motivate our research questions. It is part of a CPS unit in PISA 2012 that was designed under the “MicroDYN” framework ( Greiff et al., 2012 ; Wüstenberg et al., 2012 ), a framework for the development of small dynamic systems of causal relationships for assessing CPS.

In this item, students are instructed to manipulate the panel (i.e., to move the top, central, and bottom control sliders; left side of Figure 1A ) and to answer how the input variables (control sliders) are related to the output variables (temperature and humidity). Specifically, the initial position of each control slider is indicated by a triangle “▴.” The students can change the top, central, and bottom controls on the left of Figure 1 by using the sliders. By clicking “APPLY,” they will see the corresponding changes in temperature and humidity. After exploration, the students are asked to draw lines in a diagram ( Figure 1B ) to answer what each slider controls. The item is considered correctly answered if the diagram is correctly completed. To solve this item, students must experiment to determine which controls affect temperature and which affect humidity, and then represent the causal relations by drawing arrows between the three inputs (top, central, and bottom control sliders) and the two outputs (temperature and humidity).


Figure 1. (A) Simulation environment of CC item. (B) Answer diagram of CC item.

PISA 2012 collected students' problem-solving process data in computer log files, in the form of a sequence of time-stamped events. We illustrate the structure of the data in Table 1 and Figure 2 , where Table 1 tabulates a sequence of time-stamped events from one student and Figure 2 visualizes the corresponding event time points on a time line. According to the data, 14 events were recorded between time 0 (start) and 61.5 s (success). The first event, which happened at 29.5 s, was clicking “APPLY” after the top, central, and bottom controls were set at 2, 0, and 0, respectively. A sequence of actions followed the first event, and finally, at 58, 59.1, and 59.6 s, a final answer was correctly given using the diagram. It is worth clarifying that this log file does not record all the interactions between a student and the simulated system. That is, the status of the control sliders is recorded in the log file only when the “APPLY” button is clicked.


Table 1 . An example of computer log file data from CC item in PISA 2012.


Figure 2 . Visualization of the structure of process data from CC item in PISA 2012.

The process data for solving a CPS item typically have two components, knowledge acquisition and knowledge application. This CC item mainly focuses on the former, which includes learning the causal relationships between the inputs and the outputs and representing such relationships by drawing the diagram. Since the data on representing the causal relationships are relatively straightforward, in the rest of the paper we focus on the process data related to knowledge acquisition; that is, a student's problem-solving process refers to his/her process of exploring the air conditioner, excluding the actions involving the answer diagram.

Intuitively, students' problem-solving processes contain information about their complex problem-solving ability, whether in the context of the CC item or in a more general sense of dealing with complex tasks in practice. However, it remains a challenge to extract meaningful information from their process data, due to the complex data structure. In particular, the occurrences of events are heterogeneous (i.e., different people can have very different event histories) and unstructured (i.e., there is little restriction on the order and time of the occurrences). Different students tend to have different problem-solving trajectories, with different actions taken at different time points. Consequently, time series models, which are standard statistical tools for analyzing dynamic systems, are not suitable here.

2.2. Research Questions

We focus on two specific research questions. Consider an individual solving a complex problem. Given that the individual has spent t units of time and has not yet completed the task, we would like to ask the following two questions based on the information at time t : How much additional time does the individual need? And will the individual succeed or fail upon the time of task completion?

Suppose we index the individual by i and let T i be the total time of task completion and Y i be the final outcome. Moreover, we denote H i ( t ) = ( h i 1 ( t ) , ... , h i p ( t ) ) ⊤ as a p -vector function of time t , summarizing the event history of individual i from the beginning of the task to time t . Each component of H i ( t ) is a feature constructed from the event history up to time t . Taking the above CC item as an example, components of H i ( t ) may include the number of actions a student has taken, whether all three control sliders have been explored, and the frequency of using the reset button, up to time t . We refer to H i ( t ) as the event history process of individual i . The dimension p may be high, depending on the complexity of the log file.
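To make the construction of H i ( t ) concrete, the following sketch builds a small feature vector from a list of time-stamped events. It is a minimal illustration in Python: the action labels (e.g., "apply_top_only", "reset") and the particular features are hypothetical stand-ins for those discussed above, not the actual codes in the PISA log files.

```python
import numpy as np

def event_history_features(log, t):
    """Illustrative construction of H_i(t) from a list of (timestamp, action) pairs.

    The action labels are hypothetical encodings of the CC item log.
    """
    past = [action for (ts, action) in log if ts <= t]
    n_actions = len(past)
    simple = {a for a in past
              if a in ("apply_top_only", "apply_central_only", "apply_bottom_only")}
    return np.array([
        1.0,                              # intercept
        t, t ** 2, t ** 3,                # polynomial time effects
        float(len(simple) == 3),          # I_i(t): all three sliders explored by simple actions
        n_actions / t if t > 0 else 0.0,  # N_i(t)/t: speed of taking actions
        float("reset" in past),           # 1{R_i(t) > 0}: RESET button used at least once
    ])

# A toy log with three time-stamped actions
log = [(29.5, "apply_top_only"), (37.0, "apply_central_only"), (45.2, "reset")]
print(event_history_features(log, t=60.0))
```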

With the above notation, the two questions become the simultaneous prediction of T i and Y i based on H i ( t ). Throughout this paper, we focus on the analysis of data from a single CPS item. Extensions of the current framework to multiple-item analysis are discussed in Section 5.

3. Proposed Method

3.1. A Regression Model

We now propose a regression model to answer the two questions raised in Section 2.2. We specify the marginal conditional models of Y i and T i given H i ( t ) and T i > t , respectively. Specifically, we assume

P( Y i = 1 | H i ( t ), T i > t ) = Φ( b 11 h i 1 ( t ) + ⋯ + b 1 p h ip ( t )),     (1)

E( log( T i − t ) | H i ( t ), T i > t ) = b 21 h i 1 ( t ) + ⋯ + b 2 p h ip ( t ),     (2)

Var( log( T i − t ) | H i ( t ), T i > t ) = σ^2,     (3)

where Φ is the cumulative distribution function of a standard normal distribution. That is, Y i is assumed to marginally follow a probit regression model. In addition, only the conditional mean and variance are assumed for log( T i − t ). Our model parameters include the regression coefficients B = ( b jk ) 2 × p and the conditional variance σ 2 . Based on the above model specification, a pseudo-likelihood function will be derived in Section 3.3 for parameter estimation.

Although only marginal models are specified, we point out that the model specifications (1) through (3) impose quite strong assumptions. As a result, the model is unlikely to closely approximate the true data-generating mechanism, and thus some bias is likely to exist. On the other hand, however, it is a working model that leads to reasonable prediction and can be used as a benchmark model for this prediction problem in future investigations.

We further remark that the conditional variance of log( T i − t ) is time-invariant under the current specification, which can be further relaxed to be time-dependent. In addition, the regression model for response time is closely related to the log-normal model for response time analysis in psychometrics (e.g., van der Linden, 2007 ). The major difference is that the proposed model is not a measurement model disentangling item and person effects on T i and Y i .

3.2. Prediction

Under the model in Section 3.1, given the event history, we predict the final outcome based on the success probability Φ( b 11 h i 1 ( t ) + ⋯ + b 1 p h ip ( t )). In addition, based on the conditional mean of log( T i − t ), we predict the total time at time t by t + exp( b 21 h i 1 ( t ) + ⋯ + b 2 p h ip ( t )). Given estimates of B from training data, we can predict the problem-solving duration and final outcome at any t for an individual in the testing sample, throughout his/her entire problem-solving process.
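The two prediction rules translate directly into code. The short sketch below assumes that estimated coefficient vectors b1 and b2 (the two rows of B) and a feature vector h_t = H i ( t ) are available, for instance from the estimation procedure in Section 3.3; it is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def predict_at_time(h_t, t, b1, b2):
    """Predicted success probability and total duration at time t,
    given the feature vector h_t = H_i(t) and estimated coefficients b1, b2."""
    p_success = norm.cdf(b1 @ h_t)      # Phi(b_11 h_i1(t) + ... + b_1p h_ip(t))
    total_time = t + np.exp(b2 @ h_t)   # t + exp(b_21 h_i1(t) + ... + b_2p h_ip(t))
    return p_success, total_time
```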

3.3. Parameter Estimation

It remains to estimate the model parameters based on a training dataset. Let our data be (τ i , y i ) and { H i ( t ): t ≥ 0}, i = 1, …, N , where τ i and y i are realizations of T i and Y i , and { H i ( t ): t ≥ 0} is the entire event history.

We develop estimating equations based on a pseudo-likelihood function. Specifically, the conditional distribution of Y i given H i ( t ) and T i > t can be written as

P( Y i = y | H i ( t ), T i > t ) = Φ( b 1 ⊤ H i ( t ))^ y ( 1 − Φ( b 1 ⊤ H i ( t )) )^(1 − y ), y ∈ {0, 1},     (4)

where b 1 = ( b 11 , ... , b 1 p ) ⊤ . In addition, using the log-normal model as a working model for T i − t , the corresponding conditional density of T i given H i ( t ) and T i > t can be written as

f ( τ | H i ( t ), T i > t ) = 1 / ( (τ − t ) σ √(2π) ) exp( −( log(τ − t ) − b 2 ⊤ H i ( t ) )^2 / (2σ^2) ), τ > t ,

where b 2 = ( b 21 , ... , b 2 p ) ⊤ . The pseudo-likelihood is then written as

L ( B , σ) = ∏ i ∏ { j : t j < τ i } P( Y i = y i | H i ( t j ), T i > t j ) f ( τ i | H i ( t j ), T i > t j ),     (5)

where the outer product is over individuals i = 1, …, N , the inner product is over grid points t j satisfying t j < τ i , and t 1 , …, t J are J pre-specified grid points that spread out over the entire time spectrum. The choice of the grid points will be discussed in the sequel. By specifying the pseudo-likelihood based on this sequence of time points, prediction at different time points is taken into account in the estimation. We estimate the model parameters by maximizing the pseudo-likelihood function L ( B , σ).

In fact, (5) can be factorized into L ( B , σ) = L 1 ( b 1 ) L 2 ( b 2 , σ), where

L 1 ( b 1 ) = ∏ i ∏ { j : t j < τ i } Φ( b 1 ⊤ H i ( t j ))^ y i ( 1 − Φ( b 1 ⊤ H i ( t j )) )^(1 − y i ),

L 2 ( b 2 , σ) = ∏ i ∏ { j : t j < τ i } ( 1 / ( (τ i − t j ) σ ) ) φ( ( log(τ i − t j ) − b 2 ⊤ H i ( t j ) ) / σ ),

and φ denotes the density function of a standard normal distribution. Therefore, b 1 is estimated by maximizing L 1 ( b 1 ), which takes the form of a likelihood function for probit regression. Similarly, b 2 and σ are estimated by maximizing L 2 ( b 2 , σ), which is equivalent to solving the following estimating equations,

∑ i ∑ { j : t j < τ i } ( log(τ i − t j ) − b 2 ⊤ H i ( t j ) ) H i ( t j ) = 0,     (8)

σ^2 = (1 / n *) ∑ i ∑ { j : t j < τ i } ( log(τ i − t j ) − b 2 ⊤ H i ( t j ) )^2,     (9)

where n * = ∑ i ∑ j 1 { t j < τ i } is the total number of person-by-grid-point terms entering the pseudo-likelihood. The estimating equations (8) and (9) can also be derived directly from the conditional mean and variance specification of log( T i − t ). Solving these equations is equivalent to solving a linear regression problem, and thus is computationally easy.
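Because L 1 ( b 1 ) is a probit likelihood and (8)-(9) are least-squares equations, the estimation can be carried out with standard routines after pooling the person-by-grid-point observations with t j < τ i . The sketch below illustrates this using statsmodels; the function features(i, t), which should return H i ( t ), is assumed to be supplied by the user, and the code is an illustration rather than the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm

def fit_pseudo_likelihood(features, tau, y, grid):
    """Estimate b1 (probit part) and b2, sigma (log-normal part) by pooling
    person-by-grid-point observations with t_j < tau_i.

    `features(i, t)` is assumed to return the feature vector H_i(t);
    `tau` and `y` hold the observed durations and final outcomes."""
    X, outcome, log_resid_time = [], [], []
    for i in range(len(tau)):
        for t in grid:
            if t < tau[i]:
                X.append(features(i, t))
                outcome.append(y[i])
                log_resid_time.append(np.log(tau[i] - t))
    X = np.asarray(X)
    outcome = np.asarray(outcome)
    log_resid_time = np.asarray(log_resid_time)

    b1 = sm.Probit(outcome, X).fit(disp=0).params   # maximizes L1(b1)
    ols = sm.OLS(log_resid_time, X).fit()           # solves estimating equation (8)
    b2 = ols.params
    sigma = np.sqrt(np.mean(ols.resid ** 2))        # estimating equation (9)
    return b1, b2, sigma
```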

3.4. Some Remarks

We provide a few remarks. First, choosing suitable features to include in H i ( t ) is important. The inclusion of suitable features not only improves the prediction accuracy, but also facilitates the exploratory analysis and interpretation of how behavioral patterns affect CPS results. If substantive knowledge about a CPS task is available from cognitive theory, one may choose features that indicate different strategies toward solving the task. Otherwise, a data-driven approach may be taken. That is, one may select a model from a candidate list based on certain cross-validation criteria, where, if possible, all reasonable features should be considered as candidates. Even when a set of features has been suggested by cognitive theory, one can still take the data-driven approach to find additional features, which may lead to new findings.

Second, one possible extension of the proposed model is to allow the regression coefficients to depend on time, so that they become functions b jk ( t ); the current model, in which the coefficients are constant over time, can be regarded as a special case of this more general model. In particular, if b jk ( t ) varies substantially over time under the best predictive model, then applying the current model may yield a high bias. Specifically, in the current estimation procedure, a larger grid point tends to have a smaller sample size and thus contributes less to the pseudo-likelihood function. As a result, a larger bias may occur in the prediction at a larger time point. However, the estimation of time-dependent coefficients is non-trivial. In particular, constraints should be imposed on the functional form of b jk ( t ) to ensure a certain level of smoothness over time, so that b jk ( t ) can be accurately estimated using information from a finite number of time points. Otherwise, without any smoothness assumptions, predicting at any time during one's problem-solving process would involve an infinite number of parameters. Moreover, when a regression coefficient is time-dependent, its interpretation becomes more difficult, especially if its sign changes over time.

Third, we remark on the selection of grid points in the estimation procedure. Our model is specified in a continuous-time domain that supports prediction at any time point in a continuum during an individual's problem-solving process. The use of discretized grid points is a way to approximate the continuous-time system, so that estimating equations can be written down. In practice, we suggest placing the grid points at quantiles of the empirical distribution of duration in the training set. See the analysis in Section 4 for an illustration. The number of grid points may be further selected by cross-validation. We also point out that prediction can be made at any time point on the continuum, not limited to the grid points used for parameter estimation.
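As a concrete illustration of this suggestion, the grid can be read off the empirical quantiles of the training durations. The durations below are simulated placeholders; with real data, tau_train would hold the observed durations in the training set.

```python
import numpy as np

rng = np.random.default_rng(0)
tau_train = rng.lognormal(mean=4.8, sigma=0.45, size=13498)  # placeholder durations (seconds)

# Place J = 9 grid points at the 10%, 20%, ..., 90% quantiles of the duration distribution.
grid = np.quantile(tau_train, np.arange(0.1, 1.0, 0.1))
print(np.round(grid, 1))
```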

4. An Example from PISA 2012

4.1. Background

In what follows, we illustrate the proposed method via an application to the above CC item 2 . This item was also analyzed in Greiff et al. (2015) and Xu et al. (2018) . The dataset was extracted and cleaned from the entire released dataset of PISA 2012. It contains the problem-solving processes of 16,872 15-year-old students from 42 countries and economies. Among these students, 54.5% answered correctly. On average, each student took 129.9 s and 17 actions to solve the problem. Histograms of the students' problem-solving duration and number of actions are presented in Figure 3 .


Figure 3. (A) Histogram of problem-solving duration of the CC item. (B) Histogram of the number of actions for solving the CC item.

4.2. Analyses

The entire dataset was randomly split into training and testing sets, where the training set contains data from 13,498 students and the testing set contains data from 3,374 students. A predictive model was built solely based on the training set and then its performance was evaluated based on the testing set. We used J = 9 grid points for the parameter estimation, with t 1 through t 9 specified to be 64, 81, 94, 106, 118, 132, 149, 170, and 208 s, respectively, which are the 10% through 90% quantiles of the empirical distribution of duration. As discussed earlier, the number of grid points and their locations may be further engineered by cross validation.

4.2.1. Model Selection

We first built a model based on the training data, using a data-driven stepwise forward selection procedure. In each step, we added the feature to H i ( t ) that led to the maximum increase in a cross-validated log-pseudo-likelihood, calculated based on five-fold cross-validation. We stopped adding features to H i ( t ) when the cross-validated log-pseudo-likelihood stopped increasing. The order in which the features are added may serve as a measure of their contribution to predicting the CPS duration and final outcome.
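A rough sketch of such a forward-selection loop is given below. It assumes a user-supplied function cv_log_pseudo_likelihood(selected) that fits the model of Section 3.3 on each training fold for the given feature set and returns the five-fold cross-validated log-pseudo-likelihood; that function is not shown here.

```python
def forward_select(candidates, initial, cv_log_pseudo_likelihood):
    """Greedy forward selection on the cross-validated log-pseudo-likelihood.

    `candidates` lists the candidate features (Table 2); `initial` holds the
    always-included features 1, t, t^2, t^3."""
    selected = list(initial)
    best = cv_log_pseudo_likelihood(selected)
    remaining = list(candidates)
    while remaining:
        scores = {f: cv_log_pseudo_likelihood(selected + [f]) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best:   # stop when the criterion no longer increases
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected
```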

The candidate features considered for model selection are listed in Table 2 . These candidate features were chosen to reflect students' CPS behavioral patterns from different aspects. In what follows, we discuss some of them. For example, the feature I i ( t ) indicates whether or not all three control sliders have been explored by simple actions (i.e., moving one control slider at a time) up to time t . That is, I i ( t ) = 1 means that the vary-one-thing-at-a-time (VOTAT) strategy ( Greiff et al., 2015 ) has been taken. According to the design of the CC item, the VOTAT strategy is expected to be a strong predictor of task success. In addition, the feature N i ( t )/ t records a student's average number of actions per unit time. It may serve as a measure of the student's speed of taking actions. In experimental psychology, response time, or equivalently speed, has been a central source for inferences about the organization and structure of cognitive processes (e.g., Luce, 1986 ), and in educational psychology, joint analysis of speed and accuracy of item responses has also received much attention in recent years (e.g., van der Linden, 2007 ; Klein Entink et al., 2009 ). However, little is known about the role of speed in CPS tasks. The current analysis may provide some initial results on the relationship between a student's speed and his/her CPS performance. Moreover, the features defined by the repetition of previously taken actions may reflect students' need to verify hypotheses derived from previous actions, or may be related to students' attention if the same actions are repeated many times. We also include 1, t , t 2 , and t 3 in H i ( t ) as the initial set of features to capture the time effect. For simplicity, country information is not taken into account in the current analysis.


Table 2 . The list of candidate features to be incorporated into the model.

Our results on model selection are summarized in Figure 4 and Table 3 . The cross-validated log-pseudo-likelihood stopped increasing after 11 steps, resulting in a final model with 15 components in H i ( t ). As we can see from Figure 4 , the increase in the cross-validated log-pseudo-likelihood is mainly contributed by the inclusion of features in the first six steps, after which the increment is quite marginal. The first, second, and sixth features entering the model are all related to taking simple actions, a strategy known to be important for this task (e.g., Greiff et al., 2015 ). In particular, the first feature selected is I i ( t ), which confirms the strong effect of the VOTAT strategy. In addition, the third and fourth features are both based on N i ( t ), the number of actions taken before time t . Roughly, the feature 1 { N i ( t )>0} reflects initial planning behavior ( Eichmann et al., 2019 ); it tends to measure a student's speed of reading the instructions of the item. As discussed earlier, the feature N i ( t )/ t measures students' speed of taking actions. Finally, the fifth feature is related to the use of the RESET button.


Figure 4 . The increase in the cross-validated log-pseudo-likelihood based on a stepwise forward selection procedure. (A–C) plot the cross-validated log-pseudo-likelihood, corresponding to L ( B , σ), L 1 ( b 1 ), L 2 ( b 2 , σ), respectively.


Table 3 . Results on model selection based on a stepwise forward selection procedure.

4.2.2. Prediction Performance on Testing Set

We now look at the prediction performance of the above model on the testing set. The prediction performance was evaluated at a larger set of time points than the estimation grid, ranging from 19 to 281 s. Instead of reporting based on the pseudo-likelihood function, we adopted two measures that are more straightforward. Specifically, we measured the prediction of the final outcome by the Area Under the Curve (AUC) of the predicted Receiver Operating Characteristic (ROC) curve. The value of AUC is between 0 and 1. A larger AUC value indicates better prediction of the binary final outcome, with AUC = 1 indicating perfect prediction. In addition, at each time point t , we measured the prediction of duration by the root mean squared error (RMSE), defined as

RMSE( t ) = √( (1/ n ) ∑ i = N +1, …, N + n ( τ i − τ̂ i ( t ) )^2 ),

where τ i , i = N + 1, …, N + n , denotes the durations of students in the testing set, and τ̂ i ( t ) denotes the prediction based on information up to time t according to the trained model.
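The two measures can be computed at each evaluation time t as sketched below, using scikit-learn's roc_auc_score for the AUC. The arrays are assumed to hold the testing-set outcomes, observed durations, and the model's predictions at time t, and the computation is restricted to students who have not yet finished at time t, which is one plausible reading of the definitions above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_at_time(t, y_test, tau_test, p_hat, tau_hat):
    """AUC for the final outcome and RMSE for the duration at time t,
    over testing-set students who are still working at time t.

    `p_hat` and `tau_hat` are the predicted success probabilities and total
    durations at time t (e.g., from predict_at_time above)."""
    active = tau_test > t
    auc = roc_auc_score(y_test[active], p_hat[active])
    rmse = np.sqrt(np.mean((tau_test[active] - tau_hat[active]) ** 2))
    return auc, rmse
```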

Results are presented in Figure 5 , which shows the testing AUC for the final outcome and the testing RMSE for the duration. In particular, results based on the model selected by cross validation ( p = 15) and the initial model ( p = 4, containing the initial covariates 1, t , t 2 , and t 3 ) are compared. First, based on the selected model, the AUC is never above 0.8 and the RMSE is between 53 and 64 s, indicating a low signal-to-noise ratio. Second, the students' event history does improve the prediction of the final outcome and duration over the initial model. Specifically, since the initial model does not take into account the event history, it predicts the students with duration longer than t to have the same success probability. Consequently, the test AUC is 0.5 at each value of t , which is always worse than the performance of the selected model. Moreover, the selected model always outperforms the initial model in terms of the prediction of duration. Third, the AUC for the prediction of the final outcome is low when t is small. It keeps increasing as time goes on and fluctuates around 0.72 after about 120 s.


Figure 5 . A comparison of prediction accuracy between the model selected by cross validation and a baseline model without using individual specific event history.

4.2.3. Interpretation of Parameter Estimates

To gain more insight into how the event history affects the final outcome and duration, we further look at the parameter estimates. We focus on a model whose event history H i ( t ) includes the initial features and the top six features selected by cross-validation. This model has similar prediction accuracy to the selected model according to the cross-validation results in Figure 4 , but contains fewer features in the event history and thus is easier to interpret. Moreover, the parameter estimates under this model are close to those under the cross-validation-selected model, and the signs of the regression coefficients remain the same.

The estimated regression coefficients are presented in Table 4 . First, the first selected feature, I i ( t ), which indicates whether all three control sliders have been explored via simple actions, has a positive regression coefficient on the final outcome and a negative coefficient on duration. This means that, holding the other features fixed, a student who has taken the VOTAT strategy tends to be more likely to give a correct answer and to complete the task in a shorter period of time. This confirms the strong effect of the VOTAT strategy in solving the current task.


Table 4 . Estimated regression coefficients for a model for which the event history process contains the initial features based on polynomials of t and the top six features selected by cross validation.

Second, besides I i ( t ), there are two features related to taking simple actions, 1 { S i ( t )>0} and S i ( t )/ t , which are the indicator of taking at least one simple action and the frequency of taking simple actions, respectively. Both features have positive regression coefficients on the final outcome, implying that larger values of both features lead to a higher predicted success rate. In addition, 1 { S i ( t )>0} has a negative coefficient on duration and S i ( t )/ t has a positive one. Under the estimated model, the overall effect of simple actions on duration is b̂ 25 I i ( t ) + b̂ 26 1 { S i ( t )>0} + b̂ 2,10 S i ( t )/ t , which is negative for most students. It implies that, overall, taking simple actions leads to a shorter predicted duration. However, once all three types of simple actions have been taken, a higher frequency of taking simple actions leads to a weaker, but still negative, simple-action effect on the duration.

Third, as discussed earlier, 1 { N i ( t )>0} tends to measure the student's speed of reading the instructions of the task, and N i ( t )/ t can be regarded as a measure of the student's speed of taking actions. According to the estimated regression coefficients, the data suggest that a student who reads and acts faster tends to complete the task in a shorter period of time but with lower accuracy. Similar results have been seen in the literature on response time analysis in educational psychology (e.g., Klein Entink et al., 2009 ; Fox and Marianti, 2016 ; Zhan et al., 2018 ), where the speed of item responses was found to be negatively correlated with accuracy. In particular, Zhan et al. (2018) found a moderate negative correlation between students' general mathematics ability and speed under a psychometric model for PISA 2012 computer-based mathematics data.

Finally, 1 { R i ( t )>0} , the indicator of using the RESET button, has positive regression coefficients on both the final outcome and duration. This implies that use of the RESET button leads to a higher predicted success probability and a longer predicted duration, with the other features held fixed. The connection between the use of the RESET button and the underlying cognitive process of complex problem solving, if it exists, still remains to be investigated.

5. Discussions

5.1. Summary

As an early step toward understanding individuals' complex problem-solving processes, we proposed an event history analysis method for the prediction of the duration and the final outcome of solving a complex problem based on process data. This approach is able to predict at any time t during an individual's problem-solving process, which may be useful in dynamic assessment/learning systems (e.g., in a game-based assessment system). An illustrative example is provided that is based on a CPS item from PISA 2012.

5.2. Inference, Prediction, and Interpretability

As articulated previously, this paper focuses on a prediction problem, rather than a statistical inference problem. Compared with a prediction framework, statistical inference tends to draw stronger conclusions under stronger assumptions about the data generation mechanism. Unfortunately, due to the complexity of CPS process data, such assumptions are not only hardly satisfied, but also difficult to verify. On the other hand, a prediction framework requires fewer assumptions and thus is more suitable for exploratory analysis. As a price, the findings from the predictive framework are preliminary and can only be used to generate hypotheses for future studies.

It may be useful to provide uncertainty measures for the prediction performance and for the parameter estimates, where the former indicates the replicability of the prediction performance and the latter reflects the stability of the prediction model. In particular, patterns from a prediction model with low replicability and low stability should not be over-interpreted. Such uncertainty measures may be obtained from cross-validation and bootstrapping (see Chapter 7, Friedman et al., 2001 ).

It is also worth distinguishing prediction methods based on a simple model, like the one proposed above, from those based on black-box machine learning algorithms (e.g., random forests). Decisions based on black-box algorithms can be very difficult for humans to understand and thus do not provide insights about the data, even though they may have high prediction accuracy. On the other hand, a simple model can be regarded as a data dimension reduction tool that extracts interpretable information from data, which may facilitate our understanding of complex problem solving.

5.3. Extending the Current Model

The proposed model can be extended along multiple directions. First, as discussed earlier, we may extend the model by allowing the regression coefficients b jk to be time-dependent. In that case, nonparametric estimation methods (e.g., splines) need to be developed for parameter estimation. In fact, the idea of time-varying coefficients has been intensively investigated in the event history analysis literature (e.g., Fan et al., 1997 ). This extension will be useful if the effects of the features in H i ( t ) change substantially over time.

Second, when the dimension p of H i ( t ) is high, better interpretability and higher prediction power may be achieved by using Lasso-type sparse estimators (see e.g., Chapter 3, Friedman et al., 2001 ). These estimators perform simultaneous feature selection and regularization in order to enhance the prediction accuracy and interpretability.
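As a rough illustration of this extension, an L1-penalized fit of the outcome part could look like the sketch below. It substitutes scikit-learn's penalized logistic regression (a logit rather than probit link, since scikit-learn does not offer a penalized probit) and uses simulated placeholder data in place of the pooled person-by-grid-point design of Section 3.3.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data standing in for the pooled person-by-grid-point design matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                                  # 20 candidate features
y = (X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=500) > 0).astype(int)

# The L1 penalty shrinks coefficients of irrelevant features exactly to zero,
# performing feature selection and regularization simultaneously.
lasso = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000).fit(X, y)
print(np.flatnonzero(lasso.coef_))   # indices of the features kept in the model
```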

Finally, outliers are likely to occur in the data due to the abnormal behavioral patterns of a small proportion of people. A better treatment of outliers will lead to better prediction performance. Thus, a more robust objective function will be developed for parameter estimation, by borrowing ideas from the literature of robust statistics (see e.g., Huber and Ronchetti, 2009 ).

5.4. Multiple-Task Analysis

The current analysis focuses on analyzing data from a single task. To study individuals' CPS ability, it may be of more interest to analyze multiple CPS tasks simultaneously and to investigate how an individual's process data from one or multiple tasks predict his/her performance on the other tasks. Generally speaking, one's CPS ability may be better measured by the information in the process data that is generalizable across a representative set of CPS tasks than only his/her final outcomes on these tasks. In this sense, this cross-task prediction problem is closely related to the measurement of CPS ability. This problem is also worth future investigation.

Author Contributions

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

Funding

This research was funded by the NAEd/Spencer postdoctoral fellowship, NSF grant DMS-1712657, NSF grant SES-1826540, NSF grant IIS-1633360, and NIH grant R01GM047845.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1. ^ The item can be found on the OECD website ( http://www.oecd.org/pisa/test-2012/testquestions/question3/ )

2. ^ The log file data and code book for the CC item can be found online: http://www.oecd.org/pisa/pisaproducts/database-cbapisa2012.htm .

Allison, P. D. (2014). Event history analysis: Regression for longitudinal event data . London: Sage.


Danner, D., Hagemann, D., Schankin, A., Hager, M., and Funke, J. (2011). Beyond IQ: a latent state-trait analysis of general intelligence, dynamic decision making, and implicit learning. Intelligence 39, 323–334. doi: 10.1016/j.intell.2011.06.004


Eichmann, B., Goldhammer, F., Greiff, S., Pucite, L., and Naumann, J. (2019). The role of planning in complex problem solving. Comput. Educ . 128, 1–12. doi: 10.1016/j.compedu.2018.08.004

Fan, J., Gijbels, I., and King, M. (1997). Local likelihood and local partial likelihood in hazard regression. Ann. Statist . 25, 1661–1690. doi: 10.1214/aos/1031594736

Fox, J.-P., and Marianti, S. (2016). Joint modeling of ability and differential speed using responses and response times. Multivar. Behav. Res . 51, 540–553. doi: 10.1080/00273171.2016.1171128


Friedman, J., Hastie, T., and Tibshirani, R. (2001). The Elements of Statistical Learning . New York, NY: Springer.

Greiff, S., Wüstenberg, S., and Avvisati, F. (2015). Computer-generated log-file analyses as a window into students' minds? A showcase study based on the PISA 2012 assessment of problem solving. Comput. Educ . 91, 92–105. doi: 10.1016/j.compedu.2015.10.018

Greiff, S., Wüstenberg, S., and Funke, J. (2012). Dynamic problem solving: a new assessment perspective. Appl. Psychol. Measur . 36, 189–213. doi: 10.1177/0146621612439620

Halpin, P. F., and De Boeck, P. (2013). Modelling dyadic interaction with Hawkes processes. Psychometrika 78, 793–814. doi: 10.1007/s11336-013-9329-1

Halpin, P. F., von Davier, A. A., Hao, J., and Liu, L. (2017). Measuring student engagement during collaboration. J. Educ. Measur . 54, 70–84. doi: 10.1111/jedm.12133

He, Q., and von Davier, M. (2015). “Identifying feature sequences from process data in problem-solving items with N-grams,” in Quantitative Psychology Research , eds L. van der Ark, D. Bolt, W. Wang, J. Douglas, and M. Wiberg, (New York, NY: Springer), 173–190.

He, Q., and von Davier, M. (2016). “Analyzing process data from problem-solving items with n-grams: insights from a computer-based large-scale assessment,” in Handbook of Research on Technology Tools for Real-World Skill Development , eds Y. Rosen, S. Ferrara, and M. Mosharraf (Hershey, PA: IGI Global), 750–777.

Huber, P. J., and Ronchetti, E. (2009). Robust Statistics . Hoboken, NJ: John Wiley & Sons.

Klein Entink, R. H., Kuhn, J.-T., Hornke, L. F., and Fox, J.-P. (2009). Evaluating cognitive theory: A joint modeling approach using responses and response times. Psychol. Methods 14, 54–75. doi: 10.1037/a0014877

Luce, R. D. (1986). Response Times: Their Role in Inferring Elementary Mental Organization . New York, NY: Oxford University Press.

MacKay, D. G. (1982). The problems of flexibility, fluency, and speed–accuracy trade-off in skilled behavior. Psychol. Rev . 89, 483–506. doi: 10.1037/0033-295X.89.5.483

van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika 72, 287–308. doi: 10.1007/s11336-006-1478-z

Vista, A., Care, E., and Awwal, N. (2017). Visualising and examining sequential actions as behavioural paths that can be interpreted as markers of complex behaviours. Comput. Hum. Behav . 76, 656–671. doi: 10.1016/j.chb.2017.01.027

Wüstenberg, S., Greiff, S., and Funke, J. (2012). Complex problem solving–More than reasoning? Intelligence 40, 1–14. doi: 10.1016/j.intell.2011.11.003

Xu, H., Fang, G., Chen, Y., Liu, J., and Ying, Z. (2018). Latent class analysis of recurrent events in problem-solving items. Appl. Psychol. Measur . 42, 478–498. doi: 10.1177/0146621617748325

Yarkoni, T., and Westfall, J. (2017). Choosing prediction over explanation in psychology: lessons from machine learning. Perspect. Psychol. Sci . 12, 1100–1122. doi: 10.1177/1745691617693393

Zhan, P., Jiao, H., and Liao, D. (2018). Cognitive diagnosis modelling incorporating item response times. Br. J. Math. Statist. Psychol . 71, 262–286. doi: 10.1111/bmsp.12114

Keywords: process data, complex problem solving, PISA data, response time, event history analysis

Citation: Chen Y, Li X, Liu J and Ying Z (2019) Statistical Analysis of Complex Problem-Solving Process Data: An Event History Analysis Approach. Front. Psychol . 10:486. doi: 10.3389/fpsyg.2019.00486

Received: 31 August 2018; Accepted: 19 February 2019; Published: 18 March 2019.


Copyright © 2019 Chen, Li, Liu and Ying. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yunxiao Chen, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


An IERI – International Educational Research Institute Journal

  • Open access
  • Published: 13 November 2023

Behavioral patterns in collaborative problem solving: a latent profile analysis based on response times and actions in PISA 2015

  • Areum Han   ORCID: orcid.org/0000-0001-6974-521X 1 ,
  • Florian Krieger   ORCID: orcid.org/0000-0001-9981-8432 2 ,
  • Francesca Borgonovi   ORCID: orcid.org/0000-0002-6759-4515 3 &
  • Samuel Greiff   ORCID: orcid.org/0000-0003-2900-3734 1  

Large-scale Assessments in Education, volume 11, Article number: 35 (2023)


Process data are becoming more and more popular in education research. In the field of computer-based assessments of collaborative problem solving (ColPS), process data have been used to identify students’ test-taking strategies while working on the assessment, and such data can be used to complement data collected on accuracy and overall performance. Such information can be used to understand, for example, whether students are able to use a range of styles and strategies to solve different problems, given evidence that such cognitive flexibility may be important in labor markets and societies. In addition, process information might help researchers better identify the determinants of poor performance and interventions that can help students succeed. However, this line of research, particularly research that uses these data to profile students, is still in its infancy and has mostly been centered on small- to medium-scale collaboration settings between people (i.e., the human-to-human approach). There are only a few studies involving large-scale assessments of ColPS between a respondent and computer agents (i.e., the human-to-agent approach), where problem spaces are more standardized and fewer biases and confounds exist. In this study, we investigated students’ ColPS behavioral patterns using latent profile analyses (LPA) based on two types of process data (i.e., response times and the number of actions) collected from the Program for International Student Assessment (PISA) 2015 ColPS assessment, a large-scale international assessment of the human-to-agent approach. Analyses were conducted on test-takers who: (a) were administered the assessment in English and (b) were assigned the Xandar unit at the beginning of the test. The total sample size was N  = 2,520. Analyses revealed two profiles (i.e., Profile 1 [95%] vs. Profile 2 [5%]) showing different behavioral characteristics across the four parts of the assessment unit. Significant differences were also found in overall performance between the profiles.

Collaborative problem-solving (ColPS) skills are considered crucial 21st century skills (Graesser et al., 2018 ; Greiff & Borgonovi, 2022 ). They are a combination of cognitive and social skill sets (Organization for Economic Co-operation and Development [OECD], 2017a ), involving “an anchoring skill—a skill upon which other skills are built” (Popov et al., 2019 , p. 100). Thus, it makes sense that the importance of ColPS has been continually emphasized in research and policy spheres. Modern workplaces and societies require individuals to be able to work in teams to solve ill-structured problems, so having a sufficient level of these skills and the ability to execute them effectively is expected and required in many contexts of people’s lives (Gottschling et al., 2022 ; Rosen & Tager, 2013 , as cited in Herborn et al., 2017 ; Sun et al., 2022 ). Consequently, interest in research and policies on ColPS has grown in the past few years.

In 2015, the Program for International Student Assessment (PISA), managed by the OECD, administered an additional, computer-based assessment of ColPS alongside the core assessment domains of mathematics, reading, and science. The PISA 2015 ColPS assessment was administered in 52 education systems, targeting 15-year-old students (OECD, 2017a , 2017b ). It has provided a substantial body of theory and evidence related to computer-based assessments of these skills using the human-to-agent approach (i.e., H-A approach), in which test-takers collaborate with computer agents to tackle simulated problems. A great deal of subsequent theoretical and empirical research on ColPS has followed, drawing on the established framework of the PISA 2015 ColPS assessment and the data that were generated (e.g., Chang et al., 2017 ; Child & Shaw, 2019 ; Graesser et al., 2018 ; Herborn et al., 2017 ; Kang et al., 2019 ; Rojas et al., 2021 ; Swiecki et al., 2020 ; Tang et al., 2021 ; Wu et al., 2022 ).

Despite a growing body of research on ColPS, an unexplained aspect of ColPS revolves around the question, “What particular [ColPS] behaviors give rise to successful problem-solving outcomes?” (Sun et al., 2022 , p. 1). To address this question, a few studies have used students’ process data (e.g., response times) and specifically attempted to profile students on the basis of these data to investigate behavioral patterns that go beyond performance. Indeed, analyzing test-takers’ process data makes it possible to understand the characteristics of performance in depth, for instance, how 15-year-old students interacted in problem spaces, such as giving incorrect responses despite overall effective strategies or correct responses that relied on guessing (He et al., 2022 ; Teig et al., 2020 ). However, such studies are still at an embryonic stage and have mostly revolved around relatively small- to medium-scale assessments with the human-to-human approach (i.e., H-H approach), which entails naturalistic collaboration between people (e.g., Andrews-Todd et al., 2018 ; Dowell et al., 2018 ; Han & Wilson, 2022 ; Hao & Mislevy, 2019 ; Hu & Chen, 2022 ). Little research has been carried out on the process data from large-scale assessments that have used the H-A approach, such as the one employed in PISA 2015.

Therefore, in this research, we aimed to investigate test-takers’ profiles to address the aforementioned question about the behaviors that lead to successful collaborative problem solving. To do so, we conducted an exploratory latent profile analysis (LPA), a profiling methodology that is based on the two types of process data collected in PISA 2015: (a) response time (i.e., the sum of “the time spent on the last visit to an item” per part; OECD, 2019 , p. 3) and (b) the number of actions (e.g., “posting a chat log” or “conducting a search on a map tool”; De Boeck & Scalise, 2019 , p. 1). As described in the previous literature, PISA 2015 has several advantages, including automated scoring and easier and more valid comparisons in standardized settings, although it simultaneously has drawbacks (e.g., it is limited in its ability to deliver an authentic collaboration experience; Han et al., 2023 ; Siddiq & Scherer, 2017 ). It should be noted that PISA 2015 is just one of many (large-scale) H-A assessments on ColPS. Thus, there will be myriad possible ways to find behavioral patterns. As a steppingstone, we hope the results of this study will be helpful for clarifying the behaviors of (un)successful participants in ColPS and will thus be conducive to the development of appropriate interventions (Greiff et al., 2018 ; Hao & Mislevy, 2019 ; Hickendorff et al., 2018 ; Teig et al., 2020 ). Furthermore, as we identified subgroups on the basis of the process data, the subgroups will be used to design better task situations and assessment tools in terms of validity and statistical scoring rules in the future (AERA, APA, & NCME, 2014 ; Goldhammer et al., 2020 , 2021 ; Herborn et al., 2017 ; Hubley & Zumbo, 2017 ; Li et al., 2017 ; Maddox, 2023 ; von Davier & Halpin, 2013 ).
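As a rough, purely illustrative sketch of the type of analysis described above, a latent profile analysis on standardized response-time and action indicators can be approximated with a Gaussian mixture model, with the number of profiles chosen by BIC. The data below are simulated placeholders, and this is not the software or model specification used in the study.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder matrix: per-student response time and number of actions for the
# four parts of the Xandar unit (8 indicators), standardized before fitting.
rng = np.random.default_rng(1)
X = rng.normal(size=(2520, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Fit 1- to 5-profile solutions and compare them by BIC (lower is better).
fits = {k: GaussianMixture(n_components=k, covariance_type="diag", random_state=0).fit(X)
        for k in range(1, 6)}
bic = {k: model.bic(X) for k, model in fits.items()}
best_k = min(bic, key=bic.get)
profiles = fits[best_k].predict(X)                     # profile membership per student
print(best_k, np.bincount(profiles) / len(profiles))   # profile sizes as proportions
```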

Literature review

The ColPS assessment in PISA 2015 and the Xandar unit

ColPS in PISA 2015 is defined as “the capacity of an individual to effectively engage in a process whereby two or more agents attempt to solve a problem by sharing the understanding and effort required to come to a solution and pooling their knowledge, skills and efforts to reach that solution” (OECD, 2017a , p. 6). To design and implement the assessment, the OECD defined a matrix of four individual problem-solving processes and three collaboration processes, for a total of 12 different skills (OECD, 2017a ; see Fig.  1 ). The four individual problem-solving processes came from PISA 2012 and entail (a) Exploring and understanding, (b) Representing and formulating, (c) Planning and executing, and (d) Monitoring and reflecting, whereas the three collaborative competencies are (a) Establishing and maintaining shared understanding, (b) Taking appropriate action to solve the problem, and (c) Establishing and maintaining team organization.

Figure 1. The overall composition of the Xandar unit and a general guideline of weighting. Note. The number in Figure 1(a), highlighted in grey and italics, indicates the item number of the Xandar unit corresponding to each subskill. The item difficulty values in Figure 1(b) are reported on the PISA scale. The framework of Figure 1 is based on the OECD ( 2016 , 2017b ). Adapted from the PISA 2015 collaborative problem-solving framework, by the OECD, 2017 ( https://www.oecd.org/pisa/pisaproducts/Draft%20PISA%202015%20Collaborative%20Problem%20Solving%20Framework%20.pdf ). Copyright 2017 by the OECD. The data for the descriptions of the Xandar unit are from OECD ( 2016 ). Adapted from Description of the released unit from the 2015 PISA collaborative problem-solving assessment, collaborative problem-solving skills, and proficiency levels, by the OECD, 2016 ( https://www.oecd.org/pisa/test/CPS-Xandar-scoring-guide.pdf ). Copyright 2016 by the OECD

With the matrix of four problem-solving and three collaboration processes in mind, the assessment was designed to consist of assorted items, where an item is a single communicative turn between the test-taker and agent(s), an action, a product, or a response during ColPS (OECD, 2016 ). With difficulty ranging from 314 to 992, each item measured one (or sometimes more than one) of the 12 skills, and a score of 0, 1, or 2 was assigned (Li et al., 2021 ; OECD, 2016 , 2017a ). Diverse sets of items made up each task (e.g., consensus building), and each task covered one component of each (problem scenario) unit, with a predefined between-unit dimension (e.g., school context vs. non-school context) and various within-unit dimensions (e.g., types of tasks, including jigsaw or negotiation; see details in OECD, 2017a ).

In the computer-based assessment mode of PISA 2015, each test-taker worked on four 30-min clusters (i.e., 2 h in total), two of which were in the domain of science, whereas the rest involved reading, mathematics, or ColPS (OECD, 2017a ; see Fig.  2 ). Thus, one test-taker could have had one or two ColPS units, with different positions depending on the assessment form, if their country or economy was participating in the ColPS assessment (see OECD, 2017c ). Among the ColPS units in the main PISA 2015 study, only one unit, called Xandar , was released in the official OECD reports, with additional contextual information included to help interpret the findings (e.g., the unit structure or item difficulty) beyond the raw data, which included actions, response times, and performance levels (e.g., OECD, 2016 ). Consequently, this unit was utilized in the current study because valid interpretations of the behavioral patterns we identified relied on each item's specific contextual information (Goldhammer et al., 2021 ).

Figure 2. The computer-based assessment design of the PISA 2015 main study, including the domain of ColPS. Note. R01-R06 = Reading clusters; M01-M06 = Mathematics clusters; S = Science clusters; C01-C03 = ColPS clusters. From PISA 2015 Technical Report (p. 39), by the OECD, 2017 ( https://www.oecd.org/pisa/data/2015-technical-report/PISA2015_TechRep_Final.pdf ). Copyright 2017 by the OECD

In the Xandar unit, each test-taker worked with two computer agents to solve problems on the geography, people, and economy of an imaginary country named Xandar (OECD, 2017b ; see Fig.  3 ). It should be noted that performance in the Xandar unit was assessed as correct actions or responses in the ColPS process, not as the quality of group output. According to the OECD, this unit is “in-school, private, non-technology” context-based and is composed of four separate parts of “decision-making and coordination tasks” in the scenario of a contest (OECD, 2017b ; p. 53). The detailed composition of the unit is described in Fig.  1 (b).

Figure 3. An example screenshot of the PISA 2015 Xandar unit. Note. Adapted from Description of the released unit from the 2015 PISA collaborative problem-solving assessment, collaborative problem-solving skills, and proficiency levels (p. 11), by the OECD, 2016 ( https://www.oecd.org/pisa/test/CPS-Xandar-scoring-guide.pdf ). Copyright 2016 by the OECD

As Fig.  1 shows, the unit did not cover all 12 skills from the competency framework but covered the skills only partially across the four parts of the assessment. More information about each part is as follows.

In Part 1 (i.e., the stage for agreeing on a strategy as a team), participants get accustomed to the assessment unit, including the chat interface and task space (De Boeck & Scalise, 2019 ; OECD, 2017b ). The stage aims to establish common strategies for ColPS under the contest rules (De Boeck & Scalise, 2019 ). Part 1 contains five items, whose item difficulty ranges from 314 to 524 (OECD, 2016 ).

In Part 2 (i.e., reaching a consensus regarding preferences), participants and the computer agents each allocate a topic to themselves (i.e., the geography, people, or economy of Xandar; OECD, 2017b ). In this process, they should reach a consensus by resolving disagreements within their team (OECD, 2017b ). The purpose of this stage is to establish a mutual understanding (De Boeck & Scalise, 2019 ). There are three items in Part 2, with a difficulty of 381, 537, and 598, respectively (OECD, 2016 ).

In Part 3 (i.e., playing the game effectively), participants respond to the questions about the geography of Xandar (OECD, 2017b ), regardless of their choice in Part 2. In this part, they proceed with the contest and should respond appropriately to the agents who violate common rules and raise issues (De Boeck & Scalise, 2019 ). Part 3 consists of two items (i.e., one with a difficulty of 357 and the other with a difficulty of 992; OECD, 2016 ).

In Part 4 (i.e., assessing progress), participants are required to monitor and assess their team’s progress (OECD, 2017b ). In this part, the computer agents pose challenges to the progress evaluation and ask for extra help for the team to solve problems on the economy of Xandar (De Boeck & Scalise, 2019 ). Part 4 is composed of two items (i.e., one with a difficulty of 593 and the other with a difficulty of 730; OECD, 2016 ).

Process data and profiling students on the basis of response times and actions

Process data refer to “empirical information about the cognitive (as well as meta-cognitive, motivational, and affective) states and related behavior that mediate the effect of the measured construct(s) on the task product (i.e., item score)” (Goldhammer & Zehner, 2017, p. 128). These data can thus indicate “traces of processes” (e.g., strategy use or engagement; Ercikan et al., 2020, p. 181; Goldhammer et al., 2021; Zhu et al., 2016). Such information is recorded and collected via external instruments and encompasses diverse types of data, such as eye-tracking data, paradata (e.g., mouse clicks), or anthropological data (e.g., gestures; Hubley & Zumbo, 2017). Process data have recently been spotlighted as technology-based assessments have advanced with the growth of data science and computational psychometrics, thereby increasing the opportunities for their exploitation across the entire assessment cycle (Goldhammer & Zehner, 2017; Maddox, 2023).

A substantial number of studies on response times and the number of clicks (i.e., defined as actions in this study) along with test scores have been published, specifically in the field of cognitive ability testing (e.g., studies on complex problem-solving). For instance, according to Goldhammer et al. (2014), response times and task correctness have a positive relationship when controlled reasoning-related constructs (e.g., computer-based problem-solving) are being measured, in contrast to repetitive and automatic reasoning (e.g., basic reading; Greiff et al., 2018; Scherer et al., 2015). Greiff et al. (2018) also argued that the number of interventions employed across the investigation stages can be used to gauge the thoroughness of task exploration because it indicates an in-depth and longer commitment to the complex problem-solving task.

In the sphere of ColPS assessments, which are related to and not mutually exclusive from the domain of complex problem-solving, there is also currently active research on these data, particularly in the contexts of assessments that employ the H-H approach. One such research topic involves profiling students on the basis of their data to examine the behavioral patterns that occur during ColPS. For instance, Hao and Mislevy (2019) found four clusters via a hierarchical clustering analysis of communication data. One of their results was that participants’ performance (i.e., their scores on questions about the factors related to volcanic eruptions) tended to improve through more negotiations after the team relayed information. Andrews-Todd et al. (2017) also discovered four profiles through an analysis of chat logs using Andersen/Rasch multivariate item response modeling: cooperative, collaborative, fake collaboration, and dominant/dominant interaction patterns. They reported that the propensities for displaying the cooperative and collaborative interaction patterns were positively correlated with the performance outcomes (r = .28 and .11, ps < .05), whereas the dominant/dominant interaction pattern was negatively correlated with performance outcomes (r = −.21, p < .01). However, there was no significant correlation between outcomes and the inclination to exhibit the fake collaboration pattern (r = −.02, p = .64). Such results cannot be directly applied to assessments that have applied the H-A approach due to the differences in interactions.

Compared with studies that have employed the H-H approach, there is still not much research that has attempted to identify behavioral patterns on the basis of the process data collected in ColPS assessments that have employed the H-A approach. One of the few studies is De Boeck and Scalise ( 2019 ). They applied structural equation modeling to data on United States students’ actions, response times, and performance in each part of the PISA 2015 Xandar unit. Consequently, they found a general correlation between the number of actions and response times, a finding that suggests that “an impulsive and fast trial-and-error style” was not the most successful strategy for this unit (p. 6). They also demonstrated specific associations for each part of the unit. For example, performance was related to more actions and more time in Part 4 (i.e., the last part about assessing progress), in contrast to Part 1 on understanding the contest, where the association between actions and performance was negative (De Boeck & Scalise, 2019 ; OECD, 2016 ). Notably, these findings resonate with earlier studies in other settings employing different tools. For instance, despite being implemented in the H-H setting, Chung et al. ( 1999 ) reported that low-performing teams exchanged more predefined messages than high-performing teams during their knowledge mapping tasks. However, De Boeck and Scalise ( 2019 ) showed the general patterns of their entire sample of students and did not delve into the distinctiveness of the patterns, in contrast to the current study, which was designed to explore unobserved student groups and their behavioral characteristics via LPA. Furthermore, the patterns they identified in their study were associated with the performances in each part, thereby making it difficult to determine the relationship between the patterns and the overall level of performance. Their participants were also limited to only individuals from the United States. Therefore, there is still a need to uncover behavioral patterns on the basis of process data and their relationships with overall performance in detail, relying on more diverse populations in more standardized settings and by taking advantage of the H-A approach.

Research questions

The objective of this research was to investigate different behavioral profiles of test-takers by drawing on the two available types of process data (i.e., response times and the number of actions) collected during the PISA 2015 ColPS assessment, particularly across the four parts of the Xandar unit. To achieve this objective, we posed two research questions: (a) Which profiles can be identified on the basis of students’ response times and numbers of actions in the Xandar unit? and (b) How do the profiles differ in terms of overall performance?

Methodology

Participants and sampling

The current study examined the PISA 2015 ColPS assessment participants, specifically those who (a) took the assessment in English and (b) had the Xandar unit as the first cluster, because we wanted to control for potential sources of bias (i.e., language, item position, and fatigue). Out of the total of 3,065 students belonging to 11 education systems (i.e., Australia, Canada, Hong Kong, Luxembourg, Macao, New Zealand, Singapore, Sweden, the United Arab Emirates, the United Kingdom, and the United States) Footnote 2, 539 outliers were excluded via robust Mahalanobis distance estimation with a 0.01 cutoff for the p-value (see Leys et al., 2018) to avoid the influence of outliers on the profile solution (Spurk et al., 2020). Footnote 3 In addition, six inactive students (i.e., those who did not exhibit any activity across the indicators) were subsequently excluded. Hence, the final sample consisted of 2,520 students (see Table 1). The student samples were chosen according to the OECD’s specific two-stage sampling procedures (De Boeck & Scalise, 2019; OECD, 2009), which result in different probabilities of participation for each student (Asparouhov, 2005; Burns et al., 2022; Scherer et al., 2017). Given the OECD’s official guidelines and the previous literature related to PISA and LPA (e.g., Burns et al., 2022; OECD, 2017c; Wilson & Urick, 2022), we included the sampling hierarchy and the sampling weights of the students in the analyses (see also the Statistical Analyses section).

Table 1 Participants’ characteristics
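As an illustration of the outlier-screening step described above, the sketch below shows one way to flag multivariate outliers with a robust (MCD-based) Mahalanobis distance and a 0.01 chi-square cutoff, roughly following Leys et al. (2018). This is a minimal R sketch on simulated data; the use of MASS::cov.mcd, the column layout, and all object names are assumptions rather than a reproduction of the study's actual code.

```r
library(MASS)

set.seed(1)
# Toy stand-in for the eight behavioral indicators (four response times,
# four action counts); the real study used the PISA 2015 Xandar indicators.
indicators <- as.data.frame(matrix(rexp(3065 * 8, rate = 1 / 100), ncol = 8))

robust_fit <- cov.mcd(indicators)                 # robust (MCD) center and covariance
d2 <- mahalanobis(indicators,
                  center = robust_fit$center,
                  cov    = robust_fit$cov)        # squared robust Mahalanobis distances
p_values <- pchisq(d2, df = ncol(indicators), lower.tail = FALSE)
outlier  <- p_values < 0.01                       # 0.01 cutoff, as in the study
cleaned  <- indicators[!outlier, ]
nrow(cleaned)                                     # sample size after screening
```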

Materials and indicators

We employed a total of eight indicators for the analyses: (a) the total response time (i.e., one indicator per part for a total of four indicators; the sum of “the time spent on the last visit to an item” per part; OECD, 2019) and (b) the total number of actions (i.e., one indicator per part for a total of four indicators). For the distal outcome variables, we utilized the 10 plausible ColPS values (i.e., PV1CLPS-PV10CLPS), which have “a weighted mean of 500 and a weighted standard deviation of 100” (OECD, 2017c, p. 234). The plausible values are “multiple imputed proficiency values” given the test-takers’ patterns of responses; they thus include probabilistic components and indicate the test-takers’ possible level of ability (i.e., a latent construct; Khorramdel et al., 2020, p. 44). To analyze the plausible values, we followed the recommendations in the official OECD guidelines (e.g., OECD, 2017c) and the previous literature on large-scale assessments (e.g., Asparouhov & Muthén, 2010; Rutkowski et al., 2010; Scherer, 2020; Yamashita et al., 2020; see also the Statistical Analyses section). All measures included in this study are publicly available in the PISA 2015 database (https://www.oecd.org/pisa/data/2015database/).

Data cleaning and preparation

We used R 4.2.1 to prepare the data (R Core Team, 2022). As described above, we extracted the sample students on the basis of two conditions: (a) whether they took the assessment in English and (b) whether they had the Xandar unit as the first cluster of the assessment. We then used Mplus version 8.8 (Muthén & Muthén, 1998–2017) to conduct exploratory analyses for all indicators. Because the variances of the response time indicators were very large, these analyses did not converge. We therefore applied a logarithmic transformation to the response time indicators to reduce their variance in further steps. Note that the action indicators could not be transformed because one student had no actions (i.e., 0 actions) in Part 4.
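A minimal R sketch of this transformation step is given below; the data and column names are placeholders, not the PISA variable names.

```r
set.seed(1)
# Toy data: two response time columns (in ms) and two action-count columns.
dat <- data.frame(time_part1   = rexp(100, 1 / 120000),
                  time_part2   = rexp(100, 1 / 90000),
                  action_part1 = rpois(100, 15),
                  action_part4 = rpois(100, 2))

time_cols <- c("time_part1", "time_part2")
dat[time_cols] <- lapply(dat[time_cols], log)  # natural log of the response times
# The action columns stay on the raw scale: at least one student had 0 actions
# in Part 4, and log(0) is -Inf.
```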

Statistical analyses

LPA was used to identify latent profiles of students on the basis of the response time and action data measured in the Xandar unit (Morin et al., 2011; see the model in Fig. 4). LPA, a person-centered and model-based approach, has several advantages over traditional clustering methods such as k-means clustering (Magidson & Vermunt, 2002; Morin et al., 2011). In particular, it classifies individuals into clusters on the basis of the estimated probabilities of belonging to specific profiles, and other covariates, such as demographics, can also be considered (Magidson & Vermunt, 2002; Pastor et al., 2007; Spurk et al., 2020). It also allows alternative models to be specified, making it possible to compare multiple models on the basis of various fit statistics (Morin et al., 2011).

Figure 4. Full LPA model in this research. Note. C denotes the categorical latent variable describing the latent profiles

Relying on these strengths of LPA, we conducted the statistical analyses with reference to syntax written by Burns et al. (2022) and Song (2021). We followed the default assumption of traditional LPA that the correlations between the indicators are explained only by profile membership, that is, that the residual correlations within profiles are zero (Morin et al., 2011; Vermunt & Magidson, 2002). There was insufficient empirical and theoretical evidence that it would be acceptable to relax assumptions about the relationship between the two types of process data from the PISA 2015 ColPS assessment (Collie et al., 2020; Meyer & Morin, 2016; Morin et al., 2011). Therefore, we (a) fixed the covariances between the latent profile indicators to zero and (b) constrained the variances to equality across profiles (i.e., the default options; Asparouhov & Muthén, 2015; Muthén & Muthén, 1998–2017).
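The profile models themselves were fitted in Mplus; purely as a hedged illustration of the constraints just described, the R sketch below specifies a comparable Gaussian mixture with mclust, whose "EEI" model uses a diagonal covariance matrix (indicator covariances fixed to zero) that is held equal across profiles (variances constrained to equality). The sketch runs on simulated data and ignores the sampling weights and the clustered design, so it is not a reproduction of the study's analysis.

```r
library(mclust)

set.seed(1)
# Simulated stand-in for the eight indicators (T1-T4 = log response times,
# A1-A4 = action counts).
X <- as.data.frame(matrix(rnorm(2520 * 8), ncol = 8))
names(X) <- c(paste0("T", 1:4), paste0("A", 1:4))

# "EEI": diagonal covariance (zero covariances), identical across components
# (equal variances) -- analogous to the LPA constraints described above.
fit <- Mclust(X, G = 1:10, modelNames = "EEI")

summary(fit)   # number of profiles preferred by mclust's BIC
head(fit$z)    # posterior profile probabilities for the first cases
```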

At the same time, because correlation coefficients greater than 0.50 were found between some indicators (see Table 2), we separately relaxed some of these assumptions (i.e., allowing some indicators to be correlated within profiles) and tested the resulting models. According to Sinha et al. (2021), correlations over 0.50 may affect the modeling and fit statistics, so such cases should be examined carefully. We therefore tried to formally check the level of local dependence between the indicators but, due to the constraints of the statistical program, could obtain only partial evidence from the factor loadings. Using the evidence we gathered and drawing on Sinha et al. (2021), we conducted separate sensitivity analyses in which we either relaxed the assumption (i.e., allowed local dependence between two specific indicators within profiles) or removed one of the two indicators. However, not all models with relaxed assumptions terminated normally. When some indicators (e.g., C100Q01T and C100Q02A) were removed, the relative model fit statistics improved in some trials, but the overall profile membership did not change substantially. Therefore, we retained the model with all the indicators and the most conservative assumptions.

In this study, we inspected models with one to 10 latent profiles, employing the standard three-step approach in line with best practices (Asparouhov & Muthén, 2014; Bakk & Vermunt, 2016; Dziak et al., 2016; Nylund-Gibson & Choi, 2018; Nylund-Gibson et al., 2019; Wang & Wang, 2019). Footnote 4 In this approach, (a) an unconditional LPA model is first specified on the basis of the indicator variables; (b) the most likely class variable for the latent profile C is created and its measurement (classification) error is incorporated into the model; and (c) the relationship between profile membership and the distal outcomes is estimated (Dziak et al., 2016). Specifically for Step 3, 10 data sets were prepared to utilize the PISA plausible values in Mplus, each containing one of the 10 sets of plausible values and leaving the other variables the same (Asparouhov & Muthén, 2010; Yamashita et al., 2020). In this way, 10 analyses, one per plausible value, were conducted, and the final estimates were derived according to Rubin’s (1987) rules (Baraldi & Enders, 2013; Burns et al., 2022; Muthén & Muthén, 1998–2017; Rohatgi & Scherer, 2020; Mplus Type = IMPUTATION option).
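As a sketch of the pooling step, the short R function below applies Rubin's (1987) rules to the estimates and standard errors obtained from the 10 plausible-value analyses. The input vectors and the toy numbers are illustrative assumptions, not values from this study.

```r
# est and se: vectors holding the estimate and its standard error for one
# parameter (e.g., a profile mean difference) from each of the m = 10 runs.
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)            # pooled point estimate
  w     <- mean(se^2)           # within-imputation variance
  b     <- var(est)             # between-imputation variance
  t_var <- w + (1 + 1 / m) * b  # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

# Toy usage with made-up numbers (not results from this study):
set.seed(1)
pool_rubin(est = rnorm(10, mean = 130, sd = 3), se = rep(20, 10))
```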

Given the sampling design of PISA described earlier, we applied the final student weights (i.e., W_FSTUWT; Mplus WEIGHT option) and the hierarchical sampling structure (i.e., selecting schools first; cluster = CNTSCHID) to the models ( Mplus Type = COMPLEX MIXTURE option). As can be seen from the kurtosis and skewness values in Table  2 , the raw data were not normally distributed. Therefore, maximum likelihood estimation with robust standard errors (MLR) was used to address the nonnormality and the possibility of nonindependence in the data (Spurk et al., 2020 ; Teig et al., 2020 ). Out of 2,520 students in the sample, four did not fully respond to the test unit (i.e., missing rates = 0.1%; 20 observations/20,160 records of all eight indicators). Despite their small numbers, these missing data were handled with the full information maximum likelihood estimation (i.e., the default in Mplus ; Collie et al., 2020 ; Rohatgi & Scherer, 2020 ). Following the recommendations in earlier studies (e.g., Berlin et al., 2014 ; Nylund-Gibson & Choi, 2018 ; Spurk et al., 2020 ), we also used multiple starting values to avoid local solution problems. Thus, the models were estimated with at least 5,000 random start values, and the best 500 were retained for the final optimization (Geiser, 2012 ; Meyer & Morin, 2016 ; Morin et al., 2011 ). We report the results from the models that “converged on a replicated solution” (Morin et al., 2011 , p. 65).

Model evaluation and selection

We examined multiple criteria and referred to the prior literature to evaluate the candidate models and select the best profile solution. First, we checked whether any error messages occurred (Berlin et al., 2014; Spurk et al., 2020). Second, we compared the relative information criteria, such as the Bayesian information criterion (BIC), across the candidate models; the lowest values of the relative information criteria suggest the best-fitting model (Berlin et al., 2014; Morin et al., 2011; Spurk et al., 2020). Third, we reviewed the entropy and the average posterior classification probabilities of the models, both of which represent the confidence level of the classification; the closer these values are to 1, the greater the classification accuracy (Berlin et al., 2014; Morin et al., 2011). Fourth, we considered profile sizes. According to Berlin et al. (2014) and Spurk et al. (2020), an additional profile should be retained only if it comprises (a) at least 1% of the total sample or (b) at least 25 cases. Fifth, we examined whether a “salsa effect” existed (i.e., “the coercion of [profiles] to fit a population that may not have been latent [profiles]”; Sinha et al., 2021, p. 26). This effect is present when the differences in the indicators between profiles appear merely as parallel lines (Sinha et al., 2021), and it indicates unreliable results of the profile analysis. Finally, we validated our identification by testing mean differences in overall performance across the profile groups, which provided the answer to the second research question (Sinha et al., 2021; Spurk et al., 2020). We further tested mean differences in each indicator across the profile groups using the Wald test (Burns et al., 2022).
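For readers unfamiliar with these classification-quality measures, the R sketch below shows how a relative entropy statistic (as commonly reported, e.g., by Mplus) and the average posterior classification probabilities can be computed from a matrix of posterior profile probabilities. This is a generic illustration on a fabricated posterior matrix, not the study's code.

```r
# P: matrix of posterior profile probabilities (rows = students, cols = profiles).
relative_entropy <- function(P) {
  n <- nrow(P); k <- ncol(P)
  P <- pmax(P, 1e-12)                      # guard against log(0)
  1 - sum(-P * log(P)) / (n * log(k))      # values near 1 = clear separation
}

avg_posterior_prob <- function(P) {
  assigned <- max.col(P)                   # most likely profile per student
  tapply(P[cbind(seq_len(nrow(P)), assigned)], assigned, mean)
}

# Toy usage:
set.seed(1)
P <- matrix(runif(2000), ncol = 2)
P <- P / rowSums(P)                        # normalize rows to sum to 1
relative_entropy(P)
avg_posterior_prob(P)
```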

Due to limitations of the statistical program, we could not implement the bootstrapped likelihood ratio test (BLRT) to determine the number of profiles. Moreover, the results from alternative tests (e.g., the Lo-Mendell-Rubin test) were not statistically significant, and these results might be unreliable because our raw data deviated from the default assumption of normality (Guerra-Peña & Steinley, 2016; Spurk et al., 2020). Indeed, such tests have yet to be thoroughly scrutinized for large-scale complex data such as those used in the current research (Scherer et al., 2017).

Descriptive statistics and correlations for the behavioral indicators: response times and the number of actions

Prior to identifying the profiles, we checked the descriptive statistics for the indicators, as presented in Table 2. The correlations between the indicators were low to moderate in size overall (Guilford, 1942). The largest correlations were found between the response times in Parts 1 and 2 (r = .63, p < .01) and between the numbers of actions in Parts 2 and 4 (r = .55, p < .01).

The number of profiles based on the behavioral indicators and their descriptions

The number of latent profiles based on the behavioral indicators

Table 3 shows the model fit statistics for the models with one to 10 profiles specified in this study. As described in the Model Evaluation and Selection section, we evaluated the models on the basis of multiple criteria. First, we did not find any error messages across the 10 models. Second, the log-likelihood values and the relative information criteria (i.e., AIC, BIC, SABIC, CAIC) kept descending as the number of profiles increased. This decline can imply the existence of diverse subgroups with distinct behavioral patterns in the sample, but it can also occur naturally as the models become more complex (Scherer et al., 2017; Yi & Lee, 2017). Following Morin and Marsh’s (2015) advice for finding the best solution, we drew an elbow plot that illustrates the changes in the information criteria with the number of profiles (see Fig. 5). Unfortunately, it was not easy to identify a flattening point in our case (i.e., the information criteria kept decreasing without a definite point). Third, when relying on a cutoff of 0.80 for entropy, all the models appeared acceptable. Likewise, the average classification probabilities generally showed a satisfactory level of distinctiveness (i.e., over 0.90; Spurk et al., 2020). Next, we considered each model’s profile sizes, which turned out to be the most influential criterion. Using the rule of thumb (Berlin et al., 2014; Spurk et al., 2020), we excluded the models with a profile that accounted for less than 1% of the sample or fewer than 25 cases. Only the two-profile solution then remained. As depicted in Fig. 6, there was no salsa effect between the two profiles. In the validation via the Wald tests, the differences between the two profiles in most indicators (i.e., all except the response time indicator in Part 4) and in overall performance were statistically significant (see Table 4). Therefore, the two-profile solution was retained, even though it comprised one major group (95%) and one small group (5%). The two profiles offered qualitatively contrasting interpretations, which will be discussed later.

Figure 5. Elbow plot of the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the sample-size-adjusted BIC (SABIC), and the consistent AIC (CAIC)
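A minimal R sketch of such an elbow plot is given below; the information-criterion values are placeholders, not the values reported in Table 3.

```r
# Placeholder BIC values for 1 to 10 profiles (illustrative only).
ic <- data.frame(profiles = 1:10,
                 BIC = c(120000, 118600, 117900, 117450, 117150,
                         116950, 116800, 116680, 116580, 116500))

plot(ic$profiles, ic$BIC, type = "b",
     xlab = "Number of profiles", ylab = "BIC",
     main = "Elbow plot of an information criterion")
```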

Figure 6. Profile plot for the two-profile model with the estimated means of the indicators. Note. T represents the response time (with natural log transformation); A indicates the number of actions; each number following T or A denotes the part (e.g., A2 = the number of actions in Part 2)

The descriptions of the profiles and the differences in the indicators

As mentioned earlier, we extracted two profiles: one large group (N = 2,395, 95%) and one small, contrasting group (N = 125, 5%). Overall, the latter group exhibited more actions and longer response times, except for the response times in Part 4 (see Figs. 6 and 7). In other words, the first group spent more time in Part 4 than the other, although this difference was nonsignificant (p = .51). Interestingly, compared with the other response time and action indicators, the disparities between the groups were smallest in Part 4 (∆M response time [inverse log] = 3005.70; ∆M actions = 1.47; SE = 0.17 and 0.26; ns and p < .01, respectively). Conversely, Part 1 was the stage in which the most distinctive behavioral patterns emerged between the two groups (∆M response time [inverse log] = 58466.45; ∆M actions = 21.54; SE = 0.08 and 3.37; ps < .01). To compare the two groups, we labeled them Profile 1 (95%) and Profile 2 (5%). To distinguish the two groups better, we took the inverse of the natural log transformation of the response times and report those results in this subsection (see Table 4).
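For clarity, "taking the inverse of the natural log transformation" here simply means exponentiating the estimated means back to the raw millisecond scale, as in the short sketch below (the value is a placeholder, not a reported estimate).

```r
mean_log_time_profile1 <- 11.78   # placeholder estimated mean of a log response time
exp(mean_log_time_profile1)       # back-transformed to the raw millisecond scale
```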

Figure 7. Profile plot for the two-profile model with the estimated means of the response time indicators (with inverse log transformation). Note. T represents the response time (with inverse log transformation); each number following T denotes the part (e.g., T2 = the response time in Part 2)

To elaborate on the differences between the two profiles, for actions, we found distinct gaps in Part 1 (∆M = 21.54; SE = 3.37; p < .01; M profile 1 = 13.41; M profile 2 = 34.96) and Part 3 (∆M = 12.23; SE = 2.21; p < .01; M profile 1 = 15.89; M profile 2 = 28.12). For response times, we also found the largest gap in Part 1 (∆M = 58466.45; SE = 0.08; p < .01; M profile 1 = 131006.21; M profile 2 = 189472.66), whereas the second largest gap was found in Part 2 (∆M = 27454.92; SE = 0.14; p < .05; M profile 1 = 74981.75; M profile 2 = 102436.67). Notably, both the time spent and the gap between the two profiles kept diminishing across the four parts, in contrast to the irregular patterns shown by the action data.

The differences in the distal outcome between the profiles

As reported earlier, there were significant differences between the profiles in the distal outcome, that is, overall ColPS performance as represented by the 10 plausible values (see Table 4). More precisely, Profile 1 showed better performance on average than Profile 2; the mean performance gap between the two profiles was 129.41 (SE = 21.66; p < .01; M profile 1 = 519.11; M profile 2 = 389.70). Overall, the findings suggest that the participants in the PISA 2015 ColPS assessment who (a) took the assessment in English and (b) had the Xandar unit as the first cluster could be divided into two distinct profiles (RQ1). Additionally, their ColPS performance was partly related to their profile membership (RQ2). Profile 1 accounted for 95% of the sample and was generally characterized by better performance with fewer actions and shorter response times. By contrast, the second latent profile (i.e., 5% of the sample) generally displayed more actions and longer response times with lower performance.

The objective of this study was to identify students’ behavioral patterns and the relationships between these patterns and the overall ColPS outcomes. To attain this objective, we based the LPA on two aspects of the behavioral process data collected from the PISA 2015 ColPS assessment participants: (a) response times and (b) the number of actions. To our knowledge, this study is one of the first investigations into how process data from the PISA 2015 ColPS assessment, collected in a computer-based environment using the H-A approach, can be used to identify test-takers’ profiles (cf. De Boeck & Scalise, 2019). Therefore, this study extends current research by offering information about students’ ColPS processes, in addition to the assessment results.

The profiles of students’ response times and actions during ColPS (RQ1)

Through the LPA, we found evidence for two distinct profiles. The Profile 1 group accounted for most of our sample (95%). Such disproportionate results could be due to the limitations of the tasks the students engaged in or the students’ characteristics (i.e., they had similar proficiency levels), which should be investigated further and will be elaborated on later (see the Limitations and Future Directions section). Nevertheless, the differences in most indicators were significant (except for the response times in Part 4). The information about the profiles suggests that the students’ different behaviors and approaches to engaging in ColPS with the agents were associated with students’ performance levels and can be referenced for future research, assessments, and interventions on ColPS.

With respect to the actions, the students from Profile 2 displayed more actions than the others, and we found particularly considerable divergences between the two profiles in Part 1. As explained earlier, in Part 1, participants establish common strategies for ColPS and familiarize themselves with the assessment unit (De Boeck & Scalise, 2019). In this situation, students explore and co-elaborate on the problem space with the agents, thereby obtaining a better understanding (Pöysä-Tarhonen et al., 2022). These processes are the “essence of collaboration” (Van den Bossche et al., 2011, p. 284) because the shared understanding and representations act as a steppingstone for coordinating the diverse perspectives within a team (Fiore & Schooler, 2004; Van den Bossche et al., 2011). Our results suggest that the students in Profile 1 established a common representation as a team and figured out how best to achieve the goal efficiently in Part 1, whereas the students in Profile 2 failed to do so, leading to more unnecessary actions and more time to respond. The reasons they needed to explore the problems repeatedly and for a relatively long time may be related to a single team mental model or to a mixture of models (i.e., the equipment, task, team interaction, and team models; Cannon-Bowers et al., 1993). However, it is not easy to conclude which mental models, and the extent of their (dis)similarities, are related to the different behavioral patterns identified in this context.

Given the limited time of the assessment, if students failed Part 1 in an unstructured way (i.e., as the Profile 2 group’s patterns indicated), their subsequent processes were likely to be hampered, which could be seen in the conspicuous differences in their actions in Part 3. In Part 3, students respond to the questions assigned to them (i.e., about the geography of Xandar) and to the uncooperative actions of the agents. As demonstrated, the two profiles’ actions showed the second largest disparities in this part: the students in Profile 2 displayed almost twice as many actions as those in the other profile. This result indicates that the Profile 2 students implemented “an impulsive and fast trial-and-error” approach, in line with what De Boeck and Scalise (2019) reported.

There are several possible explanations for such results in Part 3. First, as noted earlier, the results could be due to a failure to build a shared understanding and reach an agreement in the previous parts. In other words, because of (even partially) inaccurate team mental models, the students in Profile 2 could have engaged in many actions to catch up with the ColPS process and redress the gaps in their mental models. For instance, it is possible that they had a shared understanding with the computer agents about the given tasks but were not on the same page about how to tackle them (e.g., assigned roles and responsibilities; Fiore & Schooler, 2004; Mulder, 1999; Van den Bossche et al., 2011). Unless they disengaged during the tasks, they were likely to click multiple times in an attempt to quickly figure out how to proceed. In such a case (e.g., when “team members’ understanding of task priorities is shared, but misguided”; Lim & Klein, 2006, p. 406), ColPS processes can be ineffective, as shown by the substantial number of actions in our results. Chung et al.’s (1999) findings resonate with this explanation: they found that low-performing teams exchanged more predefined messages during their knowledge mapping task, which suggests that the low-performing teams might not have benefitted from the discussions, notwithstanding the many exchanges. However, our results are inconsistent with Jeong and Chi (2007), who reported that more interactions were responsible for common knowledge construction. These mixed results might be induced by differences in, for example, tasks and settings (i.e., collaborative text comprehension in the H-H setting with free conversation). For example, disparate problem tasks may require different thresholds and upper limits of efficient interactions (i.e., actions) for teams to accurately construct team mental models, set effective boundaries between their homogeneities and heterogeneities, and share them completely. These questions should be explored further in future studies.

Related to the above but slightly different, the students from Profile 2 might have succeeded in establishing the shared understanding but might not have capitalized on it. In other words, they might have failed to maintain the pre-established mental models until Part 3, one reason being that they did not deliberate on them or come to a complete agreement but did not want to reveal their misunderstanding. Accordingly, they might have adopted the quick trial-and-error approach in this part. As pointed out by Jeong and Chi (2007), what is shared within a team does not necessarily correspond to what team members agreed on. Thus, more research, with measurements at multiple time points (Kozlowski & Chao, 2012), is needed to examine whether the mental models were (in)accurately maintained and fully agreed upon and whether this influenced the identified disparities in actions.

Nevertheless, regardless of whether the students established and maintained a shared understanding, the students in Profile 2 might not have familiarized themselves with the assessment environment sufficiently before moving on to Part 3. For instance, they might have been less familiar with the computer-based assessment environment due to inadequate information and communication technology (ICT) literacy and might consequently have engaged in a large number of actions. As we did not include any covariates in the model in the current research, further analyses that include potential influencing factors and their interactions, and that apply an intersectional perspective, should be conducted to judge the validity of this interpretation.

At the same time, our results imply that the students in Profile 2 might have been engaged across the entire assessment, given their remarkable number of actions. The number of actions can be understood as indicating the level of engagement with the tasks, as Greiff et al. (2018) suggested. In their study of a computer-based complex problem-solving assessment, they proposed that the number of actions could indicate “deeper and longer engagement with the tasks” (p. 251). However, one difference between their study and the current one is the thoroughness of the explorations, which might be ascribed to the differences in the assessment tool and the construct of interest. As mentioned above, in the context of the Xandar unit, too many exploration steps suggest a shallowness of exploration, that is, a failure to establish, agree upon, or maintain a shared understanding. Nevertheless, as noted earlier, the extent to which an appropriate number of actions is important for being a good collaborator in this task should be investigated further. Triangulating the number of actions with other data (e.g., physiological metrics, think-aloud data, or survey results) would be one way to determine the thresholds and upper limits of efficient actions as well as students’ levels of engagement during this unit.

Turning to time-related aspects, the students in Profile 2 took more time across the assessment than the other group, except in Part 4. However, the overall time spent on the tasks kept decreasing for both profiles. This trend could reflect the levels of students’ engagement: the participants in both profiles might have produced more rapid guesses (i.e., a “response occurring faster than an item’s threshold”; Wise et al., 2009, p. 187; see also OECD, 2023) in the later parts of the 30-min ColPS assessment, which should be examined further. Notably, the gap in response times between the two profiles also kept declining. These results suggest that both profiles might have become accustomed to the assessment environment, tasks, and collaboration processes as time went by. This waning pattern contrasts with the irregular trend in the action data. The patterns become even more interesting when the number of items and their difficulty levels in each part are considered. For instance, comparing Parts 3 and 4, each of which consists of two items, it would be natural to spend more time in Part 3 because it contains the item with the highest difficulty (i.e., 992).

Of the patterns we identified, the most noticeable disparities in response times between the two profiles were found in Part 1, and likewise for the actions. Given the item difficulty in this part (i.e., from 314 to 524; see Fig. 1), it would be interesting to figure out why such considerable differences in response times emerged here. One possible explanation is that this was the stage in which students were supposed to explore and co-elaborate on the given tasks without any pre-existing procedures or strategies, as mentioned earlier (Pöysä-Tarhonen et al., 2022). In particular, in this part, students should lead the way by proposing a strategy for assigning responsibilities (i.e., Item 4; OECD, 2016), which could lead some of them to ponder how to do so. In addition, students were still getting used to the assessment environment in Part 1 (De Boeck & Scalise, 2019). Accordingly, students are more likely to exhibit differences in the time they spend in Part 1, depending on their proficiency in establishing a shared understanding and adapting themselves to the assessment.

Another interesting point concerns Part 2, where we observed another significant difference in response times between the two profiles. As in Part 1, students continued to establish what the team models had in common, particularly regarding the roles and responsibilities of team members, factoring in their expertise and skills. Thus, for the same reason as above, differences in response times between the two profiles were very likely to emerge here. Specifically, the differences could be related to the conflict that arises between the agents at the beginning of this part (i.e., Item 6, with a difficulty of 598; OECD, 2016). This conflict requires students to seek the members’ perspectives and negotiate solutions (OECD, 2016), but doing so might not have been easy for the students in Profile 2, leading them to spend a great deal of time clicking on incorrect messages (i.e., engaging in a large number of actions) or contemplating. However, the effect of this item on the response times should be scrutinized in the future. Unfortunately, the current form of the PISA 2015 ColPS assessment does not provide public information about process data at the item level.

To recapitulate briefly, the identified profiles differ in their behavioral patterns in the action and time data collected for each part of the assessment. However, we could not draw unequivocal conclusions due to the limited amount of information. As Goldhammer et al. (2021) highlighted, more scientifically grounded and contextual evidence and information are needed in order to provide definite explanations.

Differences in performance between the extracted profiles (RQ2)

In the current research, we found a significant relationship between profile membership and overall performance in the PISA 2015 ColPS assessment: there were significant mean differences in achievement between the two profiles, with Profile 1 (95%) outperforming the other. From a measurement standpoint, such results show that, based on the sample from the Xandar unit of PISA 2015, the identified behavioral patterns and the two types of process data can be utilized to identify students’ ColPS proficiencies to some extent. As described earlier, the part that differentiated the profiles the most was Part 1. Given the general guidelines on how to assign the target weights to the target skills of the PISA 2015 ColPS framework (see Fig. 1; OECD, 2017a), higher weights were allocated to the items in Part 1. Thus, it can be concluded that the behavioral patterns in Part 1 best explained the differences in performance between the profiles. Put differently, a structured approach was associated with better performance than the trial-and-error approach in Part 1 of the Xandar unit, a finding that is consistent with De Boeck and Scalise (2019).

Conversely, there is a slight inconsistency between De Boeck and Scalise (2019) and our study, particularly regarding Part 4, where monitoring and reflecting processes were involved. They found significant positive relationships between the actions, response times, and performance in Part 4. In contrast, we found that (a) Profile 1 (i.e., the high-performing group) engaged in fewer actions than Profile 2 (i.e., the low-performing group) in this part, and (b) the time devoted here did not differ significantly between the profiles, although the students in Profile 1 spent more time here than those in Profile 2. These disparities may have multiple reasons. First, we extended the sample of students (i.e., students with diverse nationalities, including the United States) compared with De Boeck and Scalise’s study. Thus, unknown differences in students’ characteristics, such as personalities, working styles, dispositions toward ColPS, ICT skills, or cultural values, may have influenced the observed behavioral patterns and led to differences between the studies. Given the purpose and content of the part, their findings and interpretation seem reasonable. However, provided that the earlier stages are well implemented, that is, team mental models are well established and shared, the processes of monitoring and reflecting may be carried out with fewer actions in an organized way. It can be assumed that the participants in our sample illustrate this point more clearly than those in De Boeck and Scalise’s study.

From a construct and training perspective, our findings spotlight the importance of the subcompetency of ColPS called Establishing and maintaining a shared understanding . Our findings are consistent with points that have already been emphasized in numerous previous studies (e.g., Andrews-Todd et al., 2017 ; Van den Bossche et al., 2011 ). Overall and specifically for such a competency, we found that the organized approach used by the students in Profile 1 was associated with better performance than the other profile’s approach, which involved trial and error. Although the relationship between the observed behavioral patterns and overall performance cannot be generalized because our sample was limited and we analyzed only a single task unit, our findings can be used as a reference for training students in ColPS. For instance, when designing instruction in ColPS, enough time and facilitating tools should be provided to establish a shared understanding, such as visual support or meta-cognitive questions, particularly in the early stage of ColPS (Dindar et al., 2022 ; Gergle et al., 2013 ; Newton et al., 2018 ). Furthermore, given that ColPS processes can be iterative in reality, it is recommended that students revisit the established common understanding and recognize any gaps in their understanding if problems occur in the middle of ColPS (e.g., “What might your teammate know that you need to know?”, see the guiding questions in Table  2 in Newton et al., 2018 , p. 47). In addition, since the findings can be related to the level of students’ engagement, it will be worthwhile considering ample scaffolding for the later parts of ColPS and effective assessment design (OECD, 2023 ; Wise et al., 2009 ). If more diverse types of ColPS tasks, process data, participants, and their backgrounds can be investigated in the future, more practical guidelines can be established (Greiff & Borgonovi, 2022 ).

Limitations and future directions

There are several limitations in this exploratory study. First, the results should be considered to reflect the socioeconomic and cultural contexts of the populations included in the study and should not be assumed to generalize beyond them. In particular, most of the countries in the sample were fairly high-achieving, with students obtaining better scores than the international average (see Table  1 ). We tried to control for the potential effects of languages and selected the participants who took the assessment in English, thereby analyzing a limited range of proficiency levels. A different set of profiles might be identified if such a study is conducted on a different sample of countries with diverse social contexts (e.g., cultural values or the perceptions of the test) because the social and economic contexts individuals experience might affect respondents’ behavioral patterns (e.g., differences in cognitive processes or test-taking motivation and preferences for specific tactics; Borgonovi & Biecek, 2016 ; He et al., 2021 ; Xu et al., 2021 ). Indeed, there is evidence that the students from Hong Kong show different response processes, for instance, in Item 11 of the Xandar unit (see Annex G in OECD, 2017c ). Future research could investigate behavioral profiles using a more diverse set of countries and student populations that differ in cultural or ethnic backgrounds and could compare the (dis)similarities between the extracted profiles across diverse backgrounds based on the item-level information (e.g., Morin et al., 2016 ).

Second, we did not include covariates that may be associated with the profiles, and we conducted a single-level LPA. On the basis of the previous literature, potential predictors of profile membership can be identified, such as gender, students’ ICT skills, self-perceived collaboration, and teamwork dispositions (Scalise et al., 2016) or school-level variables, including schools’ socioeconomic status (Tang et al., 2021). As noted earlier, variables related to different cultural backgrounds could also be included as covariates. Thus, future analyses could include such covariates and examine their effects on profile membership, thereby ensuring valid interpretations of the results (e.g., the effects of students’ ICT use and socioeconomic backgrounds; Borgonovi & Pokropek, 2021; Maddox, 2023). Furthermore, multilevel LPAs, for example incorporating school-level variables, can be implemented to derive more definitive explanations (e.g., Collie et al., 2020).

Third, from a statistical standpoint, more investigations of large-scale assessment data sets need to be conducted with LPA. As mentioned earlier, we encountered some problems, potentially due to the non-normality of the indicators and the complex structure of the data set. Although we took steps to mitigate the degree of non-normality (i.e., adopting the MLR estimator and log-transforming some indicators), the high level of non-normality might have influenced the analytical procedures and findings. Furthermore, because we had to account for the nested structure of the samples, we could not apply a widely used statistical test (i.e., the BLRT) to determine the number of profiles, an avenue that should be examined more in the future (Scherer et al., 2017). The local independence between the response time and action data from the PISA 2015 ColPS assessment should also be scrutinized further, as local independence is an underlying assumption of LPA. However, because there is little research on such indicators, we had to rely on the magnitudes of the correlation coefficients and conduct sensitivity analyses. Therefore, further analyses based on more rigorous statistical verification are needed in the future.

Fourth, the behavioral indicators used in this study are tailored to one specific assessment unit that was based on the H-A approach in a computer-based assessment. As we relied on only one test unit, which partially covered the 12 skills and utilized computer agents, our results cannot be generalized. The specific assessment setting (i.e., exchanging predefined messages with computer agents about a certain problem-solving scenario) might not elicit behaviors that are as varied or as free-flowing as those that would occur in real ColPS situations. Furthermore, different types of collaboration can serve different purposes, which may consequently lead to heterogeneous behavioral patterns (Graesser et al., 2018; Han et al., 2021; Kapur, 2011). According to the OECD (2017a, p. 23), there might be other units in PISA 2015 involving different tasks (e.g., “group decision-making” or “group-production tasks”). Therefore, further investigations should be conducted on the data from the other test units, and the findings should be compared with other ColPS studies to validate the current and previous findings. For instance, we can gain more insights into the levels of students’ engagement by implementing another LPA on the next test unit and comparing the number of actions in the first part.

Lastly, the indicators used in this study capture the total number of actions and the total response time in each of the various parts of the assessment (i.e., not at the item level) and represent only two types of data, whose use and interpretation need to be validated further (Zumbo et al., 2023). That is, we were able to utilize only the total number of actions and the total response time for each part due to the constraints of the available data. We do not know the content of the actions, which could reflect mistakes, help-seeking behaviors, or task-related behaviors. Likewise, what occurred during the time participants took to respond is unknown. Furthermore, the response time indicators in this study were calculated from the last time students worked on an item. Thus, they might not always correspond to the total time spent on an item, for example when students moved back and forth across items (OECD, 2019). If more information (e.g., at the item level) were provided and utilized, response sequences within and across test units could be investigated to better understand the behavioral patterns (e.g., Han & Wilson, 2022; He et al., 2021; He et al., 2022). It might also be possible to examine the specific thresholds and upper limits for effective behavioral patterns with respect to the shared mental models and the extent of their similarities and dissimilarities. Triangulations based on more diverse types of process data can also be considered by using additional apparatuses (e.g., eye movements, Maddox et al., 2018; chats, Shaffer et al., 2016) to validate the results (and the process data used) and to gain a better understanding of the behaviors (Maddox, 2023; Zumbo et al., 2023). Such research should be grounded in theoretical and analytical considerations of process data, including the validity, fairness, and reliability of their use and the given assessment settings, purposes, and participants, established in the design and development stage of the assessment (He et al., 2021, 2022; Maddox, 2023; Zumbo et al., 2023).

We conducted an LPA to shed light on students’ ColPS behavioral patterns, based on two types of process data (i.e., response times and the number of actions) measured in the PISA 2015 ColPS assessment, specifically for those who (a) took the assessment in English and (b) had the Xandar unit as the first cluster (N = 2,520). The results confirmed the presence of two distinguishable groups of students (i.e., Profile 1 [95%] vs. Profile 2 [5%]) with different behavioral patterns. We further confirmed that the disparities in behaviors were statistically significantly related to students’ overall ColPS performance. More organized and goal-driven behaviors (i.e., less time and fewer actions) were associated with better performance than the trial-and-error-based approach (i.e., more time and more actions) across the Xandar unit of PISA 2015. While further research is needed with more diverse populations and types of process data, different tasks and covariates, and more contextual information for validation and triangulation, these exploratory findings provide initial insights into successful ColPS, extend the relevant literature, and can thus serve researchers, policymakers, and practitioners.

Data Availability

The data sets generated and/or analyzed during the current study are available in the PISA 2015 repository, https://www.oecd.org/pisa/data/2015database/ .

In the main PISA 2015 assessment, there were two modes: paper-based and computer-based. The domain of ColPS was included in only the computer-based mode (see OECD, 2017c ).

Note that English-speaking samples from Scotland, Malaysia, and Cyprus were not included in the current study. Reviewing the data and information, we confirmed that Scotland is part of the United Kingdom sample. The Cyprus data were not in the PISA international database and would have had to be requested ad hoc from the Cypriot authorities. We decided not to pursue this avenue because we were concerned about the quality of the process data, since that data set has not undergone the same level of scrutiny that is standard for the international database, and about reproducibility, since researchers would not be able to replicate our findings. Lastly, Malaysia was reported alongside Kazakhstan and Argentina separately from the other countries, with a note in the PISA reports and publications, because coverage in Malaysia was deemed too small to ensure comparability. Thus, in line with the OECD recommendations, we decided not to include Malaysia because response rate problems may mean that the sample is selective and biased.

Note that we conducted sensitivity analyses on three data sets that differed in the method applied to exclude outliers. In addition to the robust Mahalanobis distance estimation, we implemented the basic Mahalanobis distance estimation from which 92 outliers were identified (i.e., the final sample had N  = 2,967; Leys et al., 2018 ), and we deleted six inactive students without applying any outlier detection procedure (i.e., the final sample had N  = 3,059). Comparing the fit indices between the different data sets, the results of the robust Mahalanobis distance estimation showed the best indices in Step 1 of the three-step approach. Consequently, in further analytic steps, we decided to use the data set from which we excluded the outliers on the basis of the robust estimation.

Note that we also tried the BCH approach (see Asparouhov & Muthén, 2021 ; Bakk & Vermunt, 2016 ), a more advanced method, but it failed with errors in Step 3. Accordingly, we stuck to the three-step approach. For a comprehensive discussion of the BCH and the three-step approaches, see Nylund-Gibson et al. ( 2019 ).

Andrews-Todd, J. J., Kerr, D., Mislevy, R. J., von Davier, A., Hao, J., & Liu, L. (2017). Modeling collaborative interaction patterns in a simulation-based task. Journal of Educational Measurement , 54 (1), 54–69. https://doi.org/10.1111/jedm.12132 .


Andrews-Todd, J. J., Forsyth, C., Steinberg, J., & Rupp, A. (2018). Identifying profiles of collaborative problem solvers in an online electronics environment. Proceedings of the 11th International Conference on Educational Data Mining (EDM’18), 16–18 July 2018, Raleigh, NC, USA (pp. 239–245). https://eric.ed.gov/?id=ED593219 .

Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling: A Multidisciplinary Journal , 12 (3), 411–434. https://doi.org/10.1207/s15328007sem1203_4 .

Asparouhov, T., & Muthén, B. (2010). Plausible values for latent variables using Mplus https://www.statmodel.com/download/Plausible.pdf .

Asparouhov, T., & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal , 21 (3), 329–341. https://doi.org/10.1080/10705511.2014.915181 .

Asparouhov, T., & Muthén, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal , 22 (2), 169–177. https://doi.org/10.1080/10705511.2014.935844 .

Asparouhov, T., & Muthén, B. (2021). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary secondary model . https://www.statmodel.com/examples/webnotes/webnote21.pdf .

Bakk, Z., & Vermunt, J. K. (2016). Robustness of stepwise latent class modeling with continuous distal outcomes. Structural Equation Modeling: A Multidisciplinary Journal , 23 (1), 20–31. https://doi.org/10.1080/10705511.2014.955104 .

Baraldi, A. N., & Enders, C. K. (2013). Missing data methods. In T. D. Little (Ed.), The Oxford handbook of quantitative methods: Statistical analysis (pp. 635–664). Oxford University Press.

Berlin, K. S., Williams, N. A., & Parra, G. R. (2014). An introduction to latent variable mixture modeling (part 1): Overview and cross-sectional latent class and latent profile analyses. Journal of Pediatric Psychology , 39 (2), 174–187. https://doi.org/10.1093/jpepsy/jst084 .

Borgonovi, F., & Biecek, P. (2016). An international comparison of students’ ability to endure fatigue and maintain motivation during a low-stakes test. Learning and Individual Differences , 49 , 128–137. https://doi.org/10.1016/j.lindif.2016.06.001 .

Borgonovi, F., & Pokropek, M. (2021). The evolution of the association between ICT use and reading achievement in 28 countries. Computers and Education Open , 2 , 1–13. https://doi.org/10.1016/j.caeo.2021.100047 .

Burns, E. C., Collie, R. J., Bergen, P. V., & Martin, A. J. (2022). Intrapersonal and interpersonal psychosocial adjustment resources and achievement: A multilevel latent profile analysis of students and schools. Journal of Educational Psychology, 114(8), 1912–1930 . https://doi.org/10.1037/edu0000726 .

Cannon-Bowers, J. A., Salas, E., & Converse, S. (1993). Shared mental models in expert team decision making. In N. J. Castellan Jr. (Ed.), Individual and group decision making: Current issues (pp. 221–246). Lawrence Erlbaum Associates Publishers.

Chang, C. J., Chang, M. H., Chiu, B. C., Liu, C. C., Fan Chiang, S. H., Wen, C. T., Hwang, F. K., Wu, Y. T., Chao, P. Y., Lai, C. H., Wu, S. W., Chang, C. K., & Chen, W. (2017). An analysis of student collaborative problem solving activities mediated by collaborative simulations. Computers & Education , 114 , 222–235. https://doi.org/10.1016/j.compedu.2017.07.008 .

Child, S. F. J., & Shaw, S. (2019). Towards an operational framework for establishing and assessing collaborative interactions. Research Papers in Education , 34 (3), 276–297. https://doi.org/10.1080/02671522.2018.1424928 .

Chung, G. K. W. K., O’Neil, H. F., & Herl, H. E. (1999). The use of computer-based collaborative knowledge mapping to measure team processes and team outcomes. Computers in Human Behavior , 15 (3–4), 463–493. https://doi.org/10.1016/S0747-5632(99)00032-1 .

Collie, R. J., Malmberg, L. E., Martin, A. J., Sammons, P., & Morin, A. J. S. (2020). A multilevel person-centered examination of teachers’ workplace demands and resources: Links with work-related well-being. Frontiers in Psychology , 11 , 1–19. https://doi.org/10.3389/fpsyg.2020.00626 .

De Boeck, P., & Scalise, K. (2019). Collaborative problem solving: Processing actions, time, and performance. Frontiers in Psychology , 10 , 1–9. https://doi.org/10.3389/fpsyg.2019.01280 .

Dindar, M., Järvelä, S., Nguyen, A., Haataja, E., & Çini, A. (2022). Detecting shared physiological arousal events in collaborative problem solving. Contemporary Educational Psychology , 69 , 1–13. https://doi.org/10.1016/j.cedpsych.2022.102050 .

Dowell, N. M. M., Nixon, T. M., & Graesser, A. C. (2018). Group communication analysis: A computational linguistics approach for detecting sociocognitive roles in multiparty interactions. Behavior Research Methods , 51 , 1007–1041. https://doi.org/10.3758/s13428-018-1102-z .

Dziak, J. J., Bray, B. C., Zhang, J., Zhang, M., & Lanza, S. T. (2016). Comparing the performance of improved classify-analyze approaches for distal outcomes in latent profile analysis. Methodology , 12 (4), 107–116. https://doi.org/10.1027/1614-2241/a000114 .

Ercikan, K., Guo, H., & He, Q. (2020). Use of response process data to inform group comparisons and fairness research. Educational Assessment , 25 (3), 179–197. https://doi.org/10.1080/10627197.2020.1804353 .

Fiore, S. M., & Schooler, J. W. (2004). Process mapping and shared cognition: Teamwork and the development of shared problem models. In E. Salas & S. M. Fiore (Eds.), Team cognition: Understanding the factors that drive process and performance (pp. 133–152). American Psychological Association. https://doi.org/10.1037/10690-007 .

Geiser, C. (2012). Data analysis with Mplus . Guilford publications.

Gergle, D., Kraut, R. E., & Fussell, S. R. (2013). Using visual information for grounding and awareness in collaborative tasks. Human-Computer Interaction , 28 (1), 1–39. https://doi.org/10.1080/07370024.2012.678246 .

Goldhammer, F., & Zehner, F. (2017). What to make of and how to interpret process data. Measurement: Interdisciplinary Research and Perspectives , 15 (3–4), 128–132. https://doi.org/10.1080/15366367.2017.1411651 .

Goldhammer, F., Naumann, J., Stelter, A., Tóth, K., Rölke, H., & Klieme, E. (2014). The time on task effect in reading and problem solving is moderated by task difficulty and skill: Insights from a computer-based large-scale assessment. Journal of Educational Psychology , 106 (3), 608–626. https://doi.org/10.1037/a0034716 .

Goldhammer, F., Hahnel, C., & Kroehne, U. (2020). Analysing log file data from PIAAC. In D. Maehler & B. Rammstedt (Eds.), Large-scale cognitive assessment: Analyzing PIAAC data (pp. 239–269). Springer. https://doi.org/10.1007/978-3-030-47515-4_10 .

Goldhammer, F., Hahnel, C., Kroehne, U., & Zehner, F. (2021). From byproduct to design factor: On validating the interpretation of process indicators based on log data. Large-Scale Assessments in Education , 9 , 1–25. https://doi.org/10.1186/s40536-021-00113-5 .

Gottschling, J., Krieger, F., & Greiff, S. (2022). The fight against infectious Diseases: The essential role of higher-order thinking and problem-solving. Journal of Intelligence , 10 (1), 1–8. https://doi.org/10.3390/jintelligence10010014 .

Graesser, A. C., Fiore, S. M., Greiff, S., Andrews-Todd, J., Foltz, P. W., & Hesse, F. W. (2018). Advancing the science of collaborative problem solving. Psychological Science in the Public Interest , 19 (2), 59–92. https://doi.org/10.1177/1529100618808244 .

Greiff, S., & Borgonovi, F. (2022). Teaching of 21st century skills needs to be informed by psychological research. Nature Reviews Psychology , 1 , 314–315. https://doi.org/10.1038/s44159-022-00064-w .

Greiff, S., Molnár, G., Martin, R., Zimmermann, J., & Csapó, B. (2018). Students’ exploration strategies in computer-simulated complex problem environments: A latent class approach. Computers & Education , 126 , 248–263. https://doi.org/10.1016/j.compedu.2018.07.013 .

Guerra-Peña, K., & Steinley, D. (2016). Extracting spurious latent classes in growth mixture modeling with nonnormal errors. Educational and Psychological Measurement , 76 (6), 933–953. https://doi.org/10.1177/0013164416633735 .

Guilford, J. P. (1942). Fundamental statistics in psychology and education . McGraw-Hill.

Han, Y., & Wilson, M. (2022). Analyzing student response processes to evaluate success on a technology-based problem-solving task. Applied Measurement in Education , 35 (1), 33–45. https://doi.org/10.1080/08957347.2022.2034821 .

Han, A., Krieger, F., & Greiff, S. (2021). Collaboration analytics need more comprehensive models and methods. An opinion paper. Journal of Learning Analytics , 8 (1), 13–29. https://doi.org/10.18608/jla.2021.7288 .

Han, A., Krieger, F., & Greiff, S. (2023). Assessment of collaborative problem-solving: Past achievements and current challenges. In R. J. Tierney, F. Rizvi, & K. Erkican (Eds.), International Encyclopedia of Education (4th ed., pp. 234–244). Elsevier. https://doi.org/10.1016/B978-0-12-818630-5.09041-2 .

Hao, J., & Mislevy, R. J. (2019). Characterizing interactive communications in computer-supported collaborative problem-solving tasks: A conditional transition profile approach. Frontiers in Psychology , 10 , 1–9. https://doi.org/10.3389/fpsyg.2019.01011 .

He, Q., Borgonovi, F., & Paccagnella, M. (2021). Leveraging process data to assess adults’ problem-solving skills: Using sequence mining to identify behavioral patterns across digital tasks. Computers & Education , 166 , 1–14. https://doi.org/10.1016/j.compedu.2021.104170 .

He, Q., Borgonovi, F., & Suárez-Álvarez, J. (2022). Clustering sequential navigation patterns in multiple-source reading tasks with dynamic time warping method. Journal of Computer Assisted Learning , 1–18. https://doi.org/10.1111/jcal.12748 .

Herborn, K., Mustafić, M., & Greiff, S. (2017). Mapping an experiment-based assessment of collaborative behavior onto collaborative problem solving in PISA 2015: A cluster analysis approach for collaborator profiles. Journal of Educational Measurement , 54 (1), 103–122. https://doi.org/10.1111/jedm.12135 .

Hickendorff, M., Edelsbrunner, P. A., McMullen, J., Schneider, M., & Trezise, K. (2018). Informative tools for characterizing individual differences in learning: Latent class, latent profile, and latent transition analysis. Learning and Individual Differences , 66 , 4–15. https://doi.org/10.1016/j.lindif.2017.11.001 .

Hu, L., & Chen, G. (2022). Exploring turn-taking patterns during dialogic collaborative problem solving. Instructional Science , 50 , 63–88. https://doi.org/10.1007/s11251-021-09565-2 .

Hubley, A. M., & Zumbo, B. D. (2017). Response processes in the context of validity: Setting the stage. In B. D. Zumbo & A. M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 1–12). Springer. https://doi.org/10.1007/978-3-319-56129-5_1 .

Jeong, H., & Chi, M. T. H. (2007). Knowledge convergence and collaborative learning. Instructional Science , 35 (4), 287–315. http://www.jstor.org/stable/41953741 .

Kang, J., An, D., Yan, L., & Liu, M. (2019). Collaborative problem-solving process in a science serious game: Exploring group action similarity trajectory. Proceedings of the 12th International Conference on Educational Data Mining , 336–341. https://files.eric.ed.gov/fulltext/ED599182.pdf .

Kapur, M. (2011). Temporality matters: Advancing a method for analyzing problem-solving processes in a computer-supported collaborative environment. International Journal of Computer-Supported Collaborative Learning , 6 , 39–56. https://doi.org/10.1007/s11412-011-9109-9 .

Khorramdel, L., von Davier, M., Gonzalez, E., & Yamamoto, K. (2020). Plausible values: Principles of item response theory and multiple imputations. In D. B. Maehler & B. Rammstedt (Eds.), Large-scale cognitive assessment: Analyzing PIAAC Data (pp. 27–47). Springer. https://doi.org/10.1007/978-3-030-47515-4_3 .

Kozlowski, S. W. J., & Chao, G. T. (2012). The dynamics of emergence: Cognition and cohesion in work teams. Managerial & Decision Economics , 33 (5–6), 335–354. https://doi.org/10.1002/mde.2552 .

Leys, C., Klein, O., Dominicy, Y., & Ley, C. (2018). Detecting multivariate outliers: Use a robust variant of the mahalanobis distance. Journal of Experimental Social Psychology , 74 , 150–156. https://doi.org/10.1016/j.jesp.2017.09.011 .

Li, Z., Banerjee, J., & Zumbo, B. D. (2017). Response time data as validity evidence: Has it lived up to its promise and, if not, what would it take to do so. In B. D. Zumbo & A. M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 159–177). Springer. https://doi.org/10.1007/978-3-319-56129-5_9 .

Li, C. H., Tsai, P. L., Liu, Z. Y., Huang, W. C., & Hsieh, P. J. (2021). Exploring collaborative problem solving behavioral transition patterns in science of Taiwanese students at age 15 according to mastering levels. Sustainability , 13 (15), 1–15. https://doi.org/10.3390/su13158409 .

Lim, B. C., & Klein, K. J. (2006). Team mental models and team performance: A field study of the effects of team mental model similarity and accuracy. Journal of Organizational Behavior , 27 (4), 403–418. https://doi.org/10.1002/job.387 .

Maddox, B. (2023). The uses of process data in large-scale educational assessments (OECD Education Working Paper No. 286). https://doi.org/10.1787/5d9009ff-en .

Maddox, B., Bayliss, A. P., Fleming, P., Engelhardt, P. E., Edwards, S. G., & Borgonovi, F. (2018). Observing response processes with eye tracking in international large-scale assessment: Evidence from the OECD PIAAC assessment. European Journal of Psychology of Education , 33 , 543–558. https://doi.org/10.1007/s10212-018-0380-2 .

Magidson, J., & Vermunt, J. K. (2002). A nontechnical introduction to latent class models . https://www.statisticalinnovations.com/wp-content/uploads/Magidson2002.pdf .

Meyer, J. P., & Morin, A. J. S. (2016). A person-centered approach to commitment research: Theory, research, and methodology. Journal of Organizational Behavior , 37 (4), 584–612. https://doi.org/10.1002/job.2085 .

Morin, A. J. S., & Marsh, H. W. (2015). Disentangling shape from level effects in person-centered analyses: An illustration based on university teachers’ multidimensional profiles of effectiveness. Structural Equation Modeling: A Multidisciplinary Journal , 22 (1), 39–59. https://doi.org/10.1080/10705511.2014.919825 .

Morin, A. J. S., Morizot, J., Boudrias, J. S., & Madore, I. (2011). A multifoci person-centered perspective on workplace affective commitment: A latent profile/factor mixture analysis. Organizational Research Methods , 14 (1), 58–90. https://doi.org/10.1177/1094428109356476 .

Morin, A. J. S., Meyer, J. P., Creusier, J., & Biétry, F. (2016). Multiple-group analysis of similarity in latent profile solutions. Organizational Research Methods , 19 (2), 231–254. https://doi.org/10.1177/1094428115621148 .

Mulder, I. (1999). Understanding technology-mediated interaction processes – A theoretical context . Telematica Instituut. https://www.researchgate.net/profile/Ingrid-Mulder/publication/264971258_Understanding_technology_mediated_interaction_processes_a_theoretical_context/links/53f79a730cf2c9c3309c3c46/Understanding-technology-mediated-interaction-processes-a-theoretical-context.pdf .

Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide (8th ed.) Muthén & Muthén.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME]. (2014). Standards for educational and psychological testing . American Educational Research Association.

Newton, O., Wiltshire, T. J., & Fiore, S. M. (2018). Macrocognition in teams and metacognition: Developing instructional strategies for complex collaborative problem solving. In J. Johnston, R. Sottilare, A. M. Sinatra, & C. S. Burke (Eds.), Building intelligent tutoring systems for teams: What matters (Vol. 19, pp. 33–54). Emerald Publishing. https://doi.org/10.1108/S1534-085620180000019006 .

Nylund-Gibson, K., & Choi, A. Y. (2018). Ten frequently asked questions about latent class analysis. Translational Issues in Psychological Science , 4 (4), 440–461. https://doi.org/10.1037/tps0000176 .

Nylund-Gibson, K., Grimm, R. P., & Masyn, K. E. (2019). Prediction from latent classes: A demonstration of different approaches to include distal outcomes in mixture models. Structural Equation Modeling: A Multidisciplinary Journal , 26 (6), 967–985. https://doi.org/10.1080/10705511.2019.1590146 .

Organization for Economic Co-operation and Development (2019). PISA 2018 technical Report . https://www.oecd.org/pisa/data/pisa2018technicalreport/PISA2018-TechReport-Annex-K.pdf .

Organization for Economic Co-operation and Development (2017c). PISA 2015 technical Report . https://www.oecd.org/pisa/data/2015-technical-report/PISA2015_TechRep_Final.pdf .

Organization for Economic Co-operation and Development (2017b). PISA 2015 results: Collaborative problem solving (Volume V) . https://doi.org/10.1787/9789264285521-en .

Organization for Economic Co-operation and Development (2017a). PISA 2015 collaborative problem-solving framework . https://www.oecd.org/pisa/pisaproducts/Draft%20PISA%202015%20Collaborative%20Problem%20Solving%20Framework%20.pdf .

Organization for Economic Co-operation and Development (2016). Description of the released unit from the 2015 PISA collaborative problem-solving assessment, collaborative problem-solving skills, and proficiency levels . https://www.oecd.org/pisa/test/CPS-Xandar-scoring-guide.pdf .

Organization for Economic Co-operation and Development (2009). PISA data analysis manual: SPSS, second edition . https://doi.org/10.1787/9789264056275-en .

Organization for Economic Co-operation and Development (2023). Item characteristics and test-taker disengagement in PISA . https://one.oecd.org/document/EDU/PISA/GB(2023)5/en/pdf .

Pastor, D. A., Barron, K. E., Miller, B. J., & Davis, S. L. (2007). A latent profile analysis of college students’ achievement goal orientation. Contemporary Educational Psychology , 32 (1), 8–47. https://doi.org/10.1016/j.cedpsych.2006.10.003 .

Popov, V., Biemans, H. J. A., Fortuin, K. P. J., van Vliet, A., Erkens, G., Mulder, M., Jaspers, J., & Li, Y. (2019). Effects of an interculturally enriched collaboration script on student attitudes, behavior, and learning performance in a CSCL environment. Learning, Culture and Social Interaction , 21 , 100–123. https://doi.org/10.1016/j.lcsi.2019.02.004 .

Pöysä–Tarhonen, J., Häkkinen, P., Tarhonen, P., Näykki, P., & Järvelä, S. (2022). Anything taking shape? Capturing various layers of small group collaborative problem solving in an experiential geometry course in initial teacher education. Instructional Science , 50 , 1–34. https://doi.org/10.1007/s11251-021-09562-5 .

R Core Team. (2022). R: A language and environment for statistical computing . R Foundation for Statistical Computing. https://www.R-project.org/ .

Rohatgi, A., & Scherer, R. (2020). Identifying profiles of students’ school climate perceptions using PISA 2015 data. Large-scale Assessments in Education , 8 , 1–25. https://doi.org/10.1186/s40536-020-00083-0 .

Rojas, M., Nussbaum, M., Chiuminatto, P., Guerrero, O., Greiff, S., Krieger, F., & Van Der Westhuizen, L. (2021). Assessing collaborative problem-solving skills among elementary school students. Computers & Education , 175 , 1–45. https://doi.org/10.1016/j.compedu.2021.104313 .

Rosen, Y., & Tager, M. (2013). Computer-based assessment of collaborative problem solving skills: Human-to-agent versus human-to-human approach . Pearson Education.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys . John Wiley & Sons, Inc.

Rutkowski, L., Gonzalez, E., Joncas, M., & Von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher , 39 (2), 141–151. https://doi.org/10.3102/0013189X10363170 .

Scalise, K., Mustafic, M., & Greiff, S. (2016). Dispositions for collaborative problem solving. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 283–299). Springer. https://doi.org/10.1007/978-3-319-45357-6_11 .

Scherer, R. (2020). Analysing PIAAC data with structural equation modelling in Mplus. In D. Maehler & B. Rammstedt (Eds.), Large-scale cognitive assessment: Analyzing PIAAC data (pp. 165–208). Springer. https://doi.org/10.1007/978-3-030-47515-4_8 .

Scherer, R., Greiff, S., & Hautamäki, J. (2015). Exploring the relation between Time on Task and ability in Complex Problem solving. Intelligence , 48 , 37–50. https://doi.org/10.1016/j.intell.2014.10.003 .

Scherer, R., Rohatgi, A., & Hatlevik, O. E. (2017). Students’ profiles of ICT use: Identification, determinants, and relations to achievement in a computer and information literacy test. Computers in Human Behavior , 70 , 486–499. https://doi.org/10.1016/j.chb.2017.01.034 .

Shaffer, D. W., Collier, W., & Ruis, A. R. (2016). A tutorial on epistemic network analysis: Analyzing the structure of connections in cognitive, social, and interaction data. Journal of Learning Analytics , 3 (3), 9–45. https://doi.org/10.18608/jla.2016.33.3 .

Siddiq, F., & Scherer, R. (2017). Revealing the processes of students’ interaction with a novel collaborative problem solving task: An in-depth analysis of think-aloud protocols. Computers in Human Behavior , 76 , 509–525. https://doi.org/10.1016/j.chb.2017.08.007 .

Sinha, P., Calfee, C. S., & Delucchi, K. L. (2021). Practitioner’s guide to latent class analysis: Methodological considerations and common pitfalls. Critical Care Medicine , 49 (1), 63–79. https://doi.org/10.1097/CCM.0000000000004710 .

Song, J. (2021). Beyond the results: Identifying students’ problem solving processes on a problem solving task [Master’s thesis, University of Oslo]. http://hdl.handle.net/10852/86870 .

Spurk, D., Hirschi, A., Wang, M., Valero, D., & Kauffeld, S. (2020). Latent profile analysis: A review and how to guide of its application within vocational behavior research. Journal of Vocational Behavior , 120 , 1–21. https://doi.org/10.1016/j.jvb.2020.103445 .

Sun, C., Shute, V. J., Stewart, A., Beck-White, Q., Reinhardt, C. R., Zhou, G., Duran, N., & D’Mello, S. K. (2022). The relationship between collaborative problem solving behaviors and solution outcomes in a game-based learning environment. Computers in Human Behavior , 128 , 1–14. https://doi.org/10.1016/j.chb.2021.107120 .

Swiecki, A., Ruis, A. R., Farrell, C., & Shaffer, D. W. (2020). Assessing individual contributions to collaborative problem solving: A network analysis approach. Computers in Human Behavior , 104 , 1–15. https://doi.org/10.1016/j.chb.2019.01.009 .

Tang, P., Liu, H., & Wen, H. (2021). Factors predicting collaborative problem solving: Based on the data from PISA 2015. Frontiers in Education , 6 , 1–10. https://doi.org/10.3389/feduc.2021.619450 .

Teig, N., Scherer, R., & Kjærnsli, M. (2020). Identifying patterns of students’ performance on simulated inquiry tasks using PISA 2015 log-file data. Journal of Research in Science Teaching , 57 (9), 1400–1429. https://doi.org/10.1002/tea.21657 .

Van den Bossche, P., Gijselaers, W., Segers, M., Woltjer, G., & Kirschner, P. (2011). Team learning: Building shared mental models. Instructional Science , 39 , 283–301. https://doi.org/10.1007/s11251-010-9128-3 .

Von Davier, A. A., & Halpin, P. F. (2013). Collaborative problem solving and the assessment of cognitive skills: Psychometric considerations. ETS Research Report Series, 2013 (2), i–36. https://doi.org/10.1002/j.2333-8504.2013.tb02348.x .

Vermunt, J. K., & Magidson, J. (2002). Latent class cluster analysis. In J. A. Aagenaars, & A. L. McCutcheon (Eds.), Applied latent class analysis (pp. 89–106). Cambridge University Press. https://doi.org/10.1017/CBO9780511499531.004 .

Wang, J., & Wang, X. (2019). Structural equation modeling: Applications using Mplus . John Wiley & Sons.

Wilson, A. S. P., & Urick, A. (2022). An intersectional examination of the opportunity gap in science: A critical quantitative approach to latent class analysis. Social Science Research , 102 , 1–21. https://doi.org/10.1016/j.ssresearch.2021.102645 .

Wise, S., Pastor, D. A., & Kong, X. J. (2009). Correlates of rapid-guessing behavior in low-stakes testing: Implications for test development and measurement practice. Applied Measurement in Education , 22 (2), 185–205. https://doi.org/10.1080/08957340902754650 .

Wu, Y., Zhao, B., Wei, B., & Li, Y. (2022). Cultural or economic factors? Which matters more for collaborative problem-solving skills: Evidence from 31 countries. Personality and Individual Differences , 190 , 1–10. https://doi.org/10.1016/j.paid.2021.111497 .

Xu, K. M., Cunha-Harvey, A. R., King, R. B., De Koning, B. B., Paas, F., Baars, M., et al. (2021). A cross-cultural investigation on perseverance, self-regulated learning, motivation, and achievement. Compare: A Journal of Comparative and International Education , 53 (3), 361–379. https://doi.org/10.1080/03057925.2021.1922270 .

Yamashita, T., Smith, T. J., & Cummins, P. A. (2020). A practical guide for analyzing large-scale assessment data using Mplus: A case demonstration using the program for international assessment of adult competencies data. Journal of Educational and Behavioral Statistics , 46 (4), 501–518. https://doi.org/10.3102/1076998620978554 .

Yi, H. S., & Lee, Y. (2017). A latent profile analysis and structural equation modeling of the instructional quality of mathematics classrooms based on the PISA 2012 results of Korea and Singapore. Asia Pacific Education Review , 18 , 23–39. https://doi.org/10.1007/s12564-016-9455-4 .

Zhu, M., Shu, Z., & von Davier, A. A. (2016). Using networks to visualize and analyze process data for educational assessment. Journal of Educational Measurement , 53 (2), 190–211. https://doi.org/10.1111/jedm.12107 .

Zumbo, B., Maddox, B., & Care, N. M. (2023). Process and product in computer-based assessments: Clearing the ground for a holistic validity framework. European Journal of Psychological Assessment, 39, 252–262. https://doi.org/10.1027/1015-5759/a000748 .

Acknowledgements

The authors are grateful to Dr. Emma Burns and Jayeong Song for sharing their valuable code and for their comments during this research.

The current study was not funded.

Author information

Authors and affiliations

Department of Behavioral and Cognitive Sciences, University of Luxembourg, Esch-sur-Alzette, Luxembourg

Areum Han & Samuel Greiff

Department of Rehabilitation Sciences, Technical University of Dortmund, Dortmund, Germany

Florian Krieger

Institute of Education, University College London, London, United Kingdom

Francesca Borgonovi

Contributions

Areum Han: Conceptualization, Methodology, Analysis, Writing – Original Draft, Writing – Review and Editing. Florian Krieger: Conceptualization, Writing – Review and Editing, Supervision. Francesca Borgonovi: Conceptualization, Methodology, Writing – Review and Editing. Samuel Greiff: Conceptualization, Writing – Review and Editing, Supervision. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Areum Han .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Han, A., Krieger, F., Borgonovi, F. et al. Behavioral patterns in collaborative problem solving: a latent profile analysis based on response times and actions in PISA 2015. Large-scale Assess Educ 11 , 35 (2023). https://doi.org/10.1186/s40536-023-00185-5

Received : 06 June 2023

Accepted : 30 October 2023

Published : 13 November 2023

DOI : https://doi.org/10.1186/s40536-023-00185-5

  • Latent profile analysis
  • Collaborative problem solving
  • Process data
  • Human-to-agent assessment

Challenges of Assessing Collaborative Problem Solving

  • First Online: 09 November 2017

  • Arthur C. Graesser 6 ,
  • Peter W. Foltz 7 ,
  • Yigal Rosen 8 ,
  • David Williamson Shaffer 9 ,
  • Carol Forsyth 6 &
  • Mae-Lynn Germany 6  

Part of the book series: Educational Assessment in an Information Age ((EAIA))

An assessment of Collaborative Problem Solving (CPS) proficiency was developed by an expert group for the PISA 2015 international evaluation of student skills and knowledge. The assessment framework defined CPS skills by crossing three major CPS competencies with four problem solving processes that were adopted from PISA 2012 Complex Problem Solving to form a matrix of 12 specific skills . The three CPS competencies are (1) establishing and maintaining shared understanding, (2) taking appropriate action, and (3) establishing and maintaining team organization. For the assessment, computer-based agents provide the means to assess students by varying group composition and discourse across multiple collaborative situations within a short period of time. Student proficiency is then measured by the extent to which students respond to requests and initiate actions or communications to advance the group goals. This chapter identifies considerations and challenges in the design of a collaborative problem solving assessment for large-scale testing.
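As a purely illustrative aid, the 12-cell skill matrix can be enumerated by crossing the three competencies with the four problem solving processes; the minimal R sketch below assumes the standard PISA 2012 process labels (exploring and understanding, representing and formulating, planning and executing, monitoring and reflecting).

```r
# Minimal sketch of the 12-cell CPS skill matrix: three collaboration
# competencies crossed with the four PISA 2012 problem-solving processes.
# Labels follow the published frameworks; the data structure is illustrative.
competencies <- c("Establishing and maintaining shared understanding",
                  "Taking appropriate action to solve the problem",
                  "Establishing and maintaining team organisation")
processes <- c("Exploring and understanding",
               "Representing and formulating",
               "Planning and executing",
               "Monitoring and reflecting")

skill_matrix <- expand.grid(process = processes,
                            competency = competencies,
                            stringsAsFactors = FALSE)
nrow(skill_matrix)  # 12 specific skills
```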

Author information

Authors and affiliations

University of Memphis, Memphis, TN, USA

Arthur C. Graesser, Carol Forsyth & Mae-Lynn Germany

University of Colorado, Boulder, and Pearson, Boulder, CO, USA

Peter W. Foltz

Harvard University, Cambridge, MA, USA

Yigal Rosen

University of Wisconsin, Madison, WI, USA

David Williamson Shaffer

Corresponding author

Correspondence to Arthur C. Graesser .

Editor information

Editors and affiliations

Brookings Institution, Washington DC, Washington, USA

Esther Care

Melbourne Graduate School of Education, University of Melbourne, Parkville, Victoria, Australia

Patrick Griffin

University of California, Berkeley, California, USA

Mark Wilson

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Graesser, A.C., Foltz, P.W., Rosen, Y., Shaffer, D.W., Forsyth, C., Germany, M.L. (2018). Challenges of Assessing Collaborative Problem Solving. In: Care, E., Griffin, P., Wilson, M. (eds) Assessment and Teaching of 21st Century Skills. Educational Assessment in an Information Age. Springer, Cham. https://doi.org/10.1007/978-3-319-65368-6_5

DOI : https://doi.org/10.1007/978-3-319-65368-6_5

Published : 09 November 2017

Publisher Name : Springer, Cham

Print ISBN : 978-3-319-65366-2

Online ISBN : 978-3-319-65368-6

  • Review Article
  • Open access
  • Published: 11 January 2023

The effectiveness of collaborative problem solving in promoting students’ critical thinking: A meta-analysis based on empirical literature

  • Enwei Xu   ORCID: orcid.org/0000-0001-6424-8169 1 ,
  • Wei Wang 1 &
  • Qingxia Wang 1  

Humanities and Social Sciences Communications volume  10 , Article number:  16 ( 2023 ) Cite this article

  • Science, technology and society

Collaborative problem-solving has been widely embraced in the classroom instruction of critical thinking, which is regarded as the core of curriculum reform based on key competencies in the field of education as well as a key competence for learners in the 21st century. However, the effectiveness of collaborative problem-solving in promoting students’ critical thinking remains uncertain. This research presents the major findings of a meta-analysis of 36 empirical studies published in international educational journals during the 21st century, conducted to identify the effectiveness of collaborative problem-solving in promoting students’ critical thinking and to determine, based on evidence, whether and to what extent collaborative problem-solving can result in a rise or decrease in critical thinking. The findings show that (1) collaborative problem-solving is an effective teaching approach to foster students’ critical thinking, with a significant overall effect size (ES = 0.82, z = 12.78, P < 0.01, 95% CI [0.69, 0.95]); (2) with respect to the dimensions of critical thinking, collaborative problem-solving can significantly enhance students’ attitudinal tendencies (ES = 1.17, z = 7.62, P < 0.01, 95% CI [0.87, 1.47]), whereas it falls short in terms of improving students’ cognitive skills, having only an upper-middle impact (ES = 0.70, z = 11.55, P < 0.01, 95% CI [0.58, 0.82]); and (3) the teaching type (χ² = 7.20, P < 0.05), intervention duration (χ² = 12.18, P < 0.01), subject area (χ² = 13.36, P < 0.05), group size (χ² = 8.77, P < 0.05), and learning scaffold (χ² = 9.03, P < 0.01) all have an impact on critical thinking and can be viewed as important moderating factors that affect how critical thinking develops. On the basis of these results, recommendations are made for further research and instruction to better support students’ critical thinking in the context of collaborative problem-solving.

Introduction

Although critical thinking has a long history in research, the concept of critical thinking, which is regarded as an essential competence for learners in the 21st century, has recently attracted more attention from researchers and teaching practitioners (National Research Council, 2012 ). Critical thinking should be the core of curriculum reform based on key competencies in the field of education (Peng and Deng, 2017 ) because students with critical thinking can not only understand the meaning of knowledge but also effectively solve practical problems in real life even after knowledge is forgotten (Kek and Huijser, 2011 ). The definition of critical thinking is not universal (Ennis, 1989 ; Castle, 2009 ; Niu et al., 2013 ). In general, the definition of critical thinking is a self-aware and self-regulated thought process (Facione, 1990 ; Niu et al., 2013 ). It refers to the cognitive skills needed to interpret, analyze, synthesize, reason, and evaluate information as well as the attitudinal tendency to apply these abilities (Halpern, 2001 ). The view that critical thinking can be taught and learned through curriculum teaching has been widely supported by many researchers (e.g., Kuncel, 2011 ; Leng and Lu, 2020 ), leading to educators’ efforts to foster it among students. In the field of teaching practice, there are three types of courses for teaching critical thinking (Ennis, 1989 ). The first is an independent curriculum in which critical thinking is taught and cultivated without involving the knowledge of specific disciplines; the second is an integrated curriculum in which critical thinking is integrated into the teaching of other disciplines as a clear teaching goal; and the third is a mixed curriculum in which critical thinking is taught in parallel to the teaching of other disciplines for mixed teaching training. Furthermore, numerous measuring tools have been developed by researchers and educators to measure critical thinking in the context of teaching practice. These include standardized measurement tools, such as WGCTA, CCTST, CCTT, and CCTDI, which have been verified by repeated experiments and are considered effective and reliable by international scholars (Facione and Facione, 1992 ). In short, descriptions of critical thinking, including its two dimensions of attitudinal tendency and cognitive skills, different types of teaching courses, and standardized measurement tools provide a complex normative framework for understanding, teaching, and evaluating critical thinking.

Cultivating critical thinking in curriculum teaching can start with a problem, and one of the most popular critical thinking instructional approaches is problem-based learning (Liu et al., 2020 ). Duch et al. ( 2001 ) noted that problem-based learning in group collaboration is progressive active learning, which can improve students’ critical thinking and problem-solving skills. Collaborative problem-solving is the organic integration of collaborative learning and problem-based learning, which takes learners as the center of the learning process and uses problems with poor structure in real-world situations as the starting point for the learning process (Liang et al., 2017 ). Students learn the knowledge needed to solve problems in a collaborative group, reach a consensus on problems in the field, and form solutions through social cooperation methods, such as dialogue, interpretation, questioning, debate, negotiation, and reflection, thus promoting the development of learners’ domain knowledge and critical thinking (Cindy, 2004 ; Liang et al., 2017 ).

Collaborative problem-solving has been widely used in the teaching practice of critical thinking, and several studies have attempted to conduct systematic reviews and meta-analyses of the empirical literature on critical thinking from various perspectives. However, little attention has been paid to the impact of collaborative problem-solving on critical thinking. Examining how critical thinking instruction can be implemented within collaborative problem-solving is therefore key to developing and enhancing critical thinking; however, this issue remains largely unexplored, which means that many teachers lack guidance on how to instruct critical thinking more effectively (Leng and Lu, 2020; Niu et al., 2013). For example, Huber (2016) reported meta-analytic findings from 71 publications on gains in critical thinking over various time frames in college, with the aim of determining whether critical thinking is truly teachable. These authors found that learners significantly improve their critical thinking while in college and that critical thinking differs with factors such as teaching strategies, intervention duration, subject area, and teaching type. However, that study did not determine the usefulness of collaborative problem-solving in fostering students’ critical thinking, nor did it reveal whether there were significant variations among the different factors. Liu et al. (2020) conducted a meta-analysis of 31 educational studies to assess the impact of problem-solving on college students’ critical thinking. These authors found that problem-solving could promote the development of critical thinking among college students and proposed establishing a reasonable group structure for problem-solving in a follow-up study to improve students’ critical thinking. Additionally, previous empirical studies have reached inconclusive and even contradictory conclusions about whether and to what extent collaborative problem-solving increases or decreases critical thinking levels. As an illustration, Yang et al. (2008) carried out an experiment on the integrated curriculum teaching of college students based on a web bulletin board, with the goal of fostering participants’ critical thinking in the context of collaborative problem-solving. Their research revealed that through sharing, debating, examining, and reflecting on various experiences and ideas, collaborative problem-solving can considerably enhance students’ critical thinking in real-life problem situations. In contrast, according to research by Naber and Wyatt (2014) and Sendag and Odabasi (2009) on undergraduate and high school students, respectively, collaborative problem-solving had a positive impact on learners’ interaction and could improve learning interest and motivation but could not significantly improve students’ critical thinking compared to traditional classroom teaching.

The above studies show that findings regarding the effectiveness of collaborative problem-solving in promoting students’ critical thinking are inconsistent. Therefore, it is essential to conduct a thorough and trustworthy review to determine whether, and to what degree, collaborative problem-solving results in a rise or decrease in critical thinking. Meta-analysis is a quantitative approach for examining data from separate studies that all focus on the same research topic. It characterizes the magnitude of an effect by averaging the effect sizes of numerous individual studies, thereby reducing the uncertainty associated with any single study and producing more conclusive findings (Lipsey and Wilson, 2001).
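To make the pooling step concrete, the minimal R sketch below (using the metafor package with invented effect sizes, not data from the included studies, and not the RevMan workflow actually used in this meta-analysis) shows how study-level effect sizes and sampling variances are combined under a random-effects model, how heterogeneity is quantified, and how moderator and funnel-plot asymmetry tests can be run.

```r
# Illustrative sketch only: random-effects pooling of standardized mean
# differences from a handful of hypothetical studies.
library(metafor)

# made-up per-study effect sizes (e.g., Hedges' g) and sampling variances
dat <- data.frame(study = paste0("S", 1:6),
                  yi = c(0.95, 0.60, 1.20, 0.40, 0.85, 0.70),
                  vi = c(0.04, 0.06, 0.09, 0.05, 0.07, 0.08))

res <- rma(yi, vi, data = dat, method = "REML")  # random-effects pooling
summary(res)   # pooled ES, 95% CI, z test, Q and I^2 heterogeneity statistics

# moderator (subgroup) analysis, e.g., by a hypothetical intervention duration
dat$duration <- c("short", "long", "long", "short", "long", "short")
rma(yi, vi, mods = ~ duration, data = dat)

# Egger-type regression test for funnel-plot asymmetry (publication bias)
regtest(res)
```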

This paper used a meta-analytic approach and carried out a meta-analysis to examine the effectiveness of collaborative problem-solving in promoting students’ critical thinking in order to make a contribution to both research and practice. The following research questions were addressed by this meta-analysis:

What is the overall effect size of collaborative problem-solving in promoting students’ critical thinking and its impact on the two dimensions of critical thinking (i.e., attitudinal tendency and cognitive skills)?

If the effects reported in the included studies are heterogeneous across experimental designs, which moderating variables account for the disparities between the study conclusions?

This research followed the strict procedures (e.g., database searching, identification, screening, eligibility checking, merging, duplicate removal, and analysis of included studies) of the meta-analysis approach proposed by Cooper (2010) for examining quantitative data from separate studies that all focus on the same research topic. The relevant empirical research published in international educational journals during the 21st century was meta-analyzed using RevMan 5.4. The consistency of the data extracted independently by two researchers was tested using Cohen’s kappa coefficient, and a publication bias test and a heterogeneity test were run on the sample data to ascertain the quality of this meta-analysis.
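As a minimal illustration of the coder-agreement check, Cohen’s kappa can be computed in base R from two coders’ include/exclude decisions; the decision vectors below are invented for illustration.

```r
# Minimal base-R sketch of Cohen's kappa for two coders' screening decisions.
# The decisions are invented and do not correspond to the actual screening.
coder1 <- c("include", "exclude", "include", "include", "exclude",
            "include", "exclude", "exclude", "include", "include")
coder2 <- c("include", "exclude", "include", "exclude", "exclude",
            "include", "exclude", "include", "include", "include")

tab <- table(coder1, coder2)
po  <- sum(diag(tab)) / sum(tab)                      # observed agreement
pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
kappa <- (po - pe) / (1 - pe)
kappa
```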

Data sources and search strategies

There were three stages to the data collection process for this meta-analysis, as shown in Fig. 1, which reports the number of articles included and excluded at each step of the selection process according to the eligibility criteria.

Figure 1. Flowchart of the number of records identified, included, and excluded during article selection.

First, the databases used to search systematically for relevant articles were the Web of Science Core Collection and, within CNKI, the Chinese core journals and the Chinese Social Science Citation Index (CSSCI) source journals. These databases were selected because they are credible platforms of scholarly, peer-reviewed information with advanced search tools and contain literature on our topic from reliable researchers and experts. The search string with Boolean operators used in the Web of Science was “TS = (((“critical thinking” or “ct” and “pretest” or “posttest”) or (“critical thinking” or “ct” and “control group” or “quasi experiment” or “experiment”)) and (“collaboration” or “collaborative learning” or “CSCL”) and (“problem solving” or “problem-based learning” or “PBL”))”. The research area was “Education Educational Research”, and the search period was January 1, 2000, to December 30, 2021; a total of 412 papers were obtained. The search string used in CNKI was “SU = (‘critical thinking’*‘collaboration’ + ‘critical thinking’*‘collaborative learning’ + ‘critical thinking’*‘CSCL’ + ‘critical thinking’*‘problem solving’ + ‘critical thinking’*‘problem-based learning’ + ‘critical thinking’*‘PBL’ + ‘critical thinking’*‘problem oriented’) AND FT = (‘experiment’ + ‘quasi experiment’ + ‘pretest’ + ‘posttest’ + ‘empirical study’)” (translated into Chinese when searching), and a total of 56 studies were found over the same period. All duplicates and retractions were eliminated before the references were exported into EndNote, a bibliographic reference manager; 466 studies remained in all.

Second, the studies that matched the inclusion and exclusion criteria for the meta-analysis were chosen by two researchers after they had reviewed the abstracts and titles of the gathered articles, yielding a total of 126 studies.

Third, the two researchers thoroughly reviewed the full text of each remaining article against the inclusion and exclusion criteria. Meanwhile, a snowball search of the references and citations of the included articles was performed to ensure complete coverage. Ultimately, 36 articles were retained.

Two researchers carried out this entire process together, and a consensus rate of 94.7% was reached after discussion and negotiation to resolve any emerging differences.

Eligibility criteria

Since not all the retrieved studies matched the criteria for this meta-analysis, eligibility criteria for both inclusion and exclusion were developed as follows:

The publication language of the included studies was limited to English and Chinese, and the full text had to be available. Articles that did not meet the language requirement, or that were not published between 2000 and 2021, were excluded.

The research design of the included studies had to be empirical and quantitative, so that the effect of collaborative problem-solving on the development of critical thinking could be assessed. Articles that could not identify the causal mechanism by which collaborative problem-solving affects critical thinking, such as review and theoretical articles, were excluded.

The research method of the included studies had to be a randomized controlled experiment, a quasi-experiment, or a natural experiment; these designs have a higher degree of internal validity and can plausibly provide evidence that collaborative problem-solving and critical thinking are causally related. Articles using non-experimental methods, such as purely correlational or observational studies, were excluded.

The participants of the included studies were restricted to students in school, including K-12 and college students. Articles whose participants were not school students, such as social workers or adult learners, were excluded.

The results of the included studies had to report the statistics needed to gauge the effect on critical thinking (e.g., sample size, mean, and standard deviation). Articles that lacked specific measurement indicators for critical thinking, so that an effect size could not be calculated, were excluded.

Data coding design

To perform a meta-analysis, the most important information must be collected from the articles, its properties codified, and descriptive data converted into quantitative data. This study therefore designed a data coding template (see Table 1); ultimately, 16 coding fields were retained.

The data coding template consisted of three categories of information. The descriptive information covered basic details of each paper: publication year, author, serial number, and title.

The variable information for the experimental design comprised the independent variable (instruction method), the dependent variable (critical thinking), and the moderating variables (learning stage, teaching type, intervention duration, learning scaffold, group size, measuring tool, and subject area). In line with the topic of this study, the intervention strategy, as the independent variable, was coded as collaborative or non-collaborative problem-solving. The dependent variable, critical thinking, was coded as cognitive skills and attitudinal tendency. Seven moderating variables were created by grouping and combining the experimental design variables found in the 36 studies (see Table 1): learning stages were coded as higher education, high school, middle school, and primary school or lower; teaching types as mixed, integrated, and independent courses; intervention durations as 0–1 weeks, 1–4 weeks, 4–12 weeks, and more than 12 weeks; group sizes as 2–3, 4–6, 7–10, and more than 10 persons; learning scaffolds as teacher-supported, technique-supported, and resource-supported; measuring tools as standardized instruments (e.g., WGCTA, CCTT, CCTST, and CCTDI) and self-adapted instruments (e.g., modified or developed by the researchers); and subject areas according to the specific subjects used in the 36 included studies.

The data information contained the three statistics needed to measure critical thinking: sample size, mean, and standard deviation. It is important to note that studies with different experimental designs often require different formulas to compute the effect size; this paper used the standardized mean difference (SMD) formula proposed by Morris (2008, p. 369; see Supplementary Table S3).
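As an illustration of this type of effect size, the sketch below computes a pretest-posttest-control standardized mean difference in the spirit of Morris (2008). It uses the commonly cited d_ppc2-style estimator (the treatment group's mean change minus the control group's mean change, divided by the pooled pretest standard deviation, with a small-sample correction); whether this matches the exact variant in Supplementary Table S3 cannot be confirmed here, and the input numbers are made up.

```python
# Sketch of a pretest-posttest-control standardized mean difference in the
# spirit of Morris (2008). This is the commonly cited d_ppc2-style estimator;
# the exact variant used in the paper's Supplementary Table S3 is not
# reproduced here, so treat this as illustrative only.
import math

def smd_pre_post_control(m_t_pre, m_t_post, m_c_pre, m_c_post,
                         sd_t_pre, sd_c_pre, n_t, n_c):
    pooled_sd_pre = math.sqrt(((n_t - 1) * sd_t_pre ** 2 + (n_c - 1) * sd_c_pre ** 2)
                              / (n_t + n_c - 2))
    c_p = 1 - 3 / (4 * (n_t + n_c - 2) - 1)  # small-sample bias correction
    return c_p * ((m_t_post - m_t_pre) - (m_c_post - m_c_pre)) / pooled_sd_pre

# Hypothetical study: critical thinking scores before and after the intervention
print(round(smd_pre_post_control(3.1, 3.9, 3.0, 3.2, 0.8, 0.9, 35, 33), 2))  # ~0.70
```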

Procedure for extracting and coding data

According to the data coding template (see Table 1), two researchers retrieved the information from the 36 papers and entered it into Excel (see Supplementary Table S1). In the data extraction procedure, the results of each study were extracted separately if an article contained several studies on critical thinking or if a study assessed different dimensions of critical thinking. For instance, Tiwari et al. (2010) examined critical thinking outcomes at four time points, which were treated as separate studies, and Chen (2013) reported the two outcome variables of attitudinal tendency and cognitive skills, which were regarded as two studies. After discussion and negotiation during data extraction, the two researchers' consistency coefficient was roughly 93.27%. Supplementary Table S2 details the key characteristics of the 36 included articles with 79 effect quantities, including descriptive information (e.g., publication year, author, serial number, and title), variable information (e.g., independent, dependent, and moderating variables), and data information (e.g., means, standard deviations, and sample sizes). Publication bias and heterogeneity tests were then run on the sample data using RevMan 5.4, and the test results were used to conduct the meta-analysis.

Publication bias test

Publication bias is present when the sample of studies included in a meta-analysis does not accurately reflect the general state of research on the relevant subject. Because publication bias can compromise the reliability and accuracy of a meta-analysis, the sample data must be checked for it (Stewart et al., 2006). A popular check is the funnel plot: publication bias is unlikely when the data points are dispersed evenly on either side of the average effect size and concentrated toward the top of the plot. The funnel plot for this analysis (see Fig. 2) shows the data evenly dispersed within the upper portion of the effective zone, indicating that publication bias is unlikely here.

Figure 2. Funnel plot of the publication bias test for the 79 effect quantities across the 36 studies.

Heterogeneity test

The results of a heterogeneity test on the effect sizes are used to select the appropriate effect model for the meta-analysis. It is common practice to gauge the degree of heterogeneity with the I² statistic: I² ≥ 50% is typically taken to denote medium-to-high heterogeneity, which calls for a random-effects model; otherwise a fixed-effects model should be applied (Lipsey and Wilson, 2001). The heterogeneity test in this paper (see Table 2) yielded I² = 86%, indicating significant heterogeneity (P < 0.01). To ensure accuracy and reliability, the overall effect size was therefore calculated using the random-effects model.
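As a rough illustration of how Q and I² are obtained from per-study effect sizes and variances, consider the following sketch. The values are illustrative; the reported I² = 86% comes from RevMan.

```python
# Sketch of Cochran's Q and the I^2 statistic from per-study effect sizes and
# variances. Numbers are illustrative; the paper's I^2 = 86% comes from RevMan.
def q_and_i_squared(effects, variances):
    weights = [1 / v for v in variances]
    pooled_fixed = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled_fixed) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

effects = [0.3, 0.9, 1.4, 0.6, 1.1]
variances = [0.04, 0.05, 0.06, 0.03, 0.05]
q, i2 = q_and_i_squared(effects, variances)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")  # I^2 >= 50% suggests a random-effects model
```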

The analysis of the overall effect size

This meta-analysis used a random-effects model to account for the heterogeneity among the 79 effect quantities from the 36 studies. According to Cohen's criterion (Cohen, 1992), the analysis results shown in the forest plot of the overall effect (see Fig. 3) make clear that the cumulative effect size of collaborative problem-solving is 0.82, which is statistically significant (z = 12.78, P < 0.01, 95% CI [0.69, 0.95]) and indicates that it can encourage learners to practice critical thinking.

Figure 3. Forest plot of the overall effect size across the 36 studies.

In addition, this study examined the two dimensions of critical thinking separately to better understand the precise contributions that collaborative problem-solving makes to its growth. The findings (see Table 3) indicate that collaborative problem-solving improves both cognitive skills (ES = 0.70) and attitudinal tendency (ES = 1.17), with significant intergroup differences (χ² = 7.95, P < 0.01). Although both dimensions improve, the gains in students' attitudinal tendency are much more pronounced, with a large and significant overall effect (ES = 1.17, z = 7.62, P < 0.01, 95% CI [0.87, 1.47]), whereas the gains in cognitive skills are more modest, just above average (ES = 0.70, z = 11.55, P < 0.01, 95% CI [0.58, 0.82]).

The analysis of moderator effect size

The 79 effect quantities in the full forest plot underwent a two-tailed test, which revealed significant heterogeneity (I² = 86%, z = 12.78, P < 0.01), indicating differences between effect sizes that may have been influenced by moderating factors other than sampling error. Subgroup analysis was therefore used to explore the moderating factors that might produce this heterogeneity, namely the learning stage, learning scaffold, teaching type, group size, intervention duration, measuring tool, and subject area covered by the 36 experimental designs, in order to identify the key factors that influence critical thinking. The findings (see Table 4) indicate that the various moderating factors all have beneficial effects on critical thinking. The subject area (χ² = 13.36, P < 0.05), group size (χ² = 8.77, P < 0.05), intervention duration (χ² = 12.18, P < 0.01), learning scaffold (χ² = 9.03, P < 0.01), and teaching type (χ² = 7.20, P < 0.05) are all significant moderators that can be drawn on to support the cultivation of critical thinking. However, since the learning stage and the measuring tool did not show significant intergroup differences (χ² = 3.15, P = 0.21 > 0.05, and χ² = 0.08, P = 0.78 > 0.05, respectively), the analysis cannot explain how these two factors matter for cultivating critical thinking in the context of collaborative problem-solving. The specific outcomes are as follows (an illustrative sketch of the subgroup comparison is given after the results below):

Various learning stages influenced critical thinking positively, without significant intergroup differences (χ² = 3.15, P = 0.21 > 0.05). High school showed the largest effect size (ES = 1.36, P < 0.01), followed by higher education (ES = 0.78, P < 0.01) and middle school (ES = 0.73, P < 0.01). These results show that, despite the learning stage's beneficial influence on learners' critical thinking, the analysis cannot explain why it matters for cultivating critical thinking in the context of collaborative problem-solving.

Different teaching types had varying degrees of positive impact on critical thinking, with significant intergroup differences (χ² = 7.20, P < 0.05). The effect sizes were ranked as follows: mixed courses (ES = 1.34, P < 0.01), integrated courses (ES = 0.81, P < 0.01), and independent courses (ES = 0.27, P < 0.01). These results indicate that mixed courses are the most effective teaching type for cultivating critical thinking through collaborative problem-solving.

Various intervention durations significantly improved critical thinking, with significant intergroup differences (χ² = 12.18, P < 0.01). The effect sizes tended to increase with longer intervention durations, and the improvement in critical thinking reached a significant level (ES = 0.85, P < 0.01) after more than 12 weeks of training. These findings indicate that intervention duration and the impact on critical thinking are positively correlated, with longer interventions having greater effects.

Different learning scaffolds influenced critical thinking positively, with significant intergroup differences (χ² = 9.03, P < 0.01). The resource-supported scaffold (ES = 0.69, P < 0.01) and the technique-supported scaffold (ES = 0.63, P < 0.01) both attained a medium-to-high level of impact, while the teacher-supported scaffold (ES = 0.92, P < 0.01) displayed a high and significant impact. These results show that the teacher-supported learning scaffold has the greatest effect on cultivating critical thinking.

Various group sizes influenced critical thinking positively, and the intergroup differences were statistically significant (χ² = 8.77, P < 0.05). The effect on critical thinking showed a general declining trend with increasing group size: the overall effect size for groups of 2–3 was the largest (ES = 0.99, P < 0.01), and when the group size exceeded 7, the improvement in critical thinking fell to a lower-middle level (ES < 0.5, P < 0.01). These results show that the impact on critical thinking is negatively associated with group size: as groups grow larger, the overall impact declines.

Various measuring tools were associated with positive effects on critical thinking, but without significant intergroup differences (χ² = 0.08, P = 0.78 > 0.05). The self-adapted measurement tools obtained an upper-medium level of effect (ES = 0.78), whereas the overall effect size of the standardized measurement tools was slightly larger and significant (ES = 0.84, P < 0.01). These results show that, despite the measuring tool's beneficial association with critical thinking, the analysis cannot explain why it matters for fostering critical thinking through collaborative problem-solving.

Different subject areas had varying degrees of impact on critical thinking, and the intergroup differences were statistically significant (χ² = 13.36, P < 0.05). Mathematics had the greatest overall impact, achieving a significant level of effect (ES = 1.68, P < 0.01), followed by science (ES = 1.25, P < 0.01) and medical science (ES = 0.87, P < 0.01), both of which also reached a significant level of effect. Programming technology was the least effective (ES = 0.39, P < 0.01), showing only a medium-low degree of effect compared with education (ES = 0.72, P < 0.01) and other fields such as language, art, and the social sciences (ES = 0.58, P < 0.01). These results suggest that scientific fields (e.g., mathematics and science) may be the subject areas in which collaborative problem-solving is most effective for cultivating critical thinking.
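The subgroup comparisons reported above can be illustrated with a simple sketch: each subgroup is pooled separately, and a Q_between statistic, referred to a χ² distribution with one fewer degree of freedom than the number of subgroups, tests whether the subgroup means differ. The grouping and numbers below are illustrative only, not the paper's data.

```python
# Sketch of a subgroup (moderator) comparison: pool each subgroup with fixed-
# effect weights, then compute Q_between, which is compared with a chi-squared
# distribution on (number of subgroups - 1) degrees of freedom. Values invented.
def fixed_pool(effects, variances):
    w = [1 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return pooled, 1 / sum(w)  # pooled effect and its variance

subgroups = {
    "mixed courses": ([1.2, 1.5, 1.3], [0.05, 0.06, 0.04]),
    "integrated courses": ([0.7, 0.9, 0.8], [0.04, 0.05, 0.05]),
    "independent courses": ([0.2, 0.3], [0.03, 0.04]),
}

pooled_per_group = []
for name, (es, var) in subgroups.items():
    p, v = fixed_pool(es, var)
    pooled_per_group.append((p, v))
    print(f"{name}: ES = {p:.2f}")

grand, _ = fixed_pool([p for p, _ in pooled_per_group], [v for _, v in pooled_per_group])
q_between = sum((p - grand) ** 2 / v for p, v in pooled_per_group)
print(f"Q_between = {q_between:.2f} on {len(subgroups) - 1} df")
```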

The effectiveness of collaborative problem solving with regard to teaching critical thinking

According to this meta-analysis, using collaborative problem-solving as an intervention strategy in critical thinking instruction has a considerable overall effect on cultivating learners' critical thinking and a favorable effect on both of its dimensions. Several studies have reported that collaborative problem-solving, the critical thinking teaching strategy most frequently used in curriculum instruction, can considerably enhance students' critical thinking (e.g., Liang et al., 2017; Liu et al., 2020; Cindy, 2004), and this meta-analysis provides convergent support for those views. The findings thus not only answer the first research question regarding the overall effect of collaborative problem-solving on critical thinking and on its two dimensions (attitudinal tendency and cognitive skills), but also strengthen confidence in using collaborative problem-solving interventions to cultivate critical thinking in classroom teaching.

Furthermore, the associated improvements in attitudinal tendency are much stronger, whereas the corresponding improvements in cognitive skills are only marginal. Some studies have noted that cognitive skills differ from attitudinal tendency in classroom instruction: the former, as a key ability, is cultivated through gradual accumulation, while the latter, as an attitude, is affected by the teaching context (e.g., a novel and exciting teaching approach, challenging and rewarding tasks) (Halpern, 2001; Wei and Hong, 2022). Collaborative problem-solving as a teaching approach is exciting and interesting, as well as rewarding and challenging: it puts the learners at the center and examines ill-structured problems in real situations, which can inspire students to realize their full problem-solving potential and thereby significantly improve their attitudinal tendency toward solving problems (Liu et al., 2020). Just as collaborative problem-solving influences attitudinal tendency, attitudinal tendency in turn shapes cognitive skills during problem-solving (Liu et al., 2020; Zhang et al., 2022), and stronger attitudinal tendencies are associated with better learning achievement and cognitive ability (Sison, 2008; Zhang et al., 2022). It can be seen that collaborative problem-solving affects critical thinking as a whole as well as its two specific dimensions, and this study illuminates the nuanced links between cognitive skills and attitudinal tendencies. To fully develop students' capacity for critical thinking, future empirical research should pay closer attention to cognitive skills.

The moderating effects of collaborative problem solving with regard to teaching critical thinking

To further explore the key factors that influence critical thinking, subgroup analysis was used to examine moderating effects that might produce the observed heterogeneity. The findings show that the moderating factors included in the 36 experimental designs (teaching type, learning stage, group size, learning scaffold, intervention duration, measuring tool, and subject area) can all support the cultivation of critical thinking through collaborative problem-solving. Among them, the effect size differences across learning stages and across measuring tools are not significant, so the analysis cannot explain why these two factors matter for cultivating critical thinking through collaborative problem-solving.

In terms of the learning stage, the various learning stages all influenced critical thinking positively, but without significant intergroup differences, so the analysis cannot explain why this factor matters for fostering the growth of critical thinking.

Although higher education accounts for 70.89% of the included empirical studies, high school may be the most appropriate learning stage for fostering students' critical thinking through collaborative problem-solving, since it has the largest overall effect size. This phenomenon may be related to students' cognitive development and needs to be studied further in follow-up research.

With regard to teaching type, mixed-course teaching may be the best way to cultivate students' critical thinking. Relevant studies have shown that if students are trained in thinking methods alone, the methods they learn are isolated and divorced from subject knowledge, which hinders transfer; conversely, if students' thinking is trained only within subject teaching, without systematic method training, it is difficult to apply to real-world circumstances (Ruggiero, 2012; Hu and Liu, 2015). Teaching critical thinking as a mixed course, in parallel with other subject teaching, achieves the best effect on learners' critical thinking, and explicit critical thinking instruction is more effective than less explicit instruction (Bensley and Spero, 2014).

In terms of the intervention duration, the overall effect size tends upward with longer intervention times, so intervention duration and the impact on critical thinking are positively correlated. Critical thinking, as a key competency for students in the 21st century, is difficult to improve meaningfully within a brief intervention; rather, it develops over a lengthy period through consistent teaching and the progressive accumulation of knowledge (Halpern, 2001; Hu and Liu, 2015). Future empirical studies should therefore take this into account by planning longer periods of critical thinking instruction.

With regard to group size, a group of 2–3 persons has the highest effect size, and the comprehensive effect size generally decreases as group size increases. This outcome is in line with earlier findings; for example, a group of two to four members has been reported as most appropriate for collaborative learning (Schellens and Valcke, 2006). However, the meta-analysis results also indicate that once the group size exceeds 7 people, small groups no longer produce better interaction and performance than large groups. This may be because learning scaffolds (technique support, resource support, and teacher support) improve the frequency and effectiveness of interaction among group members, and a collaborative group with more members can increase the diversity of views, which helps cultivate critical thinking through collaborative problem-solving.

With regard to the learning scaffold, all three kinds of learning scaffolds can enhance critical thinking, and the teacher-supported scaffold has the largest overall effect size, demonstrating the interdependence of effective learning scaffolds and collaborative problem-solving. This outcome is in line with earlier findings: encouraging learners to collaborate, generate solutions, and develop critical thinking skills through learning scaffolds is a successful strategy (Reiser, 2004; Xu et al., 2022); learning scaffolds can lower task complexity and unpleasant feelings while enticing students to engage in learning activities (Wood et al., 2006); and learning scaffolds help students use learning approaches more successfully to adapt to the collaborative problem-solving process, with teacher-supported scaffolds having the greatest influence on critical thinking because they are more targeted, informative, and timely (Xu et al., 2022).

With respect to the measuring tool, although standardized measurement tools (such as the WGCTA, CCTT, and CCTST) are acknowledged worldwide as trustworthy and effective, only 54.43% of the included studies adopted them for assessment, and the results indicated no intergroup differences. This suggests that not all teaching circumstances are suited to measuring critical thinking with standardized tools. As Simpson and Courtney (2002, p. 91) note, “The measuring tools for measuring thinking ability have limits in assessing learners in educational situations and should be adapted appropriately to accurately assess the changes in learners' critical thinking.” As a result, to gauge more fully and precisely how learners' critical thinking has evolved, standardized measuring tools should be properly adapted to collaborative problem-solving learning contexts.

With regard to the subject area, the comprehensive effect size in the sciences (e.g., mathematics, science, and medical science) is larger than in language arts and the social sciences. Recent international education reforms have noted that critical thinking is a basic part of scientific literacy. Students with scientific literacy can justify their judgments with accurate evidence and reasonable standards when they face challenges or ill-structured problems (Kyndt et al., 2013), which makes critical thinking crucial for developing scientific understanding and applying it to practical problems involving science, technology, and society (Yore et al., 2007).

Suggestions for critical thinking teaching

Beyond the points discussed above, the following suggestions are offered for critical thinking instruction using collaborative problem-solving.

First, teachers should emphasize the two core elements, collaboration and problem-solving, and design real problems grounded in collaborative situations. This meta-analysis provides evidence that collaborative problem-solving has a strong synergistic effect on promoting students' critical thinking. Asking questions about real situations and letting learners take part in critical discussions of real problems during class are key ways to teach critical thinking, rather than simply reading speculative articles without practice (Mulnix, 2012). Moreover, students' critical thinking improves through cognitive conflict with other learners in the problem situation (Yang et al., 2008). Consequently, teachers should design real problems and encourage students to discuss, negotiate, and argue within collaborative problem-solving situations.

Second, teachers should design and implement mixed courses to cultivate learners' critical thinking through collaborative problem-solving. Critical thinking can be taught through curriculum instruction (Kuncel, 2011; Leng and Lu, 2020), with the goal of cultivating critical thinking that transfers flexibly to real problem-solving situations. This meta-analysis shows that mixed-course teaching has a substantial impact on cultivating and promoting learners' critical thinking. Teachers should therefore design mixed courses that combine real collaborative problem-solving situations with the knowledge content of specific disciplines, teach critical thinking methods and strategies based on ill-structured problems, and provide practical activities in which students interact with one another to develop knowledge construction and critical thinking.

Third, teachers, particularly preservice teachers, should receive more training in critical thinking, and they should be conscious of how teacher-supported learning scaffolds can promote it. The teacher-supported scaffold had the greatest impact on learners' critical thinking, in addition to being more directive, targeted, and timely (Wood et al., 2006). Critical thinking can only be taught effectively when teachers recognize its significance for students' growth and use appropriate approaches when designing instructional activities (Forawi, 2016). Therefore, to enable teachers to create learning scaffolds that cultivate learners' critical thinking through collaborative problem-solving, it is essential to concentrate on teacher-supported scaffolds and to strengthen the training of teachers, especially preservice teachers, in teaching critical thinking.

Implications and limitations

This meta-analysis has certain limitations that future research can address. First, the search languages were restricted to English and Chinese, so pertinent studies written in other languages may have been overlooked, limiting the number of articles reviewed. Second, some data were missing from the included studies, such as whether teachers were trained in the theory and practice of critical thinking, the average age and gender of learners, and differences in critical thinking among learners of various ages and genders. Third, as is typical for review articles, more studies were published while this meta-analysis was being conducted, so it is bounded in time. Future studies focusing on these issues are therefore highly relevant and needed.

Conclusions

This study addressed the question of how effectively collaborative problem-solving promotes students' critical thinking, a topic that had received little attention in earlier research. The following conclusions can be drawn:

Regarding the overall results, collaborative problem-solving is an effective teaching approach for fostering learners' critical thinking, with a significant overall effect size (ES = 0.82, z = 12.78, P < 0.01, 95% CI [0.69, 0.95]). With respect to the dimensions of critical thinking, collaborative problem-solving significantly and effectively improves students' attitudinal tendency, with a large overall effect (ES = 1.17, z = 7.62, P < 0.01, 95% CI [0.87, 1.47]); its effect on students' cognitive skills is smaller, at an upper-middle level (ES = 0.70, z = 11.55, P < 0.01, 95% CI [0.58, 0.82]).

As demonstrated by the results and discussion, all seven moderating factors examined across the 36 studies have varying degrees of beneficial effect on students' critical thinking. The teaching type (χ² = 7.20, P < 0.05), intervention duration (χ² = 12.18, P < 0.01), subject area (χ² = 13.36, P < 0.05), group size (χ² = 8.77, P < 0.05), and learning scaffold (χ² = 9.03, P < 0.01) all have a significant impact and can be viewed as important moderating factors of how critical thinking develops. Since the learning stage (χ² = 3.15, P = 0.21 > 0.05) and measuring tool (χ² = 0.08, P = 0.78 > 0.05) did not show significant intergroup differences, the analysis cannot explain why these two factors matter for cultivating critical thinking in the context of collaborative problem-solving.

Data availability

All data generated or analyzed during this study are included within the article and its supplementary information files, and the supplementary information files are available in the Dataverse repository: https://doi.org/10.7910/DVN/IPFJO6 .

Bensley DA, Spero RA (2014) Improving critical thinking skills and meta-cognitive monitoring through direct infusion. Think Skills Creat 12:55–68. https://doi.org/10.1016/j.tsc.2014.02.001


Castle A (2009) Defining and assessing critical thinking skills for student radiographers. Radiography 15(1):70–76. https://doi.org/10.1016/j.radi.2007.10.007

Chen XD (2013) An empirical study on the influence of PBL teaching model on critical thinking ability of non-English majors. J PLA Foreign Lang College 36 (04):68–72


Cohen A (1992) Antecedents of organizational commitment across occupational groups: a meta-analysis. J Organ Behav. https://doi.org/10.1002/job.4030130602

Cooper H (2010) Research synthesis and meta-analysis: a step-by-step approach, 4th edn. Sage, London, England

Cindy HS (2004) Problem-based learning: what and how do students learn? Educ Psychol Rev 51(1):31–39

Duch BJ, Gron SD, Allen DE (2001) The power of problem-based learning: a practical “how to” for teaching undergraduate courses in any discipline. Stylus Educ Sci 2:190–198

Ennis RH (1989) Critical thinking and subject specificity: clarification and needed research. Educ Res 18(3):4–10. https://doi.org/10.3102/0013189x018003004

Facione PA (1990) Critical thinking: a statement of expert consensus for purposes of educational assessment and instruction. Research findings and recommendations. Eric document reproduction service. https://eric.ed.gov/?id=ed315423

Facione PA, Facione NC (1992) The California Critical Thinking Dispositions Inventory (CCTDI) and the CCTDI test manual. California Academic Press, Millbrae, CA

Forawi SA (2016) Standard-based science education and critical thinking. Think Skills Creat 20:52–62. https://doi.org/10.1016/j.tsc.2016.02.005

Halpern DF (2001) Assessing the effectiveness of critical thinking instruction. J Gen Educ 50(4):270–286. https://doi.org/10.2307/27797889

Hu WP, Liu J (2015) Cultivation of pupils’ thinking ability: a five-year follow-up study. Psychol Behav Res 13(05):648–654. https://doi.org/10.3969/j.issn.1672-0628.2015.05.010

Huber K (2016) Does college teach critical thinking? A meta-analysis. Rev Educ Res 86(2):431–468. https://doi.org/10.3102/0034654315605917

Kek MYCA, Huijser H (2011) The power of problem-based learning in developing critical thinking skills: preparing students for tomorrow’s digital futures in today’s classrooms. High Educ Res Dev 30(3):329–341. https://doi.org/10.1080/07294360.2010.501074

Kuncel NR (2011) Measurement and meaning of critical thinking (Research report for the NRC 21st Century Skills Workshop). National Research Council, Washington, DC

Kyndt E, Raes E, Lismont B, Timmers F, Cascallar E, Dochy F (2013) A meta-analysis of the effects of face-to-face cooperative learning. Do recent studies falsify or verify earlier findings? Educ Res Rev 10(2):133–149. https://doi.org/10.1016/j.edurev.2013.02.002

Leng J, Lu XX (2020) Is critical thinking really teachable?—A meta-analysis based on 79 experimental or quasi experimental studies. Open Educ Res 26(06):110–118. https://doi.org/10.13966/j.cnki.kfjyyj.2020.06.011

Liang YZ, Zhu K, Zhao CL (2017) An empirical study on the depth of interaction promoted by collaborative problem solving learning activities. J E-educ Res 38(10):87–92. https://doi.org/10.13811/j.cnki.eer.2017.10.014

Lipsey M, Wilson D (2001) Practical meta-analysis. International Educational and Professional, London, pp. 92–160

Liu Z, Wu W, Jiang Q (2020) A study on the influence of problem based learning on college students’ critical thinking-based on a meta-analysis of 31 studies. Explor High Educ 03:43–49

Morris SB (2008) Estimating effect sizes from pretest-posttest-control group designs. Organ Res Methods 11(2):364–386. https://doi.org/10.1177/1094428106291059


Mulnix JW (2012) Thinking critically about critical thinking. Educ Philos Theory 44(5):464–479. https://doi.org/10.1111/j.1469-5812.2010.00673.x

Naber J, Wyatt TH (2014) The effect of reflective writing interventions on the critical thinking skills and dispositions of baccalaureate nursing students. Nurse Educ Today 34(1):67–72. https://doi.org/10.1016/j.nedt.2013.04.002

National Research Council (2012) Education for life and work: developing transferable knowledge and skills in the 21st century. The National Academies Press, Washington, DC

Niu L, Behar HLS, Garvan CW (2013) Do instructional interventions influence college students’ critical thinking skills? A meta-analysis. Educ Res Rev 9(12):114–128. https://doi.org/10.1016/j.edurev.2012.12.002

Peng ZM, Deng L (2017) Towards the core of education reform: cultivating critical thinking skills as the core of skills in the 21st century. Res Educ Dev 24:57–63. https://doi.org/10.14121/j.cnki.1008-3855.2017.24.011

Reiser BJ (2004) Scaffolding complex learning: the mechanisms of structuring and problematizing student work. J Learn Sci 13(3):273–304. https://doi.org/10.1207/s15327809jls1303_2

Ruggiero VR (2012) The art of thinking: a guide to critical and creative thought, 4th edn. Harper Collins College Publishers, New York

Schellens T, Valcke M (2006) Fostering knowledge construction in university students through asynchronous discussion groups. Comput Educ 46(4):349–370. https://doi.org/10.1016/j.compedu.2004.07.010

Sendag S, Odabasi HF (2009) Effects of an online problem based learning course on content knowledge acquisition and critical thinking skills. Comput Educ 53(1):132–141. https://doi.org/10.1016/j.compedu.2009.01.008

Sison R (2008) Investigating Pair Programming in a Software Engineering Course in an Asian Setting. 2008 15th Asia-Pacific Software Engineering Conference, pp. 325–331. https://doi.org/10.1109/APSEC.2008.61

Simpson E, Courtney M (2002) Critical thinking in nursing education: literature review. Int J Nurs Pract 8(2):89–98

Stewart L, Tierney J, Burdett S (2006) Do systematic reviews based on individual patient data offer a means of circumventing biases associated with trial publications? Publication bias in meta-analysis. John Wiley and Sons Inc, New York, pp. 261–286

Tiwari A, Lai P, So M, Yuen K (2010) A comparison of the effects of problem-based learning and lecturing on the development of students’ critical thinking. Med Educ 40(6):547–554. https://doi.org/10.1111/j.1365-2929.2006.02481.x

Wood D, Bruner JS, Ross G (2006) The role of tutoring in problem solving. J Child Psychol Psychiatry 17(2):89–100. https://doi.org/10.1111/j.1469-7610.1976.tb00381.x

Wei T, Hong S (2022) The meaning and realization of teachable critical thinking. Educ Theory Practice 10:51–57

Xu EW, Wang W, Wang QX (2022) A meta-analysis of the effectiveness of programming teaching in promoting K-12 students’ computational thinking. Educ Inf Technol. https://doi.org/10.1007/s10639-022-11445-2

Yang YC, Newby T, Bill R (2008) Facilitating interactions through structured web-based bulletin boards: a quasi-experimental study on promoting learners’ critical thinking skills. Comput Educ 50(4):1572–1585. https://doi.org/10.1016/j.compedu.2007.04.006

Yore LD, Pimm D, Tuan HL (2007) The literacy component of mathematical and scientific literacy. Int J Sci Math Educ 5(4):559–589. https://doi.org/10.1007/s10763-007-9089-4

Zhang T, Zhang S, Gao QQ, Wang JH (2022) Research on the development of learners’ critical thinking in online peer review. Audio Visual Educ Res 6:53–60. https://doi.org/10.13811/j.cnki.eer.2022.06.08


Acknowledgements

This research was supported by the graduate scientific research and innovation project of Xinjiang Uygur Autonomous Region named “Research on in-depth learning of high school information technology courses for the cultivation of computing thinking” (No. XJ2022G190) and the independent innovation fund project for doctoral students of the College of Educational Science of Xinjiang Normal University named “Research on project-based teaching of high school information technology courses from the perspective of discipline core literacy” (No. XJNUJKYA2003).

Author information

Authors and affiliations.

College of Educational Science, Xinjiang Normal University, 830017, Urumqi, Xinjiang, China

Enwei Xu, Wei Wang & Qingxia Wang


Corresponding authors

Correspondence to Enwei Xu or Wei Wang .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Additional information.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Xu, E., Wang, W. & Wang, Q. The effectiveness of collaborative problem solving in promoting students’ critical thinking: A meta-analysis based on empirical literature. Humanit Soc Sci Commun 10 , 16 (2023). https://doi.org/10.1057/s41599-023-01508-1


Received : 07 August 2022

Accepted : 04 January 2023

Published : 11 January 2023

DOI : https://doi.org/10.1057/s41599-023-01508-1




The Proven Tool To Level Up Your (And Your Team's) Leadership Today


Have you found that tracking leadership growth can be a bit tricky?

Sure, we all know what good leadership feels like, but how—specifically—does one achieve it? What’s the recipe?

In my 30+ years building businesses I’ve distilled leadership growth and the tracking of it to 6 categories and 9 levels. Our coaching clients find it helpful to set numbered levels for leadership growth. This helps everyone to track how they are cultivating key leadership behaviors, and nicely complements the use of Individual Development Plans.

Here We Grow

A good starting point is William Oncken's 5 Levels of Freedom. We prefer to call them Leadership Levels, as this makes more sense to the employee. Some companies, like Google, have many levels (ten in their case). Decide what makes sense to you and is easy to define, track, and help people stretch and grow into. When coupled with needle movers/KPIs/OKRs/goals, we see employees' growth and performance soar. Here's a link to our complimentary Leadership Level Assessment.

The first category is Planning & Proactivity . Here are the key criteria to score:

  • Consistently plan ahead. Not just for the immediate need, but for the overarching trajectory.
  • Know what your action items are without being reminded.
  • Don’t wait for or rely on others to tell you what to do. Take the lead and make a recommendation for how to drive toward the intended outcome.
  • Come prepared for every meeting.
  • Present yourself with appropriate energetic weight and executive presence for the given situation.

The second category is Follow Through & Accountability . Here are the key criteria to score:

  • Consistently deliver on time, on budget, and on-strategy in order to meet the client’s and business’s objectives.
  • Meet deadlines and follow through on commitments without prompting, additional micro-management, or someone else coming in to finish.
  • “Manage up” to your project lead and/or supervisor to ensure you’re getting what you need and delivering on expectations.
  • "Manage down/across" to your direct reports/peers to ensure they're getting what they need to be accountable per your expectations.
  • You are known for being consistently reliable.

The third category is Customer-Centric Quality . Here are the key criteria to score:

  • Consistently deliver in alignment with your Client-Centric Quality Standards (wherever possible within your agreed-upon leadership level).
  • Across Your Work – delivered per spec, works as expected, done right the first time.
  • Client Service – timely, professional, accurate, value-added response.
  • Logistics – accurate details, all details.
  • Business Development / Growth – sensitivity to relationship building and fostering trust and safety, belonging, mattering with client.
  • Finance / Operations – ensuring Finance/Ops is in the loop when needed and providing clarity so their involvement is optimized/minimized.


The fourth category is Communication . Here are the key criteria to score:

  • Consistently communicate in a clear, professional, and personable manner — both in email and conversation even when upset.
  • Include necessary details, dates, expectations, subject tags, etc., and follow any existing communication standards.
  • Proactively communicate wants, needs, and obstacles to ensure your team (above and below) clearly understand what's needed of them, and what you need too.
  • Carefully read communications to ensure a complete response.
  • Proactively provide timely, clear communication on outstanding items and status so team members can do their best work.

The fifth category is Strategic Thinking & Problem Solving . Here are the key criteria to score:

  • Consistently maintain alignment with the agreed-upon guiding strategy and desired outcomes.
  • Keep an eye on how everything relates to the bigger picture. (Nothing should exist in a silo.)
  • Approach challenges with a strategically focused problem-solving mindset — bringing multiple solutions to the table rather than solely seeking answers.
  • Approach challenges with an open mind and positive attitude.
  • Proactively provide ideas and well-formed strategies to grow the team and business.

The sixth category is Client Service & Presentation . Here are the key criteria to score:

  • Consistently manage, coordinate, and communicate with clients (and/or project/client leads) to fully understand and exceed their expectations.
  • Manage the process proactively to build confidence and trust within the client.
  • Meet and deliver on key milestone dates as expected and agreed upon.
  • Manage and communicate timeline updates appropriately.
  • Prepare and present client-ready materials in alignment with your Client-Centric Quality Standards without additional polishing support.

For each criterion in each category, a leader scores themselves from 1 to 5, where 5 is the highest. Here's the link again to our complimentary Leadership Level Assessment.

And next, here are the 9 levels:

Image of 9 Leadership Levels

A few words on Leadership Level 5, which is where things start to get interesting!

First, bear in mind that to perform at this level, the individual needs an average score of 4 or 5. If they gave themselves some 3s (or you did when reviewing their self-assessment and providing your input), focus on helping them boost those scores until they average 4 or higher at this Leadership Level.
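As an illustration of the scoring arithmetic only, here is a small sketch that averages hypothetical criterion scores per category and checks the 4-or-higher threshold. The category names follow the article; the data, the helper, and the threshold check are examples, not the author's assessment tool.

```python
# Illustrative sketch of the self-assessment scoring described above: each
# criterion is scored 1-5, scores are averaged per category, and an overall
# average of 4 or higher is treated as meeting a given Leadership Level.
# The numbers are hypothetical.
scores = {
    "Planning & Proactivity": [4, 5, 4, 4, 3],
    "Follow Through & Accountability": [5, 4, 4, 4, 5],
    "Customer-Centric Quality": [4, 4, 3, 4, 4, 4],
    "Communication": [4, 5, 4, 4, 3],
    "Strategic Thinking & Problem Solving": [4, 4, 4, 5, 4],
    "Client Service & Presentation": [4, 4, 5, 4, 4],
}

category_averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
overall = sum(category_averages.values()) / len(category_averages)

for name, avg in category_averages.items():
    print(f"{name}: {avg:.1f}")
print(f"Overall average: {overall:.2f} -> "
      f"{'meets' if overall >= 4 else 'does not yet meet'} the 4+ threshold")
```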

Here are brief summaries of the other levels, see the graphic above for more detail:

Level 6 = This leader effectively and consistently leads and cultivates others. This means the leader is intentionally and demonstrably growing others into greater levels of leadership.

Level 7 = This leader's influence is much greater than at Level 6, which enables them to lead others outside the company as well as inside it (where there is less common ground, since outsiders won't necessarily be tribe members).

Level 8 = This leader understands and practices more sophisticated degrees of strategy. They apply Porter’s Five Forces , for example, as opposed to a basic SWOT (strengths, weaknesses, opportunities, threats) approach.

Level 9 = This leader demonstrates a high and consistent level of self-awareness. They call out where they need to grow, use themselves as an example to mentor others and are fearless in their commitment to personal and professional development.

Would Leadership Levels make sense at your company? Doubtlessly. They provide a framework for ownership, accountability, drive, and intrinsic motivation. How many levels would you like? How will you define each one and help people increase their level?

Christine Comaford




About Older Adult Fall Prevention

  • Falls can be prevented.
  • Falls among adults 65 and older caused over 38,000 deaths in 2021, making it the leading cause of injury death for that group. 1
  • In 2021, emergency departments recorded nearly 3 million visits for older adult falls. 1


Falls can be prevented

Falls are a threat to the health of older adults and can reduce their ability to remain independent. However, falls don't have to be inevitable as you age. You can reduce your chance of falling or help a loved one prevent falls. There are proven ways to reduce and prevent falls, even for older adults. We identify older adults as anyone 65 years and older. CDC uses data and research to help prevent falls and save lives.

Take the Falls Free Checkup.

STEADI (Stopping Elderly Accidents, Deaths, & Injuries): take steps to reduce fall risk among your older adult patients.

Health care providers are encouraged to visit the STEADI site to learn more about CDC's initiative to help reduce fall risk among your older patients.


Visit the Still Going Strong site to learn how you can age without injury.

  • MyMobility Plan  ( English  | Spanish  |  Tribal ) [8 pages]
  • Medicines Risk: Are Your Medicines Increasing Your Risk of a Fall or Car Crash?
  • Transportation Safety: Older Adult Drivers
  • Concussions and Traumatic Brain Injury (TBI)
  • Elder Abuse Prevention
  • A Descriptive Analysis of Location of Older Adult Falls that Resulted in Emergency Department Visits in the U.S., 2015 ( American Journal of Lifestyle Living , August 2020)
  • Trends in Nonfatal Falls and Fall-related Injuries Among Adults Aged ≥65 Years—U.S., 2012–2018 ( MMWR , July 2020)
  • Fall-related Emergency Department Visits Involving Alcohol Among Older Adults ( Journal of Safety Research , June 2020)
  • 1. Centers for Disease Control and Prevention, National Center for Injury Prevention and Control. Web–based Injury Statistics Query and Reporting System (WISQARS) [online].
  • 2. Florence CS, Bergen G, Atherly A, Burns ER, Stevens JA, Drake C. Medical Costs of Fatal and Nonfatal Falls in Older Adults . Journal of the American Geriatrics Society. 2018 Apr;66(4):693–698. DOI:10.1111/jgs.15304.

Falls—and the injuries and deaths they cause—are increasing, but falls can be prevented. Learn more about Older Adult Fall Prevention.





