Investigating the Effects of LLM Use on Critical Thinking
Under Time Constraints: Access Timing and Time Availability
The impact of large language models (LLMs) on critical thinking has attracted growing attention, yet this impact on actual performance may not be uniformly negative or positive. In particular, the role of time—the temporal context under which an LLM is provided—remains overlooked. In a between-subjects experiment (n=393), we examined two types of time constraints for a critical thinking task requiring participants to make a reasoned decision for a real-world civic scenario based on diverse documents: (1) LLM access timing—an LLM available only at the beginning (early), throughout (continuous), near the end (late), or not at all (no LLM), and (2) time availability—insufficient or sufficient time for the task. We found a temporal reversal: LLM access from the start (early, continuous) improved performance under time pressure but impaired it with sufficient time, whereas beginning the task independently (late, no LLM) showed the opposite pattern. These findings demonstrate that time constraints fundamentally shape whether an LLM augments or undermines critical thinking, making time a central consideration when designing LLM support and evaluating human-AI collaboration in cognitive tasks.
Critical Thinking in the Age of AI
Critical thinking is the ability to analyze, evaluate, and synthesize diverse — and sometimes conflicting — information to reach reasoned decisions (Braun et al.; Krathwohl et al.). Whether navigating contradictory news stories, making purchase decisions, or writing research reports, we constantly encounter situations that demand critical thinking. Consider an everyday scenario: you need to decide whether to accept an offer or a proposal, and you have a stack of sources — reports, testimonials, analyses — each offering a different perspective. You begin reading, distill key arguments, move between sources to weigh trade-offs, check your own biases, and ultimately revise and communicate your reasoning. Each of these facets reflects the inherently non-linear process of critical thinking.




Today, critical thinking is highly exposed to large language models (LLMs), as it has been the most frequently required capability in tasks that users bring to popular LLM chatbots (Handa et al.). There is growing attention to the benefits and risks of using AI for critical thinking and human reasoning. On one hand, AI offers new efficiencies in consuming and producing information. On the other hand, concerns are rising that outsourcing cognitive activities to AI may not lead to better thinking outcomes.
So, what are the actual consequences for people's critical thinking performance when using AI?
Time Constraints
To answer this question, we cannot simply compare using an LLM versus not using one to label the impact as positive or negative. Time constraints — the temporal context in which an LLM is provided — can fundamentally shape its impact on human-AI collaboration for learning and thinking outcomes. We identified two crucial types of time constraints: the timing of LLM access and the time availability for task completion.
Within the dimension of access timing, comparing "using an LLM" versus "not using one" is effectively comparing continuous LLM access against no access at all. Early LLM access — where the LLM is only available at the beginning of a task — may be particularly beneficial, serving as initial scaffolding that handles groundwork and potentially frees up cognitive resources for deeper deliberation (Imundo et al.; Singh et al.). Late LLM access — where the LLM is only available near the end of a task — may also prove beneficial. Prior studies on other cognitive tasks, such as math problem solving (Kumar et al.) and creative writing (Qin et al.), have shown that providing LLM access after participants' independent attempts improved their performance.

The second type of time constraint is time availability. Time pressure is a pervasive reality in our daily life and work, and it can shift people from deliberative reasoning to heuristic processing (Gonthier; Karau and Kelly; Kocher and Sutter).
|  | Early Access | Continuous Access | Late Access | No Access |
| --- | --- | --- | --- | --- |
| Insufficient Time |  |  |  | 💭 Insufficient < Sufficient (Gonthier) |
| Sufficient Time | 🧮 Early < Late (Kumar et al.) | 🖊️ Continuous < Late (Qin et al.) | 🖊️ Continuous < Late (Qin et al.); 🧮 Early < Late (Kumar et al.) | 💭 Insufficient < Sufficient (Gonthier) |
This table summarizes the directionality of effects in prior work across LLM access timing and time availability. Only studies that explicitly manipulate time constraints are included; empty cells indicate conditions with no recent empirical studies. At the time of our study, these two dimensions had been studied only in isolation, leaving many combinations unexplored. Our study systematically examines human-AI collaboration for critical thinking across all conditions in the two-dimensional space of time constraints — see the updated table.
Study Design
Experiment Framework

We conducted a between-subjects experiment (n=393) to examine the effects of LLM use on critical thinking across both types of time constraints. Time availability was manipulated as Sufficient (30 minutes) versus Insufficient (10 minutes), and LLM access timing was divided into Early, Continuous, and Late based on thirds of the task duration. These parameters were informed by participant behavior observed in the pilots. Under Sufficient or Insufficient time, participants were randomly assigned to one of four LLM access timings: Early, Continuous, Late, or No LLM access.
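As a concrete illustration, the access-timing manipulation can be expressed as a simple gating rule over task progress. The function below is a hypothetical sketch based on the thirds-based split described above, not the study's actual implementation:

```python
def llm_available(condition: str, elapsed_s: float, total_s: float) -> bool:
    """Illustrative gate deciding whether the LLM chatbot is shown.

    Early: available only in the first third of the task time.
    Continuous: available throughout.
    Late: available only in the final third.
    No LLM: never available.
    """
    third = total_s / 3
    if condition == "continuous":
        return True
    if condition == "early":
        return elapsed_s < third
    if condition == "late":
        return elapsed_s >= total_s - third
    return False  # "no_llm"

# Example: a Sufficient-time (30-minute) session, 25 minutes in
print(llm_available("late", elapsed_s=25 * 60, total_s=30 * 60))  # True
```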
Task

To assess critical thinking performance, we employed a task from the performance assessment framework (Braun et al.; Ebright-Jones and Cortina). The task presents a real-world civic decision-making scenario in which participants act as city council members deciding whether to accept a company's proposal to address water contamination it caused, in exchange for dropping lawsuits and future liabilities. Participants were provided with a set of documents varying in relevance to the decision, trustworthiness of the sources, and stance toward the proposal (pro, con, or neutral), representing diverse information sources including technical reports, news articles, and government agency brochures. Participants were asked to prepare an essay explaining the reasoning behind their decision, based on information drawn from some or all of the documents. Beyond Essay performance, we measured participants' free Recall of documents, explicit Evaluation of each document's characteristics, and Comprehension through factual inference. These additional measures capture related cognitive activities that may not be reflected in the essays themselves. See the paper for more details about the measures.
Interface

Task interface: instructions and the document viewer (left), the LLM-powered chatbot with access to all the documents (right), and the essay textbox (bottom). The timer begins after participants click all "I understand" buttons.

The task interface controls the LLM access and time availability. The timer at the bottom right shows the remaining task time. Participants are automatically taken to the next page when time runs out.

Interface for assessing Recall: Participants can add fields to summarize the main ideas of documents by free recall. Pasting is disabled.

Interface for assessing Evaluation: Participants rate the relevance, trustworthiness, and stance of each document, with titles provided. They can select "I didn't read this" for a document.
Findings
Task Performance
Participants' essays were scored based on the number of valid arguments — arguments that can be developed from the provided documents. A myside bias score was derived from each essay to capture the balance of pro and con arguments; a higher score indicates that a participant predominantly presented arguments on one side. We found a dramatic temporal reversal: under Insufficient time, having LLM access from the start (Early, Continuous LLM access) improved Essay performance. Yet under Sufficient time, those who worked independently first (Late, No LLM access) showed better Essay and Recall performance. Moreover, those having LLM access from the start showed minimal gains from Insufficient to Sufficient time. These findings suggest that having LLM access from the start limits the benefits of more task time and may prevent deeper internalization of the source documents.
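The paper specifies the exact scoring procedure; as a rough illustration only, one common way to index one-sidedness from coded pro and con argument counts is a normalized difference. The formula below is an assumption for illustration, not the study's actual Myside Bias measure:

```python
def myside_bias(pro: int, con: int) -> float:
    """Hypothetical one-sidedness index over coded argument counts.

    0.0 = perfectly balanced essay, 1.0 = all arguments on one side.
    This is an illustrative formula, not the paper's scoring rubric.
    """
    total = pro + con
    if total == 0:
        return 0.0  # no arguments: treat as neither balanced nor biased
    return abs(pro - con) / total

print(myside_bias(3, 3))  # 0.0 (balanced)
print(myside_bias(4, 0))  # 1.0 (entirely one-sided)
```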
Browse the results for Essay (overall score, Myside Bias, number of arguments), Recall, Evaluation, and Comprehension performance below. See the paper for statistical details.

Patterns reversed based on time availability. Under Insufficient time, participants having LLM access from the start (Early, Continuous LLM access) outperformed those who worked independently first (Late, No LLM access). Notably, those having LLM access from the start showed minimal gains from Insufficient to Sufficient time.

Under Sufficient time, Late LLM access reduced Myside Bias compared to No LLM access, while maintaining argument quantity, whereas the stable low Myside Bias for Early and Continuous LLM access may reflect fewer arguments rather than balanced reasoning.

The number of valid arguments, as the primary component of the Essay score, showed a similar trend. Specifically, under Sufficient time, Late and No LLM access showed similar argument quantity.

Under Insufficient time, all LLM access timings showed similarly poor Recall. Under Sufficient time, having LLM access from the start (Early, Continuous LLM access) impaired Recall compared to working independently first (Late, No LLM access).

Having Sufficient time generally improved Evaluation correctness overall, with minimal differences between LLM access timings.

Having Sufficient time generally improved Comprehension correctness, with minimal differences between LLM access timings.
Behavioral Engagement
Why would the benefits of having LLM access from the start (Early, Continuous LLM access) disappear when people have Sufficient time? We examined participants' interaction logs to understand the mechanisms behind the temporal reversal. As time availability increased from Insufficient to Sufficient, participants having LLM access from the start showed minimal increases in unique arguments beyond what the LLM provided and in document engagement during writing. Having LLM access from the start appeared to limit further deliberation, narrow document engagement, and anchor participants to AI-provided framing. This potentially explains why having LLM access from the start boosts performance under Insufficient time — where efficiency matters most — but not when Sufficient time allows for the deeper reasoning that characterizes strong critical thinking.
Browse the patterns below. See the paper for additional findings on textual overlap and how participants having Late LLM access revised their essays to reduce Myside Bias.

Did participants simply follow the LLM’s arguments? By examining the overlap between arguments in each participant's essay and the LLM responses they received, we found that participants having Early and Continuous LLM access showed minimal increases in non-overlapping arguments from Insufficient to Sufficient time, while those having Late and No LLM access showed substantial increases. This suggests that having LLM access from the start may limit further deliberation, even when more task time is provided.
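The overlap analysis above can be sketched as a set comparison over coded argument labels. This assumes arguments have already been matched into shared labels; the study's actual matching procedure may differ:

```python
def count_non_overlapping(essay_args: set[str], llm_args: set[str]) -> int:
    """Count essay arguments not traceable to any LLM response.

    Both inputs are assumed to be pre-coded argument labels; real
    overlap coding likely involves human judgment, not exact matching.
    """
    return len(essay_args - llm_args)

# A participant raised three coded arguments; the LLM had offered one of them
print(count_non_overlapping({"cost", "liability", "health"}, {"cost"}))  # 2
```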

How did participants engage with source documents before and during writing? We found that participants having Early and Continuous LLM access viewed fewer documents during writing, particularly under Sufficient time. This suggests less iterative consultation of the sources and greater anchoring to LLM-provided initial framing, rather than further deliberation.

How did participants describe their approaches for the critical thinking task? Notably, the most common approach among participants having LLM access from the start (Early, Continuous LLM access) was not directly adopting AI-generated content. They used the LLM to summarize documents or to verify ideas and concepts. Yet even this type of LLM use appears sufficient to anchor their subsequent deliberation about which documents and ideas to pursue.
Implications
For Designing AI Support for Critical Thinking
Our findings reveal that whether AI augments or undermines critical thinking depends crucially on the temporal context in which it is used. We highlight time as a central consideration for designing AI support for critical thinking.
With sufficient time, encourage independent thinking first. In our study, participants who worked independently before accessing the LLM showed the strongest critical thinking outcomes. AI systems can nudge users to form their own thoughts first — through prompts, frictions, or phased access — then augment their existing thinking.
Under time pressure, using AI from the start is often inevitable. AI design should support this reality but with scaffolding to mitigate anchoring risks. Rather than simply verifying output accuracy, interventions should encourage users to consider deeper aspects of critical thinking, such as source diversity and alternative perspectives, to avoid suppressing further deliberation.
Adapt AI support dynamically to users' time availability. Rather than offering static assistance, AI systems should account for real-world temporal realities like procrastination and deadlines. Lightweight features — such as asking how much time is available or tracking session duration — can help tailor when and how AI is introduced.
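As a minimal sketch of this recommendation, a system could pick a support mode from a user's stated time budget. The threshold and mode names below are illustrative assumptions, not tested design parameters:

```python
def support_mode(available_min: float, threshold_min: float = 15.0) -> str:
    """Illustrative policy: with ample time, nudge independent thinking
    first; under pressure, offer AI immediately with anchoring safeguards.
    The 15-minute threshold is an arbitrary placeholder."""
    if available_min >= threshold_min:
        return "independent_first"   # phased access, frictions, prompts
    return "ai_with_scaffolding"     # immediate AI plus diversity nudges

print(support_mode(30))  # independent_first
print(support_mode(10))  # ai_with_scaffolding
```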
For Research on Human-AI Collaboration
|  | Early Access | Continuous Access | Late Access | No Access |
| --- | --- | --- | --- | --- |
| Insufficient Time | Essay: Moderate; Myside Bias: Moderate | Essay: Moderate; Myside Bias: Moderate | Essay: Low; Myside Bias: Moderate | Essay: Low; Myside Bias: Moderate |
| Sufficient Time | Essay: Moderate; Myside Bias: Moderate | Essay: Moderate; Myside Bias: Moderate | Essay: High; Myside Bias: Low | Essay: High; Myside Bias: High |
This table summarizes our high-level findings on human-AI collaboration for critical thinking across all conditions in the two-dimensional space of time constraints, with each cell reporting the performance level (High, Moderate, or Low) for Essay and Myside Bias.
Research should consider time constraints when evaluating human-AI collaboration. Testing under multiple temporal conditions can reveal when benefits emerge or disappear — and our two-dimensional space of time constraints (experiment framework; high-level findings) provides one approach for systematically doing so. Researchers do not need to examine every combination; rather, they should select conditions that align with their tasks, design choices, and research goals. Time availability should always be considered. Access timing, however, depends on the design choices and scope of the AI being studied. For example, studies that present pre-generated LLM responses cannot claim continuous LLM access, but the design can be framed as early access and paired with a condition where participants work independently first. Studies that provide interactive LLMs face different considerations: for unfamiliar tasks, early and continuous access may effectively converge as users tend to access AI from the beginning. But for tasks involving participants with diverse expertise, users may self-regulate and delay AI use, meaning continuous access cannot be assumed to function like early access.
BibTeX
© 2026 by Jiayin Zhi.
