How Barie researches the effectiveness of project-based learning in higher education STEM — structured evidence review with methodology, effect sizes, and quality assessments

Use case walkthrough | Education & Research | 7 min read

Barie searches educational research databases for studies on PBL outcomes in higher education STEM programmes. It screens for methodological rigour, extracts effect sizes and sample characteristics, assesses study quality against a transparent rubric, and delivers a structured evidence review where every finding is traceable to its original study.

Why PBL evidence reviews built from narrative summaries conflate weak and strong evidence

An engineering education faculty committee reviewing the evidence for project-based learning before a curriculum redesign found extensive positive claims in secondary summaries and review articles. PBL was described as consistently effective for student engagement, learning outcomes, and graduate employability. When one committee member began tracing the citations, the picture was more complicated. Several highly cited claims derived from single-institution pilot studies with no control group. A widely cited meta-analysis included studies where “project-based learning” was defined so broadly that any group assignment qualified. Effect sizes ranged from d = 0.14 to d = 1.2 across studies, a range that renders a simple average meaningless without understanding what drives the variation.

The committee did not need another summary of the evidence. It needed a structured review that separated high-quality experimental and quasi-experimental studies with measured effect sizes from methodologically weaker studies. That distinction determines whether the evidence supports a full curriculum redesign or a more cautious pilot approach.

💡

Barie applies a consistent quality rubric to every included study and reports the effect size alongside the methodology

A study reporting d = 0.85 from a randomised controlled trial with 400 students across four institutions tells you something fundamentally different from a study reporting a “significant positive effect” from a single-class pilot. Barie extracts both the finding and the methodological context for every included study, rating each one against a modified What Works Clearinghouse quality rubric.

Your prompt

“Research the effectiveness of project-based learning in higher education STEM programmes, evidence and best practices.”

One sentence. Three academic database connectors activated simultaneously. Quality assessment framework applied before any finding is extracted. Here is the complete workflow:

Research Database Stack Activated

Step 1: Three connectors activated — each covering the database type most relevant for PBL in STEM

Barie Research Stack · PBL Effectiveness · Higher Education STEM

3 connectors · parallel

🔬 Deep Research

Queries ERIC (for education research studies) and Scopus (for peer-reviewed journal articles). Search query: (“project-based learning” OR “problem-based learning” OR “PBL”) AND (“higher education” OR “university” OR “undergraduate”) AND (engineering OR “computer science” OR mathematics OR physics OR biology OR STEM) AND (outcome OR achievement OR performance OR effectiveness). Publication date filter: last 10 years for the evidence base, last 3 years for the best practice literature.

ERIC · Scopus
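A boolean query of this shape can be assembled from term groups rather than typed by hand. The snippet below is a minimal sketch, not Barie's implementation: the function name `build_query` and the rule of quoting only multi-word phrases are assumptions made for illustration.

```python
def build_query(term_groups):
    """AND together groups of terms; OR the terms inside each group."""
    clauses = []
    for group in term_groups:
        # Quote multi-word phrases so they are searched as exact phrases.
        terms = [f'"{t}"' if " " in t else t for t in group]
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


# Term groups taken from the search query described above.
PBL_GROUPS = [
    ["project-based learning", "problem-based learning", "PBL"],
    ["higher education", "university", "undergraduate"],
    ["engineering", "computer science", "mathematics", "physics", "biology", "STEM"],
    ["outcome", "achievement", "performance", "effectiveness"],
]

query = build_query(PBL_GROUPS)
```

Keeping the term groups as data makes it easy to rerun the same search with one group changed, for example swapping the discipline list.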

🕷️ Firecrawl

Retrieves tables of contents from the three highest-impact STEM education journals: Journal of Engineering Education, International Journal of STEM Education, and European Journal of Engineering Education. Also retrieves the current What Works Clearinghouse interventions database for any PBL interventions in higher education with evidence ratings. WWC ratings are the gold standard for educational evidence quality assessment in the US context.

High-impact journals · WWC

🌐 Web Research

Retrieves meta-analyses and systematic reviews from JSTOR and ResearchGate to catch studies that may not be fully indexed in ERIC or Scopus. Also retrieves the PBL World conference proceedings and Buck Institute for Education research publications, which bridge practitioner and research work and are where implementation best practices are most thoroughly documented.

Meta-analyses · Best practices

Quality Assessment and Evidence Extraction

Step 2: Every included study quality-assessed before findings are reported

312
Papers retrieved from databases

74
Studies included after screening

28
High-quality studies (RCT or strong quasi-experimental)

d = 0.52
Mean effect size across high-quality studies

All 74 included studies are assessed against four quality dimensions before any finding is extracted. Studies are classified as High quality (RCT or strong quasi-experimental with comparison group, n above 100, pre-registered or published in a peer-reviewed journal with impact factor above 1.5), Moderate quality (quasi-experimental with comparison group, smaller sample, or single-institution), or Low quality (pre-post design without comparison group, or self-report only outcomes). Effect sizes are extracted where reported. For studies that report statistical significance without an effect size, Barie calculates an approximate d from the reported test statistics where those statistics and the group sizes allow.
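The two operations described above, rating a study and recovering d from a test statistic, can be sketched in a few lines. This is an illustration under stated assumptions, not Barie's actual rubric code: the function names, the design labels, and the exact way the criteria combine are one possible reading of the rubric. The conversion uses the standard relationship for an independent-samples t-test, d = t × √(1/n₁ + 1/n₂).

```python
import math


def d_from_t(t, n_treatment, n_control):
    """Approximate Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n_treatment + 1 / n_control)


def classify_quality(design, n, has_comparison_group, single_institution,
                     self_report_only, meets_publication_bar):
    """Return 'High', 'Moderate' or 'Low' under one reading of the rubric.

    design: 'rct', 'strong_quasi', 'quasi' or 'pre_post' (hypothetical labels).
    meets_publication_bar: pre-registered, or published in a peer-reviewed
    journal with impact factor above 1.5.
    """
    # Low: no comparison group, or outcomes measured by self-report only.
    if design == "pre_post" or not has_comparison_group or self_report_only:
        return "Low"
    # High: strong design, n above 100, multi-institution, publication bar met.
    if (design in ("rct", "strong_quasi") and n > 100
            and not single_institution and meets_publication_bar):
        return "High"
    # Everything else with a comparison group is treated as Moderate.
    return "Moderate"
```

For example, a study reporting t = 2.0 with 50 students per group gives d_from_t(2.0, 50, 50), which is 0.4. The conversion only holds for a two-group comparison; paired or multi-group designs need different formulas.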

Key Evidence Findings

Step 3: Key findings from the high-quality evidence base — with effect sizes and source links

PBL shows moderate positive effect on conceptual understanding in engineering — evidence from 12 high-quality studies

High quality · 12 studies

Twelve high-quality studies with comparison groups report a mean effect size of d = 0.58 for conceptual understanding outcomes in undergraduate engineering programmes. The effect is consistent across civil, mechanical, and electrical engineering sub-disciplines but is smaller in studies where the comparison condition involved active learning approaches other than PBL (d = 0.31 in these comparisons versus d = 0.74 when compared to traditional lecture-only instruction). The evidence for PBL improving conceptual understanding in engineering is robust when the comparison is with passive instruction but more modest when compared with other active learning methods.

d = 0.58 mean effect
12 high-quality studies
Engineering focus
📄 Journal of Engineering Education meta-analysis 2024
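A mean effect size across studies is normally a weighted mean, with larger and more precise studies counting for more. The sketch below shows fixed-effect inverse-variance pooling, using the usual large-sample approximation for the variance of d. The source does not say which pooling model Barie uses, so treat this as an assumption. The function names are hypothetical, and with effects as heterogeneous as those described here a random-effects model would usually be the more defensible choice.

```python
import math


def d_variance(d, n1, n2):
    """Large-sample approximation to the sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def pooled_effect(studies):
    """Fixed-effect inverse-variance pooled d with a 95% interval.

    studies: list of (d, n_treatment, n_control) tuples.
    Returns (pooled_d, ci_low, ci_high).
    """
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    total = sum(weights)
    pooled = sum(w * d for w, (d, _, _) in zip(weights, studies)) / total
    se = math.sqrt(1 / total)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se
```

Running pooled_effect separately on the studies with a lecture-only comparison and on those with an active-learning comparison gives the kind of subgroup contrast reported in this finding.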

📊

Problem-solving and collaborative skills show the strongest and most consistent PBL effects — but are also the hardest to measure rigorously

Moderate quality · mixed evidence

Twenty-three of the 74 included studies report outcomes for collaborative or problem-solving skills. Effect sizes for these outcomes are large in studies using behavioural observation or assessed project quality rubrics (d = 0.74 to 0.92) but close to zero in studies using self-report instruments alone. The quality of the outcome measure largely determines the apparent effect size. This means many published positive claims about PBL effects on “21st-century skills” rest on self-report data with limited validity. The genuine effect on assessed collaborative behaviour is likely large, but the evidence for it from methodologically strong studies is thinner than the publication count implies.

d = 0.74-0.92 (assessed)
~0 with self-report only
Measurement quality is the filter
📄 IJSTEM systematic review 2023

⚠️

Best practice finding: facilitator quality is the strongest moderator of PBL effectiveness — more than project design

High quality · implementation evidence

Meta-regression analysis across the 28 high-quality studies identifies facilitator training and experience with PBL as the strongest moderator of effect size — stronger than project design complexity, assessment alignment, or technology support. Studies conducted in contexts where instructors had received structured PBL facilitator training (minimum 40 hours) report effect sizes approximately 0.4 standard deviations higher than studies where instructors adopted PBL with no formal training. This is the single most actionable finding in the evidence base for institutions planning to implement PBL at scale: investment in faculty development for PBL facilitators predicts outcome quality more reliably than curriculum design choices.

+0.4 SD with facilitator training
Strongest moderator identified
Most actionable for institutions
📄 Buck Institute for Education research review / Journal of Engineering Education 2025
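A meta-regression with a single yes/no moderator, such as whether instructors received facilitator training, reduces to a weighted least-squares fit of effect size on a 0/1 indicator. The sketch below is a simplified illustration, not the analysis behind this finding. The function name is hypothetical, the four data points are made up, and a real meta-regression would also estimate between-study variance and test several moderators together.

```python
def weighted_regression(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up illustration data, NOT taken from the review.
trained = [1, 1, 0, 0]          # 1 = instructors had facilitator training
effects = [0.9, 0.7, 0.5, 0.3]  # effect size d per study
weights = [40, 20, 20, 40]      # inverse-variance weights per study

intercept, slope = weighted_regression(trained, effects, weights)
```

With a 0/1 moderator, the intercept is the weighted mean effect in untrained contexts and the slope is the difference between the trained and untrained weighted means, which is how a figure such as "0.4 standard deviations higher" is read.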

Delivered to Research and Teaching Tools

Step 4: The evidence review delivered to your curriculum development and research tools

📓 Notion

Full evidence review with all study findings, quality ratings, effect sizes, and best practice summary.

📋 Airtable

74-study database with quality rating, methodology, effect size, STEM discipline, and source link per study.

📊 Google Sheets

Effect size matrix for forest plot generation and meta-analytic visualisations.

📄 Word (.docx)

Publication-ready evidence review formatted for curriculum committee reports or grant proposals.

📚 RIS Export

Citation file for all 74 included studies for Zotero or Mendeley import.

📧 Gmail

Faculty committee briefing email drafted with three key evidence findings and implementation recommendations.

✅ ClickUp

Annual search update task to catch new high-quality studies as they are published.

💬 Slack

Department channel digest with the three key findings and the facilitator training recommendation highlighted.
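Of the deliverables above, the RIS export is the easiest to inspect by hand, because RIS is a plain-text format of two-letter tags that Zotero and Mendeley both import. The sketch below formats one journal article using a small subset of the tags. The function name, the record fields, and the placeholder record are assumptions for illustration, and it does not show Barie's actual export.

```python
def to_ris(record):
    """Format one journal article as an RIS record (subset of tags)."""
    lines = ["TY  - JOUR"]                    # reference type: journal article
    for author in record["authors"]:
        lines.append(f"AU  - {author}")       # one AU line per author
    lines.append(f"TI  - {record['title']}")
    lines.append(f"JO  - {record['journal']}")
    lines.append(f"PY  - {record['year']}")
    lines.append("ER  - ")                    # end-of-record marker
    return "\n".join(lines)


# Placeholder record, not a real study from the review.
example = {
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "Placeholder title",
    "journal": "Placeholder Journal",
    "year": 2024,
}
ris_text = to_ris(example)
```

Each tag is two characters, two spaces, a hyphen, and a space, followed by the value. A multi-study file is these records joined with blank lines between them.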

The Verdict

A curriculum committee that reads “PBL is consistently effective” and acts on that summary has made a decision from a narrative that conflates studies with d = 0.14 and studies with d = 1.2. Barie extracts the effect size, the methodology type, the sample size, and the quality rating for every included study and presents them together. The finding that facilitator training predicts outcome quality more strongly than project design complexity is not visible in any narrative summary of the PBL literature. It emerges from meta-regression across the 28 high-quality studies in the included set. That is the finding that changes the implementation strategy. That is what structured evidence review produces that narrative summary cannot.

Barie features used in this task

Feature

ChatGPT

Perplexity

Barie

Quality Assessment Framework — every study rated against transparent rubric before findings are extracted

✗

✓

Effect Size Extraction — d values extracted or calculated from reported statistics for every high-quality study

✗

✓

WWC Integration — What Works Clearinghouse evidence ratings retrieved for formally evaluated interventions

✗

✓