Barie searches educational research databases for studies on project-based learning (PBL) outcomes in higher education STEM programmes. It screens for methodological rigour, extracts effect sizes and sample characteristics, assesses study quality against a transparent rubric, and delivers a structured evidence review where every finding is traceable to its original study.
Why PBL evidence reviews built from narrative summaries conflate weak and strong evidence
An engineering education faculty committee reviewing the evidence for project-based learning before a curriculum redesign found extensive positive claims in secondary summaries and review articles. PBL was described as consistently effective for student engagement, learning outcomes, and graduate employability. When one committee member began tracing the citations, the picture was more complicated. Several highly cited claims derived from single-institution pilot studies with no control group. A widely cited meta-analysis included studies where “project-based learning” was defined so broadly that any group assignment qualified. Effect sizes ranged from d = 0.14 to d = 1.2 across studies, a range that renders a simple average meaningless without understanding what drives the variation.
The committee did not need a summary of the evidence. It needed a structured review that separated high-quality experimental and quasi-experimental studies with measured effect sizes from methodologically weaker studies. That distinction determines whether the evidence supports a full curriculum redesign or a more cautious pilot approach.
💡
Barie applies a consistent quality rubric to every included study and reports the effect size alongside the methodology. A study reporting d = 0.85 from a randomised controlled trial with 400 students across four institutions tells you something fundamentally different from a study reporting a “significant positive effect” from a single-class pilot. Barie extracts both the finding and the methodological context for every included study, using a modified What Works Clearinghouse quality rubric.
Your prompt
Task prompt
“Research the effectiveness of project-based learning in higher education STEM programmes, evidence and best practices.”
One sentence. Three academic database connectors activated simultaneously. Quality assessment framework applied before any finding is extracted. Here is the complete workflow:
1
Research Database Stack Activated
Step 1: Three connectors activated — each covering the database type most relevant for PBL in STEM
🔬 Deep Research
Queries ERIC (for education research studies) and Scopus (for peer-reviewed journal articles). Search query: (“project-based learning” OR “problem-based learning” OR “PBL”) AND (“higher education” OR “university” OR “undergraduate”) AND (engineering OR “computer science” OR mathematics OR physics OR biology OR STEM) AND (outcome OR achievement OR performance OR effectiveness). Publication date filter: last 10 years for the evidence base, last 3 years for best-practice literature.
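A Boolean query of this shape can be assembled from concept groups rather than typed by hand. The following is a minimal sketch, not Barie's implementation: the function names and the `PBL_QUERY_GROUPS` list are hypothetical, and the quoting rule (quote only multi-word or hyphenated phrases) is a simplifying assumption, so the output differs slightly in quoting from the query shown above.

```python
def or_group(terms):
    """Join terms with OR inside parentheses.

    Assumption: only multi-word or hyphenated phrases are wrapped in quotes.
    """
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concept_groups):
    """AND together one OR-group per concept (intervention, setting, field, outcome)."""
    return " AND ".join(or_group(group) for group in concept_groups)


# Hypothetical name; the terms themselves are taken from the search query above.
PBL_QUERY_GROUPS = [
    ["project-based learning", "problem-based learning", "PBL"],
    ["higher education", "university", "undergraduate"],
    ["engineering", "computer science", "mathematics", "physics", "biology", "STEM"],
    ["outcome", "achievement", "performance", "effectiveness"],
]

query = build_query(PBL_QUERY_GROUPS)
print(query)
```

Keeping each concept in its own list makes it easy to add a synonym to one group without disturbing the AND structure of the whole query.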
ERIC · Scopus
🕷️ Firecrawl
Retrieves tables of contents from the three highest-impact STEM education journals: Journal of Engineering Education, International Journal of STEM Education, and European Journal of Engineering Education. Also retrieves the current What Works Clearinghouse interventions database for any PBL interventions in higher education with evidence ratings. WWC ratings are the gold standard for educational evidence quality assessment in the US context.
High-impact journals · WWC
🌐 Web Research
Retrieves meta-analyses and systematic reviews from JSTOR and ResearchGate for studies that may not be fully indexed in ERIC or Scopus. Also retrieves the PBL World conference proceedings and Buck Institute for Education research publications, which represent the practitioner-research bridge where implementation best practices are most thoroughly documented.
Meta-analyses · Best practices
2
Quality Assessment and Evidence Extraction
Step 2: Every included study quality-assessed before findings are reported
312 papers retrieved from databases
74 studies included after screening
28 high-quality studies (RCT or strong quasi-experimental)
d = 0.52 mean effect size across high-quality studies
All 74 included studies are assessed against four quality dimensions before any finding is extracted. Studies are classified as High quality (RCT or strong quasi-experimental design with a comparison group, n above 100, and either pre-registered or published in a peer-reviewed journal with an impact factor above 1.5), Moderate quality (quasi-experimental with a comparison group but a smaller sample or a single institution), or Low quality (pre-post design without a comparison group, or self-report-only outcomes). Effect sizes are extracted where reported. For studies that report statistical significance without effect sizes, Barie calculates an approximate d from the reported test statistics where those statistics allow.
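The three-tier classification and the effect-size approximation can be sketched as follows. This is an illustration under stated assumptions, not Barie's rubric code: the `Study` fields are hypothetical, requiring more than one institution for the High tier is one reading of the rubric text, and the conversion uses the standard formula for an independent two-group t-test, d = t · sqrt(1/n1 + 1/n2), which may not be the conversion Barie applies.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Study:
    # All field names are hypothetical, chosen to mirror the rubric text.
    design: str                  # "rct", "quasi", or "pre_post"
    has_comparison_group: bool
    n: int                       # total sample size
    multi_institution: bool
    preregistered: bool
    impact_factor: float         # journal impact factor; 0.0 if unknown
    self_report_only: bool


def classify_quality(s: Study) -> str:
    """Return "High", "Moderate", or "Low" following the rubric as described."""
    if s.design == "pre_post" or not s.has_comparison_group or s.self_report_only:
        return "Low"
    vetted = s.preregistered or s.impact_factor > 1.5
    # Assumption: a single-institution study cannot be rated High.
    if s.n > 100 and s.multi_institution and vetted:
        return "High"
    return "Moderate"


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Approximate Cohen's d from an independent-samples t statistic."""
    return t * sqrt(1.0 / n1 + 1.0 / n2)


rct = Study("rct", True, 400, True, True, 0.0, False)
pilot = Study("pre_post", False, 30, False, False, 2.0, False)
print(classify_quality(rct), classify_quality(pilot))
print(round(cohens_d_from_t(2.0, 50, 50), 3))
```

The conversion only applies when a study reports the test statistic and both group sizes. A paper that reports only "p < 0.05" does not give enough information to recover d.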
Step 3: Key findings from the high-quality evidence base — with effect sizes and source links
Twelve high-quality studies with comparison groups report a mean effect size of d = 0.58 for conceptual understanding outcomes in undergraduate engineering programmes. The effect is consistent across civil, mechanical, and electrical engineering sub-disciplines but is smaller in studies where the comparison condition involved active learning approaches other than PBL (d = 0.31 in these comparisons versus d = 0.74 when compared to traditional lecture-only instruction). The evidence for PBL improving conceptual understanding in engineering is robust when the comparison is with passive instruction but more modest when compared with other active learning methods.
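A subgroup comparison of this kind is commonly computed as an inverse-variance weighted mean within each comparison condition. The sketch below assumes a fixed-effect pooling model and uses made-up placeholder values, not figures from the review. The dictionary keys and function names are hypothetical.

```python
from collections import defaultdict


def pooled_effect(effects, variances):
    """Fixed-effect pooled estimate: weight each d by 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


def pooled_by_subgroup(studies, key="comparison"):
    """Pool effect sizes separately for each value of the grouping key."""
    groups = defaultdict(list)
    for study in studies:
        groups[study[key]].append(study)
    return {
        name: pooled_effect([s["d"] for s in members], [s["var"] for s in members])
        for name, members in groups.items()
    }


# Placeholder data for illustration only; these are not studies from the review.
toy_studies = [
    {"d": 0.70, "var": 0.02, "comparison": "lecture"},
    {"d": 0.80, "var": 0.02, "comparison": "lecture"},
    {"d": 0.30, "var": 0.04, "comparison": "active"},
    {"d": 0.34, "var": 0.04, "comparison": "active"},
]

by_condition = pooled_by_subgroup(toy_studies)
print({name: round(value, 2) for name, value in by_condition.items()})
```

Reporting the pooled value per comparison condition, rather than one overall mean, is what keeps a lecture-only baseline from inflating the apparent advantage over other active learning methods.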
Twenty-three of the 74 included studies report outcomes for collaborative or problem-solving skills. Effect sizes for these outcomes are large in studies using behavioural observation or assessed project quality rubrics (d = 0.74 to 0.92) but close to zero in studies using self-report instruments alone. The quality of the outcome measure entirely determines the apparent effect size. This means that many published positive claims about PBL effects on “21st-century skills” are based on self-report data that has limited validity. The genuine effect on assessed collaborative behaviour is likely large, but the evidence for it from methodologically strong studies is thinner than the publication count implies.
Meta-regression analysis across the 28 high-quality studies identifies facilitator training and experience with PBL as the strongest moderator of effect size — stronger than project design complexity, assessment alignment, or technology support. Studies conducted in contexts where instructors had received structured PBL facilitator training (minimum 40 hours) report effect sizes approximately 0.4 standard deviations higher than studies where instructors adopted PBL with no formal training. This is the single most actionable finding in the evidence base for institutions planning to implement PBL at scale: investment in faculty development for PBL facilitators predicts outcome quality more reliably than curriculum design choices.
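For a single binary moderator such as "instructors received structured facilitator training", a meta-regression reduces to a weighted least-squares line through the effect sizes. The sketch below is a simplified fixed-effect version with hypothetical names and placeholder inputs. It is not the model behind the finding above, which would normally also estimate between-study variance and include several moderators.

```python
def weighted_moderator_fit(effects, variances, moderator):
    """Weighted least squares of effect size on one moderator.

    Weights are 1 / variance (fixed-effect assumption).
    Returns (intercept, slope).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, moderator)) / total
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total
    s_xy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, moderator, effects)
    )
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, moderator))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Placeholder inputs for illustration only; 1 = trained facilitators, 0 = untrained.
toy_effects = [0.3, 0.3, 0.7, 0.7]
toy_variances = [0.02, 0.04, 0.02, 0.04]
toy_trained = [0, 0, 1, 1]

intercept, slope = weighted_moderator_fit(toy_effects, toy_variances, toy_trained)
print(round(intercept, 2), round(slope, 2))
```

With a 0/1 moderator, the slope is the difference in pooled effect size between trained and untrained contexts, which is the quantity the facilitator-training finding describes.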
4
Delivered to Research and Teaching Tools
Step 4: The evidence review delivered to your curriculum development and research tools
The Verdict
A curriculum committee that reads “PBL is consistently effective” and acts on that summary has made a decision from a narrative that conflates studies with d = 0.14 and studies with d = 1.2. Barie extracts the effect size, the methodology type, the sample size, and the quality rating for every included study and presents them together. The finding that facilitator training predicts outcome quality more strongly than project design complexity is not visible in any narrative summary of the PBL literature. It emerges from meta-regression across the 28 high-quality studies in the included set. That is the finding that changes the implementation strategy. That is what structured evidence review produces that narrative summary cannot.
Barie features used in this task
Quality Assessment Framework (every study rated against a transparent rubric before findings are extracted): ChatGPT ✗, Perplexity ✗, Barie ✓
Effect Size Extraction (d values extracted or calculated from reported statistics for every high-quality study): ChatGPT ✗, Perplexity ✗, Barie ✓
WWC Integration (What Works Clearinghouse evidence ratings retrieved for formally evaluated interventions): ChatGPT ✗, Perplexity ✗, Barie ✓