# Mock Pupil-History Dataset — Learning Path Agent Hackathon
Hand-crafted, internally consistent JSON dataset for the BoostAI **Learning
Path Agent** hackathon challenge. It mirrors the production SQLAlchemy schema
in `elevenplus-backend/src/app/models/`, so an agent built on this data
transfers cleanly to the real database after the hackathon.

**Reference date:** `2026-05-01`. Timestamps in the data are relative to
this date; adjust `TODAY` in `generate.py` to shift them.
## What's in here

```
classroom.json                   One Year-6 Maths class + tutor + student-classroom links
students.json                    12 students (with _persona annotation per student)
question_bank.json               50 unique 11+ Maths questions
assignments.json                 8 assignments (5 CLOSED, 2 PUBLISHED, 1 DRAFT)
assignment_questions.json        64 assignment-question rows (links to question_bank)
assignment_assignees.json        85 (student × assignment) rows with status/scores
student_answers.json             593 per-question answer records (THE primary file)
activity_logs.json               593 activity log rows (timestamps, durations, solve_mode)
dataset.json                     All of the above bundled into one object
bonus_early_warning_input.json   3 classes × 10 students for the bonus EWS challenge
bonus_early_warning_output.json  Expected morning alert output (top 3 per class)
generate.py                      Deterministic generator (re-run anytime)
```

The agent's main input is `dataset.json`. If you'd rather load files
separately, use `student_answers.json` and `question_bank.json`, plus
`assignment_assignees.json` to map each answer back to its student.

For the bonus challenge shown in the brief, use
`bonus_early_warning_input.json` as the class-level monitor input (ordered
topics plus per-student topic-understanding scores), and
`bonus_early_warning_output.json` as the expected ranked risk list, which
names a specific weak topic and a recommended action for each student.
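A quick way to confirm that a load matches the listing above is to compare row counts. This is a minimal sketch: the `dataset.json` keys `students`, `question_bank`, `student_answers` and `assignment_assignees` are the ones the recipes below use, while `assignments`, `assignment_questions` and `activity_logs` are assumed to follow the file names. `classroom.json` is left out because its bundled shape isn't described here, and reading assignment status from a `status` field is an assumption based on the production enum.

```python
from collections import Counter

# Row counts from the file listing above. The keys `assignments`,
# `assignment_questions` and `activity_logs` are ASSUMED to match the file names.
EXPECTED_COUNTS = {
    'students': 12,
    'question_bank': 50,
    'assignments': 8,
    'assignment_questions': 64,
    'assignment_assignees': 85,
    'student_answers': 593,
    'activity_logs': 593,
}


def count_mismatches(data, expected=EXPECTED_COUNTS):
    """Return {key: (expected, actual)} for every table whose row count differs."""
    mismatches = {}
    for key, n in expected.items():
        actual = len(data.get(key, []))  # a missing key counts as zero rows
        if actual != n:
            mismatches[key] = (n, actual)
    return mismatches


def assignment_status_counts(data):
    """Tally assignment statuses; the listing expects 5 CLOSED, 2 PUBLISHED, 1 DRAFT."""
    return Counter(a['status'] for a in data['assignments'])
```

After loading `dataset.json` as in the "Load everything" recipe, `assert not count_mismatches(data)` fails loudly if a file was truncated or regenerated with different settings.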
## Schema fidelity

Field names and enum values match the production models exactly:

| Production model | This dataset |
|---|---|
| `users.py` (role=student) | `students.json` |
| `classrooms.py` + `classroom_student_rs.py` | `classroom.json` |
| `assignments.py` (status: DRAFT/PUBLISHED/CLOSED) | `assignments.json` |
| `assignment_questions.py` | `assignment_questions.json` |
| `assignment_assignees.py` (status: NOT_STARTED/IN_PROGRESS/SUBMITTED) | `assignment_assignees.json` |
| `question_bank.py` (difficulty: EASY/MEDIUM/HARD; source: BOOST) | `question_bank.json` |
| `student_assignment_answers.py` (graded_marks, marks_awarded, grading_status=GRADED) | `student_answers.json` |
| `assignment_activity_logs.py` (activity_type, duration_seconds, extra_data) | `activity_logs.json` |
| `question_metrics.py` (explanation_type) | embedded as `_solve_mode` on each answer |

Timestamps are **Unix milliseconds** (BigInteger) per the project convention.
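Because timestamps are Unix milliseconds, converting them needs either a division by 1000 or a `timedelta(milliseconds=ms)`. A minimal sketch using only the standard library; treating the reference date as midnight UTC is an assumption, since `TODAY` in `generate.py` may use a different time of day or zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_datetime(ms):
    """Unix milliseconds -> timezone-aware UTC datetime (exact, no float rounding)."""
    return EPOCH + timedelta(milliseconds=ms)


def datetime_to_ms(dt):
    """Timezone-aware datetime -> Unix milliseconds (floored to whole milliseconds)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


# The dataset's reference date, ASSUMED here to be midnight UTC.
REFERENCE_MS = datetime_to_ms(datetime(2026, 5, 1, tzinfo=timezone.utc))
```

Integer `timedelta` arithmetic avoids the float rounding that `dt.timestamp() * 1000` can introduce; an answer's age in days is then `(REFERENCE_MS - a['_answered_at']) / 86_400_000`.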
### Hackathon annotations (not in production schema)

Fields prefixed with **`_`** are hackathon-only annotations. Strip them if
you ever seed this data into the real DB.

| Annotation | Where | Meaning |
|---|---|---|
| `_persona` | `students.json` | Engineered behaviour persona (see below) |
| `_solve_mode` | `student_answers.json` | One of `just_answer`, `step_by_step`, `solve_together`, `handwritten` |
| `_time_on_task_seconds` | `student_answers.json` | Seconds spent on this question |
| `_is_correct` | `student_answers.json` | Boolean correctness (already implied by `graded_marks`) |
| `_misconception_tag` | `student_answers.json` | Set when a wrong answer matches a known misconception (e.g. `add_tops_add_bottoms`) |
| `_question_topic` / `_sub_topic` / `_difficulty` | `student_answers.json` | Denormalised from `question_bank` for convenience |
| `_answered_at` | `student_answers.json` | Same value as `created_at`, under a clearer name |
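Stripping the `_`-prefixed annotations before seeding can be done with a small recursive filter. This is a sketch, not something `generate.py` provides; `strip_annotations` is a hypothetical helper name. Fields without the prefix, such as the review-style `is_correct`, are kept.

```python
def strip_annotations(obj):
    """Return a copy of obj with every '_'-prefixed key removed, at any depth."""
    if isinstance(obj, dict):
        # JSON object keys are always strings, so startswith is safe here.
        return {key: strip_annotations(value) for key, value in obj.items()
                if not key.startswith('_')}
    if isinstance(obj, list):
        return [strip_annotations(item) for item in obj]
    return obj  # scalars (str, int, float, bool, None) pass through unchanged
```

For example, `strip_annotations(data)` cleans the whole bundle loaded from `dataset.json` without mutating it. Because it recurses into nested values, a production JSON column such as `extra_data` that legitimately holds `_`-prefixed keys would lose them too; in that case filter only the top level of each row.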
The mock exports now also include newer review-style fields used by the current app schema, such as:

- per-question: `is_correct`, `ai_feedback`, `review_needs_attention`, `review_issue_reason`, `review_correctness_score`, `review_understanding_score`, `review_question_score`, `review_confidence`, `review_tags`
- per-assignee: `overall_score`, `ai_feedback`, `next_step_outcome`

These make it easier to seed or backfill historical closed homework into the newer review surfaces.
## The 12 students

Five students have engineered personas (a specific misconception or
behaviour pattern); the other seven are realistic noise. The agent should
ideally identify the personas from the data itself: `_persona` is included
only so you can grade the agent.

| ID | Name | Persona | What you'll see in the data |
|---|---|---|---|
| 201 | Aisha Khan | `fraction_inversion` | ~12% on Fractions Add/Subtract/Multiply, ~78% elsewhere. Wrong answers show the **add-tops-add-bottoms** pattern (e.g. ½+⅓ → ⅖). |
| 202 | Ben Carter | `place_value_gaps` | Fails multi-digit subtraction with borrowing and decimal alignment. Strong on single-digit operations. |
| 203 | Chen Wei | `rushed_careless` | Right method in `step_by_step`; wrong final answer in `just_answer`. Time-on-task drops week-over-week. **No activity in the last 8 days**, which also drives the Early Warning signal. |
| 204 | Daniela Rossi | `solve_together_dependent` | Solve-Together share rises **21% → 88%** across the period. Independent accuracy degrades. |
| 205 | Elif Demir | `word_problem_weak` | 0% on word problems, ~90% on bare computation of the same operations. |
| 206 | Felix Brown | `stable_strong` | ~84% overall; baseline noise. |
| 207 | Grace Park | `stable_strong` | ~85% overall; baseline noise. |
| 208–210 | Singh / Nakamura / Williams | `stable_mid` | ~65% overall; baseline noise. |
| 211–212 | Patel / O'Connor | `stable_weak` | ~50% overall; baseline noise. |
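Since `_persona` exists only for grading, scoring an agent's persona guesses is a dictionary comparison. A minimal sketch, assuming each student row carries its ID under `id` and that the agent returns a `{student_id: persona}` mapping (both assumptions; `grade_personas` is a hypothetical helper). The three `stable_*` personas are treated as noise and skipped unless `include_noise=True`.

```python
# Personas the table above describes as baseline noise.
NOISE_PERSONAS = {'stable_strong', 'stable_mid', 'stable_weak'}


def grade_personas(predicted, students, include_noise=False):
    """Score {student_id: persona} guesses against the `_persona` annotations.

    Returns (accuracy, misses), where misses maps each wrongly labelled
    student ID to its expected and predicted persona.
    """
    truth = {s['id']: s['_persona'] for s in students
             if include_noise or s['_persona'] not in NOISE_PERSONAS}
    misses = {sid: {'expected': persona, 'predicted': predicted.get(sid)}
              for sid, persona in truth.items() if predicted.get(sid) != persona}
    accuracy = (len(truth) - len(misses)) / len(truth) if truth else 0.0
    return accuracy, misses
```

Skipping the noise personas by default keeps an agent from being rewarded for simply labelling everyone `stable_*`.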
## The 8 assignments

| ID | Name | Topic | Due (relative to reference date) | Status | Why it matters |
|---|---|---|---|---|---|
| 3001 | HW1 — Place Value Warmup | Place Value | -28d | CLOSED | Baseline data |
| 3002 | HW2 — Arithmetic Practice | Arithmetic | -22d | CLOSED | |
| 3003 | HW3 — Fractions Foundations | Fractions | -16d | CLOSED | First fraction signal |
| 3004 | HW4 — Negatives & BIDMAS | BIDMAS | -10d | CLOSED | |
| 3005 | HW5 — Geometry Basics | Geometry | -6d | CLOSED | |
| 3006 | HW6 — Algebra & Sequences | Algebra | +2d | PUBLISHED | In flight |
| **3007** | **HW7 — Adding Fractions (test prep)** | **Fractions** | **+5d** | **PUBLISHED** | **Curriculum deadline anchor for the bonus EWS** |
| 3008 | HW8 — Mixed Revision | Mixed | +12d | DRAFT | No activity yet |
## Bonus / Early Warning signals embedded in the data

| Signal | Where it lives | Who exhibits it |
|---|---|---|
| Drop in attempt rate (last 7 days) | `student_answers.json` timestamps | Student 203 (no recent activity); student 211 (reduced volume) |
| Increasing Solve-Together reliance | `_solve_mode` distribution over time | Student 204 (21% → 88%) |
| Declining score trend on deadline topic | Fractions accuracy across HW3 → HW7 | Student 201 (very weak on Fractions; HW7 is Fractions, due in 5 days) |
| Time since last session | `activity_logs.json` last timestamp | Student 203 (≥ 8 days) |
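The time-based signals above can be approximated from answer records alone. This is a sketch under stated assumptions: it reads `_answered_at` (Unix ms) and `_solve_mode` from `student_answers.json` instead of the `activity_logs.json` timestamps, because the activity-log timestamp field isn't named here; `today_ms` is the reference date in Unix milliseconds; and weeks are 7-day buckets counted back from that date. `signals_for` is a hypothetical helper that takes one student's answers, for example the output of `answers_for()` from the recipes section.

```python
from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000
WEEK_MS = 7 * DAY_MS


def signals_for(answers, today_ms):
    """Approximate the time-based EWS signals from one student's answer records."""
    times = [a['_answered_at'] for a in answers]
    last = max(times, default=None)

    # Attempt volume in the last 7 days vs the 7 days before that.
    last_7d = sum(1 for t in times if today_ms - WEEK_MS <= t <= today_ms)
    prev_7d = sum(1 for t in times if today_ms - 2 * WEEK_MS <= t < today_ms - WEEK_MS)

    # Solve-Together share per 7-day bucket counted back from today_ms.
    buckets = defaultdict(lambda: [0, 0])  # weeks ago -> [solve_together, total]
    for a in answers:
        weeks_ago = (today_ms - a['_answered_at']) // WEEK_MS
        buckets[weeks_ago][1] += 1
        buckets[weeks_ago][0] += int(a['_solve_mode'] == 'solve_together')

    return {
        'days_since_last': None if last is None else (today_ms - last) / DAY_MS,
        'attempts_last_7d': last_7d,
        'attempts_prev_7d': prev_7d,
        # Oldest bucket first, so a rising list means growing reliance.
        'solve_together_share_by_week': [
            round(st / n, 2) for _, (st, n) in sorted(buckets.items(), reverse=True)
        ],
    }
```

Per the tables above, student 203 should show `days_since_last` of at least 8 with no recent attempts, and student 204's weekly Solve-Together share should climb towards the end of the list.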
**Expected agent output for the bonus monitor on this dataset:**

| Rank | Student | Topic driving risk | Suggested action |
|---|---|---|---|
| 1 | 203 Chen Wei | Engagement collapse | Reach out + light re-entry assignment; check for blockers |
| 2 | 201 Aisha Khan | Fractions / Add (HW7 in 5 days) | 3 targeted fraction-add questions on the add-tops-add-bottoms misconception |
| 3 | 204 Daniela Rossi | Increasing scaffold dependence (any topic) | Pair with `step_by_step` mode + scheduled `just_answer` checkpoint |
## Running / re-running the generator

```bash
cd boost-ai-eval/mock-data
python3 generate.py
```

The RNG is seeded (`20260501`), so output is byte-stable across runs unless
you change the source. To shift dates, edit `TODAY` at the top of
`generate.py`.
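To check the byte-stability claim, hash every JSON output before and after re-running the generator. A minimal sketch; the folder path is the one used above, and `digest_outputs` and `changed_files` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def digest_outputs(folder='boost-ai-eval/mock-data'):
    """Map each *.json file in folder to the SHA-256 hex digest of its bytes."""
    return {path.name: hashlib.sha256(path.read_bytes()).hexdigest()
            for path in sorted(Path(folder).glob('*.json'))}


def changed_files(before, after):
    """Names whose digest differs, or that exist in only one snapshot."""
    return sorted(name for name in before.keys() | after.keys()
                  if before.get(name) != after.get(name))
```

Take `before = digest_outputs()`, run `python3 generate.py`, then take `after = digest_outputs()`; an empty `changed_files(before, after)` means the run reproduced the same bytes.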
## Quick agent-side recipes

**Load everything:**

```python
import json

# Load the bundled dataset once; the recipes below read from `data`.
with open('boost-ai-eval/mock-data/dataset.json', encoding='utf-8') as f:
    data = json.load(f)

students = data['students']
answers = data['student_answers']
qbank = {q['id']: q for q in data['question_bank']}  # question id -> question row
```
**Pull all answers for a single student:**

```python
def answers_for(student_id):
    """Return every answer record belonging to one student."""
    # Answers link to students through their assignee row (student x assignment).
    aa_ids = {aa['id'] for aa in data['assignment_assignees']
              if aa['student_id'] == student_id}
    return [a for a in data['student_answers'] if a['assignee_id'] in aa_ids]
```
**Compute topic-level mastery snapshot:**

```python
from collections import defaultdict

topic_acc = defaultdict(lambda: [0, 0])  # topic -> [correct, total]
for a in answers_for(201):
    topic_acc[a['_question_topic']][1] += 1
    topic_acc[a['_question_topic']][0] += int(a['_is_correct'])

print(dict(topic_acc))
# {'Fractions': [2, 16], 'Place Value': [3, 4], ...}
```
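The bonus output names a specific weak topic (e.g. "Fractions / Add"), so the same tally can be grouped by `_sub_topic` and paired with `_misconception_tag` counts. A minimal sketch; `weakest_areas` and `misconception_counts` are hypothetical helpers, and `min_attempts=3` is an arbitrary threshold that keeps sub-topics seen only once or twice out of the ranking.

```python
from collections import Counter, defaultdict


def weakest_areas(answers, key='_sub_topic', min_attempts=3):
    """(value, accuracy, attempts) per `key` value, weakest first."""
    tally = defaultdict(lambda: [0, 0])  # value -> [correct, total]
    for a in answers:
        tally[a[key]][1] += 1
        tally[a[key]][0] += int(a['_is_correct'])
    rows = [(value, correct / total, total)
            for value, (correct, total) in tally.items() if total >= min_attempts]
    return sorted(rows, key=lambda row: row[1])


def misconception_counts(answers):
    """How often each `_misconception_tag` appears among wrong answers."""
    # The tag is only set on some wrong answers, so use .get() to tolerate gaps.
    return Counter(a['_misconception_tag'] for a in answers
                   if not a['_is_correct'] and a.get('_misconception_tag'))
```

For student 201, `weakest_areas(answers_for(201))` should surface a Fractions sub-topic first, and `misconception_counts(answers_for(201))` should be dominated by `add_tops_add_bottoms`. Pass `key='_question_topic'` to rank whole topics instead.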