Lazizbek

Building maintAIn: LLMs for Student Wellbeing at the OSU AI Hackathon


The Short Version:

We built a Chrome extension and dashboard that uses IBM WatsonX to detect student burnout before it becomes a crisis — shipped in 48 hours, placed 3rd in the Education track at the IBM SkillsBuild Hackathon at Ohio State.

48 hours. That's how long we had to go from idea to working demo at the IBM SkillsBuild AI Hackathon at Ohio State University, hosted by Buckeye FinTech and BeMyApp.

The team: Mitchell Hooper led product strategy and structured thinking, Jaeha Lee and I handled the technical build. The constraint of 48 hours turned out to be clarifying — it forced us to cut everything that wasn't essential and ship the thing that actually mattered.

The problem we picked: not another AI study assistant. Those already exist. We focused on something different — turning invisible student behaviors into actionable insights, both for students reflecting on their own patterns and for institutions trying to understand academic trends at the system level.

This is the story of what we built, how the technical pieces fit together, and what we learned about applying LLMs to sensitive domains under real constraints.

The Problem

College students spend most of their digital lives inside a browser. Course platforms (Canvas, Blackboard, Brightspace), library databases, email, Google Docs. The behavioral signals are rich: when they're active, how long they spend on assignments, whether they're opening class materials at all.

What's missing is a system that:

  1. Collects these signals passively (without requiring students to manually log anything)
  2. Analyzes them for patterns associated with disengagement or distress
  3. Surfaces actionable insights — to students themselves, and to educators — without being creepy or surveillance-heavy
  4. Does all of this in a way that's compliant with FERPA, the federal law governing student educational records

That last constraint isn't optional. Any tool that touches student behavioral data at a university must respect FERPA, which means data minimization, explicit consent, and strict controls on what can be shared with whom.

Architecture

The system has three components: a Chrome extension, a Node.js/Express backend, and a React dashboard.

The Chrome Extension is the data collection layer. It monitors activity on known educational domains (Canvas, Blackboard, Google Classroom, etc.) and logs behavioral signals:

  • Time spent on course pages vs. assignment pages
  • Time of day patterns (late-night studying vs. normal hours)
  • Frequency of platform visits relative to assignment due dates
  • Session length and engagement duration

The extension does not capture keystrokes, content of documents, or any personally identifiable content. It captures metadata patterns only. This was a deliberate FERPA-compliance decision — we're analyzing behavior, not content.

// Extension background script - captures engagement metadata only
chrome.tabs.onUpdated.addListener((tabId, changeInfo, tab) => {
  // Fire once per completed load; tab.url can be undefined mid-navigation
  if (changeInfo.status !== 'complete' || !tab.url) return;
  if (!isEducationalDomain(tab.url)) return;

  const signal = {
    domain: new URL(tab.url).hostname,
    pageType: classifyPage(tab.url), // 'course', 'assignment', 'grade', 'resource'
    timestamp: Date.now(),
    // No content, no URLs beyond domain classification
  };

  sendToBackend(signal, studentId); // studentId: pseudonymous ID from local storage
});
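The `isEducationalDomain` and `classifyPage` helpers referenced above might look something like this. This is a sketch with an illustrative domain list and URL patterns, not the extension's actual config:

```javascript
// Hypothetical helper implementations; the real domain list and
// path patterns may differ.
const EDUCATIONAL_DOMAINS = [
  'instructure.com', // Canvas
  'blackboard.com',
  'classroom.google.com',
];

function isEducationalDomain(url) {
  try {
    const hostname = new URL(url).hostname;
    // Match the domain itself or any subdomain of it
    return EDUCATIONAL_DOMAINS.some(
      (d) => hostname === d || hostname.endsWith('.' + d)
    );
  } catch {
    return false; // malformed or missing URL
  }
}

function classifyPage(url) {
  const path = new URL(url).pathname;
  if (/\/assignments\b/.test(path)) return 'assignment';
  if (/\/grades?\b/.test(path)) return 'grade';
  if (/\/(files|modules|pages)\b/.test(path)) return 'resource';
  return 'course';
}
```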

Data is transmitted to the backend over HTTPS, associated with a pseudonymous student ID (not name or email at the collection layer), and stored with a 90-day retention window.
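On the backend, the ingestion path enforces the same minimization: only whitelisted metadata fields are stored, and records age out of the 90-day window. A minimal sketch of those two rules (field names assumed to mirror the extension's signal object; the real service is a Node.js/Express app):

```javascript
const RETENTION_MS = 90 * 24 * 60 * 60 * 1000; // 90-day rolling window

// Accept only the pseudonymous ID plus metadata; whitelisting the fields
// means no content can sneak into storage even if a client sends it.
function validateSignal(body) {
  const { studentId, domain, pageType, timestamp } = body;
  if (!studentId || !domain || !pageType || typeof timestamp !== 'number') {
    return null;
  }
  return { studentId, domain, pageType, timestamp };
}

// Enforce the retention window: anything older than 90 days is dropped.
function purgeExpired(records, now = Date.now()) {
  return records.filter((r) => now - r.timestamp <= RETENTION_MS);
}
```

In production this purge would typically be a TTL index on the database rather than application code, but the behavior is the same.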

The WatsonX Integration

IBM WatsonX Granite was the required model for the hackathon (IBM was a sponsor). The Granite models are enterprise-focused — they're explicitly designed for domains where hallucination and reliability matter more than creative generation.

I was skeptical initially. But for this use case — generating structured interpretations of behavioral patterns rather than open-ended text — Granite performed well. The key was prompt engineering.

The core analysis task: given a student's behavioral metrics over the past two weeks, generate an assessment and recommendations.

def analyze_student_patterns(metrics: StudentMetrics) -> WellbeingAssessment:
    prompt = f"""You are an educational psychologist analyzing student engagement data.
 
STUDENT BEHAVIORAL METRICS (last 14 days):
- Average daily study time: {metrics.avg_daily_minutes} minutes
- Late-night sessions (after 11pm): {metrics.late_night_sessions} occurrences
- Assignment access before due date: {metrics.advance_access_rate:.0%}
- Platform engagement trend: {metrics.engagement_trend} (improving/stable/declining)
- Days with zero platform activity: {metrics.zero_activity_days}
 
Provide:
1. A wellbeing signal (green/yellow/red) with brief rationale
2. Two specific, actionable suggestions for the student
3. Whether educator check-in is recommended (yes/no with reason)
 
Base your assessment only on the behavioral patterns above.
Do not speculate about personal circumstances.
Respond in JSON format."""
 
    response = watsonx_client.generate(
        model_id="ibm/granite-13b-chat-v2",
        inputs=prompt,
        parameters={"max_new_tokens": 400, "temperature": 0.1}
    )
 
    return WellbeingAssessment.parse_raw(response.results[0].generated_text)

Low temperature (0.1) was important — we don't want creative interpretations of student distress signals. We want consistent, conservative analysis. The model at low temperature is essentially a structured reasoner over the input data.
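Even at low temperature, LLM output is untrusted input, so the response gets a structural check before anything reaches the dashboard. A minimal Node-side sketch (the JSON field names here are hypothetical; the prompt above doesn't pin down an exact schema):

```javascript
const VALID_SIGNALS = new Set(['green', 'yellow', 'red']);

// Fail closed: anything that isn't valid JSON with the expected shape
// is rejected rather than rendered.
function parseAssessment(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!VALID_SIGNALS.has(data.signal)) return null;
  if (!Array.isArray(data.suggestions) || data.suggestions.length !== 2) return null;
  if (typeof data.educatorCheckin !== 'boolean') return null;
  return data;
}
```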

The Dashboard

The dashboard has two views: student and educator.

Student view (15 cards) shows personal insights:

  • Weekly engagement summary
  • Study pattern analysis ("You study most effectively on Tuesday and Wednesday mornings")
  • Upcoming deadline readiness score
  • Sleep hygiene signal based on late-night session patterns
  • Personalized suggestions based on current patterns

The framing was deliberately positive and actionable rather than clinical. "Your engagement dropped this week — here are two things that might help" rather than "burnout risk: high."
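The deadline readiness score from the card list above can be approximated with a simple heuristic: what fraction of upcoming assignments has the student already opened, weighted so that nearer deadlines count more. This is an illustrative sketch, not the hackathon scoring logic:

```javascript
// Hypothetical readiness heuristic. Each assignment has a dueAt timestamp
// (ms) and an `opened` flag from the engagement signals.
function readinessScore(assignments, now = Date.now()) {
  if (assignments.length === 0) return 1; // nothing due, fully ready
  let weighted = 0;
  let totalWeight = 0;
  for (const a of assignments) {
    // Clamp to half a day so imminent deadlines don't divide by ~zero
    const daysLeft = Math.max((a.dueAt - now) / 86400000, 0.5);
    const weight = 1 / daysLeft; // nearer deadlines count more
    weighted += weight * (a.opened ? 1 : 0);
    totalWeight += weight;
  }
  return weighted / totalWeight; // 0 = nothing opened, 1 = everything opened
}
```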

Educator view (8 cards) shows class-level and individual-level signals:

  • Aggregate class engagement trends
  • Students with declining engagement flags (anonymized until educator takes action)
  • Suggested intervention timing based on historical patterns
  • Class-wide deadline stress indicators

The educator view never exposes individual student data directly. Disclosure requires a two-step process: the system flags a concern, the educator chooses to "request visibility," and the student is notified. This is the FERPA compliance mechanism: no automatic disclosure of individual behavioral data to instructors.
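That two-step flow reduces to a small state machine. A sketch (state names and function signatures are hypothetical):

```javascript
// Flags are anonymized: studentRef is the pseudonymous ID, not a name.
function flagConcern(studentRef) {
  return { studentRef, state: 'flagged' };
}

// The educator acting on a flag triggers the student notification before
// any identity is attached to the data they see.
function requestVisibility(flag, notify) {
  if (flag.state !== 'flagged') throw new Error('no open flag to act on');
  notify(flag.studentRef); // FERPA step: student is informed of the request
  return { ...flag, state: 'visibility_requested' };
}
```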

FERPA Compliance by Design

FERPA compliance wasn't a checklist item — it shaped the architecture from the start.

The key decisions:

Data minimization. The Chrome extension captures the minimum behavioral signals needed, not everything technically possible. No content, no full URLs, no geolocation.

Pseudonymization at collection. The extension uses a local student ID that maps to the student's actual identity only in an encrypted lookup table the student controls. The behavioral data in the database is not directly linked to FERPA-protected records.

Consent flow. Students explicitly opt in during onboarding. The extension is not installed without consent. They can delete their data at any time.

Educator access controls. Educators see aggregate signals by default. Individual student data requires the notification-and-consent flow described above.

Retention limits. 90-day rolling window. Data older than 90 days is automatically deleted.

Building these constraints into the system from day one rather than bolting them on at the end is the difference between a tool universities can actually deploy and one that never makes it through legal review.

What 48 Hours Teaches You

The most useful lesson from the hackathon format: scope ruthlessly and ship something real.

Our initial whiteboard had 30 features. We shipped 8. But the 8 we shipped were coherent, working, and demonstrable. Teams that tried to build everything had nothing that worked at demo time.

The second lesson: the AI layer is only as good as the problem framing. Granite didn't magically analyze student wellbeing. We designed exactly what question to ask the model, what data to give it, and what format we needed back. The model's job was to be a reliable structured reasoner — which it was, when prompted correctly.

maintAIn placed 3rd in the Education track at the IBM SkillsBuild Hackathon at Ohio State University. But more importantly, the judges from IBM and OSU's student wellness office said the FERPA-compliant architecture was the thing that made it potentially deployable — not the AI, not the dashboard. The most important technical work was the boring compliance work.

The Bigger Picture

Student mental health is a genuine crisis on US campuses. Dropout rates are up. Counseling center wait times are weeks long. The signals that predict who needs help are often sitting in behavioral data that already exists.

LLMs won't solve this alone. But as a layer on top of real behavioral signals — trained to flag, not diagnose; to suggest, not prescribe — they can close the gap between "student struggling in silence" and "educator reaches out at the right moment."

That's a use case worth building for.