IN THIS LESSON
Get ready to step inside the engine room of every AI system you’ll ever meet. Through a short video presentation, you’ll trace the Data → Model → Output pipeline, answer questions that test your instincts, and see exactly where improvement (and bias) sneaks in. By the time you finish, you’ll be able to sketch this three-step map on a whiteboard and use it to decode any AI-powered claim that lands in your inbox.
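To make the three steps concrete, here is a minimal sketch of the pipeline in Python using scikit-learn. The essays, labels, and "proficiency" task are invented for illustration; real tools use far larger datasets and models, but the shape of the chain is the same:

```python
# A minimal Data -> Model -> Output sketch (toy data, invented for illustration)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# DATA: a tiny, made-up corpus of essay snippets with human labels
texts = [
    "clear thesis and strong evidence",
    "no thesis and weak evidence",
    "strong argument with clear structure",
    "disorganized and missing citations",
]
labels = [1, 0, 1, 0]  # 1 = proficient, 0 = needs work

# MODEL: learn whatever patterns connect the data to the labels
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# OUTPUT: a prediction on new input -- only as good as the data above
print(model.predict(["clear evidence but weak thesis"]))
```

Notice that every output traces back through this chain: the model can only reflect patterns that were present in its training data.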
Try This Tomorrow
Quick Audit: Pick one digital tool you use. Find its website. Can you identify its training data?
Student Activity: Have students diagram the AI pipeline for a tool they use (Grammarly, TikTok, etc.)
Team Discussion Starter: "Should our essay feedback tool learn from our specific students or use generic training?"
FAQ Section
What if a vendor can't (or won't) tell me what data their AI was trained on?
This is a red flag. Legitimate AI companies are typically transparent about their training data sources (even if they protect the exact datasets for competitive reasons). They should at least be able to tell you:
General categories (e.g., "10 million student essays from diverse US schools")
Data collection methods
Steps taken to ensure diversity and reduce bias
Privacy protections used
What to do: Ask specifically: "Can you describe the types and sources of data used to train your model?" If they deflect with vague claims about "proprietary algorithms," you're likely dealing with one of the following:
Simple automation dressed up as AI
A company that hasn't properly considered bias and representation
A tool using questionable data sources
Your move: Only consider vendors who can clearly articulate their data practices. No transparency = no trust = no purchase.
How often should an AI tool be retrained?
It depends on the use case:
Rapidly changing domains (current events, slang detection, social media trends): Weekly to monthly
Stable educational patterns (essay quality, math problem-solving): Quarterly to annually
Behavioral patterns (student engagement, dropout risk): Each semester or term
Core language understanding: May only need updates every 1-2 years
Key insight: More frequent isn't always better. Retraining too often can make the system unstable and harder for teachers to predict. The sweet spot is retraining often enough to stay current but not so often that the tool's behavior becomes erratic.
Ask vendors: "What's your retraining schedule and why?" Good answers include specific timelines backed by educational rationales. Be wary if they say "constantly learning from every interaction"; this often means they're collecting data without proper retraining protocols.
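If you want to make these cadences operational, a hypothetical helper like the one below can flag when a model is past its window. The domain names and intervals are illustrative, mirroring the rough guidance above:

```python
# A hypothetical retraining-cadence check; domains and intervals are illustrative
from datetime import date, timedelta

RETRAIN_INTERVAL = {
    "trends":        timedelta(days=30),   # rapidly changing domains
    "essay_quality": timedelta(days=180),  # stable educational patterns
    "engagement":    timedelta(days=120),  # roughly each semester/term
    "core_language": timedelta(days=540),  # ~every 1-2 years
}

def retraining_due(domain: str, last_trained: date, today: date | None = None) -> bool:
    """Return True if the model for this domain is past its retraining window."""
    today = today or date.today()
    return today - last_trained > RETRAIN_INTERVAL[domain]

print(retraining_due("essay_quality", date(2024, 1, 15), today=date(2024, 9, 1)))  # True
```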
Can I use my own classroom data to train or improve an AI tool?
Yes, but carefully. Your classroom generates valuable, context-specific data that could improve AI tools for your specific needs. However, consider:
✅ Good training data practices:
Get explicit consent from students/parents
Anonymise all personally identifiable information (see the sketch at the end of this answer)
Focus on patterns, not individual students
Ensure diverse representation across all student groups
Create clear data retention/deletion policies
⚠️ What to avoid:
Using data that could reinforce existing biases
Including only high-performing students
Sharing raw data with vendors without agreements
Creating datasets too small to be meaningful (you usually need 1,000+ examples)
Best approach: Partner with your IT department and consider working with universities or established EdTech companies who have proper data handling protocols. Some districts are creating consortium datasets that benefit everyone while protecting individual privacy.
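As a concrete illustration of the anonymisation step above, here is a minimal Python sketch. The record fields are hypothetical, and note that salted hashing yields a stable pseudonym rather than true anonymity, so free-text fields still need separate scrubbing for embedded names:

```python
# A minimal pseudonymisation sketch; field names are hypothetical
import hashlib

SALT = "replace-with-a-secret-value"  # keep this out of version control

def anonymise(record: dict) -> dict:
    """Drop direct identifiers and replace the student ID with a salted hash."""
    pseudonym = hashlib.sha256((SALT + record["student_id"]).encode()).hexdigest()[:12]
    return {
        "student": pseudonym,                # stable pseudonym, not a real ID
        "essay_text": record["essay_text"],  # still check this for embedded names
        "grade_band": record["grade_band"],  # coarse bands, not birthdates
    }

raw = {"student_id": "S-1042", "name": "Jane Doe",
       "essay_text": "My thesis is...", "grade_band": "9-10"}
print(anonymise(raw))  # the name field is dropped entirely
```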
How do I protect student privacy when an AI tool learns from my students' data?
This is the most critical question to ask any AI vendor.
Essential protections to demand:
Data minimisation: Only collect what's needed for improvement
Anonymisation: Strip all identifying information before training
Consent protocols: Clear opt-in/opt-out mechanisms
Data retention limits: Automatic deletion after specified periods
No selling/sharing: Training data stays within the educational context
Audit rights: Ability to see what data is collected and request deletion
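Several of these protections can be expressed as machine-enforceable policy rather than promises in a PDF. A hypothetical sketch, with all names invented:

```python
# A hypothetical data-governance policy expressed as enforceable code
from datetime import date, timedelta

POLICY = {
    "retention_days": 365,     # auto-delete training records after one year
    "allow_sharing": False,    # training data stays in the educational context
    "consent_required": True,  # opt-in before any record enters training
}

def must_delete(collected_on: date, today: date) -> bool:
    """True once a record is past the retention window."""
    return today - collected_on > timedelta(days=POLICY["retention_days"])

print(must_delete(date(2024, 1, 1), today=date(2025, 6, 1)))  # True
```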
Red flags to watch for:
Vague privacy policies with phrases like "we may use data to improve services"
No mention of FERPA/COPPA compliance
Requirements to share student work without anonymisation
No clear data deletion process
Best practice: Look for vendors who:
Use differential privacy techniques (see the sketch after this list)
Train on aggregated, not individual, data
Have third-party privacy audits
Provide teacher dashboards showing what data is collected
Allow schools to improve the AI without exposing student data
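For context on the first item, here is a minimal sketch of the Laplace mechanism, a common building block of differential privacy: calibrated noise is added to an aggregate count so no individual student's record can be inferred from the released number. The epsilon value and count are illustrative:

```python
# A minimal Laplace-mechanism sketch; epsilon and the count are illustrative
import numpy as np

def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    """Release a count with Laplace noise; a count query has sensitivity 1."""
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# 42 students flagged for extra support; the released value masks any one student
print(dp_count(42, epsilon=0.5))  # smaller epsilon = more noise = more privacy
```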
Remember: Good AI can improve without compromising privacy. If a vendor says they need identifiable student data for their feedback loop, find a different vendor. The technology exists to do this right—accept nothing less.