IN THIS LESSON

Get ready to step inside the engine room of every AI system you’ll ever meet. In a quick video presentation, you’ll trace the Data → Model → Output pipeline, be asked questions that test your instincts, and see exactly where improvement—and bias—sneak in. By the time you finish, you’ll be able to sketch this three-step map on a whiteboard and use it to decode any AI-powered claim that lands in your inbox.
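To make the three steps concrete, here is a minimal, purely illustrative Python sketch. The "model" is a toy word-count rule and every example below is made up; real systems use far more data and far richer models, but the Data → Model → Output shape is the same. Notice how a questionable pattern (longer = stronger) sneaks in straight from the training data.

```python
# 1. DATA: labelled examples the system learns from (all made up).
training_data = [
    ("The evidence clearly supports the claim because the data is consistent.", "strong"),
    ("I think it is good.", "weak"),
    ("Multiple sources confirm this interpretation of the results.", "strong"),
    ("It was nice.", "weak"),
]

# 2. MODEL: "training" extracts a pattern from the data.
# This toy model just learns the average word count per label.
def train(data):
    counts = {}
    for text, label in data:
        counts.setdefault(label, []).append(len(text.split()))
    return {label: sum(c) / len(c) for label, c in counts.items()}

model = train(training_data)

# 3. OUTPUT: the trained model labels new, unseen input.
def predict(model, text):
    n = len(text.split())
    return min(model, key=lambda label: abs(model[label] - n))

print(predict(model, "Because the trend is consistent across sources, the claim holds."))
# -> 'strong': the model has quietly learned that longer means stronger.
```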

Try This Tomorrow

  1. Quick Audit: Pick one digital tool you use. Find its website. Can you identify its training data?

  2. Student Activity: Have students diagram the AI pipeline for a tool they use (Grammarly, TikTok, etc.).

  3. Team Discussion Starter: "Should our essay feedback tool learn from our specific students or use generic training?"

FAQ Section

  • What if a vendor won’t tell me what data its AI was trained on?

    This is a red flag. Legitimate AI companies are typically transparent about their training data sources (even if they protect the exact datasets for competitive reasons). They should at least be able to tell you:

    • General categories (e.g., "10 million student essays from diverse US schools")

    • Data collection methods

    • Steps taken to ensure diversity and reduce bias

    • Privacy protections used

    What to do: Ask specifically: "Can you describe the types and sources of data used to train your model?" If they deflect with vague claims about "proprietary algorithms," you're likely dealing with one of the following:

    • Simple automation dressed up as AI

    • A company that hasn't properly considered bias and representation

    • A tool using questionable data sources

    Your move: Only consider vendors who can clearly articulate their data practices. No transparency = no trust = no purchase.

  • How often should an AI tool be retrained?

    It depends on the use case:

    • Rapidly changing domains (current events, slang detection, social media trends): Weekly to monthly

    • Stable educational patterns (essay quality, math problem-solving): Quarterly to annually

    • Behavioral patterns (student engagement, dropout risk): Each semester or term

    • Core language understanding: May only need updates every 1-2 years

    Key insight: More frequent isn't always better. Retraining too often can make the system unstable and harder for teachers to predict. The sweet spot is retraining often enough to stay current but not so often that the tool's behavior becomes erratic.

    Ask vendors: "What's your retraining schedule and why?" Good answers include specific timelines with educational rationales. Be wary if they say "constantly learning from every interaction"; this often means they're collecting data without proper retraining protocols.

  • Can I use data from my own classroom to train an AI tool?

    Yes, but carefully. Your classroom generates valuable, context-specific data that could improve AI tools for your specific needs. However, consider:

    ✅ Good training data practices:

    • Get explicit consent from students/parents

    • Anonymise all personally identifiable information (see the sketch after this answer)

    • Focus on patterns, not individual students

    • Ensure diverse representation across all student groups

    • Create clear data retention/deletion policies

    ⚠️ What to avoid:

    • Using data that could reinforce existing biases

    • Including only high-performing students

    • Sharing raw data with vendors without agreements

    • Creating datasets too small to be meaningful (usually need 1000+ examples)

    Best approach: Partner with your IT department and consider working with universities or established EdTech companies that have proper data-handling protocols. Some districts are creating consortium datasets that benefit everyone while protecting individual privacy.
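To make the "anonymise" item above concrete, here is a minimal Python sketch of scrubbing a record before it leaves the school. The field names and regex patterns are illustrative assumptions, not a complete PII scrubber; real deployments should use vetted tools under district-approved protocols.

```python
import re

def anonymise(record):
    """Drop direct identifiers and redact obvious PII from free text."""
    text = record["essay_text"]
    # Redact e-mail addresses and long digit runs (student IDs, phone numbers).
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)
    text = re.sub(r"\b\d{6,}\b", "[ID]", text)
    return {
        # Direct identifiers are removed entirely, not hashed or masked.
        "grade_band": record["grade_band"],  # keep only coarse context
        "essay_text": text,
    }

record = {
    "student_id": "S-1042",  # hypothetical record layout
    "grade_band": "9-10",
    "essay_text": "Contact me at jane.doe@example.com, my ID is 20250131.",
}
print(anonymise(record))
# -> {'grade_band': '9-10', 'essay_text': 'Contact me at [EMAIL], my ID is [ID].'}
```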

  • How do I protect student privacy when an AI tool learns from my students’ data?

    This is the most critical question to ask any AI vendor.

    Essential protections to demand:

    1. Data minimisation: Only collect what's needed for improvement

    2. Anonymisation: Strip all identifying information before training

    3. Consent protocols: Clear opt-in/opt-out mechanisms

    4. Data retention limits: Automatic deletion after specified periods (see the sketch after this list)

    5. No selling/sharing: Training data stays within the educational context

    6. Audit rights: Ability to see what data is collected and request deletion
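Item 4 is straightforward to verify in practice. Below is a minimal sketch of what an automatic retention limit amounts to; the 180-day window and record layout are hypothetical choices for illustration, and the real period belongs in your vendor contract.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=180)  # hypothetical window; set by contract

def purge_expired(records, now=None):
    """Keep only records collected within the retention window."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["collected_at"] <= RETENTION]

records = [
    {"id": "a", "collected_at": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"id": "b", "collected_at": datetime.now(timezone.utc)},
]
print([r["id"] for r in purge_expired(records)])  # -> ['b']
```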

    Red flags to watch for:

    • Vague privacy policies with phrases like "we may use data to improve services"

    • No mention of FERPA/COPPA compliance

    • Requirements to share student work without anonymisation

    • No clear data deletion process

    Best practice: Look for vendors who:

    • Use differential privacy techniques (see the sketch at the end of this answer)

    • Train on aggregated, not individual, data

    • Have third-party privacy audits

    • Provide teacher dashboards showing what data is collected

    • Allow schools to improve the AI without exposing student data

    Remember: Good AI can improve without compromising privacy. If a vendor says they need identifiable student data for their feedback loop, find a different vendor. The technology exists to do this right—accept nothing less.
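For readers who want to see what "differential privacy" and "aggregated, not individual, data" mean in practice, here is a minimal sketch: a class average is released with calibrated Laplace noise, so no single student's score can be reverse-engineered from the output. The epsilon value and score range are illustrative assumptions; production systems should use audited DP libraries rather than hand-rolled noise.

```python
import math
import random

def dp_average(scores, lower=0.0, upper=100.0, epsilon=1.0):
    """Release a class average with Laplace noise (basic differential privacy)."""
    true_avg = sum(scores) / len(scores)
    # One student can move the average by at most (upper - lower) / n;
    # that sensitivity, divided by epsilon, sets the Laplace noise scale.
    scale = (upper - lower) / (len(scores) * epsilon)
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_avg + noise

scores = [72, 85, 91, 64, 78, 88, 70, 95, 81, 76]  # made-up class scores
print(round(dp_average(scores), 1))  # a noisy average; raw scores stay local
```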