GMP Bench

Summary of the Fine Print on AI Data Use

Most people don't think twice before pasting a document, uploading a file, or typing a sensitive question into an AI chat interface. But depending on which tool you use and which plan you're on, that data may be stored, reviewed by humans, and used to train future versions of the model, potentially for years.

For most everyday use cases, this is a minor concern. But in regulated environments like pharmaceutical manufacturing, clinical research, healthcare, or any setting involving personal health information, it matters a great deal. GMP-related documents often contain intellectual property, and patient data is protected under GDPR, HIPAA, and sector-specific regulations that impose hard limits on secondary use. Using a consumer AI tool in these contexts without understanding its data practices could put you in breach of your data protection obligations.

The good news is that almost every major provider draws a clear line: consumer tiers often train on your data by default; enterprise and paid API tiers usually don't. The bad news is that the details vary significantly, and you will have to analyse the data protection policies of each model provider (or consult the summary below!).

Provider overview

| Provider | Consumer tier training? | Enterprise/API training? | Retention | Controls & caveats | Enterprise protections |
|---|---|---|---|---|---|
| OpenAI | Yes (unless opted out) | No (default off) | Consumer chats: until deleted, then ~30 days. API logs: 30 days. | Data Controls toggle in settings. Temporary Chat mode available. Feedback interactions (thumbs up/down) may still trigger training even after opt-out. | DPA + Services Agreement. Enterprise/Business tiers: no training by default. |
| Anthropic | Only if "Help improve Claude" is enabled | No (requires express permission) | If improvement enabled: up to 5 years (de-identified). If not: 30 days. API: 30 days default; zero data retention option available. | Privacy settings toggle. Incognito chats excluded from training regardless of setting. Deleting a conversation prevents its use for future training. | DPA with SCCs in Commercial Terms. Zero Data Retention option for API. No training without express permission. |
| Google Gemini | Yes; chats and uploads used to improve/train models | Unpaid API: yes. Paid API: no. | Consumer: 18-month auto-delete default; human-reviewed chats kept up to 3 years even after deletion. Paid API: limited abuse-monitoring window. | Gemini Apps Activity auto-delete (adjustable). Manual deletion available. Human-reviewed chats persist separately and cannot be deleted retroactively. | Workspace and Google Cloud: no training on customer data without permission. Paid Gemini API: DPA, processor mode, no training. EEA/UK/CH users get paid-tier protections even on free quota. |
| Alibaba Cloud (Qwen / Model Studio) | Unspecified (consumer Qwen Chat policy not verified from primary sources) | No; explicitly states it "will never use your data for model training" | Direct API calls: conversation data not saved (only call status). Assistant API mode: conversation history retained with no expiration date. | Mode-dependent: direct API = no retention; Assistant API = indefinite retention of conversation history. No ZDR arrangement documented. ⚠ Third-party models in Model Studio may route data to external providers; check each model's own terms. | AES-256 encryption stated. GDPR DPA specifics not confirmed in available sources. Third-party models on the platform may have different policies. |
| Mistral AI | Yes (Le Chat Free/Pro/Student), unless opted out | Paid "Scale" API: no. Free/Experiment tier: yes. | Le Chat: until you delete the chat or account. API: 30 rolling days for abuse monitoring. Agents API and Fine-Tuning API: until account termination. | Opt-out available for consumer tiers. Zero data retention: not available for Le Chat; may be requested for AI Studio subject to approval. Feedback (thumbs up/down with comment) authorises training use regardless of opt-out. | DPA available. Le Chat Team/Enterprise and paid API: no training, cannot opt in. Processor mode stated. |
| DeepSeek | Yes; privacy policy explicitly covers training use of prompts and uploaded files | Yes/likely; no enterprise "no training" clause identified | "As long as necessary"; example given: as long as the account exists. No fixed retention period documented. | No documented "do not train" toggle. User rights (erasure, restriction) exist in the policy, but no training-specific opt-out identified. | No DPA or enterprise contractual protections identified. |

The practical takeaway for regulated environments

If you're working with GMP data, patient information, or anything subject to data protection regulation, the default assumption should be that consumer-tier tools are off-limits until you've confirmed the provider's enterprise terms. Most providers offer a compliant path, but it requires being on the right tier, ideally with a signed DPA, and in some cases a zero data retention arrangement.