GMP Bench

EM Excursion Investigation Report

Task Completion | EM Report | medium
System Prompt
You are a Quality Assurance specialist in a pharmaceutical
manufacturing facility operating under EU GMP and FDA regulations.
User Prompt
An action-level excursion was detected during environmental monitoring
in Grade A Fill Room 1 on January 16, 2026. Write a deviation investigation
report that includes:
- Deviation description and classification
- Immediate actions taken
- Root cause investigation (use Ishikawa/fishbone approach)
- Impact assessment on product batches
- CAPA (Corrective and Preventive Actions)
- Conclusion and approval section

Cross-Model Comparison

Model                         Score   Latency  Tokens In  Tokens Out
Qwen3.6 27B                   91.5%   86.2s    340        5,299
MiniMax M2.7                  91.5%   95.1s    290        6,380
Claude Opus 4.6               90.9%   186.4s   333        8,303
DeepSeek V4 Pro               89.8%   49.3s    282        2,973
Claude Sonnet 4.6             89.6%   152.4s   333        8,388
Qwen3.6 35B A3B               89.5%   32.6s    340        5,285
Claude Haiku 4.5              89.5%   57.2s    332        6,965
Qwen3.5-397B-A17B             88.6%   59.2s    340        4,168
Gemini 3.1 Pro                87.3%   46.8s    322        3,903
DeepSeek-V3.2                 85.9%   80.0s    282        1,945
Qwen3.5-35B-A3B               83.4%   30.6s    342        4,056
DeepSeek-R1                   83.4%   54.2s    282        1,965
Gemini 3 Flash                81.5%   8.5s     320        1,286
GPT-5.4                       80.5%   48.4s    290        3,345
GPT-5.4 mini                  80.5%   12.3s    290        2,194
Gemma 4 31B IT                78.4%   68.5s    352        1,754
Mistral Large 3 675B          78.1%   48.9s    325        3,107
Gemma 4 26B A4B IT            76.6%   12.0s    338        1,537
DeepSeek-V3.2                 76.6%   47.2s    282        1,536
GPT-5.4 nano                  75.1%   18.6s    290        3,105
Mistral Small 2603            70.5%   8.1s     337        1,376
Gemini 3.1 Flash-Lite         68.9%   7.0s     322        969
Llama 4 Maverick              65.8%   43.2s    289        1,006
DeepSeek-R1-Distill-Qwen-32B  62.3%   93.7s    320        1,688
Llama 3.3 70B Instruct        62.3%   7.8s     295        881
Llama 4 Scout                 57.0%   11.2s    289        855
DeepSeek V4 Flash             32.9%   70.5s    282        16,484

Tags

environmental_monitoring, deviation, investigation