Duke AI Suite Models Guide

Not sure which model to use? Instead of memorizing model names, think in terms of clusters based on capability, access, and use case. Models can appear in more than one category, so pick the cluster that matches your needs. Additionally, we are always evaluating new models and sunsetting old or deprecated models, so this list is subject to change.

On-Prem Models (Private & Controlled)

What This Means: These models are hosted on Duke-managed GPUs within Duke’s campus infrastructure, so your data never leaves the Duke network or Duke servers.

Use When: You prioritize privacy and data control above all else.

Models: Mistral On-Prem

Best For: 

  • Research use cases (e.g., pre-publication data) 
  • Internal-only tools like department-specific chatbots 
  • Student-related data and academic projects where keeping data on campus is ideal 

Note: These models are approved for use with sensitive data under Duke policy, except for PHI or health-related data.

Cloud-Hosted Models (Azure-Backed, Secure)

What This Means: Our cloud models are all hosted via Azure under Duke’s data security agreement with Microsoft.

Use When: You need advanced model capabilities or a variety of models.

Available Models:  

  • GPT-5-chat, GPT-5, GPT-5-mini, GPT-5-nano, GPT-OSS 120B
  • Llama 4 Scout, Llama 4 Maverick, Llama 3.3 70B
  • Removed in DukeGPT and deprecated in myGPT Builder and AI Gateway: GPT-4.1, GPT-4.1 Mini, GPT-4.1 Nano, o3, o4-mini 

Best For:

  • Sophisticated reasoning, long-context, and high-performance tasks
  • AI-enhanced learning and faculty tools
  • Experimentation across a variety of models: you can run simultaneous requests, compare outputs, and find the best fit for your needs

Note: These models are approved for use with sensitive data under Duke policy, except for PHI or health-related data.

Reasoning Models (Deep Logic & Long-Form Thinking)

What This Means: Typical AI models just produce an answer one word at a time. Reasoning models go a step further by constructing logical chains of thought to solve complex, multi-step problems.   

Use When: You’re tackling a complex problem that requires extended logic, memory, or planning. 

Available Models: GPT-5 

Best For:

  • Long-form writing or research synthesis 
  • Planning, simulations, step-by-step breakdowns 

Not Great For: Rapid chat or casual Q&A, because reasoning models take longer and use more resources to produce a response.

Rate-Limited Models (High Power, Use Wisely) 

What This Means: Some models in DukeGPT have usage limits to prevent runaway costs and ensure fair access. Users have a defined limit on these models, which resets daily.  

Use When: You need the best model output, but in limited quantities. 

Affected Models: GPT-5-chat, GPT-5 

Recommended Backups: GPT-5-mini, GPT-5-nano, or any on-prem model 

Best For: 

  • Research synthesis, paper writing, coding deep dives 
  • Anything where quality outweighs quantity
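The fallback described in this section can be sketched in a few lines. This is only an illustration: `choose_model` and its `limit_reached` flag are hypothetical and not part of DukeGPT, and the backup order simply follows the order in which the backups are listed above.

```python
# Models and backups as listed in this section. "Mistral On-Prem" stands in
# for "any on-prem model". The helper itself is hypothetical.
RATE_LIMITED = {"GPT-5-chat", "GPT-5"}
BACKUPS = ["GPT-5-mini", "GPT-5-nano", "Mistral On-Prem"]


def choose_model(preferred: str, limit_reached: bool) -> str:
    """Return the preferred model, or the first backup if its daily limit is used up."""
    if preferred in RATE_LIMITED and limit_reached:
        return BACKUPS[0]
    return preferred


print(choose_model("GPT-5", limit_reached=True))   # GPT-5-mini
print(choose_model("GPT-5", limit_reached=False))  # GPT-5
```

Models without a usage limit are returned unchanged, so the same call works for any model name.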

STEM-Focused Models (Engineering, Math, Code)

What This Means: Some models are better than others at coding, logic, and math-based queries. Try different models to see what works best, but you can start with these.

Use When: You need help with logic-heavy, numerical, or technical problems.

Available Models: Mistral 24B, GPT-5

Best For:

  • Coding assignments or Python notebooks
  • Solving math problems, formula generation
  • Logic-heavy reasoning tasks and structured responses

Multilingual Models (Language Flexibility)

Use When: You need responses in non-English languages or support across multilingual materials.

Available Models: All models

Best For:

  • Translating course content or creating global-facing tools
  • Supporting international students in their native language
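Because a model can belong to several clusters, one way to narrow the choice is to treat each cluster as a set and intersect them. The sketch below is a hypothetical illustration, not an official tool. The cluster keys and the `models_matching` helper are made up for this example, the model names are copied from the lists above, and the multilingual cluster is omitted because it covers all models. "Mistral On-Prem" and "Mistral 24B" are kept as separate entries because the guide lists them under different names.

```python
# Cluster membership copied from the sections above (hypothetical keys).
CLUSTERS = {
    "on-prem": {"Mistral On-Prem"},
    "cloud": {
        "GPT-5-chat", "GPT-5", "GPT-5-mini", "GPT-5-nano", "GPT-OSS 120B",
        "Llama 4 Scout", "Llama 4 Maverick", "Llama 3.3 70B",
    },
    "reasoning": {"GPT-5"},
    "rate-limited": {"GPT-5-chat", "GPT-5"},
    "stem": {"Mistral 24B", "GPT-5"},
}


def models_matching(*needs: str, avoid=()) -> set:
    """Return models found in every cluster in `needs` (at least one) and in no cluster in `avoid`."""
    result = set.intersection(*(CLUSTERS[name] for name in needs))
    for name in avoid:
        result -= CLUSTERS[name]
    return result


print(sorted(models_matching("cloud", "stem")))  # ['GPT-5']
print(sorted(models_matching("cloud", avoid=["rate-limited"])))
```

The second call lists the cloud models that have no daily usage limit.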

Model Costs

| Model | Company | Cloud vs. On-Prem | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|---|---|
| Llama 3.3 | Meta | Cloud | $0.71 | $0.71 |
| Llama 4 Maverick | Meta | Cloud | $0.35 | $1.41 |
| Llama 4 Scout | Meta | Cloud | $0.20 | $0.78 |
| GPT-5, GPT-5-chat | OpenAI | Cloud | $1.25 | $10.00 |
| GPT-5-mini | OpenAI | Cloud | $0.25 | $2.00 |
| GPT-5-nano | OpenAI | Cloud | $0.05 | $0.40 |
| GPT-OSS 120B | OpenAI | Cloud | $0.15 | $0.06 |
| text-embedding-3-small | OpenAI | Cloud | $0.02 | |
| Mistral | Mistral | On-premise | no cost | no cost |
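To turn the per-1M-token prices above into a cost for a given workload, multiply each token count by its price and divide by one million. The sketch below is a rough estimate only: `estimate_cost` is a hypothetical helper, the token counts are invented for illustration, and actual billing may differ. text-embedding-3-small is left out because only an input price is listed for it.

```python
# Prices in USD per 1M tokens, copied from the Model Costs listing above.
# Each entry is (input_price, output_price).
PRICES_PER_MILLION = {
    "Llama 3.3": (0.71, 0.71),
    "Llama 4 Maverick": (0.35, 1.41),
    "Llama 4 Scout": (0.20, 0.78),
    "GPT-5": (1.25, 10.00),
    "GPT-5-chat": (1.25, 10.00),
    "GPT-5-mini": (0.25, 2.00),
    "GPT-5-nano": (0.05, 0.40),
    "GPT-OSS 120B": (0.15, 0.06),
    "Mistral": (0.0, 0.0),  # on-premise, listed as no cost
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload (hypothetical helper)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Invented workload: 200,000 input tokens and 50,000 output tokens.
for name in ("GPT-5", "GPT-5-mini", "GPT-5-nano"):
    print(f"{name}: ${estimate_cost(name, 200_000, 50_000):.2f}")
# GPT-5: $0.75
# GPT-5-mini: $0.15
# GPT-5-nano: $0.03
```

Output tokens dominate the cost for the GPT-5 family, so workloads with long responses benefit most from a smaller model.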

 

Article number: KB0038832

Valid to: October 28, 2026