Duke AI Suite Models Guide
Not sure which model to use? Instead of memorizing model names, think in terms of clusters based on capability, access, and use case. Models can appear in more than one category, so pick the cluster that matches your needs. Additionally, we are always evaluating new models and sunsetting old or deprecated models, so this list is subject to change.
On-Prem Models (Private & Controlled)
What This Means: These models are hosted on Duke-managed GPUs within Duke’s campus infrastructure, so your data never leaves the Duke network or Duke servers.
Use When: You prioritize privacy and data control above all else.
Available Models: Mistral On-Prem
Best For:
- Research use cases (e.g., pre-publication data)
- Internal-only tools like department-specific chatbots
- Student-related data and academic projects where keeping data on campus is ideal
Note: These models are approved for use with sensitive data under Duke policy, except for PHI or health-related data.
Cloud-Hosted Models (Azure and OpenAI Backed, Secure)
What This Means: Our cloud models are all hosted via Azure under Duke’s data security agreement with Microsoft.
Use When: You need advanced model capabilities or a variety of models.
Available Models: See Model tables below
Best For:
- Sophisticated reasoning, long context, high performance tasks
- AI-enhanced learning and faculty tools
- Experimentation: a variety of models lets users run simultaneous requests, compare outputs, and find the best fit for their needs
Note: These models are approved for use with sensitive data under Duke policy, except for PHI or health-related data.
Reasoning Models (Deep Logic & Long-Form Thinking)
What This Means: Most AI models simply generate an answer one token at a time. Reasoning models go a step further, constructing logical chains of thought to solve complex, multi-step problems.
Use When: You’re tackling a complex problem that requires extended logic, memory, or planning.
Available Models: GPT-5.2, o3-deep-research, o4-mini-deep-research
Best For:
- Long-form writing or research synthesis
- Planning, simulations, step-by-step breakdowns
Not Great For: Rapid chat or casual Q&A, since the extended reasoning takes longer and consumes more resources per response.
Rate-Limited Models (High Power, Use Wisely)
What This Means: Some models in DukeGPT have usage limits to prevent runaway costs and ensure fair access. Users have a defined limit on these models, which resets daily.
Use When: You need the best model output, but in limited quantities.
Affected Models: See Model Costs tables below
Recommended Backups: Any on-prem model
Best For:
- Research synthesis, paper writing, coding deep dives
- Anything where quality outweighs quantity
Specialty Models (Code, Engineering, Math)
What This Means: Some models are purpose-built for tasks beyond general chat, such as coding, logic and math-based queries, generating text embeddings, and transcribing speech to text.
Use When: You need help with logic-heavy, numerical, or technical problems.
Available Models:
- Coding Models: GPT-5.2-codex, GPT-5.1-codex, GPT-5.1-codex-mini, GPT-5.1-codex-max
- Speech-to-Text Models: GPT-4o-transcribe, GPT-4o-transcribe-diarize, whisper-1
- Text-Embedding Models: text-embedding-3-small, text-embedding-3-large
Best For:
- Coding assignments or Python notebooks
- Solving math problems, formula generation
- Logic-heavy reasoning tasks and structured responses
General Models & Costs
| Model | Company | Cloud vs. On-Prem | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
| --- | --- | --- | --- | --- |
| GPT-5.2, GPT-5.2-chat | OpenAI | Cloud | $1.75 | $14.00 |
| GPT-5.1, GPT-5.1-chat | OpenAI | Cloud | $1.25 | $10.00 |
| GPT-5, GPT-5-chat | OpenAI | Cloud | $1.25 | $10.00 |
| GPT-5-mini | OpenAI | Cloud | $0.25 | $2.00 |
| GPT-5-nano | OpenAI | Cloud | $0.05 | $0.40 |
| GPT-4.1 | OpenAI | Cloud | $2.00 | $8.00 |
| GPT-4.1-mini | OpenAI | Cloud | $0.40 | $1.60 |
| GPT-4.1-nano | OpenAI | Cloud | $0.10 | $0.40 |
| GPT-OSS 120B | OpenAI | Cloud | $0.15 | $0.60 |
| Llama 3.3 | Meta | Cloud | $0.71 | $0.71 |
| Llama 4 Maverick | Meta | Cloud | $0.35 | $1.41 |
| Llama 4 Scout | Meta | Cloud | $0.20 | $0.78 |
| Mistral | Mistral | On-Prem | No cost | No cost |
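As a rough guide, the per-request cost of a token-priced model can be estimated from the per-1M-token rates in the table above. A minimal sketch (the model, rates, and token counts below are illustrative examples; actual billing is handled by the platform):

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Estimate the dollar cost of one request.

    Rates are dollars per 1M tokens, as listed in the cost tables.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: GPT-5-mini ($0.25 input / $2.00 output per 1M tokens)
# with a 10,000-token prompt and a 2,000-token response.
cost = request_cost(10_000, 2_000, 0.25, 2.00)
print(f"${cost:.4f}")  # → $0.0065
```

Note that output tokens are typically several times more expensive than input tokens, so long generated responses dominate the cost of most requests.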
Specialty Models & Costs
| Model | Company | Cloud vs. On-Prem | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
| --- | --- | --- | --- | --- |
| GPT-5.2-codex | OpenAI | Cloud | $1.75 | $14.00 |
| GPT-5.1-codex | OpenAI | Cloud | $1.25 | $10.00 |
| GPT-5.1-codex-mini | OpenAI | Cloud | $0.25 | $2.00 |
| GPT-5.1-codex-max | OpenAI | Cloud | $1.25 | $10.00 |
| GPT-4o-transcribe | OpenAI | Cloud | $2.50 | $10.00 |
| GPT-4o-transcribe-diarize | OpenAI | Cloud | $2.50 | $10.00 |
| o4-mini | OpenAI | Cloud | $1.10 | $4.40 |
| o4-mini-deep-research | OpenAI | Cloud | $2.00 | $8.00 |
| o3 | OpenAI | Cloud | $10.00 | $40.00 |
| o3-deep-research | OpenAI | Cloud | $10.00 | $40.00 |
| text-embedding-3-small | OpenAI | Cloud | $0.02 | - |
| text-embedding-3-large | OpenAI | Cloud | $0.13 | - |
| whisper-1 | OpenAI | Cloud | $0.006 per minute of audio | - |
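Unlike the token-priced models, whisper-1 is billed per minute of audio rather than per token. A quick sketch of estimating a transcription cost (the recording length below is an illustrative example):

```python
WHISPER_RATE_PER_MINUTE = 0.006  # dollars per minute of audio, from the table above

def transcription_cost(audio_seconds):
    """Estimate the whisper-1 cost for an audio file of the given length."""
    return (audio_seconds / 60) * WHISPER_RATE_PER_MINUTE

# Example: a 45-minute lecture recording.
print(f"${transcription_cost(45 * 60):.2f}")  # → $0.27
```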
Article number: KB0038832
Valid to: February 2, 2027