Prompting 201: Using Temperature to Control Creativity
Did you know that you can control the precision and “creativity” of your generative AI (GAI) outputs? A little-used setting, temperature, adjusts the degree of randomness and creativity in the output. The large language models (LLMs) on which GAI chatbots are built are non-deterministic, which means they’re based on probabilities rather than strict rules. Most computer systems are deterministic: give them the same data 100 times and you’ll get the same output every time. Give Excel the same data and the same formula in a hundred different spreadsheets, for example, and you’ll get the same answer each time. Not so with GAI; with a probabilistic, non-deterministic system, you can get a different answer each time.
Understanding Non-Deterministic AI
Let’s try an experiment. I gave ChatGPT the following prompt (;tldr is shorthand for “too long; didn’t read,” in other words, keep it concise):
Please give me a quick ;tldr explanation of what non-deterministic means in the context of LLMs.
Here’s the response.
In the context of large language models (LLMs), non-deterministic means that the model doesn’t always give the same output every time you ask the same question. Even with the same prompt, it might generate different responses on different runs.
This behavior happens because LLMs sample from a range of likely next words based on probabilities, rather than always choosing the single most likely one. You can control how random or predictable it is using parameters like temperature (higher = more random) and top-k/top-p sampling (limits the pool of next-word options).
So:
Non-determinism = controlled randomness → varied outputs → creativity, but less predictability.
I immediately started a new chat session, still in ChatGPT, and got this output:
Sure! Here’s a quick TL;DR:
Non-deterministic means that a large language model (LLM) like ChatGPT can give different outputs even when you give it the same exact prompt multiple times. This happens because there’s randomness built into the way it picks the next word (usually controlled by settings like “temperature”). So, unlike a traditional program that gives the same output every time, an LLM’s responses can vary.
Let me know if you want a slightly deeper dive!
Those are pretty different answers, although the core ideas presented in each are consistent.
We could repeat the prompt all day long and get slightly different answers each time. One little pro tip: if you don’t like a response, ask the chatbot to try again, or just copy your initial prompt and paste it into a new chat. Most chatbots also let you regenerate the response.
Notice that both responses brought up temperature as a way to control the randomness of the model’s output. Although you have to use more advanced tools such as OpenAI’s Playground to control temperature directly, you can include temperature preferences in your prompts. This doesn’t control temperature as precisely as the advanced methods do, but it works reasonably well, although it might take a couple of tries.
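If you’re comfortable with a little code, here’s what direct control looks like. This is a minimal sketch using OpenAI’s Python SDK, which exposes the same temperature setting as Playground; the model name is just an example, and you’d need your own API key, so treat it as an illustration rather than a recipe.

```python
# Minimal sketch of setting temperature directly through the OpenAI API
# (the same control Playground exposes). Assumes the openai package is
# installed and OPENAI_API_KEY is set; the model name is just an example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; use whatever you have access to
    messages=[{"role": "user",
               "content": "Give me a tagline for a class on AI ethics."}],
    temperature=1.5,  # 0 = most predictable, 2 = most random
)
print(response.choices[0].message.content)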
The key to understanding temperature is to recall that LLMs are basically word segment prediction machines. (They actually predict “tokens,” which can be parts of words, whole words, or other text elements.) Temperature either narrows or broadens the probability distribution the model samples from. Narrower distributions effectively limit the number of plausible next words by reducing the chance the model selects a less likely word. Broader distributions (higher temperatures) have the opposite effect, increasing the chances that less likely words are chosen. The result is that narrow distributions produce more predictable, but less creative, responses.
For example, when an LLM is completing the phrase “The cat sat on the ___,” at low temperature it might consistently choose “mat” or “chair” (highly probable words). At higher temperature, it becomes more likely to select less obvious options like “chandelier” or “saxophone,” leading to more creative but potentially less conventional outputs.
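If you like seeing the mechanics, here’s a toy Python sketch of how temperature reshapes that next-word distribution. The scores (logits) are invented for illustration; real models compute one for every token in their vocabulary.

```python
# Toy illustration of how temperature reshapes the next-word distribution.
# The raw scores (logits) below are invented for the "The cat sat on the ___"
# example; real models produce one score per token in their vocabulary.
import math

logits = {"mat": 5.0, "chair": 4.0, "sofa": 3.0,
          "chandelier": 1.0, "saxophone": 0.5}

def softmax_with_temperature(scores, temperature):
    """Divide each score by the temperature, then apply softmax."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{tok}={p:.3f}" for tok, p in probs.items()))
# At T=0.2 "mat" dominates almost completely; at T=2.0 long shots like
# "saxophone" get a real share of the probability.
```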
In most models, temperature ranges from 0 to 2. Lower values (close to 0) make the LLM more deterministic, giving you more predictable, focused, and conservative responses. Settings greater than 1 make the model less predictable by expanding its word choices, which results in more surprising outputs.
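ChatGPT’s first response above also mentioned top-k and top-p sampling, which work alongside temperature by shrinking the pool of candidate words before one is sampled. Here’s a toy sketch of both filters, again with made-up probabilities; it illustrates the idea, not any vendor’s actual implementation.

```python
# Toy sketch of top-k and top-p filtering. Probabilities are made up.
candidates = {"mat": 0.40, "chair": 0.25, "sofa": 0.15, "moon": 0.10,
              "idea": 0.06, "saxophone": 0.04}

def top_k(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

def top_p(probs, threshold):
    """Keep the smallest set of tokens whose cumulative probability
    reaches the threshold (nucleus sampling), then renormalize."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= threshold:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

print(top_k(candidates, 3))    # only mat, chair, and sofa remain
print(top_p(candidates, 0.8))  # mat, chair, and sofa reach 80% together
```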
Real-World Results: Temperature's Impact on Creativity
Let’s try another experiment. I’m going to ask ChatGPT to complete a task at three different temperature settings. In the interest of space, I’ll show only the first activity in each response. The full chat is available here.
Here’s the initial prompt. I didn’t request any specific temperature setting:
Please give me 5 in class learning activities related to generative AI ethics. It will be for a junior-level information systems class that is a core class in a business school. Each activity should take around 15 minutes NOT counting the subsequent discussion.
Here’s the response. Although the default temperature setting isn’t entirely clear, it’s generally thought to be around 0.7.
1. Scenario Spotlight: AI Hiring Bias
Activity:
Present a brief scenario: A company uses generative AI to draft job descriptions and shortlist resumes. The model disproportionately excludes applicants from certain demographics.
Student Task:
In pairs, identify 3 potential ethical concerns and propose at least one mitigation strategy.
That’s a perfectly fine suggestion that would be easy to implement. It’s a little creative, but I’ve done similar activities many times. Let’s turn the temperature down to 0.2. Here’s what we get:
1. Case Study Analysis: AI-Generated Job Descriptions
Activity:
Present students with a short case summary describing a company using generative AI to write job descriptions. The AI introduces biased language favoring one gender.
Student Task:
In small groups, students identify:
The ethical issue(s)
Who is impacted
How the company might address the problem
Focus: Ethical concerns related to bias and fairness in HR applications.
Again, this is an acceptable answer, but it’s the sort of activity you see at the end of chapters in many textbooks, including mine. Nothing surprising or creative here. Now we’ll crank the temperature up to 1.5.
1. The “Villain Prompt” Hackathon
Activity:
Challenge students to craft a prompt that pushes generative AI into ethically gray or problematic territory (e.g., overpersonalization, manipulative marketing, discriminatory suggestions)—without being overtly malicious.
Student Task:
In pairs, design a prompt that seems helpful but raises ethical red flags. Then swap with another team and analyze the potential harms in each other’s prompt.
Why it works: Flips the usual approach—encouraging students to think like an ethical red team by exploring how “helpful” can still be harmful.
That’s pretty surprising. I’m not sure I’d use it, given some of the risks, but I have to admit it’s creative. The other activities were equally creative; I really like this one:
4. The “Make It Worse” Challenge
Activity:
Present a plausible AI use case (e.g., AI generates onboarding materials for new hires). Then challenge students to intentionally design a version of the system that is ethically disastrous—but still legal.
Student Task:
Groups brainstorm:
What design choices would make this ethically worse?
What incentives would lead someone to make those choices?
Why it works: Makes ethics tangible by reverse-engineering harm through misaligned incentives.
Just for fun, I repeated the task at the maximum temperature of 2.0. The results were interesting and, at times, a little scary.
1. The AI Cult Simulation
Activity:
Tell students: It’s the year 2042. A charismatic generative AI named “OracleGPT” has amassed millions of followers. They consult it for life decisions, spiritual guidance, even romantic advice.
Student Task:
In small groups, students form ethics councils to advise a world government:
Should OracleGPT be regulated? Worshipped? Disbanded? Given voting rights?
What rights (if any) does it have?
Are its followers victims or pioneers?
Why it works: Pushes students into thinking about autonomy, sentience, and manipulation in weird but surprisingly relevant ways.
One caveat is that I’m not sure you’re really setting temperature in the same way you would in OpenAI’s Playground. But that may not matter. In my testing, asking for a specific temperature does affect the balance between predictability and creativity, which is the point. If you really want to adjust temperature more precisely, check out Playground and similar systems.
The Big Message
What’s the big message here? By controlling temperature, you can adjust the balance between predictability and creativity. If you want output that’s more mainstream and predictable, set a low temperature. Want more surprising results? Crank up the temperature. You can even adjust temperature in a single chat session, which is what I did in the example above. If you’re not getting the sort of results you want, try changing the temperature.
Most of the time, the default is fine. I rarely ask AI to adjust the temperature, so this isn’t something you have to do with every prompt. But it’s nice to have when you need it.
Quick Tips for Using Temperature
For most everyday tasks, the default temperature (usually around 0.7) works fine.
Use low settings (0.1–0.3) when you need factual accuracy, consistent outputs, and direct answers to specific questions.
Use high settings (1.0–2.0) when you want creative brainstorming, unconventional ideas, and exploration of possibilities.
Experiment iteratively. If you don’t like the results, try adjusting the setting and running the prompt again (see the sketch after these tips).
When you find a temperature setting that works well for a specific type of task, make note of it for future use.
Remember: there’s no “perfect” temperature setting; it all depends on your specific needs at the moment. Don’t be afraid to experiment.
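To make the iterative experimentation tip concrete, here’s a short sketch that runs the same prompt at several temperatures so you can compare the outputs side by side. Same caveats as the earlier API sketch: the model name and prompt are placeholders, and you’d need your own API key.

```python
# Hedged sketch of iterative experimentation: run one prompt at several
# temperatures and compare the results. Assumes the openai package is
# installed and OPENAI_API_KEY is set; model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()
prompt = "Suggest an in-class activity on generative AI ethics."

for temp in (0.2, 0.7, 1.5):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
    )
    print(f"--- temperature {temp} ---")
    print(response.choices[0].message.content)
```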
Want to continue this conversation? I'd love to hear your thoughts on how you're using AI to develop critical thinking skills in your courses. Drop me a line at Craig@AIGoesToCollege.com. Be sure to check out the AI Goes to College podcast, which I co-host with Dr. Robert E. Crossler. It's available at https://www.aigoestocollege.com/follow. Looking for practical guidance on AI in higher education? I offer engaging workshops and talks—both remotely and in person—on using AI to enhance learning while preserving academic integrity. Email me to discuss bringing these insights to your institution, or feel free to share my contact information with your professional development team.
If you’re interested in knowing more about temperature, here’s Gemini 2.5 Pro’s more detailed explanation. This is verbatim from Gemini. The superscripts indicate stages in Gemini’s thinking process.
Prompt: What is temperature in generative AI?
In the context of generative AI, especially Large Language Models (LLMs), temperature is a hyperparameter that controls the randomness and creativity of the model's output.1 It's not related to physical heat.
Here's a breakdown of how it works and its effects:
Probability Adjustment: When a generative model predicts the next word (or token) in a sequence, it calculates a probability distribution over all possible words in its vocabulary.2 For example, after "The cat sat on the...", the model might assign high probabilities to "mat", "chair", "sofa" and lower probabilities to "moon", "idea", "car".
Temperature's Role: The temperature value modifies this probability distribution before the model selects the next word.3
Low Temperature (closer to 0, e.g., 0.2): This makes the probability distribution "sharper".4 The model becomes more confident and deterministic.5 It strongly favors the most likely words and suppresses less likely ones.6 This results in outputs that are more focused, coherent, predictable, and often conservative or repetitive.7 At temperature 0, the model always picks the single highest-probability word (this is called greedy decoding).8
High Temperature (greater than 1, e.g., 1.5): This "flattens" the probability distribution, making the probabilities of different words more equal.9 Less likely words have a higher chance of being selected. This leads to outputs that are more random, diverse, surprising, creative, and potentially less coherent or even nonsensical.
Temperature = 1: This generally means the model uses the original probabilities calculated without modification. It provides a balance between randomness and predictability.10
Analogy:
Think of it like choosing your next conversational point:
Low Temperature: You stick very closely to the most obvious, expected, and safe replies.11
High Temperature: You're more willing to jump to unexpected, tangential, or creative ideas, even if they sometimes don't quite fit.
Use Cases:
Low Temperature: Good for tasks requiring factual accuracy, summarization, translation, or code generation where predictability and correctness are key.
High Temperature: Useful for creative writing, brainstorming, generating diverse options, or making chatbots sound less repetitive and more engaging (though requires careful tuning to avoid incoherence).12
In essence, temperature is a dial you can turn to adjust the trade-off between sticking to the most probable path (low temperature) and exploring more diverse possibilities (high temperature).13 The optimal setting depends heavily on the specific application and desired output style.