Katie Academy

Image Generation and Editing

Intermediate · 17 minutes · Lesson 4 of 5


Learning objectives

  • Write image briefs that define a visual target instead of a mood.
  • Phrase edits so they preserve what should stay fixed and change only what you intend.
  • Iterate visually with purpose, without losing the core brief.
  • Understand what changed with native image generation and why it matters for your workflow.

Image generation improves quickly when you stop describing a mood and start defining a brief. The same is true for editing. Good edits preserve what matters and change only what you intend to change.

In this lesson you will write a first image brief, then an edit brief that changes one variable while preserving the rest.

Why this matters

Weak image prompts produce endless rerolls. A stronger brief gives the model a visual target, and a stronger edit request protects continuity across iterations.

The skill is useful for mockups, concept art, internal communication, creative exploration, and simple visual production tasks.

The underlying model

In March 2025, ChatGPT began generating images natively with GPT-4o rather than calling the separate DALL-E 3 model; the image model was upgraded to GPT Image 1.5 in December 2025. The shift is architectural: image generation is now a built-in multimodal capability rather than a handoff to an external model. All generated images include C2PA metadata for provenance tracking. For API users, a minimal generation sketch follows the list of practical changes below.

What this changes in practice:

  • Text rendering is dramatically improved. Legible text in images was DALL-E 3's biggest weakness and is now reliable.
  • Iterative refinement happens through conversation. You can ask for changes and the model adjusts the existing image rather than regenerating from scratch.
  • Photorealism is significantly better, with more natural lighting, skin tones, and materials.
  • Generation is up to 4x faster than it was with DALL-E 3.
  • Image generation is available to both free and paid users.
  • DALL-E 3 remains accessible through a dedicated DALL-E GPT but is scheduled for deprecation in the API (May 2026).
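
If you work with the API directly rather than through the ChatGPT UI, a generation call might look like the sketch below. This is a minimal sketch assuming the OpenAI Python SDK and the "gpt-image-1" model identifier; the exact name for GPT Image 1.5 may differ, so check the current API documentation.

import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Minimal sketch: generate a vertically framed image through the Images API.
# "gpt-image-1" is an assumed model identifier; confirm against current docs.
result = client.images.generate(
    model="gpt-image-1",
    prompt=(
        "Editorial-style image of a calm, premium workspace: desk with "
        "notebook and laptop, subtle natural light, warm neutral tones, "
        "realistic photography style, no visible logos or clutter."
    ),
    size="1024x1536",  # portrait framing, suitable for a lesson cover
)

# The API returns base64-encoded image data; decode and save it.
with open("cover.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))

The size parameter does the framing work here: fixing the aspect ratio up front is the API counterpart of the "vertical framing" line in a written brief.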

The core idea

For generation, think in components: subject, setting, composition, style, lighting, mood, and constraints. You do not need every component every time, but you need enough to create a stable visual intention.

The reason component-based thinking matters is that vague prompts produce unpredictable images, and unpredictable images lead to endless rerolls. When you write "make a cool workspace image," you are leaving every visual decision to the model: what kind of workspace, what angle, what lighting, what style, what mood, what objects, what color palette. Each of those decisions compounds, so the result is essentially random within a broad category. But when you specify even three or four of those components, you constrain the output space enough to get something close to what you imagined. The remaining unspecified components still vary, but they vary within a much smaller range.
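
To make component-based thinking concrete, here is a small illustrative helper that assembles a brief from named components. Everything in it, including the build_brief name and the labels, is hypothetical rather than part of any API; components you leave out are simply left to the model.

# Hypothetical helper: assemble a visual brief from named components.
# Components left as None are omitted, leaving those decisions to the model.
def build_brief(subject, setting=None, composition=None, style=None,
                lighting=None, mood=None, constraints=()):
    parts = [f"Subject: {subject}"]
    optional = [("Setting", setting), ("Composition", composition),
                ("Style", style), ("Lighting", lighting), ("Mood", mood)]
    parts += [f"{label}: {value}" for label, value in optional if value]
    parts += [f"Constraint: {c}" for c in constraints]
    return "\n".join(parts)

print(build_brief(
    subject="a calm, premium workspace with desk, notebook, and laptop",
    style="realistic photography, not illustration",
    lighting="subtle natural light from a window",
    constraints=("no visible logos or clutter",
                 "vertical framing suitable for a lesson cover"),
))

Specifying four of the seven components, as here, is usually enough to stabilize the output; the unspecified ones still vary, but within a much smaller range.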

For editing, tell ChatGPT what must remain the same and what should change. That is the visual equivalent of preserving meaning in a rewrite. The editing principle is particularly important because it is counterintuitive. Most people describe only what they want changed, but the model also needs to know what to preserve. Without preservation instructions, an edit request can drift the entire image in unexpected ways: changing the background when you only wanted to adjust the lighting, or altering a character's expression when you only wanted to change their shirt color.

How it works

  1. Start with a concise visual brief rather than a loose adjective cloud.
  2. When editing, name the preserved elements first, then the change (sketched in code after this list).
  3. Iterate one or two variables at a time so you can tell what improved the result.
  4. Use the region selection tool for precise edits rather than describing the location in words.
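
Steps 2 and 3 map directly onto the API's edit endpoint: pass the previous image back in with a preservation-first prompt that changes one variable. A minimal sketch, with the same assumptions as the earlier example (OpenAI Python SDK, assumed "gpt-image-1" model name):

import base64

from openai import OpenAI

client = OpenAI()

# Minimal sketch: a one-variable edit that names the preserved elements first.
# Assumes cover.png was produced by an earlier generation call.
result = client.images.edit(
    model="gpt-image-1",
    image=open("cover.png", "rb"),
    prompt=(
        "Keep the desk arrangement, the warm neutral palette, and the "
        "realistic photography style exactly as they are. Change only the "
        "lighting: shift from overhead light to soft morning light from the "
        "left. Do not alter the composition, framing, or any objects."
    ),
)

with open("cover_v2.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))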

What skilled users do differently

A novice writes a prompt full of adjectives and hopes the result matches what they imagined. When it does not, they reroll or add more adjectives, creating a cycle of random generation that rarely converges on the right image.

A skilled user writes a brief, not a wish. They specify the concrete elements: subject, setting, composition, style, and constraints. They leave room for the model to make good creative decisions on the dimensions they care less about. When they need to iterate, they change one variable at a time and state what should remain fixed. This systematic approach converges faster because each iteration provides useful signal about what works and what does not.

Skilled users also think about the downstream use of the image. They specify framing (vertical, horizontal, square) based on where the image will appear. They mention technical constraints like "no text overlay needed" or "leave space on the right third for copy." They treat image generation as a production task with requirements, not a creative lottery.

Editing capabilities

The editing workflow now supports precise, targeted changes rather than full regeneration:

  • Region selection: Use the selection tool in the ChatGPT UI to highlight a specific area of the image for targeted edits (see the API sketch after this list).
  • Likeness and composition preservation: The model maintains facial likeness, lighting, and overall composition across edits.
  • Text fixes: Text within images can be corrected or adjusted without regenerating the full image.
  • Preservation first: Describe what should stay the same before describing what should change. This anchors the model.
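
In the ChatGPT UI, region selection is a point-and-click tool. In the API, the closest analogue is a mask image passed to the edit endpoint, where fully transparent pixels mark the region that is allowed to change. A sketch, with the same model-name caveat as the earlier examples:

from openai import OpenAI

client = OpenAI()

# Sketch: mask-based region edit. Transparent pixels in mask.png mark the
# only area the model may change; opaque pixels are preserved.
result = client.images.edit(
    model="gpt-image-1",
    image=open("cover.png", "rb"),
    mask=open("mask.png", "rb"),  # same dimensions as the image
    prompt="Add a subtle shadow on the right side of the desk.",
)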

Three worked examples

Example 1: a vague request

Make a cool image of a workspace.

This prompt is weak because "cool" is not a visual specification. It provides no subject detail, no composition guidance, no style preference, and no constraints. The model will generate something that fits the broad category of "workspace," but the result is essentially random. If it does not match what you imagined, you have no systematic way to adjust because you never specified what you wanted in the first place.

Example 2: a structured visual brief

Create an editorial-style image of a calm, premium workspace for a modern AI course.

Requirements:
- desk with notebook, laptop, and subtle natural light
- clean composition with warm neutral tones
- realistic photography style, not illustration
- no visible logos or clutter
- vertical framing suitable for a lesson cover

After generating, help me edit only the lighting if I want a softer morning feel.

This version works because each requirement constrains a different visual dimension: subject elements (desk, notebook, laptop), lighting (natural), palette (warm neutral), style (realistic photography), negatives (no logos, no clutter), and framing (vertical). The model now has a coherent visual target. The edit instruction at the end also demonstrates the preservation principle: change the lighting, keep everything else.

Example 3: an editing brief with preservation

Edit the image I just generated. Keep the following exactly as they are:
- the desk arrangement and objects
- the warm neutral color palette
- the realistic photography style

Change only:
- shift the lighting from overhead to soft morning light from the left
- add a subtle shadow on the right side of the desk

Do not alter the composition, framing, or any objects.

This example shows how to write an edit brief that protects continuity. By explicitly listing what should stay fixed before describing what should change, you prevent the model from drifting the image. Without the preservation list, the model might reinterpret the entire scene while applying the lighting change.


Why the structured brief works

The structured brief in Example 2 describes the subject, style, composition, and constraints clearly enough to create a coherent visual target. This works because image generation is essentially a translation from language to pixels: the more precise the language, the more constrained the pixel output. Vague language produces a wide distribution of possible images, and the model samples from that distribution essentially at random. Precise language narrows the distribution so that most samples are close to what you want.

The constraint list is particularly important. Negative constraints ("no logos," "no clutter") are often more valuable than positive ones because they prevent common default behaviors that the model falls into without guidance. Framing and aspect ratio specifications save you from generating an unusable image that looks great but does not fit your layout.

Common mistakes
  • Using only mood words with no concrete scene description. "Vibrant and dynamic" is not a visual brief. "A person walking through a busy Tokyo street market at dusk, shot from behind, with warm lantern light" is.
  • Requesting multiple unrelated changes in a single edit step. Each edit should change one or two variables so you can evaluate the result and adjust.
  • Forgetting to state what should remain fixed during an edit. Without preservation instructions, the model may reinterpret the entire scene.
  • Not specifying the output format or framing. An image that looks great as a square may be unusable as a vertical lesson cover. Specify aspect ratio and intended use upfront.
  • Describing what you do not want without describing what you do want. A prompt full of negatives ("no people, no text, no bright colors, no clutter") gives the model constraints but no target. Lead with the positive brief, then add negatives to refine.
Mini lab
  1. Write a visual brief for one useful image you actually need. Include at least subject, style, composition, and one constraint.
  2. Generate the image and evaluate how closely it matches your brief.
  3. Write an edit brief that explicitly lists what should stay fixed and changes only one variable, such as lighting or background detail.
  4. Run the edit and compare it to the original.
  5. In one sentence, name which part of your brief had the most impact on the result and which part the model seemed to ignore or reinterpret.
Key takeaway

Image generation and editing become much more reliable when you define the brief and the edit boundary clearly.