I got an early look at ChatGPT Images 2.0, and it's impressive - with one exception

Elyse Betters Picaro / ZDNET

ZDNET’s key takeaways

OpenAI reframes images as a visual language.
Thinking mode builds context-aware infographics.
Brand fidelity is still inconsistent in early testing.

Today, OpenAI announced ChatGPT Images 2.0, its next-generation image model, which the company says is focused on precision, usability, and complex visual tasks.

The most notable new capability is the ability to combine text and images to build complex, beautiful pages. OpenAI is reframing the whole idea of image generation from a process that creates decorations (their word) to a language (also their term).

OpenAI describes it as, “A good image does what a good sentence does: it selects, arranges, and reveals. It can explain a mechanism, stage a mood, test an idea, or make an argument.”

Thinking capabilities enable complex workflows

In addition to its vastly improved ability to integrate text and graphics, the new model uses enhanced thinking capabilities. It can generate multiple images per prompt with continuity across outputs. This approach is possible because the model actually integrates reasoning into the image output.

Created by ChatGPT/Screenshot by David Gewirtz/ZDNET

This shift is big. Instead of just generating an image that roughly matches the prompt details, Images 2.0 can take a much vaguer prompt, like “Generate an infographic about activities I should do with tomorrow’s weather in San Francisco in mind.”

From this prompt, the AI will gather weather and activity data about San Francisco, determine activities appropriate to the weather, and then build an image or set of images that match the results.

According to OpenAI, “In this model, Images 2.0 acts more like a visual thought partner, helping carry a project from rough concept to finished asset with significantly less work on your part.”

Precision and design control improve usability

Many of us have long struggled to convince ChatGPT to generate images in a specific desired aspect ratio. Often, the AI stubbornly produces what it wants. But now, with Images 2.0, the model has support for “aspect ratios as wide as 3:1 and as tall as 1:3.”

The model also supports higher-fidelity outputs that (mostly) produce accurate object placement, detailed text rendering, and complex compositions. We’ll see if we can remove the word “mostly” from that sentence after the product is officially released.

The AI also supports small text, UI elements, and stylistic constraints at up to 2K resolution. Cool.
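If you want to sanity-check a layout before you prompt for it, the quoted limits are easy to express in a few lines of Python. This is only a sketch under my own assumptions: OpenAI’s announcement gives the ratio range (3:1 to 1:3) and says “2K,” but I’m guessing that means 2048 pixels on the long edge, and the function names here are mine, not anything from OpenAI.

```python
from fractions import Fraction

# Limits quoted by OpenAI: as wide as 3:1 and as tall as 1:3.
MAX_RATIO = Fraction(3, 1)
MIN_RATIO = Fraction(1, 3)

# Assumption: "2K" is read here as 2048 pixels on the long edge.
ASSUMED_LONG_EDGE = 2048


def is_supported_ratio(width: int, height: int) -> bool:
    """Return True if width:height falls between 1:3 and 3:1 inclusive."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return MIN_RATIO <= Fraction(width, height) <= MAX_RATIO


def pixel_size(width: int, height: int, long_edge: int = ASSUMED_LONG_EDGE) -> tuple[int, int]:
    """Scale a ratio so its longer side equals long_edge pixels."""
    if not is_supported_ratio(width, height):
        raise ValueError(f"{width}:{height} is outside the 1:3 to 3:1 range")
    if width >= height:
        return long_edge, round(long_edge * height / width)
    return round(long_edge * width / height), long_edge


if __name__ == "__main__":
    print(pixel_size(16, 9))         # the 16:9 layout used in the test later on
    print(is_supported_ratio(4, 1))  # wider than the quoted 3:1 limit
```

Using exact fractions rather than floating-point division keeps the boundary cases (exactly 3:1 or 1:3) from being rejected by rounding error.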

Testing the preview

I was given access to a day-before-release preview, and the model is impressive, mostly. I fed it a screenshot of the ZDNET home page and a draft of the Images 2.0 press release.

Then I instructed, “Based on the contents of the press release, generate a 16:9 infographic about the new image update and generate it using the ZDNET brand style as shown in the ZDNET home page document.”

The model did a good job on the infographic, but try as it might, it couldn’t reproduce the ZDNET logo. On its first try, it rendered the Z in ZDNET with a slight droop.

I tried a variety of requests on the order of, “Fix the ZDNET Logo. The Z droops in your version but is not droopy in the actual logo.” But Images 2.0 never managed to fix it.

So I started a new session. This time, I included the instruction, “Use special care to reproduce the ZDNET logo accurately.”

Here’s where things got very odd. For its first run, the model somehow dug up a copy of ZDNET’s logo from before our 2022 redesign. This logo is nowhere to be found on our current home page. Weirdly, it rendered that old logo using the current color scheme. The model then pushed the logo and the infographic information off the left edge of the image. It also chose a light blue for “Images 2.0” that isn’t a ZDNET brand color.

I tried mightily to convince it to use the current logo. I managed to get it to push the image to the right, so nothing was cut off. But adding the prompt, “Use the ZDNET logo that is on the provided page. Do not search for an alternate logo,” did nothing to fix the problem.

I took one more shot at the challenge before deciding to get back to finishing up this article. Once again, I started a new session so the AI didn’t have muscle memory from its earlier miscalculations.

The model messed up the logo again. This time, the AI decided to add a rudder shape to the stem of the stretched-out capital D.

To be fair, I’m using a pre-release version of Images 2.0. I’ll be back with a much more comprehensive test run of the model after the official product release.

I also tried a similar test using a different document with Google’s Nano Banana Pro, but because it didn’t handle the synthesis the way that this new version of OpenAI’s product does, it wasn’t really able to repeat the results I got here. We’ll know more as we do more advanced tests.

Pricing and availability

The new model is available today to all ChatGPT and Codex users. Advanced outputs and the thinking capability are available to ChatGPT Plus, Pro, Business, and Enterprise users. Be sure to select “Thinking” from the ChatGPT dropdown bar at the top of the screen.

At the time of writing, before launch, the new Images 2.0 model is only available on the desktop. But OpenAI promises that these capabilities will be in the mobile version as well, including the ability to finger-select images using your mobile touchscreen.

The images are also available via API using the gpt-image-2 model. API pricing varies depending on the quality, thinkiness (my word), and desired image resolution.
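For developers curious what an API call might look like, here is a rough sketch using OpenAI’s official Python SDK. Treat it as a guess, not documentation: the only detail confirmed above is the model name, gpt-image-2. I’m assuming it accepts the same model, prompt, and size arguments as OpenAI’s earlier image models and returns base64 image data the same way; the size value and the function names are purely illustrative.

```python
import base64


def build_request(prompt: str, size: str = "1536x1024") -> dict:
    """Assemble keyword arguments for an image-generation call.

    Assumption: gpt-image-2 accepts the same `model`, `prompt`, and `size`
    arguments as OpenAI's earlier image models; the size value is illustrative.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"model": "gpt-image-2", "prompt": prompt, "size": size}


def generate_png(prompt: str, path: str = "infographic.png") -> str:
    """Send the request and write the returned image to disk.

    Requires `pip install openai` and an OPENAI_API_KEY environment variable.
    Assumption: the response carries base64 image data in data[0].b64_json,
    as it does for earlier gpt-image models.
    """
    # Imported lazily so build_request can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()
    result = client.images.generate(**build_request(prompt))
    with open(path, "wb") as fh:
        fh.write(base64.b64decode(result.data[0].b64_json))
    return path
```

Calling generate_png("An infographic about tomorrow's weather in San Francisco") would then write the result to infographic.png, assuming the parameters above hold up once the API documentation is public. Check OpenAI’s pricing page before looping over this, since cost varies with quality and resolution.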

If an AI can handle layout and content together, will that change how you approach design projects? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.