We Analyzed 83 ChatGPT Shopping Conversations. Here's How It Decides What to Recommend.
Key Findings
Products below 4.0 stars are invisible
Less than 0.5% of ChatGPT recommendations have ratings under 4.0. The 4.3–4.7 range accounts for 68% of all recommended products.
Retailers control your rating more than you do
53% of supporting ratings come from retailers. Only 16% come from the brand's own site.
Reddit and YouTube drive 39% of final purchase citations
User-generated content is the single largest source when shoppers are deciding what to buy.
Reddit threads from 2023 still get cited
0% of UGC review citations are less than 31 days old. Median age: 894 days. Old content wins.
Outerwear triggers 2x more search queries than casual wear
ChatGPT generates 4.3 queries for complex categories vs 1.9 for simple ones.
Intent completely reshuffles the source mix
Brand sites dominate discovery (25%). UGC dominates validation (39%). Same product, different journey stage, different winners.
About This Study
This analysis is based on 83 real ChatGPT shopping conversations across fashion, cosmetics, outerwear, and casual wear categories. We tracked every citation, source type, product rating, content age, and user intent signal across these conversations to understand how ChatGPT selects which brands and products to recommend. This research was conducted by Geology, a platform that monitors how brands appear across AI platforms including ChatGPT, Perplexity, Gemini, and Google AI Overviews.
How ChatGPT Decides What to Recommend
Ask ChatGPT to recommend a product and it does three things. First, it generates search queries (2 per prompt on average, more for complex categories). Then it pulls citations from five source types: brand sites, retailers, editorial content, marketplaces, and user-generated content. Finally, it assembles product cards, typically showing about 8 products with ratings and reviews.
Every conversation we analyzed followed this same pipeline. That consistency is what makes these numbers useful. The same factors determine who shows up, regardless of category.
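To make the pipeline concrete, here's a minimal Python sketch of the three steps as a data model. Every name here is illustrative (these are not ChatGPT internals), and the query-count formula simply interpolates between the averages we observed, roughly 1.9 and 4.3 queries per prompt.

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    url: str
    source_type: str  # "brand" | "retailer" | "editorial" | "marketplace" | "ugc"

@dataclass
class ProductCard:
    name: str
    rating: float
    review_count: int
    citations: list = field(default_factory=list)

def shopping_pipeline(prompt: str, complexity: float) -> dict:
    """Mirror the three observed steps: queries -> citations -> cards.

    `complexity` is a made-up 0..1 knob: 0 behaves like casual wear
    (~1.9 queries), 1 like outerwear (~4.3 queries).
    """
    # Step 1: generate search queries, more for complex categories.
    n_queries = round(1.9 + 2.4 * complexity)
    queries = [f"{prompt} angle {i + 1}" for i in range(n_queries)]
    # Steps 2 and 3 (pull citations, assemble ~8 product cards) are
    # stubbed out: this sketch only models the shape of the pipeline.
    return {"queries": queries, "cards": []}
```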
The 4-Star Cliff: Below 4.0, You Don't Exist
Products rated below 4.0 stars account for less than 0.5% of ChatGPT recommendations. At 4.0, a product shows up in 2.8% of citations. At 3.9, that drops to 0.2%. One tenth of a star, and you basically disappear.
Most recommended products cluster in a narrow band. The 4.3 to 4.7 range accounts for 68% of all recommendations. 4.6 is the single most common rating among cited products (18.2%).
Review Volume Is More Forgiving
Review counts are a different story. Nearly a third of recommended products have fewer than 100 reviews, so newer products can get visibility if their ratings are high enough. But products with 1,000+ reviews still make up 41.5% of recommendations. High volume helps. It just isn't a hard cutoff the way ratings are.
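The pattern reads like a hard floor on ratings plus a soft preference for volume. Here's a minimal sketch of that scoring behavior; only the 4.0 cutoff comes from our data, and the logarithmic volume weighting is an invented placeholder:

```python
import math

def recommendation_score(rating: float, review_count: int) -> float:
    """Hard cutoff at 4.0 stars (the cliff); review volume adds a
    soft logarithmic boost but never rescues a low rating."""
    if rating < 4.0:
        return 0.0  # below the cliff: effectively invisible
    volume_boost = math.log10(max(review_count, 1)) / 4  # invented weighting
    return rating / 5.0 + volume_boost
```

Under this model a 3.9-star product with 50,000 reviews scores zero, while a 4.6-star product with 80 reviews still surfaces, which matches the cliff we observed.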
Who Controls Your Rating? Not You.
ChatGPT doesn't just show one rating per product. Each product card has a primary rating (the main score) and often pulls in supporting ratings from other sources to back it up.
Brand sites supply 53% of primary ratings, which looks like the brand is in charge. Look at the supporting ratings, though. Retailers account for 52.8% of those. Marketplaces jump from 2% on the primary side to 30% on the supporting side. Brand sites drop from 53% to 16.4%.
It looks like ChatGPT uses the brand's own rating as a starting point, then checks third parties. A product rated 4.7 on its own site but 3.8 on Amazon has a credibility problem. The algorithm seems to notice.
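One way to picture that cross-check is as a divergence measure between the brand's own number and the third-party numbers. This is purely our speculation about the behavior, not a documented mechanism:

```python
def rating_divergence(primary: float, supporting: list[float]) -> float:
    """Gap between the brand's own (primary) rating and the mean of
    third-party supporting ratings; a large positive gap hints that
    the brand's number is inflated."""
    if not supporting:
        return 0.0
    return primary - sum(supporting) / len(supporting)
```

The 4.7-on-site, 3.8-on-Amazon product above would show a divergence of +0.9 against Amazon alone: a large enough gap that a credibility-weighting step, if one exists, would likely discount it.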
The Five Sources ChatGPT Trusts
Across all 83 conversations, citations came from five source types. Their relative weight shifts depending on what the shopper is trying to do, but all five show up consistently.
Brand Sites (18–25%)
The brand's own website: product pages, specs, owned reviews. Peaks during discovery intent (25.4%), when users are browsing without a specific product in mind, and drops to 18–19% at the comparison and validation stages. Freshness doesn't matter much here. The median age of cited brand site content is 322 days. Accurate product pages hold their value for close to a year.
Retailers (9–21%)
Mass retailers (Target, Walmart, Macy's) and specialty retailers (Nordstrom, Sephora, REI). Most cited during discovery (21.5%) via buying guides and category pages. In cosmetics, Sephora captures 17.5% and Target 12.3% of retail citations. In apparel, Macy's provides 80–93% of marketplace primary ratings.
Editorial Content (21–27%)
Magazines and publications: Vogue, Allure, Byrdie, Who What Wear. The most stable source across intent types, always between 21% and 27%. Also the freshest. Median content age is 101 days, and 27% of editorial citations are less than a month old. In cosmetics, Allure and Byrdie alone capture nearly 40% of editorial citations.
Marketplaces (4–8%)
Amazon, eBay, Lyst, plus resale platforms like Poshmark, Depop, and Mercari. They only get cited directly 4–8% of the time, but their rating influence is way bigger than that share would suggest. Marketplaces supply 30% of supporting ratings while accounting for just 2% of primary ratings. Resale platforms are especially interesting: zero primary ratings, but 13% of validation-stage support. In apparel, Poshmark alone drives 37–49% of supporting ratings.
User-Generated Content (16–39%)
YouTube videos, Reddit discussions, blog opinions, affiliate content. This is the most volatile source in the dataset. It goes from 15.8% at discovery to 39.0% at validation, more than doubling. YouTube accounts for 89–100% of video citations (TikTok barely registers, which may surprise people). Reddit handles 93–100% of UGC review citations across most categories.
From Discovery to “Should I Buy This?”: How Intent Shifts Everything
The same product query pulls from completely different sources depending on what the user is trying to do. We saw four intent stages, each with its own source mix:
Discovery — “What's out there?”
Brand sites (25.4%) and retailers (21.5%) combine for nearly 47%. Users are browsing, and owned + retail content dominates. Editorial provides 25.0%.
Information — “Tell me more about this category”
Editorial content peaks at 26.6%. UGC begins its climb to 23.2%. Retailers drop to 12.1% as users move past browsing.
Comparison — “Which one is better?”
UGC surges to 38.0%, more than double its discovery share. This is where YouTube reviews and Reddit threads take over as the go-to comparison sources. Brand sites fall to 18.3%.
Validation — “Should I actually buy this?”
UGC reaches 39.0%, which is more than brand sites (18.5%), retailers (9.3%), and marketplaces (5.6%) combined. When someone is about to buy, they want to hear from other people, not from the brand.
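The four mixes above can be captured as plain data. The shares are the study's numbers (rows omit minor sources, so they don't sum to 100%); the helper is just a convenience for reading off the winner at each stage:

```python
# Citation share (%) by intent stage, from the study's 83 conversations.
SOURCE_MIX = {
    "discovery":   {"brand": 25.4, "retailer": 21.5, "editorial": 25.0, "ugc": 15.8},
    "information": {"editorial": 26.6, "ugc": 23.2, "retailer": 12.1},
    "comparison":  {"ugc": 38.0, "brand": 18.3},
    "validation":  {"ugc": 39.0, "brand": 18.5, "retailer": 9.3, "marketplace": 5.6},
}

def dominant_source(stage: str) -> str:
    """Source type with the largest citation share at a given stage."""
    mix = SOURCE_MIX[stage]
    return max(mix, key=mix.get)
```

Note how the winner flips: brand sites narrowly lead discovery, then UGC takes over from comparison onward.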
Inline vs Footer Citations: Not All Mentions Are Equal
Where a citation appears matters too. Brand sites and retailers get cited inline (within the response text, backing a specific claim) more often than in footnotes. Brand sites have an inline/footer ratio of 1.09; retailers reach 1.53. Marketplaces and UGC videos are mostly footnote references. They get mentioned, but not quoted directly.
Reddit Threads From 2023 Still Get Cited. Fresh Blog Posts Don't.
More than a third of all cited content is over a year old. That surprised us. But the freshness picture changes a lot depending on the source type.
| Source Type | Median Content Age | < 31 Days Old | Takeaway |
|---|---|---|---|
| Editorial | 101 days | 27.1% | Freshness matters most here |
| Marketplaces | 84 days | 9.7% | Moderate freshness requirements |
| UGC Videos | 269 days | — | Months-old videos still perform |
| Brand Sites | 322 days | 3.7% | Evergreen content holds value |
| UGC Reviews | 894 days | 0% | Years-old Reddit threads still cited |
There's a clear gradient here. The more “official” the source, the more freshness matters. Editorial content (trend pieces, seasonal guides, roundups) has a built-in expiration date. UGC reviews don't. A Vogue roundup from 8 months ago is stale. A Reddit thread from 2023 is evidence.
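The table's two freshness measures are straightforward to compute from a citation log. A sketch, assuming citations arrive as (source_type, publish_date) pairs; the default reference date is an arbitrary example, not part of the study:

```python
from datetime import date
from statistics import median

def freshness_stats(citations, today=date(2026, 1, 15)):
    """Per-source median content age in days, plus the share of
    citations under 31 days old. `citations` is an iterable of
    (source_type, publish_date) pairs."""
    by_source = {}
    for source, published in citations:
        by_source.setdefault(source, []).append((today - published).days)
    return {
        source: {
            "median_age_days": median(ages),
            "share_under_31d": sum(a < 31 for a in ages) / len(ages),
        }
        for source, ages in by_source.items()
    }
```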
Category Complexity: Not All Products Get Equal Research
ChatGPT doesn't put the same effort into every product category. The number of search queries it generates per prompt varies a lot:
- Outerwear (jackets, coats): 4.32 queries on average. Technical factors like insulation and weather protection seem to trigger more research.
- Cosmetics: 2.55 queries. Middle of the road.
- Casual wear (tops, basics): 1.90 queries. Simpler purchase, less digging.
That 2x gap between outerwear and casual wear matters for brands in technical categories. More queries means a wider surface area to compete on. More chances to show up, but also more content you need to have in place.
What This Means for E-Commerce Visibility
ChatGPT shopping is more structured than most people assume. It isn't surfacing random results. The same pipeline runs every time, and the same factors decide who shows up.
What stands out most is how little brands control about their own visibility. You can perfect your own website, but 53% of the supporting rating evidence comes from retailers and 30% from marketplaces. And when someone is deciding whether to actually buy, the content that matters most is content you never created: Reddit threads and YouTube reviews.
There's also the freshness question. Everyone assumes newer content ranks better. For editorial, that's true. But UGC drives 39% of validation-stage citations, and the median cited review is about two and a half years old. A Reddit thread from 2023 carries more weight than a press release from last week.
Methodology
We analyzed 83 real ChatGPT shopping conversations conducted between November 2025 and January 2026. The conversations covered four product categories: women's fashion, men's apparel, outerwear, and cosmetics/beauty.
For each conversation we recorded the number of search queries generated, every citation and its source type (brand site, retailer, editorial, marketplace, or UGC), the user's intent (discovery, information, comparison, or validation), product ratings and where they came from, review counts, content publication dates, and whether citations appeared inline or in footnotes.
We classified source types manually against a fixed taxonomy. Intent was classified based on the user's query language and conversational context. Ratings were split into primary (the main displayed score) and supporting (additional sources cited alongside the primary).
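For readers who want to reproduce this kind of tagging automatically, here is an illustrative slice of what a fixed domain-to-source-type taxonomy can look like. The domain lists are examples only, not our full taxonomy, and the classification in this study was done by hand:

```python
from urllib.parse import urlparse

# Example slice of a fixed taxonomy -- not the study's actual list.
TAXONOMY = {
    "retailer":    {"target.com", "walmart.com", "sephora.com", "nordstrom.com"},
    "editorial":   {"vogue.com", "allure.com", "byrdie.com"},
    "marketplace": {"amazon.com", "ebay.com", "poshmark.com"},
    "ugc":         {"reddit.com", "youtube.com"},
}

def classify_source(url: str, brand_domains=frozenset()) -> str:
    """Map a cited URL to a source type via its hostname."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if host in brand_domains:
        return "brand"
    for source_type, domains in TAXONOMY.items():
        if host in domains:
            return source_type
    return "unknown"
```

A real pipeline would also need to handle subdomains and country TLDs; a flat lookup like this is just the simplest version of the idea.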
All conversations used the standard ChatGPT web interface with browsing turned on. No plugins, custom instructions, or API access.
Cite This Research
Geology. “We Analyzed 83 ChatGPT Shopping Conversations. Here's How It Decides What to Recommend.” Geology Research, January 2026. https://www.getgeology.com/reports/chatgpt-shopping-study
See how your brand shows up across AI platforms
Geology tracks how brands appear across ChatGPT, Perplexity, Gemini, Copilot, and Google AI Overviews.
Get a free audit →