Visibility in AI Search (Google AI Mode, AI Overviews, LLMs): Are AI Tools Misleading Us?

I recently talked about tools that promise to measure visibility in Google AI Mode, AI Overviews, and LLMs. I explained why I saw no urgency in subscribing to them, and why I would rather wait for native AI data in Search Console, especially in France, where these features are not yet active.

A recent study shows discrepancies between the API data used by these tools and the answers actually displayed to users.

This is where it gets interesting.

AI hates bullshit

First finding (a bit anecdotal, but telling): AI hates bullshit.

Long intros, excessive detail, useless jargon: forget all of that.

That worked back in the days of Google's Helpful Content Update, but Google has since dropped the matter. Today, you have to get straight to the point.

So let's get to the heart of the matter!

AI answers retrieved by scraping vs. API results: what are the differences?

The study mentioned above was carried out by Surfer, whose blog I did not know before. The results, however, are very interesting.

Here are the main findings, in table form:

| Aspect | API results | Actual results (scraping) | Comment |
| --- | --- | --- | --- |
| Answer length | ~406 words on average | ~743 words on average | API responses are often shorter and less detailed than what the user actually sees |
| Web search triggering | ~23% of responses do not trigger a search, especially if under 100 words | Always triggered | Scraped answers contain more, and more diverse, sources |
| Sources cited | No source in ~25% of cases; 7 sources on average | Always present; 16 sources on average | The real answers provide nearly twice as many sources |
| Brand detection | 8% of responses detect no brand; 12 brands on average when some are found | Always detected; 9 brands on average | The API detects more brands per answer on average, but misses some |
| API vs. scraping overlap | Only 24% of brands and 4% of sources overlap between API and scraping | | |
| Overall summary | Shorter, more structured, sometimes incomplete data | Longer, complete data, with the interface logic and all the sources | To build apps: the API is ideal; to monitor the real user experience: scraping is vital |

ChatGPT: API results vs. actual results

Surfer compared approximately 2,000 queries: the same question asked to ChatGPT (and Perplexity) via the API vs. the answers actually displayed to a user (scraping). And the differences are substantial.
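
To picture what the API side of such a comparison looks like, here is a minimal sketch, assuming the official openai Python SDK; the model name, the sample queries, and the word-count metric are illustrative, not Surfer's actual protocol.

```python
# Minimal sketch of the API side of an API-vs-scraping comparison.
# Assumes the official `openai` Python SDK; the model name, the sample
# queries, and the word-count metric are illustrative, not Surfer's protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def api_answer(question: str) -> str:
    """Ask one question through the raw API, with no system prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content or ""

queries = ["best project management software", "top CRM for small teams"]
for q in queries:
    answer = api_answer(q)
    # Compare this length with the one measured on the scraped UI answer.
    print(q, "->", len(answer.split()), "words")
```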

Answer length: API vs. actual results

  • API: ~406 words on average
  • Real scrape: ~743 words on average

API responses are significantly shorter, and less detailed, than those retrieved via the interface.

Web search triggering: API vs. actual results

  • In ~23% of cases, the API does not trigger a web search, whereas the real answers do.
  • Scraped answers contain more cited sources, and more diverse ones, than API answers.

About 23% of API responses do not initiate a web search, usually when they are under 100 words. In contrast, scraped responses systematically trigger a search.

Cited sources: API results vs. actual results

The API provides no source at all in approximately 25% of cases. Scraped AI responses always provide sources, and about twice as many (16 vs. 7 on average).

Brand detection: API results vs. actual results

The API data detects no brand in about 8% of cases, while scraped responses always identify brands.

When brands are detected, the API identifies more of them on average: 12 versus 9.

The API and the real answers do not use the same sources

Are the API results and the ChatGPT results the same?

So, are the results the same with the API and the web interface/app?

The answer: no, absolutely not.

The differences are striking:

  • Only 24% of detected brands overlap between the API and the scraping.
  • For sources, the overlap drops to only 4% (a quick sketch of this kind of overlap computation follows).
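
If you want to reproduce this kind of figure yourself, a Jaccard-style overlap is the simplest candidate; note that the study does not publish its exact formula, and the brand sets below are invented.

```python
# Minimal sketch of a Jaccard-style overlap between API and scraped answers.
# Surfer's exact definition is not published; the brand sets are invented.
def overlap_pct(api_items: set[str], scraped_items: set[str]) -> float:
    """Share of all distinct items that appear in both channels."""
    union = api_items | scraped_items
    if not union:
        return 0.0
    return 100 * len(api_items & scraped_items) / len(union)

api_brands = {"asana", "trello", "monday"}
scraped_brands = {"asana", "notion", "clickup", "jira"}
print(f"Brand overlap: {overlap_pct(api_brands, scraped_brands):.0f}%")
```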

Okay, we have seen the differences between the responses retrieved by scraping ChatGPT and those provided directly by the API. Now let's see how Perplexity does.

Perplexity: differences between results retrieved by scraping and those obtained via the API

| Aspect | API results | Scraped results | Comment |
| --- | --- | --- | --- |
| Answer length | ~332 words | ~433 words | API answers are shorter and less detailed |
| Web search | Always used, but some sources may be omitted, sometimes leading to no answer at all | Always used, with all sources | Scraping reflects the actual user experience |
| Sources cited | 7 on average | 10 on average | Scraped responses systematically contain sources |
| Brand mentions | Typically more than 10 brands | About 6 brands; 5% of scraped responses replace brand names with generic descriptions | API answers identify more brands, but some scraped answers are more accurate |
| API vs. scraping overlap | Only 8% of sources overlap between API and scraping, showing the references are often different | | |

Length: results retrieved by scraping vs. via the API

API answers are shorter, with an average of 332 words, against 433 words for scraped answers.

Web search: results retrieved by scraping vs. via the API

Both methods always use web search, but the API can sometimes omit some sources, which can even lead to a complete lack of response.

Sources: results retrieved by scraping vs. via the API

The API returns 7 sources on average, while scraped responses always contain 10.

Brand mentions: results retrieved by scraping vs. via the API

In about 5% of scraped responses, brand names are replaced with more generic descriptions. API answers generally include more than 10 brands, compared to about 6 for scraped answers.

Are brands and sources similar in Perplexity?

Again: NO.

The overlap of sources is only 8%, which means the API and the user interface often rely on totally different references.

What this says about the reliability of tracking tools

The majority of AI visibility tools rely on LLM APIs to extract metrics (such as mention frequency, presence in LLMs, visibility, etc.). But the study shows that these APIs do not accurately reflect what users actually see in the LLM interfaces (ChatGPT, Perplexity, etc.).

This means that :

  • Tracking metrics may be biased or incomplete
  • They may underestimate or overestimate real visibility
  • They may misrepresent how a brand or website is cited in the visible answers

In short: using the API as the sole proxy to measure AI/LLM visibility is, according to this study, insufficient or even misleading, which explains my distrust of these tools.
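
To see how thin that proxy is, here is a minimal sketch of the naive mention-frequency metric such tools typically compute over API answers; the function and the sample data are invented for illustration.

```python
# Minimal sketch of the mention-frequency metric such tools compute over
# API answers. The brand and the sample answers are invented.
import re

def mention_rate(answers: list[str], brand: str) -> float:
    """Percentage of answers in which the brand appears at least once."""
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    hits = sum(1 for a in answers if pattern.search(a))
    return 100 * hits / len(answers)

api_answers = ["Asana and Trello are popular picks.", "Try any kanban tool."]
print(f"Asana mention rate: {mention_rate(api_answers, 'Asana'):.0f}%")
# If scraped UI answers differ, this rate says little about real visibility.
```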

Why does the API seem less capable (and why is that wrong)?

An article from Gumshoe attacks the idea that APIs are de facto less capable. According to its author, the gap between the API and the UI mostly comes down to the lack of system prompts (system messages).

The role of 'system messages'

System messages work like an invisible editor-in-chief: they are not part of the user's question, but they dictate how the AI should respond. They control, as the sketch after this list illustrates:

  • Citations: whether to quote sources.
  • Length: verbose or concise (hence the often shorter raw API output).
  • Formatting: bold, lists, tables…
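
Here is the sketch announced above: the same question asked with and without a system message, assuming the official openai Python SDK. The system prompt text is invented, not OpenAI's real one.

```python
# Minimal sketch: the same question asked with and without a system message.
# Assumes the `openai` Python SDK; the system prompt text is invented,
# not OpenAI's real one.
from openai import OpenAI

client = OpenAI()
question = "What are the best CRM tools?"

# "Naked" API call: no behavioral instructions at all.
bare = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# The same call "dressed" with a ChatGPT-like system message.
guided = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "Answer in detail, cite your sources, and format "
                    "the answer with lists."},
        {"role": "user", "content": question},
    ],
)

print(len((bare.choices[0].message.content or "").split()), "words, no system prompt")
print(len((guided.choices[0].message.content or "").split()), "words, with system prompt")
```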

Gumshoe tries to convince us that the SurferSEO study compares apples and oranges:

  • They compare a 'naked' API (default settings, no behavioral instructions)…
  • …with a 'dressed' interface (heavily guided by complex system prompts).

For Gumshoe, an API that forgets citations or brands is not less capable: it just lacks the right instructions.

Except… SurferSEO did test a model with a system prompt. So yes, it's an interesting article, but perhaps that is not where the real variance comes from.

They say it in black and white:

We tested two scenarios, running 1,000 requests each time.
First, we compared the results obtained with a 'clean' API.
Then we added a twist: we used an OpenAI system prompt that had leaked on GitHub.

The results were almost identical in both cases, with and without the system prompt.

Surfer SEO

Two scenarios tested:

  1. UI scraping vs. 'clean' API
    • What the UI displays is compared with what the API returns by default, with no additional instructions.
    • Goal: measure the baseline difference between UI and API.
  2. UI scraping vs. API with system prompt
    • A leaked OpenAI system prompt is added to the API call.
    • Goal: see whether it brings the results closer to the UI.

Result: in both cases, the results were almost identical. The gap therefore does not necessarily come from the system prompts.

OpenAI API vs. ChatGPT: what is the difference?

ChatGPT is the raw model (the API) plus:

  • special instructions (a system prompt),
  • additional data flows,
  • interface logic,
  • and a few secret adjustments known only to OpenAI.

These layers mean that ChatGPT behaves differently from the API, even with exactly the same model.

Differences between collecting responses via API vs. via scraping

| Aspect | Web scraping | API access |
| --- | --- | --- |
| Purpose | Capture the real user experience | Access structured data programmatically |
| Includes | Final message displayed to the user; formatting; interactive interface elements; sources; additional logic applied by the platform | Clean, structured response; function calls; consistent formats |
| Does not provide | Structured, ready-to-use data | The interface, search behavior, sources, the added "magic" |
| Recommended use | Monitor how your brand or content appears in AI tools | Build apps, automate processing or integrations |

If you want to understand the differences between these two data collection methods and what is at stake, I recommend the SEO Clarity article, which explains them in detail.

Verdict?

For the SEO Clarity team, when it comes to GEO, scraping the interface is superior.

Why? Because APIs lack nuance. In AI search, visibility depends not just on being mentioned in the text, but on being cited as a clickable source. Only scraping lets you check whether your brand is presented as an active recommendation, with a link, or simply as a word in a paragraph of plain text.
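
As an illustration, here is a minimal sketch of that check on a scraped answer, assuming BeautifulSoup and an invented HTML snippet; real AI interfaces use different markup, so the parsing would need adapting.

```python
# Minimal sketch: in a scraped answer, distinguish a brand cited as a
# clickable source from a plain-text mention. Assumes BeautifulSoup and
# an invented HTML snippet; real AI interfaces use different markup.
from bs4 import BeautifulSoup

html = '<p>Try <a href="https://asana.com">Asana</a> or maybe Trello.</p>'
soup = BeautifulSoup(html, "html.parser")

def brand_visibility(brand: str) -> str:
    for link in soup.find_all("a"):
        if brand.lower() in link.get_text().lower():
            return "cited as a clickable source"
    if brand.lower() in soup.get_text().lower():
        return "plain-text mention only"
    return "absent"

for brand in ("Asana", "Trello", "Notion"):
    print(brand, "->", brand_visibility(brand))
```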

In summary: the API is ideal for clean, fast data at scale, but it is blind to the real user experience. To understand your true AI visibility, you have to see what the user sees, and only scraping allows that.

The difference between API responses and real answers in Google AI Overviews

Google and its AI features are not immune to the debate over the differences between API responses and 'real' responses. Explanation:

Writesonic has published an excellent article that highlights the differences between the responses provided via the API and those publicly visible in the Google ecosystem.

The finding is in line with SurferSEO's: what GEO tools see through the API is often not what your customers see on their screen.

API: it often relies on conventional ranking methods and a sometimes less up-to-date index. It returns raw, structured data, but incomplete on the 'generated response' side.

UI (what humans see): it uses LLMs that summarize information in real time, offering a rich, contextual, personalized response.

The four reasons for the discrepancy

Writesonic identifies four key factors that explain why the API and the UI do not give the same results:

  1. Different algorithms: the API often relies on conventional criteria (backlinks, keywords), while the UI uses LLMs to synthesize a relevant answer, not just to rank links.
  2. Data freshness (RAG): the UI can look up information in real time through RAG, whereas the API relies on a static index that is updated less often (see the sketch after this list).
  3. Personalization & context: the UI takes history, location, and previous queries into account. The API stays 'cold'.
  4. Post-processing: the UI cleans up and rewrites results to avoid duplication and bias, which is not done on raw API data.
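
To clarify point 2, here is a deliberately stubbed sketch of the RAG flow: retrieve fresh documents first, then generate an answer grounded in them. Both functions are placeholders, not any real Google or Writesonic API.

```python
# Deliberately stubbed sketch of the RAG flow from point 2: retrieve fresh
# documents first, then answer from them. Both steps are placeholders.
def retrieve(query: str) -> list[str]:
    # A real UI would query a live index here; the API side may instead
    # rely on an older static index.
    return [f"Fresh document about {query} (crawled today)"]

def generate(query: str, context: list[str]) -> str:
    # A real system would call an LLM with the retrieved context injected.
    return f"Answer to '{query}', grounded in {len(context)} fresh document(s)."

print(generate("best CRM 2025", retrieve("best CRM 2025")))
```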

My opinion on the difference between LLM API responses and scraped LLM results

Honestly, it does not surprise me. In my SEO/GEO audits, I often see that the API answers do not match what scraping reveals.

To be clear: the API is scalable and reliable, but it does not reflect the real experience. The UI shows the reality, but it is more complex and sometimes risky to collect.

Personally, all this debate just makes me even more eager to explore AI data directly in Search Console…

Sources

The quality sources cited in this article are gathered here:

  • https://surferseo.com/blog/llm-scraped-ai-answers-vs-api-results/
  • https://blog.gumshoe.ai/how-apis-unlock-better-insights-into-ai-search-visibility/
  • https://writesonic.com/blog/api-vs-ui-results
