The OTA is writing your AI profile. You set it and forgot it.

The data quality problem that makes everything else in AI hotel distribution irrelevant

Mar 11, 2026

There is a simple test any hotel can run. Open ChatGPT, Perplexity, or Google AI Mode. Type the name of your property. Read what comes back.

In most cases, the response will be recognisable but wrong. The pool that was refurbished in 2023 is still described as it was in 2021. The restaurant that changed concept is still described as its predecessor. The meeting space that can accommodate 80 delegates is listed as holding 40. The policy on pets, which changed twice in the last three years, may reflect any of the three versions depending on which source the model pulled from most recently.

None of this is hallucination in the technical sense. The AI is not inventing facts about a property it cannot find. It is accurately reporting facts from a source that is outdated, incomplete, or simply wrong — and that source, in the majority of cases, is an OTA listing.

According to data from VertoDigital, only around 25% of AI-generated hotel answers currently draw from official hotel websites. The other 75% draw on public databases, OTA profiles, and aggregated third-party data. Hotels have, in effect, outsourced the authorship of their AI identity to the same intermediaries they have spent a decade trying to depend on less.

How it happened

The OTA listing became the source of record for hotel content by default, not by design.

When Booking.com and Expedia scaled through the 2000s and 2010s, they built content engines that hoteliers learned to populate. Stars, room types, amenity checkboxes, free-text descriptions, photo uploads. The platforms made it simple, and the content sat there, stable, feeding availability calendars and generating bookings. Hotels updated listings when something significant changed and largely forgot about them otherwise.

Then AI search arrived and started reading those listings as authoritative. Not because they are authoritative, but because they are structured, widely replicated, and easily parsed. The OTA content — whatever its age, accuracy, or depth — became the most machine-readable version of the hotel that existed at scale.

This is the original sin of the data quality problem. It is not that AI is doing something wrong. It is that the content environment hotels created — or failed to create — over many years made the OTA listing the most legible source available.

The problem is now structural. OTA descriptions are written for conversion on a booking platform, not for comprehension by a language model. They are optimised for skimming human eyes, not for semantic extraction. Generic phrases like "a warm welcome awaits" are understood by humans as placeholder marketing language. To a model trying to determine whether this property is appropriate for a solo business traveller seeking a quiet workspace, those phrases are noise.

What AI actually reads

Understanding how AI systems surface hotel content requires moving past the old mental model of search as a ranking system.

Traditional search sorted pages. AI answers questions. The distinction matters because answering a question requires the model to extract meaning from text, not simply match keywords. For hotel content, this means a model trying to answer "is this hotel good for a remote-working stay in April?" needs to parse several things simultaneously: does the property have reliable, fast connectivity; what is the working environment like; are there quiet spaces outside guest rooms; what does the price-to-value look like for a multi-night stay; and what is the cancellation policy if travel plans shift.

None of that information is contained in a standard OTA listing. Some of it might exist on the hotel website. Most of it has never been published in structured, machine-readable form anywhere.

The result is that AI models fill these gaps with approximations — often drawn from reviews, forum posts, and aggregated travel content. What the hotel says about itself contributes a fraction of what the model uses to construct its summary. What guests said about it on TripAdvisor three years ago contributes more.

Structured schema markup — the technical implementation of machine-readable content developed through Schema.org — gives AI systems something more reliable to work from. A property that has implemented granular hotel schema across its website, covering room types, amenity categories, policies, accessibility features, location attributes, and on-site services, is significantly more likely to be accurately represented when a model constructs a response. Without that markup, the model falls back to whatever text it can parse, which is typically the OTA copy.
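As a rough illustration of what that markup looks like, the sketch below builds a small Schema.org description of a hotel and wraps it in the JSON-LD script tag a web page would carry. The property name and every value are hypothetical; the types and fields used (Hotel, checkinTime, checkoutTime, petsAllowed, amenityFeature, LocationFeatureSpecification) come from the Schema.org vocabulary, but a real implementation would cover far more than this.

```python
import json

# Hypothetical property details, for illustration only.
hotel = {
    "@context": "https://schema.org",
    "@type": "Hotel",
    "name": "Example Harbour Hotel",
    "checkinTime": "15:00",
    "checkoutTime": "11:00",
    "petsAllowed": False,
    "amenityFeature": [
        {"@type": "LocationFeatureSpecification", "name": "Outdoor pool", "value": True},
        {"@type": "LocationFeatureSpecification", "name": "24-hour gym", "value": True},
    ],
}


def to_json_ld(data: dict) -> str:
    """Wrap a Schema.org dictionary in the script tag that carries JSON-LD on a page."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_json_ld(hotel))
```

The point of the structure is that a check-in time or a pet policy becomes a named field with a single value, rather than a phrase a model has to interpret from marketing copy.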

The three content failures

The data quality problem is not a single issue. It is the compound effect of three distinct failures operating simultaneously.

The freshness gap. Hotel content ages faster than most hospitality operators appreciate. A restaurant changes its concept. A spa reopens after renovation with a different treatment menu. A room category is retired and replaced. An ownership or management change shifts the brand positioning. In a traditional search environment, stale content costs rankings. In an AI environment, it costs accuracy — and an AI recommendation built on inaccurate content leads to a booking that creates a mismatched guest experience. That mismatch now has a trail: the guest complains to the AI interface, or at minimum forms a negative impression that affects the review cycle.

The depth gap. Standard OTA listings were built around a lowest-common-denominator content model. Amenity checkboxes can tell a model whether a property has a gym; they cannot tell it whether that gym is well-equipped and open 24 hours, or a single room with two treadmills from 2017. Distinguishing between those two realities matters for a guest asking the AI to find a hotel for a fitness-focused trip. The depth required to answer conversational queries with genuine accuracy simply does not exist in the average hotel's published content inventory.

The source conflict gap. When multiple sources — hotel website, Booking.com, Expedia, Google Business Profile, TripAdvisor — describe the same property in inconsistent terms, AI models face a reconciliation problem. If the check-in time is listed as 3pm on the hotel website and 2pm on the OTA, the model has no mechanism to determine which is current. It may default to whichever source it assigns higher authority, which is often not the hotel's own domain. Conflicting data across platforms does not average out. It introduces unreliability into the AI response, which reduces the probability that the property is recommended at all.
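A hotel can look for these conflicts itself before a model encounters them. The sketch below is a minimal example, assuming the facts from each source have already been collected by hand into simple dictionaries; the source names, field names, and values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical snapshot of what each source currently says about one property.
listings = {
    "hotel_website": {"checkin_time": "15:00", "pets_allowed": False, "meeting_capacity": 80},
    "ota_listing": {"checkin_time": "14:00", "pets_allowed": False, "meeting_capacity": 40},
    "business_profile": {"checkin_time": "15:00", "pets_allowed": True},
}


def find_conflicts(sources: dict) -> dict:
    """Return {field: {value: [sources]}} for every field where sources disagree."""
    values_by_field = defaultdict(lambda: defaultdict(list))
    for source, facts in sources.items():
        for field, value in facts.items():
            values_by_field[field][value].append(source)
    # Keep only the fields that carry more than one distinct value.
    return {
        field: dict(values)
        for field, values in values_by_field.items()
        if len(values) > 1
    }


conflicts = find_conflicts(listings)
for field, values in conflicts.items():
    print(field, values)
```

In this made-up snapshot all three fields disagree somewhere, which is the situation the paragraph above describes: the model sees two check-in times and has no way of knowing which one is current.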

What AI-ready content actually looks like

The phrase "AI-ready content" has become a category of vendor positioning, but the underlying requirements are reasonably concrete.

The foundation is structured schema markup implemented consistently across every page of the hotel's website — not just the homepage. Room type pages need to carry room-specific schema: dimensions, bed configuration, view category, occupancy, included amenities. The restaurant page needs to describe cuisine, covers, opening times, reservation requirements, and price range in structured fields, not buried in a paragraph. Policies — cancellation, pets, children, accessibility — need to be in machine-readable format, not PDF attachments or static text blocks that a model cannot reliably parse.

Beyond schema, AI models respond to what has been described as semantic completeness: the capacity of a piece of content to answer a question fully without requiring the model to cross-reference other sources. A room description that contains the room type, the dimensions, the bed configuration, the specific view, the distance to amenities, the included services, the connectivity specification, and the accessibility features is more likely to be pulled into an AI response than one that says "a comfortable room with garden views."
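One way to make semantic completeness measurable is a simple checklist score per room description. The sketch below assumes the eight attributes named above as the checklist; the field names are hypothetical and the scoring is deliberately naive, so treat it as a starting point rather than an established metric.

```python
# Hypothetical checklist drawn from the attributes listed in the paragraph above.
REQUIRED_FIELDS = [
    "room_type", "dimensions", "bed_configuration", "view",
    "distance_to_amenities", "included_services", "connectivity", "accessibility",
]


def completeness(room: dict) -> tuple[float, list[str]]:
    """Return the share of required fields that are filled, plus the missing ones."""
    missing = [field for field in REQUIRED_FIELDS if not room.get(field)]
    score = 1 - len(missing) / len(REQUIRED_FIELDS)
    return score, missing


# "A comfortable room with garden views" supplies little more than this.
sparse_room = {"room_type": "Double", "view": "Garden"}
score, missing = completeness(sparse_room)
print(f"{score:.0%} complete, missing: {missing}")
```

A description that fills two of eight fields leaves the model to source the other six from somewhere else, which is usually not the hotel.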

The operational requirement is not a one-time content audit. Freshness is a continuous maintenance function. Hotels connecting to AI platforms through channels built on MCP (the Model Context Protocol, the standard that now underpins the Lighthouse ChatGPT app and others in this space) are feeding live data: current rates, real availability, up-to-date descriptions. That live feed is the mechanism that addresses the freshness gap. But the feed is only as good as the content it draws from. Connecting to MCP with a 2021 OTA description as the source document does not solve the problem.

The OTA as unintended author

There is a more uncomfortable dimension to this problem that the industry has been slow to acknowledge directly.

Hotels spent the last decade developing best-rate guarantee programmes, direct booking campaigns, and loyalty incentives designed to reduce OTA dependence. Those efforts had limited but real impact on the booking mix. The OTA, as a distribution channel, has remained structurally dominant because it controls discovery — the step before the booking.

AI search is a new discovery layer. The OTAs, which moved first to secure partnerships with ChatGPT and Gemini, are already positioned within it. Their content, scraped and indexed at scale, is already feeding models. A hotel that has not built direct AI distribution and has not invested in its own structured content is, in the AI environment, even more dependent on the OTA narrative than it was in the web environment — because in web search, the hotel's own website at least appeared in results. In an AI summary, it may not appear at all.

The guest is not seeing ten links and choosing which to click. They are receiving a synthesised answer. The source of that answer determines the story being told about the hotel. For most properties today, that source is a description the revenue manager wrote six years ago and submitted to Booking.com.

What fixing it requires

The practical steps are not exotic, but they require treating content as operational infrastructure rather than marketing collateral.

The first is an audit: what does a live AI query about your property currently return, and where is that content sourced from? The gaps between what the AI says and what is actually true map directly to the content failures that need addressing. Several vendors now offer tooling that surfaces these gaps systematically, rather than requiring manual spot-checks.

The second is schema implementation at the property level — ideally across the full website, not just the homepage. This is a technical function, not a copywriting function, and for most independent hotels it requires either a technology partner with hospitality schema expertise or a website platform that implements it by default.

The third is a content maintenance cadence. Every change to the property — physical, operational, or commercial — needs to trigger a content update across all structured sources. This is the step that most often breaks down. Operational teams make changes; content updates follow weeks later, if at all.
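The cadence can be checked mechanically once two dates are tracked: when something last changed on property, and when each published source was last updated to reflect it. The sketch below assumes such a log exists; the items, sources, and dates are hypothetical.

```python
from datetime import date

# Hypothetical log of when each item last changed on property.
last_changed = {
    "restaurant": date(2025, 9, 1),
    "pet_policy": date(2025, 11, 15),
}

# Hypothetical log of when each published source last reflected that item.
last_published = {
    "restaurant": {"hotel_website": date(2025, 9, 3), "ota_listing": date(2024, 2, 10)},
    "pet_policy": {"hotel_website": date(2025, 11, 16), "ota_listing": date(2025, 11, 20)},
}


def stale_sources(changed: dict, published: dict) -> list:
    """List (item, source, days_behind) where a source predates the latest change."""
    stale = []
    for item, changed_on in changed.items():
        for source, published_on in published.get(item, {}).items():
            if published_on < changed_on:
                stale.append((item, source, (changed_on - published_on).days))
    return stale


for item, source, days in stale_sources(last_changed, last_published):
    print(f"{item}: {source} is {days} days behind the last change")
```

Here only the OTA's restaurant description would be flagged, because it was last updated before the concept changed. Running a check like this on a schedule turns "content updates follow weeks later, if at all" into a visible backlog.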

The fourth — and the step that connects content quality to AI distribution — is a direct feed mechanism. Whether through MCP connectivity, a platform like Lighthouse's Connect AI product, or a proprietary integration, hotels need a channel through which verified, up-to-date content reaches AI platforms directly rather than being filtered through OTA proxies.

The infrastructure required to connect to ChatGPT or Google AI Mode with verified content is now accessible to hotels of almost any size. The infrastructure is not the bottleneck. The content is. And unlike the connectivity problem, which can be solved by subscribing to a platform, the content problem has to be solved by the hotel.

The compounding disadvantage

AI travel planning is still early. The share of bookings attributable to AI discovery channels today is small. The trajectory is not.

AI-referred web sessions grew 527% year-over-year in the twelve months to May 2025, according to Previsible's AI Traffic Report. Deloitte's 2025 summer survey found that 15% of Americans were using AI tools for travel planning, up from 10% the year before — and the rate of adoption among younger demographics is significantly higher. These numbers will continue to move.
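For readers who want the figures in plainer terms, the short sketch below converts them. It uses only the numbers quoted above; the function names are ours.

```python
def growth_multiple(percent_growth: float) -> float:
    """Convert a percentage increase into a multiple of the starting value."""
    return 1 + percent_growth / 100


def relative_change(old: float, new: float) -> float:
    """Relative change between two values, as a percentage of the old one."""
    return (new - old) / old * 100


# 527% growth: sessions as a multiple of the prior year's level.
print(round(growth_multiple(527), 2))

# 10% to 15% adoption: a 5-point rise, expressed relative to the earlier figure.
print(relative_change(10, 15))
```

Put differently, 527% growth means AI-referred sessions were a little over six times the previous year's level, and a move from 10% to 15% is a 50% relative increase in the share of people using these tools.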

The problem with waiting for the channel to mature before addressing content quality is that the content AI platforms have already indexed will shape their responses for years. Language models are trained on historical data. Content that is accurate, structured, and authoritative in the training window earns a representational advantage that degrades slowly. Content that is inaccurate, sparse, or absent in the training window creates a representational deficit that takes sustained effort to correct.

Hotels that addressed their mobile booking experience before mobile traffic became dominant captured the channel cleanly. The ones that waited found themselves paying OTA commissions on traffic that their brand and location should have converted directly.

The content gap in AI is the equivalent problem, earlier in the cycle, at a point when it can still be addressed before the channel's patterns are set.

The OTA description from six years ago is not going to fix itself.

by Markus Busch, Editor/Publisher Hospitality.today
