Most of us upload sensitive data to AI tools on a daily basis, knowing (and ignoring) the risks. We do it anyway because the tools are genuinely useful. And that tension - between the productivity AI gives you and the trust it demands - is exactly what this piece is about.

First, we're not here to tell you to stop using AI. That's just not practical. Not fun. Not useful. Unless you decide to go live in the mountains to breed goats and live an offline life (an alternative some of our SRE friends often consider). Instead, we'll explore what we're seeing in the wild across industries and organizations. And a bit across homes, too.

What the research says

In Q2 2025, the average enterprise uploaded 1.32 GB of files to generative AI tools. Nearly 22% of those files contained sensitive data - PII, payment card numbers, financial projections, employee records.1

Forty-five percent of enterprise employees are actively using GenAI platforms, and among those who paste data into these tools, more than half of paste events contain corporate information.2

The average user who pastes information into GenAI tools does it 6.8 times per day.2 By Q4 2025, sensitive data made up 34.8% of all ChatGPT inputs - more than triple the 11% measured in 2023.3

Samsung had to ban ChatGPT entirely after engineers leaked source code and internal meeting notes three times in a single month.2

The question isn't whether your people are using AI. They are. The question is: do you know what the AI is doing with what your people give it?

What we're hearing (and seeing) in the corridors

Over the past few months, we've been talking to compliance leaders, DPOs, and rank-and-file employees across fintech, financial services, and other industries. A pattern emerged quickly.

The large enterprises - the ones with serious compliance budgets, dedicated security teams, and operations in heavily regulated industries - have largely solved this problem. They run Gemini, Azure OpenAI, or similar models on their own infrastructure. Their data stays inside their walls. They spent real money to make that happen, precisely because they assessed the risk of sending data to third-party AI providers and decided it was unacceptable.

The smaller organizations, not so much. They can't afford to run AI on their own infrastructure. Their teams use the public versions of ChatGPT, Claude, Copilot, and Gemini because that's what's available.

But even in those big enterprises, the same compliance leaders told us the perimeter leaks, on-prem AI or not.

Employees have preferences. One senior compliance officer explained it well: the company provides a corporate Gemini, but employees still go to ChatGPT or Claude because they prefer the output for certain tasks. They do it on their phones. They do it from personal laptops at home. A document gets shared in Slack, someone opens it on their phone, and pastes it into their personal ChatGPT account to get a quick summary. The corporate AI perimeter just got bypassed in three taps.

We know this happens because we do it ourselves. We pay for multiple AI subscriptions alongside our corporate tools - either because different models are better at different things or, more likely, because we have favorites. We're not unusual. The LayerX 2025 report found that 67% of GenAI interactions happen through personal accounts, completely outside corporate visibility.2

The BYOD parallel

A decade ago, "Bring Your Own Device" broke the corporate network perimeter. Employees brought personal phones and laptops to work, and IT had to rethink security from the ground up. "Bring Your Own AI" is doing the same thing to data security right now - except the data doesn't just leave the building. It leaves for a third-party server that might store it, process it, or train on it.

The trust problem

There's a standard response to all of this: "Just use the enterprise plan. ChatGPT Enterprise doesn't train on your data. Problem solved."

Well, on paper, it's not wrong. Most enterprise AI plans come with contractual assurances about data handling. ChatGPT has a setting to disable memory. Premium AI chatbots have a toggle to opt out of training on your data. Providers publish data processing agreements. There are checkboxes and toggles and terms of service.

But let's be honest about who we're trusting when we check those boxes.

Meta has been fined over €2.5 billion under GDPR for data handling violations.4 Google has faced hundreds of millions in penalties. OpenAI was fined €15 million by Italy's data protection authority.5 Clearview AI racked up over €100 million in fines across EU regulators for collecting personal data without consent.6

The idea that a checkbox in settings or a clause in a Terms of Service meaningfully protects your data requires a level of trust that Big Tech has not earned.

When a company's entire business model is built on data, and that company tells you it won't use your data, it's reasonable to be at least a little skeptical.

And here's the uncomfortable follow-up: even if you trust the provider today, policies change. Models get updated. Features get added that alter data handling. Terms of Service get updated, and you're often left with a choice between "Accept" and "just stop using this service / phone / Gmail / whatever it is" - which, once the system is deeply embedded in your daily life, is impractical at best and ridiculously expensive in effort at worst. A setting that opts you out of training today might work differently after the next platform update.

The "move fast and break things, pay the fine later" culture of Silicon Valley is not ancient history.

And trust isn't a one-time assessment - it's a continuous obligation. Which is exactly the mindset that financial services has been forced to adopt for decades.

Regulatory context

For businesses, under GDPR, transferring personal data to a third-party processor requires a legal basis, a data processing agreement, and appropriate safeguards. When an employee pastes PII into ChatGPT through a personal account, none of these requirements is met. The EU AI Act adds another layer: high-risk AI system obligations become enforceable on August 2, 2026, with penalties of up to €35 million or 7% of global revenue.7 The regulatory pressure is real - and it applies whether or not you trust the provider.


KYA: a KYC for the AI era

If you work in or around financial services, you know KYC - Know Your Customer. It started decades ago as a basic obligation: verify who you're doing business with. Over time it grew into a multi-billion-dollar compliance infrastructure covering identity verification, transaction monitoring, risk scoring, and continuous audit trails.

But KYC was never just about identifying the customer. It's about understanding the relationship. What is this customer doing with your services? What risk does that create? KYC matured because regulators decided the answer to "do you know what your customers are doing?" couldn't be "not really" anymore.

We think the same inflection point is happening with AI - just in the other direction. KYC asks: what is your customer doing with your services? The question organizations now need to ask is the mirror: what is your AI service doing with your data?

We call this KYA - Know Your AI.

KYA isn't a product and it's not a compliance checklist. It's more like a discipline - the same way KYC became a discipline. And from what we've seen, it has three natural layers:

Know what goes in. Which AI tools are your people actually using? Through which accounts? What data are they sharing - and do they even realize they're sharing it? Shadow AI is the new shadow IT, and over half of organizations still don't have a basic inventory of their AI systems.7 You can't manage what you can't see.

Know what happens on the other side. This is the hard part - and the part most organizations skip entirely. Does the provider train on your inputs? What's the data retention policy? Who are the sub-processors? What jurisdiction does your data land in? These questions need real answers, reviewed regularly. Not a one-time vendor assessment filed during procurement and never revisited.

Know how the system actually behaves. AI tools aren't static software. Models get retrained. Policies change quietly. Features get added that alter how data is handled. A setting that protects you today might work differently after the next platform update. Ongoing verification - not just at onboarding, but continuously - is what separates a real control from a hope.
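To make those three layers a bit more concrete, here is a minimal sketch of what a first pass at a "KYA register" might look like in code: an inventory of the AI tools in use, what each one is cleared to receive, the provider's data-handling claims, and when those claims were last re-verified. The field names and the 90-day review window are our own illustrative assumptions, not a standard.

    # Illustrative only: a tiny "KYA register" covering the three layers -
    # what goes in, what the provider claims, and when that was last verified.
    from dataclasses import dataclass, field
    from datetime import date, timedelta

    @dataclass
    class AIToolRecord:
        name: str                      # e.g. "Corporate Gemini"
        accounts: str                  # "corporate" or "personal"
        data_allowed: list[str]        # categories cleared to go in (layer 1)
        trains_on_inputs: bool         # provider claim (layer 2)
        retention: str                 # e.g. "30 days", "indefinite"
        sub_processors: list[str] = field(default_factory=list)
        last_verified: date = date.min # when the claims were last re-checked (layer 3)

    def needs_review(record: AIToolRecord, max_age_days: int = 90) -> bool:
        """Flag tools whose data-handling claims haven't been re-verified recently."""
        return date.today() - record.last_verified > timedelta(days=max_age_days)

    register = [
        AIToolRecord(
            name="Corporate Gemini",
            accounts="corporate",
            data_allowed=["internal docs", "source code"],
            trains_on_inputs=False,
            retention="30 days",
            sub_processors=["Google Cloud EU"],
            last_verified=date(2025, 9, 1),
        ),
    ]

    for tool in register:
        if needs_review(tool):
            print(f"{tool.name}: verify data-handling claims again")

The point of the sketch isn't the code - it's that the register has an expiry date built in. A vendor assessment that never gets re-checked is layer two without layer three.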

Figuring out how to properly do all this is part of the complex challenge we - and many others - are trying to solve.

Those big enterprises we talked about earlier did the work of knowing their AI. They evaluated what the service does with data, didn't like what they found, and built their own controls. They practiced KYA before the term existed.

Because even when you have checkboxes and terms and conditions in your contracts, it's reasonable to want more than a promise. It's reasonable to want actual controls.


Why we started KYA Labs

When we founded KYA Labs, the thinking was simple: use, analyze, experiment with, and push the limits of current AI tools. Understand what they can do and what risks come with their mass adoption - practically, honestly, without the hype or the fear. Not selling panic. Not pretending the tools aren't useful. Just asking the questions that need asking and building things to help people answer them.

The first tool to come out of this thinking was Paper Ghost - a desktop app that strips sensitive data from documents before they reach AI tools. It runs locally, processes everything on the user's machine, and puts a human in the loop before anything is finalized. It addresses the "know what goes in" layer: making sure that what crosses the trust boundary is clean and reviewed, regardless of which AI tool is used at the other end of the pipeline.
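To illustrate the idea - and only the idea, this is not Paper Ghost's actual implementation - here is a deliberately simple sketch of local redaction with a human in the loop: match a few obvious patterns, replace them with placeholders, and require explicit approval before anything leaves the machine.

    # Toy illustration of "know what goes in": redact obvious sensitive patterns
    # locally, then let a human approve the result before it is sent anywhere.
    # The patterns and labels are deliberately simplistic.
    import re

    PATTERNS = {
        "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "CARD":  r"\b(?:\d[ -]?){13,16}\b",          # crude payment-card shape
        "IBAN":  r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b",
    }

    def redact(text: str) -> str:
        """Replace matches of each pattern with a placeholder, entirely on-device."""
        for label, pattern in PATTERNS.items():
            text = re.sub(pattern, f"[{label} REDACTED]", text)
        return text

    def review_and_confirm(original: str) -> str | None:
        """Human in the loop: show the redacted text and ask before releasing it."""
        cleaned = redact(original)
        print("--- redacted preview ---")
        print(cleaned)
        if input("Send this to the AI tool? [y/N] ").strip().lower() == "y":
            return cleaned
        return None  # nothing leaves the machine without explicit approval

A real tool needs far more than regexes - document parsing, context-aware detection, reversible mappings - but the shape is the same: the cleaning happens locally, and a person signs off before the data crosses the trust boundary.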

But Paper Ghost is just one piece of a bigger puzzle. The "Labs" in our name is there for a reason. We test things. We investigate how different AI systems handle data, how they behave in edge cases, which biases they have, and where the gaps sit between what providers promise and what actually happens. Some of that work looks like compliance research. Some of it is more experimental. All of it comes from the same instinct: if you're going to use these tools - and you should, they're genuinely powerful - you ought to understand what you're dealing with.

Sources

  1. Harmonic Security, via Help Net Security (August 2025) - analysis of 1M GenAI prompts and 20,000 uploaded files across 300+ AI-powered apps in Q2 2025.
  2. LayerX Security, Enterprise AI and SaaS Data Security Report 2025 - enterprise browser telemetry. Also covered by The Register and Tom's Guide.
  3. Metomic, Q4 2025 research - sensitive data as a share of ChatGPT inputs rose from 11% in 2023 to 34.8% in Q4 2025.
  4. DLA Piper, GDPR Fines and Data Breach Survey (January 2025) - €1.2B in fines issued in 2024; €5.88B cumulative since May 2018. Also: CMS GDPR Enforcement Tracker Report 2024/2025.
  5. Italy's Garante della privacy - €15M GDPR fine against OpenAI. Reported by ComplyDog and others. Italy initially banned ChatGPT in March 2023.
  6. Clearview AI fined across multiple EU jurisdictions - Netherlands (€30.5M), France, Italy (€20M), and others - totaling over €100M. Via Scrut and DLA Piper.
  7. EU AI Act (Regulation 2024/1689) - high-risk system obligations enforceable August 2, 2026. Penalties up to €35M or 7% of global revenue. Summary via Secure Privacy, Orrick, and Legal Nodes.