Q1 | Torys Quarterly•Winter 2024

The in(put)s and out(put)s of generative AI

For technology lawyers, virtually every conversation with a client regarding the use of new technology eventually turns to the use of generative AI. With the explosion of ChatGPT and all the media attention surrounding the power of generative AI, AI has evolved from a “niche” product that techies talk about into something that is now debated by society at large. While many people may not understand how it works, they believe it has the ability to revolutionize the way people work. Businesses want to harness its power while using it responsibly and, in particular, want to understand what goes into and comes out of the “black box” of generative AI. As lawyers, we work with clients to address their concerns regarding how AI consumes sensitive data and what rights the client has over the inputs to, and outputs from, an AI model.

Generative AI concerns from the legal perspective

When we deal with consumers of generative AI products and services, our clients’ starting position often centers on questions of intellectual property (IP), privacy and security. They typically want to ensure that 1) they have sole discretion over the data they provide to the AI system, 2) their data can only be used by the system to provide services to them (not third parties), and 3) any output of the system that was created for them is owned by them. We understand where these positions come from and want to share some perspectives on whether they fit within the context of generative AI.

Handling client data within an AI model

An initial concern that often arises is that the client’s data will be “in” the model, implying that when you feed information into the model, it keeps a record of all the information that it has been fed. The concern is that the model acts like a library that can draw on that data in the future and is therefore subject to security risks.

In speaking with AI providers, however, we have found that this generally isn’t the case. If the model is being trained with the client’s data, what is happening at a basic level is that the model looks at the new data (be it a document, a photo, etc.), updates all of the model’s relevant data records (which could be as simple as a spreadsheet) with the additional information and then discards the original document or photo. You can think of it as numerous spreadsheets of data containing relationships between words or pieces of information, which can, in turn, be updated.

The number of values held in these “spreadsheets”, which are referred to as parameters or vectors, is huge. For example, in 2023, GPT-4 (the model underlying ChatGPT) was reported to have more than 1.75 trillion parameters¹. Given that the original document (or the client’s data) is discarded after the parameters are updated, the generative AI model generally cannot retrieve the original document; however, with the right inputs or queries, it may be able to reproduce the information or produce something similar to it, including, concerningly, some of the sensitive data contained in it. Thus, when considering the protections needed in an agreement with a generative AI provider, we have found it more helpful to frame the discussion between the parties around the use of the client’s data, and not necessarily as a question of copyright (such as whether there was fair use or fair dealing in the training, or whether the output of the tool is a “copy” of inputted data).
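The “spreadsheet” analogy above can be made concrete with a deliberately tiny sketch. It is an illustration only, not how any commercial model is built: the class name `ToyBigramModel`, its methods and the sample sentence are all hypothetical, and a simple table of word-pair counts stands in for the trillions of parameters of a real system. The sketch shows two points from the discussion above: training updates the table and keeps no copy of the document, yet the right query can still reproduce the document’s content.

```python
from collections import defaultdict


class ToyBigramModel:
    """Hypothetical toy model whose only state is a table of word-pair counts."""

    def __init__(self):
        # The "spreadsheet": counts[w1][w2] = how often w2 followed w1 in training text.
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, document: str) -> None:
        words = document.split()
        for current, following in zip(words, words[1:]):
            self.counts[current][following] += 1
        # Nothing else is kept: the document string itself is not stored on the model.

    def generate(self, prompt: str, max_words: int = 10) -> str:
        output = [prompt]
        for _ in range(max_words):
            followers = self.counts.get(output[-1])
            if not followers:
                break
            # Continue with the word that most often followed the last word.
            output.append(max(followers, key=followers.get))
        return " ".join(output)


model = ToyBigramModel()
model.train("the acquisition price is forty million dollars")

# The model holds only counts, but a well-chosen query regenerates the sensitive text.
print(model.generate("acquisition"))
```

In this sketch the sensitive sentence cannot be looked up as a stored document, yet it can be regenerated from the counts, which is why the discussion is often better framed around how the client’s data is used than around whether a “copy” exists.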

Finding common ground for clients and vendors

We are finding that, at this early stage, clients who are consumers of generative AI products and services still prefer that their data not be used to train an AI model at all; however, it is often worthwhile to explore this position further. A more nuanced discussion of what data is being provided, and of whether it would be appropriate to allow the vendor to train the model on certain data (e.g., data that is neither confidential business information nor personal information), may lead to common ground that benefits both the AI provider and the client.

To address common business concerns regarding the use of these models, a number of providers of generative AI are now offering models that do not “learn” from the inputs that are provided in the live version. After the model is first trained with other data, the live model is static, meaning that it discards the inputs and outputs once the query is complete and is improved only by further training done on the model outside of the live version. As an aside, the use of a live model that evolves with user inputs (in particular, where the model is being used by people other than the client) creates other risks, which need to be considered but are beyond the scope of this article.
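The difference between a static live model and one that evolves with user inputs can be sketched in a few lines. This is a simplified illustration under stated assumptions: `StaticModel`, `LearningModel`, the word-count “parameters” and the sample prompt are all hypothetical and do not reflect any provider’s actual implementation.

```python
class StaticModel:
    """Hypothetical static deployment: answering a query only reads the parameters."""

    def __init__(self, parameters: dict):
        # Private copy of the parameters produced by earlier, offline training.
        self.parameters = dict(parameters)

    def query(self, prompt: str) -> str:
        words = prompt.split()
        known = [w for w in words if w in self.parameters]
        # The prompt and the response are not stored anywhere on the model.
        return f"{len(known)} of {len(words)} words recognised"


class LearningModel(StaticModel):
    """Hypothetical evolving deployment: every query also updates the parameters."""

    def query(self, prompt: str) -> str:
        response = super().query(prompt)
        for word in prompt.split():
            self.parameters[word] = self.parameters.get(word, 0) + 1
        return response


trained = {"merger": 3, "price": 2}
static_model = StaticModel(trained)
learning_model = LearningModel(trained)

prompt = "confidential merger price"
static_model.query(prompt)
learning_model.query(prompt)

print(static_model.parameters == trained)           # static model is unchanged by the query
print("confidential" in learning_model.parameters)  # evolving model absorbed the input
```

In the static case, a later user of the same model sees exactly the parameters that existed before the client’s query; in the evolving case, the client’s input has altered what later users interact with.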

The remaining question(s) of copyright

The outputs of generative AI can lead to additional concerns, since there is currently no consensus on the criteria that would make a generative AI output copyrightable. Further, the question remains as to whether copyright can be afforded to the output of generative AI at all.

The questions of who owns AI output (is it the user of the AI, the provider of the AI, or the AI?), or whether there is anything to own at all, are currently the subject of litigation and government consultations. In the here and now, if the question of who owns the output of generative AI cannot be effectively answered, or answered with a sufficiently high degree of predictability, there is an additional suite of questions for clients and providers to consider:

How do clients and providers of generative AI contract for the provision of products and services?
Can a provider of the generative AI agree that a client will “own” the output or assign the copyright of the output to the client?
If it is possible for the provider to assign the copyright of the output to a first client, will the provider then be able to assign the output to a second client, who caused the generation of a similar output by using similar input queries?

These questions are leading to a potential trend in which a provider will not agree to assign the copyright in the output of generative AI, or to warrant that a client will own that copyright. It is unclear whether this is a sustainable business model that clients will accept over time.

¹ See https://the-decoder.com/gpt-4-has-a-trillion-parameters/.

To discuss these issues, please contact the author(s).

This publication is a general discussion of certain legal and related developments and should not be relied upon as legal advice. If you require legal advice, we would be pleased to discuss the issues in this publication with you, in the context of your particular circumstances.
