An Apple Byte : Apple To Allow Retro Console Game Emulators On App Store Globally

As part of its compliance with the EU’s Digital Markets Act (DMA), Apple has announced a change to its App Store rules to allow emulators for retro console games globally. The change allows an option for downloading the titles, but Apple has warned the developers of such emulator apps that they need to follow copyright rules.

Android users already enjoy access to these types of emulators and Apple’s move to allow them via an in-app purchase mechanism could provide another welcome revenue stream.

Following its hefty $1.9 billion fine by the EU earlier this month for restricting music-streaming app developers from sharing subscription options outside of Apple’s App Store, Apple has introduced new “Music Streaming Services Entitlements” for apps distributed in the EU. These are guidelines allowing some music streaming apps to include links, e.g. ‘buy buttons’ that go to external websites.

Featured Article : New Certification For Copyright Compliant AI

Following many legal challenges to AI companies about copyrighted content being scraped and used to train their AI models (without consent or payment), a new certification for copyright-compliant AI has been launched.

The Issue 

As highlighted in the recent case of the New York Times suing OpenAI over the alleged training of its AI on New York Times articles without permission or payment (with the likelihood of a ‘fair use’ claim in defence), how AI companies train their models is now a big issue.

The organisation ‘Fairly Trained’ says that its new Licensed Model certification is intended to highlight the difference between AI companies who scrape data (and claim fair usage) and AI companies who license it, thereby getting permission and paying for training data (i.e. they choose to do so for ethical and legal reasons). As Fairly Trained’s CEO, Ed Newton-Rex, says: “You’ve got a bunch of people who want to use licensed models and you’ve got a bunch of people who are providing those. I didn’t see any way of being able to tell them apart.” 

Fairly Trained says it hopes its certification will “reinforce the principle that rights-holder consent is needed for generative AI training.” 

Fairly Trained – The Certification Initiative

The non-profit ‘Fairly Trained’ initiative has introduced a Licensed Model (L) certification for AI providers, which can be awarded to any generative AI model that doesn’t use any copyrighted work without a licence.

Who? 

Fairly Trained says the certification can go to “any company, organisation, or product that makes generative AI models or services available” and meets certain criteria.

The Criteria  

The main criteria for the certification include:

– The data used for the model(s) must be explicitly provided to the model developer for the purposes of being used as training data, or available under an open license appropriate to the use-case, or in the public domain globally, or fully owned by the model developer.

– There must be a “robust process for conducting due diligence into the training data,” including checks into the rights position of the training data provider.

– There must also be a robust process for keeping records of the training data that was used for each model training.

The Price 

In addition to meeting the criteria, AI companies will also have to pay for their certification. The price is based on an organisation’s annual revenue, ranging from a $150 submission fee and a $500 annual certification fee for an organisation with $100k annual revenue, up to a $500 submission fee and a $6,000 annual certification fee for an organisation with $10M annual revenue.

What If The Company Changes Its Training Data Practices? 

If an organisation acquires the certification and then changes its data practices afterwards (i.e. it no longer meets the criteria), Fairly Trained says it is up to that organisation to inform Fairly Trained of the change, which suggests that there’s no pro-active checking in place. Fairly Trained does, however, say it reserves the right to withdraw certification without reimbursement if “new information comes to light” that shows an organisation no longer meets the criteria.

None Would Meet The Criteria For Text 

Although Fairly Trained accepts that its certification scheme is not an end to the debate over what creator consent should look like, the scheme does appear to have one significant flaw at the moment.

As Fairly Trained’s CEO, Ed Newton-Rex has acknowledged, it’s unlikely that any of the major text generation models could currently get certified because they have been trained upon a large amount of copyrighted work, i.e. even ChatGPT is unlikely to meet the criteria.

The AI companies argue, however, that they have had little choice but to do so because copyright protection seems to cover so many different things including blog and forum posts, photos, code, government documents, and more.

Alternative? 

Mr Newton-Rex has been reported as saying he’s hopeful that there will, in future, be models that are trained on a small amount of data and end up being licensed, and that there may also be other alternatives. Examples of ways AI models could be trained without using copyrighted material (but probably not without consent) include:

– Using open datasets that are explicitly marked for free use, modification, and distribution. These can include government datasets, datasets released by academic institutions, or datasets available through platforms like Kaggle (provided their licenses permit such use).

– Using works that have entered the public domain, meaning copyright no longer applies. This includes many classic literary works, historical documents, and artworks.

– Generating synthetic data using algorithms. This could include text, images, and other media. Generative models can create new, original images based on certain parameters or styles (but could arguably still allow copyrighted styles to creep in).

– Using crowdsourcing and user contribution, i.e. contributions from users under an open license.

– Using data from sources that have been released under Creative Commons or other licenses that allow for reuse, sometimes with certain conditions (like attribution or non-commercial use).

– Partnering / collaborating with artists, musicians, and other creators to generate original content specifically for training the AI. This can also involve contractual agreements where the rights for AI training are clearly defined.

– Using web scraping but with strict filters to only collect data from pages that explicitly indicate the content is freely available or licensed for reuse (see the illustrative sketch below).
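To make that last point more concrete, below is a minimal, purely illustrative Python sketch of licence-filtered scraping: a page’s text is only kept for a training corpus if the page explicitly declares a permissive licence (here, a Creative Commons link). The URL list, the helper name, and the simple rel="license" check are hypothetical examples for illustration, not any AI company’s actual pipeline.

```python
# Illustrative sketch only: keep a page's text for training only if the page
# explicitly declares a permissive licence. Not any vendor's real pipeline.
import requests
from bs4 import BeautifulSoup

# Substrings of licence URLs treated as permissive (an assumption for this sketch)
PERMISSIVE_LICENCE_MARKERS = (
    "creativecommons.org/licenses/by",
    "creativecommons.org/publicdomain",
)

def collect_for_training(urls):
    """Return the visible text of pages that declare a permissive licence."""
    corpus = []
    for url in urls:
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        # Pages often declare their licence with <link rel="license" href=...>
        # or an <a rel="license"> tag pointing at the licence text.
        licence_tag = soup.find("link", rel="license") or soup.find("a", rel="license")
        licence_url = licence_tag.get("href", "") if licence_tag else ""
        if any(marker in licence_url for marker in PERMISSIVE_LICENCE_MARKERS):
            corpus.append(soup.get_text(" ", strip=True))
    return corpus

# Example usage (hypothetical URL):
# texts = collect_for_training(["https://example.org/openly-licensed-article"])
```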

Collaboration and Agreements 

Alternatively, AI companies could choose to partner with artists, musicians, and other creators to generate original content (using contractual agreements) specifically for training the AI. They could also choose to enter into agreements with organisations or individuals to use private or proprietary data, ensuring that the terms of use permit AI training.

What Does This Mean For Your Business? 

It’s possible to see both sides of the argument to a degree. For example, so many things are copyrighted that AI companies such as OpenAI wouldn’t have been able to create and release a reasonable generative AI chatbot like ChatGPT if they had to get consent from everyone for everything and pay for all the licences needed.

On the other hand, it’s understandable that creatives such as artists, or journalistic sources such as the New York Times, are angry that their output may have been used for free (with no permission) to train an LLM, thereby creating the source of its value, which it may then charge users for. Although the idea of providing a way to differentiate between AI companies that had paid and acquired permission for their training content (i.e. acted ethically) sounds like a fair idea, the fact that the LLMs from the main AI companies (including ChatGPT) may not even meet the criteria does make it sound a little self-defeating and potentially not that useful for the time being.

Also, some would say that relying upon companies to admit when they have changed their AI training practices, and potentially lose the certification they’ve paid for (when Fairly Trained isn’t proactively checking anyway), suggests the scheme may not work in practice. All that said, there are other possible alternatives (as mentioned above), requiring consent and organisations working together, that could result in useful, trained LLMs without copyright headaches.

Although the Fairly Trained scheme sounds reasonable, Fairly Trained admits that it’s not a definitive answer to the problem. It’s probably more likely that the outcomes of the many lawsuits will help shape how AI companies act as regards training their LLMs in the near future.

Featured Article : NY Times Sues OpenAI And Microsoft Over Alleged Copyright

It’s been reported that The New York Times has sued OpenAI and Microsoft, alleging that they used millions of its articles without permission to help train chatbots.

The First 

It’s understood that the New York Times (NYT) is the first major US media organisation to sue ChatGPT’s creator OpenAI, plus tech giant Microsoft (which is also an OpenAI investor and creator of Copilot), over copyright issues associated with its works.

Main Allegations 

The crux of the NYT’s argument appears to be that the use of its work to create GenAI tools should come with permission and an agreement that reflects the fair value of the work. It’s also important to note that the NYT relies on digital rather than physical newspaper subscriptions, of which it now has more than 9 million (the relevance of which will become clear below).

With this in mind, in addition to the main allegation of training AI on its articles without permission (for free), other main allegations made by the NYT about OpenAI and Microsoft in relation to the lawsuit include:

– OpenAI and Microsoft may be trying to get a “free-ride on The Times’s massive investment in its journalism” by using it to provide another way to deliver information to readers, i.e. a way around its paywall. For example, the NYT alleges that OpenAI and Microsoft chatbots gave users near-verbatim excerpts of its articles. The NYT’s legal team has given examples of these, such as restaurant critic Pete Wells’ 2012 review of Guy Fieri’s (of Diners, Drive-Ins, and Dives fame) “Guy’s American Kitchen & Bar”. The NYT argues that this threatens its high-quality journalism by reducing readers’ perceived need to visit its website, thereby reducing its web traffic and, potentially, its revenue from advertising and from the digital subscriptions that now make up most of its readership.

– Misinformation from OpenAI’s (and Microsoft’s) chatbots, in the form of errors and so-called ‘AI hallucinations’, makes it harder for readers to tell fact from fiction, including when their technology falsely attributes information to the newspaper. The NYT’s legal team cites examples of where this may be the case, such as ChatGPT once falsely attributing two recommendations for office chairs to its Wirecutter product review website.

“Fair Use” And Transformative 

In their defence, OpenAI and Microsoft appear likely to rely mainly on the arguments that the training of AI on the NYT’s content amounts to “fair use” and that the outputs of the chatbots are “transformative.”

For example, under US law, “fair use” is a doctrine that allows limited use of copyrighted material without permission or payment, especially for purposes like criticism, comment, news reporting, teaching, scholarship, or research. Determining whether a specific use qualifies as fair use, however, involves considering factors like the purpose and character of the use. For example, the use must be “transformative”, i.e. adding something new or altering the original work in a significant way (often for a different purpose). OpenAI and Microsoft may therefore argue that training their AI products can be seen as transformative because the AI uses the newspaper content in a way that is different from the original purpose of news reporting or commentary. However, the NYT has already stated that: “There is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it”. Any evidence of verbatim outputs may also damage the ‘transformative’ argument for OpenAI and Microsoft.

Complicated 

Although these sound like relatively clear arguments either way, there are several factors that add to the complication of this case. These include:

– The fact that OpenAI altered its products following earlier copyright issues, making it difficult to decide whether its current outputs are enough to establish liability.

– Many possible questions about the journalistic, financial, and legal implications of generative AI for news organisations.

– Broader ethical and practical dilemmas facing media companies in the age of AI.

What Is It Going To Cost? 

Given reports that talks between all three companies to avert the lawsuit have failed to resolve the matter, what the NYT wants is:

– Damages of an as-yet-undisclosed sum, which some say could run into the billions of dollars (given that OpenAI is valued at $80 billion and Microsoft has invested $13 billion in OpenAI’s for-profit subsidiary).

– For OpenAI and Microsoft to destroy the chatbot models and training sets that incorporate the NYT’s material.

Many Other Examples

AI companies like OpenAI are now facing many legal challenges of a similar nature, e.g. the scraping/automatic collection of online content/data by AI without compensation, and for other related reasons. For example:

– A class action lawsuit filed in the Northern District of California accuses OpenAI and Microsoft of scraping personal data from internet users, alleging violations of privacy, intellectual property, and anti-hacking laws. The plaintiffs claim that this practice violates the Computer Fraud and Abuse Act (CFAA).

– Google has been accused in a class-action lawsuit of misusing large amounts of personal information and copyrighted material to train its AI systems. This case raises issues about the boundaries of data use and copyright infringement in the context of AI training.

– A class action against Stability AI, Midjourney, and DeviantArt claims that these companies used copyrighted images to train their AI systems without permission. The key issue in this lawsuit is likely to be whether the training of AI models with copyrighted content, particularly visual art, constitutes copyright infringement. The challenge lies in proving infringement, as the generated art may not directly resemble the training images. The involvement of the Large-scale Artificial Intelligence Open Network (LAION) in compiling images used for training adds another layer of complexity to the case.

– Back in February 2023, Getty Images sued Stability AI alleging that it had copied 12 million images to train its AI model without permission or compensation.

The Actors and Writers Strike 

The recent strike by Hollywood actors and writers is another example of how fears about AI, consent, and copyright, plus the possible effects of AI on eroding the value of people’s work and jeopardising their income are now of real concern. For example, the strike was primarily focused on concerns regarding the use of AI in the entertainment industry. Writers, represented by the Writers Guild of America, were worried about AI being used to write or complete scripts, potentially affecting their jobs and pay. Actors, under SAG-AFTRA, protested against proposals to use AI to scan and use their likenesses indefinitely without ongoing consent or compensation.

Disputes like this, and the many lawsuits against AI companies highlight the urgent need for clear policies and regulations on AI’s use, and the fear that AI’s advance is fast outstripping the ability for laws to keep up.

What Does This Mean For Your Business? 

We’re still very much at the beginning of a fast-evolving generative AI revolution. As such, lawsuits against AI companies like Google, Meta, Microsoft, and OpenAI are now challenging the legal limits of gathering training material for AI models from public databases. These types of cases are likely to help to shape the legal framework around what is permissible in the realm of data-scraping for AI purposes going forward.

The NYT/OpenAI/Microsoft lawsuit and other examples, therefore, demonstrate the evolving legal landscape as courts now try to grapple with the implications of AI technology on copyright, privacy, and data use laws, and its complexities. Each case will contribute to defining the boundaries and acceptable practices in the use of online content for AI training purposes, and it will be very interesting to see whether arguments like “fair use” are enough to stand up to the pressure from multiple companies and industries. It will also be interesting to see what penalties (if things go the wrong way for OpenAI and others) will be deemed suitable, both in terms of possible compensation and/or the destruction of whole models and training sets.

For businesses (who are now able to create their own specialised, tailored chatbots), these major lawsuits should serve as a warning to be very careful in how their chatbots are trained, to think carefully about any legal implications, and to focus on creating chatbots that are not just effective but also likely to be compliant.

Tech News : Copyrights Conundrum: OpenAI Sued

It’s been reported that a trade group for U.S. authors (including John Grisham) has sued OpenAI, accusing it of unlawfully training its chatbot ChatGPT on their work.

Which Authors? 

The Authors Guild trade group has filed the lawsuit (in Manhattan federal court) on behalf of a number of prominent authors including John Grisham, Jonathan Franzen, George Saunders, Jodi Picoult, “Game of Thrones” novelist George R.R. Martin, “The Lincoln Lawyer” writer Michael Connelly and lawyer-novelists David Baldacci and Scott Turow.

Why? 

The Guild’s lawsuit alleges that the datasets that have been used to train OpenAI’s large language model (LLM) to respond to human prompts include text from the authors’ books, which may have been taken from illegal online “pirate” book repositories.

As proof, the Guild alleges that ChatGPT can generate accurate summaries of the authors’ books when prompted (including details not available in reviews anywhere else online), which indicates that their text must be included in its database.

Also, the Authors Guild has expressed concerns that ChatGPT could be used to replace authors and instead could simply “generate low-quality eBooks, impersonating authors and displacing human-authored books.” 

Threat 

The Authors Guild said it organised the lawsuit after witnessing first-hand, “the harm and existential threat to the author profession wrought by the unlicensed use of books to create large language models that generate texts.”  

The Guild cites its latest author income survey as an example of how the income of authors could be adversely affected by LLMs. According to the survey, in 2022 authors earned just over $20,000 from book and other author-related activities, and although 10 percent of authors earn far above the median, half earn even less.

The Authors Guild says, “Generative AI threatens to decimate the author profession.”  

The Point 

To illustrate the main point of the Guild’s allegations, Scott Sholder, a partner with Cowan, DeBaets, Abrahams & Sheppard and co-counsel for the Plaintiffs and the Proposed Class, is reported on their website as saying: “Plaintiffs don’t object to the development of generative AI, but Defendants had no right to develop their AI technologies with unpermitted use of the authors’ copyrighted works. Defendants could have ‘trained’ their large language models on works in the public domain or paid a reasonable licensing fee to use copyrighted works.”  

Open Letter With 10,000 Signatures 

The lawsuit may have been the inevitable next step considering that back in July, the Authors Guild submitted a 10,000-signature open letter to the CEOs of prominent AI companies (OpenAI, Alphabet, Meta, Stability AI, IBM, and Microsoft) complaining about the building of lucrative generative AI technologies using copyrighted works and asking that AI developers get consent from, credit, and fairly compensate authors.

What Does OpenAI Say? 

As expected in a case where so much may be at stake, no direct comment has been made public by OpenAI (so far), although one source (Forbes) has reported that an OpenAI spokesperson told it the company was involved in “productive conversations” with many creators (including the Authors Guild) to discuss their AI concerns.

Where previous copyright lawsuits have been filed against it, OpenAI is reported to have pointed, in its defence, to the idea that fair use could be applied to LLMs.

Others 

Other generative AI providers are also facing similar lawsuits, e.g. Meta Platforms and Stability AI.

What Does This Mean For Your Business? 

Ever since ChatGPT’s disruptive introduction last November with its amazing generative abilities (e.g. with text and code, plus the abilities of image generators), creators (artists, authors, coders etc) have felt AI’s negative effects, expressed their fears about it, and felt the need to protest. For example, the Hollywood actors and writers strikes, complaints from artists that AI image generators have copied their styles, and now the Authors Guild are all part of a growing opposition who feel threatened and exploited.

We are still in the very early stages of generative AI, where it appears to many that the technology may be running way ahead of regulation, and where AI providers may appear able to bypass areas of consent, copyright, and crediting, and in doing so use the work of others to generate profits for themselves. This has led to authors, writers, actors, and other creatives fearing a reduction or loss of income, fearing that their skills and professions could be devalued and that they can and will be replaced by AI, and fearing that generative AI could be preferred by studios and other content providers to reduce costs and complication. That, in turn, has led to the inevitable, multiple legal fights that we’re seeing now, as creators try to clarify boundaries and protect themselves and their livelihoods. In the case of the very powerful Authors Guild, OpenAI will need to bring its ‘A’ game to the dispute, as the Authors Guild points out it’s “here to fight” and has “a formidable legal team” with “expertise in copyright law.”

This is not the only lawsuit against an AI provider, and there are likely to be many more, and many similar protests, until legal outcomes provide more clarity about the boundaries in the altered environment created by generative AI.

Tech News : Watermark Trial To Spot AI Images

Google’s AI research lab DeepMind has announced that in partnership with Google Cloud, it’s launching a beta version of SynthID, a tool for watermarking and identifying AI-generated images.

The AI Image Challenge

Generative AI technologies are rapidly evolving, and AI-generated imagery, also known as ‘synthetic imagery,’ is becoming much harder to distinguish from images not created by an AI system. Many AI-generated images are now so good that they can easily fool people, and with so many (often free) AI image generators now available and widely used, misuse is becoming more common.

This raises a host of ethical, legal, economic, technological, and psychological concerns ranging from the proliferation of deepfakes that can be used for misinformation and identity theft, to legal ambiguities around intellectual property rights for AI-generated content. Also, there’s potential for job displacement in creative fields as well as the risk of perpetuating social and algorithmic biases. The technology also poses challenges to our perception of reality and could erode public trust in digital media. Although the synthetic imagery challenge calls for a multi-disciplinary approach to tackle it, many believe a system such as ‘watermarking’ may help in terms of issues like ownership, misuse, and accountability.

What Is Watermarking?  

Creating a special kind of watermark to identify images as AI-produced is a relatively new idea, but adding visible watermarks to images is a method that’s been used for many years (to show copyright and ownership) on sites including Getty Images, Shutterstock, iStock Photo, Adobe Stock, and many more. Watermarks are designs that can be layered on images to identify them. Images can have visible or invisible, reversible or irreversible, watermarks added to them. Adding a watermark can make it more difficult for an image to be copied and used without permission.
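As a simple illustration of the traditional, visible kind of watermark described above, the sketch below uses the Pillow imaging library to tile semi-transparent text across a photo. The file names and watermark text are placeholders, and real stock-photo sites use far more sophisticated (and often invisible) techniques.

```python
# Minimal sketch of a traditional visible watermark using Pillow.
# File names and the watermark text are placeholders.
from PIL import Image, ImageDraw

def add_visible_watermark(src_path, dst_path, text="© Example Photos"):
    image = Image.open(src_path).convert("RGBA")
    overlay = Image.new("RGBA", image.size, (0, 0, 0, 0))  # transparent layer
    draw = ImageDraw.Draw(overlay)
    # Tile the text across the whole image so cropping one corner
    # doesn't remove the watermark entirely.
    for y in range(0, image.height, 120):
        for x in range(0, image.width, 240):
            draw.text((x, y), text, fill=(255, 255, 255, 90))  # semi-transparent white
    watermarked = Image.alpha_composite(image, overlay)
    watermarked.convert("RGB").save(dst_path)

# Example usage:
# add_visible_watermark("photo.jpg", "photo_watermarked.jpg")
```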

What’s The Challenge With AI Image Watermarking? 

AI-generated images can be produced on-the-fly, can be customised, and can be very complex, making it challenging to apply a one-size-fits-all watermarking technique. Also, AI can generate a large number of images in a short period of time, making traditional watermarking impractical, and a visible watermark added to part of an image (e.g. the extremities) can simply be cropped out or edited away.

Google’s SynthID Watermarking 

Google’s SynthID tool works with Google Cloud’s ‘Imagen’ text-to-image diffusion model (an AI text-to-image generator) and uses a combined approach of adding and detecting watermarks. For example, the SynthID tool can add an imperceptible watermark to synthetic images produced by Imagen without compromising image quality, and the watermark remains detectable even after modifications (e.g. the addition of filters, changing colours, and saving with various lossy compression schemes, most commonly used for JPEGs). SynthID can also scan an image for its digital watermark, assess the likelihood of the image having been created by Imagen, and provide the user with three confidence levels for interpreting the results.

Based On Metadata 

Adding metadata to an image file (e.g. who created it and when), plus adding digital signatures to that metadata, can show if an image has been changed. Where metadata is intact, users can easily identify an image, but metadata can be manually removed when files are edited.
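A rough way to see why metadata-only labelling is fragile: re-saving just the pixels of an image with a library like Pillow produces a copy with no EXIF metadata at all, so any provenance information stored there is silently lost (the file names below are placeholders, and this is a general illustration rather than anything to do with Google’s own method). A watermark embedded in the pixels themselves, by contrast, would survive this kind of copy.

```python
# Illustration (not Google's method): copying only the pixels of an image
# drops its EXIF metadata, so metadata-based labels are easy to lose.
from PIL import Image

original = Image.open("labelled.jpg")
print("EXIF present in original:", bool(original.info.get("exif")))

stripped = Image.new(original.mode, original.size)
stripped.putdata(list(original.getdata()))  # copy pixel values only
stripped.save("stripped.jpg")               # saved without the EXIF block

print("EXIF present in copy:", bool(Image.open("stripped.jpg").info.get("exif")))
```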

Google says the SynthID watermark is embedded in the pixels of an image and is compatible with other image identification approaches that are based on metadata and, most importantly, the watermark remains detectable even when metadata is lost.

Other Advantages 

Some of the other advantages of the SynthID watermark addition and detection tool are:

– Images are modified in a way that is imperceptible to the human eye.

– Even if an image has been heavily edited and the colour, contrast, and size changed, the DeepMind technology behind the tool will still be able to tell if an image is AI-generated.

Part Of The Voluntary Commitment

The idea of watermarking to expose and filter AI-generated images falls within the commitment of seven leading AI companies (Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI) who recently committed to developing AI safeguards. Part of the commitments under the ‘Earning the Public’s Trust’ heading was to develop robust technical mechanisms, such as a watermarking system, to ensure that users know when content is AI-generated, thereby enabling creativity with AI while reducing the dangers of fraud and deception.

What Does This Mean For Your Business?

It’s now very easy for people to generate AI images with any of the AI image-generating tools available, and many of these images can fool the viewer, potentially resulting in ethical, legal, economic, political, technological, and psychological consequences. Having a system that can reliably identify AI-generated images (even if they’ve been heavily edited) is therefore of value to businesses, citizens, and governments.

Although Google admits its SynthID system is still experimental and not foolproof, it at least means something fairly reliable will be available soon, at a time when AI seems to be running ahead of regulation and protection. One challenge, however, is that although there is a general commitment by the big tech companies to watermarking, the SynthID tool is heavily linked to Google’s DeepMind, Cloud, and Imagen, and other companies may be pursuing different methods. In other words, there may be a lack of standardisation.

That said, it’s a timely development and it remains to be seen how successful it can be and how watermarking and/or other methods develop going forward.