Microsoft and Meta Tackle AI Advancements and Data Confidentiality Concerns

Towards the end of my recent blog post on Copilot, I referenced Microsoft's strong recommendation that customers ensure their data storage and access strategies are well established, to minimise the risk of data leaks arising from the use of Artificial Intelligence (AI) and Large Language Models (LLMs).

This is partly because, to fulfil a User's request, AI and LLMs can potentially access significant amounts of data, both freely available on the public internet and held privately on clients' networks or cloud deployments.

The risk is then compounded by the User's responsibility to consider thoroughly whether what is ‘produced’ can be released to their Client or into the Public Domain without contravening company rules or, more importantly, data confidentiality legislation.

But in accessing the data that AI and LLMs require, another issue has arisen which appears to have been overlooked by major players such as Microsoft and Meta: copyright infringement.

Last week, at a meeting of the House of Lords Communications and Digital Committee, representatives from Microsoft and Meta (the parent company of Facebook and WhatsApp) discussed the current state of AI and LLMs.

While both tech giants highlighted the remarkable advancements in the field, the conversation took an interesting turn as parliamentarians probed the potential implications of their work for the protection of intellectual property (IP). The central issue is that LLMs need access to vast amounts of freely available data, which raises questions about copyright infringement under UK law.

Advances in AI and LLMs

During the committee meeting, Meta’s Vice-President and Deputy Chief Privacy Officer for Policy, Rob Sherman, and Microsoft’s Director of Public Policy at the Office for Responsible AI, Owen Larter, shed light on the current state of LLMs. Sherman emphasised that the industry is at a crucial inflection point, where AI models can run more efficiently. He also discussed Meta’s investment in detecting and correcting bias in machine learning models to foster fairness and inclusivity.

Larter expressed enthusiasm for the opportunities presented by AI, particularly for boosting productivity. He showcased Microsoft's Copilot AI, highlighting how it helps software developers by automating code completion.

Open Source and Data Models

The committee members sought insights into the balance between open and closed data models and the associated risks and innovations. Sherman underscored Meta’s support for open source, emphasising its importance in the AI ecosystem. He highlighted how open models lower barriers to entry, enabling small and mid-sized firms to innovate alongside larger businesses.

In contrast, Larter approached the topic with caution, emphasising the need to weigh carefully the trade-offs between openness and safety, especially for highly capable frontier models.

Copyright Considerations

A key point of contention emerged when Lord Donald Foster raised questions about how UK copyright law applies to the data used to train LLMs. The discussion centred on the UK requirement for a copyright licence when ingesting text and data to train commercial LLMs.

Sherman argued that copyright laws were established before LLMs existed and that the courts should decide what constitutes fair use. He emphasised the importance of maintaining broad access to information for innovation. Larter, on the other hand, pointed to recent clarifications in the EU and Japan that provide exceptions for text and data mining when training an AI model.

The Road Ahead

The debate over whether AI should be exempt from copyright law continues, with stakeholders holding differing opinions. As the AI landscape evolves, developers and tech companies will need to navigate the legal landscape carefully. Compliance with existing law, together with questions of IP ownership and compensation, is paramount.

Conclusion

The House of Lords committee meeting provided valuable insights into the perspectives of Meta and Microsoft on AI, LLMs, and the legal challenges associated with data usage. As AI technologies advance, the delicate balance between innovation and legal compliance remains a pressing issue. The conversation underscores the necessity for ongoing dialogue between tech companies, legislators, and the broader public to shape responsible AI development and usage in the future.

However, bringing this back to ‘us’ as lay Users: if you are using AI, please consider:

Should the information I am generating be passed to a client or placed in the Public Domain?
Does this contravene anyone's Intellectual Property Rights?
Is the ‘output’ likely to cause Copyright Infringement?

Thanks

Richard