GPT-4: The arrival of multimodal AI and its broader implications for the legal profession

In March 2023, less than four months after ChatGPT was released to the public, OpenAI announced the release of GPT-4 to paid subscribers. GPT-4 is the successor to GPT-3.5, the model behind ChatGPT, and arguably the most advanced AI chatbot technology to date, with the ability to "read" and "interpret" images as well as text. Meanwhile, Italy has banned ChatGPT over concerns about potential privacy breaches arising from the masses of data it is trained on. Other EU member states are also making further inquiries.

Notwithstanding these concerns, Microsoft's partnership with OpenAI is gathering pace with the development of Copilot, which integrates GPT-4 into the Microsoft 365 suite of applications. Copilot may prove to be the ultimate gamechanger for businesses and even change how budgets are allocated when it comes to legal advisory and compliance functions. According to OpenAI, GPT-4 scored 75 percent on the Uniform Bar Exam in the United States, which places it in the 90th percentile. On the LSAT, GPT-4 placed in the 88th percentile. But does this make GPT-4 a good substitute lawyer?

In this article we compare legal advice provided by lawyers with targeted information about a legal issue provided by AI, and consider some of the developments and shortcomings of the world's leading large multimodal model.

New features

The most interesting new feature is that GPT-4 accepts image inputs as well as text (which is why it is called a multimodal model). In the exchange below, GPT-4 is able to interpret a nuanced image such as an internet meme and explain its intended humour.

Figure 1

Source: GPT-4 technical report, page 38

GPT-4 can also now process up to 25,000 words at a time, greatly expanding the types of requests it can deal with.

Figure 2

Source: GPT-4 technical report, page 37
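
To illustrate what a 25,000-word ceiling means in practice for long legal documents, the following is a minimal Python sketch that splits a document into consecutive parts of at most 25,000 words. The function name `split_into_word_chunks` and the sample contract are hypothetical, and the sketch assumes the limit can be approximated by counting whitespace-separated words. In practice the model's limit is measured in tokens rather than words, so this is only a rough guide.

```python
def split_into_word_chunks(text: str, max_words: int = 25_000) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Words are approximated as whitespace-separated strings (an assumption;
    the model itself counts tokens, not words).
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Hypothetical example: a 60,000-word contract is split into three parts.
contract = "clause " * 60_000
chunks = split_into_word_chunks(contract)
print([len(chunk.split()) for chunk in chunks])  # prints [25000, 25000, 10000]
```

Splitting purely by word count may cut across clauses, so a real workflow would more likely break a document at section boundaries that fit within the limit.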

OpenAI claims that GPT-4 demonstrates advances in the accuracy of its responses, being 40% more likely to produce factual responses than GPT-3.5 (the model used by ChatGPT). OpenAI also claims that GPT-4 has stronger safety guardrails and is 82% less likely to respond to requests for disallowed content, based on OpenAI's undisclosed criteria. Further, GPT-4 has been trained with a "safety reward signal" system to better identify when a sensitive topic could be a valid request.

Figure 3

Source: GPT-4 technical report, page 13


GPT-4 has many of the same shortcomings seen in ChatGPT. It still lacks knowledge of events after September 2021. It can confidently produce incorrect answers. It makes errors in reasoning. And its safety guardrails are not foolproof: GPT-4 can still produce harmful advice, and can be "jailbroken" to generate content that violates its usage guidelines.

A major concern is the lack of transparency around the data sets used. OpenAI has received criticism for failing to disclose technical aspects of its software, including details about the data it was trained on, purportedly due to the "competitive landscape and the safety implications". However, this may be a significant consideration for paying subscribers. "Who" or "what" does the data represent? OpenAI acknowledges that the model "can have various biases in its outputs". In one reported experiment, ChatGPT refused to write a poem admiring former US president Donald Trump, but did produce a poem when asked the same for current US president Joe Biden.

Regardless of whether the bias is the result of an inadvertent oversight, a flaw in the data set, or intentional design, it can have the effect of influencing users without their knowledge. Moreover, without transparency around the data and algorithmic rules employed by the platform, users may not have enough information to make an informed decision as to its suitability or reliability for their particular use.

How GPT-4 makes money

There has been some speculation that ChatGPT is running large losses for OpenAI, due to the complexity of computations required to generate its responses. For GPT-4, this cost is likely to be even higher.

In January 2023, just two months after launch, ChatGPT reached over 100 million users. OpenAI has moved to capitalise on this overwhelming demand, and on 2 February 2023 announced its "ChatGPT Plus" subscription at a price of US$20 per month.

Ultimately, if ChatGPT can offer people reliable and accurate answers to their requests, it may have the potential to upend the traditional search engine model. So far, OpenAI has partnered with Microsoft to integrate GPT-4 in Microsoft's Bing search engine. Currently, search is dominated by Google, which generated over US$162 billion in Google Search advertising revenues in 2022.

Legal industry applications: Putting GPT-4 to the test

Despite GPT-4's reported success in the LSAT and Uniform Bar Exam, it still has some difficulty assessing and incorporating legal considerations in its responses, as seen in its somewhat misleading replies to questions commonly presented to commercial lawyers:


Can I collect personal information for my project?


Yes, you can collect personal information for your project. There are many methods of collecting personal information, such as surveys, interviews, tests, physiological assessments, observations, existing record reviews, and biological samples.[1] Additionally, Microsoft Forms can be used to collect information with choice, text, rating, or date questions.[2] It is important to handle the collection of personal information delicately, as it can be sensitive.[3]

"Personal information" has a specific meaning under the Privacy Act 1988 (Cth) and is regulated under Australian law. Although GPT-4 appears to apprehend that there may be sensitivities around the collection of personal information, it gave a definitive answer ("Yes") to the question "Can I collect personal information for my project?". Yet it did not indicate whether there may also be legal issues to consider. One of the first questions a lawyer might normally ask would relate directly to the project itself for the relevant context. Without any prompting to indicate its limitations, GPT-4 relies on the user to identify the relevant jurisdiction, which is likely to have a significant impact on the correct legal response.

To give GPT-4 the opportunity to respond more accurately, we prompted it further to see if we could elicit a legal response.


What do you mean by "delicately"?


By handling the collection of personal information delicately, I mean that you should be respectful of respondents' privacy and sensitive to their feelings. For example, you should explain why you are collecting the information, how it will be used, and how it will be protected. You should also obtain informed consent from respondents before collecting any personal information. This means that you should inform them of the purpose of the study, the type of information you will collect, and how it will be used. Additionally, you should inform them of their rights to refuse to participate or to withdraw from the study at any time.

Although GPT-4 gives a helpful response, it is incomplete and at times incorrect. GPT-4 does not give the user any indication that there may be legal obligations or ramifications if the user fails to take certain steps. We were left wondering whether GPT-4 had considered any legal sources at all (notwithstanding the other products being developed for legal research, such as the AI chatbot Harvey).


What have you based your responses on?


As an AI assistant, I have been programmed to provide responses based on a combination of pre-existing knowledge, data and information from reputable sources. In this case, my responses were based on my understanding of the terms used in the questions asked, and my knowledge of best practices for handling personal information and respecting participants' rights in research.

Additionally, I draw upon a variety of sources to provide accurate and up-to-date information. These sources may include academic research articles, government regulations, ethical guidelines, and industry best practices. I strive to provide responses that are both informative and accurate, and I continuously update my knowledge and sources to ensure that my responses reflect the latest information and best practices.

This response suggests GPT-4 has the capacity to review "government regulations". Despite this, GPT-4 failed to identify the legal facets of a question with real legal ramifications. The use of the word "I" also suggests some form of legal personality or legal entity status, which an AI system does not have, unlike its creator and owner, OpenAI.

In order for an AI chatbot to effectively replace the role of lawyers in reviewing legal documents and providing legal advice, it would need to:
  • be a qualified practitioner
  • satisfy the "fit and proper person" test, and
  • eliminate concerns about the accuracy of its responses, the confidentiality of information processed by it, and the ownership status of any intellectual property produced.

As a legal research aid for lawyers or businesses, if the AI chatbot cannot guarantee accuracy, then it would be more useful to give the user a way to verify its response. GPT-4 has not resolved these issues, and is not yet able to deliver a reliable and comprehensive legal solution, although perhaps other emerging products, like Harvey or Copilot, will.

Despite these limitations, GPT-4 and other AI chatbots may be able to generate efficiencies in the delivery of legal services, especially as they become more proficient in processing large legal documents. For example, on 22 March 2023 PwC announced that it had formed an official partnership with Harvey to assist its legal business solutions team. Harvey was built using GPT-4, and its founders claim it can increase efficiency for the legal industry, especially in due diligence and research tasks. Meanwhile, London-based firm Allen & Overy is reportedly using Harvey to assist lawyers in drafting contracts and writing client memos. Lander & Rogers is also forming a strategic AI partnership, which will be announced in the near future.


Concerns about trust, safety and reliability continue to surface when GPT-4 is tasked with responding to legal problems. Until the technology can provide responses with 100% accuracy, users will have difficulty relying on it for their legal issues unless they are able to verify its responses. GPT-4 does not currently enable this.

Despite its shortcomings, however, GPT-4 is driving innovation in the law. If OpenAI continues on its current trajectory, future iterations of GPT-4 can be expected to have a substantial impact on the legal industry, as a tool both for increasing the efficiency of lawyers and for quickly delivering legal solutions.

For more information on AI, emerging technologies and their professional applications, please contact a member of our Digital Economy practice and explore our legal innovation capabilities.

Image by Caleb Black on Unsplash



