Can Artificial Intelligence Be My Coauthor?

There has been a recent surge of interest in the use of artificial intelligence (AI) for writing scholarly manuscripts. Much of the discussion stems from the release of ChatGPT (OpenAI) late last year. ChatGPT is one of several AI programs that can generate text. The user enters a query or prompt, and the program quickly returns a well-articulated, grammatically correct written response. So why wouldn't we want to use AI to help write our manuscripts?

The objective of this editorial is to provide a timely perspective on AI and the publication of scholarly material. I present background on generative AI programs, provide examples of how they can be used, and discuss opportunities and concerns for their use in publishing scientific material. This is not intended to be a comprehensive primer on AI for publishing, but instead an introduction to this technology and current policies on how it can and cannot be used for publishing manuscripts in this journal. More detailed information can be found by examining the references cited here.

ARTIFICIAL INTELLIGENCE AUTHORING TOOLS

The last few years have seen a huge increase in the availability of generative AI tools. These programs use algorithms to create new content, including audio, images, videos, text, computer code, and simulations. Whereas many traditional AI programs were designed for pattern detection, the newer generative programs create material.

ChatGPT is an example of a deep learning large language model (LLM). Essentially, these models are designed to predict the next best word in a string, much like the autocomplete features that suggest words and phrases when texting or typing email. The GPT stands for generative pretrained transformer. The program is generative in that it creates new material. There are many types of generative AI programs that create outputs other than text. For instance, DALL-E (OpenAI) and Stable Diffusion (Stability AI, Ltd) are examples of models that generate or embellish images based on text descriptions. ChatGPT is pretrained. The model underwent autoregressive training on an enormous amount of data available to Microsoft's Bing search engine and Wikipedia content through September 2021 and is estimated to have 175 billion parameters.1 The model “learns” through a fine-tuning process based on supervised learning and reinforcement learning coupled with human feedback. It then uses that dataset to generate its responses. Lastly, the transformer refers to a specific AI architecture programmed with algorithms that decipher and generate conversational text.
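To illustrate the autoregressive idea, here is a deliberately tiny Python sketch (my own toy illustration, not OpenAI's code). Arbitrary scores stand in for the billions of trained parameters, but the loop is the same in spirit: score every candidate word, convert the scores to probabilities, sample one word, append it, and repeat.

    import numpy as np

    # Toy vocabulary; a real model's vocabulary has tens of thousands of tokens.
    VOCAB = ["the", "patient", "simulator", "responded", "well", "."]

    def toy_next_word_scores(context_words):
        # Stand-in for a trained model: a real LLM computes one score per
        # candidate word from its learned parameters; here the scores are
        # arbitrary, seeded by the context length so the demo is repeatable.
        rng = np.random.default_rng(seed=len(context_words))
        return rng.normal(size=len(VOCAB))

    def generate(prompt, n_words=5, seed=0):
        rng = np.random.default_rng(seed)
        words = prompt.split()
        for _ in range(n_words):
            scores = toy_next_word_scores(words)
            probs = np.exp(scores - scores.max())  # softmax: convert scores
            probs /= probs.sum()                   # into probabilities
            words.append(str(rng.choice(VOCAB, p=probs)))  # sample next word
        return " ".join(words)

    print(generate("the patient"))

The output is gibberish because the scores are random; the point is only the mechanics of the loop, which a real model repeats with learned scores at enormous scale.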

ChatGPT became freely available in November 2022, and public interest was widespread and immediate. It is estimated to have more than 100 million active users today.2 Several versions exist, with newer releases having exponentially more parameters and capabilities. The newest version, ChatGPT 4, is significantly more advanced than its predecessor, ChatGPT 3.5. It can process both text and image inputs, and OpenAI claims that it is much less likely to respond to requests for disallowed content and more likely to produce factually correct responses.3 There is also ChatGPT Plus, which is available for a monthly fee and is currently how most people can access GPT 4 technology. As of this writing, OpenAI also offers a free version of ChatGPT 3.5, but it comes with some restrictions on usage. Microsoft's Bing also runs a version of GPT 4 for searches using the Microsoft Edge browser.

There are other LLMs besides ChatGPT. Some include LLaMA by Meta,4 PaLM-E by Google,5 and GPT4All,6 an open source LLM. There are also other AI authoring programs. One example is Bard, Google's conversational AI tool released in March 2023.7 It was developed to compete with ChatGPT and help Google maintain its status as the dominant search engine. Unlike ChatGPT, Bard can access the Internet and provide responses based on current information.8

WHAT CAN CHATGPT DO?

ChatGPT can generate text and other material. It can compose emails and letters, summarize other written material, perform copy editing, write learning objectives, create lesson plans, write (and answer) test questions, serve as a tutor, engage in debates, create games, automate tasks, and tailor its responses to different styles. It can perform some basic mathematical operations, but it is not really designed for that purpose and its answers need to be validated (OpenAI claims ChatGPT 4 is better at answering math questions).
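For readers who want to experiment with these capabilities programmatically rather than through the chat interface, the sketch below requests a one-sentence summary through the openai Python library as it existed at the time of this writing (versions before 1.0). The model name, prompts, and environment-variable handling are illustrative assumptions, not requirements.

    # Sketch: asking ChatGPT for a summary via OpenAI's chat completion API.
    # Assumes an API key is stored in the OPENAI_API_KEY environment variable.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a concise copy editor."},
            {"role": "user", "content": "Summarize in one sentence: "
                "simulation-based training improves procedural skill retention."},
        ],
    )

    print(response["choices"][0]["message"]["content"])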

ChatGPT produces compelling responses because it is designed to emulate human conversations. It generates the next word from a set of likely candidate words with associated weights and a “temperature” score that determines how strictly it adopts the most likely word. A lower temperature means it will be more consistent in its selections. A higher temperature means it will be more random in its choices, exhibit more “creativity,” and be unlikely to return exactly the same response each time it is given the same query. Users can adjust the temperature parameter.9
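To make the temperature mechanism concrete, the following sketch (a generic softmax illustration of my own; the candidate words and scores are invented, and this is not OpenAI's actual implementation) shows how dividing the candidate scores by the temperature sharpens or flattens the sampling distribution:

    import numpy as np

    def sample_word(candidates, scores, temperature, rng):
        # Temperature-scaled softmax sampling: dividing the scores by the
        # temperature sharpens (T < 1) or flattens (T > 1) the distribution.
        scaled = np.asarray(scores) / temperature
        probs = np.exp(scaled - scaled.max())  # subtract max for stability
        probs /= probs.sum()
        return str(rng.choice(candidates, p=probs))

    rng = np.random.default_rng(42)
    candidates = ["treatment", "therapy", "banana"]
    scores = [2.0, 1.5, -1.0]  # "banana" is an unlikely next word

    for t in (0.2, 1.0, 2.0):
        picks = [sample_word(candidates, scores, t, rng) for _ in range(1000)]
        freq = {w: picks.count(w) / 1000 for w in candidates}
        print(f"temperature={t}: {freq}")

At a temperature of 0.2, the top-scoring word dominates the samples; at 2.0, even the implausible candidate appears regularly, which is why identical prompts can yield noticeably different responses.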

ChatGPT retains some history of the sequence of interactions, or dialogue, and uses it as the conversation evolves. Its responses tend to be a bit verbose, but it can be asked to be more succinct. In addition, ChatGPT is primarily conversant in English but can handle many other languages.

It is also important to understand what ChatGPT cannot do. First, ChatGPT is not a search engine. It does not take text queries and return relevant links, and it does not search for current information on the Internet. As noted previously, ChatGPT was pretrained on content available to Microsoft's Bing search engine through 2021. Thus, it can only generate content based on what was available when it was trained. For example, it cannot tell you anything about the current King of England, because his accession occurred after the training cutoff. In addition, it cannot give “personal” opinions and does not have access to personal information. Lastly, training and human intervention were used to minimize or prevent the possibility of generating responses that can cause harm.

CONCERNS ABOUT AI AUTHORING TOOLS

Artificial intelligence authoring tools are impressive, but like all technology they have their limits. One of the most troubling issues is that they often provide wrong answers. Again, because ChatGPT is designed to predict the next word in a sequence based on its training material, its responses are probabilistic, and it responds with “best guesses” that can be factually incorrect. These errors have been referred to as hallucinations.10 However, its responses usually seem reasonable, and it can sound confident about its answers. Thus, users need to validate its output, but its inaccuracies can be easily overlooked by the uninformed. In addition, its results are not necessarily replicable. As noted previously, the temperature setting can intentionally introduce considerable variability in its responses.

It is important to understand that ChatGPT is not open source software. The details of the model and its training are proprietary and are not open to scrutiny. It is not clear where ChatGPT gets its information or whether any sources were validated in the training process. Furthermore, any biases that exist in the training material or the refinement process are inherent in its dataset and will show up in its responses. Thus, many in the scientific community are not enthusiastic about engaging in research with a tool that is opaque. This has led some researchers to argue against using proprietary LLMs and for putting more effort into developing open source ones to ensure reproducibility.11

ARTIFICIAL INTELLIGENCE AND PUBLISHING

The use of AI is not foreign to the publishing world. Many publishers have turned to sophisticated software services to protect copyrights and detect plagiarism. For example, iThenticate (Turnitin, LLC) is a plagiarism detection service that can be integrated into content management systems and manuscript tracking systems.12 Editorial Manager, the content management software used by our publisher, Wolters Kluwer, currently uses this tool. The program compares a manuscript against a database of almost 90 million articles from journals, conferences, and books and 70 billion webpages, and it reports how much of the written content matches those sources.

Artificial intelligence authoring tools, however, present a new dilemma for publishing. They can produce compelling material. Gao and colleagues13 used ChatGPT to generate a set of medical research abstracts, and none were flagged for plagiarism. As noted previously, proprietary systems like ChatGPT generate responses based on training material that was available on the Internet. However, ChatGPT does not always reveal the specific sources for its responses and will even fabricate sources when queried about them. In fact, in April 2023, Turnitin14 announced that it had added AI writing detection capabilities to its manuscript screening applications, citing the need to keep pace with current AI authoring tools.

CAN AI BE AN AUTHOR?

Artificial intelligence authoring tools can be useful for writing scientific summaries. In fact, ChatGPT was listed as lead author in a recent article in Oncoscience.15 However, the issues surrounding fabrication of information are well documented, and it is vital to recognize that AI authoring tools have no true understanding of what they generate, nor any capacity to distinguish true from false statements. They do not possess scientific integrity, no matter how confident they seem about their responses.16 They tend to generate oversimplified text lacking important insight and value judgments. Birhane et al17 argue that the use of LLMs in writing peer-review reports increases the possibility of misinterpretation of the submitted manuscript by overlooking crucial information or by fabricating information. This in turn could undermine trust in the peer-review process itself.17 Ultimately, they argue that to consider LLMs as scientists or authors is to overlook the need for responsibility and accountability in both science and LLMs.

INTERNATIONAL COMMITTEE OF MEDICAL JOURNAL EDITORS/COMMITTEE ON PUBLICATION ETHICS POSITION STATEMENT

The surging interest in AI tools for writing manuscripts has led several journals (eg, JAMA, Nature, Science),18–20 publishers, and organizations concerned with integrity in scholarship [eg, the Committee on Publication Ethics (COPE)] to adopt formal positions regarding AI and authorship. To reiterate, ChatGPT was not built to discern the factual truthfulness of any statement it generates; therefore, any correct assertions it makes are incidental to (and not because of) its underlying functionality. Science has adopted the most conservative stance: it forbids an AI program from being listed as an author and prohibits the use of any text, figures, images, or graphics generated by AI tools in the work. Policies of other journals allow the use of AI-generated content provided it is properly disclosed.

Simulation in Healthcare is a member of the International Committee of Medical Journal Editors (ICMJE), and we follow their recommendations for authorship.21 Two recommendations are pertinent here: (1) all authors must give final approval of the version to be published, and (2) all authors must agree to be accountable for all aspects of the work, including the accuracy and validity of its contents, and must ensure that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Thus, because AI authoring tools cannot take responsibility for the work, they cannot be listed as authors.

Furthermore, according to the COPE position statement,22 any author who uses AI tools in any manner to prepare any portion of a manuscript (eg, production of images or graphical elements, the collection and analysis of data, or written text) must be transparent in disclosing, in the methods section of the manuscript, how the AI tool was used and which tool and version were used. Ultimately, “authors are fully responsible for the content of their manuscript, even those parts produced by an AI tool, and are thus liable for any breach of publication ethics. As nonlegal entities, an AI tool cannot assert the presence or absence of conflicts of interest nor manage copyright and license agreements.”22

SIMULATION IN HEALTHCARE POSITION ON AI CONTRIBUTIONS TO MANUSCRIPTS

In agreement with the positions stated by COPE and the ICMJE, the policy of this journal shall be as follows:

Authors are fully responsible for the content of their manuscript, even those parts produced by an AI tool, and are thus liable for any breach of publication ethics. As nonlegal entities, an AI tool cannot assert the presence or absence of conflicts of interest nor manage copyright and license agreements. Any author who uses AI tools in any manner to prepare any portion of a manuscript (eg, written text, production of images or graphical elements, or the collection and analysis of data) must be transparent in disclosing, in the methods section of the manuscript, how the AI tool was used and which tool and version were used. This applies to authors and contributors.

This journal will consider manuscripts where authors study, evaluate, or discuss the use of AI; however, current generative AI models do not possess the capacity to “fact check” their output and have been shown to generate false information. Thus, it is incumbent on all authors using AI as a writing tool to verify the accuracy and appropriateness of all material included in their manuscripts.

The Instructions for Authors page on the journal's website has been updated accordingly (https://edmgr.ovid.com/sih/accounts/ifauth.htm).

CAN AI TOOLS BE A PEER REVIEWER?

The conversational prowess of AI authoring tools could facilitate the editorial and peer-review process by helping to craft summaries and even decisions more quickly.23 Some have even suggested that these tools could help address a shortage of peer reviewers.24 However, Hosseini and Horbach24 expressed concerns about the opacity of proprietary models, their training data and inner workings, and the potential for exacerbating existing biases. Thus, they argue that, like authors, reviewers and editors should also disclose their use of these tools.

CONCLUSIONS

Generative AI is here. With hundreds of millions of users and increasing capabilities with each new release, it is a disruptive technology that will have profound effects on how we prepare, disseminate, and receive written, visual, and auditory information. Its applications are as boundless as the creativity of its users. However, one characteristic that often gets overlooked in discussions of the virtues and concerns surrounding generative AI is that it is fast. Writing technical material is often an onerous, time-intensive process. I believe that many authors are likely to welcome an aid that can get them a first draft in a matter of minutes, even if it requires time for additional scrutiny and editing. After all, is that not how we routinely handle written drafts from our students and junior colleagues?

We are only at the beginning stages of generative AI and its impact is unknown. However, its trajectory is likely to follow that of other automated systems or technology that facilitate how we perform certain tasks. For example, telephone answering systems, ATMs, automated flight management systems, and electronic intravenous pumps each simplified a task, made the process more reliable, or reduced/eliminated the need for people to perform the task. Once the benefits are realized, they become the standard way of doing things. However, the benefits are always accompanied by costs that may or may not be obvious.25

As AI systems become more sophisticated and comprehensive in their degree of autonomy, users transition from performing tasks themselves to managing how the system performs the tasks. Furthermore, the underlying operations become more opaque, making it difficult or impossible for users to fully understand how the technology works, to detect operational anomalies, and to intervene when necessary.26,27 This “out of the loop” problem is well understood in commercial aviation, as pilots now spend more time managing the automated flight deck than actually flying, and it has been identified as a contributor to crashes through loss of situation awareness.28,29

As noted previously, ChatGPT 3.5 can generate very compelling, well-written responses; however, its propensity to hallucinate and fabricate information is well known. As such, it shifts the role of the user from writer to fact checker and editor, and the user must have the requisite domain knowledge to validate the program's output. OpenAI claims that ChatGPT 4 is less likely to hallucinate. On a positive note, that means the newest release is more accurate. On the other hand, because ChatGPT 4 still has no domain knowledge or expertise, the errors that remain will be rarer yet just as plausible sounding, ultimately making it even more challenging for users to validate the program's responses.

To conclude, I believe that members of the healthcare simulation community may have the ideal perspective on AI tools and authorship. I cannot imagine anyone in our community letting someone who had performed a few attempts at a procedure in a simulation, but who had no domain knowledge or understanding of health and patient care, perform that procedure on a genuine patient. In fact, there are laws to prevent that. From an editorial perspective, a similar argument could be made about AI authoring tools. They can string together grammatical sentences in response to a request, but they do not understand what they generate. People, however, communicate to share meaning and hope to achieve mutual understanding. Thus, for the foreseeable future, I think we will reserve authorship for humans.

ACKNOWLEDGMENT

The author thanks Aaron Calhoun for his helpful comments on an earlier draft of this article.

REFERENCES

1. Korngiebel DM, Mooney SD. Considering the possibilities and pitfalls of Generative Pre-trained Transformer 3 (GPT-3) in healthcare delivery. NPJ Digit Med 2021;4:93.
2. Hu K. ChatGPT sets record for fastest-growing user base—analyst note. Available at: https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/. Accessed May 14, 2023.
3. OpenAI. GPT-4 is OpenAI's most advanced system, producing safer and more useful responses. Available at: https://openai.com/gpt-4. Accessed May 14, 2023.
4. Meta AI. Introducing LLaMA: a foundational, 65-billion-parameter large language model. Available at: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/. Accessed May 14, 2023.
5. Google Research. PaLM-E: an embodied multimodal language model. Available at: https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal-language.html. Accessed May 14, 2023.
6. Nomic. GPT4All: a free-to-use, locally running, privacy-aware chatbot. No GPU or internet required. Available at: https://gpt4all.io/index.html. Accessed May 14, 2023.
7. Themeisle. ChatGPT vs Google BARD: a comparison for the ages. Available at: https://themeisle.com/blog/chatgpt-vs-google-bard/#:~:text=After%20comparing%20the%20performance%20of,intelligent%20approach%20to%20generating%20text. Accessed May 30, 2023.
8. Ortiz S. What is Google Bard? Here's everything you need to know. Available at: https://www.zdnet.com/article/what-is-google-bard-heres-everything-you-need-to-know/. Accessed June 4, 2023.
10. Sanderson K. GPT-4 is here: what scientists think. Nature 2023;615:773.
11. Spirling A. Open generative AI models are a way forward for science. Nature 2023;616:413.
12. Lee C. What is iThenticate? Who is it for? Available at: https://www.turnitin.com/blog/what-is-ithenticate-who-is-it-for. Accessed June 1, 2023.
13. Gao CA, Howard FM, Markov NS, et al. Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. NPJ Digit Med 2023;6:75.
14. Turnitin. Turnitin's AI writing detection available now. Available at: https://www.turnitin.com/solutions/ai-writing. Accessed June 1, 2023.
15. Zhavoronkov A, ChatGPT Generative Pre-trained Transformer. Rapamycin in the context of Pascal's wager: generative pre-trained transformer perspective. Oncoscience 2022;9:82–84.
16. Not a generative AI–generated editorial. Nat Cancer 2023;4:151–152.
17. Birhane A, Kasirzadeh A, Leslie D, Wachter S. Science in the age of large language models. Nat Rev Phys 2023;5:277–280.
18. Flanagin A, Bibbins-Domingo K, Berkwits M, et al. Nonhuman “authors” and implications for the integrity of scientific publication and medical knowledge. JAMA 2023;329(8):637–639.
19. Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Available at: https://www.nature.com/articles/d41586-023-00191-1. Accessed June 1, 2023.
20. Thorp HH. ChatGPT is fun, but not an author. Science 2023;379(6630):313. doi:10.1126/science.adg7879.
21. International Committee of Medical Journal Editors. Recommendations for the conduct, reporting, editing, and publication of scholarly work in medical journals. Available at: https://www.icmje.org/recommendations/.
22. Committee on Publication Ethics. Authorship and AI tools: COPE position statement. Available at: https://publicationethics.org/cope-position-statements/ai-author. Accessed June 6, 2023.
23. Enago Academy. Will ChatGPT disrupt peer review? Impact of AI on the hallmark of science vigilance. Available at: https://www.enago.com/academy/chatgpt-disrupt-peer-review-science-vigilance/. Accessed June 4, 2023.
24. Hosseini M, Horbach SPJM. Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review. Res Integr Peer Rev 2023;8(1):4. doi:10.1186/s41073-023-00133-5.
25. Parasuraman R, Riley V. Humans and automation: use, misuse, disuse, abuse. Hum Factors 1997;39:230–253.
26. Woods DD. Decomposing automation: apparent simplicity, real complexity. In: Parasuraman R, Mouloua M, eds. Automation and Human Performance: Theory and Applications. Mahwah, NJ: Lawrence Erlbaum Associates; 1996:3–17.
27. Scerbo MW. Theoretical perspectives on adaptive automation. In: Parasuraman R, Mouloua M, eds. Automation and Human Performance: Theory and Applications. Mahwah, NJ: Lawrence Erlbaum Associates; 1996:37–63.
28. Idowu AG, Shogbonyo MA, Adeyeye OA. Situational awareness and workload management in aviation: a case analysis of the crash of American Airlines Flight 965. Collegiate Aviation Rev Int 2022;14(1):60–73.
29. Sarter N, Woods D. Pilot interaction with cockpit automation: operational experiences with the flight management system. Int J Aviat Psychol 1992;2:303–322.
