Background: Generative AI platforms built on advanced algorithms have potential applications in medical research. However, little is known about how accurately AI chatbots identify correct information about lung cancer.
Objectives: To assess the capabilities of ChatGPT-3.5, ChatGPT-4, Copilot, and Bard in answering lung cancer questions.
Methods: Structured lung cancer questionnaires covering epidemiology, diagnosis, biomarkers, histology, clinical symptoms and progression, systemic therapy, surgery, management, and prognosis, together with their respective answers, were used. The AI-based chatbots (ChatGPT-3.5, ChatGPT-4, Microsoft Copilot, and Google Bard) were each asked the questions and compared on their ability to answer correctly on the first attempt. With the reference answers as the gold standard, descriptive statistics, namely accuracy, specificity, sensitivity, negative predictive value (NPV), and positive predictive value (PPV), were calculated for each chatbot.
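For readers unfamiliar with the statistics named above, the following is a minimal sketch of the standard definitions computed from a 2x2 confusion table. It assumes each answer option can be labeled positive or negative against the gold standard; how the study actually coded chatbot responses is not stated here. The names `ConfusionCounts`, `tally`, and `diagnostic_metrics`, and the counts in the example, are hypothetical and are not study data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Cells of a 2x2 table comparing a chatbot against the gold standard."""
    tp: int  # gold standard positive, chatbot positive
    fp: int  # gold standard negative, chatbot positive
    tn: int  # gold standard negative, chatbot negative
    fn: int  # gold standard positive, chatbot negative


def tally(reference: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Count the four cells from paired gold-standard and chatbot labels."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    counts = ConfusionCounts(tp=0, fp=0, tn=0, fn=0)
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            counts.tp += 1
        elif ref and not pred:
            counts.fn += 1
        elif not ref and pred:
            counts.fp += 1
        else:
            counts.tn += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions of the five statistics listed in the Methods."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    example = ConfusionCounts(tp=8, fp=2, tn=18, fn=2)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, accuracy summarizes all four cells, while sensitivity and specificity condition on the gold-standard label and PPV and NPV condition on the chatbot's answer, which is why one chatbot can rank differently across the five measures.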
Results: With well-established answers as the reference, ChatGPT-4 had the highest performance, with an accuracy of 0.950, and Google Bard had the lowest, with an accuracy of 0.750. Specificity and sensitivity ranged from lows of 0.833 and 0.500 (Google Bard) to highs of 0.967 and 0.900 (ChatGPT-4), respectively. ChatGPT-3.5 and Copilot performed similarly. Findings included: i) each chatbot provided a rationale when answering a question; ii) none of the 4 chatbots found the correct answer on the approximate response rates to chemotherapy in small cell lung cancer (SCLC); iii) neither ChatGPT-3.5 nor Copilot found the correct answer on the 5-year survival rate of 60-80% in stage I non-small cell lung cancer (NSCLC). Detailed findings, lessons learned, and limitations of the chatbots will be discussed.
Conclusions: ChatGPT-4 had the highest accuracy and specificity, outperforming Google Bard, ChatGPT-3.5, and Copilot. Current chatbots still have many limitations; however, given their ability to quickly gather medical information, some AI chatbots may be promising tools in lung cancer patient care. Further research to assess the benefits and limitations of AI chatbots in cancers and other diseases is necessary.