Abstract
Background: Heart disease remains a leading cause of death for women in the United States, but awareness and knowledge about it are declining. Artificial intelligence (AI) chatbots have great potential to educate women.
Objective: This study aimed to evaluate the potential efficacy of HeartBot to increase women’s awareness and knowledge of heart attack symptoms and care-seeking behavior.
Methods: In this nonrandomized pilot, quasi-experimental study, 92 women aged ≥25 years without a history of heart disease completed the HeartBot interaction via SMS text messaging. The study was remotely conducted from October 2023 to January 2024. HeartBot, a fully automated AI chatbot, covered 15 topics of heart attack awareness, knowledge, symptoms, and care seeking in a single session. The mean length of the HeartBot interaction was 13.0 (SD 7.80) minutes. The primary outcomes consist of four questions: (1) recognizing signs and symptoms of a heart attack, (2) telling the difference between the signs and symptoms of a heart attack, (3) calling an ambulance or dialing 911 when experiencing heart attack symptoms, and (4) getting to an emergency room within 60 minutes after the onset of symptoms of a heart attack. Women were asked to answer the 4 questions before and after the HeartBot interaction on a scale of 1 to 4, with a higher score indicating higher levels of awareness and knowledge of heart attack risks and symptoms.
Results: The mean age of the sample was 45.9 (SD 11.9) years. In total, 59.8% (55/92) of the sample identified as belonging to racial or ethnic minority groups. The mean length of the HeartBot interaction was 13.0 (SD 7.80) minutes. In ordinal logistic regression models, women showed significant improvements across the 4 self-reported outcomes (ie, heart attack symptoms and calling 911) even after controlling for potential confounding factors (outcome 1: adjusted odds ratio [aOR] 7.10, 95% CI 3.52‐13.16; outcome 2: aOR 5.47, 95% CI 2.77‐10.78; outcome 3: aOR 5.75, 95% CI 2.86‐11.59; and outcome 4: aOR 2.85, 95% CI 1.54‐5.25; P<.001 for all 4 outcomes).
Conclusions: HeartBot led to a substantial increase in awareness and knowledge of heart attack risks and symptoms in women. These findings suggest that HeartBot is a promising approach to improving heart health education. A randomized controlled trial of HeartBot is warranted to establish its efficacy and safety for the clinical setting.
doi:10.2196/80407
Keywords
Introduction
Background
Artificial intelligence (AI) chatbots using natural language processing and machine learning facilitate natural human-machine conversations. In recent years, AI chatbots have gained significant popularity in health care and research, and researchers have investigated their efficacy across health domains. AI chatbots have shown potential to improve mental health [-] and other chronic illnesses [] and to promote healthy lifestyles and self-care behaviors [-]. However, research on the use of chatbots in cardiovascular health is still in its infancy.
Heart disease continues to be the leading cause of mortality and morbidity for women in the United States []. More than 60 million women in the United States are living with heart disease []. Over the past 2 decades, several awareness campaigns, such as Go Red for Women [] by the American Heart Association, have been conducted to educate the public and women regarding heart disease. Despite these large-scale public health campaigns, awareness of heart disease as the leading cause of death among women declined from 65% in 2009 to 44% in 2019 []. The greatest declines in awareness were observed among Hispanic and Black women as well as younger women [-].
Objectives
We therefore need a new approach to increase knowledge and awareness of heart disease in women. We recently developed and tested an AI chatbot (hereafter referred to as “HeartBot”) in a series of studies to achieve this goal. Given the rapid advancement of AI technologies and the increasing prevalence of smartphone ownership [], HeartBot could have significant advantages over traditional public health campaigns, expanding reach and delivering personalized communication. Therefore, this study aims to evaluate the potential efficacy of a fully automated AI HeartBot in increasing women’s awareness and knowledge of heart attack symptoms and care-seeking behavior.
Methods
Study Design and Sample
We remotely conducted a pilot, quasi-experimental study with 92 participants using the user-centered design approach []. Eligibility criteria were women aged ≥25 years, no self-reported cognitive impairment or history of heart disease or stroke, not a health care professional or student, not working in the health care field, living in the United States, possessed a cell phone with the ability to send and receive text messages, and had internet access. Participants were recruited via social media (ie, Facebook and Instagram [Meta]). The study was conducted from October 2023 to January 2024.
Description of the HeartBot Platform
provides an overview of the HeartBot conversation content, and shows the screenshots of the HeartBot conversation. HeartBot conversations occurred over SMS text messaging and started with a brief introduction, informing the user about what to expect in the conversation (). Within 1 conversational episode, HeartBot conversed with participants on 17 content modules, covering topics such as symptoms, risk factors, and treatment of heart attacks. To prioritize participant safety, the introduction message included the following medical emergency notice: “If you are experiencing a medical emergency, please call 911 immediately.” The messages sent by HeartBot () were developed by cardiovascular experts based on the latest guidelines and evidence to ensure full control over the content presented to participants and to minimize the risk of having the system dispense false or misleading information. In addition, we incorporated personalization and empathic responses—key communication features designed to enhance participants’ experience and engagement []. Finally, to ensure readability, the content sent by HeartBot () was evaluated using Flesch-Kincaid readability metrics. This analysis yielded a Flesch reading ease score of 69 and a Flesch-Kincaid grade level of 6.2, indicating that the language used was accessible and comprehensible to a broad audience.
| Message order | Topics |
| 1 | Introduction and greetings |
| 2 | Participants’ name retrieval |
| 3 | Knowledge of heart attacks |
| 4 | Symptoms of heart attacks |
| 5 | Leading cause of death for women in the United States |
| 6 | Gender factors for heart attacks |
| 7 | First action when experiencing symptoms of a heart attack |
| 8 | Importance of calling 911 |
| 9 | Time to seek medical help |
| 10 | Treatment of heart attacks |
| 11 | Action plans while waiting for 911 |
| 12 | Risk factors for heart disease |
| 13 | Female-specific risk factors for heart disease |
| 14 | Racial and ethnic differences in women’s heart disease risk |
| 15 | Multiple-choice questions |
| 16 | Further questions to ask HeartBot |
| 17 | Acknowledgment and conclusion of the conversation |
aWithin a single conversational episode, HeartBot delivers 17 sequential modules developed by cardiovascular experts based on current clinical guidelines. Messages were evaluated for readability using Flesch-Kincaid metrics (reading ease score=69; grade level=6.2).

The current version of HeartBot, deployed to research participants, is based on an established framework that has been used in a large number of chatbots across various domains over several years. HeartBot was built using the Google Dialogflow CX platform connected to Twilio for input and output over SMS text messaging. Dialogflow CX supports the development and deployment of chatbots based on the intents and entities paradigm []. Every user input is categorized as one of a finite number of intents (eg, greeting, correct response, and request for help), and chatbot actions, including what the chatbot says next, are mapped directly to these intents. In HeartBot specifically, this mapping was implemented using a bidirectional encoder representations from transformers neural language model [], which is based on the transformer architecture []; this approach leverages a powerful data-driven method for robustly identifying user intents expressed in free-form text messages. Within each participant’s utterance, one or more entities may be mentioned. Entities are words or phrases that express specific instances of a class. For example, “high blood pressure” expresses a specific instance of the class risk factors. Authoring a chatbot in Dialogflow CX involves the following: (1) defining the intents that the user may express when interacting with the chatbot and which entities the chatbot should recognize within user utterances; (2) listing examples of how these intents and entities can be expressed; and (3) defining the chatbot actions that should be mapped to each intent and, optionally, to mentions of an entity. While the use of a bidirectional encoder representations from transformers language model allows the system to handle natural language input flexibly and robustly, the chatbot’s behavior and responses are limited to content authored specifically for HeartBot conversations. Because each piece of the interaction is encoded directly and explicitly into the chatbot, the resulting interactions are rigid and predictable, varying in response to user utterances only in the ways intended by the system’s designers. On the one hand, this ensures a great level of control over what the chatbot can do, and on the other hand, the chatbot has limited flexibility and lacks any conversational ability beyond its authored content. In addition, the design of HeartBot follows a strict system initiative philosophy: every step of the interaction between the system and the user is initiated by the system. This is in contrast to the user-initiative design embodied in current generative AI chatbots, such as ChatGPT, where the system mainly reacts to or responds to user utterances, and the user is primarily responsible for directing the conversation. HeartBot asks specific questions, and the user responds to them. While HeartBot’s utterances vary based on the user’s responses, for example, by confirming correct responses or offering gentle corrections, the user has very limited control over how the interaction unfolds, and the system is in control of the conversation.
HeartBot’s content library and the inventory of user intents it can handle were designed and refined based on more than 171 interactions where a research staff with domain expertise played the role of the system. These interactions took place over text messages in the same way as the HeartBot interactions, except that participants were not told they were interacting with HeartBot and had no reason to believe their conversational partner was a chatbot. The research staff conducted the conversation initially designed for HeartBot in each of these interactions. The initial version of HeartBot was based on these human-human interactions and was further refined through initial system testing, resulting in the version of HeartBot used to collect the data presented here.
Procedures
Potential participants interested in the study were asked to complete an online screening survey on Research Electronic Data Capture (REDCap; Vanderbilt University) to determine eligibility. Research staff sent an electronic consent (e-consent) form to those meeting all eligibility criteria. Participants who signed the consent form received a baseline REDCap survey. Upon completing the baseline survey, research staff provided instructions with a specific phone number for participants to send SMS text messages to and on how to initiate a conversation with HeartBot. For participants’ safety, research staff closely monitored the participants’ responses during the HeartBot conversations through a back-end data monitoring platform. Research staff sent an online postsurvey link to the participants 4 to 6 weeks after the HearBot interaction. Participants who completed all study requirements received a US $20 Amazon e-gift card.
Measures
We used a previously validated set of questions to assess the potential efficacy of HeartBot in increasing the awareness and knowledge of heart attack risks and symptoms [,]. These items have been applied in earlier studies with women from varied backgrounds to support their relevance across diverse populations [-]. Participants answered four questions before and after interacting with HeartBot, using a 4-point scale, in which 1 indicated “not sure” and 4 indicated “sure”: (1) How sure are you that you could recognize the signs and symptoms of a heart attack in yourself?; (2) How sure are you that you could tell the difference between the signs or symptoms of a heart attack and other medical problems?; (3) How sure are you that you could call an ambulance or dial 911 if you thought you were having a heart attack?; and (4) How sure are you that you could get to an emergency room within 60 minutes after onset of your symptoms of a heart attack? Higher scores indicate a better awareness and knowledge of heart attack risks and symptoms.
To understand how participants engage with HeartBot and identify areas of improvement for HeartBot, we examined participants’ evaluations of HeartBot using the Effectiveness Scale, a semantic-differential scale originally developed based on previous literature [,]. It consists of 5 pairs of opposite adjectives (effective vs. ineffective, helpful vs. unhelpful, beneficial vs. not beneficial, adequate vs. not adequate, supportive vs. not supportive), and each pair is scored on a 7-point Likert scale. 1 indicated the negative pole (eg, “ineffective”) and 7 indicated the positive pole (eg, “effective”). Cronbach’s alpha for this sample was 0.94. In addition, the impression of HeartBot was assessed by using the Anthropomorphism Scale [] consisting of 5 pairs of opposite adjectives (fake vs natural, machine-like vs humanlike, unconscious vs conscious, artificial vs lifelike, rigid vs adaptive). Each pair was rated on a 7-point Likert scale, 1 being the first adjective in the pair (eg, “fake”) and 7 being the second adjective (eg, “natural”). The scores for each pair in the Effectiveness Scale and Anthropomorphism Scale were summed and averaged to create mean composite scores. In this current sample, both measures had excellent reliability (Cronbach α 0.94 and 0.91, respectively).
Statistical Analysis
We conducted a descriptive analysis to summarize sample characteristics, including sociodemographic characteristics, cardiovascular risks, and usability outcomes. To compare pre- and postsurvey results, we used ANOVA for continuous variables and chi-square tests for categorical variables to assess distributional differences. To determine whether participants’ responses to awareness and knowledge questions significantly changed after engaging with HeartBot, we first used Wilcoxon signed-rank tests. Then, to test for differences in pre- to post-intervention awareness and knowledge of heart attack risks and symptoms as outcome responses, we fit an ordinal mixed effects logistic regression model using the R (version 4.1.0; R Foundation for Statistical Computing) [] package ordinal v2022.11.16 [], adjusting for fixed effects of White (vs non-White), age, education, income, family history, past chatbot use, mean text message effectiveness, mean HeartBot impression score, word count, conversation duration, and whether the individual thought they were texting an AI agent, as well as a random effect for individual, across the 92 participants with outcomes measured at the second time point. As a sensitivity analysis, although this model appeared to fit the data well, we also conducted a backward stepwise regression analysis (P<.05; resulting in 2‐5 covariates per outcome) to ensure the model was not overparameterized. Additional sensitivity analyses included using all participants through multiple imputation for missing outcomes (instead of just imputing missing covariates), as well as a complete case analysis (2 of the 92 participants were missing income data). We also attempted a sensitivity analysis by fitting a mixed-effects multinomial logistic regression model using the generalized structural equations command in Stata (version 16.1; StataCorp LLC) []. However, these models did not converge, likely due to the limited sample size and increased number of parameters to estimate compared to an ordinal logistic regression model. To appropriately handle missing data, we used multiple imputation with chained equations with 100 imputations, combined via the usual Rubin rules, using a logistic regression model for dichotomous variables and perfect mean matching for continuous variables, using the R package mice v3.16.0 []; 2 random seeds were run to ensure stability of results. All analyses used 2-tailed tests, and statistical significance was evaluated at P<.05. The awareness and knowledge of heart attack risks and symptoms questions were coded as time-dependent variables at baseline and post-HeartBot interaction.
Ethical Considerations
This study was conducted in accordance with the ethical standards outlined in the Declaration of Helsinki. Institutional review board approval was obtained from the University of California, San Francisco (approval: 23‐39793). All participants provided written informed consent before study enrollment. Participation was voluntary, and participants were informed that they could withdraw at any time without penalty. All collected data were deidentified before analysis, and no personally identifiable information was retained. Data were stored on secure, password-protected servers accessible only to the research team. Participants who completed all study requirements received a US $20 Amazon e-gift card as compensation.
Results
Baseline Sample Characteristics
A total of 104 women completed the online baseline survey. Of these 104 women, 5 (4.8%) did not start the HeartBot interaction, and 7 (6.7%) did not complete the post-REDCap online survey. The sample characteristics of 12 women who did not complete the study did not differ from those who completed the study (P>.05). presents the baseline characteristics of 92 women who completed all study requirements. The mean age was 45.9 (SD 11.9; range: 26-70) years. Out of 92 participants, 55 (59.8%) identified as belonging to racial or ethnic minority groups; 53 (57.6%) had used a chatbot such as Alexa, Google Assistant, and Siri, at least once in the past 30 days; and 13 (14.1%) reported a family history of heart attack.
| Sociodemographic characteristics | Value |
| Age (years), mean (SD) | 45.9 (11.9) |
| Race or ethnicity, n (%) | |
| American Indian | 1 (1.1) |
| Asian | 6 (6.5) |
| Black (non-Hispanic) | 22 (23.9) |
| Hispanic | 19 (20.7) |
| Native Hawaiian | 2 (2.2) |
| White (non-Hispanic) | 37 (40.2) |
| More than 1 race or ethnicity | 5 (5.4) |
| Education, n (%) | |
| Less than high school or did not complete college | 26 (28.3) |
| Completed college or graduate school | 66 (71.7) |
| Household income (US $), n (%) | |
| Less than 40,000 | 21 (22.9) |
| 40,001-75,000 | 30 (32.6) |
| Greater than 75,000 | 39 (42.4) |
| Do not know or decline to answer | 2 (2.2) |
| Marital status, n (%) | |
| Never married | 21 (22.8) |
| Currently married or cohabitating | 59 (64.1) |
| Divorced or widowed | 12 (13) |
| Employment status, n (%) | |
| Full time or part time | 56 (60.9) |
| Unemployed, homemaker, or student | 17 (18.5) |
| Retired, disabled, or other | 19 (20.7) |
| Chatbot use (eg, Amazon’s Alexa, Google Assistant, Siri, and Facebook Messenger bot) in the past 30 days, n (%) | |
| Yes | 53 (57.6) |
| No | 39 (42.4) |
| Self-reported cardiovascular risks, mean (SD) | |
| BMI (kg/m2) | 29.5 (7.1) |
| Smoked at least 1 cigarette in the past 30 days, n (%) | |
| Yes | 14 (15.2) |
| No | 78 (84.8) |
| Blood pressure medication, n (%) | |
| Yes | 25 (27.2) |
| No or do not know | 67 (72.8) |
| Cholesterol medication, n (%) | |
| Yes | 29 (31.5) |
| No or do not know | 63 (68.5) |
| Diabetes medication, n (%) | |
| Yes | 12 (13) |
| No or do not know | 80 (85.9) |
| Family history of heart disease or stroke, n (%) | |
| Yes | 13 (14.1) |
| No or do not know | 79 (85.9) |
HeartBot Interaction
The mean and median duration of HeartBot interactions were 13.0 (SD 7.8) and 10.6 (IQR 8.5-13.9) minutes, respectively (). In addition, a post hoc analysis of HeartBot interaction logs showed that 11% (10/92) conversations had a minor technical issue in which participants were prompted to provide their name twice at the start of the session. This glitch did not interfere with delivering or completing the conversation, and all affected participants successfully completed the HeartBot session. At the end of the HeartBot conversation, 29% (27/92) of the participants submitted at least 1 question to HeartBot. The top two questions that the participants asked were regarding (1) heart disease risks and prevention risk reduction and (2) signs and symptoms of a heart attack and care-seeking behaviors during a heart attack. The participants rated HeartBot as highly effective (mean 5.7, SD 1.2) and as human-like and natural (mean 5.2, SD 1.2), with scores ranging from 1 to 7.
| Interaction metrics | Value |
| Total word count, mean (SD) | 890.1 (78.7) |
| Duration of HeartBot conversation (minutes), mean (SD) | 13.0 (7.80) |
| Number of questions participants asked, n (%) | |
| No question | 65 (70.7) |
| At least 1 question | 27 (29.3) |
| HeartBot effectiveness, mean (SD) | 5.7 (1.2) |
| HeartBot impression, mean (SD) | 5.2 (1.2) |
| “Overall, how would you rate the conversations with your texting partner?” n (%) | |
| Unnatural or very unnatural | 5 (5.4) |
| Neutral | 33 (35.9) |
| Natural or very natural | 54 (58.7) |
| “Overall, how would you rate the messages you received?” n (%) | |
| Incoherent or very incoherent | 0 (0) |
| Neutral | 23 (25.0) |
| Coherent or very coherent | 69 (75.0) |
| “Do you think you texted a human or an AI chatbot during your conversation?” n (%) | |
| Human | 31 (33.7) |
| AI chatbot | 61 (66.3) |
| ”Have you ever heard or read about AI?” n (%) | |
| I would consider myself an expert in that field | 5 (5.4) |
| I could explain well and what AI is about | 27 (29.3) |
| I know somehow what AI is | 41 (44.6) |
| Yes, but I do not know exactly what AI is | 16 (17.4) |
| No | 3 (3.3) |
| “How positive or negative do you feel about the use of AI in health care?” n (%) | |
| Negative or very negative | 13 (14.1) |
| Neutral | 44 (47.8) |
| Positive or very positive | 35 (38) |
| “The use of AI will result in better health care,” n (%) | |
| Strongly disagree or disagree | 17 (18.5) |
| Neither agree nor disagree | 30 (32.6) |
| Agree or strongly agree | 45 (48.9) |
| “The use of AI will result in better health outcomes,” n (%) | |
| Disagree or strongly disagree | 16 (17.4) |
| Neither agree nor disagree | 35 (38.0) |
| Agree or strongly agree | 41 (44.6) |
| “AI may help me reduce my risk of heart disease,” n (%) | |
| Disagree or strongly disagree | 12 (13.0) |
| Neither agree nor disagree | 39 (42.4) |
| Agree or strongly agree | 41 (44.6) |
aParticipant interaction metrics, ratings of HeartBot conversations, and attitudes toward artificial intelligence (AI): measures include conversation length, number of participant-initiated questions, perceived effectiveness, naturalness, coherence, identification of the agent as a chatbot or human, prior AI awareness, and attitudes toward the use of AI in health care and for heart disease prevention.
bAI: artificial intelligence.
Changes in Awareness and Knowledge of Heart Attack Risks and Symptoms
shows the results of the Wilcoxon signed-rank tests for the changes in responses to awareness and knowledge of heart attack risks and symptoms questions between the pre- and post-HeartBot interaction. Participants reported significantly higher awareness and knowledge of heart attack risks and symptoms across all 4 outcome responses between the baseline and the post-HeartBot interaction (P<.05).
shows the results of the ordinal mixed effects regression models for each awareness and knowledge of heart attack risks and symptoms question. Even after controlling for potential confounders (; the full model and sensitivity analysis are provided in ), the HeartBot interaction was significantly associated with improvements in awareness and knowledge of heart attack risks and symptoms, specifically on (1) recognizing the signs and symptoms of a heart attack response (adjusted odds ratio [aOR] 7.10, 95% CI 3.56‐14.15; P<.001), (2) telling the difference between the signs or symptoms of a heart attack (aOR 5.47, 95% CI 2.77‐10.78; P<.001), (3) calling an ambulance or dialing 911 during a heart attack (aOR 5.75, 95% CI 2.86‐11.59; P<.001), and (4) getting to an emergency room within 60 minutes after onset of symptoms (aOR 2.85, 95% CI 1.54‐5.25; P<.001). Results were very similar across all sensitivity analyses (Table S1 in ).
| Pre-HeartBot interaction, n (%) | Post-HeartBot interaction, n (%) | P value | ||
| How sure are you that you could recognize the signs and symptoms of a heart attack in yourself? (please select a number from 1‐4) | <.001 | |||
| Not sure | 24 (26.1) | 3 (3.3) | ||
| Somewhat not sure | 32 (34.8) | 28 (30.4) | ||
| Somewhat sure | 33 (35.9) | 40 (43.5) | ||
| Sure | 3 (3.3) | 21 (22.8) | ||
| How sure are you that you could tell the difference between the signs or symptoms of a heart attack and other medical problems? (please select a number from 1‐4) | <.001 | |||
| Not sure | 28 (30.4) | 8 (8.7) | ||
| Somewhat not sure | 38 (41.3) | 35 (38) | ||
| Somewhat sure | 24 (26.1) | 40 (43.5) | ||
| Sure | 2 (2.2) | 9 (9.8) | ||
| How sure are you that you could call an ambulance or dial 911 if you thought you were having a heart attack? (please select a number from 1‐4) | .02 | |||
| Not sure | 13 (14.1) | 3 (3.3) | ||
| Somewhat not sure | 20 (21.7) | 13 (14.1) | ||
| Somewhat sure | 32 (34.8) | 20 (21.7) | ||
| Sure | 27 (29.3) | 56 (60.9) | ||
| How sure are you that you could get to an emergency room within 60 min after the onset of your symptoms of a heart attack? (please select a number from 1‐4) | <.001 | |||
| Not sure | 17 (18.5) | 6 (6.5) | ||
| Somewhat not sure | 17 (18.5) | 12 (13) | ||
| Somewhat sure | 29 (31.5) | 31 (33.7) | ||
| Sure | 29 (31.5) | 43 (46.7) | ||
aWilcoxon signed-rank test.
| Outcome | Unadjusted OR | Adjusted OR | ||
| OR (95% CI) | P value | OR (95% CI) | P value | |
| Recognize the signs and symptoms of a heart attack | 7.20 (3.63‐14.27) | <.001 | 7.10 (3.56‐14.15) | <.001 |
| Tell the difference between the signs or symptoms of a heart attack and other medical problems | 5.18 (2.66‐10.09) | <.001 | 5.47 (2.77‐10.78) | <.001 |
| Call an ambulance or dial 911 | 5.81 (2.89‐11.68) | <.001 | 5.75 (2.86‐11.59) | <.001 |
| Get to an emergency room within 60 minutes | 2.94 (1.61‐5.38) | <.001 | 2.85 (1.54‐5.25) | <.001 |
aOR: odds ratio.
bModels are adjusted for the fixed effects of race (White vs non-White), age, education, income, family history, prior chatbot use, mean text message effectiveness score, mean HeartBot impression score, word count, conversation duration, and participants’ perception of texting an artificial intelligence agent, as well as a random effect for individual.
Discussion
Principal Findings
This study is among the first to evaluate a fully automated AI chatbot aiming to increase awareness and knowledge about heart attacks among women. All participants who started the HeartBot conversation completed it, and none left the conversation. We posit that the 3 main key factors contributed to HeartBot’s success. First, HeartBot delivered a brief conversational intervention (median 10.33 min), and participants could initiate the intervention anywhere and at any time. This design dramatically reduces participant burden and ensures a potential for large-scale dissemination and effectiveness. In addition, the HeartBot conversation was delivered via SMS text messaging and did not require Wi-Fi access or a smartphone. Nearly all Americans (98%) own a mobile phone [], and using SMS text messaging is an effective way to reach minority populations and ensure intervention equity. Second, HeartBot incorporated personalization and empathetic responses that are known to increase participants’ engagement [-] during conversations. Addressing users by name within conversations could have fostered a sense of individual relevance and strengthened a positive evaluation of HeartBot. Third, the content of HeartBot was developed from scientific evidence and guidelines and used simple nontechnical language with images, ensuring that the educational content was easily understandable and accessible to individuals with low health literacy levels.
Comparison With Prior Work
Previous Hispanic and Black women, and younger age groups, have shown a significant decline in awareness of heart disease as the leading cause of death among women [-]. The latest study reported that the overall awareness dropped from 65% in 2009 to 44% in 2019, with the steepest declines among Hispanic women (28.9%), non-Hispanic Black women (28.1%), and women aged 25 to 34 years (41.3%) []. To address this concern, we ensured that women with racial or ethnic minority backgrounds were well represented in this study sample, and the HeartBot included a content module specifically discussing the increased risk of heart attack in Hispanic and Black women [,]. HeartBot provided unbiased interactions with all participants, reducing potential biases that might arise from human facilitators in traditional in-person interventions, such as implicit judgments [,]. The HeartBot intervention appears to work regardless of age and race or ethnicity. To our knowledge, this is the first AI-driven chatbot intervention specifically designed to increase women’s awareness and knowledge of heart disease. The greatest advantage of the HeartBot intervention over conventional public heart attack awareness campaigns is its ability to promote active learning through personalized and SMS-based dialogue. Traditional heart attack campaigns face several challenges, such as passive information delivery, limited personalization and interactivity, and dissemination gaps. In contrast, personalized chatbots can foster active learning by prompting users to engage in feedback retrieval, self-reflection, and goal setting during interactions [,]. We believed that the inclusion of a follow-up quiz reviewing key content at the end of the session further enhanced active learning throughout the session.
Overall, the findings of this HeartBot trial hold great promise. Although we acknowledge that generative AI models are improving at a rapid pace, we think deploying a hybrid chatbot, which combines data-driven identification of user intents with scientifically vetted conversational content, is essential to prioritize participants’ safety and accuracy of information delivery. Given the purpose of the intervention to increase awareness and knowledge of heart attacks, the current design, which combines a structured content flow with personalized conversational strategies, may be optimal.
Limitations
Several limitations of this study must be acknowledged. First, because this study was not a randomized controlled trial (RCT), causal relationships and definitive efficacy cannot be established. However, we adjusted potential confounding factors in our analysis. Second, relying on a convenience sample of women may introduce selection bias. In addition, only participants who had at least moderate digital literacy, were comfortable with mobile technology, and were interested in their own health might be enrolled in this study, which could have influenced both engagement with the chatbot and the outcomes. To mitigate these section biases, we enrolled a wide age range (ie, 26-70 y) and a diverse sample of women (ie, 55/92, 60% belonging to racial or ethnic minority groups), as well as designed the study to not require Wi-Fi access (SMS text messaging only). Finally, the primary outcomes were assessed using self-reported measures of awareness and knowledge of heart attack risks and symptoms over a short period; behavioral changes were not assessed. Social desirability bias may have led some participants to overreport positive impressions or improvements in knowledge following the HeartBot interaction.
Future Direction
A full-scale RCT is warranted to examine the effectiveness of the HeartBot intervention in a large, racially and ethnically diverse sample. Future research also needs to aim to assess whether increased awareness and knowledge will lead to early access to care when women are experiencing heart attack symptoms. The framework of participants’ engagement and safety metrics must be developed to demonstrate use for clinical practice. Finally, future research should explore the duration of the effects of new awareness and knowledge and whether they lead to desired behaviors in an RCT. To sustain the awareness and knowledge of heart attacks, the HeartBot intervention may require multiple sessions, and the dosage and content of the intervention may need further adjustment.
Conclusions
Interaction with HeartBot was associated with increased awareness and knowledge of heart attack risks and symptoms in a national sample of US women. These findings suggest that AI chatbot–based interventions may be a promising approach to improve women’s knowledge and awareness of heart attack in the United States. An RCT of HeartBot is warranted to establish its efficacy and safety before implementation in clinical settings.
Acknowledgments
This project was supported by the Noyce Foundation and the University of California, San Francisco School of Nursing Emile Hansen Gaine Fund. The project sponsors had no role in the study design; collection, analysis, or interpretation of the data; writing the manuscript; or the decision to submit the manuscript for publication.
Data Availability
The datasets generated or analyzed during this study are available from the corresponding author on reasonable request.
Authors' Contributions
YF had full access to all study data and is responsible for the integrity and accuracy of the data analysis. YF, JZ, HAD, and KS conceptualized and designed the study. YF, DDK, JZ, HAD, and KS developed the chatbot. YF and DDK handled data acquisition, and YF, DDK, and TJH conducted the statistical analysis. YF, DDK, TJH, and KS drafted the manuscript. YF oversaw supervision and funding, while DDK provided administrative, technical, or material support. All authors critically reviewed the manuscript for important intellectual content.
Conflicts of Interest
None declared.
Full regression results for outcome questions. The “Adjusted OR” reflects results from the subset of participants with measured outcomes (n=92), as reported in the paper. The first sensitivity analysis includes the full dataset (n=104), and the second is a complete case analysis (n=90; 2 participants were missing income data). The remaining sensitivity analyses correspond to these 3 models but use backward stepwise regression. Cells show odds ratio (95% CI). Outcomes were as follows: (1) recognizing the signs and symptoms of a heart attack, (2) distinguishing the signs or symptoms of a heart attack from other medical problems, (3) calling an ambulance or dialing 911, and (4) reaching an emergency room within 60 minutes.
XLSX File, 14 KBReferences
- Zhong W, Luo J, Zhang H. The therapeutic effectiveness of artificial intelligence-based chatbots in alleviation of depressive and anxiety symptoms in short-course treatments: a systematic review and meta-analysis. J Affect Disord. Jul 1, 2024;356:459-469. [CrossRef] [Medline]
- He Y, Yang L, Qian C, et al. Conversational agent interventions for mental health problems: systematic review and meta-analysis of randomized controlled trials. J Med Internet Res. Apr 28, 2023;25:e43862. [CrossRef] [Medline]
- Lim SM, Shiau CW, Cheng LJ, Lau Y. Chatbot-delivered psychotherapy for adults with depressive and anxiety symptoms: a systematic review and meta-regression. Behav Ther. Mar 2022;53(2):334-347. [CrossRef] [Medline]
- Kurniawan MH, Handiyani H, Nuraini T, Hariyati RT, Sutrisno S. A systematic review of artificial intelligence-powered (AI-powered) chatbot intervention for managing chronic illness. Ann Med. Dec 2024;56(1):2302980. [CrossRef] [Medline]
- Oh YJ, Zhang J, Fang ML, Fukuoka Y. A systematic review of artificial intelligence chatbots for promoting physical activity, healthy diet, and weight loss. Int J Behav Nutr Phys Act. Dec 11, 2021;18(1):160. [CrossRef] [Medline]
- Noh E, Won J, Jo S, Hahm DH, Lee H. Conversational agents for body weight management: systematic review. J Med Internet Res. May 26, 2023;25:e42238. [CrossRef] [Medline]
- Bendotti H, Lawler S, Chan GC, Gartner C, Ireland D, Marshall HM. Conversational artificial intelligence interventions to support smoking cessation: a systematic review and meta-analysis. Digit Health. Nov 3, 2023;9:20552076231211634. [CrossRef] [Medline]
- Aggarwal A, Tam CC, Wu D, Li X, Qiao S. Artificial intelligence-based chatbots for promoting health behavioral changes: systematic review. J Med Internet Res. Feb 24, 2023;25:e40789. [CrossRef] [Medline]
- Kim HK. The effects of artificial intelligence chatbots on women’s health: a systematic review and meta-analysis. Healthcare (Basel). Feb 23, 2024;12(5):534. [CrossRef] [Medline]
- Lyzwinski LN, Elgendi M, Menon C. Conversational agents and avatars for cardiometabolic risk factors and lifestyle-related behaviors: scoping review. JMIR Mhealth Uhealth. May 25, 2023;11:e39649. [CrossRef] [Medline]
- Benjamin EJ, Muntner P, Alonso A, et al. Heart disease and stroke statistics-2019 update: a report from the American Heart Association. Circulation. Mar 5, 2019;139(10):e56-e528. [CrossRef] [Medline]
- Tsao CW, Aday AW, Almarzooq ZI, et al. Heart disease and stroke statistics-2022 update: a report from the American Heart Association. Circulation. Feb 22, 2022;145(8):e153-e639. [CrossRef] [Medline]
- Go Red for Women. American Heart Association. URL: https://www.goredforwomen.org/en/ [Accessed 2025-09-23]
- Cushman M, Shay CM, Howard VJ, et al. Ten-year differences in women’s awareness related to coronary heart disease: results of the 2019 American Heart Association national survey: a special report from the American Heart Association. Circulation. Feb 16, 2021;143(7):e239-e248. [CrossRef] [Medline]
- Mosca L, Hammond G, Mochari-Greenberger H, Towfighi A, Albert MA. Fifteen-year trends in awareness of heart disease in women: results of a 2012 American Heart Association national survey. Circulation. Mar 19, 2013;127(11):1254-1263. [CrossRef] [Medline]
- Mosca L, Mochari-Greenberger H, Dolor RJ, Newby LK, Robb KJ. Twelve-year follow-up of American women’s awareness of cardiovascular disease risk and barriers to heart health. Circ Cardiovasc Qual Outcomes. Mar 2010;3(2):120-127. [CrossRef] [Medline]
- Mosca L, Ferris A, Fabunmi R, Robertson RM, American Heart Association. Tracking women’s awareness of heart disease: an American Heart Association national study. Circulation. Feb 10, 2004;109(5):573-579. [CrossRef] [Medline]
- Gelles-Watnick R. Americans’ use of mobile technology and home broadband. Pew Research Center. 2024. URL: http://www.jstor.org/stable/resrep63511 [Accessed 2025-09-23]
- Norman DA, Draper SW. User-Centered System Design: New Perspectives on Human-Computer Interaction. Lawrence Erlbaum Associates; 1986. [CrossRef] ISBN: 9780898597813
- Adikari A, de Silva D, Moraliyage H, et al. Empathic conversational agents for real-time monitoring and co-facilitation of patient-centered healthcare. Fut Gen Comput Syst. Jan 1, 2022;126(C):318-329. [CrossRef]
- Williams JD, Kamal E, Ashour M, Amr H, Miller J, Zweig G. Fast and easy language understanding for dialog systems with microsoft language understanding intelligent service (LUIS). Presented at: Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue; Sep 2-4, 2015; Prague, Czech Republic. URL: http://aclweb.org/anthology/W15-46 [Accessed 2025-09-23] [CrossRef]
- Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. arXiv. Preprint posted online on Jun 12, 2017. [CrossRef]
- Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. Presented at: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Jun 2-7, 2019; Minneapolis, MN. [CrossRef]
- Luepker RV, Raczynski JM, Osganian S, et al. Effect of a community intervention on patient delay and emergency medical service use in acute coronary heart disease: the Rapid Early Action for Coronary Treatment (REACT) trial. JAMA. Jul 5, 2000;284(1):60-67. [CrossRef] [Medline]
- Riegel B, McKinley S, Moser DK, Meischke H, Doering L, Dracup K. Psychometric evaluation of the acute coronary syndrome (ACS) response index. Res Nurs Health. Dec 2007;30(6):584-594. [CrossRef] [Medline]
- Fukuoka Y, Lisha NE, Vittinghoff E. Comparing Asian American women’s knowledge, self-efficacy, and perceived risk of heart attack to other racial and ethnic groups: the mPED trial. J Womens Health (Larchmt). Sep 2017;26(9):1012-1019. [CrossRef] [Medline]
- Fukuoka Y, Oh YJ. Perceived heart attack likelihood in adults with a high diabetes risk. Heart Lung. 2022;52:42-47. [CrossRef] [Medline]
- Fukuoka Y, Oh YJ. Perceived risk of heart attack and type 2 diabetes in Hispanic adults with overweight and obesity. J Cardiovasc Nurs. 2022;37(6):E197-E205. [CrossRef] [Medline]
- Liao W, Oh YJ, Feng B, Zhang J. Understanding the influence discrepancy between human and artificial agent in advice interactions: the role of stereotypical perception of agency. Commun Res. 2023;50(5):633-664. [CrossRef]
- Feng B. Testing an integrated model of advice giving in supportive interactions. Hum Commun Res. Jan 2009;35(1):115-129. [CrossRef]
- Bartneck C, Kulić D, Croft E, Zoghbi S. Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. Int J Soc Robot. Nov 2008;1:71-81. [CrossRef]
- R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing. 2010. URL: https://www.R-project.org [Accessed 2025-09-23]
- Christensen RH. Ordinal - regression models for ordinal data. The Comprehensive R Archive Network. 2024. URL: https://cran.r-project.org/web/packages/ordinal/index.html [Accessed 2025-09-23]
- Stata statistical software: release 16. StataCorp LLC. 2019. URL: https://www.stata.com/ [Accessed 2025-09-23]
- van Buuren S, Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R. J Stat Softw. 2011;45(3):1-67. [CrossRef]
- Mobile fact sheet. Pew Research Center. Nov 13, 2024. URL: https://www.pewresearch.org/internet/fact-sheet/mobile [Accessed 2025-09-23]
- Blackston JW, Safford MM, Mefford MT, et al. Cardiovascular disease events and mortality after myocardial infarction among Black and White adults: REGARDS study. Circ Cardiovasc Qual Outcomes. Dec 2020;13(12):e006683. [CrossRef] [Medline]
- Ya’qoub L, Lemor A, Dabbagh M, et al. Racial, ethnic, and sex disparities in patients with STEMI and cardiogenic shock. JACC Cardiovasc Interv. Mar 22, 2021;14(6):653-660. [CrossRef] [Medline]
- Chapman EN, Kaatz A, Carnes M. Physicians and implicit bias: how doctors may unwittingly perpetuate health care disparities. J Gen Intern Med. Nov 2013;28(11):1504-1510. [CrossRef] [Medline]
- FitzGerald C, Hurst S. Implicit bias in healthcare professionals: a systematic review. BMC Med Ethics. Mar 1, 2017;18(1):19. [CrossRef] [Medline]
- Lin MP, Chang D. Enhancing post-secondary writers’ writing skills with a chatbot. Educ Technol Soc. 2020;23(1):78-92. URL: https://www.jstor.org/stable/26915408 [Accessed 2025-09-23]
- Vanichvasin P. Chatbot development as a digital learning tool to increase students’ research knowledge. Int Educ Stud. 2021;14(2):44. [CrossRef]
Abbreviations:
| AI: artificial intelligence |
| aOR: adjusted odds ratio |
| RCT: randomized controlled trial |
| REDCap: Research Electronic Data Capture |
Edited by Andrew Coristine; submitted 09.Jul.2025; peer-reviewed by Reenu Singh, Varun Varma Sangaraju; final revised version received 19.Aug.2025; accepted 31.Aug.2025; published 15.Oct.2025.
Copyright© Yoshimi Fukuoka, Diane Dagyong Kim, Jingwen Zhang, Thomas J Hoffmann, Holli A DeVon, Kenji Sagae. Originally published in JMIR Cardio (https://cardio.jmir.org), 15.Oct.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Cardio, is properly cited. The complete bibliographic information, a link to the original publication on https://cardio.jmir.org, as well as this copyright and license information must be included.

