This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Pediatrics and Parenting, is properly cited. The complete bibliographic information, a link to the original publication on https://pediatrics.jmir.org, as well as this copyright and license information must be included.
Parents commonly experience anxiety, worry, and psychological distress in caring for newborn infants, particularly those born preterm. Web-based therapist services may offer greater accessibility and timely psychological support for parents but are nevertheless labor intensive due to their interactive nature. Chatbots that simulate humanlike conversations show promise for such interactive applications.
The aim of this study is to explore the usability and feasibility of chatbot technology for gathering real-life conversation data on stress, sleep, and infant feeding from parents with newborn infants and to investigate differences between the experiences of parents with preterm and term infants.
Parents aged ≥21 years with infants aged ≤6 months were enrolled from November 2018 to March 2019. Three chatbot scripts (stress, sleep, feeding) were developed to capture conversations with parents via their mobile devices. Parents completed a chatbot usability questionnaire upon study completion. Responses to closed-ended questions and manually coded open-ended responses were summarized descriptively. Open-ended responses were analyzed using the latent Dirichlet allocation method to uncover semantic topics.
Of 45 enrolled participants (20 preterm, 25 term), 26 completed the study. Parents rated the chatbot as “easy” to use (mean 4.08, SD 0.74; 1=very difficult, 5=very easy) and were “satisfied” (mean 3.81, SD 0.90; 1=very dissatisfied, 5=very satisfied). Of 45 enrolled parents, those with preterm infants reported emotional stress more frequently than did parents of term infants (33 vs 24 occasions). Parents generally reported satisfactory sleep quality. The preterm group reported feeding problems more frequently than did the term group (8 vs 2 occasions). In stress domain conversations, topics linked to “discomfort” and “tiredness” were more prevalent in preterm group conversations, whereas the topic of “positive feelings” occurred more frequently in the term group conversations. Interestingly, feeding-related topics dominated the content of sleep domain conversations, suggesting that frequent or irregular feeding may affect parents’ ability to get adequate sleep or rest.
The chatbot was successfully used to collect real-time conversation data on stress, sleep, and infant feeding from a group of 45 parents. In their chatbot conversations, term group parents frequently expressed positive emotions, whereas preterm group parents frequently expressed physical discomfort and tiredness, as well as emotional stress. Overall, parents who completed the study gave positive feedback on their user experience with the chatbot as a tool to express their thoughts and concerns.
ClinicalTrials.gov NCT03630679; https://clinicaltrials.gov/ct2/show/NCT03630679
Caring for infants can lead to parental anxiety and psychological distress, especially for first-time parents and particularly within the first 6 months after birth [
Besides experiencing initial stress directly after birth, parents need to adapt to the new situation after hospital discharge (or at home) and develop confidence in caring for their newborns themselves. These adjustments and care transition from a medical facility to home may be associated with increased stress and loss of sleep. Although sleep disturbance is most commonly associated with the early postpartum period, parents may continue to experience disturbed sleep for some months after birth [
Web-based interventions for mental health have shown some success in conditions such as depression and anxiety [
In this study, chatbot technology was used to provide an interactive conversation platform to engage parents of newborn infants who were recently discharged from hospital in the areas of parental stress and sleep, and infant feeding. To our knowledge, there have been no studies published on the use of a chatbot as an interactive conversational tool for parents to provide information in these subject areas. The objective of this study is to explore the feasibility and usability of chatbot technology to gather real-life, in-home conversation data on 3 domains (parental stress, sleep, and infant feeding) from parents with newborn infants and investigate the differences between parents of preterm and term infants in these 3 domains using these conversation data.
This observational study was conducted from November 2018 to March 2019. Participants were recruited from a tertiary referral maternity hospital in Singapore. The study was approved by the SingHealth Centralised Institutional Review Board, Singapore, and registered at ClinicalTrials.gov (NCT03630679).
The study population comprised parents aged ≥21 years with healthy infants who were ≤6 months of age and had been discharged from the hospital at the time of enrollment. Eligible parents had to be proficient in the English language, have in-home access to a reliable internet connection, own a tablet or a mobile device suitable for electronic communication and assessment, and be able to comply with the required study tasks. Infants from nonsingleton births, infants known to have current or previous illnesses or conditions that might interfere with the study outcome, and infants participating in any other clinical studies were excluded. Parents with a past or present history of mental illness, single parents, and parents with any acute or chronic illnesses or who were assessed by the investigators to be unable or unwilling to comply with the study protocol requirements were also excluded. Written informed consent was obtained from all eligible parents.
Participants for this observational study were screened based on the above inclusion and exclusion criteria. After providing informed consent, eligible participants were given access to download the ClaimIt app (ObvioHealth), which provided access to electronic questionnaires (eQuestionnaires) and the study chatbot, onto their mobile devices. ClaimIt is a commercially available mobile app for data collection in virtual or hybrid research studies that require no or minimal use of physical study sites. Participants completed an electronic Screening eQuestionnaire in ClaimIt to confirm their eligibility for enrollment. The study population included 2 groups: “preterm” (parents of preterm infants at gestational age <37 weeks) and “term” (parents of term infants at gestational age ≥37 weeks).
ClaimIt was made available to participants so they could perform specific study-related tasks and receive study information. The participants were given instructions via the ClaimIt app on how to use the chatbot and were asked to interact with the chatbot at least 3 times a week over a maximum 28-day period. The chatbot is an interactive conversational app that was built as a component of the ClaimIt app specifically for this study. The chatbot conversed with users through an online platform. The chatbot was programmed using scripts to respond appropriately whenever a user initiated a conversation. The chatbot scripts included open-ended and closed-ended (multiple-choice) questions and responses. There were 3 conversation scripts, 1 for each of the 3 domains of interest, which included stress, sleep, and feeding (
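The mechanics of a scripted chatbot turn can be sketched in a few lines. This is a purely illustrative stand-in, not the study's actual implementation: the question wording, class names, and data structures below are all assumptions, and the real script content is in the study's supplementary materials.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Question:
    prompt: str
    choices: Optional[List[str]] = None  # None marks an open-ended question

# Hypothetical fragment of a stress-domain script (wording invented
# for illustration; the actual scripts are not reproduced here).
STRESS_SCRIPT = [
    Question("How are you feeling today?"),
    Question("How stressed do you feel right now?",
             choices=["Not at all", "A little", "Very stressed"]),
]

def run_script(script, answers):
    """Pair each scripted question with a user's answer and record
    whether the response was open-ended or closed-ended."""
    transcript = []
    for question, answer in zip(script, answers):
        kind = "closed" if question.choices else "open"
        transcript.append((kind, question.prompt, answer))
    return transcript

transcript = run_script(STRESS_SCRIPT, ["Feeling tired", "A little"])
```

Separating open-ended from closed-ended responses at collection time, as in the `kind` field here, mirrors the downstream processing the study describes, where the two response types were analyzed differently.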
Participants also received reminder notifications on the first day of each week, prompting them to complete the required 3 interactions with the chatbot over that week at their convenience. Study compliance was monitored by the study team and principal investigator, and contact with the participants was made electronically and, if needed, by telephone. All study data were collected via the ClaimIt app running on participants’ mobile devices. Transcripts of the chatbot conversations were accessed and reviewed by the study team.
Each participant completed the Usability eQuestionnaire in the ClaimIt app at the end of the study (
A sample size of 40 participants was planned to permit reporting of descriptive summary statistics for the categorical and quantitative response data collected using the chatbot. The expected dropout rate was 25%. If this threshold was exceeded despite the investigators’ efforts to contact participants who were lost to follow-up, a maximum of 10 additional participants could be enrolled to replace the participants who dropped out. Completed chatbot interactions from participants who dropped out were included in the conversation analysis.
Each raw chatbot conversation was processed by separating open-ended responses from responses to closed-ended questions; open-ended responses were then suitably coded (
Workflow for conversation data processing and semantic analysis by latent Dirichlet allocation (LDA) topic modeling.
We used the latent Dirichlet allocation (LDA) [
Besides LDA, other natural language processing methodologies that have been explored for topic modeling include latent semantic analysis/indexing (LSA/LSI) [
LSA/LSI is nonprobabilistic and relies on a mathematical procedure known as singular value decomposition [
Albalawi et al [
Chatbot conversations were analyzed independently for the stress, sleep, and feeding domains. As is typical of conversational apps, the user-generated texts in this study were often short. Therefore, the average conversation length was increased by merging the multiple conversations collected from each participant over the study period into a single conversation for each domain (
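The per-participant, per-domain merging step can be sketched as follows. This is a minimal illustration, assuming a hypothetical `(participant_id, domain, text)` record format that is not the study's actual data schema.

```python
from collections import defaultdict

def merge_conversations(records):
    """Merge each participant's open-ended responses into one
    conversation per domain, as done before topic modeling."""
    merged = defaultdict(list)
    for participant_id, domain, text in records:
        merged[(participant_id, domain)].append(text)
    return {key: " ".join(texts) for key, texts in merged.items()}

# Example: two short stress-domain entries from one participant
records = [
    ("P01", "stress", "feeling tired today"),
    ("P01", "stress", "baby slept badly"),
    ("P02", "stress", "doing well"),
]
merged = merge_conversations(records)
# merged[("P01", "stress")] -> "feeling tired today baby slept badly"
```

Merging in this way lengthens the average document seen by the topic model, which is the rationale the study gives for the step.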
The first preprocessing step was to eliminate stop words (ie, those that do not carry information about topics). Stop words for the English language [
Stemming of the remaining words in conversations was performed using the Gensim library [
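The study performed these two preprocessing steps (stop word removal, then stemming) with the Gensim library. A stdlib-only stand-in is sketched below; the stop word list and suffix rules here are deliberately crude illustrations, not Gensim's actual ones.

```python
# Illustrative stop word list (the study used a standard English list).
STOP_WORDS = {"i", "am", "the", "a", "an", "to", "and", "is", "was", "my"}

def naive_stem(word):
    # Crude Porter-like suffix stripping, for illustration only.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 3:
            return word[: -len(suffix)]
    return word

def preprocess(conversation):
    """Lowercase, drop stop words, and stem the remaining tokens."""
    tokens = conversation.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("I was feeding the baby and feeling tired"))
# ['feed', 'baby', 'feel', 'tired']
```

In a real pipeline, a proper stemmer (such as Gensim's Porter stemmer) would handle irregular forms far better than the toy suffix rules above.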
For each domain, 8 modeling sessions were performed, with the number of latent topics to be extracted set to a value from 2 through 9. Thus, a model was created for each setting (2 through 9 latent topics extracted). To obtain each model, we performed 10 learning runs by randomly changing the value of the random seed used to initialize the LDA procedure (ie, allocating a word to a topic), while the number of training passes (to determine the probability of the word belonging to a topic) was set at 100 for all runs. The best models for each domain could not be unequivocally identified based on a perplexity measure [
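To make the training procedure concrete, here is a compact LDA implementation using collapsed Gibbs sampling. This is an illustration only: the study used Gensim's LDA (which performs variational Bayes rather than Gibbs sampling), and the hyperparameter values and toy documents below are assumptions, not the study's settings. It does show the two knobs the text describes: a random seed that controls the initial word-to-topic allocation, and a number of training passes.

```python
import random

def train_lda(docs, num_topics, passes=100, seed=0, alpha=0.1, beta=0.01):
    """Minimal collapsed Gibbs sampler for LDA (illustrative only)."""
    rng = random.Random(seed)  # the seed fixes the initial allocation
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    doc_topic = [[0] * num_topics for _ in docs]   # per-doc topic counts
    topic_word = [[0] * V for _ in range(num_topics)]  # per-topic word counts
    topic_total = [0] * num_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for each word
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z)
    for _ in range(passes):  # resample each word's topic per pass
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assignments[d][i], index[w]
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wi] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    # Smoothed topic-word probability distributions
    topics = [
        [(topic_word[k][i] + beta) / (topic_total[k] + V * beta) for i in range(V)]
        for k in range(num_topics)
    ]
    return vocab, topics

# Toy run over invented two-word "conversations"
vocab, topics = train_lda(
    [["sleep", "baby", "sleep"], ["feed", "milk", "feed"], ["sleep", "baby"]],
    num_topics=2, passes=50, seed=1)
```

Repeating such a run with different seeds, as the study did 10 times per model, checks whether the learned topics are stable rather than artifacts of one particular initialization.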
As a topic is a probability distribution over the entire dictionary of the corpus, only words with the highest probability values were deemed to be representative of the semantic meaning for that topic. We chose the 3 highest probability words within a topic to be most representative of the semantic concept associated with that topic. In simpler terms, one can think of these 3 highest probability words as the most frequently used words within that topic.
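Extracting the 3 highest-probability words from a topic distribution is straightforward; a sketch follows, using an invented toy vocabulary and distribution.

```python
def top_words(vocab, topic_distribution, n=3):
    """Return the n highest-probability words for a topic."""
    ranked = sorted(zip(topic_distribution, vocab), reverse=True)
    return [word for prob, word in ranked[:n]]

# Hypothetical topic over a tiny vocabulary (values invented)
vocab = ["baby", "feed", "pain", "sleep", "tired"]
dist = [0.10, 0.05, 0.30, 0.35, 0.20]
print(top_words(vocab, dist))  # ['sleep', 'pain', 'tired']
```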
A total of 48 parents were screened. Of these, 45 participants were enrolled in the study. This included 5 participants with term infants who were recruited to replace participants withdrawn from the study due to noncompliance. There were 45 infants (23 females, 51%; 20 preterm and 25 term infants). In all, 19 participants withdrew from the study: 13 (68%) participants failed to complete at least 5 interactions, 4 (21%) were withdrawn at the investigator’s decision, and 2 (11%) withdrew consent. A total of 26 participants, 13 in each group, completed the study (
All parents (n=45) were female. The mean age of the participants was 31.7 (SD 4.3) years, while their infants had a mean age of 1.1 (SD 1.3) months (
Participant flowchart.
Characteristics of participants and chatbot responses.
| Characteristic | Term (N=25) | Preterm (N=20) | Total (N=45) |
| --- | --- | --- | --- |
| Female gender (parent), n (%) | 25 (100) | 20 (100) | 45 (100) |
| Female gender (infant), n (%) | 13 (52) | 10 (50) | 23 (51) |
| Age of parents (years), mean (SD) | 31.1 (4) | 32.5 (5.0) | 31.7 (4.3) |
| Age of infants (months), mean (SD) | 1.1 (1) | 1.2 (1) | 1.1 (1.3) |
| Chatbot interactions, n | | | |
| Stress domain | 126 | 133 | 259 |
| Sleep domain | 125 | 132 | 257 |
| Feeding domain | 130 | 137 | 267 |
| Interactions (all 3 domains), n | 125 | 131 | 256 |
| Merged conversationsb for LDAa analysis, n | | | |
| Stress domain | 17 | 22 | 39 |
| Sleep domain | 17 | 22 | 39 |
| Feeding domain | 18 | 22 | 40 |
aLDA: latent Dirichlet allocation.
bWithin each of the 3 domains, conversations belonging to the same participant were merged into a single conversation. Completed chatbot interactions from participants who dropped out were included in the conversation analysis.
Of the 45 parents enrolled, 26 completed the study and the usability eQuestionnaire. Responses from these 26 participants (on a 5-point Likert scale; 1=very difficult, 5=very easy) showed that the chatbot was rated as “easy” to use (mean 4.08, SD 0.74). Preterm and term group parents who completed the study rated it similarly (preterm: mean 3.9, SD 0.86; term: mean 4.2, SD 0.60). The ClaimIt app was also perceived as “easy” to use (mean 4.19, SD 0.85) by both the preterm and term group parents (preterm: mean 4.0, SD 1.0; term: mean 4.38, SD 0.65).
Parents were “satisfied” with the chatbot (mean 3.81, SD 0.90; 1=very dissatisfied, 5=very satisfied) and also with the ClaimIt app (mean 3.81, SD 0.80). Participants in the preterm group registered between “neutral” and “satisfied” with the chatbot (mean 3.62, SD 0.96) and ClaimIt (mean 3.69, SD 0.85) app. Higher mean scores were observed in the term group for the chatbot (mean 4.0, SD 0.82) and also the ClaimIt (mean 3.92, SD 0.76) app.
The preterm group felt that the length of interactions was between “long” and “neutral” (mean 2.92, SD 1.19; 1=too long, 5=easily manageable), while the term group felt that the length of interactions was between “manageable” and “easily manageable” (mean 4.31, SD 0.48). Furthermore, 46% (6/13) of the preterm parents and 23% (3/13) of the term parents experienced technical issues when using the chatbot.
Overall, participants were not worried about sharing their information (mean 4.04, SD 1.08; 1=very worried, 5=not at all worried) and were likely to use the chatbot again (mean 3.35, SD 0.75; 1=not at all likely, 5=very likely). Parents in both the term and preterm groups were generally not worried about data sharing and rated themselves between “neutral” and “likely” to use chatbot technologies again to provide input on similar topics.
Conversations from the 45 enrolled parents were analyzed. Parents with preterm infants reported emotional stress more frequently compared to parents with term infants (33 vs 24 occasions). Parents with term infants reported physical stress more frequently compared to parents with preterm infants (30 vs 10 occasions). When the cause of stress was not directly linked to their infants, parents with term infants reported stressors on more occasions (27 vs 18 occasions for the preterm group). Common stressors experienced by both preterm and term parents were breastfeeding, work, and relationships. Only parents of term infants reported breast-related issues (7 occasions).
In general, parents perceived their sleep quality to be satisfactory although the preterm group reported good sleep slightly less frequently than did the term group (
Among parents who gave their infants breast milk, the most commonly reported feeding frequency was 8 to 11 times per day in both the preterm and term groups. Among parents who gave their infants infant formula, the most commonly reported feeding frequency was 4 to 7 times per day in both groups. Feeding problems, such as irregular feeding, were more frequently reported by preterm group parents than by term group parents (8 vs 2 occasions, respectively).
Rating of overall sleep quality by term and preterm group parents.
Open-ended responses to the conversation scripts from the 45 enrolled participants were used for the semantic analysis. Due to the limited length of the raw conversations, conversations belonging to the same participant were merged into a single conversation. This resulted in 39 conversations for the stress domain (17 term, 22 preterm) with an average conversation length of 27.4 words, 39 conversations for the sleep domain (17 term, 22 preterm) with an average conversation length of 28.5 words, and 40 conversations for the feeding domain (18 term, 22 preterm) with an average conversation length of 16.9 words (
For the stress and sleep domains, in each LDA-derived model, the top 3 most representative words for each topic were found to be consistent across the 10 learning runs performed. For the feeding domain, topic composition across the 10 learning runs was characterized by a high degree of variability; that is, the top 3 most representative words of each topic varied across learning runs. Thus, an optimal and reproducible set of topics could not be learned from the conversations in the feeding domain. This could be due to the shorter length of feeding conversations compared with conversations from the stress and sleep domains.
For all 3 domains, models with 3 or 4 semantic topics were identified by human experts as being the most interpretable. The semantic topics for the stress (4 topics) and sleep (3 topics) domains inferred using the LDA topic modeling are shown in
Three most representative words for each topic learned from conversations in the stress domain.
Three most representative words for each topic learned from conversations in the sleep domain.
In
When the distribution of conversations over the 4 topics was calculated for each group (
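Computing such a per-group topic distribution amounts to tallying each conversation's dominant topic by group. A minimal sketch follows; the conversation IDs, topic labels, and mapping structures are hypothetical, chosen only to mirror the study's term/preterm split.

```python
from collections import Counter

def topic_prevalence(conversation_topics, groups):
    """Count, per group, how many conversations have each dominant topic.

    `conversation_topics` maps a conversation ID to its dominant topic;
    `groups` maps a conversation ID to 'term' or 'preterm'. Both are
    invented data structures for illustration.
    """
    counts = {"term": Counter(), "preterm": Counter()}
    for conv_id, topic in conversation_topics.items():
        counts[groups[conv_id]][topic] += 1
    return counts

# Toy data echoing the pattern the study reports
conversation_topics = {"c1": "positive feelings", "c2": "tiredness",
                       "c3": "discomfort", "c4": "positive feelings"}
groups = {"c1": "term", "c2": "preterm", "c3": "preterm", "c4": "term"}
prevalence = topic_prevalence(conversation_topics, groups)
```

In practice, the dominant topic of a conversation would itself come from the fitted model, for example as the argmax of that conversation's inferred topic proportions.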
Topic prevalence in stress domain conversations from term and preterm group parents.
Within the sleep domain, topic 1 appeared to be linked to breastfeeding, topic 2 to feeding in generic terms, and topic 3 to feeding using a milk pump (
For this domain, the average conversation length was shorter than for the other 2 domains, resulting in a smaller feeding conversation data set. As a result, an optimal and reproducible set of topics could not be learned for the feeding domain. It is nonetheless interesting to note that feeding-related words and topics dominated the content of conversations collected for a different domain, sleep (
This study collected real-life, in-home data on parental stress, sleep, and infant feeding from parents of preterm and term infants using a chatbot. Participants who completed the study were satisfied with their online interactions with the chatbot and found the chatbot easy to use. Importantly, they were not worried about sharing such information through an interactive tool and were willing to use the chatbot to provide input on similar topics in the future. This finding helps to validate the use of chatbots on mobile devices as a convenient and accessible means of supporting parents of newborn infants and collecting data on topics that are important for the health and well-being of both infants and parents.
For the stress domain, the top conversation topic extracted from the semantic analysis showed strong positive emotions among parents with term infants. The other topics captured mixed feelings of moderate well-being and being tired, as well as general discomfort. Parents with preterm infants were more likely to express experiences of physical discomfort and tiredness through representative topic words like “pain,” “breast,” and “sleep.” The semantic analysis thus revealed a state of high physical stress in parents of preterm infants. In addition, they also reported emotional stress more frequently compared with term group parents. Similar experiences have been reported in earlier studies [
An interesting insight from our semantic analysis of chatbot conversations on sleep was that the 3 most frequent topics of conversation for all parents (both the term and preterm groups) were related to feeding. This observation implies that parents intuitively linked feeding activities with their inability to have adequate rest. This could be explained by the need to feed their infants at regular intervals over the day and night. Indeed, the most commonly reported frequency of feeding was 8 to 11 times per day for breast milk and 4 to 7 times per day for infant formula. The close links between feeding and sleep revealed by semantic analysis add another dimension to the closed-ended responses on sleep. Although both groups reported satisfactory sleep quality overall, preterm group parents reported good sleep quality slightly less frequently. Preterm group parents also reported feeding problems, such as irregular feeding, on more occasions than did term group parents.
A total of 11 out of 45 enrolled participants (24%) were withdrawn from the study due to noncompliance (failing to complete the required number of chatbot interactions). For some participants, there were delays (up to 29 days) between enrollment and their first interaction with the chatbot. These delays could possibly be due to the stress experienced by parents and the additional responsibilities of caring for a newborn at home after discharge. Although reminder notifications were sent on day 1 of each week, the next notification was only triggered on day 4 if the participant had not started a chat by that point. The high rate of noncompliance could be an indication of limited usability; for this reason, results for the usability questionnaire (answered by completers only) are presented descriptively, without statistical testing. Implementation of earlier and more frequent reminder notifications may improve participant compliance with chatbot interactions. Manual reminders via phone and external messaging platforms (WhatsApp and email) were implemented during the study to improve compliance and were well received. These reminders could be implemented in future work, along with further optimization of the technical performance of the mobile app and chatbot, to improve overall user experience and engagement in providing real-time data.
There were variations in word patterns believed to convey similar constructs that could pose some problems for completely unsupervised analysis. For example, in the stress and sleep scripts, participants were asked how they were feeling and gave answers such as “good,” “not bad,” “doing well,” “god,” and “hood.” Intuitively, “good,” “not bad,” and “doing well” could all be interpreted as saying that the respondent felt “good.” However, without appropriate manual preprocessing, words such as “god” and “hood” might not be appropriately handled by the LDA algorithm. As discussed earlier, conversation length was increased by merging multiple conversations to improve the efficiency of the LDA algorithm. For the feeding domain, the average merged conversation length (16.9 words) was much shorter than for the other 2 domains (27-28 words). This resulted in a smaller feeding conversation data set and may explain why a reproducible set of topics could not be learned for this domain. Future studies should seek to validate the findings of this exploratory work with larger conversation data sets, both in terms of the number and length of conversations and the number of participants. Additional topic modeling methods for short-text data could also be explored to improve the handling of short or variable conversation lengths.
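The manual normalization step alluded to above could be implemented as a simple replacement map applied before topic modeling. The typo-to-word pairs below are assumptions inferred from the examples in the text, not the study's actual mapping.

```python
# Hypothetical normalization map for apparent typos seen in responses
# ("god" and "hood" are assumed here to be typos for "good").
NORMALIZE = {"god": "good", "hood": "good"}

def normalize_tokens(tokens):
    """Replace known typo tokens with their intended forms."""
    return [NORMALIZE.get(t, t) for t in tokens]

print(normalize_tokens(["god", "not", "bad", "hood"]))
# ['good', 'not', 'bad', 'good']
```

A larger study might replace this hand-built map with a spelling-correction step, at the cost of occasionally "correcting" genuine words.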
Although the 3 chatbot scripts (stress, sleep, and feeding) collected a large breadth of information, the depth of information was limited. The scripts explored the immediate concerns of parents and their high-level daily activities, but further studies are required to gain deeper insights. Future work could expand the scope of the chatbot to examine conversation topic patterns associated with other infant or family characteristics such as single or multiple births, parental age or age group, number of primary caregivers, or differences between first-time parents and those with more than one child. If data from different geographical regions can be collected, it may also be of interest to explore similarities and differences among parents in different regions.
Our study shows that the application of machine learning to open-ended conversations elicited by a chatbot can provide additional insights beyond those provided by closed-ended questionnaire responses or descriptive statistics. Appropriately guided by human expert interpretation, unsupervised classification approaches such as LDA can reveal links or topics of interest within conversation data that may not have been anticipated. In addition, it has been suggested that conversational agents such as chatbots also help fulfill other emotional needs [
In this study, a chatbot was successfully used to collect real-time, open-ended conversation data on parental stress, sleep, and infant feeding. Using machine learning, our analysis of semantic patterns revealed differences between preterm and term group parents in conversation topic prevalence, notably for the stress domain. Positive emotions were more often expressed by parents with term infants, whereas parents with preterm infants more frequently expressed feelings of discomfort and tiredness, suggesting they were experiencing higher levels of stress. Topics involving infant feeding dominated the content of sleep domain conversations. Taken together with the results for self-reported sleep quality and feeding problems, these links between sleep and infant feeding suggest that preterm parents could have been more affected by poorer sleep related to frequent feeding or feeding problems. Overall, there was positive feedback from parents who completed the study on the usability experience of the chatbot as a tool to express their thoughts and concerns.
Chatbot scripts for stress, sleep, and feeding.
Usability eQuestionnaire for the chatbot and ClaimIt app.
Preprocessing of conversations to extract corpus for the latent Dirichlet allocation.
eQuestionnaire: electronic questionnaire
ICU: intensive care unit
LDA: latent Dirichlet allocation
LSA/LSI: latent semantic analysis/indexing
NMF: nonnegative matrix factorization
pLSA: probabilistic latent semantic analysis
The study sponsor (Danone Nutricia Research) was responsible for study design, data collection, data analysis, and the decision to submit the manuscript for publication. Danone Nutricia Research provided the funding to conduct this study. Editorial support was provided by Tech Observer Asia Pacific and funded by Danone Nutricia Research.
JW contributed to study conception and design, analysis and interpretation of data, and manuscript drafting and revision.
ACF contributed to study conception and design, analysis and interpretation of data, and manuscript drafting and revision.
ST contributed to analysis and interpretation of data as well as manuscript drafting and revision.
EA contributed to analysis and interpretation of data as well as manuscript drafting and revision.
RMVE contributed to study conception and design, analysis and interpretation of data, and manuscript drafting and revision.
CMC contributed to the study conception and design. She was responsible for the study implementation, acquisition of the study resources including participant recruitment, data collection, and analysis and data interpretation. She also contributed to the drafting and revision of the manuscript.
All authors contributed to the writing and critical revision of the manuscript for important intellectual content and approved the final version for publication.
ACF was affiliated with Danone Nutricia Research, Precision Nutrition D-lab, Singapore, at the time the work was performed. JW was affiliated with Danone Nutricia Research, Precision Nutrition D-lab, Singapore, at the time the work was performed. ST was affiliated with Danone Nutricia Research, Precision Nutrition D-lab, Singapore, at the time the work was performed and is currently affiliated with Cytel Singapore Private Ltd. EA was affiliated with Danone Nutricia Research, Precision Nutrition D-lab, Singapore, at the time the work was performed and is currently affiliated with NLYTICS Pte. Ltd, Singapore. RMvE was affiliated with Danone Nutricia Research at the time the work was performed and is currently affiliated with Emma Children’s Hospital, Amsterdam University Medical Center, The Netherlands; and Nutrition4Health, Hilversum, The Netherlands. CMC has no conflicts of interest to declare.