Leveraging Digital Technology in Conducting Longitudinal Research on Mental Health in Pregnancy: Longitudinal Panel Survey Study

Background: Collecting longitudinal data during and shortly after pregnancy is difficult, as pregnant women often avoid studies with repeated surveys. In contrast, pregnant women interact with certain websites at multiple stages throughout pregnancy and the postpartum period. This digital connection presents the opportunity to use a website to recruit and enroll pregnant women into a panel study and collect valuable longitudinal data for research. These data can then be used to generate new scientific insights and improve health care.

Objective: The objective of this paper is to describe the approaches applied and lessons learned from designing and conducting an online panel for health care research, specifically on perinatal mood disorders. Our panel design and approach aimed to recruit a large sample (N=1200) of pregnant women representative of the US population and to minimize attrition over time.

Methods: We designed an online panel to enroll participants from the pregnancy and parenting website BabyCenter. We enrolled women into the panel at weeks 4 to 10 of pregnancy (Panel 1) or weeks 28 to 33 of pregnancy (Panel 2) and administered repeated psychometric assessments from enrollment through 3 months postpartum. We employed a combination of adaptive digital strategies to recruit, communicate with, and build trust with participants in order to minimize attrition over time. We were transparent at baseline about expectations, used monetary and information-based incentives, and sent personalized reminders to reduce attrition. The approach was participant-centric and leveraged many aspects of the flexibility that digital methods afford.

Results: We recruited 1179 pregnant women (our target was 1200) during a 26-day period between August 25 and September 19, 2016. Our strategy of recruiting participants using adaptive sampling tactics yielded a large panel that was similar to the US population of pregnant women. Attrition was on par with that of existing longitudinal observational studies in pregnant populations: 79.2% (934/1179) of our panel completed another survey after enrollment, 736 out of 1179 (62.4%) women completed at least one assessment in both the prenatal and postnatal periods, and 709 out of 1179 (60.1%) women completed the final assessment. To validate the data, we compared participation rates and factors of perinatal mood disorders ascertained from this study with prior research, suggesting the reliability of our approach.

Conclusions: A suitably designed online panel created in partnership with a digital media source that reaches the target audience can yield a conveniently sized and viable sample for scientific research. Our key lessons learned are as follows: sampling tactics may need to be adjusted to enroll a representative sample, attrition can be reduced by adapting to participants' needs, and study engagement can be boosted by personalizing interactions with the flexibility afforded by digital technologies.


Introduction
Mental health and mood disorders, such as depression and anxiety, can cause negative outcomes for women [1] and can lead to health and developmental problems for their offspring [2]. A better understanding of perinatal mental health is needed to help families lead healthier lives. To observe the totality of perinatal depression, it is important to include women early in pregnancy and obtain repeated assessments starting at this early stage and into the postnatal period. The challenges to accomplish this include lack of access to pregnant women before they have been assessed in clinical settings, where many pregnancy studies recruit participants, and difficulty maintaining cooperation throughout pregnancy and into the postpartum period.
An additional roadblock when researching perinatal depression is the reluctance of pregnant women to participate in scientific or medical studies, as pregnant women exhibit lower cooperation rates than the general population of women [3]. Concern for the fetus and pregnancy and lack of connection with the research goals contribute to this reduced cooperation [4]. In addition, enrolling a representative pregnant population may be difficult, as research has shown that African American pregnant women are less willing to take surveys associated with medical research; this can challenge researchers to construct and maintain representative samples [3]. It has been shown that building trust is pivotal when conducting research among pregnant women and necessary to increase participation [5].
There have been successful longitudinal cohort studies conducted in Europe and Asia. The Maternal Anxiety in Relation to Infant Development (MARI) Study recruited 483 pregnant women at weeks 10 to 12 from community clinics in Dresden, Germany [6]. The Growing Up in Singapore Towards healthy Outcomes (GUSTO) Study recruited 1247 women during their first clinical visit of pregnancy (ie, <14 weeks) and followed them through birth and to 36 months postpartum [7]. Our study aimed to conduct longitudinal research with a panel that was representative of US women giving birth, starting from week 4 of pregnancy.
BabyCenter was a suitable platform to recruit a large population of pregnant women into a panel that was similar to the profile of pregnant women in the United States. It is a digital resource for pregnancy and parenting information that reaches 3 in 4 pregnant women in the United States [8]. Pregnant women begin accessing the BabyCenter website early in pregnancy, often before their first prenatal visit; over three-quarters of BabyCenter pregnancy website registrations occur during the first trimester, with weeks 4, 5, and 6 of pregnancy seeing the largest percentage of registrations, according to BabyCenter's internal tracking data.
We designed and conducted a comprehensive longitudinal study of perinatal mental health among a large panel of women reflective of all US women giving birth. We administered frequent assessments using electronic patient-reported outcome assessments beginning early in pregnancy and through the postnatal period. The goal was to minimize participant attrition and generate a well-characterized data set to further the knowledge of perinatal mood disorders. The aim of this paper is to demonstrate methods used to recruit pregnant participants into an online panel to ensure we obtained a large representative sample and describe how we reduced attrition. We also describe lessons learned that could improve future online panel recruitment and retention for difficult-to-survey populations.

Recruitment and Enrollment
We conducted a longitudinal study with a population-based sample of pregnant women, aged 18 years and older, in the United States, from early in pregnancy to 12 weeks postpartum. The sampling frame for this work was the BabyCenter website. Additional inclusion criteria for the study were as follows: weeks 4 to 10 of pregnancy (Panel 1) or weeks 28 to 33 of pregnancy (Panel 2) and not currently participating in other research studies.
From August 25 to September 19, 2016, BabyCenter website visitors were selected at random and shown a floating invitation during their website experience (see Figure 1). Invitations used friendly language, a description of incentives for participation, and an altruistic approach, as this has been shown to be a key motivator for pregnant women to participate in research [3]. The recruitment goal was to enroll 1200 participants in a 6-week period. The goal of 1200 participants was determined with consideration to power calculations, anticipated time frames for recruitment, and an effort to sample a similar or larger panel size than had been demonstrated in previous longitudinal studies of pregnancy and mental health. Participants enrolled in the study on their own, without support of study researchers, within the digital survey environment upon completion of a screening and enrollment baseline assessment. They were provided detailed information about the study's timing, protocol, and incentives. Participants' consent was obtained via digital agreement within this same baseline assessment. We had New England Institutional Review Board approval to complete this work.
Recruitment strategies were designed to balance the sample to closely match the demographic profile of US women giving birth as reported by government agencies [9]. To this end, adjusting specific digital sampling parameters either increased or decreased the proportion of participants in certain demographic groups.

Study Content
The baseline assessment included screening questions, health history, demographic profiling, pregnancy health assessment, and information about recent life events. The final assessment, administered at 12 weeks postpartum, measured the birth experience. The study contained a battery of standardized psychometric assessments relevant to the topic of perinatal mood disorder that repeated at set intervals throughout the course of the study, measuring anxiety, stress, and obsessive-compulsive tendencies (see Table 1). The study employed the Edinburgh Postnatal Depression Scale (EPDS), the accepted standard measure of mood in the perinatal period, as the primary indicator of major depressive disorder [10]. We excluded the suicidality item in the EPDS scale due to the study's lack of provision for intervention for women who may have self-identified to be at risk.
There were two iterations of short-form assessments, labeled Mini A and Mini B, and one iteration of a long-form assessment, labeled Full. Each of the three total assessment types contained varied sets of psychometric scales alternating in the study protocol to maximize the types of information collected, provide measurements at regular intervals of 1 to 4 weeks, and reduce monotony and response burden (see Figure 2 and Multimedia Appendix 1).
Panel 1 had the opportunity to complete a total of 15 assessments, including the one at baseline, while Panel 2 could complete a total of 8 assessments, including the one at baseline.

Assessments were meant to create a panel experience that was enjoyable and stress free. At the beginning of every assessment, respondents were asked two or three pregnancy or parenting lifestyle questions unrelated to the psychometric assessments. These included questions about pregnancy, diet, the baby's sex, and preparation for the baby's arrival. The inclusion of these lifestyle questions was intended to foster participant engagement and counterbalance the serious nature of the psychometric assessments (see Multimedia Appendix 2).
Assessments were optimized for mobile devices for easy viewing and completion of questions. All assessments were administered through the Qualtrics platform, and respondent data were stored in the secure environment of Qualtrics Target Audience, which is currently known as Qualtrics Core XM [11].

Assessment Invitations
Participants received invitations to complete assessment surveys by email. The assessment interval was an established protocol, but the actual date a participant was invited to complete a survey was customized for each participant based on the date of enrollment and the pregnancy week at baseline. We created an application programming interface (API) within Qualtrics that enabled unique protocol dates for each participant. The API distributed automated email invitations, reminders, and incentives. The API deployed reminders as needed, with up to three reminders delivered over the duration of each survey window, which was typically 7 days. This volume and timing of communication was intended to maximize response but not overburden participants with emails.
A challenge when studying a pregnant population into the postnatal period is that the birth date of the baby is an unknown time variable that cannot be pre-established. To address this, as pregnancy progressed into the late third trimester, we invited women to complete a birth survey to confirm the arrival of the baby. Participants received birth survey invitation emails through week 42 of pregnancy. Completing the birth survey initiated a new protocol within the API, with the baby's birth date now serving as the baseline date for initiating the postnatal surveys.

Incentives
Declining participation in epidemiologic studies has necessitated the use of monetary incentives; this is an accepted method to increase cooperation [12]. This study's duration (9 to 11 months for most participants) required an incentive strategy to head off attrition. Participants in Panel 1 had the opportunity to earn a total of US $180 in e-gift cards over the course of the study, and participants in Panel 2 had the opportunity to earn a total of US $125 in e-gift cards over the course of the study. When an incentive was attained, it was fulfilled automatically by the API via email, making it easy for participants to track and redeem their rewards.
We included a second incentive to help maintain participation through the study's end: a sweepstakes to encourage participants to complete the maximum assessments. Separate US $1000 sweepstakes were offered for Panel 1 and Panel 2 participants. A respondent in Panel 1 who completed all 15 assessments would increase their odds of winning by earning 15 entries. A respondent in Panel 2 who completed all 8 assessments would increase their odds of winning by earning 8 entries. The sweepstakes were conducted as a random drawing after the final assessment for each panel concluded. No empirical tests were conducted to measure the impact of incentivization.
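The entry-weighted drawing described above can be illustrated with a short sketch; this is a hypothetical implementation, not the study's actual drawing procedure. Each completed assessment earns one entry, so a Panel 1 participant completing all 15 assessments holds 15 chances in the random drawing.

```python
import random

# Hypothetical sketch of the sweepstakes drawing: entries are proportional
# to the number of completed assessments, so completing more assessments
# increases the odds of winning.

def draw_winner(entries_by_participant: dict[str, int], seed=None) -> str:
    """Randomly draw one winner, weighted by entry counts."""
    rng = random.Random(seed)
    pool = [pid
            for pid, n in entries_by_participant.items()
            for _ in range(n)]  # one pool slot per earned entry
    return rng.choice(pool)
```

A participant with 15 entries is thus 15 times as likely to be drawn as one with a single entry, which is the retention mechanism the sweepstakes was designed around.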

Engagement Strategies
As the study progressed, we implemented incremental ways to encourage participation. Texting on mobile devices is the most prevalent means of communication for Americans under 50 years of age [13]. To leverage this behavior, we introduced the option to have text reminders sent to mobile devices as an additional prompt to complete an assessment.
To help participants connect with the study and foster a sense of community, selected pregnancy and lifestyle top-line results were shared periodically with participants in assessment invitations. Results shared included the number of pregnant women actively participating in the study and facts about common pregnancy concerns and behaviors. At the study's end, selected findings were also shared in an article hosted on the BabyCenter website, as participants had told us via feedback survey that they were interested to see what we had learned [14].
We closely monitored participation behaviors to identify chronic nonresponders, defined as participants who did not respond to two or more consecutive assessments. At four strategic intervals over the course of the study, before the more in-depth, longer full assessments were scheduled to deploy, dedicated emails were sent specifically to nonresponders in addition to the standard invitation protocol, asking them to return to active participation and reminding them of the potential to earn new entries into the sweepstakes.
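The chronic-nonresponder definition above (two or more consecutive missed assessments) can be expressed as a simple scan over a participant's response history; this is an illustrative sketch, not the study's monitoring code, and the data representation is assumed.

```python
# Hypothetical sketch: `history` is one boolean per deployed assessment,
# in protocol order (True = completed, False = missed). A participant is
# flagged as a chronic nonresponder once two consecutive misses occur.

def is_chronic_nonresponder(history: list[bool]) -> bool:
    consecutive_missed = 0
    for completed in history:
        consecutive_missed = 0 if completed else consecutive_missed + 1
        if consecutive_missed >= 2:
            return True
    return False
```

For example, a participant who missed two assessments in a row would be flagged, while one who alternated between completing and missing would not, since the misses were never consecutive.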

Recruitment
In 26 nonconsecutive calendar days, 476,863 invitation impressions were served, garnering 5843 clicks (a 1.2% click rate). This rate was typical for the floater intercept recruitment methodology used by BabyCenter, per their internal data. Industry benchmarks for random intercept survey invitations are not readily available, but as a proxy, the click rate on a typical website display ad unit in the health category was 0.31% [15]. A 2016 study with a niche user population utilizing Twitter as a recruitment source noted click rates between 0.43% and 0.50% on its targeted study recruitment ads [16].
We manipulated recruitment tactics to achieve a more representative profile of pregnant women. Those recruited on the weekend were more likely to be employed than those recruited during the week. Those recruited with targeting on desktop devices were more likely to be in older age groups, compared to those recruited via mobile devices. We tested the impact of inclusion and exclusion of the monetary incentive during intercept recruitment on the proportions of household income and determined that not mentioning the incentive increased participation among higher-income groups, but skewed the recruitment toward older women with a higher level of education attainment (see Table 2). The sampling approach was fine-tuned based on these learnings to yield the initial baseline sample. Of the 5028 respondents who started the baseline assessment, 1557 completed it and met the inclusion criteria. The most common reasons for disqualification were pregnancy week out of target range, not pregnant, participating in other research, and out of target age range (see Table 3).
A total of 1179 participants met the eligibility requirements, completed the baseline screening survey, and opted to participate. While the panel recruited more quickly than we planned, the panel size was slightly shy of our target, as a few responses showed duplicate email addresses and were removed. This is a risk when using a digital recruitment method and offering gift card incentives. To mitigate this, we instituted email validation, which excluded baseline submissions from previously submitted email addresses, and monitored responses coming from the same IP addresses.
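The enrollment-time validation described above can be sketched as follows; this is a hypothetical illustration of the logic, not the study's actual implementation. Duplicate email addresses are rejected outright (after normalizing case and whitespace), while repeated IP addresses are accepted but flagged for monitoring, since multiple legitimate participants can share an IP address.

```python
# Hypothetical sketch of baseline submission validation: duplicate emails
# are excluded, and responses from previously seen IP addresses are
# flagged for review rather than rejected.

seen_emails: set[str] = set()
ip_counts: dict[str, int] = {}


def validate_enrollment(email: str, ip: str) -> tuple[bool, bool]:
    """Return (accepted, flag_for_review) for one baseline submission."""
    key = email.strip().lower()       # normalize before comparing
    if key in seen_emails:
        return False, False           # duplicate email: reject outright
    seen_emails.add(key)
    ip_counts[ip] = ip_counts.get(ip, 0) + 1
    return True, ip_counts[ip] > 1    # repeated IP: accept but flag
```

Rejecting duplicates while only flagging shared IPs reflects the trade-off noted above: gift card incentives attract duplicate enrollments, but aggressive IP blocking would also exclude legitimate participants.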
Two panels were recruited. Panel 1, with 858 women, was recruited early in the first trimester, at weeks 4 to 10 of pregnancy. The 321 women in Panel 2 were recruited early in the third trimester, at weeks 28 to 33 of pregnancy. Panel 2 was included in the event of undue attrition, to ensure a sufficient sample size in the critical postnatal period for future statistical modeling in health care research.

Participation and Retention
Of the 1179 participants initially enrolled at baseline, 79.2% (934/1179) completed at least one additional assessment, 65.6% (773/1179) informed us about the birth of their child, 63.7% (751/1179) completed one or more assessments in the postpartum period, and 60.1% (709/1179) completed the final assessment in the study. There were 245 out of 1179 (20.8%) women enrolled in the study who did not return to take any additional assessments after baseline (see Table 4). Participation rates for each assessment varied and were impacted by the type of assessment, the incentives offered, and the position in the protocol. Short assessments and long assessments showed similar cooperation rates (64.6% [4669/7222] and 65.0% [3088/4754], respectively), but cooperation cannot be attributed to survey length alone, as we put more effort into garnering responses to longer surveys.
After closing the fifth assessment after baseline (ie, time point 6 [T6]) with a 51.6% (431/835) participation rate (see Table 5), we began aggressively implementing re-engagement strategies, starting with the next full survey at T7. Strategies included revising email invitation copy, sending dedicated correspondence to nonresponders, and implementing text reminders.
Completion rate trends suggest that the engagement strategies boosted the total number of assessment surveys completed. Following T6, which had a cooperation rate of 51.6% (431/835), cooperation began to increase, with rates of 58.5% (490/837) at T7, 59.9% (692/1156) at T8, 59.8% (499/835) at T9, and 64.2% (742/1156) at T10. Among the 370 participants who opted in for text reminders, response rates improved by as much as 40% over the group that did not opt in. Communications sent to nonresponders during pregnancy encouraged 229 nonengaged participants to re-engage with the study and complete future assessments. A portion of these nonresponders may have returned on their own without re-engagement efforts; however, that proportion is unknown.
The attrition of participants after giving birth was expected, as this pivotal event shifts priorities. We were pleased to retain 80.4% (751/934) of the active sample after this life-changing point in time. In fact, the T12 assessment was administered 0 to 5 days after giving birth and achieved a 93.4% (465/498) participation rate. This reaffirmed our confidence in the approach and in our ability to continue measurement of the pregnancy sample into the postnatal period. In the postpartum period, the length of time that had elapsed between giving birth and responding to the birth survey determined which assessment a respondent was next eligible to complete, which also impacted the number of invitations sent. Invitations during the postpartum period were sent only to those women who had confirmed the birth of their child via the birth survey. All respondents, regardless of birth survey response, were invited to take the final assessment.
Two population-based maternity studies with similar assessment timing allowed for a rough comparison of participation statistics: the MARI Study, a longitudinal study conducted among pregnant women recruited from community clinics in Dresden, Germany, and the GUSTO Study, which was conducted among families in Singapore recruited during their first clinical visit of pregnancy and then followed through birth and 36 months postpartum [6,7]. In the late-second trimester and early-third trimester assessments, in which the EPDS or similar instruments were administered, the BabyCenter study had a participation rate (529/858, 61.7%) that was within the range of the MARI Study (57.6%) and the GUSTO Study (77.5%). For assessments conducted at approximately 3 to 4 months postpartum, all three studies showed remarkably similar participation rates, ranging from 57.7% (719/1247) for the GUSTO Study to 59.3% (509/858) for the BabyCenter study (see Table 6).

Population Profile
At baseline, the profile of participants was similar to the population of women and births in the United States for age, marital status, presence of children, employment, and ethnicity [9,17]. The study sample had a higher concentration of women who had achieved a college or higher education degree, consistent with an online population [18]. Participants in the study demonstrated lower median household income than the US median [19]. This is potentially a result of the monetary incentives offered.
Attrition over the course of the study period was not inconsequential for demographic characteristics, with potential impact on mood-related characteristics as well. Participants retained through completion of the final assessment demonstrated a sample profile that differed from the baseline profile. The sample at final assessment showed higher median age, higher household income, higher incidence of marriage, and higher education attainment. This subset also demonstrated a different ethnic makeup, with a higher proportion reporting ethnicity as White, and fewer identifying as African American, Black, or Hispanic (see Table 7). Attrition characteristics are similar to those from other perinatal studies, such as the EDEN study (Etude sur les déterminants pré et post natals précoces du Développement psychomoteur et de la santé de l'ENfant), the mother-child EDEN cohort study based in France [20].
Participants completing the final assessment showed similar characteristics for number of babies, type of birth, and birth week. Table 8 shows the birthing profile of participants determined during the final assessment.

a: The National Center for Health Statistics (NCHS) reports births by the following age ranges of the mother: under 15, 15-19, and 20-24 years; the BabyCenter study reports births by the mother's age starting at 18 years.

b: N/A: not applicable. The survey instruments in this study permitted respondents to opt out of providing personal information by selecting "Prefer not to answer." NCHS reports characteristics for the entire population.

Data Set Validation
We investigated the factor structure of the psychometric scales and compared these to previously published results. The EPDS measurement of Panel 1 at baseline, despite exclusion of the suicidality item, was similar in structure to published results from the Postpartum Depression: Action Towards Causes and Treatment (PACT) Consortium, with three analogous factors of mood disorder: depressed mood, anxiety, and anhedonia (see Table 9) [21]. The Obsessive-Compulsive Inventory was noted to be remarkably similar in structure to the published version (see Multimedia Appendix 3) [22].

Participant Feedback
After the final assessment, participants were offered the opportunity to provide feedback about their overall experience via a survey. Overall, 61.0% of participants active in the postpartum period (459/752) provided feedback.
Of those who responded to this feedback survey, 98.3% (451/459) were satisfied or very satisfied with their experience participating in the study, 86.7% (398/459) felt the incentives were very fair, 91.5% (420/459) said the number of questions in each survey was the right amount, and 89.5% (411/459) said the number of emails received in relation to the study was the right amount. We note that nonresponse bias in this assessment may be nontrivial, as nonresponders to the feedback survey were less engaged with the study: overall, they completed 18% fewer assessments than responders in the postpartum period.

Overview
In this paper, we showed that it is possible to recruit a large and representative sample of pregnant women into an online panel via the BabyCenter website. We implemented a range of methods to keep participants active and reduce attrition. Our panel provided high-quality data that can now be used to learn new insights into mental health during and shortly after pregnancy.

Lessons Learned
In this study we demonstrated that leveraging digital methods to measure a niche population over a length of time to collect a longitudinal data set is both viable and logical, as digital methods afford the following:

1. Ability to reach a specific population with a digital media partner.
2. Capability to recruit a large convenience sample into an online panel in a short period of time.
3. Capacity to readily adjust recruitment strategies to help construct a more representative panel profile.
4. Tools to automate and optimize otherwise tedious processes when collecting repeated measures (ie, the API).
5. Flexibility to easily introduce additional retention elements as needed.
6. Means to execute longitudinal data collection for the validation of existing knowledge and the advancement of scientific study.
We were able to recruit a large and representative sample of pregnant women into an online panel during a 26-day period.
The key recruitment lessons learned were as follows:

1. Partner with a website that is known to interact with the required population.
2. Adapt the demographic sampling parameters to get a representative population.
3. Use friendly language in the advert's invitation copy that focuses on altruism.
4. Employ email or IP and time stamp validation to reduce duplicate and invalid participants.
5. Offer an initial incentive at enrollment that is fair but not overly generous to encourage legitimate enrollment.
The study duration was as long as 9 to 11 months from early pregnancy. Our online panel captured a baseline survey and one follow-up survey for approximately 80% of respondents and had attrition similar to previous longitudinal panel studies. The methods we used to reduce attrition were as follows:

1. Be transparent at baseline about the study's timing, protocol, and incentives.
2. Offer fair monetary incentives, fulfilled automatically and promptly.
3. Use a sweepstakes to reward sustained participation through the study's end.
4. Personalize invitation timing to each participant and send up to three reminders per survey window.
5. Offer optional text message reminders in addition to email.
6. Share selected top-line results with participants to foster engagement and a sense of community.
7. Send dedicated re-engagement communications to chronic nonresponders.

Limitations
During the recruitment period, although the study invitations served on BabyCenter were randomized, there is no way to determine the characteristics of site visitors who chose not to click on the invitation. This is due to the anonymity of intercepting in a digital environment and online data privacy issues. To address this limitation, extra care was taken to monitor the composition and characteristics of the panel at all stages.
When using a digital-only methodology, without the human-to-human contact that is often part of a clinical study approach with pregnant women, attrition is likely to be problematic. Among participants who did not complete an additional assessment after baseline, attrition was disproportionately concentrated in Panel 1. Recruitment of Panel 1 participants occurred very early in pregnancy, at 4 to 10 weeks, when rates of pregnancy loss and false positives can be as high as 20%. Although we did receive participant-initiated requests to opt out, it is likely that a portion of women who experienced pregnancy loss or false positives did not notify us and did not return to complete another assessment. We had no alternative means to contact these women.
It is also realistic to assume that the incentive for completing the baseline assessment, a US $25 e-gift card, was sufficient reward for some women who chose not to continue in the study. We hypothesize that a smaller reward at enrollment may have extended the period needed to recruit the target number of participants but resulted in higher cooperation rates.
As stated, the study design did not include direct contact between participants and researchers unless an inquiry was initiated by the participant. This was intentional but created another limitation. Without appropriate means to support women who may have expressed an inclination toward self-harm, we chose to exclude the suicidality item from the EPDS scale, confining the measurement and analysis to only 9 of the 10 standard items. We provided links to suicide prevention and mental health resources in the study materials. We do not believe the omission of suicidality measurement hampered achievement of the overall study objective, but it does create an unknowable gap in the data set.
Digital surveys may offer the advantage of increased accuracy with the convenience and anonymity they afford. Results from one perinatal depression study demonstrated that responses submitted by mail showed higher EPDS scores compared to responses collected by phone [23]. Another investigation found that women preferred to complete the EPDS assessment in the more comfortable environment of their own home versus in a clinical setting, in which interacting with a researcher impacted how women responded [24]. Testing this hypothesis was not within the scope of our study.
There are challenges to contextualizing results with other studies. To our knowledge, longitudinal studies from pregnancy to the postpartum period conducted exclusively online have not been published. Comparing a perinatal sample to population studies of different nonmaternal targets is problematic due to the nature of the birth of a child, a pivotal component of attrition.
It is difficult to compare the participation rates of this study to prior perinatal depression research due to the inclusion in our study of women early in pregnancy at 4 to 10 weeks of gestation, and the fact that many other studies were conducted with patients recruited later in their pregnancies in clinical settings. That said, two other population-based longitudinal studies of perinatal depression with similar assessment time frames showed comparable retention rates at about 3 to 4 months postpartum.

Conclusions
Recruiting participants into an online panel from a trusted digital media source and administering a well-designed study exclusively in an online environment can successfully be utilized for scientific research. We approached this study with a focus on maximizing engagement, reducing attrition, and building trust with participants, which resulted, to the best of our knowledge at the time, in the collection of the largest, most comprehensive longitudinal data set to date measuring perinatal mood disorders from early pregnancy.