medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
1
1
Subtyping of common complex diseases and disorders by integrating heterogeneous
2
data. Identifying clusters among women with lower urinary tract symptoms in the LURN
3
study
4
5
Victor P. Andreev 1*, Margaret E. Helmuth1, Gang Liu2, Abigail R. Smith1, Robert M. Merion1,
6
Claire C. Yang3, Anne P. Cameron4, J. Eric Jelovsek 5, Cindy L. Amundsen5, Brian T. Helfand6,
7
Catherine S. Bradley 7, John O. L. DeLancey 4, James W. Griffith8, Alexander P. Glaser6, Brenda
8
W. Gillespie9, J. Quentin Clemens4, H. Henry Lai10, and the LURN Study Group
9
10
1
Arbor Research Collaborative for Health, Ann Arbor, Michigan, United States of America
11
2
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor,
12
Michigan, United States of America
13
3
14
America
15
4
Department of Urology, University of Michigan, Ann Arbor, Michigan, United States of America
16
5
Department of Obstetrics and Gynecology, Duke University, Raleigh, North Carolina, United
17
States of America
18
6
Department of Urology, North Shore University, Evanston, Illinois, United States of America
19
7
Department of Obstetrics and Gynecology, University of Iowa, Iowa City, Iowa, United States
20
of America
21
8
22
States of America
23
9
24
America
25
10
Department of Urology, University of Washington, Seattle, Washington, United States of
Department of Medical Social Sciences, Northwestern University, Chicago, Illinois, United
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of
Department of Surgery, Washington University, St Louis, Missouri, United States of America
26
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
2
27
*Corresponding Author
28
Email: victor.andreev@arborresearch.org
29
30
31
Short Title: Subtyping complex disorders by clustering heterogeneous data
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
3
32
ABSTRACT
33
We present a novel methodology for subtyping of persons with a common clinical symptom
34
complex by integrating heterogeneous continuous and categorical data. We illustrate it by clustering
35
women with lower urinary tract symptoms (LUTS), who represent a heterogeneous cohort with
36
overlapping symptoms and multifactorial etiology. Identifying subtypes within this group would
37
potentially lead to better diagnosis and treatment decision-making. Data collected in the Symptoms of
38
Lower Urinary Tract Dysfunction Research Network (LURN), a multi-center prospective observational
39
cohort study, included self-reported urinary and non-urinary symptoms, bladder diaries, and physical
40
examination data for 545 women. Heterogeneity in these multidimensional data required thorough and
41
non-trivial preprocessing, including scaling by controls and weighting to mitigate data redundancy, while
42
the various data types (continuous and categorical) required novel methodology using a weighted
43
Tanimoto indices approach. Data domains only available on a subset of the cohort were integrated using
44
a semi-supervised clustering approach. Novel contrast criterion for determination of the optimal
45
number of clusters in consensus clustering was introduced and compared with existing criteria.
46
Distinctiveness of the clusters was confirmed by using multiple criteria for cluster quality, and by testing
47
for significantly different variables in pairwise comparisons of the clusters. Cluster dynamics were
48
explored by analyzing longitudinal data at 3- and 12-month follow-up. Five distinct clusters of women
49
with LUTS were identified using the developed methodology. The clinical relevance of the identified
50
clusters is discussed and compared with the current conventional approaches to the evaluation of LUTS
51
patients. Rationale and thought process are described for selection of procedures for data
52
preprocessing, clustering, and cluster evaluation. Suggestions are provided for minimum reporting
53
requirements in publications utilizing clustering methodology with multiple heterogeneous data
54
domains.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
4
55
INTRODUCTION
56
Complex diseases, such as obesity, diabetes, atherosclerosis, Alzheimer’s disease, major
57
depressive disorder, and cancer result from multiple genetic, epigenetic, and environmental factors, and
58
importantly, interactions between these factors [1-3]. Important differences between patients with
59
these complex diseases and disorders may exist at multiple levels, including: (a) symptoms, subjective
60
experiences, and adaptive behaviors; (b) characteristics of the physical state of the organism, including
61
comorbidities; (c) characteristics of organs and systems; and (d) characteristics at the cellular or
62
molecular level. The extent of these differences suggests that some complex diseases and disorders are
63
better represented by subtypes, each of which can potentially have different etiologies, mechanisms,
64
and outcomes, and require different approaches to treatment. Subtype identification is therefore a
65
potentially important aspect to understanding and treating complex diseases and disorders. Specifically,
66
extracting patient characteristics at each level and grouping patients based on these characteristics
67
allows for comprehensive clinical phenotyping and provides necessary information for discovery and
68
implementation of personalized treatments. Characteristics at each level are represented by variables of
69
different types (continuous, categorical, and binary), scales, and different level of relevance or impact
70
for the disease of interest. This data heterogeneity requires thoughtful preprocessing and novel
71
approaches to data integration.
72
Lower urinary tract symptoms (LUTS) is a general term representing a heterogeneous group of
73
symptoms with sometimes unclear etiology, high economic and social costs, and significant effects on
74
patients’ quality of life. LUTS can include frequent urination during day and night (nocturia), urinary
75
urgency (a sudden urge to urinate), stress and urgency urinary incontinence (UI), and bladder emptying
76
symptoms such as straining, hesitancy (delay to start to urinate), weak urine stream, and post-void
77
dribbling. None of the symptoms is pathognomonic for a particular diagnosis, and many persons have
78
more than one symptom. The prevalence of LUTS in the United States (US) ranges between 45% and
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
5
79
70% and increases with age [4-5]. Given the aging population in the US, the prevalence of LUTS is
80
expected to increase in the coming years [6]. Medical expenditures for LUTS have been reported to be
81
as high as $65 billion per year [7], yet many therapeutic options for the treatment of LUTS do not
82
provide long-term symptom relief. Many patients experience a combination of symptoms, so treatment
83
that focuses on a single symptom may result in suboptimal care. To improve treatment outcomes for
84
patients with LUTS, it is necessary to sort out the heterogeneity of this population, increase the
85
understanding of different subtypes of LUTS and their underlying mechanisms.
86
One way to better understand a complex disease or disorder is to use an unbiased, data driven
87
unsupervised clustering approach to identify subtypes of individuals with the disease. This method uses
88
data to identify groups, or clusters, such that members of each cluster are as similar as possible to
89
others within their cluster, but as different as possible from those in other clusters [8]. Subtypes
90
identified in this manner are based on similarities and differences within the data, remaining agnostic to
91
clinical definitions or diagnostic categories.
92
Clustering methodologies have generated important contributions to the analysis of health-
93
related data and are becoming increasingly valuable tools as the field of precision medicine progresses.
94
These methods represent a burgeoning field of research [9-15]. Many unsupervised classification or
95
clustering methods have been developed, from commonly used k-means clustering, hierarchical
96
clustering, and self-organizing maps (SOM) [16-18] to algorithms developed in specific areas for specific
97
applications [19-21]. An important problem in unsupervised clustering is determining the optimal
98
number of clusters. Currently available criteria include Calinski-Harabasz, Davies-Bouldin, Dunn, Gap,
99
Silhouette indices, and others [22-26]; however, the optimal number of clusters may vary by the
100
criterion applied. Therefore, another critical issue is to validate the clustering results, that is, to gain
101
confidence about the clinical significance of the putative clusters, both in terms of cluster numbers and
102
cluster assignments.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
6
103
Disease subtyping by clustering homogeneous “-omics” data of certain types, mostly gene
104
expression data, has been extensively published [27-30]. More recent studies have subtyped complex
105
diseases by clustering heterogeneous data that included patient health questionnaires and other clinical
106
data. These research studies include subtyping asthma by using questionnaires, physiological tests, and
107
lab tests [31]; subtyping type 2 diabetes by using body mass index (BMI), age at onset of diabetes,
108
homoeostasis model assessment estimates, and insulin resistance [32]; and subtyping sepsis using
109
demographics, vital signs, biomarkers of inflammation, and organ dysfunction or injury [33].
110
Resampling based consensus clustering initially proposed for clustering gene expression data
111
[34] is gaining popularity and has been used for subtyping complex diseases using heterogeneous data
112
[33, 35]. Multiple random resampling of patients followed by k-means clustering generates probabilities
113
that each pair of patients appear in the same cluster, which can be treated as a pairwise distance
114
between patients and used for determination of cluster membership through hierarchical clustering
115
[34]. This method ensures the clustering results are robust to sampling errors.
116
Previous studies aiming to identify subtypes of LUTS in an unbiased manner include
117
Epidemiology Urinary Incontinence and Comorbidities (EPIC) and Boston Area Community Health (BACH)
118
projects [36-37], which performed clustering of LUTS patients based on a relatively small number of self-
119
reported symptom data in community-dwelling cohorts. Another study used only BMI and bladder diary
120
variables for clustering community-dwelling women with LUTS [38].
121
The Symptoms of Lower Urinary Tract Dysfunction Research Network (LURN) Observational
122
Cohort Study is a multi-center study that collected self-reported symptoms, 3-day bladder diaries,
123
physical examination, neuroimaging and sensory testing data, and biological samples in over 1000 care-
124
seeking men and women across six tertiary care centers [39-40]. In our previous study [41], we
125
performed clustering of 545 women from the LURN Observational Cohort Study using baseline self-
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
7
126
reported urinary symptom data, captured with the LUTS Tool [42-43] and the American Urological
127
Association Symptom Index (AUA-SI) [44]. Four distinct clusters were identified. Women in cluster F1
128
(n=138) were continent, but reported post-void dribbling, frequency, and voiding symptoms. Cluster F2
129
(n=80) reported urgency urinary incontinence, as well as urinary urgency and frequency, and minimal
130
voiding symptoms. Cluster F3 (n=244) included women reporting all types of urinary incontinence,
131
urgency, frequency, and mild voiding symptoms. Women in cluster F4 (n=83) reported all LUTS at
132
uniformly high levels. These subtypes of LUTS were based solely on the above two questionnaires and
133
require further refinement, followed by clinical verification.
134
The current report describes the methodology and results of refining the female LUTS symptom-
135
based clusters by integrating multiple data domains collected in LURN: demographics, non-urinary
136
symptoms, history, and physical examination data, as well as intake and voiding patterns captured in 3-
137
day bladder diaries. We discuss preprocessing of heterogeneous data and combination of continuous
138
and categorical data using our novel weighted Tanimoto indices approach. We use resampling-based
139
consensus clustering [34] combined with a modified semi-supervised clustering approach [45-46] to
140
make use of data available on only a subset of participants. Then we determine the optimal number of
141
clusters using our novel contrast criterion (CC) developed for consensus clustering, and compare it with
142
other consensus clustering criteria: proportion of ambiguous clustering (PAC) [47], and consensus score
143
(CS) [35], as well as with the established quality of clustering criteria, such as Calinski-Harabasz, Davies-
144
Bouldin, Dunn, and Silhouette [22-26]. We identify distinct clusters of women with LUTS and show
145
superiority of these clusters to our published symptom-based clusters [41], in terms of the percentage
146
of significantly different variables in pairwise comparisons of the clusters and the confidence level in the
147
determined cluster membership. Dynamics of the clusters in 12-month follow-up, as well as clinical
148
relevance of the clusters, are discussed.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
8
149
In the Methods section, we describe the analytical pipeline we developed and used for
150
subtyping women with LUTS. We provide the rationale for our choices of methods for data
151
preprocessing, integration, clustering, and cluster evaluation, as well as review of other available
152
options. We demonstrate that the developed pipeline allowed for identification of distinct and robust
153
refined clusters, with a higher percentage of significantly different variables across the clusters than
154
those previously published, and validate cluster distinctiveness by analyzing biomarker data. Finally, we
155
review the methodological information needed to assess subtyping via clustering and propose a set of
156
reporting requirements that should ideally be included in all clustering reports. We finish by calling for
157
the clustering community effort to develop minimum requirements for clustering publications. We
158
believe this paper would be of interest for clinicians and researchers involved in subtyping of common
159
complex disease and disorders using heterogeneous and multidomain data.
160
MATERIALS AND METHODS
161
LURN data used for subtyping of LUTS
162
Data on women with LUTS
163
Data for LUTS subtyping were obtained from the LURN Observational Cohort Study [39,40],
164
which included 545 women seeking care for LUTS at six tertiary care centers. Baseline data collection
165
included demographic information, medical history, physical examination findings, 3-day bladder diaries
166
[48], and self-report questionnaires of urologic and non-urologic symptoms. Urologic symptoms were
167
collected using the LUTS Tool [42-43] and the AUA-SI [44]. The LUTS Tool contains 44 items, including
168
questions on the frequency of occurrence and degree of bother for each urinary symptom. Possible
169
answers to the LUTS Tool questions were ranked from zero to four, zero indicating absence of the
170
symptom, and four indicating the most severe level of the symptom. The AUA-SI has eight items,
171
including a single overall bother question. Responses to the first seven questions of the AUA-SI range
172
from zero to five, zero indicating “none” or “not at all”, and five indicating “almost always”. The final
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
9
173
question in the AUA-SI ranges from zero (delighted) to six (terrible). Participants also completed patient-
174
reported outcome (PRO) questionnaires from the Patient-Reported Outcomes Measurement
175
Information System (PROMIS). We used questionnaires related to bowel function (PROMIS
176
gastrointestinal constipation, diarrhea, and bowel incontinence subsets) [49], psychological health
177
(PROMIS Depression and Anxiety Short Forms [50], Perceived Stress Scale [51], PROMIS Sleep
178
Disturbance Short Form [52]), urologic pain (Genitourinary Pain Index [GUPI]) [53], and pelvic floor
179
function (Pelvic Floor Distress Inventory [PFDI]) [54]. Demographics included: age, race, ethnicity,
180
employment status, education level, and marital status. Physical examination data included: weight,
181
waist circumference, BMI, post-void residual volume, pelvic organ prolapse (using the Pelvic Organ
182
Prolapse Quantification [POP-Q] system that measures the location of selected landmarks on the vagina
183
and cervix [55]), and presence of pathology findings at the introitus, urethra, vagina, uterus, or rectum.
184
Medical history data included Functional Comorbidity Index (FCI) [56], additional individual
185
comorbidities, as well as information on history of urinary tract infections (UTI), pregnancy, vaginal
186
deliveries, alcohol, smoking, recreational drug use, and medication use. Bladder diaries included data on
187
timing and volume of each beverage intake and urinary void during a 72-hour period. Completeness and
188
accuracy of bladder diaries collected in LURN are described in [57]. Only 193 women (35%) returned
189
bladder diaries deemed complete. For clustering purposes, we used the following five bladder diary
190
variables from those 193 women: number of intakes and voids, total volumes of intakes and voids, and
191
maximum voided volume (serving as a proxy for bladder capacity).
192
In total, 185 variables were used for subtyping women with LUTS; 27 demographic variables, 55
193
medical history variables, 33 physical exam variables, 52 urinary symptoms variables (LUTS Tool and
194
AUA-SI), 13 non-urinary PRO variables, and 5 bladder diary variables. Variables were continuous (n=83)
195
or categorical (n=102). Supplemental Table S1 presents an overview of these variables for 545 women
196
with LUTS used in the analysis. Although all variables were deemed important or possibly important in
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
10
197
subtyping persons with LUTS, not all the variables are of equal importance, relevance, and non-
198
redundancy; therefore, scaling and weighting of the variables was implemented as described below.
199
Data on non-LUTS controls
200
Our preprocessing procedure, described in more detail in the next section, includes scaling of
201
variables by standardizing their values using means and standard deviations (SDs) in non-LUTS controls.
202
The LURN study included 55 control women, who were not necessarily healthy but did not report LUTS.
203
Unfortunately, not all the variables of interest were collected for these non-LUTS controls (e.g., physical
204
examination, bladder diary). As a source of bladder diary data for non-LUTS controls, we used bladder
205
diaries of 32 non-LUTS controls from the Establishing Prevalence of Incontinence (EPI) community study
206
of women in Southeastern Michigan [58]. For other variables of interest not collected for non-LUTS
207
controls, we used population data from literature sources indicated in Supplemental Table S1.
208
Biological samples collected and analyzed in LUTS cases and non-LUTS controls
209
The LURN study collected numerous biological samples, including whole blood, serum, plasma,
210
and urine at baseline and at 3- and 12-month follow-up visits [39-40]. Of these samples, 230 baseline
211
serum samples of women with LUTS and 30 serum samples of non-LUTS controls were analyzed using
212
the targeted proteomics approach – Proximity Extended Assay (PEA) by Olink Proteomics (Uppsala,
213
Sweden). Three Olink panels (cardio metabolic, inflammation, neurology) were used to quantify
214
abundances of 276 proteins. These data were not used for clustering in the current report; however,
215
they served as an additional orthogonal approach for evaluation of the quality of the identified clusters.
216
We compared the abundances of 276 proteins in women with LUTS and in controls and tested for
217
significantly differentially abundant proteins in each of the identified clusters versus controls adjusted
218
for multiple comparison using the false discovery rate (FDR) correction (FDR<0.05) [59]. Note that assays
219
were performed in a subset (n =230, 42%) of women.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
11
220
221
Ethical guidelines and consent
The authors confirm all relevant ethical guidelines have been followed, and all research has
222
been conducted according to the principles expressed in the Declaration of Helsinki. Informed consent
223
has been obtained from participants. Institutional Review Board (IRB) approval has been obtained from:
224
Ethical and Independent Review Services (E&I) IRB, an Association for the Accreditation of Human
225
Research Protection Programs (AAHRPP) Accredited Board, Registration #IRB 00007807.
226
Overview of the clustering pipeline
227
The clustering pipeline implemented in this paper contains multiple steps of data preprocessing,
228
integration, clustering, and cluster evaluation. In this subsection, we present an overview of the
229
sequence of steps in the pipeline shown in Figure 1. Details and rationale for each of the steps are
230
provided in the rest of the subsections of the “Materials and Methods” section below. The pipeline is
231
implemented using MATLAB (MathWorks, Natick, MA) and is publicly available through the Dryad
232
repository (https://datadryad.org).
233
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
12
234
235
Figure 1. Flowchart of the pipeline for subtyping of a common complex disease or disorder by
integrating heterogeneous data as used for subtyping of LUTS.
236
237
238
Data preprocessing
239
Multiple imputation
240
Due to missing data (up to 10% in self-report questionnaires of urologic and non-urologic
241
symptoms), multiple imputation was performed. The imputation used a sequential regression technique
242
and was implemented using IVEware version 2.0 [60-61]. Ten imputed data sets were constructed, and
243
each was preprocessed as described below. The k-means step of the resampling-based consensus
244
clustering was performed on each of the data sets separately, resulting in ten pairwise probabilities of
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
13
245
being in the same cluster for each pair of participants. Finally, hierarchical clustering on the mean of ten
246
probabilities (pairwise distances) was performed to determine cluster membership.
247
Scaling of continuous variables
248
Scaling variables prior to clustering is an important and often overlooked step. Clustering
249
algorithms group objects in a way that minimizes the sum of the pairwise distances between
250
participants within the cluster, where the distance is composed of the distances between all variables
251
calculated using Euclidian, Manhattan, or other suitable metrics [9]. Since each variable is measured
252
using its own scale, the distances and optimal partition of the objects depend on these scales. As stated
253
in [62], the problem with unscaled, unstandardized data is the inconsistency between cluster solutions
254
when the scale of some of the variables is changed, which is a strong argument in favor of
255
standardization. It is especially important in the case of heterogeneous data, where scales of variables in
256
the raw data can be very different and completely unrelated. A common form of conversion of variables
257
to standard scores (or z-scores) entails subtracting the mean and dividing by the SD for each variable.
258
However, subtracting the cohort mean and dividing by the cohort SD would mask the subtype
259
differences, since it ignores whether the within-cohort variance is caused by the natural biological
260
variability of the subjects or by differences in disease subtypes, which increase the within-cohort
261
variance due to multimodal distributions along certain variables. Using z-scores along such variables will
262
unduly reduce their weight and will mask the presence of the subtypes. Therefore, standardization using
263
z-scores is not suitable for our task of identifying disease subtypes. The solution to this problem is
264
standardization by the mean and SD of a reference population that does not have the disease of
265
interest, in this case, controls without LUTS. Following this approach, we define standardized variables
266
𝑆𝑆𝑖𝑖𝑖𝑖 [63]:
267
268
𝑆𝑆𝑖𝑖𝑖𝑖 = (𝐴𝐴𝑖𝑖𝑖𝑖 − 𝐴𝐴𝑖𝑖𝑖𝑖 )/𝜎𝜎𝑖𝑖𝑖𝑖
(1),
where 𝐴𝐴𝑖𝑖𝑖𝑖 –ith unstandardized variable for participant n, 𝐴𝐴𝑖𝑖𝑖𝑖 -mean value of ith variable across control
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
14
269
subjects without disease of interest, 𝜎𝜎𝑖𝑖𝑖𝑖 – SD of ith variable in control subjects without disease of
270
interest. A simulated example illustrating the benefits (substantially lower misclassification error) of
271
clustering using variables standardized according to equation (eq.) 1, versus unstandardized variables
272
and z-scores, is presented in Supplemental Material text.
273
Weighting variables to mitigate the redundancy
274
Clustering results can be skewed by including variables reflecting redundant information. An
275
obvious and extreme case example will be including into the data set the same or highly correlated
276
variables multiple times, which will result in the dominating role of these variables in the overall sum of
277
squared distances, and therefore in the clustering decision. To mitigate this, we used weighting, so that
278
the highest weight was attributed to the least correlated variable (i.e., the variable with the smallest
279
average correlation with all other variables) and the lowest weight to the most correlated variable [41,
280
64]. The weights wi were defined by equations (2-4):
281
wi =
282
ci =
1
1 + ci / cw
m
∑r
ij
j =1, j ≠ i
m
283
cw = ∑
(2),
/(m − 1)
m
∑r
ij
i =1 j =1, j ≠ i
/ m(m − 1)
(3),
(4),
284
where m -number of variables and rij Pearson correlation coefficients of variables i and j .
285
Row normalization for continuous variables
286
Initial attempts to cluster un-normalized urinary symptoms data led to identification of two
287
clusters that differed by overall severity of LUTS, not subtypes of symptoms. To avoid clustering
288
predominantly by the overall severity of LUTS, in [41, 64] and here, we normalized the data by the
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
15
289
participant's overall severity of disease. For each participant, the weighted Euclidean length of the
290
vector composed of all 78 continuous variables used for clustering was calculated as 𝐿𝐿𝑖𝑖 =
291
th
th
th
2
�∑78
𝑖𝑖=1( 𝑤𝑤𝑖𝑖 ∙ 𝑆𝑆𝑖𝑖𝑖𝑖) , where 𝑆𝑆𝑖𝑖𝑖𝑖 is the scaled i variable for the n participant and 𝑤𝑤𝑖𝑖 is the weight of i
293
variable defined by eq. (2-4). Each continuous variable was then normalized by 𝐿𝐿𝑖𝑖 , the Euclidean length
294
strategy allowed for clustering based on the direction rather than the length of the vector representing
295
each subject.
296
Consensus clustering using continuous variables
292
297
of the participant’s vector, resulting in normalized continuous variables 𝑉𝑉𝑖𝑖𝑖𝑖 = 𝑤𝑤𝑖𝑖 ∙
𝑆𝑆𝑖𝑖𝑖𝑖
𝐿𝐿𝑖𝑖
. This normalization
Clustering was performed using a resampling-based consensus clustering method introduced by
298
Monti et al [34]. We performed 1000 instances of random resampling, each selecting a subset including
299
80% of participants. The same procedure was repeated for each of the ten multiply imputed data sets,
300
resulting in 10,000 subsets. We then partitioned each of the subsets into clusters using a k-means
301
clustering algorithm implemented as k-means MATLAB function (with option ‘number of replicates’ =8,
302
see [63] for the explanation on the need of this option); with number of clusters K scanned from 2 to 12.
303
Let 𝑄𝑄𝑖𝑖𝑛𝑛 denote the number of times participants 𝑛𝑛 and 𝑞𝑞 were assigned by k-means into the same
304
305
306
307
308
cluster. Let 𝐼𝐼𝑖𝑖𝑛𝑛 denote the number of times participants 𝑛𝑛 and 𝑞𝑞 were both selected in the random
sampling. The probability of participants 𝑛𝑛 and 𝑞𝑞 belonging to the same cluster could be calculated as
𝑄𝑄𝑖𝑖𝑛𝑛 /𝐼𝐼𝑖𝑖𝑛𝑛. We could thus obtain ten probabilities for a pair of participants from the ten imputed data
sets. The average of these probabilities represented the final consensus index 𝑀𝑀𝑖𝑖𝑛𝑛for participants
𝑛𝑛 and 𝑞𝑞. A 545 by 545 consensus matrix 𝑀𝑀 (Figure 2) was constructed to visualize these average
309
probabilities as a heat map. Probability is color-coded: bright yellow represents probability close to one
310
and dark blue probability close to zero. The indices of participants were reordered so that the
311
participants belonging to the same clusters were grouped together. To reorder the indices of
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
16
312
participants in the consensus matrix, we employed hierarchical clustering (using clustergram MATLAB
313
function) with 1 − 𝑀𝑀 as distance matrix so that participants belonging to the same clusters were
314
315
316
317
grouped together, depicted as bright yellow blocks along the diagonal of consensus matrix.
Figure 2. Consensus matrix. Consensus matrix is presented as a heat map, where the probabilities
𝑀𝑀𝑖𝑖𝑛𝑛for each pair of participants to be in the same cluster are shown by color-coded elements; bright
yellow represents probability close to one and dark blue probability close to zero.
318
319
320
Clustering using categorical variables
321
Of 185 variables used for clustering women with LUTS, 102 (55%) are categorical. K-means is not
322
an appropriate method for categorical variables, so resampling-based consensus clustering with multiple
323
runs of k-means algorithm, which we used for continuous variables, cannot be directly used for
324
categorical variables.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
17
325
K-prototype approach
326
One way to combine continuous and categorical variables is to use the k-prototype algorithm
327
introduced by Z. Huang [65]. According to [65], the distance between two objects 𝑋𝑋, 𝑌𝑌, described by p
328
329
continuous variables and m-p binary variables, is represented as:
𝑝𝑝
𝑑𝑑(𝑋𝑋, 𝑌𝑌) = ∑𝑗𝑗=1(𝑥𝑥𝑗𝑗 − 𝑦𝑦𝑗𝑗 ) 2 + 𝛾𝛾 ∑𝑚𝑚
𝑗𝑗=𝑝𝑝+1 𝛿𝛿(𝑥𝑥𝑗𝑗 , 𝑦𝑦𝑗𝑗 )
(5),
330
where δ is Kronnecker symbol describing simple matching and γ is the weight introduced in [65] to
331
“avoid favoring either type of attribute” (continuous vs. categorical). The goal of the k-prototype
332
algorithm is to minimize the sum of the distances (defined by eq. 5) between objects within the cluster.
333
Limitations of the k-prototype algorithm include lack of scaling of categorical variables and use of the
334
same weight γ for all categorical variables, regardless of their relevance to the disease of interest or
335
their redundancy. A later version of the algorithm [66] attempted to overcome some of these limitations
336
by defining the distance between an object 𝑋𝑋𝑖𝑖 and the centroid 𝑍𝑍𝑘𝑘 of the kth cluster as:
337
338
339
𝑝𝑝
𝑑𝑑(𝑍𝑍𝑘𝑘 ,𝑋𝑋𝑖𝑖 ) = ∑𝑗𝑗=1(𝑥𝑥𝑖𝑖𝑗𝑗 − 𝑍𝑍𝑘𝑘𝑗𝑗 )2 + ∑𝑚𝑚
𝑗𝑗=𝑝𝑝+1 𝜑𝜑 �𝑍𝑍𝑘𝑘𝑗𝑗 ,𝑥𝑥 𝑖𝑖𝑗𝑗 �
(6),
1, 𝑖𝑖𝑖𝑖 𝑍𝑍𝑘𝑘𝑗𝑗 ≠ 𝑥𝑥𝑖𝑖𝑗𝑗
�
𝑖𝑖𝑘𝑘𝑘𝑘𝑘𝑘
�
, and 𝐶𝐶𝑘𝑘 is the number of objects in cluster k, while 𝐶𝐶𝑘𝑘𝑗𝑗𝑘𝑘 is
where 𝜑𝜑 𝑍𝑍𝑘𝑘𝑗𝑗 ,𝑥𝑥 𝑖𝑖𝑗𝑗 = �
, 𝑜𝑜𝑜𝑜ℎ𝑒𝑒𝑒𝑒𝑤𝑤𝑖𝑖𝑒𝑒𝑒𝑒
1−
𝑖𝑖𝑘𝑘
the number of objects in this cluster with the categorical value 𝑎𝑎𝑗𝑗𝑘𝑘 of the jth attribute, e.g., participants
340
with blue eyes in cluster k . Such definition of distance makes sense; for instance, if the number of
341
participants in cluster k is 100, and 51 of them have blue eyes, then for a person with brown eyes,
342
distance from the centroid along this dimension is 1, but for a person with blue eyes, it is 1-0.51=0.49.
343
Now, if 99 participants have blue eyes, then for them, the distance from the centroid is 1-0.99=0.01,
344
while for the only one with brown eyes, it is still 1. Thus, such a definition of distance makes certain
345
attributes more important (defining) for the cluster if the majority of objects have the same value of this
346
attribute. An algorithm using such a definition of distance between objects would strive to make clusters
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
18
347
as homogeneous as possible with regard to both its continuous and categorical variables. This approach,
348
however, does not distinguish between categorical variables relevant and irrelevant to the disease of
349
interest.
350
To distinguish between relevant and irrelevant variables, we suggest using the same approach
351
as for continuous variables, i.e., compare them with controls without the disease or symptom complex
352
of interest. For the categorical variables transformed into binary variables, we suggest scaling by
353
comparison of the frequencies of these binary variables in LUTS 𝐹𝐹𝑗𝑗 and in controls 𝐹𝐹𝑗𝑗𝑖𝑖 by using the
354
following function:
355
𝛾𝛾𝑗𝑗 = �log (
356
357
358
𝐹𝐹𝑘𝑘
𝐹𝐹𝑘𝑘𝑗𝑗
)�
(7),
where |𝑥𝑥| is absolute value of 𝑥𝑥.
If, for a certain binary variable, frequencies in LUTS and non-LUTS controls are equal, e.g.,
prevalence of blue eyes is the same in LUTS and non-LUTS, then this variable will get weight 𝛾𝛾𝑗𝑗 = 0 and
359
would not affect clustering decision, essentially excluding the variable from clustering. However, if the
360
361
frequency of this variable in LUTS is higher or lower than in controls, then 𝛾𝛾𝑗𝑗 > 0, and this, relevant to
362
the variables based on their correlation with other variables described by eq. (2-4), one needs to modify
363
eq. (6):
364
𝑑𝑑(𝑍𝑍𝑘𝑘 ,𝑋𝑋𝑖𝑖 ) = ∑𝑗𝑗=1 𝑤𝑤𝑗𝑗 (𝑥𝑥𝑖𝑖𝑗𝑗 − 𝑍𝑍𝑗𝑗𝑘𝑘)2 + ∑𝑚𝑚
𝑗𝑗=𝑝𝑝+1 𝑤𝑤𝑗𝑗 𝜑𝜑�𝑍𝑍𝑘𝑘𝑗𝑗 ,𝑥𝑥 𝑖𝑖𝑗𝑗 � 𝛾𝛾𝑗𝑗
365
Unfortunately, none of the available implementations of k-prototype algorithm in standard software (R
366
and SAS [67-68]) easily allows for such modification, and therefore, an alternative simpler approach was
367
used.
disease variable, will affect clustering decisions. To accommodate this scaling together with weighting of
𝑝𝑝
(8)
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
19
368
369
Weighted Tanimoto indices approach
A simpler approach to clustering categorical variables is based on Tanimoto indices or Tanimoto
370
similarity measure [69]. For two objects a and b described by m binary variables, Tanimoto similarity is
371
defined as:
∑𝑚𝑚
𝑘𝑘=1 𝑎𝑎𝑘𝑘∙𝑏𝑏 𝑘𝑘
372
𝑇𝑇 =
373
For instance, if a and b are 5-dimensional binary vectors a = [1, 1, 0, 0, 0] and b = [1, 0, 1, 0, 0],
(9)
∑𝑚𝑚
� 2 2
�
𝑘𝑘=1 𝑎𝑎𝑘𝑘 +𝑏𝑏 𝑘𝑘 −𝑎𝑎𝑘𝑘∙𝑏𝑏 𝑘𝑘
374
then T=1/(2+2-1)=1/3. Note that common “ones” but not common “zeros” are counted in this definition
375
of similarity, which is especially useful in case of multiple binary variables formed from one categorical
376
variable. Think, for example, of the categorical variable ‘eye color’ transformed into several binary
377
variables: ‘blue eyes’ (yes, no), ‘brown eyes’ (yes, no), ‘green eyes’ (yes, no), etc. Tanimoto similarity
378
between two persons with blue eyes will not depend on whether you add ‘hazel eyes’ to the list of
379
options or not.
380
Not all of the categorical variables are equally relevant to the disease of interest, so we want to
381
be able to assign weights reflecting the level of relevance for each variable by comparing its frequency in
382
LUTS with its frequency in non-LUTS controls. We also want to compensate for redundancy in the
383
variables by using weights defined by equations (2-4). Importantly, we want to make sure that no
384
categorical variable, even if it is much more frequent in disease than in controls, has overwhelmingly
385
high weight and makes the role of differences in other variables negligible. To attain this goal, we
386
introduce a weighed Tanimoto similarity measure as:
387
388
389
𝑇𝑇 =
2
2
∑𝑚𝑚
𝑘𝑘=1 𝑎𝑎𝑘𝑘∙𝑏𝑏 𝑘𝑘∙𝑤𝑤𝑘𝑘 ∙(erf�𝛾𝛾𝑘𝑘 �)
2
∑𝑚𝑚
� 2 2
� 2
𝑘𝑘=1( 𝑎𝑎𝑘𝑘 +𝑏𝑏𝑘𝑘 −𝑎𝑎𝑘𝑘∙𝑏𝑏 𝑘𝑘 ∙𝑤𝑤𝑘𝑘 ∙�erf�𝛾𝛾𝑘𝑘 �� )
(10),
where 𝑤𝑤𝑗𝑗 is the weight defined by eq. (2-4), using the appropriate correlation coefficients of the
variables. Coefficient 𝛾𝛾𝑗𝑗 is defined by eq. (7) and minimizes the role of binary variables equally prevalent
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
20
390
391
in disease and controls. Function erf(𝑥𝑥) =
2 𝑥𝑥
∫ exp( −𝑜𝑜 2 ) 𝑑𝑑𝑜𝑜
√𝜋𝜋 0
ensures that the weight of each binary
variable is smaller or equal to one, even for the high values of 𝛾𝛾𝑗𝑗, since |erf (𝑥𝑥)| ≤ 1 . Note that
392
maximum value of T is equal to one when all the categorical variables in a and b are the same, and
393
minimum value is zero when all the categorical variables are different. Now we can use eq. (10) to
394
define distance between any pair of participants described by J binary variables as:
395
396
397
𝑑𝑑�𝑋𝑋𝑖𝑖 , 𝑋𝑋𝑛𝑛 � = 1 − 𝑇𝑇𝑖𝑖𝑛𝑛 = 1 −
∑𝐽𝐽𝑘𝑘=1 𝑋𝑋𝑘𝑘𝑖𝑖 ∙𝑋𝑋𝑘𝑘𝑗𝑗∙𝑤𝑤2𝑘𝑘 ∙(erf�𝛾𝛾𝑘𝑘�) 2
2
2 +𝑋𝑋 2 −𝑋𝑋 ∙𝑋𝑋 �∙𝑤𝑤2 ∙�erf�𝛾𝛾 �� )
∑𝐽𝐽 ( �𝑋𝑋𝑘𝑘𝑖𝑖
𝑘𝑘𝑖𝑖 𝑘𝑘𝑗𝑗
𝑘𝑘
𝑘𝑘𝑗𝑗
𝑘𝑘
𝑘𝑘=1
(11)
Combining continuous and categorical variables
Note the similarity of the pairwise distance 1 − 𝑇𝑇𝑖𝑖𝑛𝑛 between two participants based on the
399
categorical variables describing them (eq.11), and pairwise distance 1 − 𝑀𝑀𝑖𝑖𝑛𝑛 between these
400
variables describing these two participants are the same, while the latter is equal to zero when the two
401
participants always were assigned to the same cluster by the 10,000 instances of k-means in the
402
resampling-based consensus clustering. Similarly, the former is equal to one when all the categorical
403
variables describing the two participants are different, while latter is equal to one when these
404
participants always were assigned to the different clusters in resampling-based consensus clustering.
405
This similarity allows combining continuous and categorical pairwise distances into a single distance
406
measure 𝐷𝐷𝑖𝑖𝑛𝑛, with a minimum value of zero and maximum value of one using the weighted Euclidean
398
407
408
409
410
participants based on their continuous variables. The former is equal to zero when all the categorical
length approach:
2
𝐷𝐷𝑖𝑖𝑛𝑛 = �((1 − 𝑀𝑀𝑖𝑖𝑛𝑛 )2 + 𝜇𝜇 2 ∙ �1 − 𝑇𝑇𝑖𝑖𝑛𝑛 � )�(1 + 𝜇𝜇 2 )
(12),
where 𝜇𝜇 is the weight representing the relative role of the distances based on categorical variables and
on continuous variables. We set it equal to the ratio of the number of non-redundant categorical and
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
21
411
∑78
continuous variables: 𝜇𝜇 = ∑102
𝑗𝑗=1 𝑤𝑤𝑗𝑗𝑗𝑗𝑎𝑎𝑗𝑗� 𝑗𝑗=1 𝑤𝑤𝑗𝑗𝑗𝑗𝑗𝑗𝑖𝑖𝑗𝑗 , where 𝑤𝑤𝑗𝑗 are determined by eq. (2-4) using
412
correlation coefficients appropriate to the distributions of the variables.
413
Semi-supervised clustering using bladder diary data
414
Of 545 women in the Observational Cohort of LURN, 193 (35%) returned complete bladder
415
diaries without missing volumes of voids and intakes. These data were deemed clinically important to
416
the subtyping of women with LUTS. One way to integrate data domains only available on a subset is by
417
using semi-supervised clustering methods [70-71]. If class membership for the members of the subset is
418
known, then the objective function in the clustering of the whole cohort should be modified as follows:
419
420
𝑝𝑝
2
∑𝑁𝑁𝑁𝑁𝑘𝑘 ∑𝑁𝑁𝑁𝑁𝑘𝑘
∑𝑁𝑁𝑘𝑘
𝑊𝑊𝐶𝐶𝑆𝑆𝑆𝑆 = ∑𝑗𝑗=1 ∑𝑁𝑁𝑘𝑘
𝑖𝑖=1 𝑛𝑛=1(𝑥𝑥𝑖𝑖𝑗𝑗 − 𝑥𝑥𝑛𝑛𝑗𝑗 ) + 𝑖𝑖=1 𝑛𝑛=1 ℎ 𝑖𝑖𝑛𝑛
(13),
where 𝑊𝑊𝐶𝐶𝑆𝑆𝑆𝑆 - within cluster sum of squared distances, p – number of variables, 𝑛𝑛 ≠ 𝑞𝑞, Nk - number of
421
cohort participants in the given cluster k, Nsk - number of the participants in cluster k that were present
422
in the subset. The values of ℎ 𝑖𝑖𝑛𝑛 = −ℎ are negative (reward, decreasing the within-cluster-sum-of-
423
424
squares [WCSS]) if participants n and m belong to the same cluster according to subset classification,
and is positive ℎ 𝑖𝑖𝑛𝑛 = ℎ (punishment, increasing WCSS) if participants n and q belong to the different
425
classes of the subset. This approach known as “must-link, cannot-link” allows for using labels known
426
from classification of the subset to influence clustering of the cohort. The limitation of this approach,
427
however, is that it does not allow for different level of confidence in subset cluster membership, i.e.,
428
participants n and q are either in the same subset cluster or not. In our case, additional data available
429
for the subset of participants do not provide 100% confidence in cluster membership for this subset;
430
furthermore, the number of participants in the subset is lower than in the whole cohort, making the
431
subset clusters less robust. There is a measure of similarity, however, that is quantitative and reflects
432
the confidence in cluster membership; it is the pairwise distance between members of the subset. We
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
22
433
suggest using this measure to modify the pairwise distance defined by eq. 12 by taking into account the
434
similarity between members of the bladder diary (BD) subset:
435
𝐺𝐺𝑖𝑖𝑛𝑛 = max ((𝐷𝐷𝑖𝑖𝑛𝑛 + 𝜌𝜌 ∙ (𝐵𝐵𝐷𝐷𝑖𝑖𝑛𝑛 − 𝑚𝑚𝐵𝐵𝐷𝐷)), 0)
(14)
437
where 𝐷𝐷𝑖𝑖𝑛𝑛 is defined by eq. 12, 𝐵𝐵𝐷𝐷𝑖𝑖𝑛𝑛is the pairwise distance between members of the subset, 𝑚𝑚𝐵𝐵𝐷𝐷 is
438
in the below sections. If either participant n or q, or both of them are not members of the subset, their
439
pairwise distance is not known and is assumed equal to the mean pairwise distance within the subset
440
𝑚𝑚𝐵𝐵𝐷𝐷. For these participants, the second term of eq. 14 is equal to zero; while, for members of the
436
the mean pairwise distance between members of the subset, 𝜌𝜌 is a parameter determined as described
442
subset, it is either negative or positive depending on whether 𝐵𝐵𝐷𝐷𝑖𝑖𝑛𝑛 is smaller or larger than 𝑚𝑚𝐵𝐵𝐷𝐷. Note
443
that, since the second term might be negative, for some large values of 𝜌𝜌, the sum of the two terms is
negative as well. However, the pairwise distance between the objects cannot be negative, and is
444
therefore set to zero for these cases.
441
445
We used five variables (number of intakes and voids, total volumes of intakes and voids, and
446
maximum voided volume) from the bladder diaries of the 193 participants to refine the values of their
447
pairwise distances. These five variables were scaled using bladder diary data for controls [58] and then
448
added to the 78 other continuous variables to calculate pairwise distances 𝐵𝐵𝐷𝐷𝑖𝑖𝑛𝑛 refined with bladder
449
450
451
diary variables, as described in the subsection on consensus clustering using continuous variables. Then
it was introduced into eq. 14 to get the refined matrix of the pairwise distances 𝐺𝐺𝑖𝑖𝑛𝑛 . The value of the
coefficient 𝜌𝜌 was determined by optimizing the quality of clusters, as defined in the following
452
subsections.
453
Determining the number of clusters
454
455
Determining the number of clusters is an important step in any clustering process. In
partitioning algorithms like k-means, it is necessary to decide on the number of clusters K prior to
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
23
456
running the algorithm. In agglomerative algorithms like hierarchical clustering, it is possible to decide on
457
the number of clusters when the dendrogram based on the distances between objects is already
458
created. It is common to try several values of K and then to compare the resultant clusters by using
459
various quality of clustering criteria, including Calinski-Harabasz, Davies-Bouldin, Dunn, Gap, and
460
Silhouette indices [22-26]. Quite often, these criteria disagree on the value of K that optimizes the
461
quality of the clusters.
462
Resampling-based consensus clustering, introduced in [34] and subsequently applied to our data
463
set, is a combination of multiple instances of k-means clustering followed by hierarchical clustering on
464
the pairwise distances between objects derived at the first stage. The value of K in k-means is typically
465
scanned (in our case from 2 to 12), and then the optimal value of K is determined using criteria
466
developed specifically for consensus clustering algorithm [34-35, 47]. In both [34] and [47], the
467
determination of the number of clusters is based on analysis of the cumulative distribution function
468
(CDF) of consensus index values 𝑀𝑀𝑖𝑖𝑛𝑛 (defined in the “Consensus clustering using continuous variables”
469
subsection of this paper). The main idea of this analysis is that, in case of ideal clustering, there are only
470
two possible values of consensus index 𝑀𝑀𝑖𝑖𝑛𝑛 = 1, when a pair of objects n and q are in the same cluster,
471
and 𝑀𝑀𝑖𝑖𝑛𝑛 = 0, when they are in the different clusters. Therefore, the ideal CDF should consist of two
472
vertical lines and a horizontal (flat) line between them. The length of the first vertical line will be equal
473
to the number of pairs with 𝑀𝑀𝑖𝑖𝑛𝑛 = 0 and the length of the second vertical line to the number of pairs
474
with 𝑀𝑀𝑖𝑖𝑛𝑛 = 1. However, if the value of K used in clustering is different from the true value of K, then the
475
shape of the CDF curve differs from the idealized curve described above. In [34], the optimal K is defined
476
as the one for which the change in the area under the CDF (relative to area at K-1 and K+1) is the largest
477
(“elbow” of the AUC vs. K curve). Analysis in [47] demonstrates several examples when the criterion of
478
[34] does not work and suggests an alternative criterion named proportion of ambiguously clustered
479
pairs (PAC) equal to the number of pairs with 0 < 𝑀𝑀𝑖𝑖𝑛𝑛 < 1 over the total number of pairs. Obviously, in
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
24
480
the real-world situation of biological variability and noisy measurements, almost all of the pairs will fall
481
482
into this category, so some more liberal lower and upper boundaries for 𝑀𝑀𝑖𝑖𝑛𝑛 need to be introduced,
483
for different values of K, since potential ambiguity increases with the increased number of clusters.
484
e.g., 0.1 and 0.9, as in [47]. It raises a question, however, whether these boundaries should be different
A more straightforward approach to determine the number of clusters is used in [35] by
486
introducing mean consensus score (CS) calculated as the mean value of consensus indices 𝑀𝑀𝑖𝑖𝑛𝑛 within
487
that it favors the high number of clusters, e.g., CS could reach its maximum when K=N/2 and each
488
cluster contains only a pair of most similar objects with highest values of 𝑀𝑀𝑖𝑖𝑛𝑛 .
485
489
the clusters. The value of K that results in the highest CS is considered optimal. The problem with CS is
Below, we introduce a pair of complementary criteria, i.e., contrast criterion (CC) and
490
proportion of the core clusters (PCCs), which we used in our clustering pipeline. We believe they
491
combine the advantages of PAC and CS and are free of some of their limitations, especially when used as
492
a complementary pair. The below sections introduce the CC and PCCs based on the analysis of
493
consensus matrix derived by k-means clustering of the multiple resampling instances of the data set;
494
however, these criteria can be used with any matrix of pairwise distances between the objects.
495
Contrast criterion
496
The idea of the CC is derived from visual representation of the consensus matrix as a heat map
498
presented in Figure 2. Each pixel of the heat map represents the value of 𝑀𝑀𝑖𝑖𝑛𝑛 probability of two objects
499
a cluster. Next, we compared the “bright yellowness” of this diagonal square with the “color” of the rest
500
of the row in which the diagonal square is located; therefore, the term “contrast criterion” (CC). The
501
larger the difference between these two measures, the further the situation from the case where all
502
𝑀𝑀𝑖𝑖𝑛𝑛 except for n=q are equal, and the heat map is uniformly yellowish (single uniform cluster case), in
497
to be together in the same cluster. Each bright yellow square along the diagonal of the matrix represents
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
25
503
which case the contrast is zero. We consider number of clusters K and cluster membership optimal
504
when CC is maximized.
505
When analyzing the properties and behavior of clustering criteria, it is necessary to compare the
506
clusters identified using certain clustering algorithms and clustering criteria of interest with the “true”
507
clusters. Unfortunately, “true” clusters are not known in real life, and therefore, one needs to simulate
508
them and then evaluate misclassification error resulting from the clustering algorithm and criteria of
509
interest. Such an approach was used in [63] to compare several popular clustering algorithms, and was
510
applied for the case of resampling-based consensus clustering with CC and PAC criteria (see
511
Supplemental Material text and Supplemental Figures S2-S10). Here, we concentrate on the general
512
analysis of CC and its properties. In this analysis, we need to introduce the term “alleged clusters”, which
513
are different from “true clusters” and “identified clusters”. “Alleged clusters” are determined for each
514
value of K tried by clustering algorithm, while “identified clusters” are those maximizing the value of
515
clustering criteria of interest, and “true clusters” are specified by the simulation.
516
To define CC explicitly, let us first look at the most typical case where the number of alleged
517
clusters is not equal to the number of objects, is not equal to one, and none of the clusters includes just
518
one object. Note that we are not making any assumptions or imposing any restrictions on the properties
519
of the “true” clusters.
520
For 𝐾𝐾 ≠ 𝑁𝑁, 𝐾𝐾 ≠ 1, 𝑁𝑁𝑗𝑗 ≠ 1, 𝐶𝐶𝐶𝐶 = ∑𝐾𝐾
𝑘𝑘=1 �
521
522
where K is the number of alleged clusters, N is the number of the clustered objects, 𝑁𝑁𝑘𝑘 and 𝑁𝑁𝑖𝑖 are the
523
represents the average “bright yellowness” of the diagonal square, while the second term in the
𝑁𝑁
𝑁𝑁
𝑘𝑘 ∑ 𝑘𝑘
∑𝑖𝑖=1
𝑗𝑗=1,𝑗𝑗≠𝑖𝑖 𝑀𝑀𝑖𝑖𝑗𝑗
𝑁𝑁𝑘𝑘 ( 𝑁𝑁𝑘𝑘−1)
−
𝑁𝑁
𝑁𝑁
𝑘𝑘
𝑖𝑖
∑𝐾𝐾
𝑖𝑖=1,𝑖𝑖≠𝑘𝑘∑𝑖𝑖=1 ∑𝑗𝑗=1 𝑀𝑀𝑖𝑖𝑗𝑗
𝑁𝑁𝑘𝑘 (𝑁𝑁−𝑁𝑁𝑘𝑘 )
��𝐾𝐾 (15),
numbers of objects in kth and ith clusters. As intended, the first term in the brackets of eq. (15)
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
26
524
brackets represents the average “yellowness” of the rest of the row in which the diagonal square is
525
located.
526
Note that we defined CC as an averaged contrast across alleged K clusters independently of the
527
size of the clusters. Other approaches are possible, e.g., a weighted average based on the sizes of the
528
clusters, or minimax approach, where the contrast for the “worst” (least contrast cluster) is maximized.
529
Clearly, the best choice of combining contrasts of each of the clusters into one overall depends on many
530
factors, including the goal of the clustering, expected sizes of the clusters, and the number and
531
distribution of the variables. It is an interesting topic; however, it is outside of the scope of this paper.
Another choice made in defining CC by eq. (15) is omitting of the diagonal terms 𝑀𝑀𝑖𝑖𝑖𝑖, which are
532
533
always equal to one. The percentage of diagonal terms in each square representing an identified cluster
534
is
535
terms in the definition of CC would favor smaller clusters over larger clusters by assigning higher CC to
536
the smaller clusters, irrespective of similarities of objects within the clusters. Therefore, we do not
537
include diagonal terms in the definition of CC, as shown in eq. 15. Note that eq. (15) does not work if
538
K=1, K=N, or 𝑁𝑁𝑗𝑗 = 1. For these special cases, CC is defined and discussed in Supplemental Material text.
539
𝑁𝑁𝑘𝑘
𝑁𝑁𝑘𝑘2
= 𝑁𝑁𝑘𝑘−1, or 100%, 50%, 33%, 25% for 𝑁𝑁𝑘𝑘 = 1, 2, 3, 4. Since 𝑀𝑀𝑖𝑖𝑖𝑖 = 1, 𝑀𝑀𝑖𝑖𝑛𝑛 < 1, inclusion of diagonal
There are certain similarities between our contrast criterion CC and consensus score (CS) of [35].
540
Although the exact definition of CS is not provided, it appears that CS is similar to the first term of eq.
541
542
(15). However, it is unclear if the diagonal terms 𝑀𝑀𝑖𝑖𝑖𝑖 are omitted or included and how CS for K clusters
543
CC in some simple idealized cases. In case of ideal clustering, both CS and CC reach their maximum
544
possible value of 1. The worst-case scenario for both criteria is the case where K>1 clusters are alleged,
545
when in reality, there are no “true” clusters, and all objects are described by a unimodal random
546
distribution of their attributes (variables). Now, if the number of alleged clusters K >1, there is an equal
are combined to derive the overall CS. Nevertheless, it is of interest to compare the behavior of CS and
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
27
547
probability for the object to end up in any of K clusters, so the mean value of 𝑀𝑀𝑖𝑖𝑛𝑛 is 1/K, which makes
548
the minimal possible value for CC equal to zero and minimal possible value for CS equal to 1/K. The
549
rather limited range of values from 1/K to 1, together with the dependence of the minimal value on the
550
number of alleged clusters, are the limitations of the CS criterion, which are absent in case of CC, where
551
the range is from 0 to 1 irrespectively of K. Therefore, we consider the use of CC advantageous for
552
determining the optimal number of clusters.
553
Core clusters
554
When analyzing the quality of alleged clusters, it is important to know the confidence with
555
which cluster membership is determined. Clearly, one would prefer clusters where the probability of
556
objects to belong to a particular cluster is 0.9 rather than 0.3. The knowledge of consensus matrix allows
557
for calculating the probability for each object n belonging to a particular cluster k:
558
𝜋𝜋𝑖𝑖𝑘𝑘 =
𝑁𝑁
𝑘𝑘
∑𝑗𝑗=1
𝑗𝑗≠𝑖𝑖 𝑀𝑀𝑖𝑖𝑗𝑗
𝑁𝑁𝑘𝑘 −1
(16)
559
Note that it is different from the probability averaged across the cluster that was used to calculate CS
560
and CC in the previous subsection. Importantly, within the same cluster, some objects might have
561
562
probability (confidence) as high as 𝜋𝜋𝑖𝑖𝑘𝑘=0.9999... , or as low as 𝜋𝜋𝑖𝑖𝑘𝑘 = 1/𝐾𝐾+0.001, assuming that it is
563
lower for any other cluster 𝑖𝑖 ≠ 𝑘𝑘. We will call the nth object the core member of cluster k if 𝜋𝜋𝑖𝑖𝑘𝑘 > 0.5,
which means that, for this object, the probability to be in cluster k is higher than probability to be in all
564
other clusters combined. The rest of the members of the cluster do not belong to the core; for them, the
565
probability to belong to the kth cluster is just higher than to be in any other given cluster. The number of
566
core members divided by the total number of objects in the cluster provides a useful measure that we
567
named proportion of the core cluster (PCC). As with the contrast criterion CC, one can use several
568
approaches to derive overall PCC from the PCCs for each cluster, i.e., take the average across all clusters,
569
weighted average based on the size of the clusters, or minimax by looking at the PCC in the worst
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
28
570
cluster. PCC provides a measure of overall confidence in the alleged cluster membership and of the
571
uniformity of the alleged clusters. Unlike contrast CC, PCC favors smaller size of the clusters and reaches
572
its maximum when K=N/2 and each cluster contains just a pair of objects. PCC reveals information
573
similar to the proportion of ambiguously clustered pairs (PAC) [47], for which 0.1 < 𝑀𝑀𝑖𝑖𝑛𝑛 < 0.9;
574
however, PCC is easier to interpret and does not include unjustified upper and lower boundaries 0.1 and
575
0.9.
576
Using CC and PCC to determine optimal number of clusters K
577
578
We used a combination of CC and PCC to determine the optimal number of clusters K. Note that
eq. 14 contains undefined coefficient 𝜌𝜌. Therefore, we have two parameters 𝜌𝜌 and K to determine and
580
two criteria to meet. We determined 𝜌𝜌 and K as the values maximizing CC and corresponding to an
581
to 1.2 using the single-program multiple data sets (spmd) function of MATLAB Parallel Computing
582
583
Toolbox; K values were scanned from K=2 to K=12. The determined optimal values were 𝜌𝜌 =0.3 and K=5.
584
presented in Figure 3. As seen in Figure 3, contrast criterion CC and percent of core cluster members
585
PCC both have maxima for number of clusters K=5. As shown in Table 1, other quality of clustering
586
criteria also confirm K=5 as an optimal number of clusters for our cohort of women with LUTS.
579
587
elbow (point of diminishing returns) for PCC. A clustering procedure was run for 24 values of 𝜌𝜌 from 0.05
The resultant consensus matrix together with the values of CC and PCC for K scanned from 2 to 12 are
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
29
588
589
590
Figure 3. Determination of the optimal number of refined clusters. (A) Consensus matrix heat map
demonstrates five clusters of participants (named W1-W5) grouped together based on the pairwise
distances G nq (eq 14). (B) CC eq. 15 for K=2,…,12. (C) PCC eq. 16. Both CC and PCC have maxima at K=5.
591
592
Table 1. Other quality of clustering criteria, confirming K=5 as an optimal number of clusters.
K=2
Calinski-Harabasz (↑)
K=3
K=4
K=5
K=6
K=7
K=8
K=9
K=10
1382.26 346.13 676.29 1126.25 567.83 398.02 309.93 234.52 180.57
Davies-Bouldin (↓)
0.4328 0.8175 0.6045
0.4720 0.5692 0.9178 1.0809 1.1498 1.1678
Dunn (↑)
0.0167 0.0072 0.0290
0.0485 0.0496 0.0455 0.0273 0.0754 0.0590
Point-Biserial (↓)
-4.801
Silhouette (↑)
0.6896 0.4565 0.5731
-2.728
-3.25
-3.849 -2.515
-2.081 -1.855
-1.621 -1.409
0.6857 0.5955 0.4821 0.3723 0.3143
NaN
593
594
Evaluation of the quality of the identified clusters
595
We used multistep procedure to evaluate the quality of identified clusters and compare with the
596
potential alternative clusters. The first step was examination of CC and PCC for each of K clusters (Figure
597
3). Next, we calculated other established quality of clustering criteria, including Calinski, Davies-Bouldin,
598
Dunn, Point-Biserial, Silhouette [22-26], etc. (Table 1). Then, we performed pairwise comparison of the
599
clusters using Wilcoxon rank sum tests and chi square tests, where appropriate, to determine which
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
30
600
variables used for clustering were significantly different in the pairwise comparison. All pairwise
601
comparisons were adjusted using an FDR correction for multi-testing [59].
602
Visualization of the results
603
We used several tools to visualize the results of our analyses. Heat maps representing the
604
consensus matrices were generated using the clustergram MATLAB function. Properties of the identified
605
clusters were illustrated by radar plots, built-in using SAS statistical graphics panel (sgpanel) scatter,
606
polygon, and vector procedures. Comparison of cluster membership in the refined clusters versus
607
previously identified symptom-based clusters was performed using Sankey diagrams, built with the
608
googleVis package.
609
RESULTS AND DISCUSSION
610
Description of the clusters
611
Five distinct clusters of women with LUTS were identified by clustering 545 participants of the
612
LURN Observational Cohort Study using 185 variables. We call these clusters W1-W5, in order to
613
distinguish them from clusters F1-F4, described previously [41]. Demographic data for each cluster are
614
presented in Table 2. Some demographic characteristics were different across the clusters, including
615
age, ethnicity, menopausal status, obesity, prevalence of hysterectomy, percentage of participants with
616
at least one vaginal birth, education level, and employment status. No significant differences across the
617
clusters were observed for race, smoking, and alcohol use.
618
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
31
619
Table 2. Demographic Data for clusters W1-W5.
N
Age (median IQR)
W1
W2
W3
W4
W5
77
64
144
95
165
53 (36-60)
66 (51-71)
60 (50-70)
51 (42-63)
59 (51-67)
P-Value
<0.0001
0.221
Race
White
66 (86%)
60 (94%)
116 (81%)
80 (84%)
130 (79%)
Black
8 (10%)
2 (3%)
22 (15%)
8 (8%)
26 (16%)
Asian
3 (4%)
2 (3%)
3 (2%)
5 (5%)
3 (2%)
Other
1 (1%)
2 (3%)
3 (2%)
3 (3%)
6 (4%)
0.016
Ethnicity
74 (96%)
57 (89%)
138 (96%)
84 (88%)
159 (96%)
Hispanic or Latino
1 (1%)
5 (8%)
3 (2%)
10 (11%)
2 (1%)
Unknown
2 (3%)
2 (3%)
3 (2%)
1 (1%)
4 (2%)
Obese
23 (30%)
15 (23%)
72 (50%)
38 (40%)
99 (60%)
<0.0001
Post-menopausal
35 (46%)
46 (73%)
102 (71%)
47 (49%)
117 (72%)
<0.0001
Had a hysterectomy
17 (22%)
9 (14%)
49 (34%)
25 (26%)
64 (39%)
0.0013
At least one vaginal
birth
35 (45%)
60 (94%)
91 (63%)
75 (79%)
128 (78%)
<0.0001
Non-Hispanic/Latino
Alcoholic drinks per
week
0.0706
Never
10 (13%)
5(7%)
20 (14%)
15 (17%)
40 (25%)
0-3 drinks per week
55 (72%)
46 (73%)
89 (62%)
65 (68%)
100 (61%)
4-7 drinks per week
8 (11%)
9 (14%)
21 (15%)
13 (14%)
15 (9%)
0 (0%)
3 (4%)
8 (6%)
2 (2%)
5 (3%)
More than 7 drinks per
week
0.1746
Smoking status
Never smoker
50 (67%)
43 (68%)
99 (69%)
64 (67%)
92 (56%)
Former smoker
21 (28%)
19 (30%)
33 (23%)
28 (29%)
52 (32%)
Current smoker
3 (4%)
1 (2%)
10 (7%)
3 (3%)
18 (11%)
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
32
0.0058
Education
Less than Associate
degree
40 (30%)
24 (30%)
89 (30%)
27 (28%)
76 (47%)
Associates or Bachelor’s
degree
32 (43%)
25 (40%)
58 (41%)
45 (47%)
61 (38%)
Graduate degree
23 (31%)
19 (30%)
41 (29%)
23 (24%)
24 (15%)
0.0350
Employment
Full-time
35 (47%)
27 (42%)
50 (35%)
44 (46%)
51 (31%)
Part-time
13 (17%)
5 (8%)
20 (14%)
17 (18%)
21 (13%)
Unemployed (looking or
not looking for work)
27 (36%)
32 (50%)
73 (51%)
34 (36%)
90 (56%)
620
621
Importantly, all urinary symptoms and many other clinical variables were significantly different
622
across the clusters. Table 3 presents the comparison of urinary symptoms (collected with LUTS Tool and
623
AUA-SI), bladder diary variables, and other 37 significantly different clinical variables across clusters W1-
624
W5. For clarity, we describe these significantly different variables while discussing signatures of the
625
clusters in the text following Table 3. Table 2, and especially Table 3, illustrate the distinctiveness of the
626
identified five clusters of women with LUTS.
627
628
Table 3. Urinary symptoms, bladder diary variables, non-urinary symptoms, and clinical variables
across clusters W1-W5.
Cluster Cluster Cluster Cluster Cluster
P-Value
W1
W2
W3
W4
W5
77
64
144
95
165
Frequency
1.92
1.72
2.65
1.67
2.70
<0.001
Daytime
frequency
1.67
1.44
2.09
1.37
2.13
<0.001
Nocturia
1.85
1.16
1.89
1.22
2.06
<0.001
N
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
33
Incomplete
emptying
1.76
1.16
1.20
0.86
2.37
<0.001
Trickle/dribble
1.69
1.07
1.28
1.11
2.56
<0.001
Urgency
1.10
1.47
2.88
1.76
2.73
<0.001
Hesitancy
1.18
0.72
0.34
0.27
1.26
<0.001
Intermittency
1.12
0.73
0.32
0.28
1.28
<0.001
Straining
1.04
0.49
0.16
0.11
1.09
<0.001
Weak stream
0.84
0.79
0.43
0.22
1.52
<0.001
Spraying
0.87
0.60
0.46
0.45
1.55
<0.001
Urgency with
fear of leakage
0.56
1.34
2.85
2.05
2.69
<0.001
Bladder pain
0.94
0.50
0.34
0.34
1.42
<0.001
Burning with
urination
0.44
0.24
0.10
0.12
0.72
<0.001
Urinary
incontinence(UI)
0.59
1.24
2.39
2.81
2.96
<0.001
Post-void UI
0.74
0.65
0.71
1.44
2.31
<0.001
Urgency UI
0.37
1.06
2.61
2.00
2.71
<0.001
Stress UI
(laughter)
0.55
1.03
1.26
2.86
2.57
<0.001
Stress UI
(exercise)
0.46
0.78
0.90
2.94
2.35
<0.001
UI with sleep
0.10
0.35
0.73
0.88
1.64
<0.001
UI with sex
0.16
0.33
0.12
0.75
0.80
<0.001
UI no reason
0.21
0.56
0.89
1.66
2.13
<0.001
Nocturia (AUASI)
2.37
1.77
2.18
1.57
2.37
<0.001
Frequency
(AUA-SI)
3.15
2.25
2.96
2.26
3.13
<0.001
Intermittency
(AUA-SI)
1.74
1.12
0.53
0.41
1.95
<0.001
Weak stream
(AUA-SI)
1.25
1.12
0.60
0.36
2.09
<0.001
Straining (AUASI)
1.44
0.57
0.12
0.16
1.08
<0.001
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
34
Incomplete
emptying (AUASI)
2.18
1.28
1.04
0.79
2.67
<0.001
Urgency (AUASI)
1.59
2.17
3.63
2.18
3.53
<0.001
QOL (AUA-SI)
3.68
3.57
4.46
4.66
4.88
<0.001
75.58
75.47
82.62
80.58
87.64
<0.001
Waist circumference (Cm) 94.74
97.13
100.59
95.58
106.35
<0.001
131.00
130.92
124.29
130.17
<0.001
75.04
41.29
29.09
39.94
0.012
Weight (Kg)
Systolic blood pressure
122.42
Post-void residual volume 55.43
(mL)
POP-Q: Ba result
-2.65
1.24
-2.23
-1.78
-1.86
<0.001
POP-Q C result
-6.85
-2.38
-6.52
-6.54
-5.85
<0.001
POP-Q D result
-7.96
-4.43
-7.79
-7.42
-6.79
<0.001
Number of pregnancies
1.80
2.90
2.01
2.48
3.07
<0.001
Number of vaginal births
0.92
2.43
1.47
1.72
2.02
<0.001
Functional Comorbidity
Index Total
1.66
1.74
2.48
1.24
3.64
<0.001
GUPI pain
5.09
4.15
3.70
2.23
6.84
<0.001
GUPI urine
4.71
3.40
3.67
2.49
5.67
<0.001
GUPI QOL
5.78
5.68
6.85
7.60
8.70
<0.001
POPDI-6
14.50
20.60
7.57
7.25
30.22
<0.001
Colorectal-anal distress
inventory (CRADI-8)
12.98
18.03
14.76
10.99
33.50
<0.001
Urinary distress inventory 24.42
(UDI-6)
25.44
36.27
39.58
63.67
<0.001
Perceived stress scale
12.09
9.29
11.89
11.29
16.44
<0.001
PROMIS constipation Tscore
50.30
48.68
49.00
49.89
55.80
<0.001
PROMIS depression Tscore
48.91
44.63
48.67
46.61
53.93
<0.001
PROMIS anxiety T-score
49.84
46.78
48.61
48.11
54.75
<0.001
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
35
PROMIS sleep
disturbance T-score
53.48
49.08
52.17
51.52
56.83
<0.001
PROMIS diarrhea T-score
45.70
46.37
48.43
45.17
53.98
<0.001
PROMIS physical
functioning T-score
50.44
49.85
46.84
53.19
42.24
<0.001
Arthritis diagnosis
23 (30%)
29 (46%)
64 (45%)
20 (21%)
94 (58%)
<0.001
Asthma diagnosis
12 (16%)
7 (11%)
29 (20%)
7 (7%)
49 (30%)
<0.001
Chronic obstructive
pulmonary disease
(COPD) diagnosis
1 (1%)
5 (8%)
6 (4%)
1 (1%)
19 (12%)
0.001
Diabetes diagnosis
4 (5%)
8 (13%)
22 (15%)
5 (5%)
36 (22%)
0.004
Upper gastrointestinal
disease diagnosis
19 (25%)
15 (24%)
34 (24%)
12 (13%)
71 (44%)
<0.001
Depression diagnosis
17 (22%)
9 (14%)
53 (37%)
27 (28%)
79 (49%)
<0.001
Anxiety or panic disorder
diagnosis
13 (17%)
8 (13%)
35 (24%)
20 (21%)
62 (38%)
<0.001
Degenerative disc disease 10 (13%)
diagnosis
7 (11%)
29 (20%)
10 (11%)
57 (35%)
<0.001
History of pelvic pain
15 (20%)
3 (5%)
7 (5%)
11 (12%)
37 (23%)
<0.001
Sexual activity with the
last month
41 (54%)
27 (43%)
54 (38%)
58 (61%)
56 (34%)
<0.001
History of hypertension
20 (26%)
21 (33%)
61 (43%)
23 (24%)
80 (49%)
<0.001
History of hyperlipidemia
14 (18%)
24 (38%)
44 (31%)
22 (23%)
67 (41%)
0.009
History of sleep apnea
9 (12%)
10 (16%)
21 (15%)
9 (9%)
45 (28%)
0.005
History of a psychiatric
diagnosis
32 (42%)
11 (17%)
55 (38%)
41 (43%)
93 (57%)
<0.001
Past surgical procedure
for LUTS
4 (5%)
6 (10%)
23 (16%)
11 (12%)
36 (22%)
0.008
No abnormal vaginal
57 (75%)
findings on physical exam
35 (56%)
101 (72%)
78 (82%)
124 (77%)
0.004
No abnormal uterus
50 (68%)
findings on physical exam
45 (71%)
83 (60%)
69 (73%)
87 (54%)
0.009
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
36
No notation of
tenderness on physical
exam
44 (59%)
50 (79%)
108 (77%)
84 (88%)
131 (81%)
<0.001
Average number of voids
in 24 hours
7.6
7.5
8.4
7.1
8.6
0.0023
Average voided volume
in 24 hours (mL)
1827.0
1717.5
1786.6
1786.6
1813.5
0.6728
Average number of
intakes in 24 hours
6.5
6.4
6.3
6.3
6.4
0.7081
Average volume of
intakes in 24 hours
1902.8
1682.9
1813.3
1813.3
1810.1
0.3028
Max voided volume
531.9
543.2
473.4
473.4
519.0
0.2134
629
630
Properties of the five clusters are visualized in Figure 4. Each column represents one of five
631
clusters. Radar plots in the first row illustrate urinary symptoms measured by LUTS Tool and AUA-SI; the
632
second row illustrates demographics, clinical measurements, and non-urinary PROs; the third row shows
633
categorical data on comorbidities and anomalies identified during the physical exam; the fourth row
634
shows intake and voiding pattern variables collected in bladder diaries. Radar plots represent mean
635
values of the raw variables across members of each of the clusters. None of the clusters could be
636
characterized by a single symptom, but rather by a combination of symptoms with various levels of
637
severity. Women in all five clusters reported higher than normal frequency of voiding (with the highest
638
frequency in W3 and W5). Women in all clusters except W1 reported urinary urgency and some level of
639
incontinence.
640
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
37
641
642
643
644
Figure 4. Radar plots illustrating mean values of urinary symptoms, demographics, clinical
measurements, non-urinary PROs, comorbidities, and bladder diary variables for identified clusters of
women with LUTS. Urinary symptoms are color-coded: green=frequency; blue=post-micturition;
purple=urgency; dark blue=voiding; red=pain; orange=incontinence.
645
646
Women in cluster W1 (n=77) reported minimal urinary incontinence, but had mostly voiding and
647
post-micturition symptoms (post-void dribbling, trickling, straining, hesitancy, and incomplete bladder
648
emptying). They were younger than the average across the LURN female cohort, had a lower than
649
average weight, number of pregnancies, and vaginal births. They had less comorbidities and abnormal
650
findings in the physical exam. Women in cluster W2 (n=64) reported mild urinary symptoms, including
651
mild urinary incontinence. They presented clinically significant anterior vaginal wall prolapse (mean
652
POP-Q point B anterior [Ba]=1.24 cm, which is outside the introitus), apical prolapse (mean POP-Q point
653
C=-2.38 cm), and the most severe pelvic organ prolapse symptoms (with the highest Pelvic Organ
654
Prolapse Distress Inventory [POPDI-6] values of 20.60). They were on average older (66 vs. 53 years old),
655
and had a higher number of pregnancies (2.9 vs. 1.8) and vaginal births (1.47 vs. 0.92) than women in
656
cluster W1. They also had the highest post-void residual urine volume (75 mL) across the clusters.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
38
657
Women in cluster W3 (n=144) reported high urinary frequency, urinary urgency, and urgency urinary
658
incontinence. They had increased weight, had larger waist circumference, and higher functional
659
comorbidity index (FCI) than women in W1, W2, and W4. They most frequently reported “urgency with
660
fear of leakage” (2.85), but did not report any substantial post-voiding symptoms. Women in cluster W4
661
(n=95) reported multiple symptoms associated with stress urinary incontinence, as well as urgency
662
urinary incontinence, and some post-void urinary incontinence. They were younger (mean 51 years),
663
had less medical comorbidities (FCI =1.24), and had a higher level of physical functioning (PROMIS T-
664
score=53.2) than others in the cohort. Women in W5 (n=165) reported higher frequencies and severities
665
of LUTS for all symptoms. For 27 out of 30 urinary symptoms, they reported the highest levels across all
666
five clusters. These women were heavier (87.6 Kg), had the lowest level of physical functioning (PROMIS
667
T-score=42.2), had more medical comorbidities (FCI=3.64), and more pregnancies (3.07) than the rest of
668
the cohort. They also reported higher psychosocial difficulties in depression, anxiety, and perceived
669
stress, as well as sleep disturbance.
670
The presence of multiple significantly different variables across the clusters demonstrates that
671
clusters W1-W5 meet the concise definition of clustering given by Liao as: “The goal of clustering is to
672
identify structure in an unlabeled data set by objectively organizing data into homogeneous groups
673
where the within-group-object dissimilarity is minimized and the between-group-object dissimilarity is
674
maximized” [8]. Figure 5 visually illustrates results of pairwise comparisons of the clusters in a matrix
675
form. Figure 5A provides cluster comparisons by LUTS Tool variables. Elements on the diagonal of the
676
matrix present the level of severity for each LUTS Tool question, i.e., the severity urinary symptom
677
signature of the cluster. The triangle of boxes above the diagonal demonstrates variables significantly
678
different in the pairwise comparison of the clusters; each colored bar indicates a significantly different
679
variable. As seen, the majority of symptoms are significantly different in the pairwise comparison of the
680
clusters. Elements in the lower triangle of the matrix present the difference in symptom severity levels;
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
39
681
e.g., the first (upper) element in the triangle represents the difference between symptom severity levels
682
in cluster W2 and cluster W1, indicating that urgency symptoms are more severe in cluster W2, while
683
voiding and pain symptoms are more severe in cluster W1 than in cluster W2. Similarly, Figures 5B-5C
684
provide the results of pairwise comparison of the clusters for other variables from Tables 2-3,
685
demonstrating multiple significantly different non-urologic symptoms, physical examination, clinical and
686
demographic variables. In summary, clusters are distinct and significantly different, not only by their
687
urinary symptom signatures, but by multiple non-urologic variables as well. Importantly, these
688
significant differences are demonstrated by omnibus test (Tables 2-3) and by pairwise comparison of the
689
clusters (Figure 5).
690
691
Figure 5. Results of the pairwise comparison of clusters W1-W5. (A) LUTS Tool variables. (B)
Demographic and clinical variables. (C) Physical examination and comorbidities data.
692
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
40
693
694
695
696
697
Differential protein abundance in serum of women with LUTS versus non-LUTS controls
Figure 6 presents the volcano plots comparing abundances of 276 proteins in baseline serum
samples of women with LUTS versus non-LUTS controls. Figure 6A compares the abundances for all 230
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
41
698
women with LUTS to 30 controls, while Figures 6B-6F provide similar comparisons for members of the
699
identified clusters W1-W5 for whom proteomics data was available (n1=37, n2=38, n3=53, n4=42,
700
n5=60). Supplemental Table S3 provides the lists of significantly differentially abundant proteins for each
701
of the comparisons. Multiple differentially abundant proteins are observed in the serum samples of
702
women with LUTS versus non-LUTS controls, both overall and between each cluster and controls. While
703
some of these have been shown [72] to be associated with LUTS (e.g., tumor necrosis factor [TNF],
704
interleukin-10 [IL-10], monocyte chemotactic protein [MCP], and transforming growth factor [TGF]), the
705
remainders are novel. The highest number of the differentially abundant proteins of 70 (29 after FDR
706
correction for multiple testing) is observed for cluster W5, which demonstrated the highest level of all
707
urinary symptoms and comorbidities. Note that overlap between the lists of differentially abundant
708
proteins is quite low, meaning that clusters W1-W5 are “biochemically” different. The highest overlap of
709
18 differentially abundant proteins is observed for cluster W5 and cluster W3, defined mainly by high
710
urinary frequency, urinary urgency, and urge urinary incontinence. Interestingly, the lowest number of
711
differentially abundant proteins (n=10) are observed in clusters W2 (characterized by the presence of
712
pelvic organ prolapse) and W4 (characterized by the presence of stress urinary incontinence), which are
713
presumably driven by anatomic abnormalities, rather than biochemical changes. Without going into the
714
detailed interpretation of these results, which are outside the scope of this paper, we think the
715
observed differences in the differentially abundant proteins across W1-W5 clusters serve as important
716
independent confirmation of the distinctiveness of the identified clusters.
717
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
42
718
719
720
Figure 6. Volcano plots demonstrating differentially abundant proteins in women with LUTS vs.
controls for 230 participants representing: (A) the whole cohort; and (B-F) for each of identified
clusters W1-W5.
721
722
723
Comparison of clusters W1-W5 with our previously published urinary symptom-based clusters F1-F4
724
Comparing quality of the clusters
725
Previously, we identified four clusters (F1-F4) by analyzing data on the same 545 women with
726
LUTS using only urinary symptoms data collected via the LUTS Tool and AUA-SI (total of 52 variables)
727
[41]. Since the same resampling procedure was performed when generating W1-W5 and F1-F4, both
728
cluster structures are equally robust to the random variations of the cohort composition.
729
Distinctiveness, as determined by pairwise comparisons, was higher for the refined clusters
730
compared with the previously published clusters (Tables 4 and 5). The proportion of significantly
731
different variables in pairwise comparison of the clusters ranged from 27% to 83% (mean 52%) for the
732
refined clusters, compared with a range of 27% to 58% (mean 43%) for the previous clusters. The
733
proportion of core clusters was also higher for W1-W5, compared with F1-F4; it ranged from 73% to 93%
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
43
734
(mean 87%) for the refined clusters, compared with a range of 40% to 83% (mean 62%) for the previous
735
clusters. Summarizing, there are more significantly different variables across refined clusters (W1-W5)
736
than across our previously published clusters (F1-F4), and the refined clusters contain a higher
737
percentage of core members for whom the probability to be in the given cluster is higher than the
738
probability to be in all other clusters combined. Therefore, the refined clusters identified in the current
739
paper by using additional urinary and non-urinary variables (total of 185 variables) are substantially
740
more distinct than our previously published clusters based only on urinary symptoms measured by the
741
LUTS Tool and AUA-SI (52 variables).
742
Table 4. Proportion of significantly different variables in clusters W1-W5 and F1-F4.
W1
W2
W3
W4
W5
F1
F2
W1
27%
46%
58%
64%
F1
34%
W2
27%
39%
37%
83%
F2
34%
W3
47%
27%
34%
66%
F3
47%
27%
W4
58%
37%
34%
69%
F4
58%
51%
W5
64%
83%
66%
69%
Mean
52.3% Mean
F3
47%
27%
F4
58%
51%
39%
39%
42.7%
743
744
745
Table 5. Proportion of core clusters in W1-W5 and F1-F4.
W1
W2
W3
W4
W5
Mean
PCC
88%
73%
89%
93%
92%
87%
F1
F2
F3
F4
PCC
40%
74%
51%
83%
Mean
62%
746
747
748
Comparing cluster membership
The Sankey diagram in Figure 7 serves to compare cluster membership in W1-W5 and in F1-F4.
749
Refined cluster W1 is mostly composed of the members of cluster F1 without prolapse. Cluster W2 is
750
formed by the members of cluster F1 with pelvic organ prolapse and includes some members of other
751
clusters having prolapse and moderate urinary symptoms. Cluster W3 is mainly composed of members
752
of F2 and F3 with urgency urinary incontinence. Cluster W4 is predominantly formed by members of F3
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
44
753
with stress urinary incontinence. Cluster W5 includes nearly all members of F4 and approximately 30%
754
of F3 who have both urgency urinary incontinence and stress urinary incontinence symptoms.
755
Figure 7. Sankey diagram comparing cluster memberships in W1-W5 and F1-F4.
756
757
Comparing radar plots
758
Figure 8 provides comparison of radar plots for the urinary symptom signatures of refined
759
clusters W1-W5 and symptom-based clusters F1-F4. There are substantial similarities in the urinary
760
symptom signatures of our previously published clusters, and of the refined clusters. Radar plots for W5
761
and F4 are similar in presenting all urinary symptoms at a uniformly high level. Signatures of W4 and F3
762
are similar in presenting the combination of stress urinary incontinence, urgency, and voiding
763
dysfunction symptoms. W3 and F2 present urinary urgency, urgency urinary incontinence, and mild
764
voiding problems. Symptom signatures of W1 and F1 are similar, presenting mostly voiding and post-
765
micturition problems. The signature of cluster W2 presents mild LUTS and is mostly defined by clinically
766
significant pelvic organ prolapse. The observed similarity of the clusters’ LUTS signatures confirms that
767
additional variables did not result in radical changes, but rather in incremental changes that allowed for
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
45
768
identification of the refined clusters, which are built upon, but are more distinct and uniform than, our
769
previously published ones. We believe this is further evidence of the stability of the identified clusters.
770
Urinary symptom data captured by the LUTS Tool and AUA-SI provided the foundation for data-driven
771
subtyping of LUTS, while the remaining urinary and non-urinary variables allowed for identifying refined
772
clusters that differ not only by urinary symptoms, but by other PRO, demographic, and clinical variables
773
as well (Table 3, Figure 5, and Figures 6B-6C). Importantly, cluster refinement enhanced the
774
distinctiveness and uniformity of the clusters, as well as the confidence in cluster membership, by
775
increasing the overall proportion of core clusters from 62% to 87%.
776
Figure 8. Comparison of radar plots of the urinary symptom signatures for clusters W1-W5 and F1-F4.
777
778
779
780
Evolution of refined clusters W1-W5 in 3- and 12-month follow-up
Figure 9 presents radar plots of the urinary symptom signatures of refined clusters W1-W5 at 3-
781
and 12-month follow-up. As seen, the shapes of the radar plots are conserved, while their areas
782
representing overall severity of LUTS are decreased due to improvement in urinary symptoms for some
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
46
783
participants. The percentage of improvers varied across the clusters (Table 6). Improvers were defined
784
as having a ½ SD or greater improvement between baseline and 12 months on the calculated LUTS Tool
785
Summary Score (including all 22 LUTS Tool severity variables). We view stability of the urinary symptom
786
radar plot signatures’ shapes as additional evidence of robustness of the identified clusters.
787
Figure 9. Evolution of the urinary symptom signatures in 3- and 12-month follow-up.
788
789
790
Table 6. Percentage of improvers in cluster W1-W5 in 3- and 12-month follow-up.
Cluster
N patients with 12-month
follow-up
N (%) Improvers at 12
months
Cluster W1
50
23 (46%)
Cluster W2
51
23 (45%)
Cluster W3
106
57 (53%)
Cluster W4
70
49 (70%)
Cluster W5
119
69 (58%)
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
47
791
792
Clinical significance of the identified clusters
The current paradigm for managing patients with LUTS is to assign a diagnosis based on a pre-
793
defined symptom complex, such as overactive bladder (OAB), or based on a single predominant
794
symptom, such as nocturia or stress urinary incontinence. Treatments are then administered based on
795
these diagnoses [73-74]. Conventional classification of LUTS includes such partially overlapping groups
796
as OAB wet, OAB dry, continent, stress urinary incontinence, urgency urinary incontinence, mixed
797
urinary incontinence, underactive bladder, and bladder outlet obstruction. As stated in [39-41], there
798
are limitations to this paradigm, as patients frequently present with multiple other urinary symptoms in
799
addition to those being treated, and these combinations of symptoms may be relevant to treatment
800
selection. Diagnosis and treatment based solely on patients’ chief complaints may be unsatisfactory, as
801
they disregard other presenting symptoms. Mechanistic studies reveal that a functional impairment to a
802
specific organ in the urinary tract may cause more than a single symptom. For example, a weak urethral
803
sphincter is associated with both stress and urgency urinary incontinence [58, 75]. This may explain why
804
mixed incontinence is so common. This raises the question of how current diagnostic paradigms
805
correspond with biological changes of the continence system and how symptoms occur in women
806
seeking treatment, which was the main rationale for the unbiased data-driven subtyping of women with
807
LUTS in LURN. As seen, none of the clusters W1-W5 identified in our analysis could be characterized by a
808
single symptom, but rather by a combination of symptoms with various levels of severity, which are in
809
concert with the clinical observations mentioned above [58, 75]. The more detailed comparison of our
810
refined clusters W1-W5 with the conventional LUTS groups and subtypes of LUTS identified by other
811
researchers [36-38,76] is provided in Supplemental Material text.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
48
812
Clustering methodologies reported in studies on subtyping LUTS and other common diseases and
813
disorders
814
Previous research on subtyping of diseases and disorders provides various levels of detail
815
regarding the clustering methodologies that were employed. For instance, Coyne et al [36] provided
816
detailed information on the 14 LUTS questions used for clustering and indicated that all the variables
817
were scaled from 0 to 1. They took care of the robustness of clusters to the variation in composition of
818
the cohort by performing clustering on the random 50% subset of participants first, and then by
819
extending it to the whole cohort. They used k-means algorithm for clustering and scanned the number
820
of clusters K from 3 to 7. They reported that the decision on the number of clusters was made by
821
evaluating “each cluster model based on the clinical relevance and distinctiveness of each cluster, as
822
well as the amount of variance accounted for by the cluster solution.” However, they did not provide the
823
names or values of criteria used for cluster evaluation. They provided detailed and informative
824
descriptive statistics, but for some reason, did not provide any tests on significantly different variables in
825
pairwise comparison of the identified clusters.
826
Similarly, Hall et al [37] provided detailed information on the 14 LUTS questions and scaling. K-
827
means clustering was performed with random 50% split of the cohort (split-half validation) similar to
828
[36]. Detailed descriptive statistics and omnibus tests are provided across identified clusters and
829
asymptomatic controls for significant difference in the variables not used for clustering, including
830
demographics, comorbidities, risk factors, and lifestyle factors. However, no information on significant
831
differences in the 14 LUTS variables used for clustering, and no information on pairwise comparison of
832
the clusters is provided. Summarizing, these two LUTS clustering papers provide a reasonable level of
833
details on scaling of the variables and on the clustering procedure, but unfortunately do not provide
834
enough information on evaluation of the quality of the identified clusters.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
49
835
In contrast, Miller et al [38] provided all necessary information on pairwise comparison of the
836
identified clusters, both for six variables used for clustering, and for eight other variables collected but
837
not used for clustering. Unfortunately, the authors did not perform any scaling of variables used for
838
clustering. As stated in the paper, “the six clustering variables were (a) number of voids during daytime
839
hours, (b) number of voids during nighttime hours, (c) daytime modal output volume in milliliters, (d)
840
total 24-hour output volume in milliliters, (e) total 24-hour beverage intake in milliliters, and (f) BMI (a
841
variable that was speculated to be related to intake).” If the volume of related variables c, d, e were not
842
scaled but entered into clustering in milliliters (values can be as high as 500), then these variables will
843
provide the domineering contribution (compared with the number of daytime voids, typically < 20) to
844
the 6-dimensional Euclidean distance between clustered objects, and will serve as drivers determining
845
cluster membership. The contribution of these variables to the Euclidean distance would be much lower
846
if they were entered in liters instead of milliliters, which would change the cluster membership. This is
847
the problem indicated by Hair et al [62], with unscaled, unstandardized data of the inconsistency
848
between cluster solutions when the scale of some variables is changed, which is a strong argument in
849
favor of standardization. To our mind, the best solution of the problem is the use of the variables scaled
850
by comparison with controls, as we described in the Methods section. Proper scaling is especially
851
important when using heterogeneous data combining dimensionless and dimensional variables
852
measured in different units, as in [38], where frequencies, volumes (mL), and BMI units (kg/m2) are
853
combined.
854
A broader look outside the LUTS domain shows that previous data on subtyping other common
855
complex diseases provide different level of details on the clustering procedure and cluster evaluation as
856
well. Table 7 below summarizes methodological information reported in the clustering papers [31-33]
857
subtyping patients with asthma, diabetes, and sepsis, as well as in the LUTS papers [36-38] discussed
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
50
858
above. More details on methodological information reported in [31-33] are provided in Supplemental
859
Material text.
860
Table 7. Methodological information provided in the clustering papers.
Paper
Variables
Preprocessing Clustering
Number of
algorithm
clusters
determination
Moore
34 variables
No info on
Agglomerative Dendrogram
et al
derived from
scaling of 17
hierarchical
demonstrates 5-6
[31]
initial 726. Only
continuous
clustering,
groups. Five
list of 34 variables variables; 17
Wards linkage. clusters selected
provided.
composite
due to small size
variables
of the sixth
ranked 0-10.
group. No other
criteria.
Ahlqvist
et al
[32]
List of six
variables used for
clustering is
provided.
Five variables
standardized as
z-scores.
Presence of
glutamic acid
decarboxylase
antibodies
(GADA) binary
variable.
Variables
standardized as
z-scores.
Patients with
GADA grouped
into separate
cluster. Kmeans with
resampling for
patients
without GADA.
Coyne et List and
al [36]
description of 14
EPIC LUTS
questions used
for clustering is
provided.
Variables
scaled from 0
to 1.
Hall et al List and
[37]
description of 14
BACH LUTS
questions used
for clustering is
provided.
Variables
scaled from 0
to 1.
k-means
clustering with
split-half
randomization.
Values of k
scanned from 3
to 8.
Hierarchical
clustering and
k-means
clustering,
split-half
validation.
Seymour List of 29
et al
variables used for
[33]
clustering is
provided.
Resamplingbased
consensus kmeans
clustering.
Cluster evaluation
Omnibus tests (analysis
of variance [ANOVA],
Kruskal-Wallis, chisquare) used on
demographic, clinical,
medication use, health
care utilization, and
biomarker variables.
No pairwise
comparisons of
clusters.
Schwarz’s
Box plots comparing 5
Bayesian
continuous variables
criterion to
used for clustering.
determine
Pairwise comparisons
number of
of clusters for multiple
clusters k=4.
variables not used for
clustering. Cluster
validation in 3
independent cohorts.
Consensus matrix Pairwise comparison of
heat map. Area
variables clusters for
under the CDF
variables used and not
curve [34].
used for clustering.
Validation in an
independent cohort.
No names or
Descriptive statistics on
values for criteria variables both used
provided.
and not used for
clustering, but no tests
for significance.
Pseudo F and t2
statistics were
used to
determine
number of
clusters.
Omnibus tests (ANOVA,
chi-square) on
variables used and not
used for clustering. No
pairwise comparisons
of clusters.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
51
Miller et
al [38]
List of six
variables used for
clustering is
provided.
No explicit
information on
scaling.
Agglomerative
hierarchical
clustering,
Wards linkage.
However, values
of statistics are
not provided.
Dendrogram
demonstrates 3-4
groups. Three
groups selected
based on visual
examination of
separation on
canonical
variables plane.
No other criteria
provided.
Pairwise comparison of
clusters on six variables
used for clustering and
8 variables not used for
clustering. ANOVA for
continuous and chisquare test for
categorical data.
861
862
Thoughts on minimal requirements for clustering publications
863
As shown above, publications using clustering for disease subtyping provide different levels of
864
details on data preprocessing, clustering procedure, and cluster evaluation. In particular, information on
865
scaling and weighting of the variables, values of criteria used for selection of the number of clusters,
866
pairwise comparison of the clusters, and level of confidence in cluster membership are often not
867
provided. This missing or hard-to-find information is important for better understanding of the papers’
868
results, for comparison of the proposed phenotypes with previous and future classifications, and for
869
potential refinement. With this in mind, and guided by the principles of FAIR data (findability,
870
accessibility, interoperability, and reusability) [77-78], we think it is time for the clustering community to
871
develop minimum requirements for clustering reports (MICRo), similar to minimum information about a
872
proteomics experiment (MIAPE), developed by the proteomics community [79], and minimum
873
information about a microarray experiment (MIAME), developed by the transcriptomics community
874
[80].
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
52
875
Exact guidelines for the minimal requirements for clustering publications should result from the
876
clustering community discussion. Here, we would like to call for such discussion and propose the below
877
items that we believe are important for future guidelines:
878
1. Complete list of variables used for clustering.
879
2. Explicit information on scaling and weighting of clustering variables.
880
3. Clustering algorithm used (name of the function with options and parameter values, or code).
881
4. Exact definition of criteria used to determine the number of clusters. Values of criteria for the
882
selected and alternative number of clusters. Preferably, more than one criterion should be
883
presented.
884
5. Results of pairwise comparison of the clusters, with the indication of clinically meaningful and
885
significantly different variables in the pairwise comparison – for all variables used for clustering,
886
and for selected important variables not used for clustering (e.g., demographics).
887
888
889
6. Information on the level of confidence in cluster membership.
Limitations of the current study
Our paper carries some limitations. First, there are limitations in terms of the cohort, which
890
included only treatment-seeking and predominantly white participants. Our analysis only contains
891
women. Preliminary data analysis confirmed that sex is the major determinant of LUTS subtypes;
892
therefore, sex-specific clustering was performed. We previously published the results of urinary
893
symptom-based clustering of male participants [64]. Cluster refinement of male clusters, along the same
894
lines as described in the current paper, is underway; the resulting refined subtypes will be compared
895
with those found in the female cohort. Our control group used for scaling of the variables was relatively
896
small (55 participants) but commensurate with the size of identified clusters. Not all data elements
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
53
897
collected for the cases were available for the controls, so we used some literature data for general
898
population and bladder diary data from a different study (EPI).
899
Second, there are limitations in terms of data elements used for clustering. Genomics data were
900
not used so far. We do not expect that genomics data will produce dramatic effect on clustering results
901
since LUTS is a highly prevalent common disease, especially in older age. Nevertheless, genotyping of
902
the LURN participants is underway, which would allow for future cluster refinement by including the
903
binary single nucleotide polymorphism (SNP) data using our novel weighted Tanimoto indices approach.
904
Proteomics data are available for approximately 40% of the cohort and were not yet used for cluster
905
refinement. However, they were used to demonstrate the presence of multiple significantly different
906
proteins, indicating the refined symptom-based clusters are biochemically different.
907
Third, we developed methodology and a pipeline for integrating heterogeneous continuous and
908
categorical data for clustering women with LUTS. We cannot claim that this is the preferable
909
methodology for other data sets since data and research questions are different in different studies.
910
However, we explicitly described our preprocessing and clustering procedures, as well as criteria used
911
for determination of the number of clusters, cluster evaluation, and confidence level in cluster
912
membership. We compared our methodology with alternative approaches and demonstrated that our
913
methodology allows for combining heterogeneous continuous, categorical, and binary data, and that our
914
refined clusters are more distinct than the previous urinary symptom-based clusters. Detailed
915
description of the methodology, and comparison with the alternative approaches, allows interested
916
readers to decide if it is suitable for their data and research questions. Availability of the pipeline source
917
code allows for modifications, if needed.
918
919
Fourth, and most importantly, clinical significance of the identified clusters has yet to be
determined. We already demonstrated the distinctness of our clusters, but now we need to establish
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
54
920
their usefulness in clinical practice. This should be done through clinical trials, where treatments and
921
outcomes of patients classified into the identified clusters would be compared with the standard
922
treatment without the knowledge of cluster membership. Our preliminary analysis of multiple
923
differentially abundant serum proteins in women with LUTS corroborated that the identified clusters are
924
biochemically different. Further analysis of the affected biochemical pathways, and their longitudinal
925
dynamics as related to symptom trajectories, will follow with the goal to enhance understanding of
926
different etiologies of the identified subtypes and potentially establish more effective subtype-specific
927
treatments. Further analysis will also include determination of the minimal set of variables sufficient for
928
classification of patients with LUTS into identified subtypes in clinical practice.
929
CONCLUSION
930
A novel clustering pipeline for subtyping of common complex diseases and syndromes using
931
heterogeneous continuous and categorical data was developed. The advantages of scaling variables by
932
comparison with the controls without the disease of interest were discussed and illustrated by the
933
simulated example. The novel weighted Tanimoto indices approach to integrate multiple binary
934
variables into the clustering procedure was developed. A cluster refinement procedure using data
935
available only for the subset of participants through semi-supervised clustering was proposed. A novel
936
contrast criterion (CC) for resampling-based consensus clustering was proposed and compared with
937
existing criteria for consensus clustering, i.e., consensus score (SC) and proportion of ambiguously
938
clustered pairs (PAC). A simulated example demonstrated the advantages of CC over SC and PAC.
939
Information provided in the literature on subtyping common complex diseases and disorders
940
was reviewed and shown to be often incomplete, especially with regard to data preprocessing,
941
clustering procedures, and cluster evaluation. Suggestions for the minimum requirements for clustering
942
publications were formulated, and the community effort to work on creating such requirements
943
following the principles of FAIR data was called for.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
55
944
Five distinct clusters of women with LUTS were identified by using 185 variables, including
945
demographics, physical exam, LUTS and non-LUTS questionnaires, and bladder diary variables. The
946
quality of the clusters was evaluated using established criteria (Calinski-Harabasz, Davies-Bouldin, Dunn,
947
Point-Biserial, and Silhouette [22-26]), as well as novel contrast criterion (CC) and percentage of core
948
members of the clusters (PCC). Distinctiveness of the clusters was confirmed by multiple significantly
949
different variables in pairwise comparison of the clusters. Refined clusters W1-W5 were compared with
950
our previously published urinary symptom-based clusters F1-F4, and were shown to be more distinct by
951
having a higher percentage of significantly different variables and a higher percentage of the core
952
members of the clusters. Importantly, targeted proteomics data confirmed that our refined clusters
953
based on clinical data are biochemically different. Identification of the clinically and biochemically
954
distinct subtypes of LUTS has provided a foundation for studies of subtype-specific etiologies and
955
treatments.
956
957
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
56
958
959
960
961
962
963
ACKNOWLEDGMENTS
Heather Van Doren, Senior Medical Editor with Arbor Research Collaborative for Health,
provided editorial assistance on this manuscript.
This is publication number 28 of the Symptoms of Lower Urinary Tract Dysfunction Research
Network (LURN).
This study is supported by the National Institute of Diabetes & Digestive & Kidney Diseases
964
through cooperative agreements (grants DK097780, DK097772, DK097779, DK099932, DK100011,
965
DK100017, DK099879) and for Dr. Andreev’s Biomarker Ancillary LURN R01 (grant 5R01DK125251).
966
Research reported in this publication was supported at Northwestern University, in part, by the
967
National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number
968
UL1TR001422. The content is solely the responsibility of the authors and does not necessarily represent
969
the official views of the National Institutes of Health.
970
The following individuals were instrumental in the planning and conduct of this study at each of
971
the participating institutions:
972
Duke University, Durham, North Carolina (DK097780): PIs: Cindy Amundsen, MD, Eric Jelovsek, MD; Co-
973
Is: Kathryn Flynn, PhD, Todd Harshbarger, PhD, Jim Hokanson, PhD, Aaron Lentz, MD, Michelle O’Shea,
974
MD, David Page, PhD, Nazema Siddiqui, MD, Kevin Weinfurt, PhD Lisa Wruck, PhD; Study Coordinators:
975
Yasmeen Bruton, Paige Green, Folayan Morehead
976
University of Iowa, Iowa City, IA (DK097772): PIs: Catherine S Bradley, MD, Karl Kreder, MD, MBA, MSCE;
977
Co-Is: Bradley A. Erickson, MD, MS, Daniel Fick, MD, Vince Magnotta, PhD, Philip Polgreen, MD, MPH;
978
Study Coordinators: Mary Eno, Sarah Heady, Chelsea Poesch
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
57
979
Northwestern University, Chicago, IL (DK097779): PIs: James W Griffith, PhD, Kimberly Kenton, MD, MS,
980
Brian Helfand, MD, PhD; Co-Is: Carol Bretschneider, MD, David Cella, PhD, Sarah Collins, MD, Julia
981
Geynisman-Tan, MD, Alex Glaser, MD, Christina Lewicky-Gaupp, MD, Margaret Mueller, MD; Study
982
Coordinators: Sylwia Clarke, Melissa Marquez, Pooja Sharma, Michelle Taddeo, Pooja Talaty. Dr. Helfand
983
and Ms. Talaty are at NorthShore University HealthSystem.
984
University of Michigan Health System, Ann Arbor, MI (DK099932): PI: J Quentin Clemens, MD, FACS,
985
MSCI; Co-Is: John DeLancey, MD, Dee Fenner, MD, Rick Harris, MD, Steve Harte, PhD, Anne P. Cameron,
986
MD, Aruna Sarma, PhD, Giulia Lane, MD; Study Coordinators: Ashly Chimner, Linda Drnek, Emma Keer,
987
Marissa Moore, Greg Mowatt, Sarah Richardson
988
University of Washington, Seattle Washington (DK100011): PI: Claire Yang, MD; Co-I: Anna Kirby, MD;
989
Study Coordinators: Lois Meryman, Brenda Vicars, RN
990
Washington University in St. Louis, St. Louis Missouri (DK100017): PI: H. Henry Lai, MD; Co-Is: Gerald L.
991
Andriole, MD, Joshua Shimony, MD, PhD; Study Coordinators: Linda Black, Vivien Gardner, Patricia
992
Hayden, Diana Wolff, Aleksandra Klim, RN, MHS, CCRC
993
Arbor Research Collaborative for Health, Data Coordinating Center (DK099879): PI: Robert Merion, MD,
994
FACS; Co-Is: Victor Andreev, PhD, DSc, Brenda Gillespie, PhD, Abigail Smith, PhD; Project Manager:
995
Melissa Fava, MPA, PMP; Clinical Monitor: Melissa Sexton, BA, CCRP; Research Analysts: Margaret
996
Helmuth, MA, Jon Wiseman, MS, Jane Liu, MPH; Project Associate: Levi Hurley
997
National Institute of Diabetes and Digestive and Kidney Diseases, Division of Kidney, Urology, and
998
Hematology, Bethesda, MD: Project Scientist: Ziya Kirkali MD; Project Officer: Christopher Mullins PhD;
999
Project Advisor: Julie Barthold, MD.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
58
1000
REFERENCES
1001
1002
1. Schadt EE, Lum PY. Reverse engineering gene networks to identify key drivers of complex
disease phenotypes. J Lipid Res. 2006;47:2601-2613.
1003
1004
2. Becker KG. The common variants/multiple disease hypothesis of common complex genetic
disorders. Medical Hypothesis. 2004;62:309-317.
1005
1006
3. Relton CL, Davey Smith G. Epigenetic epidemiology of common complex disease: prospects for
prediction, prevention, and treatment. PLoS Med. 2010;7(10):e1000356.
1007
1008
1009
4. Coyne KS, Sexton CC, Thompson CL, Milsom I, Irwin D, Kopp ZS, et al. The prevalence of lower
urinary tract symptoms (LUTS) in the USA, the UK and Sweden: results from the Epidemiology of
LUTS (EpiLUTS) study. BJU Int. 2009;104(3):352-360.
1010
1011
1012
5. Irwin DE, Milsom I, Hunskaar S, Reilly K, Kopp Z, Herschorn S, et al. Population-based survey of
urinary incontinence, overactive bladder, and other lower urinary tract symptoms in five
countries: results of the EPIC study. Eur Urol. 2006;50(6):1306-1314; discussion 1314-1305.
1013
1014
6. Litman HJ, McKinlay JB. The future magnitude of urological symptoms in the USA: projections
using the Boston Area Community Health survey. BJU Int. 2007;100(4):820-825.
1015
1016
1017
7. Coyne KS, Wein A, Nicholson S, Kvasz M, Chen CI, Milsom I. Economic burden of urgency urinary
incontinence in the United States: a systematic review. J Manag Care Pharm. 2014;20(2):130140.
1018
8. Liao TW. Clustering of time series data -a survey. Pattern Recognit. 2005;38:1857-1874.
1019
9. Duda RO, Hart PE, Stork DG. Pattern classification. 2nd Ed. New York: Wiley; 2001.
1020
1021
1022
10. Robotti E, Manfredi M, Marengo E. Biomarkers discovery through multivariate statistical
methods: A review of recently developed methods and applications in proteomics. J Proteomics
Bioinform. 2014;S3:003.
1023
1024
11. Dudoit S, Fridlyand J. A prediction-based resampling method for estimating the number of
clusters in a dataset. Genome Biology. 2002;3(7):1-21.
1025
1026
12. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning, statistics. New York:
Springer; 2001.
1027
13. Wang L. Heterogeneous data and big data analytics. Autom Cont Inf Sci. 2017;3(1):8-15.
1028
14. Jain AK, Dubes RC. Algorithms for clustering data. Englewood Cliffs, NJ: Prentice Hall; 1988.
1029
1030
1031
15. Giancarlo R, Scaturro D, Utro F. ValWorkBench: An open source Java library for cluster
validation, with applications to microarray data analysis. Comp Meth Prog Biomed.
2015;118:207-217.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
59
1032
1033
16. Hartigan JA, Wong MA. Algorithm AS 136: A K-means clustering algorithm. J Royal Stat Soc Series
C (Applied Statistics). 1970;28(1):100-108.
1034
1035
17. Day WHE, Edelsbrunner H. Efficient algorithms for agglomerative hierarchical clustering
methods. J Classif. 1984;1:7-24.
1036
18. Kohonen T. Self-organizing maps. Information sciences. Berlin: Springer; 1997.
1037
1038
19. Lum PY, Singh G, Lehman A, Ishkanov T, Vejdemo-Johansson M, Alagappan M, et al. Extracting
insights from the shape of complex data using topology. Sci Rep. 2013;3:1236.
1039
1040
1041
20. Bae E, Bailey J. COALA: A novel approach for the extraction of an alternate clustering of high
quality and high dissimilarity. International Conference on Data Mining 2006. Los Alamitos, CA,
USA. IEEE Computer Society: 53-62.
1042
1043
21. Ramoni M, Sebastiani P, Cohen P. Multivariate clustering by dynamics. Proceedings of the 2000
National Conference on Artificial Intelligence (AAAI-2000). San Francisco, CA: 633-638.
1044
22. Calinski T, Harabasz J. A dendrite method for cluster analysis. Comm Statistics. 1974;3:1-27.
1045
1046
23. Davies DL, Bouldin DW. A cluster separation measure. IEEE Trans Pattern Anal Mach Intell.
1979;1:224-227.
1047
1048
24. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap
statistic. J Royal Stat Society Series B. 2001;63:411-423.
1049
1050
25. Rouseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis.
J Computational Applied Mathematics. 1987;20:53-65.
1051
1052
26. Rand WM. Objective criteria for the evaluation of clustering methods. J Am Stat Assoc.
1971;66(336):846-850.
1053
1054
1055
27. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ. Broad patterns of gene
expression revealed by clustering analysis of tumor and normal colon tissues probed by
oligonucleotide arrays. Proc Natl Acad Sci. 1999;96:6745-6750.
1056
1057
1058
28. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh M, Downing
J, Caligiuri M, Bloomfield C, Lander E. Molecular classification of cancer: Class discovery and class
prediction by gene expression. Science. 1999;286(5439):531-537.
1059
1060
1061
29. Hayes DN, Monti S, Parmigiani G, Gilks CB, Naoki K, Bhattacharjee A, et al. Gene expression
profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent
patient cohorts. J Clin Oncol. 2006;24:5079-5090.
1062
1063
1064
30. Lapointe J, Li C, Higgins JP, van de Rijn M, Bair E, Montgomery K, et al. Gene expression profiling
identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci. USA 2004;101:811816.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
60
1065
1066
1067
31. Moore WC, Meyers DA, Wenzel SE, Teague WG, Li H, Li X, et al. Identification of asthma
phenotypes using cluster analysis in the severe asthma research program. Am J Respir Crit Care
Med. 2010;181:315-323.
1068
1069
1070
32. Ahlqvist E, Storm P, Käräjämäki A, Martinell M, Dorkhan M, Carlsson A, et al. Novel subgroups of
adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six
variables. Lancet Diabetes Endocrinol. 2018;6:361-369.
1071
1072
1073
33. Seymour CW, Kennedy JN, Wang S, Chang CCH, Elliott CF, Xu Z, et al. Derivation, validation, and
potential treatment implications of novel clinical phenotypes for sepsis. JAMA.
2019;321(20):2003-2017.
1074
1075
1076
34. Monti S, Tamayo P, Meserov J, Golub T. Consensus clustering: A resampling-based method for
class discovery and visualization of gene expression microarray data. Mach Learn 2003;52:91118.
1077
1078
1079
35. Locke Jr K, Lai HH, Pontari MA, Clemens JQ, Kreder KJ, Krieger JN, et al. Discovery, validation,
and novel visualization of subgroups in urologic chronic pelvic pain syndrome (UCPPS):
Consensus clustering findings from the MAPP Research Network. J Urol. 2020;203(4S):e104.
1080
1081
36. Coyne KS, Matza LS, Kopp ZS, Thompson C, Henry D, Irwin DE, et al. Examining lower urinary
tract symptom constellations using cluster analysis. BJU Int. 2008;101(10):1267-1273.
1082
1083
1084
37. Hall SA, Cinar A, Link CL, Kopp ZS, Roehrborn CG, Kaplan SA, et al. Do urological symptoms
cluster among women? Results from the Boston Area Community Health Survey. BJU Int.
2008;101(10):1257-1266.
1085
1086
1087
38. Miller JM, Guo Y, Rodseth SB. Diary data subjected to cluster analysis of intake/output/void
habits with resulting clusters compared by continence status, age, race. Nurs Res.
2011;60(2):115-123.
1088
1089
39. Yang CC, Weinfurt KP, Merion RM, Kirkali Z, Lurn Study Group. Symptoms of Lower Urinary Tract
Dysfunction Research Network. J Urol. 2016;196(1):146-152.
1090
1091
1092
40. Cameron AP, Lewicky-Gaupp C, Smith AR, Helfand BT, Gore JL, Clemens JQ, et al. Baseline lower
urinary tract symptoms in patients enrolled in LURN: a prospective, observational cohort study. J
Urol. 2018;199(4):1023-1031.
1093
1094
1095
41. Andreev VP, Gang L, Yang CC, Smith AR, Helmuth ME, Wiseman JB, et al. Symptom-based
clustering of women in the Symptoms of Lower Urinary Tract Dysfunction Research Network
(LURN) observational cohort study. J Urol. 2018;200(6):1323-1331.
1096
1097
1098
42. Coyne KS, Sexton CC, Kopp Z, Chapple CR, Kaplan SA, Aiyer LP, et al. Assessing patients’
descriptions of lower urinary tract symptoms (LUTS) and perspectives on treatment outcomes:
results of qualitative research. Int J Clin Pract. 2010;64(9):1260-1278.
1099
1100
1101
43. Coyne KS, Barsdorf AI, Thompson C, Ireland A, Milsom I, Chapple C, et al. Moving towards a
comprehensive assessment of lower urinary tract symptoms (LUTS). Neurourol Urodyn.
2012;31(4):448-454.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
61
1102
1103
1104
44. Barry M, Fowler F Jr, O’Leary M, Bruskewitz RC, Holtgrewe HL, Mebust WK, et al. The American
Urological Association symptom index for benign prostatic hyperplasia. The Measurement
Committee of the American Urological Association. J Urol. 1992;148(5):1549-1557.
1105
1106
1107
45. Cheng J, Leng M, Li L, Zhou H, Chen X. Active semi-supervised community detection based on
must-link and cannot-link constraints. PLoS ONE. 2014;9(10): e110088.
doi:10.1371/journal.pone.0110088.
1108
1109
1110
46. Li Z, Liu J, Tang X. Pairwise constraint propagation by semidefinite programming for semisupervised classification. Proceedings of the 25th International Conference on Machine
Learning; Helsinki, Finland: 2008.
1111
1112
47. Șenbabaoğlu Y, Michailidis G, Li JZ. Critical limitations of consensus clustering in class discovery.
Sci Rep. 2014;4:6207.
1113
1114
1115
48. Ku JH, Jeong IG, Lim DJ, Byun S, Paick JS, Oh SJ. Voiding diary for the evaluation of urinary
incontinence and lower urinary tract symptoms : Prospective assessment of patient compliance
and burden. Neurourol Urodyn. 2004;23(4):331-335.
1116
1117
1118
49. Spiegel BM, Hays RD, Bolus R, Melmed GY, Chang L, Whitman C, et al. Development of the NIH
Patient-Reported Outcomes Measurement Information System (PROMIS) gastrointestinal
symptom scales. Am J Gastroenterol. 2014;109(11):1804-1814.
1119
1120
1121
50. Pilkonis PA, Choi SW, Reise SP, Stover AM, Riley WT, Cella D, et al. Item banks for measuring
emotional distress from the Patient-Reported Outcomes Measurement Information System
(PROMIS®): depression, anxiety, and anger. Assessment. 2011;18(3):263-283.
1122
1123
51. Cohen S, Kamarck T, Mermelstein R. A global measure of perceived stress. J Health Soc Behav.
1983;24(4):385-396.
1124
1125
1126
52. Yu L, Buysse DJ, Germain A, Moul DE, Stover A, Dodds NE, et al. Development of short forms
from the PROMISTM sleep disturbance and sleep-related impairment item banks. Behav Sleep
Med. 2011;10(1):6-24.
1127
1128
1129
53. Clemens JQ, Calhoun EA, Litwin MS, McNaughton-Collins M, Kusek JW, Crowley EM, et al.
Validation of a modified National Institutes of Health chronic prostatitis symptom index to
assess genitourinary pain in both men and women. Urology. 2009;74(5):983-987.
1130
1131
1132
54. Barber MD, Chen Z, Lukacz E, Markland A, Wai C, Brubaker L, et al. Further validation of the
short form versions of the Pelvic Floor Distress Inventory (PFDI) and Pelvic Floor Impact
Questionnaire (PFIQ). Neurourol Urodyn. 2011;30(4):541-546.
1133
1134
1135
55. Bump RC, Mattiasson A, Bø K, Brubaker LP, DeLancey JO, Klarskov P, Shull BL, Smith AR. The
standardization of terminology of female pelvic organ prolapse and pelvic floor dysfunction. Am
J Obstet Gynecol. 1996;175(1):10-17.
1136
1137
56. Groll DL, To T, Bombardier C, Wright JG. The development of a comorbidity index with physical
function as the outcome. J Clin Epidemiol. 2005;58(6):595-602.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
62
1138
1139
1140
57. Cameron AP, Wiseman JB, Smith AR, Merion RM, Gillespie BW, Bradley CS, et al. Are three-day
voiding diaries feasible and reliable? Results from the Symptoms of Lower Urinary Tract
Dysfunction Network (LURN) cohort. Neurourol Urodyn. 2019;38(8):2185-2193.
1141
1142
1143
58. DeLancey JO, Fenner DE, Guire K, Patel DA, Howard D, Miller JM. Differences in continence
system between community-dwelling black and white women with and without urinary
incontinence in the EPI study. Am J Obstet Gynecol. 2010;202(6):584.e1-584.e12.
1144
1145
59. Benjamini Y, Hochberg Y. Controlling the false discovery rate – a practical and powerful
approach to multiple testing. J Royal Statistical Society, Ser B. 1995;57(1):289-300.
1146
1147
1148
60. Raghunathan TE, Lepkowski JM, Van Hoewyk J, Solenberger P. A multivariate technique for
multiply imputing missing values using a sequence of regression models. Survey Methodol.
2001;27(1):85-95.
1149
1150
1151
61. Raghunathan TE, Solenberger PW, Berglund P, Van Hoewyk J. IVEware: Imputation and variance
estimation software. Ann Arbor: University of Michigan, Institute for Social Research, Survey
Research Center. 2000.
1152
1153
62. Hair JR, Anderson RE, Tatham RL, Black WC. Multivariate data analysis. Prentice- Hall Inc: Upper
Saddle River, NJ; 1998.
1154
1155
1156
63. Andreev VP, Gillespie BW, Helfand BT, Merion RM. Misclassification errors in unsupervised
classification methods. Comparison based on the simulation of targeted proteomics data. J
Proteomics Bioinform. 2016;S14:005. doi:10.4172/jpb.S14-005.
1157
1158
1159
64. Liu G, Andreev VP, Helmuth ME, Yang CC, Lai HH, Smith AR, et al. Symptom-based clustering of
men in the Symptoms of Lower Urinary Tract Dysfunction Research Network (LURN)
observational cohort study. J Urol. 2019; 202(6):1230-1239.
1160
1161
65. Huang Z. Extensions to the k-means algorithm for clustering large data sets with categorical
values. Data Mining and Knowledge Discovery. 1998;2(3):283-304.
1162
1163
66. Ng MK, Li MJ, Huang JZ, He Z. On the impact of dissimilarity measure in k-modes clustering
algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007;29(3):503-507.
1164
1165
1166
67. Szepannek G, Aschenbruck R. k-prototypes clustering for mixed variable-type data. CRAN
Repository 2021. Available at: https://cran.rproject.org/web/packages/clustMixType/clustMixType.pdf. Accessed 7/16/21.
1167
1168
1169
1170
68. SAS clustering action set: Clustering with the k-prototypes algorithm. SAS visual statistics
programming guide. Available at:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/casactstat/casactstat_clustering_ex
amples06.htm. Accessed 7/16/21.
1171
1172
69. Rogers DJ, Tanimoto TT. A computer program for classifying plants. Science.
1960;132(3434):1115-1118.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
63
1173
1174
1175
1176
70. Bilenko M, Basu S, Mooney RJ. Integrating constraints and metric learning in semi-supervised
clustering. Proceedings of the 21st International Conference on Machine Learning (ICML). Banff,
Canada. July 2004. Available at: https://www.cs.utexas.edu/~ml/papers/semi-icml-04.pdf.
Accessed 8/6/21.
1177
1178
71. Huang H, Cheng Y, Zhao R. A semi-supervised clustering algorithm based on must-link set. In C
Tang et al (Eds). ADMA 2008;LNAI 5139:492-499.
1179
1180
1181
72. Siddiqui NY, Helfand BT, Andreev VP, Kowalski JT, Bradley MS, Lai HH, et al. Biomarkers
implicated in lower urinary tract symptoms: systematic review and pathway analyses. J Urol.
2019;202(5):880-889.
1182
1183
73. Syan R, Brucker BM. Guideline of guidelines: urinary incontinence. BJU International.
2016;117(1):20-33.
1184
1185
74. AUA (American Urological Association) Guidelines. Available at:
https://www.auanet.org/guidelines. Accessed 7/22/21.
1186
1187
1188
75. DeLancey JO, Trowbridge ER, Miller JM, Morgan DM, Guire K, Fenner DE, et al. Stress urinary
incontinence: relative importance of urethral support and urethral closure pressure. J Urol.
2008;179(6);2286-2290.
1189
1190
1191
76. Rosen RC, Coyne KS, Henry D, Link CL, Cinar A, Aiyer LP, et al. Beyond the cluster:
methodological and clinical implications in the Boston Area Community Health survey and EPIC
studies. BJU Int. 2008;101(10):1274-1278.
1192
1193
77. Go FAIR. FAIR (Findable, Accessible, Interoperable, Reusable) principals for scientific data.
Available at: https://www.go-fair.org/fair-principles/. Accessed 8/18/21.
1194
1195
1196
78. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR
guiding principles for scientific data management and stewardship. Scientific Data.
2016;3:160018.
1197
1198
79. Taylor CF, Paton NW, Lilley KS, Binz PA, Julian RK, Jr., Jones AR, et al. The minimum information
about a proteomics experiment (MIAPE). Nat Biotechnol. 2007;25(8):887-893.
1199
1200
1201
80. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, et al. Minimum
information about a microarray experiment (MIAME)—toward standards for microarray data.
Nat Genet. 2001;29(4):365-371.
1202
1203
1204
81. Utomo E, Blok BFM, Steensma AB, Korfage IJ. Validation of the pelvic floor distress inventory
(PFDI-20) and pelvic floor impact questionnaire (PFIQ-7) in a Dutch population. Int Urogynecol J.
2014;25(4):531-544.
1205
1206
82. Cohen S. Perceived stress scale. Mind Garden 1994. Available at:
http://www.mindgarden.com/documents/PerceivedStressScale.pdf. Accessed 8/6/21.
1207
1208
83. PROMIS scoring manuals. HealthMeasures. Available at:
http://www.healthmeasures.net/promis-scoring-manuals. Accessed 8/6/21.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
64
1209
SUPPORTING INFORMATION
1210
See separate Supplemental Materials document.
1211
SUPPLEMENTAL MATERIAL TEXT (Section Headings)
1212
Simple example illustrating the benefits of scaling by controls.
1213
Contrast criterion (CC): special cases.
1214
Contrast criterion and proportion of ambiguous clustering: simulation example.
1215
Details on the targeted proteomics study of serum samples of women with LUTS versus controls.
1216
Comparison of the identified clusters W1-W5 with the conventional classification, subtypes, and clusters
1217
of women with LUTS identified in literature.
1218
Details on the methodological information provided in the clustering papers [31-33].
1219
SUPPLEMENTAL TABLES (Captions)
1220
Supplemental Table S1. Overview of the variables used for clustering of 545 women with lower urinary
1221
tract symptoms (LUTS) in comparison with the same variables for non-LUTS controls.
1222
Supplemental Table S2. Means, standard deviations, pairwise distances, and misclassification errors
1223
resulting from the use of three scaling approaches (unscaled – U, scaled as cohort’s z-scores – C, and
1224
scaled by healthy controls – N).
1225
Supplemental Table S3. Lists of differentially abundant proteins observed in the serum samples of 230
1226
women with LUTS versus controls. Proteins with false discovery rate (FDR) adjusted p-value<0.05 are
1227
bolded.
1228
Supplemental Table S3A. All 230 women with LUTS.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
65
1229
Supplemental Table S3B. W1 cluster (n=37).
1230
Supplemental Table S3C. W2 cluster (n=38).
1231
Supplemental Table S3D. W3 cluster (n=53).
1232
Supplemental Table S3E. W4 cluster (n=42).
1233
Supplemental Table S3F. W5 cluster (n=60).
1234
SUPPLEMENTAL FIGURES (Captions)
1235
Supplemental Figure S1. Simulated example 1, illustrating distributions of two variables describing
1236
patients in two disease subtypes and healthy controls. Upper row – X1, lower row X2. Left column
1237
unscaled, center z-scored, right scaled by controls.
1238
Supplemental Figure S2. Simulated example A. Non-overlapping transcript signatures. 40,000
1239
transcripts. Five groups of people. Means, correlation matrix, and single instance of simulated
1240
distributions of transcript abundances.
1241
Supplemental Figure S3. Simulated example A. Consensus matrices, K=2,3…8. CC vs. PAC. Effect size=2.
1242
Supplemental Figure S4. Simulated example A. Consensus matrices, K=2,3…8. CC vs. PAC. Effect
1243
size=0.6.
1244
Supplemental Figure S5. Simulated example A. Misclassification error vs. effect size. Comparison of
1245
consensus clustering using contrast criterion with k-means and hierarchical clustering using Calinski-
1246
Harabasz criterion. Three values of correlation coefficient for transcript abundances within transcript
1247
signatures.
medRxiv preprint doi: https://doi.org/10.1101/2021.09.17.21263124; this version posted September 22, 2021. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
66
1248
Supplemental Figure S6. Simulated example A1. Same conditions as in simulated examples A, except
1249
transcript abundances being correlated following 𝑹𝑹𝒊𝒊𝒊𝒊 = 𝒓𝒓| 𝒊𝒊−𝒊𝒊| .
1250
Supplemental Figure S7. Simulated example B. Overlapping transcript signatures. Consensus matrices,
1251
K=2,3…8. CC vs. PAC. Effect size=2.
1252
Supplemental Figure S8. Simulated example B. Overlapping transcript signatures. Consensus matrices,
1253
K=2,3…8. CC vs. PAC. Effect size=0.8.
1254
Supplemental Figure S9. Simulated example B. Misclassification error vs. effect size. Comparison of
1255
consensus clustering using contrast criterion with k-means and hierarchical clustering using Calinski-
1256
Harabasz criterion. Three values of correlation coefficient for transcript abundances within transcript
1257
signatures.
1258
Supplemental Figure S10. Simulated example B1. Same conditions as in simulated examples B, except
1259
transcript abundances being correlated following 𝑹𝑹𝒊𝒊𝒊𝒊 = 𝒓𝒓| 𝒊𝒊−𝒊𝒊| .
1260