croarray-based gene expression platform according to the manufacturer’s instructions. Total RNA quality was evaluated using the Agilent RNA 6000 Pico Kit with the Bioanalyzer 2100. Total RNAs were amplified and labeled using the OneColor Low Input Quick Amp Labeling Kit in one run to minimize batch effects. Complementary RNA samples were hybridized onto SurePrint G3 Human Gene Expression 8x60K v2 arrays for 17 h at 65 °C in a rotator oven, followed by washing with Wash Buffers. A randomized design was used to avoid biases. After washing, the slides were scanned using a Model G2505C Microarray Scanner and the hybridization signals were extracted using the Agilent Feature Extraction software, version 10.7.3.1.

Microarray sample processing

Microarray gene expression values were calculated using the gProcessed Signal, which was normalized via log2 transformation and quantile normalization in Partek Genomics Suite™. Subject samples were randomized across arrays so that batch effects would be independent of subject ID and sample location. Accordingly, the batch effect was removed for downstream analysis using the “Remove Batch Effect Function”. Over-expression analysis was performed using 44 samples. Differential expression analysis was performed using 41 samples, with technical replicates removed to eliminate the possibility that the variation within groups would be artificially decreased. Microarray data have been submitted to the GEO repository.

doi: 10.1371/journal.pone.0077340.g001

Over-expression analysis

A gene isoform based upon Agilent probe design was defined as “over-expressed” if normalized expression levels were above 3.5, and “under-expressed” if the levels were below 3.5. This threshold is the median expression level for the sample distribution, which also corresponds to the change in slope in the sample histogram. All samples show identical signal distributions because the data were quantile-normalized.
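As an illustration of the thresholding logic in this section, the following minimal sketch applies a log2 transformation and quantile normalization to a small set of hypothetical intensities, flags probes above the shared median, and reproduces the arithmetic of a one-sample proportion test. It is not the authors' pipeline: the analysis was run in Partek Genomics Suite and R, whereas this is plain Python, and the function names `quantile_normalize` and `prop_test_p`, the toy data, and the use of a Yates continuity correction (the default in R's `prop.test`) are assumptions made for the example.

```python
import math
import statistics


def quantile_normalize(samples):
    """Force every sample onto one shared distribution.

    `samples` is a list of equal-length lists (one list per array).
    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are broken by position (a simplification).
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


def prop_test_p(successes, n, p0=0.5):
    """Two-sided one-sample proportion test (chi-square, 1 df).

    Assumes a Yates continuity correction, as R's prop.test applies
    by default; whether the authors kept that default is an assumption.
    """
    expected = n * p0
    yates = min(0.5, abs(successes - expected))
    stat = (abs(successes - expected) - yates) ** 2 / (n * p0 * (1 - p0))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical raw intensities: 3 samples x 4 probes (not data from the study).
raw = [
    [4.0, 16.0, 64.0, 256.0],
    [8.0, 32.0, 128.0, 2048.0],
    [2.0, 4.0, 1024.0, 8.0],
]
log2_values = [[math.log2(v) for v in sample] for sample in raw]
norm = quantile_normalize(log2_values)

# After quantile normalization every sample has the same sorted values,
# so a single median serves as the threshold for all of them.
threshold = statistics.median(norm[0])
flags = [[v > threshold for v in sample] for sample in norm]

# Number of samples in which each probe lies above the threshold.
counts = [sum(sample[g] for sample in flags) for g in range(len(raw[0]))]
print(counts)

# Over-expression in all 6 subjects, or in none, tested against 0.5.
print(prop_test_p(6, 6))  # roughly 0.041
print(prop_test_p(0, 6))  # same value by symmetry
```

Because every normalized sample shares the same set of values, the median threshold only needs to be computed once. Under the stated continuity-correction assumption, the test gives the same P-value for 6 of 6 and 0 of 6 subjects, consistent with the value quoted in this section.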
Counts and average expression values were calculated based upon over-expression status per sample. Expression values were also averaged per subject, and counts were calculated per subject using the same threshold for over-expression. If a random variable is assumed to show above-median expression levels 50% of the time, then the observed proportion of subjects with high or low expression can be compared against an expected proportion of 0.5. Genes with above-median expression in all 6 subjects therefore would have an over-expression proportion of 100%. Using the prop.test function in R, genes over-expressed in either all 6 subjects or 0 subjects differ from a proportion of 0.5 with a P-value of 0.041.

[...] included an on-column DNase digestion with the RNase-free DNase kit. The total RNAs were eluted once with 50 µL of RNase-free water. Initial RNA purity was assessed by diluting 4 µL samples with 146 µL of RNase-free water and measuring the OD 260 nm/280 nm ratio in a 96-well format using a SpectraMax Plus Absorbance Microplate Reader. Samples were stored and transported at [...]

doi: 10.1371/journal.pone.0077340.g002

Microarray gene expression cross-validation by RT-qPCR

Membrane transporter transcript expression profiling was carried out by reverse transcription-quantitative polymerase chain reaction (RT-qPCR) in a custom 96-well array, CAPH11899, containing 10 membrane transporter primer assays identified for cross-validation. The RT-qPCR array also contained primers for 3 housekeeping genes as well as quality controls. Total RNAs from the sample pool