CLOZUK: Patients taking clozapine provide regular blood samples to allow early detection of adverse effects of that treatment. Through collaboration with Novartis, the manufacturer of a proprietary form of clozapine (Clozaril), we acquired blood from people with schizophrenia who were taking the drug via the central processing labs of a clozapine blood monitoring service. After the samples had been used to complete the necessary clinical tests, unused fractions were sent to Tepnel Life Sciences (Paisley, UK) for DNA extraction. Samples were anonymous, only basic demographic and diagnostic details being made available. Subjects (71% male) were white UK residents, aged 18-90 with a recorded diagnosis of treatment resistant schizophrenia according to the clozapine registration forms completed by treating psychiatrists. In the UK, treatment resistant schizophrenia implies a lack of satisfactory clinical improvement to adequate trials of at least two other antipsychotics.
Bipolar Samples: Of the bipolar cases used in the present study, roughly half have not been used previously in genetic studies of bipolar disorder. Cases met criteria for DSM IV Bipolar I disorder, Bipolar II disorder or Schizoaffective disorder, bipolar type. Cases were excluded if they: (i) had only experienced mood or psychotic illness as a result of alcohol or substance dependence; (ii) had a history of intravenous drug use; (iii) had experienced mood or psychotic illness secondary to medical illness or medication; or (iv) were biologically related to another study participant. The following methodology was used for assessment of bipolar cases: a semi-structured lifetime psychiatric interview (Schedules for Clinical Assessment in Neuropsychiatry),1 a review of the available case notes and completion of the Operational Criteria (OPCRIT) checklist of items of psychopathology2 which has been shown to be valid in studies of mood disorders,3 followed by clinical ratings and a best-estimate lifetime diagnosis according to the Diagnostic and Statistical Manual of the American Psychiatric Association (DSM-IV) criteria.4 In cases where there was doubt as to the best-estimate lifetime diagnosis, diagnostic and clinical ratings were made by at least two members of the research team blind to each other’s ratings.
Controls: The sample comprised 4 539 WTCCC2 control individuals who were not screened for psychiatric illness. They came from two sources (a) the UK 1958 birth cohort longitudinal epidemiological sample and (b) the UK Blood Donor Service. It has previously been shown that it is valid to combine these two control samples for use as controls in genetic association studies using UK disease samples.5
The Immunochip was designed to allow low cost analysis of genes implicated in diseases in which altered autoimmune function played a role. During the design phase, WTCCC2 investigators had the opportunity to place additional SNPs on the chip and these included SNPs of interest that had been identified by the schizophrenia group of the PGC.6 Genotypes were called centrally using GenoSNP software;7 those with a call probability <85% were scored as missing. Following preliminary quality control assessment (QC) by the central analysis group, data for 192 402 autosomal SNPs and 1 561 SNPs on the X chromosome were available on 2 889 schizophrenia, 2 893 bipolar, and 4 539 control individuals.
Analysis of the genotype data
Quality Control Assessment
Unless stated otherwise, data management and QC assessments were performed with PLINK (v1.07)8 and a series of shell scripts. Figures were produced in R (v2.11.1). SNPs with a minor allele frequency (MAF) <0.5%, Hardy-Weinberg equilibrium p<0.00001, or >3% missing data in either cases or controls, were excluded. SNPs were also excluded if the difference between the case and control percentages of missing data exceeded 1%. Samples were excluded if they had >2% missing data. As the Immunochip contains, by design, dense sets of often very highly correlated SNPs at a limited number of loci, a set of approximately 43 000 SNPs in relative linkage equilibrium (r²<0.5 within a window of 50 SNPs after excluding regions of known high LD, e.g. the MHC region) was identified and used for QC assessment as follows. Individuals who were excessively heterozygous or homozygous for these SNPs (abs(Fhet)>0.035) were excluded. For all pairs of individuals, the proportion of identity-by-descent was estimated, and one member of each pair was removed if this exceeded 0.1, to avoid including individuals who were apparently related. The individual genotypes for these SNPs were then merged with those of the 210 HapMap individuals of European (Caucasians from Utah), African (Yoruba) and Asian (Japanese and Chinese) origin. Principal Component Analysis (PCA) was performed with Eigenstrat9, 10 on the combined sample, and individual outliers that did not cluster near the HapMap European individuals were removed to maximise the ethnic homogeneity of our sample. X-chromosome data enabled us to confirm the reported sex assignment. Three individuals were excluded because their recorded sex was inconsistent with their X-chromosome genotypes.
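The thresholds above can be summarised as a pair of pass/fail rules. The following is only an illustrative sketch, not the PLINK commands or shell scripts actually used: the names `SnpStats`, `snp_passes_qc` and `sample_passes_qc` are hypothetical, and it assumes a single MAF and a single Hardy-Weinberg p-value per SNP, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary values (hypothetical container for illustration)."""
    maf: float            # minor allele frequency, as a fraction
    hwe_p: float          # Hardy-Weinberg equilibrium p-value
    miss_cases: float     # fraction of missing genotypes in cases
    miss_controls: float  # fraction of missing genotypes in controls


def snp_passes_qc(s: SnpStats) -> bool:
    """Apply the SNP-level exclusion thresholds quoted in the text."""
    if s.maf < 0.005:                                   # MAF < 0.5%
        return False
    if s.hwe_p < 1e-5:                                  # HWE p < 0.00001
        return False
    if s.miss_cases > 0.03 or s.miss_controls > 0.03:   # > 3% missing
        return False
    if abs(s.miss_cases - s.miss_controls) > 0.01:      # differential missingness > 1%
        return False
    return True


def sample_passes_qc(missing: float, f_het: float) -> bool:
    """Apply the sample-level thresholds: <= 2% missing and abs(Fhet) <= 0.035."""
    return missing <= 0.02 and abs(f_het) <= 0.035
```

The relatedness, ancestry and sex checks are not covered here because they depend on pairwise or sample-wide computations rather than simple per-item thresholds.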
Identification of individuals included in previously published studies
We used the genetic data to identify duplicates and individuals closely related to those included in our previous studies.5, 11 Using a subset of the 43 000 SNPs (~6 000 for CLOZUK, ~9 000 for bipolar and controls), we calculated identity-by-descent between all Immunochip-genotyped subjects and those on whom we had previously reported GWAS data. Duplicates and closely related individuals (equivalent to half siblings or closer) were removed from the Immunochip datasets.
Case-control association analyses were performed using Cochran-Armitage trend tests. To explore potential effects of stratification, we derived genomic control (λ) and λ1000 estimates12 from the set of 43 000 SNPs in relative linkage equilibrium. Combinations of the first 4 principal components from Eigenstrat were included as covariates in a logistic regression analysis (additive model). None of the principal components had a noticeable effect on λ or on the distribution of the p-values as presented in quantile-quantile (Q-Q) plots. To adjust for the slight inflation of the test statistics relative to a null distribution, we adjusted the chi-square statistics from all SNPs on the Immunochip with λ estimated from the ~43 000 SNPs in relative linkage equilibrium. The λ used for the adjustment and the post-QC λ1000 estimates for the whole unpruned Immunochip datasets are shown in Table S1, and post-QC Q-Q plots are given in Figure S1.
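As an illustration of the genomic control step, the sketch below estimates the inflation factor from a pruned SNP set, rescales it to a study of 1 000 cases and 1 000 controls, and divides test statistics by it. It assumes the conventional median-based estimator and the usual sample-size rescaling for the 1 000-case, 1 000-control equivalent; the text does not state which estimator was used, and the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df (about 0.455),
# obtained as the square of the 75th percentile of a standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_pruned):
    """Median-based inflation factor from the pruned (near-independent) SNPs."""
    return median(chi2_pruned) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale the inflation factor to 1 000 cases and 1 000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


def adjust_chi2(chi2_all, lam):
    """Divide every chi-square statistic by the inflation factor."""
    return [x / lam for x in chi2_all]
```

In use, the inflation factor would be computed once from the pruned set and then applied to the statistics for every SNP on the array, mirroring the procedure described above.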
To examine whether the risk alleles identified by the PGC showed an overall enrichment for association with schizophrenia risk in our data, we performed a sign test using an exact binomial test. This test is necessarily one-tailed, since the alternative to the null hypothesis is that the PGC risk alleles are enriched in cases at more of the SNPs than expected by chance. Fisher's method for combining p-values (one-tailed for the same reason) was employed to obtain an overall measure of association. To compare the effect sizes across data sets, we compared the log_e odds ratios (log_e ORs) using paired t-tests.
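The two enrichment tests can be sketched with the standard library alone. This is an illustration under the usual definitions (a one-tailed exact binomial test with success probability 0.5, and Fisher's statistic referred to a chi-square distribution with 2k degrees of freedom); the function names are hypothetical, and the p-values passed to the second function are assumed to be strictly positive.

```python
from math import comb, exp, log


def sign_test_one_tailed(n_same_direction, n_snps):
    """P(X >= n_same_direction) for X ~ Binomial(n_snps, 0.5)."""
    tail = sum(comb(n_snps, k) for k in range(n_same_direction, n_snps + 1))
    return tail / 2 ** n_snps


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df under the null.

    For an even number of degrees of freedom the upper tail has the closed
    form exp(-x/2) * sum_{j<k} (x/2)^j / j!, so no special functions are needed.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # equals statistic / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return exp(-half_stat) * total
```

For example, `sign_test_one_tailed(8, 10)` gives the probability of seeing at least 8 of 10 SNPs with the same direction of effect by chance.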
The method of Bowden & Dudbridge13 was used to obtain an approximately unbiased estimate of the true effect size of each SNP by combining the ORs from PGC Stages 1 & 2. This method allows for the inflation of the ORs that occurs as a result of selecting the most significant SNPs from the GWAS. The resulting estimates of effect size were similar to those from PGC Stage 2. In our sample, the power to replicate ranged from 1.1×10⁻⁴ to 0.99 with a mean of 0.29. Note that, since our replication test is one-sided in the direction of PGC Stage 1, it is possible to obtain estimates of power <0.05 when the PGC Stage 2 OR is in a different direction from the PGC Stage 1 OR. The actual proportion of true replications was estimated by maximum likelihood.
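The remark that power can fall below 0.05 follows directly from the one-sided test. A minimal sketch, assuming a normal approximation for the log odds ratio (the text does not say how power was computed) and a hypothetical function name: `log_or` is the assumed true log OR signed so that positive values agree with the PGC Stage 1 direction, and `se` is its standard error in the replication sample.

```python
from statistics import NormalDist


def replication_power(log_or, se, alpha=0.05):
    """Power of a one-sided z-test at level alpha for a given true log OR."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)          # about 1.645 for alpha = 0.05
    return z.cdf(log_or / se - critical)
```

With a true effect of zero the power equals alpha, and with an effect in the opposite direction it drops below alpha, which is how values far smaller than 0.05 can arise.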
For SNP i, let P_i be the power to replicate at significance level α (=0.05 in this study) given the odds ratio estimated from the PGC data. Define a binary variable X_i = 1 if SNP i replicated in the CLOZUK sample, and X_i = 0 if it did not.
Denote the (unknown) probability that SNP i is a true positive signal by p (0≤p≤1).
The probability that SNP i replicates in the CLOZUK sample (i.e. X_i = 1) is (p P_i + (1-p)α), and the probability that it does not replicate (X_i = 0) is (p(1-P_i) + (1-p)(1-α)).
Thus, the likelihood of the observed replication status of SNP i is given by
$$\bigl(p P_i + (1-p)\alpha\bigr)^{X_i}\,\bigl(p(1-P_i) + (1-p)(1-\alpha)\bigr)^{1-X_i}$$
and the likelihood of the whole set of CLOZUK replication results by
$$L(p) = \prod_i \bigl(p P_i + (1-p)\alpha\bigr)^{X_i}\,\bigl(p(1-P_i) + (1-p)(1-\alpha)\bigr)^{1-X_i}$$
The maximum likelihood estimator of p, denoted p̂, is the value of p which maximises L(p). Its 95% confidence interval is defined as the set of values q for which the null hypothesis p=q would not be rejected at the 5% level, that is, the set of values q for which 2 ln(L(p̂)/L(q)) ≤ 3.841.
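The estimator and its likelihood-ratio interval can be illustrated with a simple grid search over p. This is a sketch only: the text does not say how the maximisation was carried out, the grid resolution is arbitrary, and the function names are hypothetical. It uses the fact that the non-replication probability is one minus the replication probability.

```python
from math import log


def _safe_log(v):
    """log(v), returning -inf at zero so boundary values of p do not raise."""
    return log(v) if v > 0 else float("-inf")


def log_likelihood(p, powers, replicated, alpha=0.05):
    """ln L(p) for per-SNP powers P_i and replication indicators X_i (0 or 1)."""
    ll = 0.0
    for power_i, x_i in zip(powers, replicated):
        prob_rep = p * power_i + (1 - p) * alpha
        ll += _safe_log(prob_rep) if x_i else _safe_log(1 - prob_rep)
    return ll


def estimate_p(powers, replicated, alpha=0.05, steps=1000, cutoff=3.841):
    """Return (p_hat, lower, upper) from a grid search over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    ll = [log_likelihood(q, powers, replicated, alpha) for q in grid]
    best = max(range(len(grid)), key=ll.__getitem__)
    # Keep every q that a likelihood-ratio test would not reject at the 5% level.
    inside = [q for q, v in zip(grid, ll) if 2 * (ll[best] - v) <= cutoff]
    return grid[best], min(inside), max(inside)
```

For instance, with 20 SNPs each having power 0.8, of which 8 replicate, the estimate is close to 0.47, the value of p at which the expected replication rate 0.05 + 0.75p matches the observed rate of 0.4.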
Meta-analyses were performed to combine the PGC schizophrenia results with those in our CLOZUK sample. As in the PGC paper, our primary analysis combined log_e ORs weighted by 1/s.e. using PGC data that had not been adjusted for λ. Where our SNP data were identified through proxies in high linkage disequilibrium, we divided the s.e. by √r² to account for the reduced information.14
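The proxy adjustment and the pooling step can be sketched as follows. Note the assumptions: the pooling function uses conventional fixed-effect inverse-variance weights (1/s.e.²), which may differ from the exact weighting used in the analysis, it reports a two-sided p-value, and both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proxy_adjusted_se(se, r2):
    """Inflate the standard error of a proxy SNP: s.e. / sqrt(r^2)."""
    return se / sqrt(r2)


def fixed_effect_meta(log_ors, ses):
    """Pool log ORs with inverse-variance weights (an assumed weighting scheme).

    Returns the pooled log OR, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, p_two_sided
```

Because r² is at most 1, dividing by √r² can only enlarge the standard error, so a proxy SNP contributes less weight to the pooled estimate than a directly genotyped SNP would.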
Table S1: Summary statistics for the key analyses. For each analysis of all the SNPs, the genomic control λ was estimated in a subset of around 43 000 SNPs in relative linkage equilibrium. In every analysis, each SNP was genomic control adjusted by this value. The post-adjustment λ1000 estimate is also provided.
Fig S1: Quantile-quantile plot; A: CLOZUK vs Controls, B: Bipolar vs Controls C: CLOZUK vs Bipolar
Table S2: Details of the 10 SNPs not directly available on the Immunochip. Where possible, proxy markers were identified on the Immunochip. r² and D′ values are taken from CEU 1000 Genomes Pilot 1 reference data.
Table S3: List of 81 PGC SNPs followed up in stage 2 of that study. Alleles are listed with the PGC stage 1 associated allele first. CHR: chromosome, BP: base pair position. *The PGC stage 2, CLOZUK and UK BD results presented are one-tailed P values. Although these SNPs are statistically independent, the PGC considered the 5 MHC GWS SNPs as one locus, as they did the two chromosome 10 SNPs. † SNP data in our sample identified with proxy markers (see Table S2). Note that, as in the PGC study, we included 8 SNPs that had surpassed p=2×10⁻⁵ in a preliminary analysis of the PGC dataset but for which the significance dropped slightly after all QC.