A highly annotated whole-genome sequence of a Korean individual

From Kogic.net
Revision as of 04:17, 11 August 2009 by WikiSysop


Kim JI, Ju YS, Park H, Kim S, Lee S, Yi JH, Mudge J, Miller NA, Hong D, Bell CJ, Kim HS, Chung IS, Lee WC, Lee JS, Seo SH, Yun JY, Woo HN, Lee H, Suh D, Lee S, Kim HJ, Yavartanoo M, Kwak M, Zheng Y, Lee MK, Park H, Kim JY, Gokcumen O, Mills RE, Zaranek AW, Thakuria J, Wu X, Kim RW, Huntley JJ, Luo S, Schroth GP, Wu TD, Kim H, Yang KS, Park WY, Kim H, Church GM, Lee C, Kingsmore SF, Seo JS.
[1] Genomic Medicine Institute (GMI), Medical Research Center, Seoul National University, Seoul 110-799, Korea
[2] Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine
[3] Macrogen Inc., Seoul 153-023, Korea
[4] Psoma Therapeutics, Inc., Seoul 110-799, Korea
[5] These authors contributed equally to this work.

Recent advances in sequencing technologies have initiated an era of personal genome sequences. To date, human genome sequences have been reported for individuals with ancestry in three distinct geographical regions: a Yoruba African, two individuals of northwest European origin, and a person from China. Here we provide a highly annotated, whole-genome sequence for a Korean individual, known as AK1. The genome of AK1 was determined by an exacting, combined approach that included whole-genome shotgun sequencing (27.8x coverage), targeted bacterial artificial chromosome sequencing, and high-resolution comparative genomic hybridization using custom microarrays featuring more than 24 million probes. Alignment to the NCBI reference, a composite of several ethnic clades, disclosed nearly 3.45 million single nucleotide polymorphisms (SNPs), including 10,162 non-synonymous SNPs, and 170,202 deletion or insertion polymorphisms (indels). SNP and indel densities were strongly correlated genome-wide. Applying very conservative criteria yielded highly reliable copy number variants for clinical considerations. Potential medical phenotypes were annotated for non-synonymous SNPs, coding domain indels, and structural variants. The integration of several human whole-genome sequences derived from several ethnic groups will assist in understanding genetic ancestry, migration patterns and population bottlenecks. 
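The abstract does not say how the genome-wide correlation between SNP and indel densities was computed. As a minimal sketch of one common way to check such a relationship, the Python below counts variants in fixed-size windows and takes the Pearson correlation of the two count vectors. The window size, the function names, and the toy positions are all hypothetical and are not AK1 data or the authors' method.

```python
from collections import Counter
from math import sqrt


def window_counts(positions, window_size, n_windows):
    """Count variants falling in each fixed-size window along one chromosome."""
    counts = Counter(pos // window_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy, made-up 0-based positions on a single chromosome (NOT AK1 data).
snp_positions = [5, 12, 40, 105, 230, 231, 250, 290]
indel_positions = [8, 110, 240, 260]

WINDOW = 100    # hypothetical window size in base pairs
N_WINDOWS = 3   # number of windows covering the toy chromosome

snp_density = window_counts(snp_positions, WINDOW, N_WINDOWS)
indel_density = window_counts(indel_positions, WINDOW, N_WINDOWS)
r = pearson(snp_density, indel_density)
print(snp_density, indel_density, round(r, 3))
```

On real data the windows would be far larger (for example, hundreds of kilobases) and the counts would come from the deposited variant calls; a rank-based correlation could be substituted if the counts are heavily skewed.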

Data have been deposited in the NCBI Short Read Archive under accession number SRA008370. These data are also freely available from http://gmi.ac.kr. SNPs and indels are deposited in the dbSNP database under handle GMI.


http://www.nature.com/nature/journal/vaop/ncurrent/full/nature08211.html