Genomics
Sequence, sequence, sequence...
In simpler days, before people made up words like genomics and bioinformatics,
a single gene was enough to keep a graduate student occupied for all five
years of a doctorate. Sequence data trickled into GenBank one gene at a time, even as the methods for obtaining
that sequence became simpler and simpler. Then a renegade called J. Craig
Venter had an idea: stop thinking about what DNA you should sequence and
start sequencing anything you can get your hands on. Reasoned thought was
out; brute force was in.
But there was some logic to Venter's approach. Genes are mere islands in
a cell's DNA, stranded among seas of nonsensical filler DNA. The Human Genome
Project promised to (eventually) sequence everything. (A genome is the collection
of all the DNA in a given cell.) Venter wanted to fish out the informative
bits first - just enough of each gene to take a guess at its place in the
running of the cell. It is proteins that do the work of a cell, but proteins
are made only after genes are converted to mRNA, which is then converted
to protein. Venter took the mRNA and transformed it back into DNA that was
ready to sequence and devoid of non-gene junk. After sequencing at most
a few hundred nucleotides of each piece of DNA, he had his expressed sequence
tag (EST).
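The steps in that last stretch (mRNA back to DNA, then a short read) can be mimicked on strings. What follows is a toy sketch, not Venter's actual lab protocol or any real software: the function names are made up for illustration, and the 300-nucleotide read length is an assumption standing in for "a few hundred."

```python
# Toy model of making an EST. All names here are hypothetical, and the
# default read length of 300 is an assumed stand-in for "a few hundred".

# Map each RNA base to the DNA base that pairs with it.
COMPLEMENT = str.maketrans("ACGU", "TGCA")


def mrna_to_cdna(mrna: str) -> str:
    """Return the DNA strand complementary to an mRNA, written 5' to 3'."""
    # Pair every base, then reverse so the new strand reads 5' to 3'.
    return mrna.upper().translate(COMPLEMENT)[::-1]


def expressed_sequence_tag(cdna: str, read_length: int = 300) -> str:
    """Keep only the first read_length nucleotides, as a single short read."""
    return cdna[:read_length]


if __name__ == "__main__":
    cdna = mrna_to_cdna("AUGGCC")
    print(cdna)  # GGCCAT
    print(expressed_sequence_tag(cdna, read_length=4))  # GGCC
```

The sketch ignores everything that makes the real work hard, such as sequencing errors and the fact that a read can start from either end of the copied DNA. It only shows why an EST contains no filler: it is built from the mRNA, not from the genome.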
Venter founded The Institute for Genomic
Research (TIGR; Rockville, Md.) in July 1992,
with $85 million of funding promised over ten years by Human
Genome Sciences, Inc. (HGS; Rockville, Md.). Within
a year, TIGR claimed it had identified ESTs for over half the estimated
70,000 human genes. TIGR and HGS parted ways in 1997: TIGR is now a not-for-profit
institute with government funding, and HGS has focused on patenting genes
(several hundred applications so far, with over fifty patents allowed) and
developing the corresponding proteins as drugs.
Incyte began as a traditional pharmaceutical company. But when the failure
of its premier drug in clinical trials coincided with Venter's EST splash,
Incyte decided to re-invent itself. "We became basically a factory
for sequencing DNA," says Klingler.
Four years, six months, and three million human ESTs later, the sequencing machines are still running. The sequencing room is a far cry from
the deserted computer room: people scurry everywhere to tend to the rows
of sequencing machines. This room generates all the data that makes the
company run, but the work is repetitive and the workers - many of them college
students - are expendable. "There is a whole new temporary biologist
market," says Klingler. "I don't know how long the average technician
stays, but it's not too long."
Incyte's human database is called LifeSeq. The ESTs come from the
mRNA of 669 different tissue samples, some of them diseased and some of
them not, and represent perhaps 90-95% of all human genes. Genes that are
often made into mRNA have been sequenced thousands of times, but some genes
that are rarely converted into mRNA have yet to be sequenced even once. Incyte
is also using the short ESTs to find the entire length of every human gene,
and is working out where each gene lies on the 24 human chromosomes.
Newer databases include PathoSeq, which has most of the genes from 32 bacterial
species, and ZooSeq, which includes genes from mice, rats, monkeys, and
soon dogs. The sequencing operation that feeds these databases generates
~200,000 pieces of sequence, or over 40 million DNA nucleotides, every single
month.
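Those two monthly figures can be checked against each other. The short calculation below uses only the numbers quoted above, rounded as given; the variable names are illustrative.

```python
# Monthly output figures as quoted in the text (both are approximate).
pieces_per_month = 200_000
nucleotides_per_month = 40_000_000

# Average length of one piece of sequence, in nucleotides.
average_read_length = nucleotides_per_month / pieces_per_month
print(average_read_length)  # 200.0
```

An average of about 200 nucleotides per piece fits the earlier description of an EST as, at most, a few hundred nucleotides of each gene.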