What is Bioinformatics, anyway?
First things first, thanks for reading our blog! Macromoltek runs an internship program each semester for Natural Science and Engineering students who have an interest in biotechnology. Our interns are very gifted undergraduate students ready to apply their computational skills to real-life biological problems. Their active involvement in what we do at Macromoltek gave us the idea to start this blog. Here we plan to answer any biotech, biochemistry, and immunology questions of theirs and yours! Our very first question is one we are asked a lot as a bioinformatics company: What is bioinformatics, anyway?
In short, bioinformatics is a hybrid science that sits somewhere between computer science and biology. Such a simple explanation barely scratches the surface, though.
The practice of bioinformatics can be traced as far back as the 1960s. This is when Margaret Oakley Dayhoff, who is sometimes referred to as the mother of bioinformatics, developed a computer program to aid in the determination of protein sequences (1). Dr. Dayhoff also developed the one-letter amino acid codes to make sequences easier to input into a computer using punch cards. Her single-letter codes are still used to this day, much to the chagrin of biochemistry students, who have enough to remember already!
The actual term has been around since at least 1970, when Paulien Hogeweg and Ben Hesper used it to describe “the study of informatic processes in biotic systems” (2). From then through the 1980s, however, the concept of bioinformatics shifted away from generally describing biochemical networks and became synonymous with sequence analysis, in which algorithms are used to compare data. In this phase of its history, two of the most important contributors were Elvin Kabat and Tai Te Wu. They collected and aligned amino acid sequences from humans and mice. Kabat and Wu “used a simple mathematical formula to calculate the various amino acid substitutions at each position and predict the precise locations of segments of the [protein]” (3). Their database was released in print throughout the late 1970s and 1980s until it became so expansive that it was impossible to print. It can now be found online at KabatMan.
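To make the idea of a per-position formula concrete, here is a minimal Python sketch of the variability measure usually attributed to Wu and Kabat: the number of different amino acids seen at a position, divided by the frequency of the most common one. We are assuming this is the formula the quote above refers to. The function name, the gap handling, and the four short sequences are invented for illustration and are not taken from the Kabat database.

```python
from collections import Counter


def wu_kabat_variability(aligned_sequences):
    """Return one variability score per column of an alignment.

    Score = (number of distinct residues) / (frequency of the most common residue).
    Assumption: gaps are written as "-" and are ignored when scoring a column.
    """
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")

    scores = []
    for column in zip(*aligned_sequences):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column contains only gaps
            continue
        counts = Counter(residues)
        most_common_frequency = counts.most_common(1)[0][1] / len(residues)
        scores.append(len(counts) / most_common_frequency)
    return scores


# Hypothetical toy alignment written with Dayhoff's one-letter codes.
toy_alignment = ["ACDE", "ACDF", "AGDG", "ACHK"]
scores = wu_kabat_variability(toy_alignment)
print(scores)
```

In this toy alignment the first column is fully conserved and scores 1.0, while the last column has a different residue in every sequence and scores 16.0. High-scoring positions are the ones that vary most from sequence to sequence.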
By the end of the 1990s, bioinformatics had become known as the use of computational methods for comparative analysis in biology. This is more in line with today’s definition, but in the 1990s sequence analysis was still the major focus, largely because bioinformatics gained public attention during the Human Genome Project (HGP) (4). Arguably, the HGP was a springboard for bioinformatics, as the project turned into a dramatic scientific race. The HGP was initiated in 1990 as a publicly funded project. With the technology of the time, sequencing all 3 billion base pairs in the human genome was a huge challenge! Scientists had to map a gene, sequence it in small segments, and reconstruct the sequences into a whole using the map. Suffice it to say, it was a slow process! A privately owned company called Celera arose to compete with the public project in 1998. Headed by Dr. J. Craig Venter, Celera was a biotech company that used computational methods to automatically match the overlapping sections of sequences (5), with no more mapping or slow, grueling human assembly. This is what bioinformatics is all about! (Learn more about all of the drama between Celera and the public HGP here.)
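For a feel of what "matching the overlapping sections" means in practice, here is a toy Python sketch that repeatedly merges the two fragments sharing the longest overlap. It illustrates the general idea only and is not Celera's actual method. Real assemblers also have to cope with sequencing errors, repeated regions, and reads from both DNA strands. The function names, the `min_overlap` parameter, and the three fragments are made up for this example.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Merge the best-overlapping pair of fragments until no overlaps remain."""
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; return the remaining pieces
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical fragments cut from the made-up sequence ATGGCGTACCTGA.
toy_fragments = ["GCGTACC", "ATGGCGT", "TACCTGA"]
assembled = greedy_assemble(toy_fragments)
print(assembled)
```

With these three fragments, the sketch rebuilds the single sequence ATGGCGTACCTGA without any map telling it where each piece belongs. That is the step a computer can automate.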
In the years since the completion of the HGP, the use of computers in biological research has only increased. Bioinformatics has grown to encompass a huge variety of fields, from immunology to cardiology to neuroscience and more. People working in all of these fields use computer science to advance our understanding of life science every day! As bioinformaticians hitch progress in biochemistry and medicine to the rapid pace of improvements in computer processing power, we have begun to approach a world where medical science keeps pace with Moore’s law.
With that, we have arrived at our answer, at least as it is understood today: bioinformatics is the creation, advancement, and understanding of immense sets of data using mathematical and computational techniques, in order to improve the quality and pace of new discoveries. Who knows how the definition will change in the coming decades?
Links and Citations:
1. www.computer.org/csdl/proceedings/afips/1962/5061/00/50610262.pdf
2. www.ncbi.nlm.nih.gov/pmc/articles/PMC3068925
3. Lo, Benny K.C., Antibody Engineering: Methods and Protocols. Humana Press, 2004.
4. www.genome.gov/12011238/an-overview-of-the-human-genome-project
5. www.nature.com/scitable/topicpage/dna-sequencing-technologies-key-to-the-human-828
Looking for more information about Macromoltek, Inc? Visit our website at www.Macromoltek.com
Interested in molecular simulations, biological art, or learning more about molecules? Subscribe to our Twitter and Instagram!