Introduction to Bioinformatics

Post by **Tony** » Thu Oct 01, 2009 1:06 am

Bioinformatics is the use of computers, software tools, and databases to handle biological information. It is widely used in genomics and proteomics: it helps to sequence and analyse the genomic entities of an organism, including genes and transcripts, and in proteomics it helps to analyse the complete set of proteins, or proteome. Bioinformatics is also used in drug design and drug development. It is one of the fastest growing fields and is expected to help explain many aspects of life and many diseases. Bioinformatics has become a very important part of biotechnology: the information produced by biotechnology is stored and analysed using bioinformatics. It applies mathematics, statistics, biochemistry, biophysics and computer algorithms to the analysis of biological data.

Applications of Bioinformatics in various fields:

Drug designing
Drug development
Gene therapy
Evolutionary studies
Biotechnology
Bio-weapon creation
Veterinary medicine
Molecular medicine
Improving the quality of various products
Climate change studies
Waste recycling
Preventive medicine
Agriculture
Forensic analysis
Development of high yield crops
Antibiotic medicine
Pesticides
And more.

Skills Required for a Bioinformatician
Bioinformatics is a wide area, and it is not possible to learn all of it. The important topics that should be known are:

Molecular Biology
You should have a basic knowledge of molecular biology.

You should also have experience with one or more molecular biology software packages. Learn to use sequence analysis and molecular modelling software. Examples of such packages are BLAST and FASTA.

Computers

Operating systems

Windows and Linux
You should have a basic knowledge of both Windows and Linux. Linux (free and open source) is now widely used in bioinformatics for its robustness and for the tools and software available on the platform, so it is very important to learn these operating systems.

Computer Programming Languages
A bioinformatician should know C/C++, Perl, Python, Java and HTML.

Database Management Systems
Oracle and MySQL (a free database server) are widely used to store large biological data sets.

Sequence Analysis

Sequence analysis is the use of various bioinformatics methods and tools to determine the biological function and/or structure of genes and the proteins they code for.

How to perform Sequence Analysis?

To perform a sequence similarity search we use NCBI BLAST.

What is NCBI BLAST?
NCBI's BLAST is used to compare the sequence of a particular gene or protein with other sequences from a variety of organisms.

What is BLAST?
BLAST (Basic Local Alignment Search Tool) is a set of programs designed to perform similarity searches on all available sequence data. We can use these searches to gain insight into the function and biological importance of gene products.

BLAST uses an algorithm developed by NCBI that seeks local alignment (the alignment of some portion of two sequences) as opposed to global alignment (the alignment of two sequences over their entire length). By searching for local alignments, BLAST is able to identify regions of similarity within two sequences.
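
To make the difference between local and global alignment concrete, here is a minimal Python sketch. It is not BLAST's own algorithm (BLAST uses a faster heuristic search); it uses the textbook dynamic-programming scores, Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen only for illustration.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score aligning a and b over their entire length."""
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: a[:i] against an empty prefix of b.
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over any portion of a and any portion of b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor of 0 lets an alignment restart anywhere,
            # which is what makes the alignment local.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

For example, local_score("TTTTACGTACGT", "ACGTACGTCCCC") is 8, which comes from the shared ACGTACGT region, while global_score for the same pair is lower because the unrelated ends also have to be aligned.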

Services provided by BLAST:

blastp - comparing an amino acid query sequence with others stored in protein sequence databases
blastn - comparing a nucleotide query sequence against a nucleotide sequence database
blastx - comparing a nucleotide query sequence translated in all reading frames with other amino acid sequences stored in protein sequence databases

To proceed with BLAST we need the sequence in FASTA format (an amino acid sequence for blastp).

How to obtain a FASTA Formatted Sequence?

Go to NCBI Home page.
Enter the search option (e.g. Protein, Genome) for a particular organism (e.g. human, rat) and select FASTA in the display option.
Then click Go.
A FASTA sequence starts with the > symbol, e.g.:
>gi|73535847|pdb|1Z9O|C Chain C, 1.9 Angstrom Crystal Structure Of The Rat Vap-A Msp Homology Domain In Complex With The Rat Orp1 Ffat Motif
GSHMAKHEQILVLDPPSDLKFKGPFTDVVTTNLKLQNPSDRKVCFKVKTTAPRRYCVRPNSGVIDPGSIV
TVSVMLQPFDYDPNEKSKHKFMVQTIFAPPNISDMEAVWKEAKPDELMDSKLRCVFEM

Copy the sequence.
From the home page click BLAST, then select the BLAST service you need, such as blastp or blastn.
Paste the sequence in the form.
Click on the BLAST button.

Analyse the result.
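
If you want to check a FASTA sequence before pasting it into the form, the format shown in the steps above is easy to read with a few lines of Python. This is only a sketch: the function name parse_fasta is made up for this example, and it assumes the simple layout shown above (a header line starting with > followed by one or more sequence lines).

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The example record shown in the steps above.
example = """>gi|73535847|pdb|1Z9O|C Chain C, 1.9 Angstrom Crystal Structure Of The Rat Vap-A Msp Homology Domain In Complex With The Rat Orp1 Ffat Motif
GSHMAKHEQILVLDPPSDLKFKGPFTDVVTTNLKLQNPSDRKVCFKVKTTAPRRYCVRPNSGVIDPGSIV
TVSVMLQPFDYDPNEKSKHKFMVQTIFAPPNISDMEAVWKEAKPDELMDSKLRCVFEM
"""

records = parse_fasta(example)
header, sequence = records[0]
print(header)
print(len(sequence), "residues")
```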

Bioinformatics Sequence Analysis Tools

Listed below are a few online sequence analysis tools. We regularly add new websites and new tools in bioinformatics. If you come across new tools in bioinformatics, please let us know.

Protein Sequence Analysis Tools

3DCrunch - Database Browser of modelled Swiss-Prot proteins at ExPASy, Switzerland
3D-Jigsaw - Comparative modelling server
AAT - Analysis and Annotation Tool for finding genes in genomic sequences
ASC - Analytic Surface Calculation of PDB molecules at EMBL, Heidelberg
BLOCKS Search - Search a protein against BLOCKS database
BTPRED - Prediction of beta-turns
CD-Search - Search a protein against CDD domain database
Chime - Plugin for structure view
Cn3D - Plugin for structure viewing at NCBI
eMOTIF Search - Assign putative function to new proteins by sequence comparison with IDENTIFY motif database.
Coils - Prediction of coiled-coil regions by the Lupas method
CPHmodels - Structure prediction by comparative homology modelling
DIP - Search Database of Interacting Proteins
DOMPLOT - Structural domain organization annotated by ligand contacts.
eMATRIX Search - Predict function by sequence analysis using minimal-risk scoring matrices.
eMOTIF Maker - Generate motifs describing protein families or superfamilies
FingerPRINTScan - Search a protein sequence against protein motif fingerprints database
HMMTOP - Prediction of transmembrane helices and topology of proteins
JPred - Protein secondary structure prediction
LIBRA - LIght Balance for Remote Analogous proteins: search compatible structure of a target sequence by threading
Comparative modelling tools
Modules - Mobile protein domains database
MolSurfer - Calculate and navigate protein-protein interfaces
NetOGlyc - Prediction of O-glycosylation sites in mammalian proteins
nnPredict - Protein secondary structure prediction.
PFSCAN - Protein search against different profile databases
PPSearch - Search a protein sequence against prosite pattern database
Predicting Protein 3D structure based on homologous sequence search
RasMol - 3D viewer
VAST - Structure-structure similarity search
WebMol - 3D viewer

DNA / RNA Sequence Analysis Tools

Gene Finder - Exon and Splice Site Prediction
GENEID - Prediction of Exons and Gene Structure
ORD ID - Open Reading Frame search
ORF Finder - Open Reading Frame Finder
PatScan - DNA Sequence Motif search

Bioinformatics Databases

Listed below are a few online bioinformatics databases. We regularly search for and add new databases in bioinformatics. If you come across new database websites in bioinformatics, please let us know.

List of Bioinformatics Databases / Data Banks Online

NCBI - National Center for Biotechnology Information (GenBank)
EBI - European Bioinformatics Institute (EMBL)
EMNEW - Index of New EMBL Nucleotides (EBI)
DDBJ - DNA Data Bank of Japan

List of Protein Sequence Data Banks

SWISS-PROT - Protein sequence database
PIR - Protein Information Resource
MIPS - Munich Information Center for Protein Sequences

SeqAnalyser - Open Source BioPerl Software

What is SeqAnalyser?
SeqAnalyser is open source BioPerl software. It is used to analyse both DNA and protein sequences.

Functions of SeqAnalyser

PROTEIN Sequence Analysis
DNA Sequence Analysis
FORMAT Conversion
Convert DNA to PROTEIN
PAIRWISE Alignment
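
To show what a function such as "Convert DNA to PROTEIN" involves, here is a small Python sketch. It is not SeqAnalyser's code (SeqAnalyser is written in Perl and its source is not shown here). It assumes the standard genetic code and translates only the first reading frame; the names CODON_TABLE and translate are made up for this example.

```python
# Standard genetic code, written in the usual compact form:
# codons are ordered with bases T, C, A, G in each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in reading frame 1.

    Stop codons are written as '*', codons containing anything other
    than A, C, G or T are written as 'X', and a trailing partial codon
    is ignored.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))  # ATG -> M, GCC -> A, TAA -> stop, so this prints MA*
```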

Sub Functions

PROTEIN Sequence Analysis
Getting Protein Structure
Search for Motifs

DNA Sequence Analysis
Convert DNA to RNA
Convert RNA to DNA
Mutation of DNA
Frequency of NUCLEOTIDE
Reverse the DNA
Search for Motifs
Reverse Complement of DNA
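
Most of the DNA operations listed above are simple string manipulations. The Python sketch below shows one way to write them. It is an illustration only, not SeqAnalyser's implementation, and all function names are made up for this example. It assumes plain sequences containing only the letters A, C, G, T (or U for RNA).

```python
from collections import Counter

# Translation table used to complement each DNA base.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dna_to_rna(dna):
    """Convert DNA to RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def rna_to_dna(rna):
    """Convert RNA to DNA by replacing U with T."""
    return rna.upper().replace("U", "T")


def reverse(seq):
    """Return the sequence read backwards."""
    return seq[::-1]


def reverse_complement(dna):
    """Complement every base, then reverse the result."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def nucleotide_frequency(dna):
    """Count how often each nucleotide occurs."""
    return Counter(dna.upper())


def find_motif(seq, motif):
    """Return the 0-based start positions of every occurrence of motif,
    including overlapping ones."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


print(dna_to_rna("ATGC"))            # AUGC
print(reverse_complement("ATGC"))    # GCAT
print(find_motif("ATATAT", "ATA"))   # [0, 2]
```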

FORMAT Conversion
Convert RAW Format Sequence
Convert FASTA Format Sequence
Convert GENBANK Format Sequence
Convert EMBL Format Sequence
Convert GCG Format Sequence

You can convert a sequence file in any of these formats to any of the others.

SeqAnalyser Download Procedure

Before you download our software "SeqAnalyser" you need to download and install Perl and BioPerl.

Perl - Download Perl

After downloading, install Perl to your C:\ drive.

BioPerl - Download BioPerl

You will get a zip file of BioPerl. Unzip it, copy the contents of the main folder, and paste them into your perl\lib directory.

Contact us to get your login details to download SeqAnalyser. Please make sure you have received your login details before downloading SeqAnalyser.

Download SeqAnalyser

If you have any issues downloading SeqAnalyser, please contact us.

Bioinformatics Perl Script Library
Perl is one of the most widely used programming languages in bioinformatics. You can download and use any script from our Bioinformatics Perl scripts library for FREE. If you are interested in programming and willing to share your bioinformatics scripts, you are welcome to share them with us. We will list your script in our script library and provide it for FREE.

dna2rna.pl
The Perl script "dna2rna.pl" converts a DNA sequence to an RNA sequence. When run, it asks for the file containing the DNA sequence. Enter the filename and it generates the RNA sequence. The script runs on both Windows and Linux operating systems.
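
The Perl source of dna2rna.pl is not reproduced in this post. For readers who prefer Python, the same idea can be sketched roughly as below. This is a hypothetical equivalent, not a port of the actual script: the function name dna_file_to_rna is made up, and it assumes the input is a plain text or FASTA file (lines starting with > are treated as headers and kept unchanged).

```python
def dna_file_to_rna(path):
    """Read a DNA sequence file and return its contents as RNA text.

    Header lines (starting with '>') are copied as they are; on every
    other line the bases are upper-cased and T is replaced with U.
    """
    converted = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                converted.append(line)
            else:
                converted.append(line.upper().replace("T", "U"))
    return "\n".join(converted)
```

To mimic the interactive behaviour described above, you could call it as print(dna_file_to_rna(input("DNA file: "))).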