A Practical Guide to NGS (Next-Generation Sequencing) Data Processing and Genetic Analysis on Linux

1. Reference:

download_resource > build_reference_for_bwa (tophat, star) / bwa_index / annotation_gtf
Reference genome data consist of the genome sequence, stored as a .fasta or .fa file, and the gene annotation, stored as a .gtf or .gff file.
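
A minimal Python sketch of reading the two reference file types, assuming plain or gzip-compressed text. `open_text`, `read_fasta`, and `parse_gtf_line` are hypothetical helper names, not part of any tool listed in this guide; for production work, `samtools faidx` or a dedicated parser is the safer choice. The sketch reports each sequence's name and length and splits a GTF line into its nine tab-separated columns, decoding the `key "value";` attribute pairs.

```python
import gzip
import io
from typing import Dict, Iterator, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file (reference downloads usually end in .gz)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the '>' header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def parse_gtf_line(line: str) -> Dict[str, object]:
    """Split one GTF data line into its 9 columns and decode the attribute column."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF data line must have 9 tab-separated columns")
    attributes: Dict[str, str] = {}
    # Attributes look like: gene_id "ENSG..."; transcript_id "ENST...";
    # Repeated keys (e.g. several 'tag' entries) keep only the last value here.
    for item in fields[8].split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip().strip('"')
    return {
        "seqname": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),  # 1-based, inclusive
        "end": int(fields[4]),    # 1-based, inclusive
        "strand": fields[6],
        "attributes": attributes,
    }


if __name__ == "__main__":
    demo = io.StringIO(">chr1 demo\nACGT\nAC\n>chr2\nGG\n")
    print({name: len(seq) for name, seq in read_fasta(demo)})  # {'chr1': 6, 'chr2': 2}
```

To summarize a downloaded genome, pass `open_text(path)` to `read_fasta` and record `len(seq)` per chromosome; only one record is held in memory at a time.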

- Ensembl

FTP directories to download .fasta and .gtf files:
ftp://ftp.ensembl.org/pub/current_fasta
ftp://ftp.ensembl.org/pub/current_gtf

Website to download .fasta and .gtf files:
http://www.ensembl.org/info/data/ftp/index.html

- UCSC

Website to download .fasta and .gtf files:
http://hgdownload.soe.ucsc.edu/downloads.html

FTP site to download .fasta and .gtf files:
ftp://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes

- NCBI

FTP site for downloads:
ftp://ftp.ncbi.nlm.nih.gov/genomes

Website for downloads:
http://www.ncbi.nlm.nih.gov/home/download.shtml
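
Multi-gigabyte genome files are easy to truncate during transfer, so it is worth checking them before building an index. This sketch assumes the download directory publishes an md5sum-style listing (UCSC's bigZips directories include an md5sum.txt; Ensembl provides a CHECKSUMS file in a different format that this sketch does not parse). The helper names are illustrative only.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Union

PathLike = Union[str, Path]


def md5_of_file(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest, read in 1 MiB chunks so large genome files never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_list(text: str) -> Dict[str, str]:
    """Parse md5sum-style lines ('<digest>  <file name>') into {file name: digest}."""
    expected: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            expected[name.lstrip("*")] = digest.lower()  # '*' marks md5sum binary mode
    return expected


def verify(path: PathLike, expected: Dict[str, str]) -> bool:
    """True if the file's MD5 matches the digest listed for its base name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no checksum listed for {name}")
    return md5_of_file(path) == expected[name]


if __name__ == "__main__":
    # Self-check on a throwaway empty file; the MD5 of empty input is well known.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "empty.fa.gz"
        demo.write_bytes(b"")
        listing = parse_md5_list(f"d41d8cd98f00b204e9800998ecf8427e  {demo.name}\n")
        print(verify(demo, listing))  # True
```

For a real download, read the listing with `Path("md5sum.txt").read_text()` and call `verify` on each downloaded file.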

2. Software and tools:

download_resource > software_installation

- TopHat/Cufflinks/Cuffdiff/TopHat-Fusion
- BWA (Burrows-Wheeler Aligner)
- Bowtie
- SAMtools
- GATK (Genome Analysis Toolkit)
- FastQC
- QC3
- RNA-SeQC
- ANNOVAR
- R
- Cluster 3.0 (Open Source Clustering Software)
- TreeView
- Circos (flexible and automatable circular data visualization)
- BreakDancer
- IGV
- FusionHunter
- FusionMap
- BEDTools
https://bedtools.googlecode.com/

- bamUtil
http://genome.sph.umich.edu/wiki/BamUtil

- Picard tools

- MuTect
- VarScan
- CNVnator
- CoNIFER (Copy Number Inference From Exome Reads)
- CNVseq
- CPAT
- cummeRbund
- VCFtools
http://vcftools.sourceforge.net/

- VirusFinder
- VirusSeq
- SRA Toolkit
- Pindel
- Dindel
- HOMER
- FASTX-Toolkit
- MapSplice
- DiffSplice
- SVDetect
- HTSeq
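
FastQC and QC3, listed above, summarize read quality from the FASTQ files that start both pipelines below. This sketch shows the underlying decoding, assuming Phred+33 quality encoding (Sanger / Illumina 1.8+); older Illumina data used an offset of 64. The function names are illustrative, not part of either tool.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield (header[1:].split() or [""])[0], seq, qual


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode ASCII quality characters into Phred scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def per_position_mean_quality(records) -> List[float]:
    """Mean Phred score at each read position (reads may differ in length)."""
    sums: List[int] = []
    counts: List[int] = []
    for _, _, qual in records:
        for i, q in enumerate(phred_scores(qual)):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += q
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nII#5\n@r2\nACG\n+\nIII\n")
    print(per_position_mean_quality(read_fastq(demo)))  # [40.0, 40.0, 21.0, 20.0]
```

A per-position profile like this is what FastQC-style reports plot; a drop toward the 3' end is the usual signal to consider trimming.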


3. DNAseq:

fastq > bwa_align > GATK_markduplicate > GATK_realign > GATK_recalibration > GATK_call_SNPs/INDELs > GATK_best_practice_filter > pass_filter > reformat_data > ANNOVAR > annotated_exon_result
> samtools_mpileup > varscan_call_SNPs/INDELs
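
The `pass_filter > reformat_data` steps can be sketched as below, assuming a VCF whose FILTER column holds `PASS` for retained calls (as GATK's VariantFiltration writes). The five-column output (chr, start, end, ref, alt, with `-` for the empty allele of an indel) is intended to follow ANNOVAR's input layout; ANNOVAR's own `convert2annovar.pl` remains the authoritative converter, so compare outputs before relying on this sketch. All function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

AvRow = Tuple[str, int, int, str, str]


def pass_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-split columns of VCF data lines whose FILTER column is PASS."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # meta-information, header, or blank line
        fields = line.rstrip("\r\n").split("\t")
        if fields[6] == "PASS":  # columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
            yield fields


def to_avinput(chrom: str, pos: int, ref: str, alt: str) -> AvRow:
    """Convert one VCF allele to an ANNOVAR-style (chr, start, end, ref, alt) row."""
    # VCF anchors indels on the preceding reference base; strip the shared prefix.
    prefix = 0
    while prefix < min(len(ref), len(alt)) and ref[prefix] == alt[prefix]:
        prefix += 1
    ref_t, alt_t = ref[prefix:], alt[prefix:]
    start = pos + prefix
    if not ref_t:
        # Insertion: both coordinates point at the last shared reference base.
        return (chrom, start - 1, start - 1, "-", alt_t)
    end = start + len(ref_t) - 1
    return (chrom, start, end, ref_t, alt_t or "-")  # '-' marks a deletion


def vcf_pass_to_avinput(lines: Iterable[str]) -> List[AvRow]:
    """Keep PASS records and emit one ANNOVAR-style row per ALT allele."""
    rows: List[AvRow] = []
    for fields in pass_records(lines):
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):  # multi-allelic sites give one row per ALT
            if alt in (".", "*") or alt.startswith("<") or alt == ref:
                continue  # no ALT, spanning deletion, symbolic allele, or no change
            rows.append(to_avinput(chrom, pos, ref, alt))
    return rows


def format_avinput(row: AvRow) -> str:
    """Tab-separated line for writing to an input file."""
    return "\t".join(str(x) for x in row)
```

Reading the VCF line by line (`with open(path) as fh: vcf_pass_to_avinput(fh)`) keeps memory use flat even for whole-genome call sets.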

4. RNAseq:

fastq > tophat-G_align > cufflinks-G > fpkm_table
> cuffdiff > differential_gene_expression / isoforms
> cufflinks-g > de_novo_individual > merge_by_location > re-call_fpkm
> cufflinks > cuffcompare_gtf > cuffdiff_de_novo_group > differential_gene_expression / isoforms / splicing / promoter / cds
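
The `fpkm_table` step reports FPKM (fragments per kilobase of transcript per million mapped fragments). A minimal sketch of the textbook definition, FPKM = fragments * 10^9 / (length_bp * total mapped fragments), and of a log2 fold change between two samples. Cufflinks itself estimates abundances with a likelihood model and effective transcript lengths, so its values will differ from this plain ratio; the function names here are hypothetical.

```python
import math
from typing import Dict


def fpkm(fragments: float, length_bp: float, total_mapped: float) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length and library size must be positive")
    # (fragments / (length / 1e3)) / (total_mapped / 1e6) == fragments * 1e9 / (length * total)
    return fragments * 1e9 / (length_bp * total_mapped)


def fpkm_table(counts: Dict[str, float], lengths: Dict[str, float],
               total_mapped: float) -> Dict[str, float]:
    """FPKM for every gene; total_mapped is the library's total mapped fragments."""
    return {gene: fpkm(n, lengths[gene], total_mapped) for gene, n in counts.items()}


def log2_fold_change(fpkm_1: float, fpkm_2: float, pseudocount: float = 0.0) -> float:
    """log2 of sample 2 over sample 1; a pseudocount avoids division by zero."""
    return math.log2((fpkm_2 + pseudocount) / (fpkm_1 + pseudocount))


if __name__ == "__main__":
    table = fpkm_table({"geneA": 500}, {"geneA": 2000}, total_mapped=10_000_000)
    print(table)  # {'geneA': 25.0}
    print(log2_fold_change(25.0, 50.0))  # 1.0
```

Dividing by both length and library size is what lets FPKM values be compared across genes of different lengths and across samples of different sequencing depth.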

5. Parallel Computing

computer_cluster > linux_OS > organization_of_computation_tasks > processing_script > jobs_submit
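
The `processing_script > jobs_submit` steps can be sketched as generating one batch script per sample. This guide does not name a scheduler; the sketch assumes SLURM (`#SBATCH` directives, `sbatch` submission), and PBS or SGE clusters would need different directives. The resource defaults and function names are hypothetical and should be set to the cluster's queue limits.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical resource defaults; adjust to the cluster's queue limits.
SLURM_TEMPLATE = """#!/bin/bash
#SBATCH --job-name={job_name}
#SBATCH --cpus-per-task={cpus}
#SBATCH --mem={mem}
#SBATCH --time={walltime}
#SBATCH --output={job_name}.%j.log
set -euo pipefail
{command}
"""


def make_job_script(sample: str, command: str, cpus: int = 8,
                    mem: str = "16G", walltime: str = "12:00:00") -> str:
    """Render one batch script per sample; SLURM expands %j to the job ID."""
    return SLURM_TEMPLATE.format(job_name=f"ngs_{sample}", cpus=cpus,
                                 mem=mem, walltime=walltime, command=command)


def write_jobs(samples: Iterable[str], command_template: str, out_dir: str) -> List[Path]:
    """Write <sample>.sbatch files; '{sample}' in the template is replaced per sample."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for sample in samples:
        # str.replace (not str.format) so braces in awk or shell code survive untouched.
        command = command_template.replace("{sample}", sample)
        path = out / f"{sample}.sbatch"
        path.write_text(make_job_script(sample, command))
        paths.append(path)
    return paths


def submit_commands(paths: Iterable[Path]) -> List[List[str]]:
    """argv lists for 'sbatch <script>'; pass each to subprocess.run(..., check=True)."""
    return [["sbatch", str(p)] for p in paths]
```

For example, `write_jobs(["S1", "S2"], "bwa mem ref.fa {sample}_R1.fastq.gz {sample}_R2.fastq.gz > {sample}.sam", "jobs")` writes two scripts (sample names and file paths are illustrative). Generating the scripts separately from submitting them makes it easy to inspect them before anything reaches the queue.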

6. Linux Command-line Operation

7. Mutation Analysis

8. SNP Analysis