ayangromano/hadoop_for_science

This repository houses the work done by Dhar, Wang and Yang on a Hadoop/Spark project for the Big Data Analytics (CS-GY 9223) class at NYU.

We (Dhar, Wang and Yang) tested and evaluated available bioinformatics tools that use the Big Data platforms taught in the Spring 2016 Big Data Analytics course at NYU-Tandon. We first proposed a small genomic analysis pipeline for short-read genetic data from the Human Microbiome Project (HMP), then installed and tested the tools. Because the HMP data proved unsuitable for the available tools, we then proposed a second, simpler pipeline that uses single-species data to test the tools. We found that although several bioinformatics tools have been built for Big Data technologies, many, if not most, are outdated and/or have not been kept current through developer maintenance or user engagement. In many cases the tools seem to have been created as “proofs of concept” but are not actively used in the bioinformatics community, so they receive no updates or support. However, at least one promising tool, ADAM, appears to receive frequent updates and to have an active developer community; it also relies on Spark, the more user-friendly of the platforms. Underscoring its promise, we successfully produced output with ADAM, for example by converting commonly used bioinformatics files (FASTQ, FASTA and BAM) to ADAM files.

Update: September 2017
======

Although this document may be slightly out of date, we hope it gives the community some insight into using Hadoop or Spark for NGS analyses.

Repository contents:

- Appendix_I--Commands
- Appendix_II--SoftwareLinks.pdf
- Appendix_III_SampleOutput
- FinalReport.pdf
- README.md