Yufeng Wu’s Publications in Computational Biology and Bioinformatics
Preprints
- ScisTree2: An Improved Method for Large-scale Inference of Cell Lineage Trees and Genotype Calling from Noisy Single Cell Data, Haotian Zhang, Yiming Zhang, Teng Gao and Yufeng Wu, submitted for publication, 2024. [Accompanying Software: ScisTree2] This paper presents an improved method for inferring cell lineage trees from single-cell DNA data.
- Bounding the number of reticulation events for displaying multiple trees in a phylogenetic network, Yufeng Wu and Louxin Zhang, submitted for publication, 2024.
Published Papers
- A general approach for inferring the ancestry of recent ancestors of an admixed individual, Yiming Zhang, Haotian Zhang and Yufeng Wu, PNAS, 121 (2) e2316242120, 2024. [Accompanying Software: PedMix2] This paper presents a more general approach than the original PedMix and can perform ancestry inference for all recent ancestors from an extant individual’s genome.
- A fast and scalable method for inferring phylogenetic networks from trees by aligning lineage taxon strings, Louxin Zhang, Niloufar Abhari, Caroline Colijn and Yufeng Wu. Genome Research, 33: 1053-1060, 2023. An earlier version of this paper was presented at the RECOMB 2023 conference.
- Bounding the Number of Reticulations in a Tree-Child Network that Displays a Set of Trees, Yufeng Wu and Louxin Zhang, in proceedings of RECOMB-CG 2023 conference, 2023.
- Joint inference of ancestry and genotypes of parents from children, Yiming Zhang and Yufeng Wu, iScience, 25, 104768, 2022.
- Inferring the ancestry of parents and grandparents from genetic data (preprint), Jingwen Pei, Yiming Zhang, Rasmus Nielsen and Yufeng Wu, PLoS Computational Biology, 16(8): e1008065, 2020. [Accompanying Software: PedMix]. This paper develops an inference method for inferring the ancestry (in particular, admixture proportions of recent ancestors, e.g. parents or grandparents) from an extant (admixed) individual.
- Inference of Population Admixture Network from Local Gene Genealogies: a Coalescent-based Maximum Likelihood Approach, Yufeng Wu, in proceedings of ISMB 2020, to appear. [Accompanying Software: GTmix]. This paper develops a new inference method for inferring population admixture history from inferred genealogies from haplotypes.
- Accurate and Efficient Cell Lineage Tree Inference from Noisy Single Cell Data: the Maximum Likelihood Perfect Phylogeny Approach (preprint), Yufeng Wu, Bioinformatics, v.36, pages 742-750, 2020. [Accompanying Software: ScisTree]. This paper develops a new method for inferring cell lineage trees from uncertain single cell data.
- Detecting circular RNA from high-throughput sequence data with de Bruijn graph, Xin Li and Yufeng Wu, BMC Genomics, 2020. [Accompanying Software: CircDBG] This paper was presented at the ISBRA 2018 conference.
- DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network, Lei Cai, Yufeng Wu and Jingyang Gao, BMC Bioinformatics, 20:665, 2019.
- GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads, Chong Chu, Xin Li, and Yufeng Wu, BMC Genomics, 2019. [Accompanying Software: GAPPadder] This paper develops a new method for draft genome gap filling.
- CircMarker: A Fast and Accurate Algorithm for Circular RNA Detection, Xin Li, Chong Chu, Jingwen Pei, Ion Mandoiu and Yufeng Wu, BMC Genomics, 2018. [Accompanying Software: CircMarker] This paper was presented at the ISBRA 2017 conference.
- CLADES: A Classification-based Machine Learning Method for Species Delimitation from Population Genetic Data (preprint), Jingwen Pei, Chong Chu, Xin Li, Bin Lu and Yufeng Wu, Molecular Ecology Resources, accepted, 2018. [Accompanying Software: CLADES] This paper develops a machine learning approach for species delimitation, which runs very fast.
- STELLS2: Fast and Accurate Coalescent-based Maximum Likelihood Inference of Species Trees from Gene Tree Topologies, Jingwen Pei and Yufeng Wu, Bioinformatics, v33, pages 1789 – 1797, 2017. [Accompanying Software: STELLS2] This paper develops an improved species tree inference method based on coalescent theory, which is much faster than the original STELLS.
- RENT+: An Improved Method for Inferring Local Genealogical Trees from Haplotypes with Recombination, Sajad Mirzaei and Yufeng Wu, Bioinformatics, v33, pages 1021 – 1030, 2017. [Accompanying Software: RENT+] This paper develops a new method for inferring genealogy with recombination, which is more accurate and much faster than the original RENT.
- Concod: Accurate Consensus-based Approach of Calling Deletions from High-throughput Sequencing Data, Xiaodong Zhang, Chong Chu, Yao Zhang, Yufeng Wu and Jingyang Gao, in Proceedings of BIBM 2016, pages 72 to 77, 2016. [Accompanying Software: Concod]
- An Algorithm for Computing the Gene Tree Probability under the Multispecies Coalescent and its Application in the Inference of Population Tree, Yufeng Wu, Bioinformatics, v32, pages i225 – i233, 2016 (special issue of ISMB 2016). This paper develops an algorithm for computing the gene tree probability under the multispecies coalescent that runs in polynomial time in terms of the number of gene lineages when the number of populations is small (extension to the STELLS paper).
- REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads. Chong Chu, Rasmus Nielsen and Yufeng Wu. PLoS One, 11(3): e0150719. 2016. [Accompanying Software: REPdenovo] This paper develops a software tool for constructing consensus repeats directly from short sequence reads.
- SpliceJumper: a classification-based approach for calling splicing junctions from RNA-seq data, Chong Chu, Xin Li and Yufeng Wu, BMC Bioinformatics, 2015;16(Suppl 17):S10. [Accompanying Software: SpliceJumper]
- Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees (preprint link from TCBB), Sajad Mirzaei and Yufeng Wu, IEEE/ACM Transactions on Computational Biology and Bioinformatics, v13, p. 565-570, 2016. [Accompanying Software: PIRNs]. This paper develops a new algorithm for building hybridization networks from multiple gene trees. It works best when the number of gene trees is large and the number of taxa is not too large.
- A Coalescent-based Method for Population Tree Inference with Haplotypes, Yufeng Wu, Bioinformatics, March 1;31(5):691-698, 2015. [Accompanying Software: STELLSH (available by email)] [Supplemental materials]. This paper uses the gene tree probability computed for genealogies inferred from haplotypes for the purpose of population tree inference.
- GINDEL: Accurate Genotype Calling of Insertions and Deletions from Low Coverage Population Sequence Reads, Chong Chu, Jin Zhang and Yufeng Wu, PLoS One, 9(11): e113324, 2014. [Accompanying Software: GINDEL] This paper develops a machine learning approach for calling deletion/insertion genotypes from sequence data.
- An Algorithm for Constructing Parsimonious Hybridization Networks with Multiple Phylogenetic Trees, Yufeng Wu, in Proceedings of RECOMB 2013, p.291-303, 2013. [Accompanying Software: PIRN 2.0] The Journal version appears in Journal of Computational Biology, 20 (10): 792-804, 2013: [paper]. This paper develops an exact algorithm for building parsimonious hybridization networks from multiple gene trees.
- Coalescent-based Species Tree Inference from Gene Tree Topologies Under Incomplete Lineage Sorting by Maximum Likelihood, Yufeng Wu, Evolution, v. 66 (3), p. 763-775, 2012. [Accompanying Software: STELLS (available by email)] This paper develops a much faster algorithm for computing the so-called gene tree probability (originally studied by Degnan and Salter, Evolution, 2005); it also develops an ML species tree inference method based on the improved gene tree probability algorithm.
- An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data, Jin Zhang, Jiayin Wang and Yufeng Wu, BMC Bioinformatics, v.13 (suppl. 6): S6, 2012.
- SVseq: an approach for detecting exact breakpoints of deletions with low-coverage sequence data, Jin Zhang and Yufeng Wu, Bioinformatics, v.27 (23): p. 3228-3234, 2011. [Accompanying Software: SVseq (available upon request)] This paper develops a new computational approach for finding genomic deletions in populations or single individuals using short reads from high throughput sequencing. The main idea is to combine two existing approaches, split reads mapping and discordant insert size analysis, in order to obtain more accurate calling of deletions.
- Identifying Interacting SNPs with Parallel Fish-Agent based Logic Regression, Jiayin Wang, Jin Zhang and Yufeng Wu, in Proceedings of the First IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS 2011), Orlando, Florida, 2011.
- Linkage Disequilibrium Based Genotype Calling from Low-Coverage Shotgun Sequencing Reads (pre-print), J. Duitama, J. Kennedy, S. Dinakar, Y. Hernandez, Y. Wu and I.I. Mandoiu, BMC Bioinformatics, v.12 (suppl. 1), S53, 2011. Part of the paper was presented at APBC 2011.
- Haplotype Inference from Short Sequence Reads Using a Population Genealogical History Model [link], Jin Zhang and Yufeng Wu, in proceedings of Pacific Symposium on Biocomputing: 288-299, 2011.
- New Methods for Inference of Local Tree Topologies with Recombinant SNP Sequences in Populations [link], Yufeng Wu, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(1):182-193, 2011. [Accompanying Software: RENT] This paper develops computational methods for inferring local genealogical trees, where different genomic regions may have different trees due to recombination. A key observation is that nearby genealogical trees often share many topological features and my approach is to find these shared structures by comparing nearby genomic sites.
- Close Lower and Upper Bounds for the Minimum Reticulate Network of Multiple Phylogenetic Trees (supplemental materials), Yufeng Wu, in Proceedings of the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2010), published as a special issue of Bioinformatics, 26(12):i140-i148, 2010. [Accompanying Software: PIRN]. Talk given at ISMB.
- Bounds on the Minimum Mosaic of Population Sequences Under Recombination, Yufeng Wu, in Proceedings of 21st Annual Symposium on Combinatorial Pattern Matching (CPM 2010), LNCS 6129, pp. 152-163, 2010.
- Fast Computation of the Exact Hybridization Number of Two Phylogenetic Trees, Yufeng Wu and Jiayin Wang, in Proceedings of International Symposium on Bioinformatics Research and Applications (ISBRA 2010), LNCS 6053, p.203-214, 2010.
- The Three-State Perfect Phylogeny Problem Reduces to 2-SAT, Dan Gusfield and Yufeng Wu, Communications in Information and Systems, v.9, p.295-302, 2009.
- Exact Computation of Coalescent Likelihood Under the Infinite Sites Model, Yufeng Wu, in Proceedings of ISBRA 2009, pages 209-220, 2009. The full version of this paper, “Exact Computation of Coalescent Likelihood for Panmictic and Subdivided Populations Under the Infinite Sites Model”, (link) appears in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(4):611-618, 2010. This paper develops an algorithm for exact computation of the likelihood of a set of gene sequences under the infinite sites model (without recombination). This problem has been studied extensively, but previous methods were inexact. This paper provides exact solutions for a single population or subdivided populations.
- A practical method for exact computation of subtree prune and regraft distance [link], Yufeng Wu, Bioinformatics, 25(2):190-196, 2009. [Software] This paper develops an exact method for computing the rSPR distance between two rooted binary trees using integer linear programming.
- An Analytical Upper Bound on the Minimum Number of Recombinations in the History of SNP Sequences in Populations, Yufeng Wu, Information Processing Letters, v.109, n.9, p.427-431, 2009.
- Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms, Yufeng Wu, in Proceedings of RECOMB 2007 (LNBI Vol. 4453), pages 488-502, 2007. This paper won the best student paper award. An extended version of this paper [link] appears in Journal of Computational Biology, 15(7): 667-684, 2008. [Software] [Talk]
- Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants, Yufeng Wu and Dan Gusfield, in Proceedings of CPM 2007. [Software available upon request]
- A New Recombination Lower Bound and The Minimum Perfect Phylogenetic Forest Problem [link], Yufeng Wu and Dan Gusfield, in Proceedings of COCOON 2007. The full version of the paper appears in Journal of Combinatorial Optimization [link].
- Efficient Computation of Minimum Recombination over Genotypes (not Haplotypes), Yufeng Wu and Dan Gusfield, Proceedings of Life Sciences Society Computational Systems Bioinformatics (CSB) 2006, pages 145-156, 2006. [Abstract]. An extended version of this paper appears in Journal of Bioinformatics and Computational Biology (JBCB) in a special issue of CSB 2006, 5(2a), 181-200, 2007.
- Algorithms to distinguish the role of gene-conversion from single-crossover recombination in the derivations of SNP sequences in populations, Yun S. Song, Zhihong Ding, Dan Gusfield, Charles Langley and Yufeng Wu, Proceedings of RECOMB 2006 (LNBI Vol. 3909), pages 231-245, 2006. An extended version of this paper (link) appears in Journal of Computational Biology (JCB), 14(10): 1273-1286, 2007.
- Algorithms for Imperfect Phylogeny Haplotyping with a Single Homoplasy or Recombination Event, Yun S. Song, Yufeng Wu and Dan Gusfield, Proceedings of Workshop on Algorithms in Bioinformatics (WABI) 2005, LNCS 3692, 2005.
- Efficient computation of close lower and upper bounds on the minimum number of recombinations in biological sequence evolution, Yun S. Song, Yufeng Wu and Dan Gusfield, Proceedings of ISMB 2005, published as a special issue of Bioinformatics, 21: i413 – i422, 2005.
The above papers are based upon work supported by the National Science Foundation under Grants No. 0803440, 0953563, 1116175, 1526415, 1718093 and 1909425. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.