A simple, space-efficient algorithm for detecting start and stop codons in DNA
Date
2021-04Author
Schoeller, Scott J.
Publisher
University of Wisconsin - Whitewater
Advisor(s)
Oster, Zachary
Ganguly, Arnab
Veldkamp, Christopher
Metadata
Show full item recordAbstract
The brute force algorithm for string matching has a worst-case execution complexity of O(MN), where N is the length of the entire string and M is the length of the substring to be matched. The efficiency of brute-force matching is insufficient for biological applications due to the sheer size of bioinformatics datasets. This thesis describes an algorithm that uses bit compression techniques to represent multiple codons in a single Python integer. It was proven that with exact matching steps, the search algorithm is as accurate as brute-force matching with significantly higher efficiency, as this algorithm runs in O(N) time and space, with minimal memory overhead. While the bit-based algorithm is not more space-efficient in a theoretical sense, it is more efficient practically as we have observed significant memory savings compared to the size of the original string.
Subject
Bioinformatics
DNA
Permanent Link
http://digital.library.wisc.edu/1793/82249Description
This file was last viewed in Adobe Acrobat Pro.