sr-murthy/rosalind

Solutions to the ROSALIND problem set

Selected Python solutions to the ROSALIND bioinformatics problem set, including some data structures and utility functions.

Notes

  • The solution set is not yet complete, but more solutions will be added over time. All solutions are original. The focus is not only on correctness but also on conciseness, readability, and speed; it is not always possible to achieve all three of these at once, and performance is not guaranteed for arbitrarily large inputs.

  • solutions.py is the main solution set, while utils.py contains generic utilities which are used in the solutions, as required.

  • A basic set of tests has been added; most of them use the examples given in the ROSALIND problem statements as test cases.

  • Solutions always return raw values and do not apply any output formatting, even for problems such as GC (Computing GC Content), where the marking on ROSALIND depends on the answer being formatted in a particular way.

  • The function docstrings are written in the NumPy docstring style.

  • For more background on linguistic complexity (LC) see, page W630.

  • The counting of k-mers in the KMER (k-Mer Composition) problem must take overlapping substrings into account, so a custom function is used for this purpose, as str.count only counts non-overlapping occurrences.

  • The solutions to several problems, including SSEQ (Finding a Spliced Motif), find and return arrays of indices of a matching subsequence or substring. The problems require 1-indexed positions, so these solutions convert the 0-indexed indices returned by the generic utility functions they call.

  • The solution to EDIT (Edit Distance) is recursive with caching. This is slower than equivalent iterative implementations, but more readable and easier to understand. It also allows the insertion, deletion, and substitution costs to be customised.
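
To illustrate the KMER note above, here is a minimal sketch of overlap-aware counting. The name count_overlapping is hypothetical and is not necessarily the function used in utils.py; it only shows one way to get the behaviour that str.count does not provide.

```python
def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text`, including overlapping ones.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by a single character (not by len(pattern)),
        # so that matches overlapping the current one are also found.
        start = text.find(pattern, start + 1)
    return count


print("AAAA".count("AA"))                # 2: str.count skips overlapping matches
print(count_overlapping("AAAA", "AA"))   # 3: matches start at positions 0, 1 and 2
```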
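
The index conversion described in the SSEQ note can be sketched as follows. Both function names are hypothetical, and the greedy leftmost search is an assumption made for this example rather than a description of the repository's code. Any valid set of positions is an acceptable answer to SSEQ, so the result may differ from the sample output shown on ROSALIND.

```python
from typing import List


def subsequence_indices(s: str, t: str) -> List[int]:
    """Return 0-indexed positions in `s` at which `t` occurs as a subsequence.

    Hypothetical generic utility: picks the leftmost possible position
    for each symbol of `t` in turn.
    """
    indices = []
    pos = 0
    for symbol in t:
        pos = s.find(symbol, pos)
        if pos == -1:
            raise ValueError("t is not a subsequence of s")
        indices.append(pos)
        pos += 1  # the next symbol must come strictly after this one
    return indices


def spliced_motif(s: str, t: str) -> List[int]:
    """Return 1-indexed positions, as the ROSALIND problem requires."""
    return [i + 1 for i in subsequence_indices(s, t)]


print(subsequence_indices("ACGTACGTGACG", "GTA"))  # [2, 3, 4]
print(spliced_motif("ACGTACGTGACG", "GTA"))        # [3, 4, 5]
```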
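
For the EDIT note, a cached recursive edit distance with customisable costs might look like the sketch below. The function name, parameter names, and the use of functools.lru_cache are assumptions made for this example; the actual solution in solutions.py may be structured differently.

```python
from functools import lru_cache


def edit_distance(s: str, t: str,
                  insertion_cost: int = 1,
                  deletion_cost: int = 1,
                  substitution_cost: int = 1) -> int:
    """Minimum total cost of edits needed to turn `s` into `t`.

    Hypothetical sketch: recursion over prefix lengths, with results cached.
    """

    @lru_cache(maxsize=None)
    def distance(i: int, j: int) -> int:
        # Cost of turning the prefix s[:i] into the prefix t[:j].
        if i == 0:
            return j * insertion_cost   # insert the remaining j symbols of t
        if j == 0:
            return i * deletion_cost    # delete the remaining i symbols of s
        same = s[i - 1] == t[j - 1]
        return min(
            distance(i - 1, j) + deletion_cost,
            distance(i, j - 1) + insertion_cost,
            distance(i - 1, j - 1) + (0 if same else substitution_cost),
        )

    return distance(len(s), len(t))


print(edit_distance("PLEASANTLY", "MEANLY"))          # 5
print(edit_distance("a", "b", substitution_cost=3))   # 2: delete + insert is cheaper
```

The recursion depth in this sketch grows with len(s) + len(t), so long inputs can exceed Python's default recursion limit; an iterative table-based version avoids that at some cost in readability.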

About

Python solutions to the ROSALIND (https://rosalind.info/) bioinformatics problem set.
