Descendant of MolecularEvolution.jl, specializing in codon models. The current dN/dS catalogue consists of
- FUBAR
- difFUBAR
To run our models in the cloud, see ColabMolecularEvolution.
In a julia REPL, type
]add CodonMolecularEvolution
https://MurrellGroup.github.io/CodonMolecularEvolution.jl
To trigger our plotting extension, one must using Plots, Phylo
.
This section demonstrates how to use our dN/dS models with real biological data, showcasing both the input requirements and the type of evolutionary insights you can obtain.
using MolecularEvolution, FASTX, CodonMolecularEvolution
Note: To enable plotting, one would add using Plots, Phylo
.
We perform FUBAR analysis on this FASTA file, and a phylogeny from this Newick tree file.
# Read data
outdir = "fubar"
seqnames, seqs = read_fasta("Ace2_with_bat.fasta")
treestring = readlines("Ace2_with_bat.tre")[1]
# Perform analysis
fgrid = alphabetagrid(seqnames, seqs, treestring);
method = DirichletFUBAR() # Standard FUBAR
fubar_df, params = FUBAR_analysis(method, fgrid, analysis_name=outdir*"/Ace2_with_bat");
The above call will write results to outdir*"/Ace2_with_bat"
, but here we only display what's written to sdout
Step 1: Initialization.
Step 2: Optimizing global codon model parameters.
Optimized single α,β LL=-66072.56532041529 with α=1.940881909909932 and β=0.7757100788818039.
Step 3: Calculating conditional likelihoods.
Step 4: Model fitting.
Step 5: Tabulating results.
19 sites with positive selection above threshold.
487 sites with purifying selection above threshold.
Now we show some of the above-threshold sites
above_threshold_sites = fubar_df[(fubar_df.positive_posterior .> 0.95) .| (fubar_df.purifying_posterior .> 0.95), :]
above_threshold_sites[1:10, :]
10×5 DataFrame
Row │ site positive_posterior purifying_posterior beta_posterior_mean alpha_posterior_mean
│ Int64 Float64 Float64 Float64 Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 1 0.00564089 0.985948 0.0126755 1.42783
2 │ 4 5.12413e-5 0.998737 0.218888 1.12801
3 │ 6 0.0092943 0.981231 0.0227715 1.47458
4 │ 9 0.0118501 0.964818 0.0658507 0.392239
5 │ 11 2.29686e-8 0.999994 0.277966 2.10407
6 │ 15 1.43323e-6 0.999887 0.20953 1.42799
7 │ 17 0.000436892 0.995299 0.199044 0.917965
8 │ 19 8.04229e-9 0.999999 0.152897 2.47035
9 │ 25 0.000141827 0.996911 0.275294 1.36792
10 │ 27 0.995895 3.74494e-5 2.94698 1.05116
We perform difFUBAR analysis on this FASTA file, and a tagged phylogeny from this NEXUS tree file.
# Read data
analysis_name = "difFUBAR/Ace2_tiny"
seqnames,seqs = read_fasta("data/Ace2_tiny_test.fasta");
treestring, tags, tag_colors = import_colored_figtree_nexus_as_tagged_tree("data/Ace2_no_background.nex")
# Perform analysis
df,results = difFUBAR(seqnames, seqs, treestring, tags, analysis_name);
difFUBAR
will write its results to the export directory analysis_name
,
but here we show a short-hand representation of the results, which are written to stdout.
Step 1: Initialization. If exports = true, tree showing the assignment of branches to groups/colors will be exported to: output/Ace2_tagged_input_tree.svg.
Step 2: Optimizing global codon model parameters.
0.0% 29.0% 58.0% 87.0%
Step 4: Running Gibbs sampler to infer site categories.
Step 5: Tabulating and plotting. Detected sites:
Site 3 - P(ω1 > ω2):0.0075; P(ω2 > ω1):0.9805; P(ω1 > 1):0.1205; P(ω2 > 1):0.9675
Site 13 - P(ω1 > ω2):0.0175; P(ω2 > ω1):0.9605; P(ω1 > 1):0.13; P(ω2 > 1):0.9305