Assembly Genome Browser (AGB) is a tool providing interactive visualization of assembly graphs, a wide range of tuning parameters, and various options for modifying/simplifying the graph.
AGB uses d3-graphviz, GfaPy, NetworkX-METIS, and QUAST-LG.
AGB can be run on Linux or macOS (OS X). Install conda if you don't already have it:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p ./miniconda
unset PYTHONPATH
export PATH=$(pwd)/miniconda/bin:$PATH
Create a new conda environment and install AGB into it:
conda create -c almiheenko -c bioconda -n AGB agb
Activate the environment:
conda activate AGB
To compile AGB by yourself you will need the following libraries to be pre-installed:
git clone https://github.com/almiheenko/AGB.git
cd AGB
agb.py --graph <GFA(1,2)/FASTG/Graphviz file> -a <assembler_name>
or run AGB to visualize the output of one of the supported assemblers (Canu, Flye, SPAdes):
agb.py -i <assembler_output_dir> -a <assembler_name>
The assembly graph viewer will be saved to
agb.py --graph <file> -a <assembler_name> [--fasta <file>] [-r <file>] [-o <output_dir>]
Also, if you have output generated by one of the supported assemblers (Canu, Flye, SPAdes), you can run AGB as follows:
agb.py -i <input_dir> -a <assembler_name> [-r <file>] [-o <output_dir>]
Options:
If an output path is not specified manually (with -o), AGB generates its output into
AGB output contains:
<output_dir>/viewer.html: main file with graph visualization
AGB visualizes the assembly graph produced by an assembler, where edges represent genome segments (each genome segment is represented by its forward and reverse-complement edge). The top panel contains control buttons for iterating over connected components and buttons for exporting the graph in SVG and DOT formats. It also contains a toggle for switching between the default, repeat-focused, reference-based, and contig-based modes. Each edge is labeled with its identifier, length, and read coverage. Unique edges are shown as thin black lines, while repetitive edges are shown as thick colored lines (edge width depends on coverage). All edges within each mosaic repeat are highlighted with the same color. Nodes with zero indegree or outdegree are shown as black circles. Unbalanced nodes, whose incoming and outgoing edges differ in coverage, are highlighted in red.
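The node styling rules above can be illustrated with a small sketch. It is not AGB's code: the function name `classify_nodes`, the edge tuple layout, and the `rel_tolerance` parameter are hypothetical, and comparing the summed coverage of incoming and outgoing edges is an assumption about what "unbalanced" means, since the text does not state a threshold.

```python
from collections import defaultdict

def classify_nodes(edges, rel_tolerance=0.25):
    """Label each node as 'dead_end', 'unbalanced', or 'balanced'.

    edges: iterable of (src_node, dst_node, coverage) tuples.
    rel_tolerance: hypothetical cutoff on the relative coverage difference;
    AGB's actual criterion is not given in the text.
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    in_cov, out_cov = defaultdict(float), defaultdict(float)
    nodes = set()
    for src, dst, cov in edges:
        nodes.update((src, dst))
        out_deg[src] += 1
        out_cov[src] += cov
        in_deg[dst] += 1
        in_cov[dst] += cov

    labels = {}
    for node in nodes:
        if in_deg[node] == 0 or out_deg[node] == 0:
            # Zero indegree or outdegree: drawn as a black circle.
            labels[node] = "dead_end"
            continue
        larger = max(in_cov[node], out_cov[node])
        diff = abs(in_cov[node] - out_cov[node])
        if larger > 0 and diff / larger > rel_tolerance:
            # Incoming and outgoing coverage disagree: drawn in red.
            labels[node] = "unbalanced"
        else:
            labels[node] = "balanced"
    return labels

example_edges = [("A", "B", 30.0), ("B", "C", 29.0), ("C", "D", 60.0)]
node_labels = classify_nodes(example_edges)
```

In this toy graph, `A` and `D` are dead ends, `B` is balanced (30 in, 29 out), and `C` is flagged as unbalanced (29 in, 60 out).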
The graph representation can be further modified using Additional options.
The default mode is useful for assessing the contiguity and complexity of the graph and for finding problematic parts of the assembly.
The repeat-focused mode is designed for analyzing complex repeat structures. AGB removes all unique edges from the assembly graph, so each remaining connected component forms a mosaic repeat. By default, all edges of a mosaic repeat share one color (all unique edges are colored black). Light green nodes represent the hidden parts of the graph.
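The idea of dropping unique edges and treating each remaining connected component as a mosaic repeat can be sketched as follows. This is an illustration in plain Python, not AGB's implementation; the function name `mosaic_repeats` and the edge tuple layout are hypothetical, and edge direction is ignored when grouping components.

```python
from collections import defaultdict

def mosaic_repeats(edges):
    """Group repetitive edges into connected components.

    edges: iterable of (edge_id, src_node, dst_node, is_repetitive).
    Returns a sorted list of components, each a sorted list of edge ids.
    """
    # Undirected adjacency built from repetitive edges only;
    # unique edges are skipped, which mimics removing them from the graph.
    adjacency = defaultdict(list)
    for edge_id, src, dst, is_repetitive in edges:
        if not is_repetitive:
            continue
        adjacency[src].append((dst, edge_id))
        adjacency[dst].append((src, edge_id))

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component_edges = set()
        # Depth-first traversal collecting every edge reachable from start.
        while stack:
            node = stack.pop()
            for neighbor, edge_id in adjacency[node]:
                component_edges.add(edge_id)
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component_edges))
    return sorted(components)

example_edges = [
    ("e1", "A", "B", False),
    ("e2", "B", "C", True),
    ("e3", "C", "B", True),
    ("e4", "C", "D", False),
    ("e5", "D", "D", True),
]
repeats = mosaic_repeats(example_edges)
```

Here `e2` and `e3` form one mosaic repeat and the self-loop `e5` forms another; each component would receive its own color in the viewer.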
Some assemblers (e.g., Flye and Canu) report whether an edge is repetitive. If this information is not available, AGB attempts to classify unique and repetitive edges in the assembly graph using the following simple criteria. For each edge, AGB estimates its multiplicity by dividing the read coverage of the edge by the median coverage. The multiplicity is set to 1 if this ratio is less than 1.75. An edge is classified as unique if its multiplicity equals 1 and as repetitive otherwise.
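The coverage-based rule above can be written as a short sketch. The 1.75 cutoff comes from the text; everything else is an assumption made for illustration: the function name `classify_edges` is hypothetical, and the median is taken as a plain median over per-edge coverage values, which may differ from how AGB computes it.

```python
from statistics import median

UNIQUE_RATIO_CUTOFF = 1.75  # ratio below which multiplicity is set to 1

def classify_edges(coverage_by_edge):
    """Return {edge_id: 'unique' or 'repetitive'} from read coverage.

    coverage_by_edge: dict mapping edge id to read coverage.
    Assumption: the median is computed over all edges, unweighted.
    """
    if not coverage_by_edge:
        return {}
    median_coverage = median(coverage_by_edge.values())
    if median_coverage <= 0:
        raise ValueError("median coverage must be positive")

    labels = {}
    for edge_id, coverage in coverage_by_edge.items():
        ratio = coverage / median_coverage
        # Multiplicity 1 (unique) when the ratio is below the cutoff.
        labels[edge_id] = "unique" if ratio < UNIQUE_RATIO_CUTOFF else "repetitive"
    return labels

example_coverage = {"e1": 30.0, "e2": 32.0, "e3": 29.0, "e4": 61.0, "e5": 95.0}
edge_labels = classify_edges(example_coverage)
```

With a median coverage of 32, edges `e1` to `e3` have ratios near 1 and are labeled unique, while `e4` (ratio about 1.9) and `e5` (ratio about 3.0) are labeled repetitive.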
If a reference genome is available, AGB runs QUAST-LG to align graph edges and contigs (scaffolds) produced by assemblers to the reference genome and detect assembly errors. The reference-based mode provides two additional options for edge coloring: either according to their mappings to the reference (same colors represent same chromosomes), or based on the presence of assembly errors. The subset of edges mapped to each chromosome can be visualized on a separate page. When edges are colored according to the presence of assembly errors detected by QUAST-LG, green edges do not contain errors, red edges belong to misassembled contigs (but correspond to correct genomic sequences), and dark red edges, drawn as parallel lines, are erroneous themselves.
At the top, the edge alignments to the selected chromosome are displayed. Red blocks contain detected assembly errors, while green blocks were aligned correctly. The alignment of the selected edge is highlighted in dark green. Hovering over an alignment displays brief information about it.
If an assembler provides paths in the assembly graph corresponding to the assembled contigs/scaffolds, AGB displays each path separately. Given the reference genome, AGB also shows the number of assembly errors per contig.
The left panel includes the menu with various options, the search bar, and the tables describing various graph elements.
The search bar lets you search all graph edges, contigs, and reference chromosomes by name and display them.
AGB displays interactive sortable tables containing information about edges, vertices, contigs, reference chromosomes, and connected components. All tables are affected by additional options: only edges satisfying the filtering criteria (read length, coverage, or uniqueness) are taken into account.