GrAnnoT data and output

This folder contains the data needed for the analysis described in the GrAnnoT paper (doi), and the files produced from this data. The command lines used to process the data and produce the outputs are listed in the file "grannot_analysis_command_lines.txt".

Rice: the only data not provided are the 12 genome sequences from Zhou, Y., Chebotarov, D., Kudrna, D. et al. (2020), "A platinum standard pan-genome resource that represents the population structure of Asian rice" (doi:10.1038/s41597-020-0438-2). These genomes were used, along with the Nipponbare reference (doi:10.1186/1939-8433-6-4), to build the rice pangenome graph, and for the Liftoff transfers. The rice annotation comes from the Rice Genome Annotation Project, available at https://rice.uga.edu/

E. coli: the genomes used to build the pangenome graph come from the paper available at http://dx.doi.org/10.7554/eLife.78834 The K12_MG1655 annotation is adapted from https://www.ncbi.nlm.nih.gov/nuccore/U00096.3 to match the pangenome graph.

H: the pangenome graph was made by the Human Pangenome Reference Consortium, and is available at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=pangenomes/scratch/2022_03_11_minigraph_cactus/ The human genomes for the Liftoff transfer come from https://projects.ensembl.org/hprc/ The CHM13 annotation is adapted from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/ to match the pangenome graph.

This folder is organised as follows:

.
├── data
│   ├── ecoli
│   │   ├── EcoliGraph_MGC.gfa
│   │   ├── feature_types.txt
│   │   ├── K_12_MG1655_09949b0.fasta
│   │   ├── O127_H6_E2348_69_193637c.fasta
│   │   └── sequence_filter_rename_K_12_MG1655_09949b0.gff3
│   ├── human
│   │   ├── CHM13_chr1.gff
│   │   ├── chm13.draft_v1.1_chr1.fasta
│   │   ├── feature_types.txt
│   │   ├── GCA_000001405.15_GRCh38_no_alt_analysis_set_chr1.fna
│   │   └── HumanChr1Graph_renamePaths.gfa
│   └── rice
│       ├── GCA_009830595.1_AzucenaRS1_genomic.fna
│       ├── nb_allFeatures.fa
│       ├── nb_allFeatures.gff3
│       ├── nb_allFeatures_renamed_filter.bed
│       ├── nb_allFeatures_renamepath_annotate.gff3
│       ├── refpath_odgi
│       ├── refpath_vg
│       ├── RiceGraph_MGC.gfa
│       ├── RiceGraph_MGC_paths.gfa
│       ├── RiceGraph_MGC_refOs127652RS1.gfa
│       ├── TIGRv7_ok.fasta
│       └── TIGRv7_ok.genome
├── grannot_analysis_command_lines.txt
├── outputs
│   ├── ecoli
│   │   ├── intermediate_files
│   │   │   ├── reference_all_genes.fa
│   │   │   └── reference_all_to_target_all.sam
│   │   ├── liftoff_transfer_k12_to_0127.gff
│   │   ├── O127_H6_E2348_69_193637c
│   │   │   └── O127_H6_E2348_69_193637c.gff
│   │   └── unmapped_features.txt
│   ├── human
│   │   ├── GRCh38
│   │   │   └── GRCh38.gff
│   │   ├── intermediate_files
│   │   │   ├── reference_all_genes.fa
│   │   │   └── reference_all_to_target_all.sam
│   │   ├── liftoff_transfer_chm13_to_grch38.gff
│   │   └── unmapped_features.txt
│   └── rice
│       ├── back_forth_transfer
│       │   ├── grannot
│       │   │   ├── AzucenaRS1.gff
│       │   │   └── IRGSP.gff
│       │   └── liftoff
│       │       ├── AzucenaRS1.gff3
│       │       └── IRGSP.gff3
│       ├── grannot
│       │   ├── AzucenaRS1
│       │   │   ├── AzucenaRS1.gff
│       │   │   ├── AzucenaRS1_var_sorted.txt
│       │   │   └── AzucenaRS1_var.txt
│       │   ├── AzucenaRS1_refOs127652RS1.gff
│       │   ├── RiceGraph_MGC.gaf
│       │   └── segments.txt
│       ├── grannot_multi
│       │   ├── AzucenaRS1
│       │   │   └── AzucenaRS1.gff
│       │   ├── Os117425RS1
│       │   │   └── Os117425RS1.gff
│       │   ├── etc...
│       │   └── PAV_matrix.txt
│       ├── graphaligner
│       │   └── graphaligner_rice_transfer.gaf
│       ├── liftoff_multi
│       │   ├── AzucenaRS1_named.db.gff
│       │   ├── AzucenaRS1_named.gff
│       │   ├── AzucenaRS1_named_unmappeddb.txt
│       │   ├── AzucenaRS1_named_unmapped.txt
│       │   ├── Os117425RS1_named.db.gff
│       │   ├── Os117425RS1_named.gff
│       │   ├── Os117425RS1_named_unmappeddb.txt
│       │   ├── Os117425RS1_named_unmapped.txt
│       │   └── etc...
│       ├── odgi
│       │   └── odgi_transfer_nb_azu.bed
│       └── vg
│           ├── nb_allFeatures_annotate.gaf
│           ├── nb_allFeatures_annotate.gam
│           ├── nb_allFeatures_renamed_filter.bam
│           ├── nb_allFeatures_renamed_filter.gaf
│           ├── nb_allFeatures_renamed_filter.sam
│           └── RiceGraph_MGC_paths.xg
└── readme.txt
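
The feature_types.txt files in the data folders list the feature types present in an annotation, which Liftoff needs in order to transfer all the features rather than only the genes. How the provided files were generated is not stated here; the following is only a minimal sketch of one way to build such a list, assuming a standard tab-separated GFF file where the feature type is the third column. The function name feature_types and the script name in the usage comment are hypothetical.

```python
import sys


def feature_types(gff_lines):
    """Return the distinct feature types (GFF column 3) in first-seen order."""
    seen = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no feature lines after this
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            seen.setdefault(fields[2], None)
    return list(seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   python list_feature_types.py CHM13_chr1.gff > feature_types.txt
    with open(sys.argv[1]) as handle:
        for ftype in feature_types(handle):
            print(ftype)
```

The output has one feature type per line. Liftoff can read a list of this kind through its -f option; the exact command lines used for this dataset are the ones in "grannot_analysis_command_lines.txt".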



RiceGraph_MGC_paths.gfa
Mar 18, 2025 - Data used for the analysis described in GrAnnoT paper
Unknown - 4.3 GB - MD5: f541072bfc36b01c4bb1609192c272b9
Data
Pangenome graph adapted from the file "RiceGraph_MGC.gfa" by converting the walks into paths.

RiceGraph_MGC_refOs127652RS1.gfa
Mar 18, 2025 - Data used for the analysis described in GrAnnoT paper
Unknown - 4.6 GB - MD5: e58d668a5ea6a6a2f2c8033032adcfb4
Data
Pangenome graph built with the same data as the file "RiceGraph_MGC.gfa", but with a different genome used as reference (Os127652RS1).

RiceGraph_MGC_paths.xg
Mar 18, 2025 - Output from the analysis described in GrAnnoT paper
Unknown - 4.3 GB - MD5: 3ac4d2629fafbc7ffa77ef106d417e84
Data
Rice pangenome graph with paths, converted to .xg format by VG.

AzucenaRS1.fna
Mar 10, 2025 - Data used for the analysis described in GrAnnoT paper
Unknown - 368.4 MB - MD5: d621f8e18ef4fce7aef9ddb0b00db2cf
Data
Fasta file of the genome AzucenaRS1, from Zhou, Y., Chebotarov, D., Kudrna, D. et al. (2020), "A platinum standard pan-genome resource that represents the population structure of Asian rice" (doi:10.1038/s41597-020-0438-2).

CHM13_chr1.gff
Mar 10, 2025 - Data used for the analysis described in GrAnnoT paper
Unknown - 152.4 MB - MD5: 1f0269e5728a0ed5f83188cd08792c41
Data
Annotation file of chromosome 1 of the genome CHM13, adapted from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/ to match the pangenome graph.

chm13.draft_v1.1_chr1.fasta
Mar 10, 2025 - Data used for the analysis described in GrAnnoT paper
Unknown - 240.8 MB - MD5: ab2421da8b88a867d087c3f452a6969f
Fasta file of chromosome 1 of the genome CHM13, from https://projects.ensembl.org/hprc/

EcoliGraph_MGC.gfa
Mar 10, 2025 - Data used for the analysis described in GrAnnoT paper
Unknown - 112.8 MB - MD5: e5183acfb527fcd7b01ac4386d7be290
Data
E. coli pangenome graph built with 12 genomes from the paper available at http://dx.doi.org/10.7554/eLife.78834

feature_types.txt
Mar 10, 2025 - Data used for the analysis described in GrAnnoT paper
Plain Text - 638 B - MD5: 571dcfb312543edd81dca0524212459a
Lists all the feature types in the human annotation (CHM13_chr1.gff), necessary to run Liftoff on all the features.

GCA_000001405.15_GRCh38_chr1.fna
Mar 10, 2025 - Data used for the analysis described in GrAnnoT paper
Unknown - 240.8 MB - MD5: 90780be24a8555613e6ad0bd2f8a0935
Data
Fasta file of chromosome 1 of the genome GRCh38, from https://projects.ensembl.org/hprc/

IRGSP_nipponbare.fasta
Mar 10, 2025 - Data used for the analysis described in GrAnnoT paper
Unknown - 363.7 MB - MD5: e923ca38bac7f347ffbf88142b93428f
Fasta file of the genome Nipponbare, from Kawahara, Y., de la Bastide, M., Hamilton, J.P. et al. (2013), "Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data" (doi:10.1186/1939-8433-6-4).
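
Since several of these files are multi-gigabyte downloads, it can be worth checking them against the MD5 values given in the listing. The following is a minimal sketch using only the Python standard library; the function names and the choice of three files are illustrative, and the checksums are copied from the entries above.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the listing above (three files shown as an example).
EXPECTED_MD5 = {
    "RiceGraph_MGC_paths.gfa": "f541072bfc36b01c4bb1609192c272b9",
    "EcoliGraph_MGC.gfa": "e5183acfb527fcd7b01ac4386d7be290",
    "CHM13_chr1.gff": "1f0269e5728a0ed5f83188cd08792c41",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    status = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            status[name] = "missing"
        elif md5_of_file(path) == checksum:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name, state in check_downloads(".").items():
        print(f"{name}\t{state}")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which matters for the 4 GB graph files.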

