This folder contains the data necessary for the analysis described in GrAnnoT's paper (doi), and the files produced with this data. The command lines used to process this data and produce the outputs are described in the file "grannot_analysis_command_lines.txt".

The only data not provided are the 12 genome sequences from the 2020 paper by Zhou, Y., Chebotarov, D., Kudrna, D. et al., "A platinum standard pan-genome resource that represents the population structure of Asian rice" (doi:10.1038/s41597-020-0438-2). These genomes were used to build the rice pangenome graph (along with the Nipponbare reference (doi:10.1186/1939-8433-6-4)), and for the Liftoff transfers.
The rice annotation comes from the Rice Genome Annotation Project, available at https://rice.uga.edu/

The E. coli genomes used to build the pangenome graph come from the paper available at http://dx.doi.org/10.7554/eLife.78834
The K12_MG1655 annotation is adapted from https://www.ncbi.nlm.nih.gov/nuccore/U00096.3 to match the pangenome graph.

The human pangenome graph was made by the Human Pangenome Reference Consortium, and is available at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=pangenomes/scratch/2022_03_11_minigraph_cactus/
The human genomes for the Liftoff transfer come from https://projects.ensembl.org/hprc/
The CHM13 annotation is adapted from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/ to match the pangenome graph.

This folder is organised as follows:
.
├── data
│   ├── ecoli
│   │   ├── EcoliGraph_MGC.gfa
│   │   ├── feature_types.txt
│   │   ├── K_12_MG1655_09949b0.fasta
│   │   ├── O127_H6_E2348_69_193637c.fasta
│   │   └── sequence_filter_rename_K_12_MG1655_09949b0.gff3
│   ├── human
│   │   ├── CHM13_chr1.gff
│   │   ├── chm13.draft_v1.1_chr1.fasta
│   │   ├── feature_types.txt
│   │   ├── GCA_000001405.15_GRCh38_no_alt_analysis_set_chr1.fna
│   │   └── HumanChr1Graph_renamePaths.gfa
│   └── rice
│       ├── GCA_009830595.1_AzucenaRS1_genomic.fna
│       ├── nb_allFeatures.fa
│       ├── nb_allFeatures.gff3
│       ├── nb_allFeatures_renamed_filter.bed
│       ├── nb_allFeatures_renamepath_annotate.gff3
│       ├── refpath_odgi
│       ├── refpath_vg
│       ├── RiceGraph_MGC.gfa
│       ├── RiceGraph_MGC_paths.gfa
│       ├── RiceGraph_MGC_refOs127652RS1.gfa
│       ├── TIGRv7_ok.fasta
│       └── TIGRv7_ok.genome
├── grannot_analysis_command_lines.txt
├── outputs
│   ├── ecoli
│   │   ├── intermediate_files
│   │   │   ├── reference_all_genes.fa
│   │   │   └── reference_all_to_target_all.sam
│   │   ├── liftoff_transfer_k12_to_0127.gff
│   │   ├── O127_H6_E2348_69_193637c
│   │   │   └── O127_H6_E2348_69_193637c.gff
│   │   └── unmapped_features.txt
│   ├── human
│   │   ├── GRCh38
│   │   │   └── GRCh38.gff
│   │   ├── intermediate_files
│   │   │   ├── reference_all_genes.fa
│   │   │   └── reference_all_to_target_all.sam
│   │   ├── liftoff_transfer_chm13_to_grch38.gff
│   │   └── unmapped_features.txt
│   └── rice
│       ├── back_forth_transfer
│       │   ├── grannot
│       │   │   ├── AzucenaRS1.gff
│       │   │   └── IRGSP.gff
│       │   └── liftoff
│       │       ├── AzucenaRS1.gff3
│       │       └── IRGSP.gff3
│       ├── grannot
│       │   ├── AzucenaRS1
│       │   │   ├── AzucenaRS1.gff
│       │   │   ├── AzucenaRS1_var_sorted.txt
│       │   │   └── AzucenaRS1_var.txt
│       │   ├── AzucenaRS1_refOs127652RS1.gff
│       │   ├── RiceGraph_MGC.gaf
│       │   └── segments.txt
│       ├── grannot_multi
│       │   ├── AzucenaRS1
│       │   │   └── AzucenaRS1.gff
│       │   ├── Os117425RS1
│       │   │   └── Os117425RS1.gff
│       │   ├── etc...
│       │   └── PAV_matrix.txt
│       ├── graphaligner
│       │   └── graphaligner_rice_transfer.gaf
│       ├── liftoff_multi
│       │   ├── AzucenaRS1_named.db.gff
│       │   ├── AzucenaRS1_named.gff
│       │   ├── AzucenaRS1_named_unmappeddb.txt
│       │   ├── AzucenaRS1_named_unmapped.txt
│       │   ├── Os117425RS1_named.db.gff
│       │   ├── Os117425RS1_named.gff
│       │   ├── Os117425RS1_named_unmappeddb.txt
│       │   ├── Os117425RS1_named_unmapped.txt
│       │   └── etc...
│       ├── odgi
│       │   └── odgi_transfer_nb_azu.bed
│       └── vg
│           ├── nb_allFeatures_annotate.gaf
│           ├── nb_allFeatures_annotate.gam
│           ├── nb_allFeatures_renamed_filter.bam
│           ├── nb_allFeatures_renamed_filter.gaf
│           ├── nb_allFeatures_renamed_filter.sam
│           └── RiceGraph_MGC_paths.xg
└── readme.txt
Unknown - 152.4 MB - MD5: 1f0269e5728a0ed5f83188cd08792c41
Annotation file for chromosome 1 of the CHM13 genome, adapted from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/ to match the pangenome graph.
Unknown - 240.8 MB - MD5: ab2421da8b88a867d087c3f452a6969f
Fasta file of chromosome 1 of the CHM13 genome, from https://projects.ensembl.org/hprc/.
Unknown - 112.8 MB - MD5: e5183acfb527fcd7b01ac4386d7be290
E. coli pangenome graph built with 12 genomes from the paper available at http://dx.doi.org/10.7554/eLife.78834.
Plain Text - 638 B - MD5: 571dcfb312543edd81dca0524212459a
List of all the feature types in the human annotation (CHM13_chr1.gff), needed to run Liftoff on all the features.
Plain Text - 104 B - MD5: 6752a18a92500cff53d9d8c4210fc1bc
List of all the feature types in the E. coli annotation (sequence_filter_rename_K_12_MG1655_09949b0.gff3), needed to run Liftoff on all the features.
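The feature-type lists described above can be rebuilt from any GFF3 annotation by collecting the distinct values of its third column. The Python sketch below shows one way to do this. It is an illustration only, not the procedure the authors used, and it assumes that the list is written with one feature type per line. The function name and the small in-memory example are hypothetical.

```python
import io


def feature_types(gff_lines):
    """Return the sorted distinct values of column 3 (type) of GFF3 lines."""
    types = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) >= 3:
            types.add(fields[2])
    return sorted(types)


# Tiny made-up annotation (not taken from the dataset).
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n"
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n"
    "chr1\tsrc\texon\t1\t50\t.\t+\t.\tID=e1;Parent=m1\n"
    "chr1\tsrc\tgene\t200\t300\t.\t-\t.\tID=g2\n"
)

# One feature type per line (assumed layout of the list file).
print("\n".join(feature_types(example)))
```

To apply it to a real file, pass an open file handle instead of the in-memory example, for instance open("CHM13_chr1.gff").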
Unknown - 240.8 MB - MD5: 90780be24a8555613e6ad0bd2f8a0935
Fasta file of chromosome 1 of the GRCh38 genome, from https://projects.ensembl.org/hprc/.
Unknown - 363.7 MB - MD5: e923ca38bac7f347ffbf88142b93428f
Fasta file of the Nipponbare genome, from the 2013 paper by Kawahara, Y., de la Bastide, M., Hamilton, J.P. et al., "Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data" (doi:10.1186/1939-8433-6-4).
Unknown - 539 B - MD5: 562caffe11637996021726b3d3a2e59b
Text file necessary to run VG on the rice data.
Unknown - 4.5 MB - MD5: 4e617b3e53fbe61f2ca689dda227a674
Fasta file of the genome K_12_MG1655_09949b0, from the paper available at http://dx.doi.org/10.7554/eLife.78834.
Unknown - 626.5 MB - MD5: 25854fb0391bc1676dcddc967fdf3371
Fasta file of all the features from the annotation file "nb_allFeatures.gff3".
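Each entry in this listing gives an MD5 value, so a downloaded file can be checked against it. The Python sketch below shows one way to do that with the standard library. The function names and the temporary demo file are hypothetical. Files are read in chunks because several of them are hundreds of megabytes.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Return True if the file's MD5 equals the value copied from the listing."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demonstration on a throwaway file (not part of the dataset).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example content\n")
    demo_path = tmp.name

demo_md5 = md5_of_file(demo_path)
print(demo_md5, matches_listing(demo_path, demo_md5.upper()))
os.remove(demo_path)
```

For a real check, call matches_listing with the path of the downloaded file and the MD5 string shown in that file's entry.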