Documentation - Yeast Odyssey

# Documentation

**Yeast Odyssey** is a tool to search for the habitats of a given Fungi species or genus. It searches the public ITS sequencing files from *NCBI* [[1]](#1), downloads them, analyses them using *Kraken 2* [[2]](#2) with a reference database created from the *MIMt* ITS sequences [[3]](#3) enriched with custom sequences, and displays all the samples ordered by the abundance of this taxon. The samples are assigned to categories from the *OntoBiotope* [[4]](#4) ontology. When available, the country and precise geographic location are given too.

To search for a taxon, enter the name of the Fungi genus or species you are interested in into the text field of the home page and click on the magnifying glass button. The results page is then shown, containing the following elements.

## Results list

The results are displayed in a list where each line corresponds to a sample. The columns are:

* **Sample**: The accession ID of the *NCBI* run, with a link to the corresponding page. A tooltip gives the corresponding experiment title.
* **Study**: The title of the study for which the sample was produced, with a link to the corresponding *NCBI* page.
* **Location**: The flag of the country in which the sample was collected, when available. A tooltip gives more details when available (region, town, etc.).
* **Categories**: The categories from the *OntoBiotope* ontology that have been assigned to the sample.
* **Abundance/Fungi**: The abundance of your taxon within the Fungi, as a proportion of reads. By default, the samples are sorted by this column.
* **Abundance/Saccharomycotina or phylum**: If your taxon is a Saccharomycotina yeast, this column shows the abundance of your taxon within the Saccharomycotina subphylum; otherwise, it shows the abundance within its phylum. Clicking on the column title sorts the samples by this column.
* **Abundance/genus**: The abundance of your taxon within its genus. Clicking on the column title sorts the samples by this column.

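
The three abundance columns are all proportions of reads and differ only in the group used as the denominator. As a rough illustration (not the tool's actual code), the sketch below computes them for one sample from made-up read counts. The function name `relative_abundances`, its parameters, the dictionary keys, and the numbers are all assumptions chosen for the example.

```python
def relative_abundances(taxon_reads, fungi_reads, group_reads, genus_reads):
    """Return the three abundance ratios for one sample.

    All arguments are hypothetical read counts for a single sample:
    taxon_reads -- reads assigned to the taxon of interest
    fungi_reads -- reads assigned to Fungi
    group_reads -- reads assigned to Saccharomycotina (or to the taxon's phylum)
    genus_reads -- reads assigned to the taxon's genus
    """
    def ratio(part, whole):
        # Return 0.0 instead of dividing by zero when the group has no reads.
        return part / whole if whole else 0.0

    return {
        "abundance_fungi": ratio(taxon_reads, fungi_reads),
        "abundance_group": ratio(taxon_reads, group_reads),
        "abundance_genus": ratio(taxon_reads, genus_reads),
    }


# Made-up counts: 50 reads of the taxon among 1000 Fungi reads,
# 200 reads in its subphylum or phylum, and 100 reads in its genus.
sample = relative_abundances(
    taxon_reads=50, fungi_reads=1000, group_reads=200, genus_reads=100
)
print(sample)
```

With these counts the same taxon has a low abundance within the Fungi but makes up half of the reads of its genus, which is why sorting by one column or another can reorder the samples noticeably.
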
## Filters

The filters at the top of the page allow you to include or exclude samples according to different criteria:

* **Taxon**: You can change the taxon of interest here.
* **Abundance range**: You can set the minimum and maximum abundance of your taxon that you want in the samples. Only the samples in which the abundance of your taxon falls within this range are displayed.
* **Exclude samples with no geographic coordinates**: Excludes the results that have no geographic coordinates, so that the results list is consistent with the map.
* **Include categories**: You can enter a list of categories that you want to include in the results. The samples that were not assigned to at least one of them are not displayed. If the list is empty, samples from all categories are displayed.
* **Exclude categories**: You can enter a list of categories that you want to exclude from the results. The samples that were assigned to one of these categories are not displayed.
* **Include countries**: You can enter a list of countries that you want to include in the results. Only the samples collected in one of these countries are displayed. If the list is empty, the samples from all countries are displayed.
* **Map**: The map on the right side lets you select a more precise location from which you want to see the samples. You can zoom in, zoom out, or move the map to select the desired location. The number of samples at each location is shown on the map, along with the abundance of your taxon in the corresponding area. When there are more than 1000 samples to display, they are grouped on a grid; otherwise, they are clustered according to their distance from each other.

Some buttons allow other actions:

* **Download TSV**: You can download the data, with the current filters applied, as a tab-separated file to study it or process it with your own scripts.
* **Group by**: You can group the results by category or by country.
  The groups (categories or countries) are shown sorted by the mean abundance of your taxon within the samples of each group. The abundances displayed are also the means within the groups. Clicking on a line expands or collapses the group to show or hide its samples. For categories, only the top-level nodes are shown, that is, the direct children of *microbial habitats* in the ontology.
* **Reinitialize filters**: Resets all the filters to their default values. Only the taxon name is kept.
* **Update results**: The results list is updated automatically when a field loses focus. This button forces the update.

## References

[1] Sequence Read Archive (SRA) [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2009. Available from: https://www.ncbi.nlm.nih.gov/sra/

[2] Wood *et al.* 2019. Improved metagenomic analysis with Kraken 2. DOI: 10.1186/s13059-019-1891-0

[3] Cabezas *et al.* 2024. MIMt – A curated 16S rRNA reference database with less redundancy and higher accuracy at species-level identification. DOI: 10.1186/s40793-024-00634-w

[4] Nédellec *et al.* 2018. Text-mining and ontologies: new approaches to knowledge discovery of microbial diversity. DOI: 10.48550/arXiv.1805.04107
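
## Example: processing the downloaded TSV

The file obtained with the **Download TSV** button can be processed with any scripting language. The sketch below uses only the Python standard library to average an abundance column per country, in the spirit of the **Group by** view. The column names (`sample`, `country`, `abundance_fungi`), the sample IDs, and the values are assumptions made for the example; check the header of your own file and adapt the names accordingly. Sorting with the highest mean first is also an assumption.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded file. The real column names may differ.
EXAMPLE_TSV = (
    "sample\tcountry\tabundance_fungi\n"
    "RUN_A\tFrance\t0.20\n"
    "RUN_B\tFrance\t0.10\n"
    "RUN_C\tJapan\t0.40\n"
)


def mean_abundance_by_country(tsv_file, min_abundance=0.0):
    """Average the abundance column per country, skipping rows below a threshold."""
    values = defaultdict(list)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        abundance = float(row["abundance_fungi"])
        if abundance >= min_abundance:
            values[row["country"]].append(abundance)
    means = {country: sum(v) / len(v) for country, v in values.items()}
    # Highest mean first (assumed ordering).
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = mean_abundance_by_country(io.StringIO(EXAMPLE_TSV))
for country, mean in ranking:
    print(f"{country}\t{mean:.3f}")
```

To run this on a real download, replace the `io.StringIO(...)` argument with an opened file, for example `open("results.tsv", newline="", encoding="utf-8")`, where `results.tsv` is a placeholder for the name of your file.
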