Note
This is a work in progress. Please let us know of any additions or changes you would like to see by emailing us at qiita.help@gmail.com.
If you have not yet created an account, please see the document Creating an account.
Studies are the main source of data for Qiita. Studies can contain only one set of samples but can contain multiple sets of raw data, each of which can have a different preparation. Many experiments will contain only one data set, which includes data for all samples. Such experiments will include a single sample template and a single prep template.
However, Qiita can also support more complex study designs. For example, imagine a study with 100 samples in which:
To represent this project in Qiita, you will need to create a single study with a single sample template that contains all 100 of the samples. Separately, you will need to create four prep templates that describe the preparations for the corresponding samples. All raw data uploaded will need to correspond to a specific prep template. For instance, the data sets described above would require the following data and template information:
To create a study, click on the “Study” menu and then on “Create Study”. This will take you to a new page that will gather some basic information to create your study.
The “Study Title” has to be unique system-wide. Qiita will check this when you try to create the study, and may ask you to change the title if the one you provide is already in use.
A principal investigator is required, and a list of known PIs is provided. If you cannot find the name you are looking for in this list, you can choose to add a new one.
Select the environmental package appropriate to your study. Different packages will request different specific information about your samples. This information is optional; for more details, see the metadata section.
Finally, select the kind of time series you have. The main options are:
Additionally, there is a distinction between real, pseudo or mixed interventions:
Once your study has been created, a green message will confirm it; click on the study name to begin adding your sample template, raw data and/or prep templates.
The entry point to a study is the study description page. Here you will be able to edit the study info, upload files, and manage all other aspects of your study.
The first step after study creation is uploading files. Click on the “Upload” button: as shown in the figure below, you can now drag-and-drop files into the grey area or simply click on “select from your computer” to select the fastq, fastq.gz or txt files you want to upload.
Uploads can be paused and resumed at any time, as long as you do not refresh or navigate away from the page, or log out of the system from another page.
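Before uploading, it can help to check locally which files match the types named in this step. The Python sketch below is a hypothetical helper, not part of Qiita: it only looks at file extensions (fastq, fastq.gz and txt, as listed above), does not inspect file contents, and would need other suffixes added for any further raw formats your data uses.

```python
from pathlib import Path
from typing import List, Tuple

# File types named in the upload step. str.endswith accepts a tuple, and
# ".fastq.gz" is matched as one whole suffix.
ACCEPTED_SUFFIXES = (".fastq", ".fastq.gz", ".txt")


def split_uploadable(paths: List[str]) -> Tuple[List[str], List[str]]:
    """Split paths into (accepted, rejected) by extension, case-insensitively."""
    accepted: List[str] = []
    rejected: List[str] = []
    for p in paths:
        name = Path(p).name.lower()
        if name.endswith(ACCEPTED_SUFFIXES):
            accepted.append(p)
        else:
            rejected.append(p)
    return accepted, rejected


accepted, rejected = split_uploadable(
    ["run1_R1.fastq.gz", "sample_template.txt", "notes.docx"]
)
print(accepted)  # → ['run1_R1.fastq.gz', 'sample_template.txt']
print(rejected)  # → ['notes.docx']
```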
Once your file(s) have been uploaded, you can process them in Qiita. From the upload tool, click on “Go to study description” and, once there, click on the “Sample template” tab. Select your sample template from the dropdown menu and, lastly, click “Process sample template”.
If a sample template is processed successfully, a green message will appear; if processing is unsuccessful, a red message describing the errors will appear. In this case, please fix the described issues, re-upload your file, and then re-attempt processing.
You can download the processed sample template file from the “Sample template” tab. If you are using a single-user install, you will see the full path to the file on your computer; alternatively, if you are using a multi-user install, you will be able to download the file directly, as shown below:
An example of how downloads differ between the single- and multi-user installs. In a single-user install, the file path on your system is provided. In a multi-user install, an actual download of the file is available.
Once the sample template is successfully processed, you will be able to use the “Add prep template” tab.
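Because a study has a single sample template covering all of its samples, while each prep template describes only the samples it applies to, every sample ID in a prep template should also appear in the sample template. The Python sketch below checks this locally before upload. The function name, sample IDs and prep template names are hypothetical, and this is not Qiita's own validation logic.

```python
from typing import Dict, Set


def check_prep_templates(sample_ids: Set[str],
                         prep_templates: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per prep template, the sample IDs missing from the sample template.

    An empty result means every prep template only references known samples.
    """
    problems: Dict[str, Set[str]] = {}
    for prep_name, prep_ids in prep_templates.items():
        unknown = prep_ids - sample_ids  # IDs with no sample metadata
        if unknown:
            problems[prep_name] = unknown
    return problems


# Hypothetical study: one sample template with 100 samples (S001..S100)
# and four prep templates, each covering some of those samples.
samples = {f"S{i:03d}" for i in range(1, 101)}
preps = {
    "prep_A": {f"S{i:03d}" for i in range(1, 51)},
    "prep_B": {f"S{i:03d}" for i in range(51, 101)},
    "prep_C": set(samples),
    "prep_D": {"S001", "S002", "X999"},  # X999 is not in the sample template
}
print(check_prep_templates(samples, preps))  # → {'prep_D': {'X999'}}
```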
After you’ve added a new prep template, you can either (a) select a new raw data file from the drop-down menu of uploaded files or (b) add raw data from another study to which you have access. The latter ability exists as a way to avoid duplication of uploads, since some studies share the same raw data (for example, the same fastq files).
Note
Prep templates cannot be shared; only raw data can be shared.
Here, select the type of data you are processing (SFF, FASTQ, etc.). Once you have made your selections, click “Link” to link your raw data. This takes you to a new page where the job that moves and adds your files is created; you can leave that page at any time.
Note
From that moment until the job is finished, you will see a “Linking files” message and you will not be able to add or unlink any files.
Adding prep templates is similar to adding sample templates except that, in addition to selecting the prep template file from the dropdown menu, you will also need to select the type of prep template (16S, 18S, etc.) and the corresponding investigation type. The investigation type is optional for Qiita, but required for submitting your data to EBI.
Finally, when you add a new prep template, you will get two new links (or, if you are running Qiita on your local machine, two full paths): one to download the prep template you uploaded and another to download a QIIME-compatible mapping file. The QIIME mapping file combines the sample template and the prep template.
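To illustrate how a mapping file can combine the two templates, here is a simplified Python sketch that joins a tab-separated sample template and prep template on their sample ID column. It assumes the ID column is called sample_name and writes it out as #SampleID; the other column names (ph, country, barcode, primer) are placeholders, and Qiita's actual mapping files may order or name columns differently.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_template(text: str, key: str = "sample_name") -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Parse a tab-separated template into (non-key columns, {sample_id: row})."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = {row[key]: row for row in reader}
    columns = [c for c in (reader.fieldnames or []) if c != key]
    return columns, rows


def merge_templates(sample_tsv: str, prep_tsv: str, key: str = "sample_name") -> str:
    """Inner-join sample and prep metadata on the sample ID, in prep-template order."""
    sample_cols, samples = read_template(sample_tsv, key)
    prep_cols, preps = read_template(prep_tsv, key)
    # Prep-only columns are appended after the sample columns.
    columns = sample_cols + [c for c in prep_cols if c not in sample_cols]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#SampleID"] + columns)
    for sample_id, prep_row in preps.items():
        if sample_id not in samples:
            continue  # no sample metadata for this prep row; skipped in this sketch
        # For a column present in both templates, the prep value wins
        # (an arbitrary choice for this sketch).
        merged = {**samples[sample_id], **prep_row}
        writer.writerow([sample_id] + [merged[c] for c in columns])
    return out.getvalue()


sample_tsv = "sample_name\tph\tcountry\nS1\t7.0\tUSA\nS2\t6.5\tPeru\nS3\t7.2\tUSA\n"
prep_tsv = "sample_name\tbarcode\tprimer\nS1\tAAAC\tGTGCCAGC\nS3\tAAAG\tGTGCCAGC\n"
print(merge_templates(sample_tsv, prep_tsv), end="")
```

Only samples listed in the prep template appear in the output (S2 is dropped here), which matches the idea that a prep template usually covers a subset of the study's samples.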
Once you have linked files to your raw data and your prep template has been processed, you can proceed to preprocessing your data. A list of currently supported raw file types is available at https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files.
Once preprocessing is finished, you will have four new files:
The HDF5 demultiplexed file format (described in detail here) allows random access to the sequences associated with each sample, as well as per-sample statistics. This format originated from the need to fetch the sequences associated with individual samples, which required substantial overhead when working with ASCII-formatted sequence files such as fasta and fastq. The structure provided by HDF5 enables Qiita to rapidly access the sequence data for any sample and to efficiently subset (potentially randomly) the corresponding sequences.
HDF5 can be thought of internally as a filesystem, where directories are called “groups” and files are called “datasets.” In the HDF5 demux format, a sample is a group and the sequence data are decomposed into multiple datasets. Specifically, the following datasets are directly part of the sample group:
Barcode details can be found under the “barcode” group of the sample, which contains three datasets:
All datasets within a sample are in index order. In other words, the sequence at index zero corresponds to the quality scores at index zero and to the barcode at index zero, and so on.
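Index order is what makes per-sample random subsetting cheap: sampling a set of indices is enough to pull matching sequences, quality scores and barcodes. The Python sketch below models the layout with nested dicts (groups) and lists (datasets) so that it runs without an HDF5 library. The dataset names sequence, qual and barcode/original are placeholders chosen for this illustration, not a statement of Qiita's exact schema.

```python
import random
from typing import Any, Dict, List, Optional

# Toy stand-in for an HDF5 demux file: each sample is a "group" (dict) and
# each dataset is a list. Dataset names are placeholders for this sketch.
demux: Dict[str, Dict[str, Any]] = {
    "sample_1": {
        "sequence": ["ACGT", "ACGA", "TTGA"],
        "qual": [[40, 40, 38, 37], [39, 40, 40, 30], [20, 25, 30, 35]],
        "barcode": {"original": ["AAAC", "AAAC", "AAAT"]},
    },
}


def read_record(f: Dict[str, Any], sample: str, i: int) -> Dict[str, Any]:
    """Return the i-th read of a sample, taken from every dataset in its group."""
    group = f[sample]
    # All datasets share index order, so the same i selects matching entries.
    return {
        "sequence": group["sequence"][i],
        "qual": group["qual"][i],
        "barcode": group["barcode"]["original"][i],
    }


def random_subset(f: Dict[str, Any], sample: str, n: int,
                  seed: Optional[int] = None) -> List[Dict[str, Any]]:
    """Draw n reads without replacement by sampling indices, not sequences."""
    total = len(f[sample]["sequence"])
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(total), n))
    return [read_record(f, sample, i) for i in indices]


print(read_record(demux, "sample_1", 1))
# → {'sequence': 'ACGA', 'qual': [39, 40, 40, 30], 'barcode': 'AAAC'}
```

With h5py, an opened file follows roughly the same access pattern, since groups are indexed by name and datasets by integer position (for example, f["sample_1"]["sequence"][i]).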
Last, the following summary statistics are tracked per-sample (accessible via the group attributes) and per-file (accessible via the file attributes):
Once you are happy with these files and ready to publish, contact one of the Qiita admins to submit your data to EBI. This process normally takes a couple of days, but can take longer depending on availability and how busy the submission queue is.