Project: Submission Spreadsheet

Title: Example Entries



Introduction

The purpose of this document is to detail the requirements for filling out entries in the spreadsheet designed for Microarray submission to the GUDMAP database.

The GUDMAP Microarray spreadsheet includes many fields that are required by the Gene Expression Omnibus (GEO), a public repository that stores high-throughput experimental data including microarray data. It is anticipated that data submitted to GUDMAP can be easily incorporated into GEO and vice versa.

The following examples highlight where the GUDMAP Microarray data requirements differ from those of GEO.

______________________________________________________________________________________________________________

GEO Submissions

GEO requires that users supply 3 types of information: Platform, Sample, and Series data.

For more details of GEO submissions, go to the following link:
http://www.ncbi.nlm.nih.gov/projects/geo/info/overview.html


Image taken from:
http://www.ncbi.nlm.nih.gov/projects/geo/info/overview.html

Currently, GUDMAP accepts only microarray data from Affymetrix chips.

______________________________________________________________________________________________________________

GUDMAP Microarray Spreadsheets

The data required for a GUDMAP Microarray submission is organised into two Excel workbooks.

The first workbook contains 2 spreadsheets.
a. Sheet 1 - The Sample Data Sheet (incorporating sample and platform data)
b. Sheet 2 - The Series Data Sheet (series data only)

The second workbook should contain a single spreadsheet.
c. Sheet 3 - The Associated File Sheet (CEL, CHP, EXP, RPT, TXT files)
Enter in this sheet the names of the supplementary files associated with each sample.

IMPORTANT:  Only alpha-numeric characters (A-Z, 0-9) will be accepted in both the image directory and the image filenames.  The only exception to this is an underscore [ _ ] and the dot [ . ] before the file extension.
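The naming rule above can be checked before submission. The following is only a minimal sketch: the function names are hypothetical, and it assumes that lower-case letters are also acceptable, that a filename contains exactly one dot, and that the extension itself is alpha-numeric. None of these details is confirmed by this document.

```python
import re

# Assumption: letters (either case), digits and underscores, followed by
# exactly one dot and an alpha-numeric extension.
FILENAME_PATTERN = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")

# Assumption: a directory name follows the same rule but has no dot.
DIRECTORY_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_filename(name: str) -> bool:
    """Return True if the filename uses only the permitted characters."""
    return FILENAME_PATTERN.fullmatch(name) is not None


def is_valid_directory(name: str) -> bool:
    """Return True if the directory name uses only the permitted characters."""
    return DIRECTORY_PATTERN.fullmatch(name) is not None
```

Under these assumptions, a name such as kidney_E15_01.jpg passes, while names containing hyphens, spaces or a second dot are rejected.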

______________________________________________________________________________________________________________

GUDMAP Sample Data (Sheet 1)
Sample data describes the biological material examined and the measurements derived from analysis of this material.

Unique ID (Column A)
Enter here a unique ID for each row. This can be any unique value but labs may wish to use an ID that they hold locally. Only alpha-numeric characters (A-Z, 0-9) will be accepted for a Unique ID.  The only exception to this is an underscore [ _ ] and the dot [ . ] before the file extension.

Associated File (Column B)

Enter here the name of the spreadsheet in which the associated files for each row are listed (i.e. the name of the Excel workbook incorporating the associated file sheet).

______________________________________________________________________________________________________________

Submitter / PI Details (Columns C-R)

Enter here the contact details of both the submitter and the principal investigator.

______________________________________________________________________________________________________________

Sample Name & Specimen Details (Columns S-AB)

Enter here the sample name and specimen details. One should pay particular attention to the following columns:

Tissue / Cell Type (Column T)

Enter here the anatomical component(s) from which the RNA was isolated.

Ontology Name (Column U)

Enter here the EMAP ID of the anatomical component(s) from which the RNA was isolated.

Sex (Column AB)

Options include ‘male’, ‘female’, ‘unknown’, or ‘both’.


______________________________________________________________________________________________________________

Staging & Dissection Method (Columns AC-AF)

Enter here the staging details and dissection method. A Theiler stage (Column AD) must be provided.

Sample Image (Column AH)

Enter here the filename of the image illustrating the source of the sample. When including multiple images, use the vertical bar (pipe) symbol [ | ] to separate filenames.
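As an illustration of how a Sample Image cell holding several filenames might be read, the sketch below splits the cell on the pipe character. The function name is hypothetical, and ignoring surrounding spaces and empty entries is an assumption rather than a stated GUDMAP rule.

```python
from typing import List


def split_pipe_list(cell: str) -> List[str]:
    """Split a pipe-separated cell into its trimmed, non-empty entries."""
    # Assumption: whitespace around each entry is not significant.
    return [part.strip() for part in cell.split("|") if part.strip()]
```

A cell containing a single filename yields a one-element list, so the same function can be used whether or not multiple images are supplied.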

RNA Quality (Columns AI-AK)

Enter here details relating to the quality of the RNA.
In Column AJ, enter the A260/280 index, i.e. the ratio of UV absorbance readings at 260 nm and 280 nm, which gives a general assessment of RNA quality (with 1.8 - 2.0 being ideal).
In Column AK, enter the Bioanalyzer RIN, i.e. the RNA Integrity Number provided by Agilent's Bioanalyzer. Again, this is a general assessment of RNA quality (with 10 being the highest quality and 1 the lowest).
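The two quality measures can be sanity-checked before they are entered. This is a sketch only: the function name is hypothetical, and treating a ratio outside 1.8 - 2.0 as a note rather than a rejection is an assumption, since the document describes that range as ideal and not as mandatory.

```python
from typing import List


def check_rna_quality(a260_280: float, rin: float) -> List[str]:
    """Return a list of notes about the RNA quality values (empty if none)."""
    notes = []
    # The ideal A260/280 range quoted in the text is 1.8 - 2.0.
    if not 1.8 <= a260_280 <= 2.0:
        notes.append("A260/280 ratio %.2f is outside the ideal 1.8 - 2.0 range" % a260_280)
    # The RIN scale runs from 1 (lowest quality) to 10 (highest quality).
    if not 1 <= rin <= 10:
        notes.append("RIN %s is outside the 1 - 10 scale" % rin)
    return notes
```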


______________________________________________________________________________________________________________

Platform Details (Columns AL-AY)

Enter here details of both the array design and manufacture protocols.

______________________________________________________________________________________________________________

References / Experimental Goals and Design (Columns AZ-BB)

In Column AZ, enter details relating to the standard RNA reference that was used for the purpose of comparing multiple experiments.
In Columns BA and BB, describe the experimental goals and experimental design, respectively.
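When the Sample Data Sheet is read programmatically, the column letters quoted in this section usually have to be translated into column positions. The sketch below shows the standard conversion from Excel column letters to a 1-based index; the function name is hypothetical and not part of any GUDMAP tool.

```python
def column_index(letters: str) -> int:
    """Convert Excel column letters (e.g. 'AB') to a 1-based column index."""
    if not letters:
        raise ValueError("column letters must not be empty")
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError("invalid column letter: %r" % ch)
        # Each position is a base-26 digit where A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index
```

For instance, Column AB (Sex) is the 28th column, and Columns AZ-BB correspond to positions 52 to 54.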


______________________________________________________________________________________________________________

GUDMAP Series Data (Sheet 2)


Series data links together groups of samples and describes the study as a whole.

For more info on series data, go to the following link:
http://www.ncbi.nlm.nih.gov/projects/geo/info/depguide.html#SubmitSeries

Standard Fields Table

The Standard Fields Table contains information relating to the design of the experiment and requires both a title and a summary.


For a GUDMAP Microarray submission, we require submitters to complete three additional fields not found in GEO. These are listed below:

Sample ID Table
The Sample ID Table should contain the unique ID from the Sample Data Sheet (in Column A) along with a description of that sample (in Column B).


Variable Table
The Variable Table follows GEOarchive metadata guidelines (http://www.ncbi.nlm.nih.gov/projects/geo/info/spreadsheet.html#GAmeta).

The variable (Column A) is a required field. Variables can be one of the following: dose, time, tissue, strain, gender, cell line, development stage, age, agent, cell type, infection, isolate, metabolism, shock, stress, temperature, specimen, disease state, protocol, growth protocol, genotype/variation, species, individual, or other.
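The list of permitted variables lends itself to a simple lookup. The following sketch uses the terms exactly as listed above; the names are hypothetical, and matching case-insensitively is an assumption that may not reflect how GUDMAP or GEO actually validate the field.

```python
# Terms copied from the list of permitted variables above.
ALLOWED_VARIABLES = frozenset({
    "dose", "time", "tissue", "strain", "gender", "cell line",
    "development stage", "age", "agent", "cell type", "infection",
    "isolate", "metabolism", "shock", "stress", "temperature",
    "specimen", "disease state", "protocol", "growth protocol",
    "genotype/variation", "species", "individual", "other",
})


def is_allowed_variable(value: str) -> bool:
    """Return True if the value is one of the permitted variable terms."""
    # Assumption: case and surrounding whitespace are not significant.
    return value.strip().lower() in ALLOWED_VARIABLES
```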

A description (Column B) is also required. Descriptions should include information about the developmental stage and tissue from which RNA was extracted.

In Column C, unique IDs relating to individual samples should be entered. Multiple IDs are to be pipe (|) separated.

Replicates Table
The Replicates Table follows GEOarchive metadata guidelines (http://www.ncbi.nlm.nih.gov/projects/geo/info/spreadsheet.html#GAmeta).

There are three options for the replicate type (Column A). These are: 1) biological replicate; 2) technical replicate - extract; 3) technical replicate - labeled-extract. Enter one of these terms into Column A.

In Column B, unique IDs relating to individual samples should be entered. Multiple IDs are to be pipe (|) separated.
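A row of the Replicates Table can be checked against the three permitted replicate types and against the unique IDs entered in the Sample ID Table. This is a minimal sketch with hypothetical names. It assumes that the separator is the vertical bar character, that the replicate type must match one of the three terms exactly, and that every ID should already appear in the Sample ID Table; the last two points are not stated in this document.

```python
from typing import Iterable, List

# The three replicate types listed for Column A.
REPLICATE_TYPES = (
    "biological replicate",
    "technical replicate - extract",
    "technical replicate - labeled-extract",
)


def check_replicate_row(replicate_type: str, ids_cell: str,
                        known_ids: Iterable[str]) -> List[str]:
    """Return a list of problems found in one Replicates Table row."""
    problems = []
    known = set(known_ids)
    if replicate_type not in REPLICATE_TYPES:
        problems.append("unknown replicate type: %r" % replicate_type)
    # Column B holds one or more unique IDs separated by the pipe symbol.
    ids = [part.strip() for part in ids_cell.split("|") if part.strip()]
    if not ids:
        problems.append("no sample IDs given")
    for sample_id in ids:
        if sample_id not in known:
            problems.append("unknown sample ID: %s" % sample_id)
    return problems
```

The same ID check could be applied to Column C of the Variable Table, which uses the same pipe-separated format.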

______________________________________________________________________________________________________________

GUDMAP Associated Files (Sheet 3)
In this sheet, enter the filenames of the associated files. Please ensure that each filename extension is in upper case (e.g. filename.TXT).

Unique ID (Column A)

Enter here the ID found in Column A of the Sample Data Sheet (Sheet 1).

CEL (Column B)

The Cell Intensity file (.CEL) contains fluorescence intensities for each probe on the microarray.
Enter here the name of the associated CEL file for this ID.

CHP (Column C)

The Chip file (.CHP) contains signal values and presence / absence calls for each probe set on the microarray.
Enter here the name of the associated CHP file for this ID.

RPT (Column D)

The Report file (.RPT) includes information about noise and internal hybridisation controls within the chip.
Enter here the name of the associated RPT file for this ID.

EXP (Column E)

The Experiment Information file (.EXP) includes information relating to the hybridisation protocol.
Enter here the name of the associated EXP file for this ID.

TXT (Column F)

The Expression Analysis file (.TXT) is a text version of the .CHP file, containing the same information in a tab-delimited format.
Enter here the name of the associated TXT file for this ID.
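The column layout of the Associated File Sheet and the upper-case extension rule can be combined into one check per cell. The sketch below is illustrative only: the names are hypothetical, and the assumption that each column accepts only its own extension is inferred from the column headings and not stated explicitly.

```python
from typing import Optional

# Column letters and extensions as described for Sheet 3.
EXPECTED_EXTENSIONS = {"B": "CEL", "C": "CHP", "D": "RPT", "E": "EXP", "F": "TXT"}


def check_associated_file(column: str, filename: str) -> Optional[str]:
    """Return a problem description for the cell, or None if it looks fine."""
    expected = EXPECTED_EXTENSIONS[column]
    # Split on the last dot: everything after it is the extension.
    stem, dot, extension = filename.rpartition(".")
    if not dot or not stem:
        return "filename has no extension"
    if extension == expected:
        return None
    if extension.upper() == expected:
        return "extension must be in upper case (.%s)" % expected
    return "expected a .%s file in Column %s" % (expected, column)
```

For a row with the CEL filename sample1.cel, the function reports that the extension must be upper case, while sample1.CEL in Column B passes.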


For more details of the information provided by the associated files, please see the following website:
http://chip.dfci.harvard.edu/stats/data.php