Creating a print within the database is a complicated process, but it is absolutely required prior to experiment entry. To get your data into the database, we need a number of things, the most important being the molecules that serve as your reporters, the plates in which they are located, and any other platesample data you might have (LIMS/QC). In some circumstances, this list is termed a "godlist". It was referred to as such because it comprised a list of all the reporters, including their biological annotation and the plate sample itself (well address + contents), listed in the order the plates were put in the printer. In terms of array design, this file was practically "omniscient", hence the name.
To simplify this terminology, as well as the print submission process, we now encourage researchers to submit platesamples to the database only once, when the plate is constructed (rather than every time the plate is printed, as a godlist suggests). That way, errors are kept to a minimum, and changes in a plate's contents can be reflected in the multiple prints that the plate may have contributed to. New prints made from pre-existing plates are therefore trivial to enter, requiring only a list of plate names (numbering in the dozens), rather than a reiteration of the 40,000 well contents that were spotted.
New plates and platesamples are conveyed to the database curators via a tab-delimited text file, most likely exported from a spreadsheet application. The requirements we make of this file are the following:
Required Columns
PLAT: The plate number; e.g. 1, 2, 3, etc.
PROW: The plate row; e.g. A, B, C, etc.
PCOL: The plate column; e.g. 1, 2, 3, etc.
NAME: The sequence name; usually a systematic name or clone identifier (see CLONEID, below). This is the only name used for samples of TYPE other than CDNA.
TYPE: The sequence type; usually ORF, CDNA, CONTROL, or EMPTY.
List of TYPEs | Download SUID/TYPE Examples
FAIL: Whether the PCR (sample verification) failed. Null is assumed to be 0 (success). The codes are:
- 0 : one distinct band - success
- 1 : no signal - fail
- 2 : multiple distinct bands
- 3 : signal, but not a distinct band (smear)
- 4 : multiple smears
- 5 : unknown
- 101 : worst cases of peeled away or haloed spots (assigned on a 96 well plate basis)
- 102 : less severe cases of peeled away or haloed spots (assigned on a 96 well plate basis)
CLONEID: Required for samples of TYPE=CDNA, if ACC is absent/null. Real cDNA clones must have a clone ID; otherwise it is assumed the sample is a pseudoclone, which requires a surrogate accession. Format examples: IMAGE:34049, ATCC:183963
ACC: Required for samples of TYPE=CDNA, if CLONEID is absent/null. This is the GenBank accession, usually acquired from dbEST. Used to populate the clone and clone_gbacc tables.
Optional Columns
DESC: A description of the molecular entity, if desired. This description is associated with the SUID itself (it is not a clone or platesample description).
LUID: Laboratory Unique ID. For those samples that have identical NAME and TYPE, but require distinction within the laboratory for experimental reasons (different sources, questionable quality, sample tracking).
GENE_NAME: Sometimes clones will stop being included in UniGene for spurious reasons, but users have a 'Preferred Name' for those clones. For a new clone SUID, the GENE_NAME column will be entered into the preferred_name column of the clone table.
ORIGIN: For CDNA clones, this can indicate whether they are public or private.
SOURCE: A string describing the source of the clone or DNA. This has typically been used to indicate the original plate source and the 96 and 384 well plate locations that a clone has been in, e.g. GF200:96(1A1):384(1A1). In this case GF200 refers to a set of resgen plates, aka the 1st 5K. This field can be used by any type of DNA.
IS_CONT: Whether the sample is known to be contaminated. A blank entry will default to unknown (U).
IS_VER: Whether the DNA in a well has been verified. A blank entry will default to no (N).
SAMPLE_DESC: A description, if any, of that particular sample. This description is specific to the plate sample.
ORGANISM: Used when submitting a print containing samples from multiple organisms (e.g. human, yeast). For those few rows where the sample is derived from an organism *other* than the (user-defined) default, the organism code must be specified.
For a list of 2-letter organism codes, go here
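As a rough illustration of the blank-entry defaults described in the column list above, the following sketch fills in FAIL, IS_CONT, and IS_VER for a single row. The names FAIL_CODES, apply_defaults, and describe_fail are hypothetical and are not part of the submission program; the code descriptions are paraphrased from the FAIL list. It also assumes that an absent optional column is treated the same way as a blank entry.

```python
# Short labels for the FAIL codes (paraphrased from the column list).
FAIL_CODES = {
    0: "one distinct band (success)",
    1: "no signal (fail)",
    2: "multiple distinct bands",
    3: "signal, but not a distinct band (smear)",
    4: "multiple smears",
    5: "unknown",
    101: "worst cases of peeled away or haloed spots",
    102: "less severe cases of peeled away or haloed spots",
}

# Defaults for blank entries, as stated in the column descriptions.
BLANK_DEFAULTS = {"FAIL": "0", "IS_CONT": "U", "IS_VER": "N"}


def apply_defaults(row):
    """Return a copy of one platesample row (a dict) with blank defaults filled in.

    Assumption: a column that is missing from the row is treated like a blank.
    """
    filled = dict(row)
    for column, default in BLANK_DEFAULTS.items():
        if not (filled.get(column) or "").strip():
            filled[column] = default
    return filled


def describe_fail(value):
    """Translate a FAIL entry into its label; a blank entry counts as 0."""
    text = (value or "").strip()
    code = int(text) if text else 0
    return FAIL_CODES.get(code, "unrecognized FAIL code")
```

For example, a row whose FAIL entry is blank would be reported by describe_fail as a success, matching the "null is assumed to be 0" convention.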
RULE 1: Required column headers are: PLAT, PCOL, PROW, NAME, TYPE, FAIL, and, if TYPE=CDNA, CLONEID and ACC. If any of these headers is misspelled or absent, or if any data is null (except in the FAIL, ACC, and CLONEID columns), you cannot proceed with plate/platesample submission. In addition, PLAT, PCOL, and PROW ordering must be correct and no wells may be skipped. Empty wells must be specified as such, except for the tail end of the last plate in the print run (also see common errors). Optional columns: DESC, LUID, CLID, GENE_NAME, ORIGIN, SOURCE, IS_CONT, IS_VER, SAMPLE_DESC, CLONE_DESC, ORGANISM.
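Before uploading, it may help to run a quick local check against Rule 1. The sketch below is not the curators' program; check_rule_1 is a hypothetical helper. It assumes the null check applies only to the required columns, and that a CDNA row needs at least one of CLONEID or ACC. It does not check well ordering or skipped wells.

```python
import csv
import io

REQUIRED = ["PLAT", "PCOL", "PROW", "NAME", "TYPE", "FAIL"]
CDNA_ONLY = ["CLONEID", "ACC"]            # needed only when TYPE=CDNA
MAY_BE_NULL = {"FAIL", "ACC", "CLONEID"}  # columns exempt from the null check


def check_rule_1(text):
    """Return a list of problems found in a tab-delimited platesample file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    headers = reader.fieldnames or []
    rows = list(reader)

    # Header check: a misspelled header simply shows up as a missing one.
    problems = [f"missing header: {h}" for h in REQUIRED if h not in headers]
    if any((r.get("TYPE") or "").strip() == "CDNA" for r in rows):
        problems += [f"missing header: {h}" for h in CDNA_ONLY if h not in headers]

    # Data check: line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        for col in REQUIRED:
            if col in MAY_BE_NULL or col not in headers:
                continue
            if not (row.get(col) or "").strip():
                problems.append(f"line {line_no}: {col} is null")
        if (row.get("TYPE") or "").strip() == "CDNA":
            if not any((row.get(c) or "").strip() for c in CDNA_ONLY):
                problems.append(f"line {line_no}: CDNA sample needs CLONEID or ACC")
    return problems
```

An empty list means the file passed these particular checks; it does not guarantee acceptance by the submission program.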
The database uses the combination of NAME, TYPE, and ORGANISM to uniquely identify a sample. Each unique combination is given a unique numeric identifier, called the SUID. SUIDs allow comparison of the same samples across different prints. Thus, it is extremely important to ensure that erroneous SUIDs are not created. Erroneous SUIDs are usually created by a bad NAME (misspelled, non-standard, or non-systematic). Every new platesample must be verified and committed to the database (via SUID) before the reporter/molecule it contains can be entered. Therefore, all rows in your file are checked to see whether the combination of NAME, TYPE, and ORGANISM has been used previously. If not, and if the user verifies that they are legitimate new samples, these samples (rows) must have new SUIDs assigned to them.
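The lookup described above can be pictured as a set-membership test on (NAME, TYPE, ORGANISM) tuples. The sketch below is only an illustration: sample_key and find_new_samples are hypothetical names, the set of known keys stands in for whatever the database already holds, and the comparison is assumed to be exact (case-sensitive). The organism code "SC" in the usage example is a placeholder, not a confirmed database code.

```python
def sample_key(row, default_organism):
    """Build the (NAME, TYPE, ORGANISM) tuple that identifies a sample.

    A blank or missing ORGANISM falls back to the user-defined default.
    """
    organism = (row.get("ORGANISM") or "").strip() or default_organism
    return (row["NAME"].strip(), row["TYPE"].strip(), organism)


def find_new_samples(rows, known_keys, default_organism):
    """Return the distinct keys not found in known_keys, in first-seen order."""
    new_keys = []
    seen = set()
    for row in rows:
        key = sample_key(row, default_organism)
        if key not in known_keys and key not in seen:
            seen.add(key)
            new_keys.append(key)
    return new_keys


# Usage: a misspelled NAME surfaces as a "new" sample needing review.
rows = [
    {"NAME": "YAL001C", "TYPE": "ORF"},
    {"NAME": "YAL001c", "TYPE": "ORF"},  # differs only in case
]
known = {("YAL001C", "ORF", "SC")}
candidates = find_new_samples(rows, known, default_organism="SC")
```

Here candidates holds only the second row's key, which is the kind of entry worth double-checking before approving a new SUID.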
RULE 2: If any samples within your platesample list are not currently in the database, you will be prompted to double-check your entries prior to passing the intermediate file off to the curators. Please be a conscientious user and verify that any new SUIDs you approve are valid. Erroneous SUIDs prevent comparisons between prints/experiments!
If your list passes the first two rules (above), you must eventually specify how the samples were printed in order to deconvolute the SPOTLIST. To do this, you must know how many sectors (corresponding to printer tips), columns, and rows your print output, and therefore the gridding methodology, corresponds to. You will have to provide these 3 values, and they must "equate" (relatively) to the number of rows in your platesample list (sans header).
RULE 3: This equation must be followed:
#samples = (#platesample list rows - 1) <= #tips * #rows * #columns = #spots
Therefore, you are not permitted to have more platesamples than gridded spots (experiment loading would be disallowed). Conversely, if your #spots (gridded) is greater than #samples (rows in submitted file), realize that the data from the empty "ghost" spots on the partial last row will be disposed of during experiment loading (allowed).
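The arithmetic in Rule 3 is simple enough to check by hand, but a small sketch may make the two cases concrete. The function names are hypothetical, and the rows and columns are assumed to be counted per sector, as the equation implies.

```python
def count_samples(text):
    """Count platesample rows: non-blank lines minus the header line."""
    lines = [line for line in text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def check_rule_3(n_samples, tips, rows, columns):
    """Return the number of empty "ghost" spots.

    Raises ValueError when there are more samples than gridded spots,
    which is the case Rule 3 disallows.
    """
    spots = tips * rows * columns
    if n_samples > spots:
        raise ValueError(f"{n_samples} samples exceed {spots} gridded spots")
    return spots - n_samples
```

With 10 samples on a grid of 4 tips, 2 rows, and 2 columns (16 spots), the function returns 6 ghost spots; with 17 samples on the same grid it raises an error.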
There is a program to assist you in platesample submission, which follows the rules stipulated above. In order to run it, you must place (SFTP) the file in the incoming directory of your loader account. This is so the program can read your original file, offer feedback, and write intermediate steps to the logs directory.