PUMAdb : Platesample and Reporter Entry Assistance Help

Contents

Description
Before you start...
Column Headers
NAMEs and SUIDs
Sample Files
Common Errors

Description

The creation of a print within the database is a complicated process, but it is absolutely required prior to experiment entry. To get your data into the database, we need a number of things, the most important being the molecules that serve as your reporters, the plates in which they are located, and any other platesample data you might have (LIMS/QC). In some circumstances, this list is termed a "godlist". It was so named because it comprised a list of all the reporters, including their biological annotation, and the platesamples themselves (well address + contents), listed in the order the plates were put in the printer. In terms of array design, this file was practically "omniscient", hence the name.

To simplify this terminology, as well as the print submission process, we now encourage researchers to submit platesamples to the database only once, when the plate is constructed (rather than every time the plate is printed, as a godlist suggests). That way, errors are kept to a minimum, and changes in a plate's contents can be reflected in the multiple prints that the plate may have contributed to. New prints made from pre-existing plates are therefore trivial to enter, requiring only a list of plate names (numbering in the dozens), rather than a reiteration of the 40,000 well contents that were spotted.

Before you start...

Not every new print requires plate/platesample submission. A new submission should only happen if you are certain that

The plates used during printing have not been previously entered into the database, or
The plate(s) were entered in the past, but their contents have changed over time (well contamination, well emptied), and are therefore considered novel.

For example, if your lab makes 3 different prints using the exact same plates (perhaps in different orders), it is not necessary to compile separate lists for all three. To enter the 2 subsequent prints, all the curators require is a 2-column platelist, consisting of database plateIDs and plate names from the first print (in their new order). The curators can assist in setting up your "master" plate list, from which all future prints will likely be composed. New plates can be entered as needed, as described below.

New plates and platesamples are conveyed to the database curators via a tab-delimited text file, most likely exported from a spreadsheet application. The requirements we make of this file are the following:

Column Headers

Required Columns
PLAT The plate number; eg 1, 2, 3, etc.

PROW The plate row; eg A, B, C, etc.

PCOL The plate column; eg 1, 2, 3, etc.

NAME The sequence name; usually a systematic name or clone identifier (see CLONEID, below). This is the only name used for samples of TYPE other than CDNA.

TYPE The sequence type; usually ORF, CDNA, CONTROL, or EMPTY. List of TYPEs | Download SUID/TYPE Examples

FAIL
Whether the PCR (sample verification) failed;
0 : one distinct band - success
1 : no signal - fail
2 : multiple distinct bands
3 : signal, but not a distinct band (smear)
4 : multiple smears
5 : unknown
101 : worst cases of peeled away or haloed spots (assigned on a 96 well plate basis)
102 : less bad cases of peeled away or haloed spots (assigned on a 96 well plate basis)
Null is assumed to be 0 (success)

CLONEID Required for samples of TYPE=CDNA, if ACC is absent/null. Real cDNA clones must have a cloneID; otherwise it is assumed the sample is a pseudoclone, which requires a surrogate accession. Format examples: IMAGE:34049, ATCC:183963

ACC Required for samples of TYPE=CDNA, if CLONEID is absent/null. This is the GenBank accession, usually acquired from dbEST. Used to populate the clone and clone_gbacc tables

Optional Columns
DESC A description of the molecular entity, if desired. This description is associated with the SUID itself (not a clone or platesample description).

LUID Laboratory Unique ID: For those samples that have identical NAME and TYPE, but require distinction within the laboratory for experimental reasons (different sources, questionable quality, sample tracking).

GENE_NAME Sometimes clones will stop being included in UniGene for spurious reasons, but users have a 'Preferred Name' for those clones. For a new clone SUID, the GENE_NAME column will be entered into the preferred_name column of the clone table.

ORIGIN For CDNA clones, this can indicate whether they are public or private.

SOURCE A string describing the source of the clone or DNA. This has typically been used to indicate the original plate source, and the 96 and 384 well plate locations that a clone has been in, eg GF200:96(1A1):384(1A1). In this case GF200 refers to a set of resgen plates, aka the 1st 5K. This field can be used by any type of DNA.

IS_CONT Whether the sample is known to be contaminated. A blank entry will default to unknown (U)

IS_VER Whether the DNA in a well has been verified. A blank entry will default to no (N).

SAMPLE_DESC A description, if any, about that particular sample. This description is specific to the plate sample.

ORGANISM Needed only when submitting a print containing samples from multiple organisms (e.g. human, yeast). For those few rows where the sample is derived from an organism *other* than the default (user-defined), the organism code must be specified. For a list of 2-letter organism codes, go here

RULE 1: Required column headers are: PLAT, PCOL, PROW, NAME, TYPE, FAIL, and CLONEID, ACC (if TYPE=CDNA). If any of these headers are either misspelled or absent OR if any data is null (except FAIL/ACC/CLONEID columns), you cannot proceed with plate/platesample submission. In addition, PLAT, PCOL, and PROW ordering must be correct and no wells may be skipped (with the exception of the last plate in the print run). Empty wells must be specified as such, except for the tail-end of the last plate (also see common errors). Optional columns: DESC, LUID, CLID, GENE_NAME, ORIGIN, SOURCE, IS_CONT, IS_VER, SAMPLE_DESC, CLONE_DESC, ORGANISM.
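The header and null-value parts of RULE 1 can be pre-checked before submission. The following is only a minimal sketch and not the database's own checking program: the function name `check_rule_1` is hypothetical, and it assumes one reading of the rule, namely that PLAT, PROW, PCOL, NAME, TYPE, and FAIL headers are always required, that only FAIL, ACC, and CLONEID may be empty, and that a TYPE=CDNA row needs at least one of CLONEID or ACC. It does not check well ordering or skipped wells.

```python
import csv
import io

# Assumption: these headers must always be present (one reading of RULE 1).
ALWAYS_REQUIRED = ("PLAT", "PROW", "PCOL", "NAME", "TYPE", "FAIL")
# Columns whose data may never be null (FAIL, ACC, CLONEID are exempt).
NON_NULL = ("PLAT", "PROW", "PCOL", "NAME", "TYPE")


def check_rule_1(text):
    """Return a list of problems found in a tab-delimited platesample list."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    headers = reader.fieldnames or []
    problems = [f"missing header: {h}" for h in ALWAYS_REQUIRED if h not in headers]
    if problems:
        # Without the required headers the rows cannot be interpreted.
        return problems
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in NON_NULL:
            if not (row.get(col) or "").strip():
                problems.append(f"line {line_no}: {col} is empty")
        if (row.get("TYPE") or "").strip() == "CDNA":
            has_clone = (row.get("CLONEID") or "").strip()
            has_acc = (row.get("ACC") or "").strip()
            if not (has_clone or has_acc):
                problems.append(f"line {line_no}: CDNA sample needs CLONEID or ACC")
    return problems
```

An empty list means the file passed these particular checks; it does not mean the submission program will accept it.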

Names

The database uses the combination of NAME, TYPE, and ORGANISM to uniquely identify a sample. Each unique combination is given a unique numeric identifier, called a SUID. SUIDs allow comparison of the same samples across different prints. Thus, it is extremely important to ensure that erroneous SUIDs are not created. Erroneous SUIDs are usually created by a bad NAME (either misspelled, non-standard, or non-systematic). Every new platesample must be verified and committed to the database (via SUID) before the reporter/molecule contained can be entered. Therefore, all rows in your file are checked to see if the combination of NAME, TYPE, and ORGANISM has been used previously. If not, these samples (rows) must have new SUIDs assigned to them if they are verified by the user to be legitimate, new samples.

RULE 2: If any samples within your platesample list are not currently in the database, you will be prompted to double-check your entries prior to passing the intermediate file off to the curators. Please be a conscientious user and verify that any new SUIDs you approve are valid. Erroneous SUIDs prevent comparisons between prints/experiments!
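The lookup behind RULE 2 can be pictured as a set-membership test on the (NAME, TYPE, ORGANISM) triple. The sketch below is illustrative only: `sample_key`, `find_new_samples`, and the `known_keys` set are hypothetical names, the organism codes used with it are placeholders, and the exact string comparison is an assumption about how the database matches names.

```python
def sample_key(row, default_organism):
    """Build the (NAME, TYPE, ORGANISM) triple that identifies a sample."""
    # A blank or absent ORGANISM falls back to the user-defined default.
    organism = (row.get("ORGANISM") or "").strip() or default_organism
    return (row["NAME"].strip(), row["TYPE"].strip(), organism)


def find_new_samples(rows, known_keys, default_organism):
    """Return distinct keys absent from known_keys, in first-seen order."""
    seen = set()
    new = []
    for row in rows:
        key = sample_key(row, default_organism)
        # Assumption: comparison is exact, so "3xssc" and "3X SSC" differ.
        if key not in known_keys and key not in seen:
            seen.add(key)
            new.append(key)
    return new
```

Every key this returns would need a new SUID, so each one is worth checking by eye for misspelled or non-systematic names before it is approved.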

Plate/Platesample list examples: Save these as text if you wish to import them into a spreadsheet for viewing

Making a print - the relationship between the platesamples and the final print

If your list passes the first two rules (above), you must eventually specify how the samples were printed in order to de-convolute the SPOTLIST. To do this, you must know how many sectors (corresponding to printer tips), columns, and rows your print output, and therefore your gridding methodology, corresponds to. You will have to provide these 3 values, and they must "equate" (relatively) to the number of rows in your platesample list (sans header).

RULE 3: This equation must be followed:
#samples = (#platesample list rows-1) <= #tips * #rows * #columns = #spots
Therefore, you are not permitted to have more platesamples than gridded spots (experiment loading would be disallowed). Conversely, if your #spots (gridded) is greater than #samples (rows in submitted file) realize that the data from the empty, "ghost" spots on the partial last row will be disposed of during experiment loading (allowed).
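As a worked illustration of RULE 3, the arithmetic can be written out as below. This is a sketch under stated assumptions: the function name `check_rule_3` and its parameter names are hypothetical, and `file_rows` is taken to count every line of the submitted file, header included.

```python
def check_rule_3(file_rows, tips, grid_rows, grid_columns):
    """Return the number of "ghost" spots, or raise if RULE 3 is violated.

    file_rows counts all lines of the platesample list, header included.
    """
    samples = file_rows - 1                   # subtract the header line
    spots = tips * grid_rows * grid_columns   # total gridded spots
    if samples > spots:
        raise ValueError(
            f"{samples} platesamples exceed {spots} gridded spots"
        )
    # Spots with no matching platesample; their data would be discarded.
    return spots - samples
```

With made-up numbers, a 28,705-line file (28,704 samples) printed with 32 tips in a 30 x 30 grid gives 28,800 spots, which leaves 96 ghost spots and satisfies the rule.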

Common Errors to Avoid

All empty wells must be designated NAME=>EMPTY and TYPE=>EMPTY. Do not use "blank" or "control" to describe empty wells.
Erroneous SUID assignments: Please verify that newly minted SUIDs are authentic. For example, do not create a new SUID "ACT1","ORF" or "Actin","ORF" for yeast as there already exists the legitimate, correct one ("YFL039C","ORF"). Similarly, check to see if there is a "3X SSC","CONTROL" SUID for your organism before you create "3xssc","CONTROL".
Plate names within the list: Data in the PLAT column must be an integer, and is essentially the order the plate was put in the printer. Plates do not get proper, systematic names until later in the print creation process. For example, just because you have added a plate of control samples to the end of your list, PLAT does not equal 'Control'.
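Two of these errors, mislabeled empty wells and non-integer PLAT values, are mechanical enough to screen for before submission. The following sketch uses a hypothetical function name, `check_common_errors`, and assumes that a NAME or TYPE of "blank" or "empty" in any letter case is an attempt to mark an empty well. It cannot detect erroneous SUIDs, which still need to be verified by a person.

```python
def check_common_errors(rows):
    """Return a list of problems for rows given as dicts keyed by column header."""
    problems = []
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        plat = (row.get("PLAT") or "").strip()
        if not (plat.isascii() and plat.isdigit()):
            problems.append(f"line {line_no}: PLAT must be an integer, got {plat!r}")
        name = (row.get("NAME") or "").strip()
        type_ = (row.get("TYPE") or "").strip()
        # Assumption: these spellings signal an intended empty well.
        looks_empty = (
            name.upper() in ("EMPTY", "BLANK") or type_.upper() in ("EMPTY", "BLANK")
        )
        if looks_empty and not (name == "EMPTY" and type_ == "EMPTY"):
            problems.append(
                f"line {line_no}: empty wells need NAME=EMPTY and TYPE=EMPTY"
            )
    return problems
```

The rows can come from `csv.DictReader` with `delimiter="\t"`, so the same parsed file can be reused for other checks.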

There is a program to assist you in platesample submission, which follows the rules stipulated above. In order to run it, you must place (SFTP) the file in the incoming directory of your loader account. This is so the program can read your original file, offer feedback, and write intermediate steps to the logs directory.