Status:
Platform:
Implement Technique:
Species:
Analysis of intergenic sequences, for purposes such as investigating transcriptional signals or identifying small RNA genes, is frequently complicated by traditional biological database structures. Genome data are commonly stored as chromosome-length sequence records annotated with gene calls that demarcate subsequences of each chromosome. Under this model, determining the non-called subsequences between a gene and its nearest neighbors requires an exhaustive search of all gene calls associated with the chromosome. Compounding the problem, the intergenic regions of many called genes cannot be located unambiguously because of uncertainties in gene boundaries and the presence of conflicting gene calls. To address these difficulties, we have constructed the PACRAT database system (http://www.biosci.ohio-state.edu/~pacrat/). PACRAT preprocesses GenBank genome submissions, evaluates for every gene the character of its relationship to the genes nearest to it, and produces a relationally linked model of the gene ordering for the genome. Using this information, the interface allows the researcher to query gene data as well as intergenic sequence data according to a number of criteria. Searches can be filtered by the status of start and stop positions or by whether upstream/downstream sequences conflict with called genes, and upstream or downstream searches can be extended automatically to find probable operon promoters or terminators. The database is also indexed by KEGG classification, so that, for example, functionally related sets of high-quality promoter-containing regions can be retrieved together.[1]
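The preprocessing idea described above (ordering gene calls once and recording each gene's relationship to its neighbor) can be illustrated with a minimal Python sketch. This is not PACRAT's actual code or schema: the names `GeneCall`, `Gap`, and `link_neighbors`, the 1-based inclusive coordinates, and the rule that any overlap counts as a conflict are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneCall:
    """Hypothetical record for one gene call (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: str


@dataclass
class Gap:
    """Hypothetical record linking a gene to the next gene along the chromosome."""
    left: str
    right: str
    start: Optional[int]  # None when there is no intergenic sequence
    end: Optional[int]
    conflict: bool        # True when the two gene calls overlap


def link_neighbors(genes: List[GeneCall]) -> List[Gap]:
    """Sort gene calls once, then record the region between each gene and
    the call that reaches furthest to the right among those before it."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    gaps: List[Gap] = []
    if not ordered:
        return gaps
    furthest = ordered[0]  # earlier gene call with the largest end so far
    for gene in ordered[1:]:
        if gene.start <= furthest.end:
            # Overlapping calls: the intergenic region is ambiguous.
            gaps.append(Gap(furthest.name, gene.name, None, None, True))
        elif gene.start == furthest.end + 1:
            # Abutting calls: no conflict, but no intergenic sequence either.
            gaps.append(Gap(furthest.name, gene.name, None, None, False))
        else:
            gaps.append(Gap(furthest.name, gene.name,
                            furthest.end + 1, gene.start - 1, False))
        if gene.end > furthest.end:
            furthest = gene
    return gaps


# Made-up coordinates, for illustration only.
calls = [
    GeneCall("geneA", 100, 200, "+"),
    GeneCall("geneC", 380, 500, "-"),
    GeneCall("geneB", 250, 400, "+"),
    GeneCall("geneD", 600, 700, "+"),
]
gaps = link_neighbors(calls)
clean = [g for g in gaps if not g.conflict and g.start is not None]
```

Once such links are stored, a query for unambiguous intergenic regions becomes a simple filter on the precomputed records (as in `clean` above) rather than a fresh search of every gene call on the chromosome.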