About PerturbSeq.db
Perturbation experiments are a cornerstone of systems biology, allowing researchers to probe the intricate networks within cells. Technologies such as CRISPR-Cas9, CRISPRi, and CRISPRa have revolutionized how cells are manipulated and studied. CRISPR-Cas9 enables precise genome editing to introduce specific mutations or deletions, while CRISPRi and CRISPRa repress or activate gene expression, respectively, without altering the DNA sequence. Small-molecule chemicals, in contrast, typically act directly on proteins such as receptors and enzymes. Together, these tools provide a multi-tiered approach to modulating cellular functions and dissecting genetic and protein interactions. By combining genetic and chemical perturbations with readouts such as scRNA-seq or scATAC-seq, researchers can assess cellular responses at single-cell resolution. Single-cell analysis reveals the heterogeneity within cell populations, a facet often masked in bulk studies, so large-scale single-cell screens provide insights into complex cellular behaviors that traditional bulk measurements cannot capture. Each large perturbation screen is typically tailored to a specific biological system and a set of perturbations of interest.
PerturbSeq.db is a curated database that consolidates 189 single-cell perturbation datasets from 77 studies. These datasets comprise 165 scRNA-seq and 24 scATAC-seq molecular readouts. The collection covers approximately 52 different cell lines or tissues, such as embryonic stem cells, PBMCs, tumor-infiltrating immune cells, and brain organoids. Of these datasets, 147 involve genetic perturbations and the remaining 42 involve chemical perturbations. Most of the data come from Homo sapiens (156 datasets) and Mus musculus (29 datasets). All datasets were analyzed with a uniform processing pipeline, and an interactive user interface lets users browse, search, visualize, and download single-cell perturbation data conveniently.
- Data Browsing: Users can select datasets based on species, cell/tissue type, perturbation type, and associated publications. Each dataset includes quality control metrics, clustering results, target identification, functional assessments, and perturbation similarity analyses.
- Data Querying: Users can search for perturbations and their effects on a gene of interest. The database displays the results in both network and tabular formats, showing how the gene’s expression is influenced by various perturbations. It also identifies the target genes affected when the gene in question is itself perturbed. Users can also explore potential mediators by entering a pair of genes.
- Perturbation: Users can compare the results of a perturbation of interest across multiple datasets within a given species. The database shows quality control metrics, DEG counts and their intersections, and detailed visualizations of both the magnitude and the direction of expression changes for DEGs shared across datasets.
- Data Downloading: Users can access and download processed single-cell perturbation data by specifying the species, cell/tissue type, and perturbation type, with a direct link provided for data retrieval.
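The querying and cross-dataset comparison features listed above can be illustrated over a simple table of differentially expressed genes (DEGs). This is a minimal sketch, not PerturbSeq.db's implementation: the `DEG` record layout, the function names, and the `GENE_*` toy data are all hypothetical, and treating a mediator as a gene M for which perturbing A changes M and perturbing M changes B is our assumption about what a gene-pair mediator query means.

```python
from collections import defaultdict
from typing import NamedTuple


class DEG(NamedTuple):
    """Hypothetical record: applying `perturbation` in `dataset` changes `gene`."""
    dataset: str
    perturbation: str
    gene: str
    log2fc: float  # sign encodes direction; DEGs are assumed to have non-zero fold change


def perturbations_affecting(degs, gene):
    """Perturbations whose DEG lists contain `gene` (incoming edges)."""
    return sorted({d.perturbation for d in degs if d.gene == gene})


def targets_of(degs, perturbation):
    """Genes differentially expressed when `perturbation` is applied (outgoing edges)."""
    return sorted({d.gene for d in degs if d.perturbation == perturbation})


def mediators(degs, source, target):
    """Assumed definition: genes M with source -> M and M -> target."""
    downstream = set(targets_of(degs, source))
    upstream = set(perturbations_affecting(degs, target))
    return sorted((downstream & upstream) - {source, target})


def common_degs(degs, perturbation):
    """DEGs shared by every dataset profiling `perturbation`, kept only if the sign agrees."""
    by_dataset = defaultdict(dict)
    for d in degs:
        if d.perturbation == perturbation:
            by_dataset[d.dataset][d.gene] = d.log2fc
    if not by_dataset:
        return {}
    shared = set.intersection(*(set(genes) for genes in by_dataset.values()))
    result = {}
    for gene in sorted(shared):
        signs = {genes[gene] > 0 for genes in by_dataset.values()}
        if len(signs) == 1:  # same direction in all datasets
            result[gene] = "up" if signs.pop() else "down"
    return result


# Toy, made-up values for illustration only.
toy = [
    DEG("ds1", "GENE_A", "GENE_M", -2.1),
    DEG("ds1", "GENE_A", "GENE_X", -1.4),
    DEG("ds2", "GENE_A", "GENE_M", -1.8),
    DEG("ds2", "GENE_A", "GENE_X", 0.9),
    DEG("ds1", "GENE_M", "GENE_B", 1.2),
]
print(perturbations_affecting(toy, "GENE_M"))  # ['GENE_A']
print(mediators(toy, "GENE_A", "GENE_B"))      # ['GENE_M']
print(common_degs(toy, "GENE_A"))              # {'GENE_M': 'down'}
```

In the toy data, GENE_X is a DEG of GENE_A in both datasets but changes in opposite directions, so the sign check drops it from the common set.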
Perturbation Description
PerturbSeq.db includes a comprehensive collection of perturbations, comprising 19,646 genetic and 775 chemical perturbations. These perturbations are categorized based on their type and mode of action.
Genetic perturbations in PerturbSeq.db are classified into two main categories: those targeting specific genes and those targeting chromosomal loci. Perturbations targeting specific genes are identified by gene symbol (e.g., TP53), whereas those targeting chromosomal loci are annotated with chromosome coordinates in the format chromosome ID:Start-End (e.g., chr8:99763530-99763606). sgRNAs targeting different genes or chromosomal loci are treated as distinct perturbations because their biological effects differ. In contrast, multiple sgRNAs are often designed against the same gene or locus to ensure effective knockdown, knockout, or overexpression, so they are grouped into a single perturbation. The database also includes cases in which cells receive multiple perturbations simultaneously. These combinations are denoted with a comma (e.g., “41BB, TGFBR2” indicates perturbation of both 41BB and TGFBR2). To keep representations consistent across datasets, the genes in a combined perturbation are listed in a standardized, sorted order: both “41BB, TGFBR2” and “TGFBR2, 41BB” are recorded as “41BB, TGFBR2”.
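The labeling conventions for genetic perturbations can be sketched as a few small helpers. This is an illustration under stated assumptions, not the database's actual code: the regular expression for locus labels, plain code-point string sorting as the "standardized order", and the sgRNA names in the example are all hypothetical.

```python
import re

# Assumed locus format from the convention above: chrID:Start-End, e.g. chr8:99763530-99763606
LOCUS_RE = re.compile(r"^chr[0-9A-Za-z]+:\d+-\d+$")


def classify_target(label):
    """Return 'locus' for chromosome-coordinate labels, otherwise 'gene'."""
    return "locus" if LOCUS_RE.match(label) else "gene"


def normalize_perturbation(label):
    """Split a combined label on commas, trim whitespace, and sort the members.

    Plain string sorting is an assumption about how the standardized order is defined.
    """
    members = [m.strip() for m in label.split(",") if m.strip()]
    return ", ".join(sorted(members))


def group_guides(guide_to_target):
    """Collapse sgRNAs that hit the same gene or locus into one perturbation."""
    groups = {}
    for guide, target in guide_to_target.items():
        groups.setdefault(target, []).append(guide)
    return {target: sorted(guides) for target, guides in groups.items()}


print(normalize_perturbation("TGFBR2, 41BB"))      # 41BB, TGFBR2
print(classify_target("chr8:99763530-99763606"))   # locus
print(classify_target("TP53"))                     # gene
# Hypothetical guide names, for illustration only.
print(group_guides({"sgTP53_1": "TP53", "sgTP53_2": "TP53", "sgMYC_1": "MYC"}))
```

Sorting makes the label independent of the order in which a study listed the genes, so the same combination from two datasets maps to one key.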
For chemical compound perturbations, different treatment conditions, such as varying doses or durations, can significantly change the cellular effects. Each chemical compound perturbation is therefore annotated with the detailed treatment information reported in the original publication (e.g., Erlotinib_Day1, Dacinostat_0.01μM).
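Because treatment details follow the compound name after an underscore in the examples above, a label can be split back into its parts. This is a hedged sketch that assumes the last underscore separates compound from treatment; the dose and day patterns below cover only the two example forms shown, and since annotations come from the literature, other treatment strings are kept verbatim without further interpretation.

```python
import re

# Assumed treatment patterns, based only on the two examples in the text.
DOSE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*([nuμµm]?M)$")  # e.g. 0.01μM, 10nM
DAY_RE = re.compile(r"^Day(\d+)$", re.IGNORECASE)          # e.g. Day1


def parse_chemical(label):
    """Split 'Compound_Treatment' and interpret dose or duration when recognizable."""
    compound, sep, treatment = label.rpartition("_")
    if not sep:  # no underscore: no treatment suffix
        return {"compound": label, "treatment": None}
    info = {"compound": compound, "treatment": treatment}
    if m := DOSE_RE.match(treatment):
        info["dose"] = float(m.group(1))
        info["unit"] = m.group(2)
    elif m := DAY_RE.match(treatment):
        info["days"] = int(m.group(1))
    return info


print(parse_chemical("Erlotinib_Day1"))
print(parse_chemical("Dacinostat_0.01μM"))
```

Keeping the raw `treatment` string alongside any parsed fields means unrecognized annotations are never lost.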
Tutorials of PerturbSeq.db