Data Sheet 1: Definition of the Minimal Contents for the Molecular Simulation of the Yeast Cytoplasm (PDF)
The cytoplasm is a densely packed environment filled with macromolecules, in which diffusion is hindered. Molecular simulation of the diffusion of biomolecules under such macromolecular crowding conditions requires the definition of a simulation cell with a cytoplasm-like composition. This has previously been done for prokaryotic cells (E. coli) but not for eukaryotic cells such as the model organism yeast. Yeast proteomics datasets vary widely in cell growth conditions, in the technique used to determine protein composition, in the reported relative abundances of proteins, and in the units in which those abundances are reported. We found that the gene ontology profiles of the most abundant proteins are similar across these datasets, but their abundances vary greatly. To overcome this problem, we chose five mass spectrometry proteomics datasets that met the following criteria: high internal consistency, consistency with published experimental data, and freedom from GFP-tagging artifacts. Using these datasets, we defined the contents of a simulation cell containing a single 80S ribosome such that the macromolecular density and the mass ratio of ribosomal to cytoplasmic proteins were consistent with experimental values and with the chosen datasets. Finally, we added multiple tRNAs, consistent with their experimentally determined number in the yeast cell. The resulting composition can readily be used in molecular simulations representative of yeast cytoplasmic macromolecular crowding conditions to characterize a variety of phenomena, such as protein diffusion, protein-protein interactions, and biological processes such as protein translation.
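The composition step summarized above is, at its core, a scaling calculation. The following is a minimal sketch only, not the authors' procedure. It assumes that a relative abundance and a molecular mass are known for each non-ribosomal protein, that the ribosomal-to-cytoplasmic mass ratio is defined as ribosome mass divided by total non-ribosomal protein mass, and that the simulation box is cubic. The hypothetical helper `simulation_cell` scales abundances to integer copy numbers around one ribosome and then sizes the box from a target macromolecular density. All names and input values are illustrative and are not taken from the data sheet.

```python
import math
from dataclasses import dataclass

AVOGADRO = 6.02214076e23  # mol^-1


@dataclass
class Protein:
    name: str
    abundance: float  # relative abundance in any consistent unit (e.g. ppm)
    mass_kda: float   # molecular mass in kDa


def simulation_cell(proteins, ribosome_mass_kda, ribo_to_cyto_ratio, density_g_per_l):
    """Hypothetical helper: copy numbers and cubic box edge (nm) for one ribosome.

    Assumes ribo_to_cyto_ratio = ribosome mass / total non-ribosomal protein mass.
    """
    # Non-ribosomal protein mass needed to hit the assumed mass ratio.
    target_cyto_kda = ribosome_mass_kda / ribo_to_cyto_ratio
    # Mass carried by one "unit" of relative abundance across all proteins.
    mass_per_unit = sum(p.abundance * p.mass_kda for p in proteins)
    scale = target_cyto_kda / mass_per_unit
    # Integer copy numbers; rounding makes the ratio only approximate.
    copies = {p.name: round(p.abundance * scale) for p in proteins}

    total_kda = ribosome_mass_kda + sum(copies[p.name] * p.mass_kda for p in proteins)
    # 1 kDa = 1000 g/mol, so mass in grams = kDa * 1000 / N_A.
    mass_g = total_kda * 1000 / AVOGADRO
    # Volume that gives the target density; 1 L = 1e24 nm^3.
    volume_nm3 = mass_g / density_g_per_l * 1e24
    edge_nm = volume_nm3 ** (1 / 3)
    return copies, edge_nm


# Illustrative inputs only; not values from the data sheet.
toy = [Protein("A", 2.0, 50.0), Protein("B", 1.0, 100.0)]
copies, edge = simulation_cell(toy, ribosome_mass_kda=3000.0,
                               ribo_to_cyto_ratio=1.0, density_g_per_l=300.0)
print(copies)  # {'A': 30, 'B': 15}
print(f"{edge:.1f} nm")
```

Because the box volume is computed from the rounded copy numbers, the target density is met exactly while the mass ratio is met only approximately. Other species, such as tRNAs, could be handled the same way by adding them to the input list with their own abundances and masses.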