Large-Scale Screening of E. coli Promoters for Small Molecule Biosensor Development

Saanvi Dogra, Jason Gao, Dishti Wadhwani, Risha Guha, Nithika Vivek, Lauren Chen, Anwita Bandaru, Shawn Kim, David Lanster

December 18, 2025

https://doi.org/10.69831/4a63597f50

This preprint reports new research that has not been peer-reviewed and revised at the time of posting

Categories: Biology

Abstract

The field of synthetic biology makes significant contributions to healthcare, environmental engineering, and technology through the manipulation of cellular macromolecules and whole organisms. Often, these advancements depend on biosensors to report on an activity of interest within a cell or to detect extracellular cues and report on them in a measurable way. This project, undertaken as part of iGEM 2024 (the International Genetically Engineered Machine competition), centered on 10 small molecules related to environmental and human health, with the goal of developing transcriptional biosensors to report on their concentrations. Each molecule was screened against a library of over 2000 promoter-GFP constructs in search of promoters responsive to that molecule. Further, a deep learning model was used to predict active promoter-molecule pairs, and putative hits from the screen were analyzed in silico with molecular docking. While no robust biosensor hits were found for the molecules of interest, our work demonstrates a useful pipeline for further small molecule biosensor development.

Scientific Feedback

Anonymous

February 24, 2026

Comments to Authors: Congratulations on your excellent manuscript! Your efforts to identify and develop novel transcriptional biosensors for environmentally and medicinally relevant small molecules can certainly help shape future small molecule biosensor development. Please consider the suggested changes below to help improve the accessibility and clarity of your manuscript for a broad audience, as well as to better communicate the impact and outcomes of your science. I am not a total expert in machine learning or molecular modeling, so I would suggest reaching out for additional support on the computational end to perhaps develop a cleaner and more thorough explanation of your model and its performance quality. I also suggest working closely with your scientific mentor to further refine your article prior to possible submission to a journal. Overall, great work!

Summary of Manuscript: This manuscript describes the authors' efforts to develop transcriptional biosensors for ten small molecules that are implicated in a variety of health and environmental scenarios. The work involved both in vitro and computational screening strategies in an effort to develop and outline a pipeline for the identification of biosensor hits.

Recommended Scientific Changes:

Results:
- I am not entirely sure if this is possible, but in the future you could consider also modeling the small molecule with the DNA-TF pairs. This would give you more information about how the biosensor interacts in real life!
- Figure 2/Results: Why do you think the sfGFP signal decreased in some cases prior to increasing? Do you think there was any inhibition of signal? I would consider having more of a discussion of this result.
- Figure 5/Results: You could consider adding certain docking/modeling parameters to describe the strength / transience of binding between the DNA and TF. I am not an expert in this field, but things like bonding interactions, solvent-accessible surface area, and others can be helpful. It may not be necessary, though!

Recommended Presentation Changes:

Introduction:
- Line 43: Every time you refer to a species of bacteria, like E. coli, you should write the species name in italics.
- I really appreciated how you described the rationale of your study (particularly, lines 42-46). However, eiRxiv and journals like JEI accept "hypothesis-driven research". I suggest that you include a hypothesis in your introduction and in your Summary (such as how many transcriptional biosensors you will be able to develop, which molecules you will be able to find TF-promoters for, etc.). The eiRxiv submission guide and manuscript template include descriptions of how and where to incorporate a hypothesis. In this way, throughout your article you can show how you either "proved" or "disproved" your hypothesis!
- I think it would be helpful to add more of a discussion about WHY you want to use E. coli or TF-promoters in biosensors, and how that would work in the "real world".
- When you describe your selected molecules (Lines 60-83), I would suggest that you group them by impact rather than just a sentence per molecule listing their importance. I would suggest using sentences like the following:
  - "For example, Carbaryl (CAR, a man-made insecticide), 3-phenoxybenzoic acid (PBA, a degradation product of pyrethroid insecticides), phenylglyoxylic acid (PGA, a breakdown product of styrene, which is used to make plastics and rubber), [list the others with environmental impact here] are all considered environmental contaminants in high amounts, so development of biosensors for these molecules could help track pollution in the environment."
- It will also be helpful to give a quick summary of your conclusions in your introduction to give your audience a "teaser" for what you will share in your article. For example, this could include the success of your neural network or how many promoters responded to your molecules of interest.

Results:
- When you present a given result, it is often helpful to describe why that result is important / the context of that result! For example, in Lines 100-105: Why did you expect a dose-dependent response? What is the importance of the fold increase? You want to make it clear to your audience (other students, like you!) what your results mean scientifically beyond just the "numbers". I would do this with each result -- present the "numbers", and then briefly describe what that means scientifically/biologically/in the larger context of your problem/project.
- As someone with less experience with machine learning, it is difficult for me to interpret what the field-specific language (for example, "epochs", "validation data line", "low loss", "overfitting", etc.) means, even though I read papers about machine learning relatively frequently! I would suggest taking some time to explain what these things mean for your audience, who may not have much experience with machine learning.
- Every time you name a gene, you should write the gene name in italics. This is just typical scientific convention -- an easy fix!
- In the modeling section, I would reiterate which molecules are paired with which DNA-TF pair.
- In the modeling section, you mention (Lines 173-176) that the modeling helps your understanding of small molecule-protein interactions. Did you actually model the small molecules with the proteins? If not, I would clarify that here.
- In your last paragraph, you discuss the Nac transcription factor and PybcK. Why did you choose these two over any other promoters/transcription factors?

Discussion:
- Similar to the results section, I would suggest explaining complex jargon and providing a "bigger picture" outlook on your results. This will help the audience to understand the outcomes of your work!

Recommended Figure Changes:

- Figure 7: Do these data have multiple replicates? If so, I would perform statistics to help show that the DNA sequences improved metrics.
- Table 1: Using a heatmap (a range of colors to show "smallest numbers" to "highest numbers") may help better display which hits were best!
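
As a rough illustration of the statistics suggested for Figure 7, the sketch below compares two sets of replicate values with Welch's t-test, which does not assume equal variances. It is only a sketch under stated assumptions: the function name `welch_t`, the group names, and all numbers are hypothetical and are not taken from the manuscript, and it assumes each condition has at least two replicates.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two replicates")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical replicate values of one metric (placeholders, not real data).
with_dna = [0.82, 0.85, 0.80]
without_dna = [0.74, 0.77, 0.73]

t_stat, dof = welch_t(with_dna, without_dna)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

The t statistic and degrees of freedom can then be converted to a p-value with a t-distribution, for example via `scipy.stats.ttest_ind(with_dna, without_dna, equal_var=False)` if SciPy is available.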

A scientist with subject-specific expertise provided this feedback. Constructive feedback plays a key role in the scientific process because it allows researchers to learn from other scientists, be encouraged, and refine their ideas, research, and presentation.