Time-resolved x-ray crystallography capture of a slow reaction tetrahydrofolate intermediate

Time-resolved crystallography is a powerful technique for elucidating molecular mechanisms at both spatial (angstroms) and temporal (picoseconds to seconds) resolutions. We recently discovered an unusually slow reaction at room temperature that occurs on the order of days: the reverse oxidative decay, in the crystalline state, of the chemically labile (6S)-5,6,7,8-tetrahydrofolate in complex with its producing enzyme Escherichia coli dihydrofolate reductase. Here, we report the critical analysis of a representative dataset at an intermediate reaction time point. A quinonoid-like intermediate state lying between tetrahydrofolate and dihydrofolate features a near-coplanar geometry of the bicyclic pterin moiety, and a tetrahedral sp3 C6 geometry is proposed based on the apparent mFo-DFc omit electron densities of the ligand. The presence of this intermediate is strongly supported by Bayesian difference refinement. Isomorphous Fo-Fo difference map and multi-state refinement analyses suggest the presence of end-state ligand populations as well, although the putative intermediate state is likely the most populated. A similar quinonoid intermediate, previously proposed to exist transiently during the oxidation of tetrahydrofolate, was confirmed by polarography and UV-vis spectroscopy to be relatively stable in the oxidation of its close analog tetrahydropterin. We postulate that the constraints imposed on the ligand by its interactions with the protein environment might be the origin of the slow reaction observed by time-resolved crystallography.


I. INTRODUCTION
Time-resolved crystallography is an experimental technique that can detect molecular changes at atomic and temporal resolutions. 1 Due to the feasibility of rapid reaction initiation by a laser pulse, this technique has been widely used to study light-active protein systems including myoglobin, 2 hemoglobin, 3 photoactive yellow protein, 4-6 photosystem II, 7-9 and rhodopsin. 10 Recent advances in sample delivery systems and femtosecond X-ray free electron lasers (XFELs) have allowed time-resolved serial femtosecond crystallography (TR-SFX) to be extended to other systems as well. 1,11-13 For example, the mix-and-inject method can rapidly and uniformly initiate an enzymatic reaction 11,12 or an RNA-ligand interaction 13 in micro/nanocrystals before diffraction, at a rate limited by diffusion. To the best of our knowledge, time-resolved crystallography has almost exclusively been applied on time scales of picoseconds to seconds. Using traditional cryocrystallography at a synchrotron source, without rapid mixing or a laser pulse, we recently discovered the slow oxidative decay of tetrahydrofolate to dihydrofolate in the enzyme-bound crystalline form at room temperature, with the critical transition occurring 2-3 days after the crystallization setup. 14 Here, we present an analysis of the third day's crystal structure to estimate the putative intermediate's geometry and population relative to dihydrofolate and tetrahydrofolate. The implications of the current observation for the molecular mechanism of tetrahydrofolate to dihydrofolate conversion and its possible generalization to other systems are discussed.

II. MATERIALS AND METHODS
A. Protein expression and purification
C-terminal 6xHis-tagged Escherichia coli DHFR (dihydrofolate reductase, with 100% sequence identity to UNIPROT sequence sp|P0ABQ4| or sp|P0ABQ5|) was generously provided by Drs. Eugene Shakhnovich and João Rodrigues from Harvard University. DHFR was overexpressed in E. coli BL21 and then purified by Ni-NTA and size exclusion chromatography as previously described. 15 The initial protein stock was stored at −80 °C at a concentration of 30 mg ml−1 in 20 mM Tris at pH 8 and 1 mM dithiothreitol (DTT).
Polyethylene glycol 3350 was purchased from Hampton Research. All other chemicals and reagents were obtained at the highest quality available from Sigma-Aldrich or ThermoFisher and used without further purification.

B. Protein crystallization
The current intermediate time point complex was obtained using the same crystal growth condition as in our previously reported binary tetrahydrofolate complex (PDB: 6CW7). 14 Briefly, as-purified eDHFR was crystallized by sitting drop vapor diffusion using a 1:1 v/v mixing of 20 mg ml−1 DHFR solution in 13.3 mM Tris at pH 8, 16.7 mM HEPES at pH 7.3, 33.3 mM NaCl, and 0.67 mM DTT with the reservoir solution containing 0.1 M MES at pH 6.5, 30% w/v PEG 3350, and 0.4 M MgCl2. Mixed drops of 0.8 μl were equilibrated over a reservoir solution of 50 μl on an MRC 2-well plate (Hampton Research) and incubated at 20 °C in the dark. A single crystal per dataset was harvested at 3 days after setting up the crystallization drops (as opposed to 2 days for the tetrahydrofolate complex or 2 weeks for the dihydrofolate complex in our earlier report) 14 and then cryoprotected with LV CryoOil and flash-frozen in liquid N2.
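As a worked illustration of the 1:1 v/v mixing described above, the following minimal sketch computes the nominal composition of a freshly mixed drop, before any vapor-diffusion equilibration. The helper `mix` and the dictionary names are hypothetical. The consistency check on the protein solution assumes that the 20 mg ml−1 solution was prepared by diluting the 30 mg ml−1 stock, which the text does not state explicitly.

```python
def mix(solution_a, solution_b, fraction_a=0.5):
    """Return component concentrations after mixing two solutions by volume.

    fraction_a is the volume fraction contributed by solution_a.
    """
    components = sorted(set(solution_a) | set(solution_b))
    return {
        name: solution_a.get(name, 0.0) * fraction_a
        + solution_b.get(name, 0.0) * (1.0 - fraction_a)
        for name in components
    }


# Values quoted in the text (mM unless noted otherwise).
protein_solution = {
    "DHFR (mg/ml)": 20.0,
    "Tris": 13.3,
    "HEPES": 16.7,
    "NaCl": 33.3,
    "DTT": 0.67,
}
reservoir = {"MES": 100.0, "PEG 3350 (% w/v)": 30.0, "MgCl2": 400.0}

# 1:1 v/v mixing: each 0.8 ul drop is 0.4 ul protein + 0.4 ul reservoir.
initial_drop = mix(protein_solution, reservoir, fraction_a=0.5)

# Assumption: the 20 mg/ml solution was made by diluting the 30 mg/ml stock
# (20 mM Tris, 1 mM DTT), i.e., a stock volume fraction of 2/3.
stock_fraction = 20.0 / 30.0
tris_from_stock = 20.0 * stock_fraction  # close to the quoted 13.3 mM
dtt_from_stock = 1.0 * stock_fraction    # close to the quoted 0.67 mM

for name, value in initial_drop.items():
    print(f"{name}: {value:.3g}")
```

Under this assumption, the quoted Tris and DTT concentrations follow directly from the 2/3 dilution, and every component is halved again on mixing with the reservoir.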

C. Data collection and refinement
Diffraction data were collected at 100 K at the Advanced Photon Source at Argonne National Laboratory on the LRL-CAT (31-ID-D) beamline. The detector was a Rayonix 225 HE CCD (Rayonix), and a single wavelength of 0.97931 Å was used. The intermediate time point (3 days of crystal growth) dataset was collected and processed to a resolution of 1.35 Å. The datasets were indexed, integrated, and scaled using XDS. 16 The structure was determined by molecular replacement with Phaser_MR 17 (using the protein coordinates of the tetrahydrofolate complex as the search model, PDB ID: 6CW7) and completed by alternating rounds of manual model building with COOT 18 and both reciprocal and real space refinements using phenix.refine of the PHENIX suite. 19 The ligand was originally built into the model as tetrahydrofolate (PDB Ligand ID: THG) based on the apparent tetrahedral geometry suggested by the clear omit electron density. Met16 is very close in geometry to that in the tetrahydrofolate complex. We conjecture that it acts as a clamp that is partly responsible for the slowdown of tetrahydrofolate to dihydrofolate oxidation. The Met20 loop appears to be partially disordered for residues Glu17-Asn18-Ala19, in contrast to the clearly traceable conformations of the end-state complexes with tetrahydrofolate (PDB: 6CW7) or dihydrofolate (PDB: 6CXK) in our earlier report. 14 On the other hand, Met20 and Pro21 are clearly visible and are quite close to their conformation in the dihydrofolate complex. Thus, in the intermediate complex with DHFR, the residues of the Met20 loop adopt a hybrid of the two endpoint conformations, with disorder in the middle residues of the loop. Mg2+ ions, Cl− ions, and water molecules were added to the model after the ligand was built in. The fully refined model contains a ligand with coplanar pterin rings and a tetrahedral C6 geometry, suggesting a possible intermediate state.
This contrasts with the puckered pterin previously observed for the tetrahydrofolate complex despite the same ligand restraints being used. 14 Due to the uncertainty of the exact chemical structure and fractional population of the ligand, we further performed refinement analysis by comparing one-state (a single intermediate), two-state (a linear combination of two end states, dihydrofolate and tetrahydrofolate), and three-state (a linear combination of both end states and the proposed intermediate) models. Briefly, the previously reported end-state models (PDB IDs: 6CW7 and 6CXK) were each aligned to the current protein model, and their ligand coordinates were extracted and merged into the current model with COOT. 18 To achieve the linear combination of ligand states, only the occupancy and individual B-factors of the structures were refined in reciprocal space, with their coordinates kept intact. The initial ligand occupancy was assigned as 0.5 to each ligand for the two-state model. In the three-state model, the initial ligand occupancy was intentionally assigned as 0.1 to the proposed intermediate state and 0.45 to each of the two end-state ligands to test how these three states compete with each other in terms of percentage population in the current refinement scheme. The final refined occupancy values of the three ligands appear to be insensitive to their initial occupancy assignments (e.g., equal initial occupancy of 0.33 for each ligand state) and always converged to the same results, i.e., the intermediate state is the most populated at 0.45, which approximately equals the combined occupancy of the two end states. The structures determined in this study display Ramachandran statistics free of outliers, with 98.8% of the residues in the most favored regions and 1.2% of residues in additionally allowed regions of the Ramachandran diagram defined by MolProbity 20 (Table I). All structures are displayed using PyMOL 21 unless otherwise stated.
The coordinates and reflection files of the structures are deposited in the Protein Data Bank (www.rcsb.org) under PDB IDs: 6MR9 (one-state), 6MT8 (two-state), and 6MTH (three-state).

D. Fo-Fo difference map
Fo-Fo difference electron density maps were calculated using the "Isomorphous Difference Map" utility of PHENIX. 19 The input files are the coordinate (.pdb) files and structure factor (.mtz) files of the corresponding states being compared.
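The idea behind an isomorphous difference map can be sketched as follows: the two sets of observed amplitudes are placed on a common scale, subtracted reflection by reflection, and combined with phases calculated from a reference model. This is only a conceptual sketch and not the PHENIX implementation, whose scaling, weighting, and outlier handling are not reproduced here. The function names and the three toy reflections are hypothetical.

```python
import math


def least_squares_scale(f_reference, f_other):
    """Scale k that minimizes sum((f_reference - k * f_other) ** 2)."""
    numerator = sum(r * o for r, o in zip(f_reference, f_other))
    denominator = sum(o * o for o in f_other)
    return numerator / denominator


def difference_map_coefficients(f_obs_late, f_obs_early, phases_deg):
    """Complex coefficients (k*|Fo_late| - |Fo_early|) * exp(i*phase)."""
    k = least_squares_scale(f_obs_early, f_obs_late)
    coefficients = []
    for f_late, f_early, phase in zip(f_obs_late, f_obs_early, phases_deg):
        delta = k * f_late - f_early
        angle = math.radians(phase)
        coefficients.append(
            complex(delta * math.cos(angle), delta * math.sin(angle))
        )
    return coefficients


# Hypothetical amplitudes and reference-model phases for three reflections.
f_early = [100.0, 50.0, 80.0]
f_late = [104.0, 46.0, 80.0]
phases = [0.0, 90.0, 180.0]
coeffs = difference_map_coefficients(f_late, f_early, phases)
```

A Fourier synthesis of such coefficients gives positive density where scattering matter has appeared in the later dataset and negative density where it has been lost. Two datasets that differ only by an overall scale give coefficients of zero.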

E. Bayesian difference refinement
Bayesian difference refinement was performed using the default protocol developed by Terwilliger and Berendzen. 22 This is a sensitive method to detect small but finite structural changes between two very similar protein structural models obtained from two very similar experimental X-ray diffraction datasets. It was reported to perform better than, or at least as well as, individual refinements in estimating the finite atomic shifts (RMS of shifts), depending on the correlation coefficient of model errors between test datasets. 22 The more similar the two datasets (and hence the models) are, the more useful the Bayesian difference refinement method is. 22 Briefly, a pseudo variant dataset with amplitudes and weights was generated by running FDIFF scripts in the SOLVE 23 program based on Fo (the observed amplitude of the structure factor) and Fc (the amplitude calculated from the model) from both native and variant datasets. In our case, the native dataset was the tetrahydrofolate complex (PDB: 6CW7), 14 and the variant dataset was the intermediate time point dataset (3 days of crystal growth).

Footnotes to Table I:
a $R_{\mathrm{sym}} = \sum_{hkl}\sum_i |I_i(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_i I_i(hkl)$, where $I_i(hkl)$ is the intensity of an individual measurement of the symmetry related reflection, and $\langle I(hkl)\rangle$ is the mean intensity of the symmetry related reflections.
b I/σ is defined as the ratio of the averaged value of the intensity to its standard deviation. For Bayesian difference refinement, because the pseudo dataset contains only amplitude rather than intensity information, an overall I/σ is estimated by running the SFCHECK 34 program of the CCP4 suite. 35 100% of the reflections in the pseudo dataset show F/σ(F) > 2 using amplitude units.
c CC1/2 = percentage of correlation between intensities from random half-datasets. CC1/2 above 0.1 is considered significant. 36 It is a robust, statistically informative quantity useful for defining the high-resolution cutoff of diffraction data to improve the model quality. 36
d $R = \sum_{hkl} \big| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \big| / \sum_{hkl} |F_{\mathrm{obs}}|$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure-factor amplitudes for the reflections being refined against. R Bayes,work and R work were calculated the same way except that the corresponding structure factors being used are F diff of the pseudo dataset and F obs of the experimental dataset, respectively.
e R free was calculated as R work using randomly selected small fractions (typically <10%) of the unique reflections that were omitted from the structure refinement. R Bayes,free was calculated the same way as R free except that F diff of an omit pseudo set was used.
f Mean coordinate error was calculated based on maximum likelihood.
g Ramachandran statistics indicate the percentage of residues in the most favored, additionally allowed, and outlier regions of the Ramachandran diagram as defined by MolProbity. 19
h $\mathrm{RSCC} = \sum (\rho_{\mathrm{obs}} - \langle\rho_{\mathrm{obs}}\rangle)(\rho_{\mathrm{calc}} - \langle\rho_{\mathrm{calc}}\rangle) / \big[\sum (\rho_{\mathrm{obs}} - \langle\rho_{\mathrm{obs}}\rangle)^2 \sum (\rho_{\mathrm{calc}} - \langle\rho_{\mathrm{calc}}\rangle)^2\big]^{1/2}$, where $\rho_{\mathrm{obs}}$ and $\rho_{\mathrm{calc}}$ are the observed and calculated electron densities, and $\langle\rho_{\mathrm{obs}}\rangle$ and $\langle\rho_{\mathrm{calc}}\rangle$ are the mean values of $\rho_{\mathrm{obs}}$ and $\rho_{\mathrm{calc}}$, respectively. RSCC is an abbreviation for the real-space correlation coefficient.
i $\mathrm{RSR} = \sum |\rho_{\mathrm{obs}} - \rho_{\mathrm{calc}}| / \sum |\rho_{\mathrm{obs}} + \rho_{\mathrm{calc}}|$, where $\rho_{\mathrm{obs}}$ and $\rho_{\mathrm{calc}}$ are the observed and calculated electron densities, respectively. RSR is an abbreviation for the real-space R factor.
j Bayesian difference refinement uses a pseudo dataset as described in the methods. Parameters such as Rsym are unavailable.
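The agreement statistics described in the footnotes above can be illustrated with a minimal sketch. The formulas used are the standard textbook definitions of the crystallographic R factor, the real-space correlation coefficient, and the real-space R factor, and are assumed here to match those used for Table I. The function names and example numbers are hypothetical and do not come from the deposited data.

```python
import math


def r_factor(f_obs, f_calc):
    """Crystallographic R: sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def rscc(rho_obs, rho_calc):
    """Real-space correlation coefficient (Pearson correlation of densities)."""
    mean_obs = sum(rho_obs) / len(rho_obs)
    mean_calc = sum(rho_calc) / len(rho_calc)
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(rho_obs, rho_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in rho_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in rho_calc)
    return cov / math.sqrt(var_obs * var_calc)


def rsr(rho_obs, rho_calc):
    """Real-space R factor: sum(|obs - calc|) / sum(|obs + calc|)."""
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    return numerator / sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))


# Toy amplitudes and density samples, for illustration only.
example_r = r_factor([10.0, 20.0], [9.0, 22.0])
example_rscc = rscc([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
example_rsr = rsr([2.0, 2.0], [1.0, 1.0])
```

The same `r_factor` function would apply to R work, R free, and their Bayesian counterparts; only the set of reflections and the choice of F obs or F diff as the "observed" amplitude change.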
Structural Dynamics ARTICLE scitation.org/journal/sdy
The value of $F_{\mathrm{diff}}$, the factor $\beta$ (essentially the correlation coefficient between Fo,native − Fc,native and Fo,variant − Fc,variant), and the weighting factor $\sigma_{F_{\mathrm{diff}}}^2$ (the pseudo experimental errors of the pseudo dataset) are given by the expressions of Terwilliger and Berendzen. 22 In these expressions, $E^2$ represents the total correlated model error vs. the data between the native and variant datasets. $A_{\mathrm{native}}^2$ and $A_{\mathrm{variant}}^2$ represent uncorrelated model errors of the native and variant datasets, respectively. $\sigma_{\mathrm{native}}^2$ and $\sigma_{\mathrm{variant}}^2$ represent the experimental measurement errors of the native and variant datasets, respectively. Both the $\beta$ and $\sigma_{F_{\mathrm{diff}}}^2$ terms contain weighted information derived from errors in both models and from the experimental measurement of the two original datasets. The pseudo dataset containing $F_{\mathrm{diff}}$ (amplitudes) and $\sigma_{F_{\mathrm{diff}}}^2$ (weights) can then be used as input for the structure factor data in phenix.refine following regular refinement procedures.
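To make the roles of the error terms concrete, the sketch below works through one reflection under a simple linear-Gaussian error model. The assumption is that the native residual Fo,native − Fc,native is the sum of a correlated model error (variance E²), an uncorrelated model error (variance A_native²), and a measurement error (variance σ_native²), all independent and zero-mean. The expressions that follow from this assumption are illustrative only; they are not claimed to be the exact formulas of Ref. 22 or of the SOLVE FDIFF scripts. All function names and numbers are hypothetical.

```python
def beta(e2, a_native2, sigma_native2):
    """Fraction of the native residual attributed to correlated model error."""
    return e2 / (e2 + a_native2 + sigma_native2)


def f_diff(fo_variant, fo_native, fc_native, b):
    """Variant amplitude corrected by the estimated correlated model error."""
    return fo_variant - b * (fo_native - fc_native)


def sigma_fdiff2(e2, a_native2, sigma_native2, a_variant2, sigma_variant2):
    """Variance of the corrected amplitude under the assumed error model."""
    b = beta(e2, a_native2, sigma_native2)
    # The correlated error is only partly removed; e2 * (1 - b) remains.
    return sigma_variant2 + a_variant2 + e2 * (1.0 - b)


# Toy numbers for a single reflection (arbitrary amplitude units).
b = beta(e2=4.0, a_native2=1.0, sigma_native2=1.0)
pseudo_amplitude = f_diff(fo_variant=100.0, fo_native=105.0, fc_native=102.0, b=b)
pseudo_variance = sigma_fdiff2(4.0, 1.0, 1.0, a_variant2=1.0, sigma_variant2=1.0)
```

In this picture, a well-determined native structure with highly correlated model errors gives a β close to 1, so most of the native residual is subtracted from the variant amplitude and the pseudo dataset carries a smaller effective error than the variant data alone.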

III. RESULTS AND DISCUSSION
A. Structural refinement using an individual dataset
We first performed a regular refinement on the intermediate time point dataset (3 days of crystal growth). The details of the structural determination using molecular replacement and subsequent refinement are described in Sec. II. Both the 2mFo-DFc and mFo-DFc omit electron density maps indicate that the bound ligand retains an sp3 C6 with a tetrahedral geometry (Fig. 1). When the structure is refined with a single ligand [using PDB Ligand THG, (6S)-5,6,7,8-tetrahydrofolate], the pterin rings of the ligand appear to be near coplanar as compared to the puckered rings in the tetrahydrofolate-bound complex. Hence, the ligand density suggests a putative intermediate [...] E. coli DHFR on the mechanism of conformational selection in response to the identity of the bound ligand. 2 We next refined the structure with mixed ligand states to estimate the population percentage since it is rare to observe a true intermediate state in time-resolved crystallography experiments. 1-13 In particular, linear combinations of the two end states (tetrahydrofolate and dihydrofolate) and of all three states (tetrahydrofolate, putative intermediate, and dihydrofolate) were used to define the bound ligands, with equal partial initial occupancies summed to 1. Since the initial and final states were well defined in a previous study, the atomic B factors and occupancy values were optimized with their coordinates intact. The refinement statistics are listed in Table I. In reciprocal space, as reflected by R work and R free, the single-state and multi-state models yield similar statistics, with the two-state model performing slightly better. The real-space correlation coefficient (RSCC) values of the individual ligands are comparable among the single intermediate state (0.94), two-state (0.95), and three-state (0.95) models (Table I).
The individual ligand real-space R factors (RSR) and B-factors of the multi-state models appear to be relatively more favorable than those of the one-state model (Table I). In the three-state model, the putative intermediate state displays a slightly more favorable RSR value of 0.10 compared with 0.11 for each of the two end states. The mFo-DFc residual electron densities of refinement with all three strategies display weak positive and negative peaks adjacent to the pterin and γ-carboxylate groups of the ligands at the ±3 σ level but little residual density at the ±3.5 σ level (Fig. S1, supplementary material). This further indicates in real space that single-state and multi-state models can both fit the ligand electron densities, with little residual electron density remaining to suggest other alternative states. However, a closer examination of the negative electron density peaks near the pterin ring of the modeled putative intermediate facing the dihydrofolate end state suggests the absence of the latter (Fig. S1, top panel, supplementary material). In addition, the 2mFo-DFc difference maps indicate that the benzoyl ring moiety of the initial tetrahydrofolate state in the two-state or three-state models partially resides outside the electron density envelope.

This suggests that atomic shifts occur from the initial state to a putative intermediate state, and the observed ligand electron densities cannot be attributed solely to a simple linear combination of the two end states (Fig. S2, supplementary material). In real space, the apparently better fit of the putative intermediate state over the two-end-state model is even clearer from the 2mFo-DFc maps at a higher cutoff, the 1.5 σ level (Fig. S2, supplementary material, right panels). The final refined occupancy values of the two-state model are 0.47 and 0.45 for tetrahydrofolate and dihydrofolate, respectively. This suggests that the diffraction data can also be fit with approximately equal populations of two end ligand states in reciprocal space (Table I), although the real space omit electron density suggests the presence of a putative intermediate ligand conformational state (Fig. 1). If the ligand and protein conformations are indeed coupled as we postulated earlier, then the real space electron density of the anchor residues of the partially disordered Met20 loop also favors a quasi-intermediate state rather than a simple linear combination of equal populations of the two end states (Fig. 3).

B. Fo-Fo isomorphous difference map
The difference map method has often been used in time-resolved crystallography to analyze the direction of atom shifts and intermediate states. 3,6,8-10,13 We also performed a Fo-Fo difference map analysis using the ligand-free model as unbiased phase input (PDB ID: 6CW7) 14 to compare the intermediate dataset and the initial state tetrahydrofolate complex dataset (Fig. 4). The difference map shows positive Fo-Fo electron density near the atoms of the pterin, methylene linker, aminobenzoyl moiety, and α-carboxylate, in the direction shifted away from the initial tetrahydrofolate state, but hardly any negative electron density except near the exocyclic amino group of the pterin. It is known from other studies that the Fo-Fo difference electron densities rarely overlap exactly with the atom positions, depending on the extent of shift. 3,6,8-10,13,24,25 We further analyzed the Fo-Fo map to compare the two end-state datasets as a positive control (Fig. 4). There are clearly both positive and negative electron densities to suggest the completion of the reverse decay reaction from tetrahydrofolate to dihydrofolate. This indicates that the ligand atoms at the intermediate time point have a finite shift in the direction of decay that is small enough to differentiate it from the tetrahydrofolate and dihydrofolate beginning and end states. Importantly, the intuitive Fo-Fo difference map shown here qualitatively indicates the direction of the atom shifts rather than quantitatively estimating the percentage populations of different states or how each atom shifts. In order to assess the relatively small atom shifts of

C. Bayesian difference refinement based on dataset pairs
Difference refinement aims to minimize the residual between the observed and calculated differences in the structure-factor amplitudes between two structures. The method was pioneered by Terwilliger and Berendzen 22,26 and relies on the correlation of model errors between two similar datasets having very similar structures with relatively small atom shifts or conformational changes. The advantage of the Bayesian difference refinement 22 method is that it accounts for correlated and uncorrelated model errors as well as the experimental uncertainty of each structure by introducing appropriate weighting terms. As described in Sec. II, ultimately a pseudo dataset was generated for refinement. 22 The successful application of the method requires that the native structure be confidently determined with high accuracy, so that the very small shifts in the variant structure can be reliably estimated. This requirement is met in the current case, where the native tetrahydrofolate complex at 2 days of crystal growth was previously determined at a high resolution of 1.03 Å. 14 Also, as shown earlier, the isomorphous variant dataset at 3 days of crystal growth suggests finite but small changes based on the mFo-DFc omit maps (Fig. 2) and Fo-Fo difference maps (Fig. 4).
The results of Bayesian difference refinement based on the paired datasets (2 days vs. 3 days of crystal growth) are summarized in Fig. 5 and Table II. The Bayesian difference refined structure displays ligand atoms lying somewhere in between the two end states, similar to the individually refined putative intermediate state (Fig. 5). However, it more closely resembles the initial tetrahydrofolate complex. This is expected, since the Bayesian difference refinement will bring model bias from the native structure as described by the method developer. 22 Consequently, the method is very sensitive to any small true differences from the native structure, which is of interest here and in many similar cases. 22 The overall trend of the atom shifts is in the same direction toward the formation of dihydrofolate as observed in the individual structure refinement (Fig. 5). The Bayesian difference refinement also suggests that the pterin ring becomes less puckered and thus favors charge delocalization at the intermediate time point, although the chemical nature of the ligand still appears reduced as the C6 position displays a tetrahedral geometry (as indicated by an arrow in Fig. 5). The quantitative estimation of atomic shifts is summarized in Table II.

FIG. 5. Bayesian difference refinement detects subtle atomic shifts from the initial state in stereo views. F diff from the Bayesian weighted pseudo dataset was used as input in the same way as the regular F obs for the calculation of electron density maps for Bayesian difference refinement as described in Sec. II. Top panel, 2mFo-DFc omit map at 1 σ (carve = 1.3); middle panel, mFo-DFc omit map at 6 σ; bottom panel, mFo-DFc omit map at 10 σ. Both the native (2 days' crystal) and variant (3 days' crystal) states use the same ligand geometry restraints (PDB ligand ID: THG for tetrahydrofolate). The Bayesian difference refinement ligand model is shown as sticks (carbon in grey). The initial state tetrahydrofolate (green), end state dihydrofolate (magenta), and the individually refined putative intermediate state (orange) are shown as thin lines and superposed based on protein coordinates for comparison. The guanidine and α-carboxylate groups form bidentate salt bridges with Asp27 and Arg57, respectively (indicated by dashed lines). They display minimal shifts compared to the rest of the pterin, methylene linker, and benzoyl moieties (general direction of shifts indicated by red arrows). Importantly, the chemically labile C6 maintains sp3 tetrahedral geometry with little shift between the 3 day and 2 day complexes but shows a relatively large shift for the dihydrofolate end state. The left and right images are stereo views.

[...] 27,28 This quinonoid intermediate was formed through proton-coupled electron transfer under either anaerobic conditions in the presence of ferricyanide or aerobic conditions. 27,28 However, the oxidation of free tetrahydrofolate in solution generated 6,8-dihydrofolate anaerobically 29 or pterins aerobically. 30 For tetrahydrofolate, a quinonoid intermediate was never experimentally observed but was nonetheless proposed as a transient unstable state following the same oxidation mechanism as tetrahydropterin. 29,30 Due to the finite coupling of ligand and protein conformational changes and the limited protein motion in crystals, our working hypothesis is that the crystalline protein-bound tetrahydrofolate is oxidized at a slower overall rate as compared to solution, where the C6-H bond breaking step (sp3 to sp2 C6 major geometry change) becomes at least partially rate-limiting. As previously observed, the guanidine and the α-carboxylic groups at the two ends of the ligand are anchored by two strong bidentate salt bridges to Asp27 and Arg57 in E. coli DHFR (Fig. 5). 14 This is also evidenced here by the clear mFo-DFc omit electron densities of the pairwise Bayesian difference refinement, traceable even at 10 σ (Fig. 5), and by minimal atomic shifts compared to the rest of the ligand during the decay (Fig. 5 and N1, C2, N2, N3; OX1, OX2, C in Table II). Importantly, these pairs of salt-bridging residues are highly conserved in all DHFRs. Consequently, there will be limited conformational space to explore for the enzyme-bound ligand as compared to free solution. This would necessitate the twisting motion of the ligand in both the catalytic forward (with NADPH) and the slow reverse (absence of NADPH) reactions involving the conversion between the sp3 and sp2 geometry at C6 and the concomitant rotation of the aminobenzoyl group connected by the methylene linkage. In recognition of the essential structural changes between physiologically relevant ligand states, we previously proposed the design of slow-onset inhibitors to closely mimic the binding pose of tetrahydrofolate rather than general active site blockers that are more susceptible to mutation-based emergence of drug resistance. 14,15,31,32
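Per-atom shifts of the kind discussed above can be computed as distances between matched ligand atoms in two models. The sketch below assumes that the two structures have already been superposed on their protein coordinates, as in the Fig. 5 comparison. The coordinates are placeholders chosen for illustration and are not taken from the deposited models or from Table II; the function names are hypothetical.

```python
import math


def atom_shifts(native_xyz, variant_xyz):
    """Distance moved by each atom present in both models (coordinate units)."""
    return {
        name: math.dist(native_xyz[name], variant_xyz[name])
        for name in native_xyz
        if name in variant_xyz
    }


def rms(values):
    """Root-mean-square of a collection of shifts."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


# Placeholder coordinates in angstroms, NOT from the deposited structures.
native = {"N1": (0.0, 0.0, 0.0), "C6": (1.0, 1.0, 1.0)}
variant = {"N1": (0.0, 0.0, 0.1), "C6": (1.3, 1.4, 1.0)}

shifts = atom_shifts(native, variant)
rms_shift = rms(shifts.values())
```

In this toy case the anchored atom (N1) moves far less than C6, which mirrors the qualitative pattern described in the text of small shifts at the salt-bridged ends and larger shifts elsewhere in the ligand.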

IV. CONCLUSIONS
We report here a critical assessment of the ligand identity during a slow reverse oxidative decay of tetrahydrofolate in the enzyme-bound crystalline form. The linearly combined multi-state analysis suggests that we captured a putative intermediate state featuring a more planar pterin maintaining the sp3 C6 geometry, based on both the individual refinement of a single dataset and Bayesian difference refinement using paired datasets. A geometrically similar quinonoid intermediate was experimentally observed for the oxidation of the related tetrahydropterin 27,28 and proposed for the oxidation of tetrahydrofolate free in solution; 29,30 both occur on the minute time scale. 28,33 This implies that the extension of the conjugation system in the bicyclic pterin favoring charge delocalization may occur either stably or transiently depending on the exocyclic derivatives at the C6 position during the rate-limiting C6-H bond breaking step. Here, the overall reverse oxidation of tetrahydrofolate is slowed down to an appreciable extent due to the coupling of ligand and protein conformational changes and the limited protein motion in the crystalline state. As shown in Fig. 3, one end of the intermediate is essentially clamped by the protein in the same conformation as that in the tetrahydrofolate structure, while the other, more mobile end adopts a conformation much closer to the dihydrofolate oxidation product. Time-resolved crystallography is achieved due to a relatively uniform starting point, the rate-limiting product release complex of E. coli DHFR, whose irreversible oxidation was likely triggered by a finite amount of freely diffusing oxygen in the generally aerobic crystallization conditions, despite the presence of a mM level of reducing agent in the crystallization drop.
The limitation of the current method in delineating the exact ligand populations and chemical nature can be complemented by orthogonal approaches such as time-resolved spectroscopy at controlled oxidant levels. Such complementary approaches might provide additional evidence to allow further differentiation between the presence of the putative intermediate state and a spatially averaged mixture of two end states only. The capability of the current method to achieve time-resolved crystallography on a slow time scale may inspire its application to other systems via coupling chemistry with protein motions to probe substantial ligand chemical changes without the need for a chemical modification to the native system. The key is that the protein itself needs to at least partially constrain the ligand motion so that a trapped intermediate ligand state results. The generality of such an approach needs to be explored, but a possible place to start is with slow-onset tight binders that are subject to oxidative decay on product release. Unresolved questions are whether these observations are physiologically relevant and, in particular, how rapidly tetrahydrofolate is consumed in cells for the production of DNA base pairs relative to its decay to dihydrofolate, which in free solution under aerobic conditions occurs on the order of minutes. These issues will be explored in future work.

SUPPLEMENTARY MATERIAL
See supplementary material for Figs. S1 and S2 on electron densities of single-state and multi-state refinements and also a supplementary coordinate file containing multiple-aligned PDB entries of E. coli DHFR deposited by our group to facilitate comparison and use of these structures. In particular, 6MR9, 6MT8, 6MTH, 6CXK, 6CYV, and 6CQA were superposed with PyMOL onto the published FH4 complex (6CW7). 14