Dynamic multiple-scattering treatment of X-ray absorption: Parameterization of a new molecular dynamics force field for myoglobin

We present a detailed analysis of the X-ray absorption near-edge structure (XANES) data on the Fe K-edge of CO Myoglobin based on a combined procedure of Molecular Dynamics (MD) calculations and MXAN (Minuit XANes) data analysis that we call D-MXAN. The ability of performing quantitative XANES data analysis allows us to refine classical force field MD parameters, thus obtaining a reliable tool for the atomic investigation of this important model system for biological macromolecules. The iterative procedure here applied corrects the greatest part of the structural discrepancy between classical MD sampling and experimental determinations. Our procedure, moreover, is able to discriminate between different heme conformational basins visited during the MD simulation, thus demonstrating the necessity of a sampling on the order of tens of nanoseconds, even for an application such X-ray absorption spectroscopy data analysis.


INTRODUCTION
X-ray absorption spectroscopy (XAS) is a powerful analytical technique to get structural and electronic information on the absorbing site of different types of matter, from biological systems to condensed materials. The low energy part of the XAS spectrum is called XANES (X-ray absorption near-edge structure) and is extremely rich of information: oxidation state, overall symmetry, distances, and angles between the atomic species around absorbing site can be derived from this part of the XAS spectra. 1 In principle, an almost complete quantitative recovery of the geometrical structure within 6-7 Å from the absorber can be achieved from the XANES spectrum. 2 In recent years, Benfatto and co-workers proposed a fitting procedure (called Minuit XANes, MXAN)) based on multiple-scattering (MS) theory which is able to extract local structural information around absorbing atoms from experimental XANES data. 3 The MXAN method has been successfully used to analyze known and unknown systems, obtaining structural information comparable to X-ray diffraction and standard EXAFS analysis. [4][5][6] In particular, the possibility of performing quantitative analysis on the XANES energy region is very relevant for metallo-protein systems where both the low S/N ratio and the weak scattering power of the light elements surrounding the metal site limit the energy range of the XAS data available for the analysis.
Myoglobin is a globular protein containing a heme prosthetic group that reversibly binds small ligands such as oxygen, carbon monoxide, and nitric oxide. 7 For its small size, the relative structural stability, and the yet complex functional behavior, myoglobin is often used as model system in biophysics to study structure-dynamics-function relationships. 8 Myoglobin docks the exogenous ligand in a cavity, the distal pocket, delimited mainly by hydrophobic residues and by the heme. Among the residues lining the distal pocket, the distal histidine plays a relevant role in fine-tuning the ligand affinities or in stabilizing the bound state of the ligand through a hydrogen bond with its N e proton. 9 In addition, the distal histidine may adopt two distinct conformations, corresponding, respectively, to the closed and open state of the so-called "histidine-gate," which may represent a short and quickly accessible escape route for the ligand. 10 The myoglobin protein bonded to carbon monoxide (MbCO) has been studied for several years with many experimental and computational techniques, due to its important biological role. [11][12][13] In particular, classic MD simulations have been applied to highlight, among others, the diffusion of small ligand in the hydrophobic cavities of the protein matrix, either in solution [14][15][16][17][18] or in the crystal environment, 19 the protein conformational transitions upon ligand photodissociation, 20,21 and the spectroscopic properties of myoglobin or of the exogenous ligands inside its cavities. [22][23][24][25] For that reason, it has been assisted at the developing of force fields for the description of the geometry and conformational dynamics of the heme cofactor, in different coordination states and in coordination with several ligands, in particular, with the carbon monoxide. [26][27][28][29][30][31] However, the force field developed for the heme group still shows some limits. Among the big varieties of experimental data, the heme group has been fully described using the XAS technique at the iron K-edge. 4,32 These experiments have put in evidence structural parameters such as the distance and angle between Fe and the CO molecule, which are not well reproduced by the classical force field. 4 The combination of MXAN analysis and MD simulations (named D-MXAN analysis) 1,33 can be a powerful tool to test in detail the classical force fields used in MD because it allows a quantitative comparison between calculation and experimental data. In the last years, the D-MXAN method has been successfully applied to aqueous ionic solutions [33][34][35] and metallo-proteins. 36,37 In the following, we report an iterative application of D-MXAN to the myoglobin protein bonded to carbon monoxide (MbCO) to evaluate the MD force field parameters and, when applicable, to refine them with the twofold objective of allowing a better knowledge of the MbCO structural/dynamic characteristics in solution and producing a more accurate MbCO force field for the community.
The experimental details and data normalization can be found in Lima and coauthors. 32

THEORETICAL FRAMEWORK
The MXAN data analysis approach The MXAN method has been proposed in the literature some years ago 3 to derive structural information around the absorbing atom using the low energy part of the XAS spectrum, i.e., the so called XANES energy region. This method is based on the comparison between experimental data and many theoretical calculations performed by varying selected structural parameters starting from a well-defined initial geometrical configuration around the absorber. The calculation of XANES spectra is performed within the full multiple-scattering (MS) approach. 38 The fit procedure is performed in the energy space without using any Fourier transforms algorithm, while polarized spectra can be also easily analyzed. 3 The optimization in the space of the structural parameters is achieved using the CERNlibrary MINUIT 39 routines by minimizing the square residual function, defined as where n is the number of independent parameters, m is the number of data points, y th i and y exp i are the theoretical and experimental values of the absorption, respectively, e i is the error in each point of the experimental data set, and the statistical weight w i is usually set to 1. The experimental error e i is a constant over the whole data set. A typical fit involves an experimental energy range of about 150-200 eV from the absorption edge, and applications to several test cases indicate that the best-fit solution is quite stable and independent from the starting conditions.
The MXAN method is based on the standard MS theoretical approach within the muffin-tin (MT) approximation for the shape of the potential and the so-called "extended continuum" scheme to calculate both the continuum and the bound part of the XAS spectrum. 38 It also uses the concept of complex optical potential based on the local density approximation of the selfenergy of the excited photoelectron. 38 The total charge density, needed to calculate the whole potential, is obtained by superimposing atomic self-consistent Hartree-Fock charges, derived using neutral or non-neutral atoms. In the MT approximation, it is necessary to define the radii of the spheres surrounding all the atoms used in the calculation, as well as the potential in the interstitial volume of the cluster. These quantities can be considered as free parameters and they must be optimized during the fit procedure. In the last version of the program, they are included in the structural loop optimization, in order to minimize the computer time and to calculate the statistical correlations with the geometrical parameters. It turns out that they are small in most of the cases with a very weak influence in the structural determination. On the other hand, it has been demonstrated that a good optimization of these non-structural parameters can minimize the errors coming from the MT approximation and overcome the need of full self-consistent potential calculation. 40 For further details, see Refs. 1 and 3.
In order to better discriminate between different fits, it is useful to visualize the detailed behavior of the error over the whole energy range through the function f e E ð Þ defined 33 as Here, y th E ð Þ and y exp E ð Þ are the values of the fit and experimental data at different energies. This function gives a clear indication of the goodness of different fits and is quite useful in all those cases where the difference in the residual function R sq is below 10%.

Quantum Mechanical (QM) calculations for partial charge development
At the best of our knowledge, the partial charges of the iron(II)-heme group bonded to the carbon monoxide were never developed for the CHARMM force field. The standard iron(II)heme parameters in the CHARMM36 force field are those obtained by Kuczera et al., 1990 and refer to the iron(II)-heme group bonded to the two proximal and distal histidine residues. 41 Luthey-Schulten and coworkers optimized the partial charges and bonded parameters for the heme, hexacoordinated with ligands appropriate to Cyt c, relying on both protein X-ray structures and ab initio density functional theory (DFT) geometry optimizations at the B3LYP/6-31G* level. 42 Adam and coauthors have recently carried out a similar optimization of the standard CHARMM parameters, for the Photosystem II, 43 while Soloviov and coauthors added an ad hoc potential term to the CHARMM22 force field to model the ligand-heme interaction in Myoglobin-NO. 44 Some of us, moreover, developed the carboxy Neuroglobin parameters for the Gromos force field, 45 but it is well known that partial charges are not physical observables. Consequently, the procedure to obtain partial charges from QM calculations is slightly different in different force fields.
In order to determine the partial charges of the hexacoordinated heme in MbCO, we performed quantum chemical calculations on the isolated carboxy-imidazole iron(II) porphyrin [Fe(II)P(CO)(Im)] complex in the singlet, closed-shell electronic state. After a geometry optimization and a single-point calculation, the point charges were obtained by means of the restrained electrostatic potential (RESP) procedure, 46 according to the CHELPG (CHarges from ELectrostatic Potentials using a Grid based method) scheme. 47 All QM calculations were performed at B3LYP/6-311þþG** level 48,49 using the Gaussian 09 softwarepackage. 50

MD simulation
The starting coordinates employed for the simulations were taken from the X-ray structure of CO-bound ferrous sperm whale myoglobin at 1.15 Å resolution (Protein Data Bank, PDB 1bzr). 51 MD simulations were performed with the GROMACS 4.6 software package 52 using either the CHARMM36 force field 53 or its XANES optimized version.
The XANES data of MbCO are dominated by the local geometry around the Fe absorber atom. For this reason, the refinement of the MD force field parameters was carried out by modifying the structural parameters related to the Fe surroundings. In particular, the atomic distances between the heme iron atom and both the proximal histidine nitrogen atom (Fe-N His ) and the CO carbon atom (Fe-C CO ), as well as the force constant of the atomic angle (N His -Fe-C CO ). These force field refinements are driven by the lowering of the MXAN error function R sq .
Simulations were carried out at constant temperature of 300 K within a dodecahedron box using periodic boundary conditions. The P-LINCS (Parallel LINear Constraint Solver) algorithm was applied to constrain covalent bond lengths, allowing an integration step of 2 fs. 54 A nobond pair list cutoff of 1.2 nm was used and long range electrostatic interactions were calculated with the Particle-Mesh Ewald method (PME) 55 using a grid spacing of 0.12 nm combined with a fourth-order B-spline interpolation to compute the potential and forces in between grid points. The temperature was controlled by separately coupling the protein and solvent to an external temperature bath. 56 Moreover, a weak coupling for pressure was also applied. 42,57 MD simulations were performed for 100 ns and the configurations, used for D-MXAN procedure, were collected every 5 ps.
Radial distribution functions (RdFs) are calculated with the "gmx rdf" tool of Gromacs, averaging the position of all the 20 000 MD frames. Separate calculations were carried out on the Fe atom with the CO molecule, the proximal His93 residue, and the distal His64.

Dynamic MXAN method
The D-MXAN analysis is a method that combines Molecular Dynamic and XAS calculations, typically used to analyze disordered systems, where a quantitative XANES analysis is not feasible because of the lack of an ab initio reliable computational scheme. The proper configurational average spectrum is thus obtained by averaging thousands of spectra generated from distinct MD snapshots. Each snapshot is then used to generate the XANES signal associated with the corresponding instantaneous geometry, and the averaged theoretical spectrum is obtained by summing all the instantaneous spectra and dividing by the total number of used MD snapshots. Therefore, no parameter fit (neither structural nor no-structural) is carried out, with the only exception of the inelastic losses, namely, the experimental broadening, the corehole lifetime, and the energy onset and amplitude of the plasmon contribution that are adjusted to make a quantitative comparison with the experimental data. 1,2,40 We observe that the fitted values are the same in both cases, within the error. We obtain, for example, a value of 1.0 6 0.44 for the core-hole lifetime in the CHARMM potential vs 1.0 6 0.43 in the optimized one. Note that the core-hole lifetime for the K-edge of Fe is about 1.25 eV. 58 It is important to note that, in general, in the D-MXAN analysis, we expect an R sq fitting error greater than the value normally obtained in the standard MXAN fit procedure. This is due to the impossibility in the D-MXAN procedure to optimize for each frame the overlapping factor and the interstitial potential. However, as already stated, these parameters have a very weak influence on the structural determination. 1 The cluster size used in the D-MXAN analysis is such to include the nearest 100 atoms around the Fe absorber.
In order to calculate the total sampling length necessary to obtain a statistically significant average of the calculated XANES signal, we used the following residual function R f as a function of the number of analyzed MD frames N: where r N (E i ) is the averaged cross section obtained using N structural frames.

RESULTS AND DISCUSSION
Force field parameter refinement The Carboxy-Myoglobin is a challenging system for the D-MXAN analysis because there is a high number of parameters that controls the atomic configurations around the Fe atom generated by the MD simulation in its time window. The Fe atom is embedded in the heme group (see Fig. 1) and strongly interacts with the CO molecule and with the proximal histidine (His93), while other residues can also influence the heme conformation, first of all the distal histidine (His64).
The standard iron(II)-heme parameters in the CHARMM36 force field 33 refer to the deoxy Mb, where the iron(II)-heme group is bonded to the two proximal and distal histidine residues. Therefore, we first carried out standard QM calculations to produce a set of atomic charges for MbCO, compatible with the CHARMM36 force field (see Theoretical Framework section for details). In Table S2, we report the calculated and original charges for the heme group. Note that the total À2 negative charge is now distributed on the heme group, the proximal histidine, and the CO molecule.
Then, we carried out an iterative procedure that evaluated the effect of different force field parameters on the D-MXAN results, starting from those playing a major effect on the XANES signal. Overall, we optimized the bond parameters for the heme group, the CO molecule, and the proximal histidine His93, the force constant of the Fe-CO angle. In Table S1, we report the original CHARMM and our optimized force field parameters (also made available in GROMACS format as supplementary material). We show in Fig. 2(a) the comparison between the experimental data (red points), two D-MXAN fit performed using the standard CHARMM36 force field (green line), and our optimized force field (black line). The CHARMM curve is able to reproduce the overall shape of the experimental data, but there are several discrepancies in the whole energy range. Here, the R sq fitting error is 3.99, which is a relatively large value for the standard MXAN fit but can be considered as a good starting point for the D-MXAN analysis.
In the case of the D-MXAN fit obtained with the optimized potential, the R sq error reduces to 1.95 and now the agreement with the experimental data is rather good in the whole energy range. It should be noted, however that the MXAN R sq obtained in the static fit is 1.71. This is not surprisingly because in the case of static MXAN fit, we were able to optimize both structural and non-structural parameters. In panel B, we report the behavior of the integral of f e E ð Þ as function of energy. The difference between the two curves is big over the whole energy range, confirming the great improvement of the quality of the fit using the optimized force field parameters. Finally, convergence of both calculations was evaluated by plotting of the residual function R f as a function of the number of MD frames. Figure S2 (supplementary material) shows that after 200 frames R f is always below 10 À6 for both CHARMM and the optimized potentials.
In the following, we show the structural parameters that have the strongest influence on the D-MXAN fits. In Fig. 2(c), we report the comparison between the Fe-CO radial distribution function (RdF) obtained with the original CHARMM36 force field (green color) and our optimized potential (black color). Our potential optimization reduces the distance between the Fe atom and the C atom of the CO molecule from 1.90 Å to 1.80 Å (first peak) and increases the Fe-O distance from 3.02 to 3.04 Å (second peak). It is notable the opposite behavior of the two peaks in the RdF. This effect is due to the increase in the C-O bond length from 1.128 Å in the standard CHARMM36 force field to 1.25 Å in our optimized potential. In fact, standard CHARMM36 parameters model a photodissociated CO molecule, while our refined model, as already stated, deals with the covalent bonded MbCO complex in which the CO distance is longer. Note that the adjustment of the C-O bond length can have a negligible impact on the long distance properties of the protein, but we have demonstrated that it produces a bad reproduction of the XANES signal, being responsible for the greater part of the Rsq error difference between the CHARMM potential and our optimized one.
Our optimized value of the Fe-C distance, as already mentioned, reduces the distance between the Fe atom and the C atom of the CO molecule from 1.90 Å to 1.80 Å (see Table S1). This is in good agreement with both XRD and static MXAN fit determination. In fact, XRD indicates a value from 1.73 Å (PDB id 1BZR 51 ) to 1.82 Å (PDB id 1A6G 59 ) for the Fe-CO distance, depending on the experimental conditions. The static MXAN fits performed on single crystal and in water solution give a value of 1.83 6 0.02 Å and 1.83 6 0.01 Å , respectively. 4,32 Our optimization procedure also produced a slight decrease in the Fe-C-O bending constant from 70.0 (CHARMM36) to 67.4 kcal/mol rad 2 (our optimized potential). Note that although the force field equilibrium angle value (180 ), the sampled simulation value is 172.9 6 3.5 in both cases, because of the additive effects of different force field parameters. This value is close to experimental determination in liquid solution (171.12 ) 32 and to the X-ray crystallographic (XRD) value which ranges between 171 and 173 . 51,59 The effect of the potential optimization results also in a change of the Fe-His93 RdF [first peak distances in Fig. 2(d)] from 2.20 to 2.04 Å . This last value is in good agreement with what found in the XRD analysis and static MXAN fits performed on crystal and water solution. 4,32,51,59 In Table S1 (supplementary material) we report, a summary of the changes of the above parameters changed during the potential optimization.
Atomic charges of heme group, CO molecule, and proximal His93 have been ab initio calculated from us. The values are reported in Table S2 (supplementary material) and are different from the original data of CHARMM36 that are also reported for comparison in the same table.
The above described small changes in the MbCO force field parameters were enough to yield a better reproduction of the XANES experimental signal on the whole energy range using a pure MD sampling.
It is also worth noting that the refining procedure is manual and iterative. Changes in the force field parameters that produced an increase in the R sq error were hence discarded.

D-MXAN variability during MD sampling
Usually R sq converges after few nanoseconds of MD simulation but in the case of the Myoglobin, we observe a relatively high variability of this value along the MD simulation time. To see this effect, we report in Fig. 3(a) the R sq value in time window of 5 ns for the whole 100 ns MD simulation. The R sq in single time windows goes from 2.121 in the 15-20 ns time window to 1.953 in the 85-90 ns time window. The total R sq average obtained using the whole 5-100 ns MD simulation is 2.046. The relatively great variability shows also a specific trend, with greater R sq values in the first 20 ns of simulation. In the supplementary material, we have investigated this issue with more detail. In Fig. S1(a), we plot the comparison between the experimental data (red points) and the two D-MXAN fit performed using our optimized force field in the best and worse time windows (in brown and cyan colors, respectively). Enlargement in Fig. S1(b) shows how the brown curve, corresponding to the 85-90 ns time window with Rsq ¼ 1.953, is particularly closer to the experimental data in the 5-8 eV range. In line, the fit error function f(E), in Fig. S1(c) shows the separation of the two curves in that energy range. The difference then is maintained nearly constant in the whole energy range.
We further investigated this behavior and we found that the most significant difference in the "small R sq " and "big R sq " MD frames was a different sampling of the His97 residue, the third farther histidine from Fe, after the proximal His93 and the distal His64. In particular, in the "big R sq " frames, His97 samples mainly conformation inner to the heme ring [see a representative structure in Fig. 3(b)], with a minority of conformations being outer of the heme ring. The "small R sq ," on the contrary, visits only conformations with His97 farthest from the heme ring [see a representative structure in Fig. 3(c)].
The fact that His97 visits different conformational basins is somehow normal in a MD simulation and, actually, it could be related to the observed role played by this residue in the stability of the heme group in Myoglobin. 60 The interesting results, coming from this D-MXAN analysis, are that the sampling for a metalloprotein should be of the order of tens of nanoseconds for such type of applications.

D-MXAN sensitivity to different MD parameters
In order to demonstrate the sensitivity of the D-MXAN method to small structural details, we carried out a MD simulation in which the His64 hydrogen atom was on the d nitrogen instead of the preferential e one [see Fig. 4(a)]. Only this last conformation permits the formation of a hydrogen bond between the distal histidine His64 and the CO molecule [dashed line in Fig. 4(a)]. 24,61,62 The structural rearrangements around the Fe absorber atom, following the loss of the hydrogen bond in the His64 d nitrogen tautomer, are shown in Fig. 4(b). The Fe-His64 distance changes from 5.1 to 7.4 Å [first RdF peak distance in Fig. 4(b)] and at the same time it samples a smaller distance range as compared to the preferential e nitrogen, being the great majority of conformations at a range distance of 6-8 Å from Fe.
The described alterations in the MD sampling of the d nitrogen tautomer produce an increase in the D-MXAN Rsq value up to 2.16 from the 1.95 of the preferential e nitrogen tautomer. The increase is significant and it is also demonstrated by the behavior of the integral of the f e E ð Þ reported in Fig. 4(c). The black line is related to the preferential e tautomer and it is always below the corresponding values of the d tautomer. parameters and the sensitivity of the D-MXAN method is enough to highlight the structural and dynamical details at a resolution that is difficult to reach with other experimental methods.
We have developed a new set of force field parameters for the CO Myoglobin protein in the water solution, capable of correcting the greatest part of the structural discrepancy between classical MD sampling and experimental determinations. This is quite important in all those cases where a detailed analysis of the heme properties is performed.

SUPPLEMENTARY MATERIAL
See supplementary material for optimized bonded parameters and atomic charges in our modified potential vs the original CHARMM36; for D-MXAN fits on two of the 5 ns time windows; and for the plot of the residual function R f as a function of the number of MD frames N.