Electronic damage in S atoms in a native protein crystal induced by an intense X-ray free-electron laser pulse

Current hard X-ray free-electron laser (XFEL) sources can deliver doses to biological macromolecules well exceeding 1 GGy, in timescales of a few tens of femtoseconds. During the pulse, photoionization can reach the point of saturation in which certain atomic species in the sample lose most of their electrons. This electronic radiation damage causes the atomic scattering factors to change, affecting, in particular, the heavy atoms, due to their higher photoabsorption cross sections. Here, it is shown that experimental serial femtosecond crystallography data collected with an extremely bright XFEL source exhibit a reduction of the effective scattering power of the sulfur atoms in a native protein. Quantitative methods are developed to retrieve information on the effective ionization of the damaged atomic species from experimental data, and the implications of utilizing new phasing methods which can take advantage of this localized radiation damage are discussed.


I. INTRODUCTION
The sulfur single-wavelength anomalous diffraction (S-SAD) phasing method allows the determination of native protein structures without requiring chemical modification or the knowledge of a homologous structure. However, this de novo phasing technique presents its own problems, which arise from the long wavelength (5.02 Å) of the sulfur K-edge at 2.47 keV photon energy. To resolve structural features at near-atomic resolution (2 Å or better) using diffraction at scattering angles less than 90°, a diffraction experiment must be carried out at a photon energy higher than about 6 keV. At this photon energy, the Bijvoet differences upon which S-SAD phasing relies are very weak: for a typical protein containing a single sulfur atom for every 30 residues, the average difference is about 2% at 6 keV. Low anomalous signal requires the collection of very accurate data, which often means longer acquisition times (exacerbated by strong air absorption) and consequently a higher risk of radiation damage effects. The difficulty of the technique is also demonstrated by the very low number of deposited structures solved with S-SAD compared to other X-ray methods (124 vs. 94 474; results from queries dated 09/02/15 using the Protein Data Bank website (www.pdb.org). For S-SAD, the advanced search "Structure Determination Method" was used in combination with "Text search." Owing to the absence of well-defined S-SAD search criteria, this number could be an underestimate). Son et al. (2011a;2011b) postulated that, at high intensity, X-ray free-electron laser (XFEL) radiation can ionize a significant population of the atoms of the sample during the X-ray pulse, and that heavy atoms are more strongly affected due to their higher photoabsorption cross sections. Furthermore, it was predicted that standard anomalous phasing methods would be frustrated under these conditions, because the absorption edges of the ions created during the XFEL pulse shift towards higher energies.
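The correspondence between the photon energies and wavelengths quoted above follows from the standard relation λ = hc/E. The short sketch below simply evaluates it; the function name and the rounded value of hc are illustrative choices, not taken from the original work.

```python
# Photon energy to wavelength conversion, lambda = h*c / E.
HC_KEV_ANGSTROM = 12.398  # h*c in keV*Angstrom (rounded value)


def wavelength_angstrom(energy_kev: float) -> float:
    """Return the X-ray wavelength in Angstrom for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


# Sulfur K-edge (2.47 keV) and the photon energy used in this experiment (6 keV).
print(f"{wavelength_angstrom(2.47):.2f} A")  # prints "5.02 A"
print(f"{wavelength_angstrom(6.0):.2f} A")   # prints "2.07 A"
```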
These effects have been observed in recent experiments (Nass et al., 2015 and Galli et al. unpublished). In particular, Nass et al. (2015) observed indications of electronic damage localized on the iron-sulfur clusters of ferredoxin protein crystals. Similarly, Galli et al. (unpublished) showed a moderate reduction of the scattering strengths of Gd atoms contained in a derivative protein crystal at high X-ray fluence, as well as a degradation of the quality of the experimental phasing that could be obtained. In the case of SAD, indeed, the theory presented in the paper of Son et al. predicts a reduction of the out-of-phase component of the atomic form factor proportional to the incident X-ray intensity, which would complicate anomalous phasing methods (Barends et al., 2014). However, this theory also predicts a large amount of ionization of the heavy atoms at high intensities, which would increase the difference in the scattering strengths of these atoms for different fluences. By acquiring data at more than a single X-ray fluence, one could therefore exploit the change of scattering factors of the heavy atoms to retrieve their positions. Son et al. (2011a;2011b) also derived new equations that could be used, similar to the case of MAD (multi-wavelength anomalous dispersion), to solve the phase problem ab initio. This new phasing technique is not limited by the chosen wavelength. Longer wavelengths will generally enhance the effect of ionization, whereas shorter wavelengths may allow higher-resolution diffraction. The main requirement is that the pulse fluence be sufficient to induce moderate ionization effects. 
In a previous paper, simulations of serial femtosecond crystallography (SFX) experiments showed that available XFEL sources such as the Linac Coherent Light Source (LCLS) should provide enough photon flux to saturate sulfur ionization at a photon energy of 6 keV. They also showed that, by reducing the highest accessible flux by about two orders of magnitude, it should be possible to utilize a high fluence (HF) version of the conventional radiation-induced phasing (RIP) (Ravelli et al., 2003) workflow to obtain good quality phases of a native protein system. Here, a high intensity XFEL experiment on the actual native protein that was used for the reported simulations is described, allowing for a comparison between theory and experiment.

A. Sample
The native protein sample employed for this experiment was Cathepsin B (CatB), an enzyme belonging to the class of cysteine proteases which degrade polypeptides. The form of the enzyme used is specific to the Trypanosoma brucei (Tb) parasite. The structure of the glycosylated form of TbCatB was recently solved using the SFX method (Redecke et al., 2013). TbCatB crystals used in this study were grown inside living Spodoptera frugiperda insect cells infected by a recombinant baculovirus and purified as described previously (Redecke et al., 2013) with the exception that the crystals were not crushed prior to measurements. These crystals have typical dimensions of 5-20 μm length and about 0.9 μm width, and they remain crystalline in a 1× phosphate buffered saline (PBS) suspension (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, and 2 mM KH2PO4) after purification from the mother cells.

B. SFX experiment
The experiment was carried out using the nanofocus sample chamber of the coherent X-ray imaging instrument (Boutet and Williams, 2010) in June 2013 (Proposal No. L669). Two SFX datasets were collected from TbCatB crystals using 6 keV photons at two different X-ray fluences. A first dataset was collected with the incident X-ray beam attenuated, using silicon filters of different thicknesses, to between 1% and 27% of its full intensity. A second dataset was collected using the full photon flux; in this case, the detector was protected by an attenuator placed one centimeter downstream of the interaction region to reduce the strong scattered signal that would otherwise have damaged the detector. The transmission of this attenuator was 25% and was specially designed to be constant over the 2θ scattering angle. The beamline efficiency was estimated to be around 20% at 6 keV, meaning that the average pulse energy at the sample was about 0.5 mJ. Assuming a circular focal spot with 0.2 μm diameter, the pulse fluence for the unattenuated beam was nominally 1.1 × 10¹³ photons/μm².
A suspension of TbCatB crystals in 1× PBS buffer at pH 7.2, containing on the order of 10⁹ crystals/ml, was passed through a 5 μm stainless steel inline filter. The crystal suspension was then delivered to the X-ray beam as a liquid jet of about 3-5 μm diameter produced with a gas dynamic virtual nozzle (DePonte et al., 2008 and Weierstall et al., 2012) and running with a flow rate between 10 and 25 μl/min. FEL pulses of nominal 60 fs duration were focused onto the liquid stream, about 50 μm away from the nozzle tip. Under these experimental conditions, about 3% of the recorded frames contained diffraction from TbCatB crystals (see Table I for data statistics).

C. Data analysis
Figure 1 shows the effective scattering strength of sulfur atoms in the protein as a function of the peak fluence, as calculated using XATOM (Son et al., 2011b). Here, the effective scattering strength is expressed in terms of the number of scattering electrons and defined as the root mean square of the time-dependent form factor during the XFEL pulse, weighted by the spatial and temporal X-ray beam profile (Galli et al. unpublished). The temporal profile was assumed to have a 60 fs flat-top shape. The scattering strength is highly dependent on the spatial X-ray beam profile in the interaction volume, and its upper limit was found in the case of spatially uniform fluence (solid line). The dashed line was calculated using a Gaussian spatial profile with a 0.2 μm FWHM. In this calculation, the sample was assumed to be considerably larger than the Gaussian spot, in which case the contribution of atoms in the sample to the effective scattering strength is weighted by the spatially varying fluence. The calculation equivalently represents the result of averaging diffraction intensities obtained from many smaller crystals that each randomly samples positions in the focused profile on different pulses.
The dotted line of Figure 1 was obtained by assuming the same Gaussian profile and taking into account only the first 20 fs of the pulse, to simulate a possible self-gating effect of diffraction (Caleman et al., 2015). From these calculations, it is expected that the resulting contrast in the effective scattering strength on the sulfur sites between HF (unattenuated beam) and the low-fluence (LF, attenuated beam) is in the range of +1.3 electrons to +8.0 electrons, depending on the interaction geometry and on possible self-gating effects.
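The definition of the effective scattering strength used above (a root mean square of the time-dependent form factor, weighted by the temporal and spatial beam profile) can be illustrated with a minimal sketch. The ionization model below is a purely hypothetical exponential stand-in, not XATOM, and the names `toy_electrons`, `f_sat`, and the integration grids are assumptions introduced only for illustration; fluences are in arbitrary units.

```python
import math

Z_SULFUR = 16.0  # electrons in a neutral sulfur atom


def toy_electrons(t_frac, fluence, f_sat):
    """Hypothetical stand-in for XATOM: electrons remaining after a fraction
    t_frac of a flat-top pulse whose total fluence is `fluence`."""
    return Z_SULFUR * math.exp(-fluence * t_frac / f_sat)


def mean_square_over_pulse(fluence, f_sat, n_t=200):
    """Time average of f(t)^2 over a flat-top temporal profile (midpoint rule)."""
    total = 0.0
    for i in range(n_t):
        t_frac = (i + 0.5) / n_t
        total += toy_electrons(t_frac, fluence, f_sat) ** 2
    return total / n_t


def effective_strength_uniform(peak_fluence, f_sat):
    """All atoms see the same (peak) fluence."""
    return math.sqrt(mean_square_over_pulse(peak_fluence, f_sat))


def effective_strength_gaussian(peak_fluence, f_sat, fwhm=0.2, n_r=400):
    """Sample much larger than a Gaussian focus: each radius contributes with a
    weight equal to its local fluence times the ring area (2*pi*dr cancels)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    r_max = 5.0 * sigma
    num = den = 0.0
    for j in range(n_r):
        r = (j + 0.5) * r_max / n_r
        local = peak_fluence * math.exp(-r * r / (2.0 * sigma * sigma))
        weight = local * r
        num += weight * mean_square_over_pulse(local, f_sat)
        den += weight
    return math.sqrt(num / den)
```

In this toy model, atoms in the wings of the Gaussian profile see less fluence and are less ionized, so the reduction of the effective scattering strength is smaller than for a spatially uniform fluence at the same peak value. Quantitative values require a proper atomic-physics calculation such as the one performed with XATOM.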
A total of 101 080 images were identified as crystal hits by the program Cheetah (Barty et al., 2014) and further processed by indexamajig from the CrystFEL suite. Indexing trials were initially performed to optimize the unit cell parameters, starting from the values of a previously deposited structure (PDB code 4HWY) (Redecke et al., 2013), and to refine the detector geometry. Randomly chosen diffraction patterns from each run were visually inspected to find possible regions of the detector to exclude from further analysis. The excluded regions contained, amongst other things, bright streaks close to the central beam caused by scatter from the water jet, shadows from the nozzle tip at high angles of diffraction, and features from the post-sample attenuator. In total, 37 389 and 32 536 diffraction patterns were successfully indexed for the HF and LF conditions, respectively. Monte Carlo integrated intensities were computed with process_hkl, discarding reflections measured fewer than 11 times. Table I shows the detailed statistics of the two datasets collected as well as the values of some quality metrics determined from the final merged data. It can be seen from Figure 2 that, because of the long wavelength and the experimental geometry, the diffracted intensity at high resolution is weak, and the I/σ(I) level decreases to below 2.0 at 3.26 Å, at which point R_split also starts to increase rapidly. The two data sets were therefore truncated at this limiting resolution.
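The merging step and the R_split metric mentioned above can be sketched as follows. This is a simplified illustration, not the process_hkl implementation: it ignores symmetry reduction and per-pattern scaling, and the function names and the input layout (pairs of Miller index and intensity) are assumptions. The R_split expression is the commonly used one, (1/√2) Σ|I_a − I_b| / (½ Σ(I_a + I_b)), evaluated over reflections present in both half-dataset merges.

```python
import math
from collections import defaultdict


def merge(observations, min_measurements=11):
    """Monte Carlo style merge: average every observation of each reflection
    and drop reflections measured fewer than `min_measurements` times.

    observations: iterable of ((h, k, l), intensity) pairs.
    """
    by_hkl = defaultdict(list)
    for hkl, intensity in observations:
        by_hkl[hkl].append(intensity)
    return {
        hkl: sum(values) / len(values)
        for hkl, values in by_hkl.items()
        if len(values) >= min_measurements
    }


def r_split(half_a, half_b):
    """R_split between two independently merged half-datasets."""
    common = half_a.keys() & half_b.keys()
    numerator = sum(abs(half_a[hkl] - half_b[hkl]) for hkl in common)
    denominator = 0.5 * sum(half_a[hkl] + half_b[hkl] for hkl in common)
    return numerator / (math.sqrt(2.0) * denominator)
```

In practice the two halves would be built by splitting the indexed patterns (for example odd and even pattern numbers) before calling `merge` on each half.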

D. Substructure determination and phasing attempts
In order to measure relative differences due to the photoionization processes, the diffraction intensities of the two datasets were cross-scaled using CCP4 Scaleit (Howell and Smith, 1992). The low fluence data was treated as a "derivative" set, due to the expected higher scattering strength of the less-ionised atoms, while the high fluence data was treated as "native." The best scaling function was found using a final Wilson scaling performed after a least-squares determination of the isotropic temperature factors.
Initial evidence of a difference signal was obtained from a difference electron density map derived from difference structure factors Fo(LF)-Fo(HF) combined with the phases obtained from a molecular replacement run using the low fluence data, performed using Phaser (McCoy et al., 2007). The search model was from PDB accession code 4HWY, and its refined version is superimposed on the map in Figure 3 to provide a visual reference. This figure shows that some of the strongest difference density (shown in green) corresponds to the sulfur positions, in particular, those belonging to the residues CYS 158, CYS 215, and MET 138. The highest of these peaks reaches a level of about 7σ.
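The construction of the difference map coefficients and the expression of peak heights in units of σ can be sketched as below. The sketch assumes that the two sets of amplitudes have already been cross-scaled, that phases are given in degrees, and that the map is available as a flat list of density values; all function names and data layouts are illustrative assumptions, and the Fourier synthesis itself is left to a crystallographic package.

```python
import cmath
import math
import statistics


def difference_coefficients(f_lf, f_hf, phases_deg):
    """Complex map coefficients (|Fo(LF)| - |Fo(HF)|) * exp(i*phi) for the
    reflections present in all three inputs (dicts keyed by (h, k, l))."""
    coeffs = {}
    for hkl in f_lf.keys() & f_hf.keys() & phases_deg.keys():
        delta = f_lf[hkl] - f_hf[hkl]
        coeffs[hkl] = cmath.rect(delta, math.radians(phases_deg[hkl]))
    return coeffs


def sigma_levels(density):
    """Express map values in units of the r.m.s. deviation from the map mean."""
    mean = statistics.fmean(density)
    rms = statistics.pstdev(density)
    return [(rho - mean) / rms for rho in density]
```

A peak quoted at 7σ then corresponds to a grid point whose value returned by `sigma_levels` is about 7.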
A RIP approach was attempted, using the localized contrast on the sulfur positions, with no success. Similarly, conventional S-SAD phasing failed on both datasets, most probably due to the low resolution of the datasets and the insufficient data quality.
We then tried to quantify the electron density loss in the proximity of the sulfur atoms using occupancy refinement. Any excess photoionization of these atoms (compared to the average photoionization of all atoms) will result in a loss of scattering power of the sulfur species that will affect the diffraction pattern. A convenient way to quantify a relative change in the average electron density of each sulfur atom is to refine the occupancies of these atoms, keeping the B-factors of all atoms fixed. Although refinement of occupancy and B-factor is often degenerate, fixing one term allows a difference in electron density to be tracked. Ten parallel refinement runs on the experimental data were carried out using REFMAC5 (Murshudov et al., 1997), where the initial occupancy of the S atoms of a previously refined model was varied from 0 to 1 in steps of 0.1. Figure 4 shows the average electron loss between low and high fluence, for each of the sulfur sites of the TbCatB (shown along the horizontal axis). The associated error bars indicate the standard deviation of the refinement results. Two sites, namely, MET 138 and MET 244, failed to produce meaningful results, either because the final occupancy was close to zero for most of the starting parameters (for MET 138) or because it failed to converge (MET 244), and are indicated with dotted lines. The graph shows an average positive difference of about 1.9 electrons, consistent with an increased ionization at high fluence and at the lower bound of the calculated contrast mentioned in Sec. II C. To test the efficacy of the occupancy refinement, the hardcoded form factor of sulfur in the CCP4 library was modified, adding two extra scattering electrons. The results are shown in the figure with pink circles.
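One possible way to turn refined occupancies into an electron loss per site is sketched below. The text does not spell out the conversion, so the following are assumptions: the occupancy difference is multiplied by the 16 electrons of neutral sulfur, the spread over the parallel refinement runs of the two datasets is combined in quadrature, and the overall average is weighted by the inverse variance of each site.

```python
import math
import statistics

Z_SULFUR = 16  # electrons of neutral sulfur (assumed conversion factor)


def site_electron_loss(occ_lf, occ_hf):
    """Electron loss for one sulfur site from the final occupancies of the
    parallel refinement runs against the LF and HF data."""
    loss = Z_SULFUR * (statistics.fmean(occ_lf) - statistics.fmean(occ_hf))
    # Assumed error model: run-to-run spreads combined in quadrature.
    err = Z_SULFUR * math.sqrt(
        statistics.stdev(occ_lf) ** 2 + statistics.stdev(occ_hf) ** 2
    )
    return loss, err


def weighted_average(values, sigmas):
    """Inverse-variance weighted mean (sigmas must be non-zero)."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

With this convention, a positive value means that the site scatters more strongly in the low-fluence data, as expected for additional ionization at high fluence.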
The confidence of the results was tested by performing a one-sample t-test of the null hypothesis, using the average occupancy change as input data. The test yielded a t value of 6.7, much higher than the critical value (2.1), which allowed the hypothesis that the results were obtained by chance to be excluded.
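The t statistic of a one-sample test can be computed as sketched below. The null hypothesis is assumed here to be a mean change of zero, and the input list stands for the per-site changes; the function name is illustrative and no values from the experiment are reproduced.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / standard_error
```

The returned value is then compared with the critical value of the t distribution for n - 1 degrees of freedom at the chosen significance level; a statistic well above the critical value, as reported in the text, argues against the change being due to chance.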

III. DISCUSSION AND CONCLUSION
The difference signals shown in Figures 3 and 4 give the first experimental evidence of a change in scattering strength of atoms in a protein crystal due to differences in the degree of ionization at different fluences of high-intensity X-ray FEL pulses. The largest changes are observed to be localised at the sulfur sites, due to the higher photoabsorption cross section of this atomic species with respect to the other atoms in the sample. However, the experiment was limited by the low number of indexed diffraction patterns and by the relatively low resolution of the datasets (3.26 Å), which prevented de novo phasing attempts from succeeding. Recent simulations presented in the work of Galli et al. (2015) suggested, for the same model system, a minimum required RIP peak height of 16σ for experimental phasing to be successful. Under the conditions of the experiment reported here, we predict that about 200 000 diffraction patterns would be needed to achieve the required contrast. As already shown (Galli et al. submitted), by using particular criteria for selecting the best diffraction patterns, the difference of the average scattering strength of the heavy atoms between the LF and HF sets could be increased.
This selection reduces the number of usable patterns, so it requires initial datasets containing a higher number of patterns: a condition that was not met in this experiment. Besides the criterion considered here of selecting the patterns according to the average integrated intensity of detected peaks as a function of the incoming pulse intensity, it might also be possible to choose patterns that show large differences of the intensity of selected Bragg reflections which are more sensitive to the scattering strength of the heavy atoms. This approach is, however, limited by the unknown degree of partiality of the reflections (Kirian et al., 2010 and White et al., 2013), an important factor affecting SFX data that can only be approximately modelled (Sauter, 2015). Further improvement in future experiments may come from extra diagnostics installed at the SFX experiment to provide shot-by-shot information about the real fluence impinging on the crystal, such as a beam intensity monitor after the interaction region to measure the transmitted beam intensity or a simultaneous measurement of the fluorescence signal from the heavy atoms in the sample.

FIG. 4. The average electron loss around the sulfur sites, calculated from the difference in occupancy between the two datasets (low fluence and high fluence). The red dashed line represents the weighted average. The labels of the horizontal axis are consistent with those used in the deposited structure (4HWY.pdb). The dotted lines indicate that the refinement was found not to work for the corresponding sulfur sites. The black lines below the horizontal axis indicate S-S bridges present in the structure. The pink circles are the result of a test, where the form factor of sulfur was increased by 2 electrons.
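The text does not state how the estimate of about 200 000 diffraction patterns was derived. One way to rationalize its order of magnitude, under the assumption (not made explicit in the text) that the difference peak height in units of σ grows as the square root of the number of merged patterns, is to scale the observed peak of about 7σ to the required 16σ. The function name below is illustrative.

```python
def patterns_needed(n_current, peak_current, peak_required):
    """Patterns required if the peak height (in sigma) scales as sqrt(N)."""
    return n_current * (peak_required / peak_current) ** 2


# Indexed patterns in the HF and LF datasets, observed 7-sigma peak,
# required 16-sigma peak (values quoted in the text).
for n_indexed in (37389, 32536):
    print(round(patterns_needed(n_indexed, 7.0, 16.0)))
```

Both results fall between 1.5 × 10⁵ and 2.1 × 10⁵ patterns per dataset, which is compatible with the figure quoted above; the actual prediction may rest on more detailed simulations.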