Use of triple correlations for the sign determinations of expansion coefficients of symmetric approximations to the diffraction volumes of regular viruses

An X-ray free electron laser is a new source of X-rays some $10 \times 10^{9}$ times brighter than any previous X-ray source, giving rise to the possibility of structure determination of individual biological particles without crystallization. Some of the earliest samples used in the X-ray free electron laser are viruses because they are among the largest reproducible bioparticles. We show how common virus near-symmetries can be exploited to find a first approximation to their structures, giving a starting point for a perturbation approach to determine their structures.


I. INTRODUCTION
As demonstrated in a recent experiment, 1 a single virus particle may be injected into a very short and bright pulsed beam from a coherent X-ray free electron laser (XFEL) and is capable of diffracting enough intensity to form a high-quality diffraction pattern before suffering significant damage. 2 Experiments have suggested the feasibility of this approach for both single particles 3 and nanocrystals. 4 Therefore, a stream of particles can be directed into an XFEL beam, and diffraction patterns recorded sequentially, each corresponding to a different particle orientation. There have also been other significant applications of XFEL radiation, such as the discovery from wide-angle X-ray scattering (WAXS) of a "protein quake" that might explain the utilization of solar energy during photosynthesis. 5 Here, we develop a method to recover the 3D structure of a particle without knowing the orientation of the particle contributing to each diffraction pattern. Though the method we describe assumes a form of symmetry (icosahedral or helical) suggested by Caspar and Klug 6 to be usual with regular viruses, we acknowledge that many viruses deviate from this symmetry with their internal genetic material 1 or with external features such as spikes 7 or hair. 1 Icosahedral or helical symmetry often remains the symmetry associated with the bulk of the capsid. For example, the low-resolution image of the capsid in Fig. 3 of Ref. 1 appears to be largely icosahedral with little evidence of the hair that is known to cover this virus. 
This suggests that a symmetric structure like those deduced here from simulated XFEL data may form an excellent first guess to such a structure, which can later be modified by a form of perturbation theory as used in quantum mechanics, or else form an excellent support as a starting point in an iterative algorithm 8 that constrains the reciprocal-space data to the measured angular correlations, and the real-space image to a dynamic support based on the shrink-wrap algorithm 9 with no symmetry restrictions. Our approach in our earlier paper 10 was to find the optimum set of coefficients that ensured positivity of the diffraction intensities. This is not a very strong constraint, and there are a variety of combinations of signs which are consistent with this constraint. A stronger constraint, and one which tends to determine these signs more uniquely, is that which makes them agree with the magnitudes and signs of the triple correlations 11 derived from the ensemble of diffraction patterns. In this paper, we derive precise expressions for these triple correlations in terms of the relevant expansion coefficients of the diffraction volume, and show how this constraint may be used for determining the signs of the expansion coefficients in the cases of both icosahedral and helical symmetry. The resulting 3D diffraction volume can be phased iteratively with algorithms like charge flipping 12,13 or standard fiber diffraction phasing algorithms (e.g., Refs. 14 and 15) in the case of a helical virus.

a) Author to whom correspondence should be addressed. Electronic mail: dksaldin@uwm.edu

2329-7778/2015/2(4)/041716/8 © Author(s) 2015

II. AVERAGE ANGULAR CORRELATIONS
For a set of experimental diffraction patterns, the average pair correlation function may be defined by Eq. (1), where $q$ and $q'$ refer to two resolution shells, which can be identified with concentric circles on individual diffraction patterns, and $\langle \cdots \rangle_{DP}$ refers to an average over diffraction patterns. It should be noted that the angular correlations themselves (Eq. (1)) are between intensities on the same diffraction pattern, and are thus insensitive to shot-to-shot intensity variations, a problem with an X-ray laser. Of course, $\langle \cdots \rangle_{DP}$ invokes an average over different diffraction patterns. However, such an average is much less sensitive to shot-to-shot variations than correlations between different diffraction patterns. All that is required is a reasonable statistical distribution of the shot-to-shot intensity variations. It should also be pointed out that this method of determining a diffraction volume from random particle orientations works from averages over all particle orientations, and at no stage involves the determination of the relative orientations of the particles contributing to the individual diffraction patterns. It can be shown that, in terms of the spherical harmonic expansion coefficients $I_{lm}(q)$ of the diffraction volume, this angular correlation reduces to Eq. (2), 16,17 which is independent of $\phi$, where
$$B_l(q, q') = \sum_m I_{lm}(q)\, I^{*}_{lm}(q') \tag{3}$$
and
$$\cos(\Delta\theta) = \cos(\theta(q)) \cos(\theta(q')) + \sin(\theta(q)) \sin(\theta(q')) \cos(\Delta\phi),$$
where $\theta(q)$ and $\theta(q')$ refer to scattering angles of the incident radiation corresponding to the resolution shells $q$ and $q'$. Note that, in the above equations, the quantity $B_l$, which is easily recoverable from the set of measured diffraction patterns using the orthogonality property of Legendre polynomials, is a quadratic function of the spherical harmonic expansion coefficients $I_{lm}(q)$.
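The extraction of $B_l$ from a measured pair correlation can be illustrated with a minimal numerical sketch. It assumes that the correlation on a pair of shells is a sum over $l$ of $P_l(\cos(\Delta\theta))$ times $B_l$, with any constant prefactor absorbed into $B_l$, and it uses a least-squares fit in place of the analytic Legendre orthogonality integral. The function names `cos_delta_theta` and `extract_B`, the angle values, and the synthetic $B_l$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from numpy.polynomial import legendre


def cos_delta_theta(theta_q, theta_qp, dphi):
    """cos(Delta theta) between points on shells q and q', as in the text."""
    return (np.cos(theta_q) * np.cos(theta_qp)
            + np.sin(theta_q) * np.sin(theta_qp) * np.cos(dphi))


def extract_B(c2, theta_q, theta_qp, dphi, l_values):
    """Least-squares fit of c2(dphi) = sum_l P_l(cos(Delta theta)) * B_l.

    Assumption: constant prefactors of P_l are absorbed into B_l.
    """
    x = cos_delta_theta(theta_q, theta_qp, dphi)
    # Columns of legvander are P_0(x) ... P_lmax(x); keep only the l in use.
    basis = legendre.legvander(x, max(l_values))[:, l_values]
    b_l, *_ = np.linalg.lstsq(basis, c2, rcond=None)
    return b_l


# Synthetic check: build a pair correlation from known B_l and recover them.
l_values = [0, 6, 10, 12]
true_B = np.array([5.0, 1.2, 0.7, 0.3])        # made-up shell-diagonal values
dphi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
theta = np.pi / 2 - 0.01                        # illustrative angle only
x = cos_delta_theta(theta, theta, dphi)
c2 = legendre.legvander(x, max(l_values))[:, l_values] @ true_B
recovered_B = extract_B(c2, theta, theta, dphi, l_values)
```

With noisy data the same fit would return the $B_l$ that best match the measured correlation in the least-squares sense.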
If the expansion coefficients could be recovered directly, it would be possible to reconstruct the diffraction volume via
$$I(\mathbf{q}) = \sum_{lm} I_{lm}(q)\, Y_{lm}(\hat{\mathbf{q}}). \tag{5}$$
Unfortunately, the quantities $B_l(q, q')$, in general, do not determine the coefficients $I_{lm}(q)$ uniquely. 16,17 For example, the coefficients depend on the extra magnetic quantum number $m$, not specified by $B_l$. However, for particles of known symmetry, each $l$ quantum number may be associated with known values of $m$ in some known ratio. As we show below, this is certainly true of icosahedral and helical viruses. Under such circumstances, it is possible to extract the $I_{lm}(q)$ coefficients from the measured $B_l(q, q')$. We begin by considering the icosahedral case.
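The non-uniqueness can be made explicit with a short worked step. Assuming that $B_l$ has the quadratic form $B_l(q,q') = \sum_m I_{lm}(q) I^{*}_{lm}(q')$ (the form implied by the later reduction to the $m = 0$ term), any $q$-independent unitary matrix $U^{(l)}$ acting on the $m$ index leaves $B_l$ unchanged. The matrix $U^{(l)}$ and the transformed coefficients $\tilde{I}_{lm}$ are introduced here only for illustration.

```latex
\tilde{I}_{lm}(q) = \sum_{m'} U^{(l)}_{mm'} I_{lm'}(q)
\quad\Longrightarrow\quad
\sum_m \tilde{I}_{lm}(q)\, \tilde{I}^{*}_{lm}(q')
  = \sum_{m' m''} \Big( \sum_m U^{(l)}_{mm'} U^{(l)*}_{mm''} \Big)
    I_{lm'}(q)\, I^{*}_{lm''}(q')
  = \sum_{m'} I_{lm'}(q)\, I^{*}_{lm'}(q')
  = B_l(q, q').
```

A known symmetry removes this freedom by fixing the relative weights of the $m$ components for each $l$, leaving only an overall sign per $l$ and shell when the coefficients are real.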

III. ICOSAHEDRAL PARTICLE
The recovery of the charge density of an icosahedral virus from pair angular correlations has been reported in a previous work. 10 Since the quantities $B_l(q, q')$ extractable from the pair correlations depend only on the angular momentum quantum number $l$, the use of icosahedral harmonics (which depend on the same quantum number, as they are sums over spherical harmonics of magnetic quantum numbers $m$ in a known proportion) allows the magnitudes of the expansion coefficients to be determined by these quantities alone. Although icosahedral harmonic expansion coefficients for real intensities are real, there remains an ambiguity of sign. In our previous work, 10 we determined these signs by a positivity constraint on the intensities. This is not a very strong constraint and does not always determine the signs of the icosahedral harmonic expansion coefficients uniquely. Here we describe an alternative, and stricter, way of constraining these signs by adjusting them to agree with the magnitudes and signs of the triple angular correlations. 11 As we will see, this approach is equally applicable to both of the great classes of virus symmetry noted by Caspar and Klug. 6 Icosahedral harmonics, $I_l$, are defined by
$$I_l(\hat{\mathbf{q}}) = \sum_m a_{lm}\, Y_{lm}(\hat{\mathbf{q}}), \tag{6}$$
where the $a_{lm}$ are a set of known coefficients, tabulated in, e.g., Ref. 18. A general diffraction volume of icosahedral symmetry can be described by an expansion of the form
$$I(\mathbf{q}) = \sum_l g_l(q)\, I_l(\hat{\mathbf{q}}), \tag{7}$$
where the amplitudes $g_l(q)$ denote the precise combinations of the different icosahedral harmonics needed to specify the 3D diffraction volume of each particle. The quantities $g_l(q)$ depend only on $l$ and $q$, precisely the parameters specifying the quantities $B_l(q, q')$ determinable by experiment.
Indeed, in terms of the quantities $g_l(q)$, it is easy to see that
$$B_l(q, q') = g_l(q)\, g_l(q'). \tag{8}$$
Consequently, the magnitudes of the $g_l(q)$ coefficients may be determined from the shell-diagonal parts of $B_l$ via
$$|g_l(q)| = \sqrt{B_l(q, q)}. \tag{9}$$
Although icosahedral harmonics are known to be real and consequently the expansion coefficients $g_l(q)$ may be chosen to be real to represent a real diffraction volume, there is still an uncertainty of sign. Fortunately, it is possible to determine these signs from other quantities determinable from the experimental data, namely, the triple angular correlations $C_3$ defined by Eq. (10). 11 It should be noted that these two-point triple correlations consist of functions of the products of only two pixel intensities, and consequently averaging over perhaps a million diffraction patterns is likely to give converged values from XFEL diffraction patterns of typical proteins. The current application is to an entire virus, which is perhaps two orders of magnitude larger than a typical protein, so the problem of weak scattered intensities probably does not arise. It can be shown 16 that this can be written as Eq. (11). In this case,
$$T_l(q, q') = \sum_{l_1 l_2} b(l_1, l, l_2)\, g_{l_1}(q)\, g_l(q')\, g_{l_2}(q), \tag{12}$$
where
$$b(l_1, l, l_2) = \sum_{m_1 m\, m_2} G(l_1, m_1; l, m; l_2, m_2)\, a_{l_1 m_1}\, a_{lm}\, a_{l_2 m_2}. \tag{13}$$
Here, $G(l_1, m_1; l, m; l_2, m_2)$ is a coefficient defined by Eq. (14). It should be noted that $T_l$ may be extracted from the measurable quantities $C_3$ as before, using the orthogonality property of Legendre functions. We may write
$$I_{lm}(q) = a_{lm}\, g_l(q), \tag{15}$$
where $g_l(q)$ is a real function of $q$, and $a_{lm}$ is real. In Eq. (5), $I_{lm}(q)$ is the expansion coefficient with respect to complex spherical harmonics. This differs from most previous works, which expand the intensity in terms of real spherical harmonics. 10 In Eq. (15), $a_{lm}$ is related to the tabulated values $a^{JH}_{lm}$ of Ref. 18 by a corresponding expression if $m \neq 0$ and by $a_{lm} = a^{JH}_{lm}/d$ if $m = 0$. The $z$ axis has been chosen to be along a five-fold axis of the icosahedron, and the $zx$ plane chosen along a mirror plane cutting the pentagon in the $xy$ plane between two vertices.
Such a choice ensures that all quantities in Eq. (15) are real. Note that the real square root in Eq. (9) exists because $B_l(q, q)$ is real by the definition in Eq. (3). In a previous work, 10 we determined the signs of $g_l(q)$ by the positivity condition on the intensity. Here, we make use of the triple correlation, which in general provides a more stringent condition on the signs. Because of Friedel's rule, $l$ must be even. The simple form in Eq. (15) is valid only for $l < 30$, and $a_{lm}$ is non-zero only for selected values of $l$. 18 The allowed even values of $l$ for non-degenerate icosahedral harmonics are 0, 6, 10, 12, 16, 18, 20, 22, 24, 26, and 28. For a given shell, specified by a value of $q$, the number of possible combinations of the signs of $g_l$ is $2^{11} = 2048$. We can determine these signs by an exhaustive search over all possible combinations of signs in order to fit $T_l(q, q)$ in Eq. (12). By means of Eq. (8), [...] theoretical (Eq. (12)) and experimental (Eq. (11)) shell-diagonal components of $T$ for all the reference shells were found. The signs can then be propagated to other shells from the chosen multiple reference shells by means of the non-shell-diagonal $B$'s. With all the signs determined, the $I_{lm}(q)$ and hence the diffraction volume can be calculated. The flipping method 12,13 was used for phasing the diffraction volume to give the charge density of PBCV-1 shown in Fig. 1. Since $l_{max} = q_{max} \times R$, where $R = 1000$ Å is the radius of the virus, and $l_{max} = 28$ for PBCV-1, we have $q_{max} = 28/1000 = 0.028$ Å$^{-1}$. Therefore, this method is restricted to a resolution of $2\pi/q_{max}$, or about 220 Å. Fig. 1 was calculated for this resolution. As expected, it is icosahedrally symmetric. The degeneracy of icosahedral harmonics for $l$ between 30 and 44 is only two-fold. 19 Between $l = 30$ and $l = 44$, therefore, one may write the corresponding expression in terms of a matrix $O$ and its transpose $O^T$, which are single-parameter ($2 \times 2$) orthogonal matrices that cannot be determined by $B_l$ alone since $OO^T = I$, where $I$ is the identity matrix.
However, they can be determined by an optimization algorithm from the triple correlations $T_l(q, q')$, where they appear in triplicate and therefore do not cancel. Further work is in progress on this idea. Although achieving even higher resolution would require the use of many-fold degenerate icosahedral harmonics, there seems to be no reason in principle why these cannot also be found by such techniques, because all the $q$ off-diagonal $T_l$'s can be determined from experimental data. To put it another way, although there are extra parameters associated with the extra quantum number $n$ labeling the degenerate icosahedral harmonics, we can also exploit the extra information in the $q$ off-diagonal triple correlations $T_l(q, q')$, which we do not exploit currently. In other words, we are currently using only a fraction of the data available to us by working only with the $q$-diagonal values of the $T_l$ quantities.
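The exhaustive sign search described in this section for the non-degenerate harmonics, followed by sign propagation to another shell, can be sketched as follows. This is a minimal illustration under stated assumptions: the tensor `b` is filled with random numbers and merely stands in for the true coefficients $b(l_1, l, l_2)$, the "measured" triple correlations are synthesized from known amplitudes, and the R factor is taken to be a normalized sum of absolute differences, a definition chosen here and not given in the text. All function names are hypothetical.

```python
import itertools
import numpy as np


def triple_model(g, b):
    """T_l(q, q) = sum_{l1, l2} b[l1, l, l2] * g[l1] * g[l] * g[l2]."""
    return np.einsum("ijk,i,j,k->j", b, g, g, g)


def r_factor(t_model, t_exp):
    """Assumed misfit: normalized sum of absolute differences."""
    return np.abs(t_model - t_exp).sum() / np.abs(t_exp).sum()


def exhaustive_sign_search(magnitude, b, t_exp):
    """Try every sign combination and keep the one with the lowest R factor."""
    best_signs, best_r = None, np.inf
    for signs in itertools.product((1.0, -1.0), repeat=len(magnitude)):
        r = r_factor(triple_model(np.array(signs) * magnitude, b), t_exp)
        if r < best_r:
            best_signs, best_r = np.array(signs), r
    return best_signs, best_r


def propagate_signs(b_cross, sign_ref):
    """sign g_l(q') = sign B_l(q_ref, q') * sign g_l(q_ref), since B_l = g_l g_l'."""
    return np.sign(b_cross) * sign_ref


rng = np.random.default_rng(0)
n_l = 11                                   # l = 0, 6, 10, ..., 28 in the text
b = rng.normal(size=(n_l, n_l, n_l))       # placeholder, NOT the real b(l1, l, l2)
g_ref = rng.normal(size=n_l)               # synthetic g_l on the reference shell
t_exp = triple_model(g_ref, b)             # stands in for the measured T_l(q, q)

magnitude = np.abs(g_ref)                  # in practice sqrt(B_l(q, q))
signs, r_best = exhaustive_sign_search(magnitude, b, t_exp)

g_other = rng.normal(size=n_l)             # synthetic g_l on a second shell
b_cross = g_ref * g_other                  # B_l(q_ref, q') for each l
signs_other = propagate_signs(b_cross, signs)
```

The loop visits all $2^{11} = 2048$ combinations, which is why the exhaustive approach is practical here but not for the much larger sign space of the helical case.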

IV. HELICAL VIRUS
By the theory of Cochran, Crick, and Vand (CCV), a helix with $u$ subunits or proteins per period should have an integral multiple of $u$-fold azimuthal symmetry. 20,21 For the tobacco mosaic virus (TMV) with a radius of 100 Å, $u = 49$. For a resolution of 12 Å, $l_{max} = 48$. Since $|m| \le l \le 48 < u$, only zero-fold, i.e., azimuthal, symmetry is possible below this resolution. Consequently, only the $m = 0$ term exists in Eq. (3), which reduces to
$$B_l(q, q') = I_{l0}(q)\, I_{l0}(q'). \tag{20}$$
From Eq. (20), we have
$$|I_{l0}(q)| = \sqrt{B_l(q, q)}, \tag{21}$$
which has exactly the same form as in the icosahedral case except now the angular momentum goes up to 48, and we have a much larger number of possible sign combinations. Again, we make use of the triple correlations to determine the signs. The theoretical expression for the triple correlations may be written
$$T_l(q, q') = \sum_{l_1 l_2} G(l_1, 0; l, 0; l_2, 0)\, I_{l_1 0}(q)\, I_{l0}(q')\, I_{l_2 0}(q). \tag{22}$$
To demonstrate this method, the simulated experimental data for $B$ and $T$ of TMV were obtained directly from CCV theory. In this case, since the number of sign combinations is too large for an exhaustive search, we use the simulated annealing (SA) method instead to minimize the difference between theoretical and experimental triple correlations on a reference shell. In the case of a helical virus, it turns out that the magnitude of $I_{l0}(q)$ does not decay rapidly as a function of $l$. Therefore, any shell can be chosen as the reference shell without causing numerical instability. The SA algorithm was started with a sufficiently high temperature, giving a high acceptance rate (about 97%), and a random sign combination $\mathrm{sgn}(q_{ref}, l)$ at a reference shell; the temperature was then slowly decreased with a cooling rate of 0.9. Each sign at temperature $T$ was flipped one at a time, and the new configuration was always accepted if the energy parameter decreased, i.e., if $\Delta R$ was negative. In the present case, $\Delta R$ is defined as the change of the R factor between theoretical and experimental ($q$-diagonal) triple correlations at $q = q_{ref}$.
On the other hand, if the energy increased, i.e., if $\Delta R$ was positive, the configuration was accepted with a probability $e^{-\Delta R/T}$ according to the Metropolis criterion. 22 At each temperature, 500 initial sweeps were performed on all signs to thermalize the system. Each sweep consists of successive flips over all the signs. Then 500 additional sweeps were performed and the R factors sampled and averaged. The temperature was lowered, and the same procedure was repeated until the acceptance rate was very low or the average R factor stayed constant for 4 successive temperatures.
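The annealing schedule described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the results: the tensor `G` is a random placeholder for the true coefficients $G(l_1, 0; l, 0; l_2, 0)$, the R factor is assumed to be a normalized sum of absolute differences, and the starting temperature, the acceptance threshold `min_accept`, and the tolerance `tol` for a "constant" average R factor are hypothetical parameters. The sketch also keeps track of the best configuration visited, which the text does not mention.

```python
import numpy as np


def triple_model(coeff, G):
    """T_l(q, q) = sum_{l1, l2} G[l1, l, l2] * I_l1 * I_l * I_l2 (m = 0 only)."""
    return np.einsum("ijk,i,j,k->j", G, coeff, coeff, coeff)


def r_factor(signs, magnitude, G, t_exp):
    """Assumed misfit: normalized sum of absolute differences."""
    t_model = triple_model(signs * magnitude, G)
    return np.abs(t_model - t_exp).sum() / np.abs(t_exp).sum()


def anneal_signs(magnitude, G, t_exp, rng, temp=1.0, cooling=0.9,
                 n_sweeps=500, min_accept=0.01, n_flat=4, tol=1e-6):
    n = len(magnitude)
    signs = rng.choice([-1.0, 1.0], size=n)       # random starting signs
    r = r_factor(signs, magnitude, G, t_exp)
    best_signs, best_r = signs.copy(), r
    history = []                                  # average R at each temperature
    while True:
        accepted, samples = 0, []
        for sweep in range(2 * n_sweeps):         # thermalize, then sample
            for i in range(n):                    # one sweep flips each sign once
                signs[i] = -signs[i]
                r_new = r_factor(signs, magnitude, G, t_exp)
                d_r = r_new - r
                # Metropolis: always accept downhill, uphill with exp(-dR/T).
                if d_r <= 0 or rng.random() < np.exp(-d_r / temp):
                    r = r_new
                    accepted += 1
                    if r < best_r:
                        best_signs, best_r = signs.copy(), r
                else:
                    signs[i] = -signs[i]          # rejected: undo the flip
            if sweep >= n_sweeps:
                samples.append(r)
        history.append(np.mean(samples))
        rate = accepted / (2 * n_sweeps * n)
        recent = history[-n_flat:]
        flat = len(history) >= n_flat and max(recent) - min(recent) < tol
        if rate < min_accept or flat:
            return best_signs, best_r
        temp *= cooling                           # cooling rate of 0.9


rng = np.random.default_rng(1)
n_l = 8                                   # small demo; the text uses l up to 48
G = rng.normal(size=(n_l, n_l, n_l))      # placeholder, NOT real G coefficients
coeff_true = rng.normal(size=n_l)         # synthetic I_l0 on the reference shell
t_exp = triple_model(coeff_true, G)       # stands in for the measured T_l(q, q)
magnitude = np.abs(coeff_true)            # in practice sqrt(B_l(q, q))
signs, r_best = anneal_signs(magnitude, G, t_exp, rng, n_sweeps=20)
```

The demo uses 20 sweeps per stage to keep the run short; the text uses 500 thermalization sweeps and 500 sampling sweeps at each temperature.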
The final optimized combination of signs was used to find the $I_{l0}(q)$'s at the reference shell. Again, after the optimum sign combination was found by triple correlation on a reference shell, the signs were propagated to other shells by the non-diagonal pair correlation matrix $B_l(q, q')$ in Eq. (20). Having thus obtained the signs of the $I_{l0}(q)$ coefficients in addition to their magnitudes, the diffraction volume was calculated by
$$I(q, \theta) = \sum_l I_{l0}(q)\, P_l(\cos(\theta)). \tag{23}$$
The comparison between this recovered intensity and the original CCV intensity along the layer lines is shown in Fig. 2. The recovered intensity was found to consist of discrete layer planes perpendicular to the helical axis ($q_z$ axis) at positions $q_z = 2\pi k/c$, where $k = 0, 1, 2$, etc., and $c$ is the period, as expected from fiber diffraction. As the recovered intensity along the layer lines is the same as that in a conventional fiber diffraction experiment, the standard fiber diffraction methods for phasing fiber diffraction intensities can be applied to obtain the structure of TMV. 14,15 Finally, the real-space image in Fig. 3 was reconstructed, as described in the caption, from a diffraction volume of a single $c$-repeat unit, and the image was repeated 5 times to produce a 5 $c$-repeat unit structure. We reconstructed this real-space image from the diffraction volume with the same flipping algorithm that we used for the icosahedral virus. It should be pointed out that the reconstruction of a real-space structure from layer-line intensities has been perfected over the years in fiber diffraction. Therefore, it is sufficient for us to reconstruct the correct layer-line intensities by this method to enable "fiber diffraction without fibers." 20
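Once signs and magnitudes are known, the Legendre sum $I(q, \theta) = \sum_l I_{l0}(q) P_l(\cos\theta)$ quoted above is straightforward to evaluate on one shell. The sketch below uses made-up coefficient values and a hypothetical function name `intensity`; it only illustrates the summation.

```python
import numpy as np
from numpy.polynomial import legendre


def intensity(coeffs_l0, theta):
    """I(q, theta) = sum_l I_l0(q) * P_l(cos(theta)) for a single shell q.

    coeffs_l0[l] holds I_l0(q) for l = 0, 1, 2, ... (zero for unused l).
    """
    return legendre.legval(np.cos(theta), coeffs_l0)


coeffs = np.array([1.0, 0.0, 0.5])        # made-up I_00, I_10, I_20
theta = np.array([0.0, np.pi / 2])        # along the axis and perpendicular to it
values = intensity(coeffs, theta)
```

Repeating the evaluation for every shell $q$ and a grid of $\theta$ fills the azimuthally symmetric diffraction volume.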

V. CONCLUSION
We have shown that for molecules with either icosahedral or helical symmetry, the symmetries or near-symmetries associated with almost all regular viruses, 6 the spherical harmonic components of the intensity can be obtained directly, up to a sign, from the pair correlations $B$. In order to resolve the sign ambiguity, the triple correlations of the experimental intensities are exploited. For the helical TMV, we demonstrated how the fiber diffraction intensity can be recovered without the need for aligning the randomly oriented fibers by experimental means. In both cases, a reconstructed real-space image was obtained by an iterative phasing algorithm applied to the recovered diffraction volume.