FreeSAS Package

This is the public API of FreeSAS. Since most of the number crunching part is implemented in Cython, it is not documented here: please refer to the source code.

freesas.align

class InputModels[source]

Bases: object

assign_models(molecule=None)[source]

Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.

Parameters:molecule – optional 2d array, coordinates of the atoms for the model to create
Return self.models:
 list of SASModel
rcalculation()[source]

Calculation the maximal value for the R-factors, which is the mean of all the R-factors of inputs plus 2 times the standard deviation. R-factors are saved in the attribute self.rfactors, 1d array, and in percentage.

Return rmax:maximal value for the R-factor
models_selection()[source]

Check if each model respect the limit for the R-factor

Return self.validmodels:
 1d array, 0 for a non valid model, else 1
rfactorplot(filename=None, save=False)[source]

Create a png file with the table of R factor for each model. A threshold is computed to discarded models with Rfactor>Rmax.

Parameters:
  • filename – filename for the figure, default to Rfactor.png
  • save – save automatically the figure if True, else show it
Return fig:

the wanted figures

class AlignModels(files, slow=True, enantiomorphs=True)[source]

Bases: object

Used to align DAM from pdb files

assign_models()[source]

Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.

Return self.models:
 list of SASModel
optimize(reference, molecule, symmetry)[source]

Use scipy.optimize to optimize transformation parameters to minimize NSD

Parameters:
  • reference – SASmodel
  • molecule – SASmodel
  • symmetry – 3-list of +/-1
Return p:

transformation parameters optimized

Return dist:

NSD after optimization

alignment_sym(reference, molecule)[source]

Apply 8 combinations to the molecule and select the one which minimize the distance between it and the reference.

Parameters:
  • reference – SASModel, the one which do not move
  • molecule – SASModel, the one wich has to be aligned
Return combinaison:
 

best symmetry to minimize NSD

Return p:

transformation parameters optimized if slow is true, unoptimized else

makeNSDarray()[source]

Calculate the NSD correlation table and save it in self.arrayNSD

Return self.arrayNSD:
 2d array, NSD correlation table
plotNSDarray(rmax=None, filename=None, save=False)[source]

Create a png file with the table of NSD and the average NSD for each model. A threshold is computed to segregate good models and the ones to exclude.

Parameters:
  • rmax – threshold of R factor for the validity of a model
  • filename – filename for the figure, default to nsd.png
  • save – save automatically the figure if True, else show it
Return fig:

the wanted figures

find_reference()[source]

Find the reference model among the models aligned. The reference model is the one with lower average NSD with other models.

Return ref_number:
 position of the reference model in the list self.models
alignment_reference(ref_number=None)[source]

Align all models in self.models with the reference one. The aligned models are saved in pdb files (names in list self.outputfiles)

alignment_2models(save=True)[source]

Align two models using the first one as reference. The aligned models are save in pdb files.

Return dist:NSD after alignment

freesas.autorg

Functions for calculating the radius of gyration and forward scattering intensity.

auto_gpa(data, Rg_min=1.0, qRg_max=1.3, qRg_min=0.5)[source]

Uses the GPA theory to guess quickly Rg, the radius of gyration and I0, the forwards scattering

The theory is described in Guinier peak analysis for visual and automated inspection of small-angle X-ray scattering data Christopher D. Putnam J. Appl. Cryst. (2016). 49, 1412–1419

This fits sqrt(q²Rg²)*exp(-q²Rg²/3)*I0/Rg to the curve I*q = f(q²)

The Guinier region goes arbitrary from 0.5 to 1.3 q·Rg qRg_min and qRg_max can be provided

Parameters:
  • data – the raw data read from disc. Only q and I are used.
  • Rg_min – the minimal accpetable value for the radius of gyration
  • qRg_max – the default upper bound for the Guinier region.
  • qRg_min – the default lower bound for the Guinier region.
Returns:

autRg result with limited information

auto_guinier(data, Rg_min=1.0, qRg_max=1.3, relax=1.2)[source]

Yet another implementation of the Guinier fit

The idea: * extract the reasonable range * convert to the Guinier space (ln(I) = f(q²) * scan all possible intervall * keep any with qRg_max<1.3 (or 1.5 in relaxed mode) * select the begining and the end of the guinier region according to the contribution of two parameters:

  • (q_max·Rg - q_min·Rg)/qRg_max –> in favor of large ranges
  • 1 / RMSD –> in favor of good quality data

For each start and end point, the contribution of all ranges are averaged out (using histograms) The best solution is the start/end position with the maximum average.

  • All ranges within this region are averaged out to measure Rg, I0 and more importantly their deviation.
  • The quality is still to be calculated
  • Aggergation is assessed according a second order polynom fit.
Parameters:
  • data – 2D array with (q,I,err)
  • Rg_min – minimum value for Rg
  • qRg_max – upper bound of the Guinier region
  • relax – relaxation factor for the upper bound
  • resolution – step size of the slope histogram
Returns:

autRg result

freesas.average

class Grid(inputfiles)[source]

Bases: object

This class is used to create a grid which include all the input models

spatial_extent()[source]

Calculate the maximal extent of input models

Return self.size:
 6-list with x,y,z max and then x,y,z min
calc_radius(nbknots=None)[source]

Calculate the radius of each point of a hexagonal close-packed grid, knowing the total volume and the number of knots in this grid.

Parameters:nbknots – number of knots wanted for the grid
Return radius:the radius of each knot of the grid
make_grid()[source]

Create a grid using the maximal size and the radius previously computed. The geometry used is a face-centered cubic lattice (fcc).

Return knots:2d-array, coordinates of each dot of the grid. Saved as self.coordknots.
class AverModels(inputfiles, grid)[source]

Bases: object

Provides tools to create an averaged models using several aligned dummy atom models

read_files(reference=None)[source]

Read all the pdb file in the inputfiles list, creating SASModels. The SASModels created are save in a list, the reference model is the first model in the list.

Parameters:reference – position of the reference model file in the inputfiles list
calc_occupancy(griddot)[source]

Assign an occupancy and a contribution factor to the point of the grid.

Parameters:griddot – 1d-array, coordinates of a point of the grid
Return tuple:2-tuple containing (occupancy, contribution)
assign_occupancy()[source]

For each point of the grid, total occupancy and contribution factor are computed and saved. The grid is then ordered with decreasing value of occupancy. The fourth column of the array correspond to the occupancy of the point and the fifth to the contribution for this point.

Return sortedgrid:
 2d-array, coordinates of each point of the grid
make_header()[source]

Create the layout of the pdb file for the averaged model.

save_aver(filename)[source]

Save the position of each occupied dot of the grid, its occupancy and its contribution in a pdb file.

Parameters:filename – name of the pdb file to write

freesas.bift

Bayesian Inverse Fourier Transform

This code is the implementation of Steen Hansen J. Appl. Cryst. (2000). 33, 1415-1421

Based on the BIFT from Jesse Hopkins, available at: https://sourceforge.net/p/bioxtasraw/git/ci/master/tree/bioxtasraw/BIFT.py

Many thanks to Pierre Paleo for the auto-alpha guess

auto_bift(data, Dmax=None, alpha=None, npt=100, start_point=None, end_point=None, scan_size=11, Dmax_over_Rg=3)[source]

Calculates the inverse Fourier tranform of the data using an optimisation of the evidence

Parameters:
  • data – 2D array with q, I(q), δI(q). q can be in 1/nm or 1/A, it imposes the unit for r & Dmax
  • Dmax – Maximum diameter of the object, this is the starting point to be refined. Can be guessed
  • alpha – Regularisation parameter, let it to None for automatic scan
  • npt – Number of point for the curve p(r)
  • start_point – First useable point in the I(q) curve, this is not the start of the Guinier region
  • end_point – Last useable point in the I(q) curve
  • scan_size – size of the initial geometrical scan for alpha values.
  • Dmax_over_Rg – In average, protein’s Dmax is 3x Rg, use this to adjust
Returns:

BIFT object. Call the get_best to retrieve the optimal solution

freesas.collections

Set of namedtuples defined a bit everywhere

class RG_RESULT(Rg, sigma_Rg, I0, sigma_I0, start_point, end_point, quality, aggregated)

Bases: tuple

I0

Alias for field number 2

Rg

Alias for field number 0

aggregated

Alias for field number 7

end_point

Alias for field number 5

quality

Alias for field number 6

sigma_I0

Alias for field number 3

sigma_Rg

Alias for field number 1

start_point

Alias for field number 4

class FIT_RESULT(slope, sigma_slope, intercept, sigma_intercept, R, R2, chi2, RMSD)

Bases: tuple

R

Alias for field number 4

R2

Alias for field number 5

RMSD

Alias for field number 7

chi2

Alias for field number 6

intercept

Alias for field number 2

sigma_intercept

Alias for field number 3

sigma_slope

Alias for field number 1

slope

Alias for field number 0

class RT_RESULT(Vc, sigma_Vc, Qr, sigma_Qr, mass, sigma_mass)

Bases: tuple

Qr

Alias for field number 2

Vc

Alias for field number 0

mass

Alias for field number 4

sigma_Qr

Alias for field number 3

sigma_Vc

Alias for field number 1

sigma_mass

Alias for field number 5

class RadiusKey(Dmax, npt)

Bases: tuple

Dmax

Alias for field number 0

npt

Alias for field number 1

class PriorKey(type, npt)

Bases: tuple

npt

Alias for field number 1

type

Alias for field number 0

class TransfoValue(transfo, B, sum_dia)

Bases: tuple

B

Alias for field number 1

sum_dia

Alias for field number 2

transfo

Alias for field number 0

class EvidenceKey(Dmax, alpha, npt)

Bases: tuple

Dmax

Alias for field number 0

alpha

Alias for field number 1

npt

Alias for field number 2

class EvidenceResult(evidence, chi2r, regularization, radius, density, converged)

Bases: tuple

chi2r

Alias for field number 1

converged

Alias for field number 5

density

Alias for field number 4

evidence

Alias for field number 0

radius

Alias for field number 3

regularization

Alias for field number 2

class StatsResult(radius, density_avg, density_std, evidence_avg, evidence_std, Dmax_avg, Dmax_std, alpha_avg, alpha_std, chi2r_avg, chi2r_std, regularization_avg, regularization_std, Rg_avg, Rg_std, I0_avg, I0_std)

Bases: tuple

Dmax_avg

Alias for field number 5

Dmax_std

Alias for field number 6

I0_avg

Alias for field number 15

I0_std

Alias for field number 16

Rg_avg

Alias for field number 13

Rg_std

Alias for field number 14

alpha_avg

Alias for field number 7

alpha_std

Alias for field number 8

chi2r_avg

Alias for field number 9

chi2r_std

Alias for field number 10

density_avg

Alias for field number 1

density_std

Alias for field number 2

evidence_avg

Alias for field number 3

evidence_std

Alias for field number 4

radius

Alias for field number 0

regularization_avg

Alias for field number 11

regularization_std

Alias for field number 12

save(filename, source=None)

Save the results of the fit to the file

save_bift(stats, filename, source=None)[source]

Save the results of the fit to the file

class GOF(n, c, P)

Bases: tuple

P

Alias for field number 2

c

Alias for field number 1

n

Alias for field number 0

freesas.cormap

class LongestRunOfHeads[source]

Bases: object

Implements the “longest run of heads” by Mark F. Schilling The College Mathematics Journal, Vol. 21, No. 3, (1990), pp. 196-207

See: http://www.maa.org/sites/default/files/pdf/upload_library/22/Polya/07468342.di020742.02p0021g.pdf

A(n, c)[source]

Calculate A(number_of_toss, length_of_longest_run)

Parameters:
  • n – number of coin toss in the experiment, an integer
  • c – length of the longest run of
Returns:

The A parameter used in the formula

B(n, c)[source]

Calculate B(number_of_toss, length_of_longest_run) to have either a run of Heads either a run of Tails

Parameters:
  • n – number of coin toss in the experiment, an integer
  • c – length of the longest run of
Returns:

The B parameter used in the formula

probaHeadOrTail(n, c)[source]

Calculate the probability of a longest run of head or tails to occur

Parameters:
  • n – number of coin toss in the experiment, an integer
  • c – length of the longest run of heads or tails, an integer
Returns:

The probablility of having c subsequent heads or tails in a n toss of fair coin

probaLongerRun(n, c)[source]

Calculate the probability for the longest run of heads or tails to exceed the observed length

Parameters:
  • n – number of coin toss in the experiment, an integer
  • c – length of thee observed run of heads or tails, an integer
Returns:

The probablility of having more than c subsequent heads or tails in a n toss of fair coin

gof(data1, data2)[source]

Calculate the probability for a couple of dataset to be equivalent

Implementation according to: http://www.nature.com/nmeth/journal/v12/n5/full/nmeth.3358.html

Parameters:
  • data1 – numpy array
  • data2 – numpy array
Returns:

probablility for the 2 data to be equivalent

freesas.invariants

This module is mainly about the calculation of the Rambo-Tainer invariant described in:

https://dx.doi.org/10.1038%2Fnature12070

Some formula taken from Putnam et al, 2007, Table 1 in the review

extrapolate(data, guinier)[source]

Extrapolate SAS data according to the Guinier fit until q=0 Uncertainties are extrapolated (linearly) from the Guinier region

Parameters:
  • data – SAS data in q,I,dI format
  • guinier – result of a Guinier fit
Returns:

extrapolated SAS data

calc_Porod(data, guinier)[source]

Calculate the particle volume according to Porod’s formula:

V = 2*π²I₀²/(sum_q I(q)q² dq)

Formula from Putnam’s review, 2007, table 1 Intensities are extrapolated to q=0 using Guinier fit.

Parameters:
  • data – SAS data in q,I,dI format
  • Guinier – result of a Guinier fit (instance of RT_RESULT)
Returns:

Volume calculated according to Porrod’s formula

calc_Vc(data, Rg, dRg, I0, dI0, imin)[source]

Calculates the Rambo-Tainer invariant Vc, including extrapolation to q=0

Parameters:
  • data – SAS data in q,I,dI format, cropped to maximal q that should be used for calculation (normally 2 nm-1)
  • Rg,dRg,I0,dI0 – results from Guinier approximation/autorg
  • imin – minimal index of the Guinier range, below that index data will be extrapolated by the Guinier approximation
Returns:

Vc and an error estimate based on non-correlated error propagation

calc_Rambo_Tainer(data, guinier, qmax=2.0)[source]

calculates the invariants Vc and Qr from the Rambo & Tainer 2013 Paper, also the the mass estimate based on Qr for proteins

Parameters:
  • data – data in q,I,dI format, q in nm^-1
  • guinier – RG_RESULT instance with result from the Guinier fit
  • qmax – maximum q-value for the calculation in nm^-1

@return: dict with Vc, Qr and mass plus errors

freesas.model

delta_expand(vec1, vec2)[source]

Create a 2d array with the difference vec1[i]-vec2[j]

Parameters:vec2 (vec1,) – 1d-array
Return v1 - v2:difference for any element of v1 and v2 (i.e a 2D array)
class SASModel(molecule=None)[source]

Bases: object

Tools for Dummy Atoms Model manipulation

read(filename)[source]

Read the PDB file, extract coordinates of each dummy atom, extract the R-factor of the model, coordinates of each dummy atom and pdb file header.

Parameters:filename – name of the pdb file to read
save(filename)[source]

Save the position of each dummy atom in a PDB file.

Parameters:filename – name of the pdb file to write
centroid()[source]

Calculate the position of the center of mass of the molecule.

Return self.com:
 1d array, coordinates of the center of mass of the molecule
inertiatensor()[source]

calculate the inertia tensor of the protein

Return self.inertensor:
 inertia tensor of the molecule
canonical_translate()[source]

Calculate the translation matrix to translate the center of mass of the molecule on the origin of the base.

Return trans:translation matrix
canonical_rotate()[source]

Calculate the rotation matrix to align inertia momentum of the molecule on principal axis.

Return rot:rotation matrix det==1
canonical_parameters()[source]

Save the 6 canonical parameters of the initial molecule: x0, y0, z0, the position of the center of mass phi, theta, psi, the three Euler angles of the canonical rotation (axis:x,y’,z’‘)

calc_invariants(use_cython=True)[source]
Calculate the invariants of the structure:
  • fineness, ie. average distance between an atoms and its nearest neighbor
  • radius of gyration of the model
  • diameter of the model
Return invariants:
 3-tuple containing (fineness, Rg, Dmax)
fineness
Rg
Dmax
dist(other, molecule1, molecule2, use_cython=True)[source]

Calculate the distance with another model.

Parameters:
  • self,other – two SASModel
  • molecule1 – 2d array of the position of each atom of the first molecule
  • molecule2 – 2d array of the position of each atom of the second molecule
Return D:

NSD between the 2 molecules, in their position molecule1 and molecule2

transform(param, symmetry, reverse=None)[source]

Calculate the new coordinates of each dummy atoms of the molecule after a transformation defined by six parameters and a symmetry

Parameters:
  • param – 6 parameters of transformation (3 coordinates of translation, 3 Euler angles)
  • symmetry – list of three constants which define a symmetry to apply
Return mol:

2d array, coordinates after transformation

dist_after_movement(param, other, symmetry)[source]

The first molecule, molref, is put on its canonical position. The second one, mol2, is moved following the transformation selected

Parameters:
  • param – list of 6 parameters for the transformation, 3 coordinates of translation and 3 Euler angles
  • symmetry – list of three constants which define a symmetry to apply
Return distance:
 

the NSD between the first molecule and the second one after its movement

freesas.plot

Functions to generating graphs related to SAS.

scatter_plot(data, guinier=None, ift=None, filename=None, img_format='svg', unit='nm', title='Scattering curve', ax=None, labelsize=None, fontsize=None)[source]

Generate a scattering plot I = f(q) in semi_log_y.

Parameters:
  • data – data read from an ASCII file, 3 column (q,I,err)
  • filename – name of the file where the cuve should be saved
  • img_format – image format
  • unit – Unit name for Rg and 1/q
  • guinier – output of autoRg
  • ift – converged instance of BIFT (output of auto_bift)
  • ax – subplot where the plot shall go in
Returns:

the matplotlib figure

kratky_plot(data, guinier, filename=None, img_format='svg', unit='nm', title='Dimensionless Kratky plot', ax=None, labelsize=None, fontsize=None)[source]

Generate a Kratky plot q²Rg²I/I₀ = f(q·Rg)

Parameters:
  • data – data read from an ASCII file, 3 column (q,I,err)
  • guinier – output of autoRg
  • filename – name of the file where the cuve should be saved
  • img_format – image format
  • unit – Unit name for Rg and 1/q
  • ax – subplot where the plot shall go in
Returns:

the matplotlib figure

guinier_plot(data, guinier, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]

Generate a guinier plot: ln(I) = f(q²)

Parameters:
  • data – data read from an ASCII file, 3 column (q,I,err)
  • guinier – A RG_RESULT object from AutoRg
  • filename – name of the file where the cuve should be saved
  • img_format – image format
Param:

ax: subplot where to plot in

Returns:

the matplotlib figure

density_plot(ift, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]

Generate a density plot p(r)

@param ift: An IFT result comming out of BIFT @param filename: name of the file where the cuve should be saved @param img_format: image image format @param ax: subplotib where to plot in @return: the matplotlib figure

plot_all(data, filename=None, img_format=None, unit='nm', labelsize=None, fontsize=None)[source]

freesas.sasio

Contains helper functions for loading SAS data from differents sources.

load_scattering_data(filename: Union[os.PathLike, str, bytes]) → numpy.ndarray[source]

Load scattering data q, I, err into a numpy array.

Parameters:filename – ASCII file, 3 column (q,I,err)
Returns:numpy array with 3 column (q,I,err)
parse_ascii_data(input_file_text: List[str], number_of_columns: int) → numpy.ndarray[source]

Parse data from an ascii file into an N column numpy array

Parameters:
  • input_file_text – List containing one line of input data per element
  • number_of_columns – Expected number of lines in the data file
Returns:

numpy array with 3 column (q,I,err)

convert_inverse_angstrom_to_nanometer(data_in_inverse_angstrom: numpy.ndarray) → numpy.ndarray[source]

Convert data with q in 1/Å to 1/nm.

Parameters:data_in_inverse_angstrom – numpy array in format (q_in_inverse_Angstrom,I,err)
Returns:numpy array with 3 column (q_in_inverse_nm,I,err)