This directory contains ia32 SSE assembly implementations for some
of the most important Gromacs nonbonded functions.

Since we want it to work with any compiler we cannot use gcc inline
assembly (Portland and Microsoft don't support it), so they are 
coded entirely in assembly. This is slightly (10-15%) faster too.

It is not as bad as it looks to edit them - search the net for a 
tutorial on x86 assembly. The code is using the intel syntax instead
of the old AT&T syntax employed in early versions of gcc.

I've made pretty good use of registers, and frequently use them to
save information needed several lines further down.
Thus, if you need to add something in the code you might have to 
save a couple of registers on the stack and reload them when you 
are done. 

Normally, SSE works on 4 floats in parallel, and SSE2 on 2 doubles.
This means I've had to write special sections to take care of the
remaining odd elements when neighborlists are not a multiple of 4.
When changing this code you might have to be careful and zero the
unused elements; if the first three positions are valid but the 4th
NaN (not a number), the energy would be NaN if we add them!

The following loops have been implemented in SSE:
(There are also non-force versions in each file)

nb010_sse (No Coul, VdW=Lennard-Jones, no water optimization)
nb030_sse (No Coul, VdW=Table, no water optimization)
nb100_sse (Coul=Normal, No VdW, no water optimization)
nb101_sse (Coul=Normal, No VdW, water=SPC/TIP3P-other atom)
nb102_sse (Coul=Normal, No VdW, water=SPC/TIP3P-SPC/TIP3P)
nb103_sse (Coul=Normal, No VdW, water=TIP4P-other atom)
nb104_sse (Coul=Normal, No VdW, water=TIP4P-TIP4P)
nb111_sse (Coul=Normal, VdW=L-J, water=SPC/TIP3P-other atom)
nb112_sse (Coul=Normal, VdW=L-J, water=SPC/TIP3P-SPC/TIP3P)
nb113_sse (Coul=Normal, VdW=L-J, water=TIP4P-other atom)
nb114_sse (Coul=Normal, VdW=L-J, water=TIP4P-TIP4P)
nb201_sse (Coul=RF, No VdW, water=SPC/TIP3P-other atom)
nb202_sse (Coul=RF, No VdW, water=SPC/TIP3P-SPC/TIP3P)
nb203_sse (Coul=RF, No VdW, water=TIP4P-other atom)
nb204_sse (Coul=RF, No VdW, water=TIP4P-TIP4P)
nb211_sse (Coul=RF, VdW=L-J, water=SPC/TIP3P-other atom)
nb212_sse (Coul=RF, VdW=L-J, water=SPC/TIP3P-SPC/TIP3P)
nb213_sse (Coul=RF, VdW=L-J, water=TIP4P-other atom)
nb214_sse (Coul=RF, VdW=L-J, water=TIP4P-TIP4P)
nb301_sse (Coul=Table, No VdW, water=SPC/TIP3P-other atom)
nb302_sse (Coul=Table, No VdW, water=SPC/TIP3P-SPC/TIP3P)
nb303_sse (Coul=Table, No VdW, water=TIP4P-other atom)
nb304_sse (Coul=Table, No VdW, water=TIP4P-TIP4P)
nb311_sse (Coul=Table, VdW=L-J, water=SPC/TIP3P-other atom)
nb312_sse (Coul=Table, VdW=L-J, water=SPC/TIP3P-SPC/TIP3P)
nb313_sse (Coul=Table, VdW=L-J, water=TIP4P-other atom)
nb314_sse (Coul=Table, VdW=L-J, water=TIP4P-TIP4P)
nb331_sse (Coul=Table, VdW=Table, water=SPC/TIP3P-other atom)
nb332_sse (Coul=Table, VdW=Table, water=SPC/TIP3P-SPC/TIP3P)
nb333_sse (Coul=Table, VdW=Table, water=TIP4P-other atom)
nb334_sse (Coul=Table, VdW=Table, water=TIP4P-TIP4P)


