author Willy Sudiarto Raharjo <willysr@slackbuilds.org> 2022-03-08 00:11:39 +0700
committer Willy Sudiarto Raharjo <willysr@slackbuilds.org> 2022-03-08 00:11:39 +0700
commit a36d012001a96f7d97959cc7a2a279090bf4631c
tree e58c2a76ce34ec2a66e27ea29aa2e5e91bf4cc9b /libraries
parent c946ccd4de52a7ba91af30c87f13f809d517ce4a
libraries/atlas: Removed (use OpenBLAS).
Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
Diffstat (limited to 'libraries')
-rw-r--r-- libraries/atlas/AMD64K10h64SSE3.tgz   bin 11038 -> 0 bytes
-rw-r--r-- libraries/atlas/README                15
-rw-r--r-- libraries/atlas/README.SLACKWARE      135
-rw-r--r-- libraries/atlas/TimingResults.txt     62
-rw-r--r-- libraries/atlas/atlas.SlackBuild      438
-rw-r--r-- libraries/atlas/atlas.info            10
-rw-r--r-- libraries/atlas/atlas.patch           5072
-rw-r--r-- libraries/atlas/slack-desc            19
8 files changed, 0 insertions, 5751 deletions
diff --git a/libraries/atlas/AMD64K10h64SSE3.tgz b/libraries/atlas/AMD64K10h64SSE3.tgz
deleted file mode 100644
index 727f3748db..0000000000
--- a/libraries/atlas/AMD64K10h64SSE3.tgz
+++ /dev/null
Binary files differ
diff --git a/libraries/atlas/README b/libraries/atlas/README
deleted file mode 100644
index f8b90a9b83..0000000000
--- a/libraries/atlas/README
+++ /dev/null
@@ -1,15 +0,0 @@
-ATLAS (Automatically Tuned Linear Algebra Software) is an ongoing
-research effort focusing on applying empirical techniques in order to
-provide portable performance. At present, it provides C and Fortran77
-interfaces to a portably efficient BLAS implementation, as well as a few
-routines from LAPACK. Nevertheless, by default, this SlackBuild also
-builds a full LAPACK linked with ATLAS. If you are really sure that you
-don't want this, set LAPACK_SOURCE to the empty string when running this
-script.
-
-This conflicts with cblas and lapack (not to be confused with lapack-atlas).
-Nevertheless, it should be possible to avoid these conflicts by proper use
-of the SYS_DESTDIR variable.
-
-The impatient may just switch CPU throttling off and run the script, but
-you are advised to read over README.SLACKWARE *in advance*.
diff --git a/libraries/atlas/README.SLACKWARE b/libraries/atlas/README.SLACKWARE
deleted file mode 100644
index 826d5ddcf5..0000000000
--- a/libraries/atlas/README.SLACKWARE
+++ /dev/null
@@ -1,135 +0,0 @@
-IMPORTANT NOTES
-
-1) The present SlackBuild for ATLAS does not attempt to account for all
- configuration/build issues of ATLAS. Nevertheless, any relevant patches
- mentioned in the ATLAS Errata are applied.
-
-2) The script mostly assumes that you are installing on an x86 or x86_64
- platform and use gcc for compilation. If you decide to use other compilers or
- install on another platform, you are unfortunately on your own and welcome to
- suggest improvements or patches to this SlackBuild. There is one small
- exception to this: the USE_DWALL variable, see below.
-
-3) There is no "post install" tuning performed by this script.
-
-4) ATLAS does not conflict with the reference netlib BLAS. Nevertheless, if
- ATLAS got installed successfully you should consider removing netlib BLAS and
- (re)compiling every BLAS/LAPACK dependent package. Otherwise you may not have
- much gain from installing ATLAS.
-
-5) There is a strong interaction between ATLAS and LAPACK. By default ATLAS
- implements an optimized subset of LAPACK and creates the corresponding static
- library. Nevertheless, provided that the full LAPACK source is available,
- ATLAS builds a complete LAPACK library linked against its optimized BLAS
- implementation. This is what the atlas SlackBuild does by default. If you
- decide that you don't want this, make use of the LAPACK_SOURCE variable
- (see below).
-
-
-INSTALLATION DETAILS
-
-1) Make sure CPU throttling is off before starting the install. This is
- important, since ATLAS has to tune itself. As of Slackware 14.2 you
- can run /etc/rc.d/rc.cpufreq as root with "performance" as command line
- argument. To reset, run it again with what gets set at boot time (by
- default "ondemand") as command line argument.
-
-2) For the same reason, keep the extra load on the system as low as possible
- while building ATLAS.
-
-
-GENERIC SETUP VARIABLES
-
-1) SYS_DESTDIR is set by default to "/usr" and is the system destination
- directory. When installing the package produced by this SlackBuild,
- ATLAS's and LAPACK's files will be written to $SYS_DESTDIR/include,
- $SYS_DESTDIR/include/atlas and $SYS_DESTDIR/lib (or lib64).
- Documentation files are written to /usr/doc/atlas-$VERSION if not
- otherwise stated (see below).
- You may want to change the value of SYS_DESTDIR to avoid conflicts. If
- you do so, you have to make sure that these libraries and corresponding
- headers are found by the compiler or the configuration software used
- to build code depending on them.
- IMPORTANT: SYS_DESTDIR has to have an absolute path as value.
-
-2) DEFAULT_DOCS has the default value "yes", which means that docs go
- to /usr/doc/atlas-$VERSION, but you may want to let the docs go
- to $SYS_DESTDIR/doc/atlas-$VERSION. For this, just set this
- variable to "no".
-
-
-SETUP VARIABLES FOR ATLAS
-
-1) USE_ARCH_DEFAULTS defaults to "yes", which means that the library
- will be optimized by trying to take into account former builds done
- on a similar machine. Thus ATLAS will use predefined optimizations
- if available. This may greatly reduce the compilation time but may
- not give you the best result if you don't use the same gcc compiler
- version as the ATLAS author.
- Please note that with this variable set to "no", or if there are no
- known optimizations for your machine ATLAS compilation may last for
- many hours! Take a nap :-)
- NOTE: On the machine of this SlackBuild's author setting
- USE_ARCH_DEFAULTS to "no" provided libraries with definitely
- better performance. Compilation took about six hours.
-
-2) ARCH_DEF_DIR has different meanings, depending on the value of
- USE_ARCH_DEFAULTS:
- a) If USE_ARCH_DEFAULTS is "yes" and you have some custom architectural
- defaults, then you may set this to the absolute path of the directory
- containing the file with your custom defaults.
- b) If USE_ARCH_DEFAULTS is "no" and you would like to create custom
- architectural defaults then set this to the absolute path of the
- directory which should contain the file with the custom defaults.
- NOTE: Since this file is supposed to survive an upgrade, it doesn't
- get included in the Slackware package. You have to remove it
- by hand, if needed. A file named "ARCH_DEF_DIR" gets written
- to the documentation directory, to remind you where the created
- architectural defaults are. Make a backup of it, since it may
- get deleted with an upgrade.
- ARCH_DEF_DIR defaults to the empty string, which means that custom
- defaults are neither used nor created.
-
-3) USE_DWALL defaults to "no" which should be OK for x86 or x86_64 and the gcc
- compiler. If you are on another architecture than x86 and/or don't use gcc
- you need to set it to "yes".
-
-4) L2_CACHE_SIZE provides the size of the level 2 cache in bytes. By default it
- is deduced from /proc/cpuinfo but you can set the value manually if you
- wish or need to.
-
-5) NUM_THREADS allows you to set the maximum number of threads. By default it
- is "-1", which means autodetection. In this case it gets set equal to the
- number of available processors.
-
-6) USE_PROCESSORS is by default the empty string, which means that any of the
- available processors may be used. Nevertheless, under some circumstances,
- one may want to specify the processor IDs, e.g. "0 2 4". Please consult
- atlas_install.pdf, p. 13 for more information.
- NOTES: a) This is incompatible with the autodetection of the number of
- threads. Therefore NUM_THREADS must be greater than 1.
- b) Write just the processor IDs to this string, the script takes
- care of the rest. Take care to have NUM_THREADS equal to the
- number of processor IDs.
-
-7) SHARED_SWITCH is set by default to ask for building shared libs along with
- the static ones. Set this to the empty string, if you don't want to have
- shared libs.
-
-
-SETUP VARIABLES FOR LAPACK
-
-1) LAPACK_SOURCE: set this variable to the empty string if you don't want a
- full LAPACK library to be built.
-
-2) TEST_LAPACK: set this variable to "yes" if you would like to run the LAPACK
- tests. You will find the results of the tests in the documentation directory.
- This has no relevance, if you didn't allow for a full LAPACK build.
-
-3) LAPACK_TIMER sets the timer to be used for LAPACK. If you stay with
- gfortran, presently the default compiler on Slackware, you can leave the
- value as is. Otherwise, set it to "NONE" or read LAPACK's make.inc.example
- for more information.
- This has no relevance, if you didn't allow for a full LAPACK build.
-
-
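The NUM_THREADS and USE_PROCESSORS rules from the removed README.SLACKWARE can be sketched as a small pre-flight check. This is only an illustration under stated assumptions: the function name `check_thread_settings` is hypothetical and was never part of the SlackBuild; it simply encodes the two notes under USE_PROCESSORS (no autodetection, and as many threads as processor IDs).

```bash
#!/bin/bash
# Hypothetical helper, not part of atlas.SlackBuild: checks the documented
# constraints between NUM_THREADS ($1) and USE_PROCESSORS ($2).
check_thread_settings() {
  local num_threads="$1" processors="$2"
  # An empty processor list places no constraint on NUM_THREADS.
  [ -z "$processors" ] && return 0
  # Word-split the list of processor IDs into an array.
  local -a ids
  read -r -a ids <<< "$processors"
  # A processor list is incompatible with autodetection (-1): only plain
  # non-negative integers are accepted from here on.
  case "$num_threads" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # The README requires NUM_THREADS to be greater than 1 ...
  [ "$num_threads" -gt 1 ] || return 1
  # ... and equal to the number of processor IDs.
  [ "$num_threads" -eq "${#ids[@]}" ]
}

if check_thread_settings 3 "0 2 4"; then
  echo "settings are consistent"
fi
```

With these settings the check passes; `check_thread_settings -1 "0 2"` or `check_thread_settings 2 "0 2 4"` would return a non-zero status instead.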
diff --git a/libraries/atlas/TimingResults.txt b/libraries/atlas/TimingResults.txt
deleted file mode 100644
index 4cc33c0093..0000000000
--- a/libraries/atlas/TimingResults.txt
+++ /dev/null
@@ -1,62 +0,0 @@
-MACHINE: Intel Core2 Duo T9600 @ 2.80GHz
-COMPILER: gcc 5.3.0 (as shipped with Slackware Linux 14.2)
-
-The times labeled Reference are for ATLAS as installed by the authors.
-NAMING ABBREVIATIONS:
- kSelMM : selected matmul kernel (may be hand-tuned)
- kGenMM : generated matmul kernel
- kMM_NT : worst no-copy kernel
- kMM_TN : best no-copy kernel
- BIG_MM : large GEMM timing (usually N=1600); estimate of asymptotic peak
- kMV_N : NoTranspose matvec kernel
- kMV_T : Transpose matvec kernel
- kGER : GER (rank-1 update) kernel
-Kernel routines are not called by the user directly, and their
-performance is often somewhat different than the total
-algorithm (eg, dGER perf may differ from dkGER)
-
-
-AFTER A PARTIAL SEARCH, ARCH IDENTIFIED AS Core232SSE3
-======================================================
-
-Reference clock rate=2493Mhz, new rate=2801Mhz
- Refrenc : % of clock rate achieved by reference install
- Present : % of clock rate achieved by present ATLAS install
-
-                    single precision                  double precision
-            ********************************  *******************************
-                 real            complex           real            complex
-            ---------------  ---------------  ---------------  ---------------
-Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
-=========   ======= =======  ======= =======  ======= =======  ======= =======
-  kSelMM      578.5   363.2    564.7   577.7    334.6   352.5    325.1   336.5
-  kGenMM      156.3   101.2    156.5   102.0    159.9   159.2    161.7    97.3
-  kMM_NT      134.3   125.8    133.0   127.1    151.6   140.7    151.2   152.9
-  kMM_TN      154.8   101.3    152.6   101.1    142.4    90.8    149.7    94.2
-  BIG_MM      554.0   350.7    554.6   352.2    318.9   330.7    312.3   324.5
-  kMV_N        63.6    71.7    106.8    62.5     29.7    40.3     56.5    71.8
-  kMV_T        64.7    74.7    108.0    79.3     32.5    44.9     60.5    65.8
-  kGER         45.9    37.9     88.6    61.2     22.1    19.7     45.5    44.5
-
-
-AFTER A FULL SEARCH
-===================
-
-Reference clock rate=2493Mhz, new rate=2801Mhz
- Refrenc : % of clock rate achieved by reference install
- Present : % of clock rate achieved by present ATLAS install
-
-                    single precision                  double precision
-            ********************************  *******************************
-                 real            complex           real            complex
-            ---------------  ---------------  ---------------  ---------------
-Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
-=========   ======= =======  ======= =======  ======= =======  ======= =======
-  kSelMM      578.5   624.7    564.7   572.9    334.6   347.2    325.1   334.3
-  kGenMM      156.3   156.0    156.5   155.4    159.9   163.2    161.7   163.2
-  kMM_NT      134.3   104.8    133.0    96.9    151.6   140.5    151.2   144.5
-  kMM_TN      154.8   170.8    152.6   163.5    142.4   122.0    149.7   127.9
-  BIG_MM      554.0   527.8    554.6   558.3    318.9   331.3    312.3   331.0
-  kMV_N        63.6    72.1    106.8   118.8     29.7    44.8     56.5    79.1
-  kMV_T        64.7    78.8    108.0   134.4     32.5    45.5     60.5    88.3
-  kGER         45.9    40.2     88.6    74.6     22.1    21.7     45.5    44.8
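The tables in TimingResults.txt give each benchmark as a percentage of the clock rate, so the two installs can only be compared in absolute terms after scaling by their respective clocks (2493 MHz for the reference, 2801 MHz for the present install). A minimal sketch of that arithmetic follows, assuming the percentage means MFLOPS relative to the clock in MHz; that reading is an assumption and is not stated in the file. The function name `pct_to_mflops` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: converts a "% of clock rate" figure into an
# approximate MFLOPS value, assuming percent = 100 * MFLOPS / MHz.
pct_to_mflops() {
  # $1: percent of clock rate, $2: clock rate in MHz
  LC_ALL=C awk -v pct="$1" -v mhz="$2" \
    'BEGIN { printf "%.1f\n", pct * mhz / 100 }'
}

# Single precision real kSelMM after the full search (present install).
pct_to_mflops 624.7 2801   # prints 17497.8
# The same benchmark for the reference install.
pct_to_mflops 578.5 2493   # prints 14422.0
```

Under that assumption, the higher clock of the present machine accounts for part of the difference in absolute throughput, in addition to the difference in percentages.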
diff --git a/libraries/atlas/atlas.SlackBuild b/libraries/atlas/atlas.SlackBuild
deleted file mode 100644
index c5204336f1..0000000000
--- a/libraries/atlas/atlas.SlackBuild
+++ /dev/null
@@ -1,438 +0,0 @@
-#!/bin/bash
-
-# Slackware build script for ATLAS
-
-# Copyright 2010-2016 Serban Udrea <s.udrea@gsi.de>
-# All rights reserved.
-#
-# Redistribution and use of this script, with or without modification,
-# is permitted provided that the following conditions are met:
-#
-# 1. Redistributions of this script must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ''AS IS'' AND ANY EXPRESS OR
-# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
-# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
-# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
-# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
-# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-# POSSIBILITY OF SUCH DAMAGE.
-
-cd $(dirname $0) ; CWD=$(pwd)
-
-PRGNAM=atlas
-VERSION=${VERSION:-3.10.3}
-BUILD=${BUILD:-2}
-TAG=${TAG:-_SBo}
-PKGTYPE=${PKGTYPE:-tgz}
-
-if [ -z "$ARCH" ]; then
- case "$( uname -m )" in
- i?86) ARCH=i586 ;;
- arm*) ARCH=arm ;;
- *) ARCH="$( uname -m )" ;;
- esac
-fi
-
-# If the variable PRINT_PACKAGE_NAME is set, then this script will report what
-# the name of the created package would be, and then exit. This information
-# could be useful to other scripts.
-if [ ! -z "${PRINT_PACKAGE_NAME}" ]; then
- echo "$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.$PKGTYPE"
- exit 0
-fi
-
-TMP=${TMP:-/tmp/SBo}
-PKG=$TMP/package-$PRGNAM
-OUTPUT=${OUTPUT:-/tmp}
-
-if [ "$ARCH" = "i586" ]; then
- SLKCFLAGS="-O2 -march=i586 -mtune=i686"
- LIBDIRSUFFIX=""
- BITSize="32" # Specifically for ATLAS
-elif [ "$ARCH" = "i686" ]; then
- SLKCFLAGS="-O2 -march=i686 -mtune=i686"
- LIBDIRSUFFIX=""
- BITSize="32" # Specifically for ATLAS
-elif [ "$ARCH" = "x86_64" ]; then
- SLKCFLAGS="-O2 -fPIC"
- LIBDIRSUFFIX="64"
- BITSize="64" # Specifically for ATLAS
-fi
-
-# If you don't want to use architectural defaults set the following to
-# something like "no".
-#
-USE_ARCH_DEFAULTS=${USE_ARCH_DEFAULTS:-yes}
-
-# If you decide to use arch defaults and have some custom ones you may
-# set the following variable to point to the directory containing these.
-#
-# If you decide to not use arch defaults and wish to create some after a build
-# with full search, set the following variable to point to the directory where
-# the file containing them should be placed.
-# IMPORTANT: In this case, the file copied to ARCH_DEF_DIR will not be part of
-# the ATLAS package, to avoid problems in case of an upgrade on the
-# same machine. The value of ARCH_DEF_DIR will be written for your
-# reference to the file named ARCH_DEF_DIR within the doc directory
-# of ATLAS.
-#
-ARCH_DEF_DIR=${ARCH_DEF_DIR:-""}
-
-# If you are on another architecture than x86 and/or don't use gcc you need to
-# set the following variable to "yes".
-#
-USE_DWALL=${USE_DWALL:-no}
-
-# You may wish to set the level 2 cache size to the proper value. The default
-# is to deduce it from /proc/cpuinfo
-#
-L2_CACHE_SIZE=${L2_CACHE_SIZE:-"auto"}
-
-if [ "$L2_CACHE_SIZE" = "auto" ]; then
- L2_CACHE_SIZE="$(cat /proc/cpuinfo |grep "cache size"| head -n 1| cut -d ":" -s -f2| cut -d " " -s -f2)"
- L2_SIZE_UNIT="$(cat /proc/cpuinfo |grep "cache size"| head -n 1| cut -d " " -s -f4)"
- case "$L2_SIZE_UNIT" in
- "KB") L2_CACHE_SIZE=$(($L2_CACHE_SIZE * 1024))
- ;;
- "MB") L2_CACHE_SIZE=$(($L2_CACHE_SIZE * 1024 * 1024))
- ;;
- "GB") L2_CACHE_SIZE=$(($L2_CACHE_SIZE * 1024 * 1024 * 1024))
- ;;
- esac
-fi
-
-# Check the value of L2_CACHE_SIZE
-#
-case "$L2_CACHE_SIZE" in
- ''|'0'|*[!0-9]*) echo "ERROR: The value of L2_CACHE_SIZE is not a strictly positive integer!"
- exit 1
- ;;
-esac
-
-# Set the (maximum) number of threads. If this is 0 just the serial libs get
-# built, even on an SMP machine. By default it's set to -1 for autodetection.
-#
-NUM_THREADS=${NUM_THREADS:-"-1"}
-case "$NUM_THREADS" in
- '-1'|'0') echo -n # Do nothing
- ;;
- '1') NUM_THREADS="0" # One processor => no threading
- ;;
- ''|*[!0-9]*) echo "ERROR: NUM_THREADS has an improper value!"
- exit 1
- ;;
-esac
-
-if [ $NUM_THREADS -gt 1 ]; then
- # On SMP machines one may want to set the processors to be used (see
- # atlas_install.pdf, p. 13). By default the list of processor ID's is empty
- # which means that ATLAS may use whatever is available.
- # NOTE: This is incompatible with the autodetection of the number of threads.
- # Therefore NUM_THREADS must be greater than 1.
- #
- USE_PROCESSORS=${USE_PROCESSORS:-""}
- if [ -z "$USE_PROCESSORS" ]; then
- MT_SWITCH="-t $NUM_THREADS"
- else
- MT_SWITCH="--force-tids=\"$NUM_THREADS $USE_PROCESSORS\""
- fi
-else
- MT_SWITCH="-t $NUM_THREADS"
-fi
-
-# Decide upon building full LAPACK or not. Set LAPACK_SOURCE to the empty
-# string, if you don't want a full LAPACK build.
-#
-LAPACK_SOURCE=${LAPACK_SOURCE:-"/usr/share/lapack-atlas/lapack.tgz"}
-if [ -z "$LAPACK_SOURCE" ]; then
- echo
- echo "WARNING"
- echo "WARNING: No LAPACK source specified. Just the highly restricted LAPACK"
- echo " offered by ATLAS will get compiled!"
- echo "WARNING"
- echo
- sleep 3
-else
- tar -tf "$LAPACK_SOURCE" > /dev/null 2>&1 || \
- { echo "ERROR: Improper LAPACK source archive!" \
- && echo " Please check $LAPACK_SOURCE" \
- && echo " and set it properly! " \
- && exit 1; } # NOTE: Here we just test that we deal with a tar archive.
- LAPACK_SOURCE="--with-netlib-lapack-tarfile=$LAPACK_SOURCE"
-
- # Change the following to yes if you would like to run the tests for LAPACK.
- #
- TEST_LAPACK="${TEST_LAPACK:-no}"
- # Make Y or N out of yes, Yes, No, no, etc.
- #
- TEST_LAPACK=$(echo "$TEST_LAPACK"|cut -b 1|tr a-z A-Z)
-fi
-
-# Decide upon building shared libraries or not. By default we ask for shared
-# libs too. If one doesn't want this, she has to just set SHARED_SWITCH to the
-# empty string.
-#
-SHARED_SWITCH=${SHARED_SWITCH:-"--shared"}
-
-# This is the timer to be used for LAPACK. If you stay with gfortran,
-# presently the default compiler on Slackware, you can leave the value as is.
-# Otherwise, please read LAPACK's make.inc.example for more information.
-#
-LAPACK_TIMER="${LAPACK_TIMER:-INT_ETIME}"
-
-# This is the system destination directory. When installing the
-# package produced by this script, ATLAS's files will be written to
-# $SYS_DESTDIR/include, $SYS_DESTDIR/include/atlas, $SYS_DESTDIR/lib
-# or $SYS_DESTDIR/lib64 on appropriate platforms, etc.
-# Nevertheless, by default the documentation files go to
-# /usr/doc/$PRGNAM-$VERSION. You may change this through the variable
-# DEFAULT_DOCS, see below.
-#
-SYS_DESTDIR=${SYS_DESTDIR:-/usr}
-
-# Check if SYS_DESTDIR is an absolute path. If not, exit with error.
-# NOTE: The $ is used because echo adds a \n at the end of the string.
-#
-echo $SYS_DESTDIR | grep -vE '/\.\./|/\.\.$' | grep -qE '^/' || \
-{ echo "ERROR: The system destination directory has no absolute path!" \
-&& echo " The value of SYS_DESTDIR is $SYS_DESTDIR" \
-&& echo " Please set it properly! " \
-&& exit 1; }
-
-# You may want to have the documentation files installed under
-# $SYS_DESTDIR/doc/$PRGNAM-$VERSION not /usr/doc/$PRGNAM-$VERSION.
-# To achieve this just set the following variable to something like
-# "no".
-#
-DEFAULT_DOCS=${DEFAULT_DOCS:-yes}
-
-# The build directory to be created within the source directory of
-# ATLAS.
-#
-BLDdir="BuildDir"
-
-set -e
-
-rm -rf $PKG
-mkdir -p $TMP $PKG $OUTPUT
-
-cd $TMP
-rm -rf $PRGNAM-$VERSION
-tar xvf $CWD/${PRGNAM}${VERSION}.tar.bz2
-mv ATLAS $PRGNAM-$VERSION
-cd $PRGNAM-$VERSION
-chown -R root:root .
-find -L . \
- \( -perm 777 -o -perm 775 -o -perm 750 -o -perm 711 -o -perm 555 \
- -o -perm 511 \) -exec chmod 755 {} \; -o \
- \( -perm 666 -o -perm 664 -o -perm 640 -o -perm 600 -o -perm 444 \
- -o -perm 440 -o -perm 400 \) -exec chmod 644 {} \;
-
-# Set the proper value to USE_ARCH_DEFAULTS, and the proper value to the
-# configure switch needed in case you want to use custom arch defaults.
-#
-ARCH_DIR_SWITCH=""
-case "$USE_ARCH_DEFAULTS" in
- [yY]|[yY][eE]|[yY][eE][sS]) USE_ARCH_DEFAULTS="1"
- [ -z "$ARCH_DEF_DIR" ] || \
- ARCH_DIR_SWITCH="-Ss ADdir $ARCH_DEF_DIR"
- ;;
- *) USE_ARCH_DEFAULTS="0" ;;
-esac
-
-mkdir -p $BLDdir
-cd $BLDdir
-
-# Configure atlas.
-#
-case "$USE_DWALL" in
- [yY]|[yY][eE]|[yY][eE][sS])
- # Here we assume that we aren't on a x86 machine
- # and/or gcc isn't the compiler to be used.
- #
- ../configure $SHARED_SWITCH \
- --prefix="$SYS_DESTDIR" \
- $LAPACK_SOURCE \
- $MT_SWITCH \
- -Si archdef "$USE_ARCH_DEFAULTS" \
- $ARCH_DIR_SWITCH \
- -b "$BITSize" -D c -DWALL
- ;;
- *)
- # Here we assume that we are on a x86 machine
- # (be it 32 or 64 bits) and gcc is the compiler
- # to be used.
- #
- # Get the CPU frequency for good timing.
- #
- CPU_FREQ="$(cat /proc/cpuinfo |grep "cpu MHz"| head -n 1| cut -d ":" -s -f2| tr -d '[:blank:]')"
- #
- ../configure $SHARED_SWITCH \
- --prefix="$SYS_DESTDIR" \
- $LAPACK_SOURCE \
- $MT_SWITCH \
- -Si archdef "$USE_ARCH_DEFAULTS" \
- $ARCH_DIR_SWITCH \
- -b "$BITSize" \
- -D c -DPentiumCPS="$CPU_FREQ"
- ;;
-esac
-
-# NOTES ON SOME FLAGS FOR CONFIGURE
-#
-# SHARED_SWITCH = "--shared" asks for building the shared libraries too
-# -Si archdef "$USE_ARCH_DEFAULTS" means that we ignore or not architectural defaults depending
-# upon the value of "$USE_ARCH_DEFAULTS".
-# -b "$BITSize" tells ATLAS about the platform's bitsize, 32 or 64.
-# -D c -DPentiumCPS="$CPU_FREQ" is for achieving good timing on x86 platforms with gcc.
-# -D c -DWALL is for achieving good timing on non x86 platforms and/or non gcc compilers
-
-# Write the value of L2_CACHE_SIZE to Make.inc
-#
-sed -i -r Make.inc -e \
- "s%L2SIZE = -DL2SIZE=[0-9]+%L2SIZE = -DL2SIZE=$L2_CACHE_SIZE%"
-
-# Allow for deprecated LAPACK routines to be built in case of a full LAPACK
-# installation. Also set the LAPACK timer to the desired value and add
-# -frecursive to the compile flags, since this should help avoid problems
-# with some functions which seem otherwise to not be thread safe.
-#
-if [ "$LAPACK_SOURCE" ]; then
- sed -i ./src/lapack/reference/make.inc.example -e \
- "s%^#MAKEDEPRECATED *=.*Yes%MAKEDEPRECATED = Yes%"
- sed -i ./interfaces/lapack/F77/src/Makefile -e \
- "s%NONE%$LAPACK_TIMER%" -e \
- "s%F77FLAGS)@%F77FLAGS) -frecursive@%" -e \
- "s%F77NOOPT)@%F77NOOPT) -frecursive@%"
-fi
-
-make build
-make check
-
-# If parallel libraries have been compiled check them too.
-#
-if [ -f lib/libptcblas.a ]; then
- make ptcheck
-fi
-
-# If the full LAPACK was built one may wish to test it too.
-#
-if [ "$LAPACK_SOURCE" ]; then
- if [ "$TEST_LAPACK" = "Y" ]; then
- ( cd src/lapack/reference
- [ -e ./libtmglib.a ] || make tmglib
- # Some testers segfault when built with -frecursive if one doesn't
- # increase the stack size limit, thus it's better to remove this flag
- # from make.inc
- #
- sed -i make.inc -e "s%-frecursive%%"
-
- # Now we have to set the proper library paths. Here for the serial libs.
- #
- ATLAS_LIBS="../../../../../lib/libf77blas.a ../../../../../lib/libcblas.a"
- ATLAS_LIBS="$ATLAS_LIBS ../../../../../lib/libatlas.a"
- LAPACK_LIB="../../../lib/liblapack.a"
-
- sed -i make.inc -e \
- "s%^BLASLIB *=.*%BLASLIB = $ATLAS_LIBS%" -e \
- "s%^CBLASLIB *=.*%CBLASLIB =%" -e \
- "s%^LAPACKLIB *=.*%LAPACKLIB = $LAPACK_LIB%"
-
- # Perform the tests.
- #
- make lapack_testing
-
- # Put the test results together
- #
- tar czf TEST_SERIAL_RESULTS.tgz TESTING/*.out
-
- # If threaded libs were built, we repeat the tests with them.
- #
- if [ -e ../../../lib/libptlapack.a ]; then
- make cleantesting
- ATLAS_LIBS="../../../../../lib/libptf77blas.a"
- ATLAS_LIBS="$ATLAS_LIBS ../../../../../lib/libptcblas.a"
- ATLAS_LIBS="$ATLAS_LIBS ../../../../../lib/libatlas.a -lpthread"
- LAPACK_LIB="../../../lib/libptlapack.a"
- sed -i make.inc -e \
- "s%^BLASLIB *=.*%BLASLIB = $ATLAS_LIBS%" -e \
- "s%^LAPACKLIB *=.*%LAPACKLIB = $LAPACK_LIB%"
- make lapack_testing
- tar czf TEST_PT_RESULTS.tgz TESTING/*.out
- fi
- )
- fi
-fi
-
-make install DESTDIR=${PKG}${SYS_DESTDIR}
-
-# The install script (sometimes) "forgets" about libptlapack.a
-#
-cp -ua lib/libptlapack.a ${PKG}${SYS_DESTDIR}/lib/ || true
-
-find $PKG | xargs file | grep -e "executable" -e "shared object" | grep ELF \
- | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true
-
-# This is probably the easiest way to make sure that we install in the
-# proper place.
-#
-if [ "$LIBDIRSUFFIX" ]; then
- mv ${PKG}${SYS_DESTDIR}/lib ${PKG}${SYS_DESTDIR}/lib${LIBDIRSUFFIX}
-fi
-
-# Create the doc directory for atlas and populate it.
-#
-case "$DEFAULT_DOCS" in
- [nN]|[nN][oO]) DOC_DIR="$PKG$SYS_DESTDIR/doc/$PRGNAM-$VERSION" ;;
- *) DOC_DIR="$PKG/usr/doc/$PRGNAM-$VERSION" ;;
-esac
-mkdir -p ${DOC_DIR}
-cp -a ../INSTALL.txt ../README ../doc ${DOC_DIR}
-
-# Add the Slackbuild script and README.SLACKWARE to the docs.
-#
-cat $CWD/$PRGNAM.SlackBuild > $DOC_DIR/$PRGNAM.SlackBuild
-cat $CWD/README.SLACKWARE > $DOC_DIR/README.SLACKWARE
-
-# Create custom arch defaults if appropriate.
-#
-if [ "$USE_ARCH_DEFAULTS" = "0" ]; then
- if [ "$ARCH_DEF_DIR" ]; then
- ( cd ARCHS
- make ArchNew
- make tarfile
- cp -ua *.tar.* "$ARCH_DEF_DIR"
- )
- echo "$ARCH_DEF_DIR" > $DOC_DIR/ARCH_DEF_DIR
- fi
-fi
-
-# If the full LAPACK got installed add also some relevant files from its source
-# tree.
-#
-if [ "$LAPACK_SOURCE" ]; then
- ( cd src/lapack/reference
- LAPACK_VER=$(./INSTALL/testversion | sed -e "s% *LAPACK *%%" -e "s% *%%g")
- LAPACK_DOC_DIR="${DOC_DIR}/lapack-$LAPACK_VER"
- mkdir "$LAPACK_DOC_DIR"
- cp -a LICENSE README "$LAPACK_DOC_DIR"
-
- # Copy the test results if present (getting around "set -e" with "echo -n").
- #
- cp -a TEST_* "$LAPACK_DOC_DIR" 2>/dev/null || echo -n
- )
-fi
-
-rm -f $PKG/usr/lib*/*.la
-
-mkdir -p $PKG/install
-cat $CWD/slack-desc > $PKG/install/slack-desc
-
-cd "$PKG"
-/sbin/makepkg -l y -c n $OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.$PKGTYPE
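The L2 cache detection in the removed script (read a "cache size" line, scale it by its unit, then reject anything that is not a strictly positive integer) can be sketched as a single function. This is an illustrative rewrite, not the script's own code: the name `l2_cache_bytes` is hypothetical, and it takes the line as an argument rather than reading /proc/cpuinfo itself so that it can be tried on any input.

```bash
#!/bin/bash
# Hypothetical helper (not part of atlas.SlackBuild): converts one
# "cache size" line, in the format the script reads from /proc/cpuinfo,
# into a byte count.
l2_cache_bytes() {
  local value unit
  # Drop everything up to the first colon, then split "6144 KB" into fields.
  read -r value unit <<< "${1#*:}"
  # Reject anything that is not a non-negative integer ...
  case "$value" in
    ''|*[!0-9]*) echo "ERROR: cannot parse cache size from: $1" >&2
                 return 1 ;;
  esac
  # ... and reject zero, so the result is strictly positive.
  if [ "$((10#$value))" -eq 0 ]; then
    echo "ERROR: cache size must be strictly positive" >&2
    return 1
  fi
  # Scale by the unit; an unrecognised unit leaves the value unchanged,
  # mirroring the case statement in the removed script.
  case "$unit" in
    KB) echo $(( 10#$value * 1024 )) ;;
    MB) echo $(( 10#$value * 1024 * 1024 )) ;;
    GB) echo $(( 10#$value * 1024 * 1024 * 1024 )) ;;
    *)  echo $(( 10#$value )) ;;
  esac
}

l2_cache_bytes "cache size : 6144 KB"   # prints 6291456
```

On a machine whose /proc/cpuinfo reports a cache size, the same function could be fed with `l2_cache_bytes "$(grep -m1 'cache size' /proc/cpuinfo)"`; the `10#` prefix forces base 10 so that a value with a leading zero is not read as octal.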
diff --git a/libraries/atlas/atlas.info b/libraries/atlas/atlas.info
deleted file mode 100644
index 72483a6644..0000000000
--- a/libraries/atlas/atlas.info
+++ /dev/null
@@ -1,10 +0,0 @@
-PRGNAM="atlas"
-VERSION="3.10.3"
-HOMEPAGE="http://math-atlas.sourceforge.net/"
-DOWNLOAD="http://downloads.sourceforge.net/math-atlas/atlas3.10.3.tar.bz2"
-MD5SUM="d6ce4f16c2ad301837cfb3dade2f7cef"
-DOWNLOAD_x86_64=""
-MD5SUM_x86_64=""
-REQUIRES="lapack-atlas"
-MAINTAINER="Serban Udrea"
-EMAIL="S.Udrea@gsi.de"
diff --git a/libraries/atlas/atlas.patch b/libraries/atlas/atlas.patch
deleted file mode 100644
index dea4dcc0b2..0000000000
--- a/libraries/atlas/atlas.patch
+++ /dev/null
@@ -1,5072 +0,0 @@
-diff -rupN ATLAS/CONFIG/src/backend/archinfo_x86.c atlas-3.8.3/CONFIG/src/backend/archinfo_x86.c
---- ATLAS/CONFIG/src/backend/archinfo_x86.c 2009-02-18 19:47:37.000000000 +0100
-+++ atlas-3.8.3/CONFIG/src/backend/archinfo_x86.c 2009-11-12 13:47:23.777451677 +0100
-@@ -320,7 +320,7 @@ enum MACHTYPE Chip2Mach(enum CHIP chip,
- iret = IntP4;
- break;
- case 3:
-- case 4:
-+ case 4: ; case 6:
- iret = IntP4E;
- break;
- default:
-diff -rupN ATLAS/include/atlas_lvl3.h atlas-3.8.3/include/atlas_lvl3.h
---- ATLAS/include/atlas_lvl3.h 2009-02-18 19:47:35.000000000 +0100
-+++ atlas-3.8.3/include/atlas_lvl3.h 2009-11-12 13:52:49.308496090 +0100
-@@ -126,7 +126,7 @@
- #define CPAT Mjoin(C_ATL_, PRE);
-
- #ifndef ATL_MaxMalloc
-- #define ATL_MaxMalloc 67108864
-+ #define ATL_MaxMalloc XXX_MaxMalloc_XXX
- #endif
-
- typedef void (*MAT2BLK)(int, int, const TYPE*, int, TYPE*, const SCALAR);
-diff -rupN ATLAS/src/blas/gemm/ATL_cmmJITcp.c atlas-3.8.3/src/blas/gemm/ATL_cmmJITcp.c
---- ATLAS/src/blas/gemm/ATL_cmmJITcp.c 2009-02-18 19:47:44.000000000 +0100
-+++ atlas-3.8.3/src/blas/gemm/ATL_cmmJITcp.c 2009-11-12 12:44:34.816529051 +0100
-@@ -268,7 +268,8 @@ static void Mjoin(PATL,mmK)
- {
- NBmm0 = NBmm1 = NBmmX = Mjoin(PATLU,pKBmm);
- if (SCALAR_IS_ZERO(beta))
-- Mjoin(PATL,gezero)(M, N, C, ldc);
-+ /* Mjoin(PATL,gezero)(M, N, C, ldc); */
-+ { Mjoin(PATLU,gezero)(M, N, pC, ldpc); Mjoin(PATLU,gezero)(M, N, pC+ipc, ldpc); }
- }
- if (nblk)
- {
-diff -rupN ATLAS/src/blas/gemm/ATL_gereal2cplx.c atlas-3.8.3/src/blas/gemm/ATL_gereal2cplx.c
---- ATLAS/src/blas/gemm/ATL_gereal2cplx.c 2009-02-18 19:47:44.000000000 +0100
-+++ atlas-3.8.3/src/blas/gemm/ATL_gereal2cplx.c 2009-11-12 12:49:49.331651677 +0100
-@@ -43,7 +43,53 @@ void Mjoin(PATL,gereal2cplx)
- const int ldc2 = (ldc-M)<<1;
- int i, j;
-
-- if (ialp == ATL_rzero && ibet == ATL_rzero)
-+/*
-+ * Cannot read C if BETA is 0
-+ */
-+ if (rbet == ATL_rzero && ibet == ATL_rzero)
-+ {
-+ if (ialp == ATL_rzero) /* alpha is a real number */
-+ {
-+ if (ralp == ATL_rone) /* alpha = 1.0 */
-+ {
-+ for (j=0; j < N; j++, R += ldr, I += ldi, C += ldc2)
-+ {
-+ for (i=0; i < M; i++, C += 2)
-+ {
-+ *C = R[i];
-+ C[1] = I[i];
-+ }
-+ }
-+ }
-+ else
-+ {
-+ for (j=0; j < N; j++, R += ldr, I += ldi, C += ldc2)
-+ {
-+ for (i=0; i < M; i++, C += 2)
-+ {
-+ *C = ralp * R[i];
-+ C[1] = ralp * I[i];
-+ }
-+ }
-+ }
-+ }
-+ else /* alpha is a complex number */
-+ {
-+ for (j=0; j < N; j++, R += ldr, I += ldi, C += ldc2)
-+ {
-+ for (i=0; i < M; i++, C += 2)
-+ {
-+ ra = R[i]; ia = I[i];
-+ C[0] = ralp * ra - ialp * ia;
-+ C[1] = ralp * ia + ialp * ra;
-+ }
-+ }
-+ }
-+ }
-+/*
-+ * If alpha and beta are both real numbers
-+ */
-+ else if (ialp == ATL_rzero && ibet == ATL_rzero)
- {
- if (ralp == ATL_rone && rbet == ATL_rone)
- {
-diff -rupN ATLAS/tune/blas/gemm/CASES/ATL_smm14x1x84_sseCU.c atlas-3.8.3/tune/blas/gemm/CASES/ATL_smm14x1x84_sseCU.c
---- ATLAS/tune/blas/gemm/CASES/ATL_smm14x1x84_sseCU.c 2009-02-18 19:48:26.000000000 +0100
-+++ atlas-3.8.3/tune/blas/gemm/CASES/ATL_smm14x1x84_sseCU.c 2009-11-12 12:35:50.453038827 +0100
-@@ -27,6 +27,13 @@
- * POSSIBILITY OF SUCH DAMAGE.
- *
- */
-+#if KB > 84
-+ #error "KB cannot exceed 84!"
-+#endif
-+#if (KB/4)*4 != KB
-+ #error "KB must be a multiple of 4!"
-+#endif
-+
- #ifndef ATL_GAS_x8664
- #error "This kernel requires x86-64 assembly!"
- #endif
-@@ -58,25 +65,25 @@
- * Integer register usage shown be these defines
- */
- #define pA %rcx
--#define pA10 %rbx
--#define ldab %rbp
--#define mldab %rdx
-+#define pA10 %rbx
-+#define ldab %rbp
-+#define mldab %rdx
- #define mldab5 %rax
- #define pB %rdi
- #define pC %rsi
- #define incCn %r10
- #define stM %r9
- #define stN %r11
--#define pfA %r8
--#define pA5 pA
--#define pB0 pB
-+#define pfA %r8
-+#define pA5 pA
-+#define pB0 pB
- #if MB == 0
-- #define stM0 %r12
-- #define incAm %r13
-+ #define stM0 %r12
-+ #define incAm %r13
- #endif
- /* rax used in 32/64 conversion */
-
--#define NBso (KB*4)
-+#define NBso (KB*4)
- #define MBKBso (MB*KB*4)
- #define NB2so (NBso+NBso)
- #define NB3so (NBso+NBso+NBso)
-@@ -95,22 +102,22 @@
- /*
- * SSE2 register usage shown be these defines
- */
--#define rA0 %xmm0
--#define rB0 %xmm1
--#define rC0 %xmm2
--#define rC1 %xmm3
--#define rC2 %xmm4
--#define rC3 %xmm5
--#define rC4 %xmm6
--#define rC5 %xmm7
--#define rC6 %xmm8
--#define rC7 %xmm9
--#define rC8 %xmm10
--#define rC9 %xmm11
--#define rC10 %xmm12
--#define rC11 %xmm13
--#define rC12 %xmm14
--#define rC13 %xmm15
-+#define rA0 %xmm0
-+#define rB0 %xmm1
-+#define rC0 %xmm2
-+#define rC1 %xmm3
-+#define rC2 %xmm4
-+#define rC3 %xmm5
-+#define rC4 %xmm6
-+#define rC5 %xmm7
-+#define rC6 %xmm8
-+#define rC7 %xmm9
-+#define rC8 %xmm10
-+#define rC9 %xmm11
-+#define rC10 %xmm12
-+#define rC11 %xmm13
-+#define rC12 %xmm14
-+#define rC13 %xmm15
- /*
- * Prefetch defines
- */
-@@ -127,99 +134,99 @@
- #if MB != 0
- #define incAm $MBKBso-NB14so+176
- #endif
-- .text
-+ .text
- .global ATL_asmdecor(ATL_USERMM)
- ATL_asmdecor(ATL_USERMM):
- /*
- * Save callee-saved iregs
- */
-- movq %rbp, -8(%rsp)
-- movq %rbx, -16(%rsp)
-+ movq %rbp, -8(%rsp)
-+ movq %rbx, -16(%rsp)
- #if MB == 0
-- movq %r12, -32(%rsp)
-- movq %r13, -40(%rsp)
-+ movq %r12, -32(%rsp)
-+ movq %r13, -40(%rsp)
- #endif
- #ifdef BETAX
- #define BOF -56
-- movss %xmm1, BOF(%rsp)
-- movss %xmm1, BOF+4(%rsp)
-- movss %xmm1, BOF+8(%rsp)
-- movss %xmm1, BOF+12(%rsp)
-+ movss %xmm1, BOF(%rsp)
-+ movss %xmm1, BOF+4(%rsp)
-+ movss %xmm1, BOF+8(%rsp)
-+ movss %xmm1, BOF+12(%rsp)
- #endif
- /*
- * pA already comes in right reg
- * Initialize pB = B; pC = C; NBso = NB * sizeof;
- */
-- movq %rsi, stN
-- movq %rdi, %rax
-- movq 16(%rsp), pC
-- prefC((pC))
-- prefC(64(pC))
-- movq %r9, pB
-- prefB((pB))
-- prefB(64(pB))
-- movq %rax, stM
-+ movq %rsi, stN
-+ movq %rdi, %rax
-+ movq 16(%rsp), pC
-+ prefC((pC))
-+ prefC(64(pC))
-+ movq %r9, pB
-+ prefB((pB))
-+ prefB(64(pB))
-+ movq %rax, stM
- /*
- * stM = pA + NBNBso; stN = pB + NBNBso;
- */
- #if MB == 0
-- movq stM, pfA
-- imulq $NBso, pfA
-- prefB(128(pB))
-- movq pfA, incAm
-- addq pA5, pfA
-- addq $176-NB14so, incAm
-+ movq stM, pfA
-+ imulq $NBso, pfA
-+ prefB(128(pB))
-+ movq pfA, incAm
-+ addq pA5, pfA
-+ addq $176-NB14so, incAm
- #else
-- movq $MBKBso, pfA
-- addq pA5, pfA
-- prefB(128(pB))
-+ movq $MBKBso, pfA
-+ addq pA5, pfA
-+ prefB(128(pB))
- #endif
- /*
- * convert ldc to 64 bits, and then set incCn = (ldc - MB)*sizeof
- */
-- movl 24(%rsp), %eax
-- cltq
-- movq %rax, incCn
-- subq stM, incCn
-- addq $14, incCn
-+ movl 24(%rsp), %eax
-+ cltq
-+ movq %rax, incCn
-+ subq stM, incCn
-+ addq $14, incCn
- #ifdef SREAL
-- shl $2, incCn
-+ shl $2, incCn
- #else
-- shl $3, incCn
-- prefC(128(pC))
-- prefC(192(pC))
-+ shl $3, incCn
-+ prefC(128(pC))
-+ prefC(192(pC))
- #endif
- /*
- * Find M/14 if MB is not set
- */
- #if MB == 0
-- cmp $84, stM
-- jne MB_LT84
--/* movq $84/14, stM */
-- movq $6, stM
-+ cmp $84, stM
-+ jne MB_LT84
-+/* movq $84/14, stM */
-+ movq $6, stM
- MBFOUND:
-- subq $1, stM
-- movq stM, stM0
-+ subq $1, stM
-+ movq stM, stM0
- #endif
-- addq $120, pA5
-- addq $120, pB0
-- movq $KB*4, ldab
-- movq $-KB*5*4, mldab5
-- movq $-KB*4, mldab
-- subq mldab5, pA5
-- lea KB*4(pA5, ldab,4), pA10
--/* movq $NB, stN */
-+ addq $120, pA5
-+ addq $120, pB0
-+ movq $KB*4, ldab
-+ movq $-KB*5*4, mldab5
-+ movq $-KB*4, mldab
-+ subq mldab5, pA5
-+ lea KB*4(pA5, ldab,4), pA10
-+/* movq $NB, stN */
-
- UNLOOP:
- #if MB == 0
-- movq stM0, stM
-- cmp $0, stM
-- je MLAST
-+ movq stM0, stM
-+ cmp $0, stM
-+ je MLAST
- #else
- #ifdef ATL_DivAns
-- movq $ATL_DivAns-1, stM
-+ movq $ATL_DivAns-1, stM
- #else
-- movq $MB/14-1, stM
-+ movq $MB/14-1, stM
- #endif
- #endif
- #if MB == 0 || MB > 14
-@@ -227,992 +234,992 @@ UMLOOP:
- /*
- * rC[0-13] = pC[0-13] * beta
- */
-- ALIGN16
-+ ALIGN16
- /*UKLOOP: */
- #ifdef BETA1
-- movaps 0-120(pA10,mldab5,2), rC0
-- movaps 0-120(pB0), rB0
-- mulps rB0, rC0
-- addss (pC), rC0
-- movaps 0-120(pA5, mldab,4), rC1
-- mulps rB0, rC1
-- addss CMUL(4)(pC), rC1
-- movaps 0-120(pA10, mldab,8), rC2
-- mulps rB0, rC2
-- addss CMUL(8)(pC), rC2
-- movaps 0-120(pA5, mldab,2), rC3
-- mulps rB0, rC3
-- addss CMUL(12)(pC), rC3
-- movaps 0-120(pA5, mldab), rC4
-- mulps rB0, rC4
-- addss CMUL(16)(pC), rC4
-- movaps 0-120(pA5), rC5
-- mulps rB0, rC5
-- addss CMUL(20)(pC), rC5
-- movaps 0-120(pA5, ldab), rC6
-- mulps rB0, rC6
-- addss CMUL(24)(pC), rC6
-- movaps 0-120(pA5, ldab,2), rC7
-- mulps rB0, rC7
-- addss CMUL(28)(pC), rC7
-- movaps 0-120(pA10, mldab,2), rC8
-- mulps rB0, rC8
-- addss CMUL(32)(pC), rC8
-- movaps 0-120(pA5,ldab,4), rC9
-- mulps rB0, rC9
-- addss CMUL(36)(pC), rC9
-- movaps 0-120(pA10), rC10
-- mulps rB0, rC10
-- addss CMUL(40)(pC), rC10
-- movaps 0-120(pA10,ldab), rC11
-- mulps rB0, rC11
-- addss CMUL(44)(pC), rC11
-- movaps 0-120(pA10,ldab,2), rC12
-- mulps rB0, rC12
-- addss CMUL(48)(pC), rC12
-- movaps 0-120(pA5,ldab,8), rC13
-- mulps rB0, rC13
-- addss CMUL(52)(pC), rC13
-+ movaps 0-120(pA10,mldab5,2), rC0
-+ movaps 0-120(pB0), rB0
-+ mulps rB0, rC0
-+ addss (pC), rC0
-+ movaps 0-120(pA5, mldab,4), rC1
-+ mulps rB0, rC1
-+ addss CMUL(4)(pC), rC1
-+ movaps 0-120(pA10, mldab,8), rC2
-+ mulps rB0, rC2
-+ addss CMUL(8)(pC), rC2
-+ movaps 0-120(pA5, mldab,2), rC3
-+ mulps rB0, rC3
-+ addss CMUL(12)(pC), rC3
-+ movaps 0-120(pA5, mldab), rC4
-+ mulps rB0, rC4
-+ addss CMUL(16)(pC), rC4
-+ movaps 0-120(pA5), rC5
-+ mulps rB0, rC5
-+ addss CMUL(20)(pC), rC5
-+ movaps 0-120(pA5, ldab), rC6
-+ mulps rB0, rC6
-+ addss CMUL(24)(pC), rC6
-+ movaps 0-120(pA5, ldab,2), rC7
-+ mulps rB0, rC7
-+ addss CMUL(28)(pC), rC7
-+ movaps 0-120(pA10, mldab,2), rC8
-+ mulps rB0, rC8
-+ addss CMUL(32)(pC), rC8
-+ movaps 0-120(pA5,ldab,4), rC9
-+ mulps rB0, rC9
-+ addss CMUL(36)(pC), rC9
-+ movaps 0-120(pA10), rC10
-+ mulps rB0, rC10
-+ addss CMUL(40)(pC), rC10
-+ movaps 0-120(pA10,ldab), rC11
-+ mulps rB0, rC11
-+ addss CMUL(44)(pC), rC11
-+ movaps 0-120(pA10,ldab,2), rC12
-+ mulps rB0, rC12
-+ addss CMUL(48)(pC), rC12
-+ movaps 0-120(pA5,ldab,8), rC13
-+ mulps rB0, rC13
-+ addss CMUL(52)(pC), rC13
- #else
-- movaps 0-120(pA10,mldab5,2), rC0
-- movaps 0-120(pB0), rC13
-- mulps rC13, rC0
-- movaps 0-120(pA5, mldab,4), rC1
-- mulps rC13, rC1
-- movaps 0-120(pA10, mldab,8), rC2
-- mulps rC13, rC2
-- movaps 0-120(pA5, mldab,2), rC3
-- mulps rC13, rC3
-- movaps 0-120(pA5, mldab), rC4
-- mulps rC13, rC4
-- movaps 0-120(pA5), rC5
-- mulps rC13, rC5
-- movaps 0-120(pA5, ldab), rC6
-- mulps rC13, rC6
-- movaps 0-120(pA5, ldab,2), rC7
-- mulps rC13, rC7
-- movaps 0-120(pA10, mldab,2), rC8
-- mulps rC13, rC8
-- movaps 0-120(pA5,ldab,4), rC9
-- mulps rC13, rC9
-- movaps 0-120(pA10), rC10
-- mulps rC13, rC10
-- movaps 0-120(pA10,ldab), rC11
-- mulps rC13, rC11
-- movaps 0-120(pA10,ldab,2), rC12
-- mulps rC13, rC12
-- mulps 0-120(pA5,ldab,8), rC13
-+ movaps 0-120(pA10,mldab5,2), rC0
-+ movaps 0-120(pB0), rC13
-+ mulps rC13, rC0
-+ movaps 0-120(pA5, mldab,4), rC1
-+ mulps rC13, rC1
-+ movaps 0-120(pA10, mldab,8), rC2
-+ mulps rC13, rC2
-+ movaps 0-120(pA5, mldab,2), rC3
-+ mulps rC13, rC3
-+ movaps 0-120(pA5, mldab), rC4
-+ mulps rC13, rC4
-+ movaps 0-120(pA5), rC5
-+ mulps rC13, rC5
-+ movaps 0-120(pA5, ldab), rC6
-+ mulps rC13, rC6
-+ movaps 0-120(pA5, ldab,2), rC7
-+ mulps rC13, rC7
-+ movaps 0-120(pA10, mldab,2), rC8
-+ mulps rC13, rC8
-+ movaps 0-120(pA5,ldab,4), rC9
-+ mulps rC13, rC9
-+ movaps 0-120(pA10), rC10
-+ mulps rC13, rC10
-+ movaps 0-120(pA10,ldab), rC11
-+ mulps rC13, rC11
-+ movaps 0-120(pA10,ldab,2), rC12
-+ mulps rC13, rC12
-+ mulps 0-120(pA5,ldab,8), rC13
- #endif
-
- #if KB > 4
-- movaps 16-120(pA10,mldab5,2), rA0
-- movaps 16-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 16-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 16-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 16-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 16-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 16-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 16-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 16-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 16-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 16-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 16-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 16-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 16-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 16-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 16-120(pA10,mldab5,2), rA0
-+ movaps 16-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 16-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 16-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 16-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 16-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 16-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 16-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 16-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 16-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 16-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 16-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 16-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 16-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 16-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 8
-- movaps 32-120(pA10,mldab5,2), rA0
-- movaps 32-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 32-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 32-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 32-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 32-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 32-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 32-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 32-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 32-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 32-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 32-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 32-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 32-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 32-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 32-120(pA10,mldab5,2), rA0
-+ movaps 32-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 32-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 32-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 32-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 32-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 32-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 32-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 32-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 32-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 32-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 32-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 32-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 32-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 32-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 12
-- movaps 48-120(pA10,mldab5,2), rA0
-- movaps 48-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 48-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 48-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 48-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 48-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 48-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 48-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 48-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 48-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 48-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 48-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 48-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 48-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 48-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 48-120(pA10,mldab5,2), rA0
-+ movaps 48-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 48-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 48-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 48-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 48-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 48-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 48-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 48-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 48-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 48-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 48-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 48-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 48-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 48-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 16
-- movaps 64-120(pA10,mldab5,2), rA0
-- movaps 64-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 64-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 64-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 64-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 64-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 64-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 64-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 64-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 64-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 64-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 64-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 64-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 64-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 64-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 64-120(pA10,mldab5,2), rA0
-+ movaps 64-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 64-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 64-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 64-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 64-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 64-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 64-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 64-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 64-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 64-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 64-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 64-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 64-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 64-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 20
-- movaps 80-120(pA10,mldab5,2), rA0
-- movaps 80-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 80-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 80-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 80-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 80-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 80-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 80-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 80-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 80-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 80-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 80-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 80-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 80-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 80-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 80-120(pA10,mldab5,2), rA0
-+ movaps 80-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 80-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 80-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 80-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 80-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 80-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 80-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 80-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 80-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 80-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 80-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 80-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 80-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 80-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 24
-- movaps 96-120(pA10,mldab5,2), rA0
-- movaps 96-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 96-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 96-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 96-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 96-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 96-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 96-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 96-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 96-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 96-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 96-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 96-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 96-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 96-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 96-120(pA10,mldab5,2), rA0
-+ movaps 96-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 96-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 96-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 96-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 96-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 96-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 96-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 96-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 96-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 96-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 96-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 96-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 96-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 96-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 28
-- movaps 112-120(pA10,mldab5,2), rA0
-- movaps 112-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 112-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 112-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 112-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 112-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 112-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 112-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 112-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 112-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 112-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 112-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 112-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 112-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 112-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 112-120(pA10,mldab5,2), rA0
-+ movaps 112-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 112-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 112-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 112-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 112-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 112-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 112-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 112-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 112-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 112-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 112-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 112-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 112-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 112-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
- #ifndef SREAL
-- pref2((pfA))
-- pref2(64(pfA))
-+ pref2((pfA))
-+ pref2(64(pfA))
- #endif
-
- #if KB > 32
-- movaps 128-120(pA10,mldab5,2), rA0
-- movaps 128-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 128-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 128-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 128-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 128-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 128-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 128-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 128-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 128-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 128-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 128-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 128-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 128-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 128-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 128-120(pA10,mldab5,2), rA0
-+ movaps 128-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 128-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 128-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 128-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 128-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 128-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 128-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 128-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 128-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 128-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 128-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 128-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 128-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 128-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 36
-- movaps 144-120(pA10,mldab5,2), rA0
-- movaps 144-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 144-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 144-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 144-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 144-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 144-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 144-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 144-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 144-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 144-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 144-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 144-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 144-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 144-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 144-120(pA10,mldab5,2), rA0
-+ movaps 144-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 144-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 144-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 144-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 144-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 144-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 144-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 144-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 144-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 144-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 144-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 144-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 144-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 144-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 40
-- movaps 160-120(pA10,mldab5,2), rA0
-- movaps 160-120(pB0), rB0
-- mulps rB0, rA0
-- addq $176, pB0
-- addps rA0, rC0
-- movaps 160-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 160-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 160-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 160-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 160-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 160-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 160-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 160-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 160-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 160-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 160-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 160-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addq $176, pA10
-- addps rA0, rC12
-- mulps 160-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-- addq $176, pA5
-+ movaps 160-120(pA10,mldab5,2), rA0
-+ movaps 160-120(pB0), rB0
-+ mulps rB0, rA0
-+ addq $176, pB0
-+ addps rA0, rC0
-+ movaps 160-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 160-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 160-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 160-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 160-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 160-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 160-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 160-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 160-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 160-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 160-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 160-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addq $176, pA10
-+ addps rA0, rC12
-+ mulps 160-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
-+ addq $176, pA5
- #else
-- addq $176, pB0
-- addq $176, pA10
-- addq $176, pA5
-+ addq $176, pB0
-+ addq $176, pA10
-+ addq $176, pA5
- #endif
-
- #if KB > 44
-- movaps 0-120(pA10,mldab5,2), rA0
-- movaps 0-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 0-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 0-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 0-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 0-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 0-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 0-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 0-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 0-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 0-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 0-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 0-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 0-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 0-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 0-120(pA10,mldab5,2), rA0
-+ movaps 0-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 0-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 0-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 0-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 0-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 0-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 0-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 0-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 0-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 0-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 0-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 0-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 0-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 0-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 48
-- movaps 16-120(pA10,mldab5,2), rA0
-- movaps 16-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 16-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 16-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 16-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 16-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 16-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 16-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 16-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 16-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 16-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 16-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 16-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 16-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 16-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 16-120(pA10,mldab5,2), rA0
-+ movaps 16-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 16-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 16-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 16-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 16-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 16-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 16-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 16-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 16-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 16-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 16-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 16-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 16-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 16-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 52
-- movaps 32-120(pA10,mldab5,2), rA0
-- movaps 32-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 32-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 32-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 32-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 32-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 32-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 32-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 32-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 32-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 32-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 32-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 32-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 32-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 32-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 32-120(pA10,mldab5,2), rA0
-+ movaps 32-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 32-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 32-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 32-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 32-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 32-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 32-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 32-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 32-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 32-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 32-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 32-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 32-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 32-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 56
-- movaps 48-120(pA10,mldab5,2), rA0
-- movaps 48-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 48-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 48-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 48-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 48-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 48-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 48-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 48-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 48-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 48-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 48-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 48-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 48-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 48-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 48-120(pA10,mldab5,2), rA0
-+ movaps 48-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 48-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 48-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 48-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 48-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 48-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 48-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 48-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 48-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 48-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 48-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 48-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 48-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 48-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 60
-- movaps 64-120(pA10,mldab5,2), rA0
-- movaps 64-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 64-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 64-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 64-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 64-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 64-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 64-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 64-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 64-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 64-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 64-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 64-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 64-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 64-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 64-120(pA10,mldab5,2), rA0
-+ movaps 64-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 64-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 64-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 64-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 64-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 64-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 64-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 64-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 64-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 64-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 64-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 64-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 64-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 64-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 64
-- movaps 80-120(pA10,mldab5,2), rA0
-- movaps 80-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 80-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 80-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 80-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 80-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 80-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 80-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 80-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 80-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 80-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 80-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 80-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 80-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 80-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 80-120(pA10,mldab5,2), rA0
-+ movaps 80-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 80-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 80-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 80-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 80-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 80-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 80-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 80-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 80-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 80-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 80-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 80-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 80-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 80-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 68
-- movaps 96-120(pA10,mldab5,2), rA0
-- movaps 96-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 96-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 96-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 96-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 96-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 96-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 96-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 96-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 96-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 96-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 96-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 96-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 96-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 96-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 96-120(pA10,mldab5,2), rA0
-+ movaps 96-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 96-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 96-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 96-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 96-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 96-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 96-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 96-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 96-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 96-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 96-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 96-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 96-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 96-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 72
-- movaps 112-120(pA10,mldab5,2), rA0
-- movaps 112-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 112-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 112-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 112-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 112-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 112-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 112-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 112-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 112-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 112-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 112-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 112-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 112-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 112-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 112-120(pA10,mldab5,2), rA0
-+ movaps 112-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 112-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 112-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 112-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 112-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 112-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 112-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 112-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 112-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 112-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 112-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 112-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 112-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 112-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 76
-- movaps 128-120(pA10,mldab5,2), rA0
-- movaps 128-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 128-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 128-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 128-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 128-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 128-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 128-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 128-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 128-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 128-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 128-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 128-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 128-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 128-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 128-120(pA10,mldab5,2), rA0
-+ movaps 128-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 128-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 128-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 128-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 128-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 128-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 128-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 128-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 128-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 128-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 128-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 128-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 128-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 128-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 80
-- movaps 144-120(pA10,mldab5,2), rA0
-- movaps 144-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 144-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 144-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 144-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 144-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 144-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 144-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 144-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 144-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 144-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 144-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 144-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 144-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 144-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 144-120(pA10,mldab5,2), rA0
-+ movaps 144-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 144-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 144-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 144-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 144-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 144-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 144-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 144-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 144-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 144-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 144-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 144-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 144-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 144-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- /*UKLOOP */
-@@ -1220,234 +1227,234 @@ UMLOOP:
- * Get these bastard things summed up correctly
- */
-
-- /* rC0 = c0a c0b c0c c0d */
-- /* rC1 = c1a c1b c1c c1d */
-- /* rC2 = c2a c2b c2c c2d */
-- /* rC3 = c3a c3b c3c c3d */
-+ /* rC0 = c0a c0b c0c c0d */
-+ /* rC1 = c1a c1b c1c c1d */
-+ /* rC2 = c2a c2b c2c c2d */
-+ /* rC3 = c3a c3b c3c c3d */
- /* */
-- movaps rC2, rB0 /* rB0 = c2a c2b c2c c2d */
-- prefC((pC))
-- prefC(64(pC))
-- movaps rC0, rA0 /* rA0 = c0a c0b c0c c0d */
-- unpckhps rC3, rB0 /* rB0 = c2c c3c c2d c3d */
-- unpckhps rC1, rA0 /* rA0 = c0c c1c c0d c1d */
-- unpcklps rC3, rC2 /* rC2 = c2a c3a c2b c3b */
-- movlhps rB0, rC3 /* rC3 = c3a c3b c2c c3c */
-- unpcklps rC1, rC0 /* rC0 = c0a c1a c0b c1b */
-- movhlps rA0, rC3 /* rC3 = c0d c1d c2c c3c */
-- movlhps rC2, rA0 /* rA0 = c0c c1c c2a c3a */
-- movhlps rC0, rB0 /* rB0 = c0b c1b c2d c3d */
-- addps rA0, rC3 /* rC3 = c0cd c1cd c2ac c3ac */
-- movlhps rC0, rC1 /* rC1 = c1a c1b c0a c1a */
-- movhlps rC1, rC2 /* rC2 = c0a c1a c2b c3b */
-- movaps rC4, rA0 /* rA0 = c4a c4b c4c c4d */
-- addps rB0, rC2 /* rC2 = c0ab c1ab c2bd c3bd */
-- movaps rC6, rB0 /* rB0 = c6a c6b c6c c6d */
-- addps rC2, rC3 /* rC3 = c0abcd c1abcd c2bdac c3bdac */
--
--
-- /* rC4 = c4a c4b c4c c4d */
-- /* rC5 = c5a c5b c5c c5d */
-- /* rC6 = c6a c6b c6c c6d */
-- /* rC7 = c7a c7b c7c c7d */
-- /* rC8 = c08a c08b c08c c08d */
-- /* rC9 = c09a c09b c09c c09d */
-- /* rC10 = c10a c10b c10c c10d */
-- /* rC11 = c11a c11b c11c c11d */
-- /* rC12 = c12a c12b c12c c12d */
-- /* rC13 = c13a c13b c13c c13d */
-+ movaps rC2, rB0 /* rB0 = c2a c2b c2c c2d */
-+ prefC((pC))
-+ prefC(64(pC))
-+ movaps rC0, rA0 /* rA0 = c0a c0b c0c c0d */
-+ unpckhps rC3, rB0 /* rB0 = c2c c3c c2d c3d */
-+ unpckhps rC1, rA0 /* rA0 = c0c c1c c0d c1d */
-+ unpcklps rC3, rC2 /* rC2 = c2a c3a c2b c3b */
-+ movlhps rB0, rC3 /* rC3 = c3a c3b c2c c3c */
-+ unpcklps rC1, rC0 /* rC0 = c0a c1a c0b c1b */
-+ movhlps rA0, rC3 /* rC3 = c0d c1d c2c c3c */
-+ movlhps rC2, rA0 /* rA0 = c0c c1c c2a c3a */
-+ movhlps rC0, rB0 /* rB0 = c0b c1b c2d c3d */
-+ addps rA0, rC3 /* rC3 = c0cd c1cd c2ac c3ac */
-+ movlhps rC0, rC1 /* rC1 = c1a c1b c0a c1a */
-+ movhlps rC1, rC2 /* rC2 = c0a c1a c2b c3b */
-+ movaps rC4, rA0 /* rA0 = c4a c4b c4c c4d */
-+ addps rB0, rC2 /* rC2 = c0ab c1ab c2bd c3bd */
-+ movaps rC6, rB0 /* rB0 = c6a c6b c6c c6d */
-+ addps rC2, rC3 /* rC3 = c0abcd c1abcd c2bdac c3bdac */
-+
-+
-+ /* rC4 = c4a c4b c4c c4d */
-+ /* rC5 = c5a c5b c5c c5d */
-+ /* rC6 = c6a c6b c6c c6d */
-+ /* rC7 = c7a c7b c7c c7d */
-+ /* rC8 = c08a c08b c08c c08d */
-+ /* rC9 = c09a c09b c09c c09d */
-+ /* rC10 = c10a c10b c10c c10d */
-+ /* rC11 = c11a c11b c11c c11d */
-+ /* rC12 = c12a c12b c12c c12d */
-+ /* rC13 = c13a c13b c13c c13d */
- /* */
-- movaps rC10, rC0 /* rC0 = c10a c10b c10c c10d */
-- prefC(128(pC))
-+ movaps rC10, rC0 /* rC0 = c10a c10b c10c c10d */
-+ prefC(128(pC))
- #ifdef SREAL
-- pref2((pfA))
-+ pref2((pfA))
- #else
-- prefC(192(pC))
-+ prefC(192(pC))
- #endif
-- movaps rC8 , rC1 /* rC1 = c08a c08b c08c c08d */
-- movaps rC12, rC2 /* rC2 = c12a c12b c12c c12d */
-- unpckhps rC7, rB0 /* rB0 = c6c c7c c6d c7d */
-- unpckhps rC5, rA0 /* rA0 = c4c c5c c4d c5d */
-- unpcklps rC7, rC6 /* rC6 = c6a c7a c6b c7b */
-- unpckhps rC11, rC0 /* rC0 = c10c c11c c10d c11d */
-- unpckhps rC9 , rC1 /* rC1 = c08c c09c c08d c09d */
-- movlhps rB0, rC7 /* rC7 = c7a c7b c6c c7c */
-- unpcklps rC5, rC4 /* rC4 = c4a c5a c4b c5b */
-- movhlps rA0, rC7 /* rC7 = c4d c5d c6c c7c */
-- movlhps rC6, rA0 /* rA0 = c4c c5c c6a c7a */
-- unpcklps rC11, rC10 /* rC10 = c10a c11a c10b c11b */
-- movhlps rC4, rB0 /* rB0 = c4b c5b c6d c7d */
-- movlhps rC0, rC11 /* rC11 = c11a c11b c10c c11c */
-- addps rA0, rC7 /* rC7 = c4cd c5cd c6ac c7ac */
-+ movaps rC8 , rC1 /* rC1 = c08a c08b c08c c08d */
-+ movaps rC12, rC2 /* rC2 = c12a c12b c12c c12d */
-+ unpckhps rC7, rB0 /* rB0 = c6c c7c c6d c7d */
-+ unpckhps rC5, rA0 /* rA0 = c4c c5c c4d c5d */
-+ unpcklps rC7, rC6 /* rC6 = c6a c7a c6b c7b */
-+ unpckhps rC11, rC0 /* rC0 = c10c c11c c10d c11d */
-+ unpckhps rC9 , rC1 /* rC1 = c08c c09c c08d c09d */
-+ movlhps rB0, rC7 /* rC7 = c7a c7b c6c c7c */
-+ unpcklps rC5, rC4 /* rC4 = c4a c5a c4b c5b */
-+ movhlps rA0, rC7 /* rC7 = c4d c5d c6c c7c */
-+ movlhps rC6, rA0 /* rA0 = c4c c5c c6a c7a */
-+ unpcklps rC11, rC10 /* rC10 = c10a c11a c10b c11b */
-+ movhlps rC4, rB0 /* rB0 = c4b c5b c6d c7d */
-+ movlhps rC0, rC11 /* rC11 = c11a c11b c10c c11c */
-+ addps rA0, rC7 /* rC7 = c4cd c5cd c6ac c7ac */
- #ifdef BETAX
- #ifdef SREAL
-- movups (pC), rA0
-- movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-- movups 16(pC), rC4
-- unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-- movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-- movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-- movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-- movups 32(pC), rC5
-- movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-- unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-- addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-- movlps 48(pC), rC1
-- addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-- movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-- unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-- movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-- addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-- pref2(64(pfA))
-- mulps BOF(%rsp), rA0
-- addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-- mulps BOF(%rsp), rC4
-- addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-- mulps BOF(%rsp), rC5
-- addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-- mulps BOF(%rsp), rC1
-+ movups (pC), rA0
-+ movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-+ movups 16(pC), rC4
-+ unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-+ movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-+ movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-+ movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-+ movups 32(pC), rC5
-+ movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-+ unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-+ addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-+ movlps 48(pC), rC1
-+ addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-+ movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-+ unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-+ movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-+ addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-+ pref2(64(pfA))
-+ mulps BOF(%rsp), rA0
-+ addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-+ mulps BOF(%rsp), rC4
-+ addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-+ mulps BOF(%rsp), rC5
-+ addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-+ mulps BOF(%rsp), rC1
-
- /* */
-
-- movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-- addps rA0, rC3
-- addq $68, pfA
-- addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-- addps rC4, rC7
-- addps rC5, rC11
-- addps rC1, rC12
-+ movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-+ addps rA0, rC3
-+ addq $68, pfA
-+ addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-+ addps rC4, rC7
-+ addps rC5, rC11
-+ addps rC1, rC12
- #else /* BETA = X, complex type */
-- movups (pC), rA0
-- movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-- movups 16(pC), rC4
-- unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-- shufps $0x88, rC4, rA0 /* rA0 = c0 c1 c2 c3 */
-- movups 32(pC), rC4 /* rC4 = c4 X c5 X */
-- movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-- movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-- movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-- movups 48(pC), rC5 /* rC5 = c6 X c7 X */
-- movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-- unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-- addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-- shufps $0x88, rC5, rC4 /* rC4 = c4 c5 c6 c7 */
-- movups 64(pC), rC5 /* rC5 = c8 X c9 X */
-- movups 80(pC), rC1 /* rC1 = c10 X c11 X */
-- addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-- shufps $0x88, rC1, rC5 /* rC5 = c8 c9 c10 c11 */
-- movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-- movss 96(pC), rC1
-- unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-- movss 104(pC), rB0
-- movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-- unpcklps rB0, rC1
-- addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-- prefC(256(pC))
-- mulps BOF(%rsp), rA0
-- addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-- mulps BOF(%rsp), rC4
-- addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-- mulps BOF(%rsp), rC5
-- addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-- mulps BOF(%rsp), rC1
-+ movups (pC), rA0
-+ movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-+ movups 16(pC), rC4
-+ unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-+ shufps $0x88, rC4, rA0 /* rA0 = c0 c1 c2 c3 */
-+ movups 32(pC), rC4 /* rC4 = c4 X c5 X */
-+ movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-+ movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-+ movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-+ movups 48(pC), rC5 /* rC5 = c6 X c7 X */
-+ movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-+ unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-+ addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-+ shufps $0x88, rC5, rC4 /* rC4 = c4 c5 c6 c7 */
-+ movups 64(pC), rC5 /* rC5 = c8 X c9 X */
-+ movups 80(pC), rC1 /* rC1 = c10 X c11 X */
-+ addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-+ shufps $0x88, rC1, rC5 /* rC5 = c8 c9 c10 c11 */
-+ movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-+ movss 96(pC), rC1
-+ unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-+ movss 104(pC), rB0
-+ movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-+ unpcklps rB0, rC1
-+ addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-+ prefC(256(pC))
-+ mulps BOF(%rsp), rA0
-+ addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-+ mulps BOF(%rsp), rC4
-+ addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-+ mulps BOF(%rsp), rC5
-+ addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-+ mulps BOF(%rsp), rC1
-
- /* */
-
-- movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-- addps rA0, rC3
-- prefC(192(pC))
-- addq $68, pfA
-- addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-- addps rC4, rC7
-- addps rC5, rC11
-- addps rC1, rC12
-+ movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-+ addps rA0, rC3
-+ prefC(192(pC))
-+ addq $68, pfA
-+ addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-+ addps rC4, rC7
-+ addps rC5, rC11
-+ addps rC1, rC12
- #endif
-
- #else
-- movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-- unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-- movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-- movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-- movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-- movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-- unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-- addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-- addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-- movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-- unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-- movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-- addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-+ movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-+ unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-+ movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-+ movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-+ movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-+ movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-+ unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-+ addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-+ addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-+ movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-+ unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-+ movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-+ addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
- #ifdef SREAL
-- pref2(64(pfA))
-+ pref2(64(pfA))
- #else
-- prefC(256(pC))
-+ prefC(256(pC))
- #endif
-- addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-- addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-- addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-+ addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-+ addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-+ addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-
- /* */
-
-- movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-+ movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
- #ifndef SREAL
-- prefC(192(pC))
-+ prefC(192(pC))
- #endif
-- addq $68, pfA
-- addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-+ addq $68, pfA
-+ addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-
- #endif
- /*
- * Write results back to C; pC += 14;
- */
- #ifdef SREAL
-- movups rC3, (pC)
-- movups rC7, 16(pC)
-- movups rC11, 32(pC)
-- movlps rC12, 48(pC)
-- addq $56, pC
-+ movups rC3, (pC)
-+ movups rC7, 16(pC)
-+ movups rC11, 32(pC)
-+ movlps rC12, 48(pC)
-+ addq $56, pC
- #else
-- movss rC3, (pC)
-- movss rC7, 32(pC)
-- movhlps rC3, rC0
-- movhlps rC7, rC6
-- movss rC0, 16(pC)
-- movss rC6, 48(pC)
-- shufps $0x55, rC3, rC3
-- shufps $0x55, rC7, rC7
-- movss rC3, 8(pC)
-- movss rC7, 40(pC)
-- shufps $0x55, rC0, rC0
-- shufps $0x55, rC6, rC6
-- movss rC0, 24(pC)
-- movss rC6, 56(pC)
--
-- movss rC11, 64(pC)
-- movhlps rC11, rC2
-- movss rC12, 96(pC)
-- movss rC2, 80(pC)
-- shufps $0x55, rC11, rC11
-- shufps $0x55, rC12, rC12
-- movss rC11, 72(pC)
-- shufps $0x55, rC2, rC2
-- movss rC12, 104(pC)
-- movss rC2, 88(pC)
-+ movss rC3, (pC)
-+ movss rC7, 32(pC)
-+ movhlps rC3, rC0
-+ movhlps rC7, rC6
-+ movss rC0, 16(pC)
-+ movss rC6, 48(pC)
-+ shufps $0x55, rC3, rC3
-+ shufps $0x55, rC7, rC7
-+ movss rC3, 8(pC)
-+ movss rC7, 40(pC)
-+ shufps $0x55, rC0, rC0
-+ shufps $0x55, rC6, rC6
-+ movss rC0, 24(pC)
-+ movss rC6, 56(pC)
-+
-+ movss rC11, 64(pC)
-+ movhlps rC11, rC2
-+ movss rC12, 96(pC)
-+ movss rC2, 80(pC)
-+ shufps $0x55, rC11, rC11
-+ shufps $0x55, rC12, rC12
-+ movss rC11, 72(pC)
-+ shufps $0x55, rC2, rC2
-+ movss rC12, 104(pC)
-+ movss rC2, 88(pC)
-
-- addq $112, pC
-+ addq $112, pC
- #endif
- /*
- * Write results back to C
- */
-- addq $NB14so-176, pA5
-- addq $NB14so-176, pA10
-- subq $176, pB0
-+ addq $NB14so-176, pA5
-+ addq $NB14so-176, pA10
-+ subq $176, pB0
- /*
- * pC += 14; pA += 14*NB; pB -= NB;
- */
- /*
- * while (pA != stM);
- */
-- subq $1, stM
-- jne UMLOOP
-+ subq $1, stM
-+ jne UMLOOP
- #endif
-
- /*
-@@ -1459,994 +1466,994 @@ MLAST:
- #endif
- /*UKLOOP: */
- #ifdef BETA1
-- movaps 0-120(pA10,mldab5,2), rC0
-- movaps 0-120(pB0), rB0
-- mulps rB0, rC0
-- addss (pC), rC0
-- movaps 0-120(pA5, mldab,4), rC1
-- mulps rB0, rC1
-- addss CMUL(4)(pC), rC1
-- movaps 0-120(pA10, mldab,8), rC2
-- mulps rB0, rC2
-- addss CMUL(8)(pC), rC2
-- movaps 0-120(pA5, mldab,2), rC3
-- mulps rB0, rC3
-- addss CMUL(12)(pC), rC3
-- movaps 0-120(pA5, mldab), rC4
-- mulps rB0, rC4
-- addss CMUL(16)(pC), rC4
-- movaps 0-120(pA5), rC5
-- mulps rB0, rC5
-- addss CMUL(20)(pC), rC5
-- movaps 0-120(pA5, ldab), rC6
-- mulps rB0, rC6
-- addss CMUL(24)(pC), rC6
-- movaps 0-120(pA5, ldab,2), rC7
-- mulps rB0, rC7
-- addss CMUL(28)(pC), rC7
-- movaps 0-120(pA10, mldab,2), rC8
-- mulps rB0, rC8
-- addss CMUL(32)(pC), rC8
-- movaps 0-120(pA5,ldab,4), rC9
-- mulps rB0, rC9
-- addss CMUL(36)(pC), rC9
-- movaps 0-120(pA10), rC10
-- mulps rB0, rC10
-- addss CMUL(40)(pC), rC10
-- movaps 0-120(pA10,ldab), rC11
-- mulps rB0, rC11
-- addss CMUL(44)(pC), rC11
-- movaps 0-120(pA10,ldab,2), rC12
-- mulps rB0, rC12
-- addss CMUL(48)(pC), rC12
-- movaps 0-120(pA5,ldab,8), rC13
-- mulps rB0, rC13
-- addss CMUL(52)(pC), rC13
-+ movaps 0-120(pA10,mldab5,2), rC0
-+ movaps 0-120(pB0), rB0
-+ mulps rB0, rC0
-+ addss (pC), rC0
-+ movaps 0-120(pA5, mldab,4), rC1
-+ mulps rB0, rC1
-+ addss CMUL(4)(pC), rC1
-+ movaps 0-120(pA10, mldab,8), rC2
-+ mulps rB0, rC2
-+ addss CMUL(8)(pC), rC2
-+ movaps 0-120(pA5, mldab,2), rC3
-+ mulps rB0, rC3
-+ addss CMUL(12)(pC), rC3
-+ movaps 0-120(pA5, mldab), rC4
-+ mulps rB0, rC4
-+ addss CMUL(16)(pC), rC4
-+ movaps 0-120(pA5), rC5
-+ mulps rB0, rC5
-+ addss CMUL(20)(pC), rC5
-+ movaps 0-120(pA5, ldab), rC6
-+ mulps rB0, rC6
-+ addss CMUL(24)(pC), rC6
-+ movaps 0-120(pA5, ldab,2), rC7
-+ mulps rB0, rC7
-+ addss CMUL(28)(pC), rC7
-+ movaps 0-120(pA10, mldab,2), rC8
-+ mulps rB0, rC8
-+ addss CMUL(32)(pC), rC8
-+ movaps 0-120(pA5,ldab,4), rC9
-+ mulps rB0, rC9
-+ addss CMUL(36)(pC), rC9
-+ movaps 0-120(pA10), rC10
-+ mulps rB0, rC10
-+ addss CMUL(40)(pC), rC10
-+ movaps 0-120(pA10,ldab), rC11
-+ mulps rB0, rC11
-+ addss CMUL(44)(pC), rC11
-+ movaps 0-120(pA10,ldab,2), rC12
-+ mulps rB0, rC12
-+ addss CMUL(48)(pC), rC12
-+ movaps 0-120(pA5,ldab,8), rC13
-+ mulps rB0, rC13
-+ addss CMUL(52)(pC), rC13
- #else
-- movaps 0-120(pA10,mldab5,2), rC0
-- movaps 0-120(pB0), rC13
-- mulps rC13, rC0
-- movaps 0-120(pA5, mldab,4), rC1
-- mulps rC13, rC1
-- movaps 0-120(pA10, mldab,8), rC2
-- mulps rC13, rC2
-- movaps 0-120(pA5, mldab,2), rC3
-- mulps rC13, rC3
-- movaps 0-120(pA5, mldab), rC4
-- mulps rC13, rC4
-- movaps 0-120(pA5), rC5
-- mulps rC13, rC5
-- movaps 0-120(pA5, ldab), rC6
-- mulps rC13, rC6
-- movaps 0-120(pA5, ldab,2), rC7
-- mulps rC13, rC7
-- movaps 0-120(pA10, mldab,2), rC8
-- mulps rC13, rC8
-- movaps 0-120(pA5,ldab,4), rC9
-- mulps rC13, rC9
-- movaps 0-120(pA10), rC10
-- mulps rC13, rC10
-- movaps 0-120(pA10,ldab), rC11
-- mulps rC13, rC11
-- movaps 0-120(pA10,ldab,2), rC12
-- mulps rC13, rC12
-- mulps 0-120(pA5,ldab,8), rC13
-+ movaps 0-120(pA10,mldab5,2), rC0
-+ movaps 0-120(pB0), rC13
-+ mulps rC13, rC0
-+ movaps 0-120(pA5, mldab,4), rC1
-+ mulps rC13, rC1
-+ movaps 0-120(pA10, mldab,8), rC2
-+ mulps rC13, rC2
-+ movaps 0-120(pA5, mldab,2), rC3
-+ mulps rC13, rC3
-+ movaps 0-120(pA5, mldab), rC4
-+ mulps rC13, rC4
-+ movaps 0-120(pA5), rC5
-+ mulps rC13, rC5
-+ movaps 0-120(pA5, ldab), rC6
-+ mulps rC13, rC6
-+ movaps 0-120(pA5, ldab,2), rC7
-+ mulps rC13, rC7
-+ movaps 0-120(pA10, mldab,2), rC8
-+ mulps rC13, rC8
-+ movaps 0-120(pA5,ldab,4), rC9
-+ mulps rC13, rC9
-+ movaps 0-120(pA10), rC10
-+ mulps rC13, rC10
-+ movaps 0-120(pA10,ldab), rC11
-+ mulps rC13, rC11
-+ movaps 0-120(pA10,ldab,2), rC12
-+ mulps rC13, rC12
-+ mulps 0-120(pA5,ldab,8), rC13
- #endif
-
- #if KB > 4
-- movaps 16-120(pA10,mldab5,2), rA0
-- movaps 16-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 16-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 16-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 16-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 16-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 16-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 16-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 16-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 16-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 16-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 16-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 16-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 16-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 16-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 16-120(pA10,mldab5,2), rA0
-+ movaps 16-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 16-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 16-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 16-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 16-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 16-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 16-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 16-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 16-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 16-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 16-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 16-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 16-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 16-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 8
-- movaps 32-120(pA10,mldab5,2), rA0
-- movaps 32-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 32-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 32-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 32-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 32-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 32-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 32-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 32-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 32-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 32-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 32-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 32-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 32-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 32-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 32-120(pA10,mldab5,2), rA0
-+ movaps 32-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 32-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 32-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 32-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 32-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 32-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 32-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 32-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 32-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 32-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 32-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 32-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 32-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 32-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 12
-- movaps 48-120(pA10,mldab5,2), rA0
-- movaps 48-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 48-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 48-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 48-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 48-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 48-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 48-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 48-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 48-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 48-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 48-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 48-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 48-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 48-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 48-120(pA10,mldab5,2), rA0
-+ movaps 48-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 48-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 48-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 48-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 48-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 48-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 48-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 48-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 48-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 48-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 48-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 48-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 48-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 48-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 16
-- movaps 64-120(pA10,mldab5,2), rA0
-- movaps 64-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 64-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 64-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 64-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 64-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 64-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 64-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 64-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 64-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 64-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 64-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 64-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 64-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 64-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 64-120(pA10,mldab5,2), rA0
-+ movaps 64-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 64-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 64-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 64-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 64-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 64-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 64-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 64-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 64-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 64-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 64-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 64-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 64-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 64-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 20
-- movaps 80-120(pA10,mldab5,2), rA0
-- movaps 80-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 80-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 80-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 80-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 80-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 80-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 80-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 80-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 80-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 80-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 80-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 80-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 80-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 80-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 80-120(pA10,mldab5,2), rA0
-+ movaps 80-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 80-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 80-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 80-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 80-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 80-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 80-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 80-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 80-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 80-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 80-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 80-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 80-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 80-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 24
-- movaps 96-120(pA10,mldab5,2), rA0
-- movaps 96-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 96-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 96-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 96-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 96-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 96-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 96-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 96-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 96-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 96-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 96-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 96-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 96-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 96-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 96-120(pA10,mldab5,2), rA0
-+ movaps 96-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 96-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 96-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 96-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 96-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 96-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 96-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 96-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 96-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 96-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 96-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 96-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 96-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 96-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 28
-- movaps 112-120(pA10,mldab5,2), rA0
-- movaps 112-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 112-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 112-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 112-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 112-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 112-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 112-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 112-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 112-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 112-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 112-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 112-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 112-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 112-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 112-120(pA10,mldab5,2), rA0
-+ movaps 112-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 112-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 112-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 112-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 112-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 112-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 112-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 112-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 112-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 112-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 112-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 112-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 112-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 112-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 32
-- movaps 128-120(pA10,mldab5,2), rA0
-- movaps 128-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 128-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 128-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 128-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 128-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 128-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 128-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 128-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 128-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 128-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 128-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 128-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 128-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 128-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 128-120(pA10,mldab5,2), rA0
-+ movaps 128-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 128-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 128-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 128-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 128-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 128-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 128-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 128-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 128-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 128-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 128-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 128-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 128-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 128-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 36
-- movaps 144-120(pA10,mldab5,2), rA0
-- movaps 144-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 144-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 144-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 144-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 144-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 144-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 144-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 144-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 144-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 144-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 144-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 144-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 144-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 144-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 144-120(pA10,mldab5,2), rA0
-+ movaps 144-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 144-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 144-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 144-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 144-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 144-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 144-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 144-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 144-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 144-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 144-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 144-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 144-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 144-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-- prefB((pB,ldab))
-- prefB(64(pB,ldab))
-+ prefB((pB,ldab))
-+ prefB(64(pB,ldab))
-
- #if KB > 40
-- movaps 160-120(pA10,mldab5,2), rA0
-- movaps 160-120(pB0), rB0
-- mulps rB0, rA0
-- addq $176, pB0
-- addps rA0, rC0
-- movaps 160-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 160-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 160-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 160-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 160-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 160-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 160-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 160-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 160-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 160-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 160-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 160-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addq $176, pA10
-- addps rA0, rC12
-- mulps 160-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-- addq $176, pA5
-+ movaps 160-120(pA10,mldab5,2), rA0
-+ movaps 160-120(pB0), rB0
-+ mulps rB0, rA0
-+ addq $176, pB0
-+ addps rA0, rC0
-+ movaps 160-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 160-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 160-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 160-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 160-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 160-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 160-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 160-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 160-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 160-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 160-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 160-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addq $176, pA10
-+ addps rA0, rC12
-+ mulps 160-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
-+ addq $176, pA5
- #else
-- addq $176, pB0
-- addq $176, pA10
-- addq $176, pA5
-+ addq $176, pB0
-+ addq $176, pA10
-+ addq $176, pA5
- #endif
-
- #if KB > 44
-- movaps 0-120(pA10,mldab5,2), rA0
-- movaps 0-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 0-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 0-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 0-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 0-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 0-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 0-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 0-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 0-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 0-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 0-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 0-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 0-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 0-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 0-120(pA10,mldab5,2), rA0
-+ movaps 0-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 0-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 0-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 0-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 0-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 0-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 0-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 0-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 0-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 0-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 0-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 0-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 0-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 0-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 48
-- movaps 16-120(pA10,mldab5,2), rA0
-- movaps 16-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 16-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 16-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 16-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 16-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 16-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 16-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 16-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 16-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 16-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 16-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 16-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 16-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 16-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 16-120(pA10,mldab5,2), rA0
-+ movaps 16-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 16-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 16-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 16-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 16-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 16-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 16-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 16-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 16-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 16-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 16-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 16-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 16-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 16-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 52
-- movaps 32-120(pA10,mldab5,2), rA0
-- movaps 32-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 32-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 32-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 32-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 32-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 32-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 32-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 32-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 32-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 32-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 32-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 32-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 32-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 32-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 32-120(pA10,mldab5,2), rA0
-+ movaps 32-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 32-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 32-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 32-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 32-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 32-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 32-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 32-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 32-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 32-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 32-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 32-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 32-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 32-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 56
-- movaps 48-120(pA10,mldab5,2), rA0
-- movaps 48-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 48-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 48-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 48-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 48-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 48-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 48-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 48-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 48-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 48-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 48-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 48-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 48-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 48-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 48-120(pA10,mldab5,2), rA0
-+ movaps 48-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 48-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 48-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 48-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 48-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 48-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 48-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 48-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 48-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 48-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 48-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 48-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 48-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 48-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 60
-- movaps 64-120(pA10,mldab5,2), rA0
-- movaps 64-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 64-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 64-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 64-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 64-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 64-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 64-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 64-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 64-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 64-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 64-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 64-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 64-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 64-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 64-120(pA10,mldab5,2), rA0
-+ movaps 64-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 64-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 64-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 64-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 64-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 64-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 64-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 64-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 64-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 64-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 64-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 64-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 64-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 64-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-- prefB(128-176(pB,ldab))
-- prefB(192-176(pB,ldab))
-+ prefB(128-176(pB,ldab))
-+ prefB(192-176(pB,ldab))
-
- #if KB > 64
-- movaps 80-120(pA10,mldab5,2), rA0
-- movaps 80-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 80-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 80-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 80-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 80-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 80-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 80-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 80-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 80-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 80-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 80-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 80-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 80-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 80-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 80-120(pA10,mldab5,2), rA0
-+ movaps 80-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 80-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 80-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 80-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 80-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 80-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 80-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 80-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 80-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 80-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 80-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 80-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 80-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 80-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 68
-- movaps 96-120(pA10,mldab5,2), rA0
-- movaps 96-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 96-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 96-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 96-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 96-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 96-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 96-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 96-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 96-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 96-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 96-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 96-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 96-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 96-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 96-120(pA10,mldab5,2), rA0
-+ movaps 96-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 96-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 96-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 96-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 96-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 96-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 96-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 96-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 96-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 96-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 96-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 96-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 96-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 96-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 72
-- movaps 112-120(pA10,mldab5,2), rA0
-- movaps 112-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 112-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 112-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 112-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 112-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 112-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 112-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 112-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 112-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 112-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 112-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 112-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 112-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 112-120(pA5,ldab,8), rB0
-- prefC((pC))
-- prefC((pC,incCn))
-- addps rB0, rC13
-+ movaps 112-120(pA10,mldab5,2), rA0
-+ movaps 112-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 112-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 112-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 112-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 112-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 112-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 112-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 112-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 112-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 112-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 112-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 112-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 112-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 112-120(pA5,ldab,8), rB0
-+ prefC((pC))
-+ prefC((pC,incCn))
-+ addps rB0, rC13
- #else
-- prefC((pC))
-- prefC((pC,incCn))
-+ prefC((pC))
-+ prefC((pC,incCn))
- #endif
-
- #if KB > 76
-- movaps 128-120(pA10,mldab5,2), rA0
-- movaps 128-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 128-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 128-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 128-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 128-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 128-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 128-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 128-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 128-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 128-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 128-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 128-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 128-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 128-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 128-120(pA10,mldab5,2), rA0
-+ movaps 128-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 128-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 128-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 128-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 128-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 128-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 128-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 128-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 128-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 128-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 128-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 128-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 128-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 128-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- #if KB > 80
-- movaps 144-120(pA10,mldab5,2), rA0
-- movaps 144-120(pB0), rB0
-- mulps rB0, rA0
-- addps rA0, rC0
-- movaps 144-120(pA5, mldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC1
-- movaps 144-120(pA10, mldab,8), rA0
-- mulps rB0, rA0
-- addps rA0, rC2
-- movaps 144-120(pA5, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC3
-- movaps 144-120(pA5, mldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC4
-- movaps 144-120(pA5), rA0
-- mulps rB0, rA0
-- addps rA0, rC5
-- movaps 144-120(pA5, ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC6
-- movaps 144-120(pA5, ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC7
-- movaps 144-120(pA10, mldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC8
-- movaps 144-120(pA5,ldab,4), rA0
-- mulps rB0, rA0
-- addps rA0, rC9
-- movaps 144-120(pA10), rA0
-- mulps rB0, rA0
-- addps rA0, rC10
-- movaps 144-120(pA10,ldab), rA0
-- mulps rB0, rA0
-- addps rA0, rC11
-- movaps 144-120(pA10,ldab,2), rA0
-- mulps rB0, rA0
-- addps rA0, rC12
-- mulps 144-120(pA5,ldab,8), rB0
-- addps rB0, rC13
-+ movaps 144-120(pA10,mldab5,2), rA0
-+ movaps 144-120(pB0), rB0
-+ mulps rB0, rA0
-+ addps rA0, rC0
-+ movaps 144-120(pA5, mldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC1
-+ movaps 144-120(pA10, mldab,8), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC2
-+ movaps 144-120(pA5, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC3
-+ movaps 144-120(pA5, mldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC4
-+ movaps 144-120(pA5), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC5
-+ movaps 144-120(pA5, ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC6
-+ movaps 144-120(pA5, ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC7
-+ movaps 144-120(pA10, mldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC8
-+ movaps 144-120(pA5,ldab,4), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC9
-+ movaps 144-120(pA10), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC10
-+ movaps 144-120(pA10,ldab), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC11
-+ movaps 144-120(pA10,ldab,2), rA0
-+ mulps rB0, rA0
-+ addps rA0, rC12
-+ mulps 144-120(pA5,ldab,8), rB0
-+ addps rB0, rC13
- #endif
-
- /*UKLOOP */
-@@ -2454,202 +2461,202 @@ MLAST:
- * Get these bastard things summed up correctly
- */
-
-- /* rC0 = c0a c0b c0c c0d */
-- /* rC1 = c1a c1b c1c c1d */
-- /* rC2 = c2a c2b c2c c2d */
-- /* rC3 = c3a c3b c3c c3d */
-+ /* rC0 = c0a c0b c0c c0d */
-+ /* rC1 = c1a c1b c1c c1d */
-+ /* rC2 = c2a c2b c2c c2d */
-+ /* rC3 = c3a c3b c3c c3d */
- /* */
-- movaps rC2, rB0 /* rB0 = c2a c2b c2c c2d */
-- prefC(64(pC,incCn))
-- prefB(256-176(pB,ldab))
-- movaps rC0, rA0 /* rA0 = c0a c0b c0c c0d */
-- unpckhps rC3, rB0 /* rB0 = c2c c3c c2d c3d */
-- unpckhps rC1, rA0 /* rA0 = c0c c1c c0d c1d */
-- unpcklps rC3, rC2 /* rC2 = c2a c3a c2b c3b */
-- movlhps rB0, rC3 /* rC3 = c3a c3b c2c c3c */
-- unpcklps rC1, rC0 /* rC0 = c0a c1a c0b c1b */
-- movhlps rA0, rC3 /* rC3 = c0d c1d c2c c3c */
-- movlhps rC2, rA0 /* rA0 = c0c c1c c2a c3a */
-- movhlps rC0, rB0 /* rB0 = c0b c1b c2d c3d */
-- addps rA0, rC3 /* rC3 = c0cd c1cd c2ac c3ac */
-- movlhps rC0, rC1 /* rC1 = c1a c1b c0a c1a */
-- movhlps rC1, rC2 /* rC2 = c0a c1a c2b c3b */
-- movaps rC4, rA0 /* rA0 = c4a c4b c4c c4d */
-- addps rB0, rC2 /* rC2 = c0ab c1ab c2bd c3bd */
-- movaps rC6, rB0 /* rB0 = c6a c6b c6c c6d */
-- addps rC2, rC3 /* rC3 = c0abcd c1abcd c2bdac c3bdac */
--
--
-- /* rC4 = c4a c4b c4c c4d */
-- /* rC5 = c5a c5b c5c c5d */
-- /* rC6 = c6a c6b c6c c6d */
-- /* rC7 = c7a c7b c7c c7d */
-- /* rC8 = c08a c08b c08c c08d */
-- /* rC9 = c09a c09b c09c c09d */
-- /* rC10 = c10a c10b c10c c10d */
-- /* rC11 = c11a c11b c11c c11d */
-- /* rC12 = c12a c12b c12c c12d */
-- /* rC13 = c13a c13b c13c c13d */
-+ movaps rC2, rB0 /* rB0 = c2a c2b c2c c2d */
-+ prefC(64(pC,incCn))
-+ prefB(256-176(pB,ldab))
-+ movaps rC0, rA0 /* rA0 = c0a c0b c0c c0d */
-+ unpckhps rC3, rB0 /* rB0 = c2c c3c c2d c3d */
-+ unpckhps rC1, rA0 /* rA0 = c0c c1c c0d c1d */
-+ unpcklps rC3, rC2 /* rC2 = c2a c3a c2b c3b */
-+ movlhps rB0, rC3 /* rC3 = c3a c3b c2c c3c */
-+ unpcklps rC1, rC0 /* rC0 = c0a c1a c0b c1b */
-+ movhlps rA0, rC3 /* rC3 = c0d c1d c2c c3c */
-+ movlhps rC2, rA0 /* rA0 = c0c c1c c2a c3a */
-+ movhlps rC0, rB0 /* rB0 = c0b c1b c2d c3d */
-+ addps rA0, rC3 /* rC3 = c0cd c1cd c2ac c3ac */
-+ movlhps rC0, rC1 /* rC1 = c1a c1b c0a c1a */
-+ movhlps rC1, rC2 /* rC2 = c0a c1a c2b c3b */
-+ movaps rC4, rA0 /* rA0 = c4a c4b c4c c4d */
-+ addps rB0, rC2 /* rC2 = c0ab c1ab c2bd c3bd */
-+ movaps rC6, rB0 /* rB0 = c6a c6b c6c c6d */
-+ addps rC2, rC3 /* rC3 = c0abcd c1abcd c2bdac c3bdac */
-+
-+
-+ /* rC4 = c4a c4b c4c c4d */
-+ /* rC5 = c5a c5b c5c c5d */
-+ /* rC6 = c6a c6b c6c c6d */
-+ /* rC7 = c7a c7b c7c c7d */
-+ /* rC8 = c08a c08b c08c c08d */
-+ /* rC9 = c09a c09b c09c c09d */
-+ /* rC10 = c10a c10b c10c c10d */
-+ /* rC11 = c11a c11b c11c c11d */
-+ /* rC12 = c12a c12b c12c c12d */
-+ /* rC13 = c13a c13b c13c c13d */
- /* */
-- movaps rC10, rC0 /* rC0 = c10a c10b c10c c10d */
-- movaps rC8 , rC1 /* rC1 = c08a c08b c08c c08d */
-- movaps rC12, rC2 /* rC2 = c12a c12b c12c c12d */
-- unpckhps rC7, rB0 /* rB0 = c6c c7c c6d c7d */
-- unpckhps rC5, rA0 /* rA0 = c4c c5c c4d c5d */
-- unpcklps rC7, rC6 /* rC6 = c6a c7a c6b c7b */
-- unpckhps rC11, rC0 /* rC0 = c10c c11c c10d c11d */
-- unpckhps rC9 , rC1 /* rC1 = c08c c09c c08d c09d */
-- movlhps rB0, rC7 /* rC7 = c7a c7b c6c c7c */
-- unpcklps rC5, rC4 /* rC4 = c4a c5a c4b c5b */
-- movhlps rA0, rC7 /* rC7 = c4d c5d c6c c7c */
-- movlhps rC6, rA0 /* rA0 = c4c c5c c6a c7a */
-- unpcklps rC11, rC10 /* rC10 = c10a c11a c10b c11b */
-- movhlps rC4, rB0 /* rB0 = c4b c5b c6d c7d */
-- movlhps rC0, rC11 /* rC11 = c11a c11b c10c c11c */
-- addps rA0, rC7 /* rC7 = c4cd c5cd c6ac c7ac */
-+ movaps rC10, rC0 /* rC0 = c10a c10b c10c c10d */
-+ movaps rC8 , rC1 /* rC1 = c08a c08b c08c c08d */
-+ movaps rC12, rC2 /* rC2 = c12a c12b c12c c12d */
-+ unpckhps rC7, rB0 /* rB0 = c6c c7c c6d c7d */
-+ unpckhps rC5, rA0 /* rA0 = c4c c5c c4d c5d */
-+ unpcklps rC7, rC6 /* rC6 = c6a c7a c6b c7b */
-+ unpckhps rC11, rC0 /* rC0 = c10c c11c c10d c11d */
-+ unpckhps rC9 , rC1 /* rC1 = c08c c09c c08d c09d */
-+ movlhps rB0, rC7 /* rC7 = c7a c7b c6c c7c */
-+ unpcklps rC5, rC4 /* rC4 = c4a c5a c4b c5b */
-+ movhlps rA0, rC7 /* rC7 = c4d c5d c6c c7c */
-+ movlhps rC6, rA0 /* rA0 = c4c c5c c6a c7a */
-+ unpcklps rC11, rC10 /* rC10 = c10a c11a c10b c11b */
-+ movhlps rC4, rB0 /* rB0 = c4b c5b c6d c7d */
-+ movlhps rC0, rC11 /* rC11 = c11a c11b c10c c11c */
-+ addps rA0, rC7 /* rC7 = c4cd c5cd c6ac c7ac */
- #ifdef BETAX
- #ifdef SREAL
-- movups (pC), rA0
-- movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-- movups 16(pC), rC4
-- unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-- movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-- movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-- movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-- movups 32(pC), rC5
-- movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-- unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-- addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-- movlps 48(pC), rC1
-- addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-- movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-- unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-- movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-- addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-- mulps BOF(%rsp), rA0
-- addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-- mulps BOF(%rsp), rC4
-- addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-- mulps BOF(%rsp), rC5
-- addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-- mulps BOF(%rsp), rC1
-+ movups (pC), rA0
-+ movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-+ movups 16(pC), rC4
-+ unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-+ movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-+ movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-+ movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-+ movups 32(pC), rC5
-+ movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-+ unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-+ addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-+ movlps 48(pC), rC1
-+ addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-+ movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-+ unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-+ movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-+ addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-+ mulps BOF(%rsp), rA0
-+ addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-+ mulps BOF(%rsp), rC4
-+ addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-+ mulps BOF(%rsp), rC5
-+ addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-+ mulps BOF(%rsp), rC1
-
- /* */
-
-- movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-- addps rA0, rC3
-- addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-- addps rC4, rC7
-- addps rC5, rC11
-- prefB(320-176(pB,ldab))
-- addps rC1, rC12
-+ movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-+ addps rA0, rC3
-+ addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-+ addps rC4, rC7
-+ addps rC5, rC11
-+ prefB(320-176(pB,ldab))
-+ addps rC1, rC12
- #else /* BETA = X, complex type */
-- movups (pC), rA0
-- movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-- movups 16(pC), rC4
-- unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-- shufps $0x88, rC4, rA0 /* rA0 = c0 c1 c2 c3 */
-- movups 32(pC), rC4 /* rC4 = c4 X c5 X */
-- movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-- movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-- movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-- movups 48(pC), rC5 /* rC5 = c6 X c7 X */
-- movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-- unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-- addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-- shufps $0x88, rC5, rC4 /* rC4 = c4 c5 c6 c7 */
-- movups 64(pC), rC5 /* rC5 = c8 X c9 X */
-- movups 80(pC), rC1 /* rC1 = c10 X c11 X */
-- addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-- shufps $0x88, rC1, rC5 /* rC5 = c8 c9 c10 c11 */
-- movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-- movss 96(pC), rC1
-- unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-- movss 104(pC), rB0
-- movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-- unpcklps rB0, rC1
-- addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-- mulps BOF(%rsp), rA0
-- addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-- mulps BOF(%rsp), rC4
-- addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-- mulps BOF(%rsp), rC5
-- addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-- mulps BOF(%rsp), rC1
-+ movups (pC), rA0
-+ movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-+ movups 16(pC), rC4
-+ unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-+ shufps $0x88, rC4, rA0 /* rA0 = c0 c1 c2 c3 */
-+ movups 32(pC), rC4 /* rC4 = c4 X c5 X */
-+ movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-+ movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-+ movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-+ movups 48(pC), rC5 /* rC5 = c6 X c7 X */
-+ movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-+ unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-+ addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-+ shufps $0x88, rC5, rC4 /* rC4 = c4 c5 c6 c7 */
-+ movups 64(pC), rC5 /* rC5 = c8 X c9 X */
-+ movups 80(pC), rC1 /* rC1 = c10 X c11 X */
-+ addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-+ shufps $0x88, rC1, rC5 /* rC5 = c8 c9 c10 c11 */
-+ movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-+ movss 96(pC), rC1
-+ unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-+ movss 104(pC), rB0
-+ movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-+ unpcklps rB0, rC1
-+ addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-+ mulps BOF(%rsp), rA0
-+ addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-+ mulps BOF(%rsp), rC4
-+ addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-+ mulps BOF(%rsp), rC5
-+ addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-+ mulps BOF(%rsp), rC1
-
- /* */
-
-- movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-- addps rA0, rC3
-- addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-- addps rC4, rC7
-- addps rC5, rC11
-- prefB(320-176(pB,ldab))
-- addps rC1, rC12
-+ movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-+ addps rA0, rC3
-+ addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-+ addps rC4, rC7
-+ addps rC5, rC11
-+ prefB(320-176(pB,ldab))
-+ addps rC1, rC12
- #endif
-
- #else
-- movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-- unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-- movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-- movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-- movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-- movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-- unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-- addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-- addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-- movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-- unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-- movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-- addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-- addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-- addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-- addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-+ movlhps rC4, rC5 /* rC5 = c5a c5b c4a c5a */
-+ unpcklps rC9 , rC8 /* rC8 = c08a c09a c08b c09b */
-+ movhlps rC1, rC11 /* rC11 = c08d c09d c10c c11c */
-+ movlhps rC10, rC1 /* rC1 = c08c c09c c10a c11a */
-+ movhlps rC5, rC6 /* rC6 = c4a c5a c6b c7b */
-+ movhlps rC8 , rC0 /* rC0 = c08b c09b c10d c11d */
-+ unpcklps rC13, rC2 /* rC2 = c12a c13a c12b c13b */
-+ addps rC1, rC11 /* rC11 = c08cd c09cd c10ac c11ac */
-+ addps rB0, rC6 /* rC6 = c4ab c5ab c6bd c7bd */
-+ movlhps rC8 , rC9 /* rC9 = c09a c09b c08a c09a */
-+ unpckhps rC13, rC12 /* rC12 = c12c c13c c12d c13d */
-+ movhlps rC9 , rC10 /* rC10 = c08a c09a c10b c11b */
-+ addps rC6, rC7 /* rC7 = c4abcd c5abcd c6bdac c7bdac */
-+ addps rC0, rC10 /* rC10 = c08ab c09ab c10bd c11bd */
-+ addps rC2, rC12 /* rC12 = c12ac c13ac c12bd c13bd */
-+ addps rC10, rC11 /* rC11 = c08abcd c09abcd c10bdac c11bdac */
-
- /* */
-
-- movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-- prefB(320-176(pB,ldab))
-- addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-+ movhlps rC12, rC13 /* rC13 = c12bd c13bd X X */
-+ prefB(320-176(pB,ldab))
-+ addps rC13, rC12 /* rC12 = c12abcd c13abcd X X */
-
- #endif
- /*
- * Write results back to C; pC += 14;
- */
- #ifdef SREAL
-- movups rC3, (pC)
-- movups rC7, 16(pC)
-- movups rC11, 32(pC)
-- movlps rC12, 48(pC)
--/* addq $56, pC */
-+ movups rC3, (pC)
-+ movups rC7, 16(pC)
-+ movups rC11, 32(pC)
-+ movlps rC12, 48(pC)
-+/* addq $56, pC */
- #else
-- movss rC3, (pC)
-- movss rC7, 32(pC)
-- movhlps rC3, rC0
-- movhlps rC7, rC6
-- movss rC0, 16(pC)
-- movss rC6, 48(pC)
-- shufps $0x55, rC3, rC3
-- shufps $0x55, rC7, rC7
-- movss rC3, 8(pC)
-- movss rC7, 40(pC)
-- shufps $0x55, rC0, rC0
-- shufps $0x55, rC6, rC6
-- movss rC0, 24(pC)
-- movss rC6, 56(pC)
--
-- movss rC11, 64(pC)
-- movhlps rC11, rC2
-- movss rC12, 96(pC)
-- movss rC2, 80(pC)
-- shufps $0x55, rC11, rC11
-- shufps $0x55, rC12, rC12
-- movss rC11, 72(pC)
-- shufps $0x55, rC2, rC2
-- movss rC12, 104(pC)
-- movss rC2, 88(pC)
-+ movss rC3, (pC)
-+ movss rC7, 32(pC)
-+ movhlps rC3, rC0
-+ movhlps rC7, rC6
-+ movss rC0, 16(pC)
-+ movss rC6, 48(pC)
-+ shufps $0x55, rC3, rC3
-+ shufps $0x55, rC7, rC7
-+ movss rC3, 8(pC)
-+ movss rC7, 40(pC)
-+ shufps $0x55, rC0, rC0
-+ shufps $0x55, rC6, rC6
-+ movss rC0, 24(pC)
-+ movss rC6, 56(pC)
-+
-+ movss rC11, 64(pC)
-+ movhlps rC11, rC2
-+ movss rC12, 96(pC)
-+ movss rC2, 80(pC)
-+ shufps $0x55, rC11, rC11
-+ shufps $0x55, rC12, rC12
-+ movss rC11, 72(pC)
-+ shufps $0x55, rC2, rC2
-+ movss rC12, 104(pC)
-+ movss rC2, 88(pC)
-
--/* addq $112, pC */
-+/* addq $112, pC */
- #endif
- /*
- * Write results back to C
-@@ -2660,55 +2667,55 @@ MLAST:
- /*
- * while (pA != stM);
- */
--/* subq $1, stM */
--/* jne UMLOOP */
-+/* subq $1, stM */
-+/* jne UMLOOP */
- /*
- * pC += 14; pA += 14*NB; pB -= NB;
- */
--/* subq $MBKBso-NB14so+176, pA5 */
--/* subq $MBKBso-NB14so+176, pA10 */
-- subq incAm, pA5
-- subq incAm, pA10
-- addq $NBso-176, pB0
-+/* subq $MBKBso-NB14so+176, pA5 */
-+/* subq $MBKBso-NB14so+176, pA10 */
-+ subq incAm, pA5
-+ subq incAm, pA10
-+ addq $NBso-176, pB0
- /*
- * while (pA != stM);
- */
--/* subq $1, stM */
--/* jne UMLOOP */
-+/* subq $1, stM */
-+/* jne UMLOOP */
- /*
- * pC += incCn; pA -= NBNB; pB += NB;
- */
-- addq incCn, pC
-+ addq incCn, pC
- /*
- * while (pB != stN);
- */
-- sub $1, stN
-- jne UNLOOP
-+ sub $1, stN
-+ jne UNLOOP
-
- /*
- * Restore callee-saved iregs
- */
- DONE:
-- movq -8(%rsp), %rbp
-- movq -16(%rsp), %rbx
-+ movq -8(%rsp), %rbp
-+ movq -16(%rsp), %rbx
- #if MB == 0
-- movq -32(%rsp), %r12
-- movq -40(%rsp), %r13
-+ movq -32(%rsp), %r12
-+ movq -40(%rsp), %r13
- #endif
-- ret
-+ ret
- #if MB == 0
- MB_LT84:
-- cmp $70, stM
-- jne MB_LT70
--/* movq $70/14, stM */
-- movq $5, stM
-- jmp MBFOUND
-+ cmp $70, stM
-+ jne MB_LT70
-+/* movq $70/14, stM */
-+ movq $5, stM
-+ jmp MBFOUND
- MB_LT70:
-- cmp $56, stM
-- jne MB_LT56
--/* movq $56/14, stM */
-- movq $4, stM
-- jmp MBFOUND
-+ cmp $56, stM
-+ jne MB_LT56
-+/* movq $56/14, stM */
-+ movq $4, stM
-+ jmp MBFOUND
- MB_LT56:
- cmp $42, stM
- jne MB_LT42
-diff -rupN ATLAS/tune/blas/level1/scalsrch.c atlas-3.8.3/tune/blas/level1/scalsrch.c
---- ATLAS/tune/blas/level1/scalsrch.c 2009-02-18 19:48:25.000000000 +0100
-+++ atlas-3.8.3/tune/blas/level1/scalsrch.c 2009-11-12 13:45:48.141174024 +0100
-@@ -747,7 +747,7 @@ void GenMainRout(char pre, int n, int *i
- /*
- * Handle all special alpha cases
- */
-- fprintf(fpout, "%sif ( SCALAR_IS_ZERO(alpha) )\n", spc);
-+ /* fprintf(fpout, "%sif ( SCALAR_IS_ZERO(alpha) )\n", spc);
- fprintf(fpout, "%s{\n", spc);
- if (pre == 'c' || pre == 'z')
- {
-@@ -756,7 +756,7 @@ void GenMainRout(char pre, int n, int *i
- }
- else fprintf(fpout, "%s Mjoin(PATL,set)(N, ATL_rzero, X, incx);\n", spc);
- fprintf(fpout, "%s return;\n", spc);
-- fprintf(fpout, "%s}\n", spc);
-+ fprintf(fpout, "%s}\n", spc); */
- GenAlphCase(pre, spc, fpout, 1, n, ix, iy, ia, ib);
- GenAlphCase(pre, spc, fpout, -1, n, ix, iy, ia, ib);
- if (pre == 'c' || pre == 'z')
diff --git a/libraries/atlas/slack-desc b/libraries/atlas/slack-desc
deleted file mode 100644
index 73ea6b801b..0000000000
--- a/libraries/atlas/slack-desc
+++ /dev/null
@@ -1,19 +0,0 @@
-# HOW TO EDIT THIS FILE:
-# The "handy ruler" below makes it easier to edit a package description.
-# Line up the first '|' above the ':' following the base package name, and
-# the '|' on the right side marks the last column you can put a character in.
-# You must make exactly 11 lines for the formatting to be correct. It's also
-# customary to leave one space after the ':' except on otherwise blank lines.
-
- |-----handy-ruler------------------------------------------------------|
-atlas: atlas (Automatically Tuned Linear Algebra Software)
-atlas:
-atlas: This is ATLAS (Automatically Tuned Linear Algebra Software), an
-atlas: ongoing research effort focusing on applying empirical techniques in
-atlas: order to provide portable performance. At present, it provides C and
-atlas: Fortran77 interfaces to a portably efficient BLAS implementation as
-atlas: well as a few routines from LAPACK. Nevertheless, the default setting
-atlas: for Slackware is to allow for a full LAPACK to get build and installed
-atlas: along with ATLAS.
-atlas:
-atlas: Homepage: http://math-atlas.sourceforge.net/
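
The header comments of the removed slack-desc state the layout rules: the first '|' of the handy ruler sits above the ':' after the package name, the last '|' marks the final usable column, there must be exactly 11 description lines, and a space customarily follows the ':' on non-blank lines. Below is a minimal sketch of a checker for those rules. It is not part of this commit or of any SlackBuilds.org tooling; the function name check_slack_desc and its message strings are hypothetical, chosen only for illustration.

```python
def check_slack_desc(text, pkgname):
    """Return a list of problems found in the text of a slack-desc file.

    Hypothetical helper: it only encodes the rules spelled out in the
    slack-desc header comments (ruler alignment, 11 lines, spacing).
    """
    problems = []
    prefix = pkgname + ":"
    lines = text.splitlines()

    # Description lines are the ones that start with "<pkgname>:".
    desc = [ln for ln in lines if ln.startswith(prefix)]
    if len(desc) != 11:
        problems.append(f"expected 11 description lines, found {len(desc)}")

    # The handy ruler is identified by its literal label.
    ruler = next((ln.rstrip() for ln in lines if "handy-ruler" in ln), None)
    if ruler is None:
        problems.append("handy ruler line is missing")
    else:
        # The first '|' must be in the same column as the ':' after the name.
        if ruler.index("|") != len(pkgname):
            problems.append("first '|' of the ruler is not above the ':'")
        # The last '|' marks the last column a character may occupy.
        for number, ln in enumerate(desc, start=1):
            if len(ln) > len(ruler):
                problems.append(f"description line {number} runs past the ruler")

    # One space after the ':' is customary, except on otherwise blank lines.
    for number, ln in enumerate(desc, start=1):
        rest = ln[len(prefix):]
        if rest and not rest.startswith(" "):
            problems.append(f"description line {number} lacks a space after ':'")

    return problems
```

Calling check_slack_desc(open("slack-desc").read(), "atlas") on a copy of the file (without the leading diff markers) would return an empty list when all four rules hold.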