Vector Math Library (VML) Notes
for Intel® Architecture Processors

Disclaimer

This document as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in these Notes is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.

Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel may make changes to specifications and product descriptions at any time, without notice.

This document describes the Vector Math Library (VML), which computes elementary functions on vector arguments. VML is an integral part of the Intel® Math Kernel Library; the name VML is used here simply as a convenient label for this group of functions.

VML includes a set of highly optimized implementations of certain computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, etc.) that operate on vectors. VML may significantly improve performance for applications such as nonlinear programming software, computations of integrals, and many others.

Each VML vector function (for each data format) can work in two modes: High Accuracy (HA) and Low Accuracy (LA). For many functions, the LA version improves performance at the cost of accuracy. For some functions, relaxing the accuracy improves performance very little, so the same implementation is used for both modes. Error and special-value behavior depends neither on the accuracy mode (HA versus LA) nor on the processor on which the software runs. Accuracy behavior, however, is processor-specific, so results may differ slightly on the Intel® Pentium® 4 processor and Intel® Xeon™ processor with Streaming SIMD Extensions 3 (SSE3), the Intel® Itanium® 2 processor, and the Intel® Xeon™ processor with Intel® Extended Memory 64 Technology (Intel® EM64T). For more information, see the section on special-value processing and the web-based accuracy data.

This document refers to a more detailed description of the performance and accuracy properties of VML functions, available on the product web page. That description covers three issues (performance, accuracy, and special-value processing) at two levels of detail: brief information for all functions in one table, and more detailed information for each function on a separate page.

Performance issues: Performance numbers in the respective tables are given for arguments within so-called "working" intervals; performance may differ for arguments outside them. For example, computing trigonometric functions on "huge" arguments is quite expensive, because obtaining the required accuracy there costs performance. Each function lists the working interval over which its performance is measured. The same page contains graphs showing how performance depends on vector length, plotted on a logarithmic scale so that both extremes, "short" and "long" vectors, are visible. Short vectors incur loop setup and initialization overheads. These overheads are amortized as the vector length grows: for vectors longer than a few dozen elements, performance stays fairly flat until the vector data no longer fits in the L2 cache.
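The amortization argument can be expressed with a simple first-order cost model: if a call costs a fixed setup overhead plus a constant cost per element, the average cost per element is `per_element + setup / n`, which falls toward `per_element` as `n` grows. The sketch below assumes that model; the function names and any parameter values are hypothetical, and the model deliberately ignores the cache effects that degrade performance once the data exceeds the L2 cache.

```c
/* Hypothetical first-order cost model for one vector call:
   total cycles = setup + n * per_element.
   Cache effects (e.g. data exceeding L2) are deliberately ignored. */

/* Average cycles per element for a vector of length n (n > 0). */
double cycles_per_element(double setup, double per_element, unsigned long n)
{
    return per_element + setup / (double)n;
}

/* Smallest n for which the setup cost is at most `fraction` of the total
   (0 < fraction < 1). Solving setup / (setup + n * per_element) <= fraction
   for n gives n >= setup * (1 - fraction) / (fraction * per_element). */
unsigned long min_length_for_overhead(double setup, double per_element,
                                      double fraction)
{
    double x = setup * (1.0 - fraction) / (fraction * per_element);
    unsigned long n = (unsigned long)x;
    if ((double)n < x)
        ++n; /* round up without calling ceil() */
    return n > 0 ? n : 1;
}
```

For example, with a setup cost of 100 cycles and 2 cycles per element, the overhead drops to half of the total at 50 elements and keeps shrinking beyond that, which matches the flat region described above.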

Data prefetching, explicit (in software) on the Intel® Pentium® III processor and implicit (in hardware) on the Pentium 4 processor, greatly reduces the out-of-cache performance penalty.
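Explicit software prefetching means requesting data a fixed distance ahead of the element currently being processed, so that it is already in cache when the loop reaches it. The sketch below illustrates the idea on a simple scaling loop using the GCC/Clang `__builtin_prefetch` builtin; it is not VML's code, and the function name and prefetch distance are hypothetical. On the Pentium III the underlying instructions are the SSE prefetch instructions, which a compiler may emit for this builtin.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; the best value depends on
   the processor and memory latency and would need tuning. */
#define PREFETCH_DISTANCE 64

/* y[i] = s * a[i], issuing a software prefetch for the input element that
   will be needed PREFETCH_DISTANCE iterations later. __builtin_prefetch is a
   GCC/Clang builtin and only a hint; the bounds check keeps the address in range. */
void scale_with_prefetch(size_t n, const double *a, double *y, double s)
{
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE],
                               0 /* read */, 0 /* no temporal locality */);
        y[i] = s * a[i];
    }
}
```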

See the performance data tables and graphs on the product web page.

Accuracy issues: The HA functions are designed to have an error of less than 1.0 ulp (unit in the last place) and to process all special values correctly. The LA versions relax the accuracy requirement. For details, see the web-based accuracy table, which lists ulp errors for all functions.
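An ulp is the spacing between adjacent floating-point numbers at a given magnitude, so an error below 1.0 ulp means the result is less than one representable step away from the exact value. The sketch below shows one common way to measure such an error for double-precision results against a higher-precision reference. The helper names are hypothetical, it uses `long double` as the reference type (which on some platforms is no wider than `double`), and it is not the methodology Intel used to produce its accuracy tables.

```c
#include <stdint.h>
#include <string.h>

/* One ulp at x: the spacing between |x| and the next larger double
   (for an exact power of two this is the spacing above it).
   Assumes x is finite. */
static double ulp_of(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~(UINT64_C(1) << 63);   /* clear the sign bit: bits of |x| */
    double ax, next;
    memcpy(&ax, &bits, sizeof ax);
    bits += 1;                      /* next representable double above |x| */
    memcpy(&next, &bits, sizeof next);
    return next - ax;
}

/* Error of `computed` in ulps of the reference value, which should be
   evaluated in higher precision than double. */
double ulp_error(double computed, long double reference)
{
    long double diff = (long double)computed - reference;
    if (diff < 0)
        diff = -diff;
    return (double)(diff / (long double)ulp_of((double)reference));
}
```

Under this measure, a result meets the HA design target at a given argument when `ulp_error(result, reference) < 1.0`.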

Special-value processing issues: Special values are processed in accordance with the C9X (ISO C99) standard. For the full list of special values for each function, see the corresponding table.

For details on the performance of individual functions, see the list of VML functions on the product web page.

To ensure a correct display of this document, use the following recommended browser versions: Internet Explorer* 5.5 or higher (on Windows*), Netscape* 4.79 or Mozilla* 1.2.1 or higher (on Linux*).

Celeron, Dialogic, i386, i486, iCOMP, Intel, Intel Centrino, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetStructure, Intel Xeon, Intel XScale, Itanium, MMX, MMX logo, Pentium, Pentium II Xeon, Pentium III Xeon, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.