Implementing sqrt

I was reading some post about interview questions of 2011 and came across one that stated “find the square root of a number”.

Assuming we can’t use the sqrtf function of the standard C math library, let’s see how we can calculate the square root of a number x.

Given n, we know that its square root is a number x that holds:

\sqrt{n} = x

Let’s work on this equation a little. Raise both sides to the second power:

n = x^2

Move to the left of the equality:

0 = x^2 - n

If we found the roots of this last equation somehow, we would have found the square root of n. We can do this by using the Newton-Raphson iteration.

The Newton-Raphson iteration states that we can find the root of an equation using the following formula iteratively:

x_{n+1} = x_n - \frac{f(x)}{f'(x)}

Where f'(x) is the derivative of function f(x). We will approximate the derivative using the definition of derivative at a point (we could also note that the derivative could be trivially calculated; this method is more general).

f'(x) = \frac{f(x+h) - f(x)}{h}

The error of the Newton-Raphson iteration is given by:

|x_{n+1} - x_n|

Starting with a hardcoded seed value, we will perform this iteration in a loop until the error is less than a given value. I have chosen to iterate until the error is less than 1×10^(-10): 0.00000000001.

Let us see what a tentative “pythonesque” pseudocode for this loop could be:

def sqrt(n):
    f = function(x*x - n)
    x = 1 # seed
    xant = 0
    do:
        f1 = (f.eval(x+h) - f.eval(x)) / h
        xant = x
        x = x - f.eval(x)/f1
    while abs(x - xant) > err;
    return x

Assuming we have a symbolic function type, that loop does not seem too difficult. In order to code this in C, since the equation is always the same, I will hardcode it as a plain function.

typedef double real; // change to float for single precision

real f(real x, real n)
{
    return x*x - n;
}

real sqrt(real n)
{
    real err = 0.00000000001f;
    real h = 0.01f;

    real x = 1.0f; // seed
    real xant = 0.0f;

    do
    {
        xant = x;
        real df = (f(x+h, n) - f(x, n))/h;
        x = x - f(x, n)/df;
    }
    while (abs(x - xant) > err);

    return x;
}

Here are the results of running our custom square root function, compared to the standard version provided with the C programming language:

[ale@syaoran sqrt]$ ./sqrt 1.0
Custom sqrt of: 1 = 1
libm sqrt of: 1 = 1

[ale@syaoran sqrt]$ ./sqrt 2.0
Custom sqrt of: 2 = 1.41421
libm sqrt of: 2 = 1.41421

[ale@syaoran sqrt]$ ./sqrt 4.0
Custom sqrt of: 4 = 2
libm sqrt of: 4 = 2

[ale@syaoran sqrt]$ ./sqrt 16.0
Custom sqrt of: 16 = 4
libm sqrt of: 16 = 4

[ale@syaoran sqrt]$ ./sqrt 32.0
Custom sqrt of: 32 = 5.65685
libm sqrt of: 32 = 5.65685

[ale@syaoran sqrt]$ ./sqrt 100.0
Custom sqrt of: 100 = 10
libm sqrt of: 100 = 10

[ale@syaoran sqrt]$ ./sqrt 1000000.0
Custom sqrt of: 1e+06 = 1000
libm sqrt of: 1e+06 = 1000

A short detour along the way…

I wanted to improve the Stencil Shadow Volumes code a little bit and enable it for the Programmable Pipeline in Vortex, however, I had to take a small detour to fix an issue related to mobile device support.

It turns out that OpenGL ES, the 3D library that Vortex uses to render its graphics on mobile devices, does not support rendering indexed geometry for indices larger than 16 bits. Keep this in mind when developing for mobile devices such as the iPhone or iPad.

I can completely understand the reasoning behind this decision. 32-bit indices could be considered too much data to submit to the GPU in a mobile device. Furthermore, they are not strictly necessary, as they could be replaced (if needed) by splitting the geometry into two groups defined by 16-bit indices.

The solution I devised, which is now part of Vortex 2.0, is to allow the user to specify the data size for the indices when defining the geometry. This provides the flexibility to use 32, 16 or 8 bit indices. You can even have several geometric objects with different index sizes in a scene.

The advantage of leveraging this mechanism is that now it is very easy to fine-tune the number of bytes used for index representation for improving performance. For example, using 16-bit indices instead of 32-bit indices would make no difference for representing models composed of less than 65536 vertices, while requiring a copy of just half the number of bytes to the GPU.

In the extreme case of 8-bit indices we would be constrained to only 256 vertices, but we would be sending only one fourth of the data to the GPU.

This was mostly plumbing work, so no new picture this time. Stay tuned for more updates!