Revision 76a92d4...

Go back to digest for 22nd December 2013

Optimization in Educational

Henry de Valence committed changes in [kstars] kstars/skyobjects/skypoint.h:

Save 16 bytes per sky object.

In practice, the `long double` type has 16 byte size and alignment.
We can inspect the memory layout of some class inheriting SkyPoint using
clang [1]:

*** Dumping AST Record Layout
   0 | class StarObject
   0 |   class SkyObject (primary base)
   0 |     class SkyPoint (primary base)
   0 |       (SkyPoint vtable pointer)
   0 |       (SkyPoint vftable pointer)
  16 |       long double lastPrecessJD
  32 |       class dms RA0
  32 |         double D
     |       [sizeof=8, dsize=8, align=8
     |        nvsize=8, nvalign=8]
...(snipped)...
 184 |   float B
 188 |   float V
     | [sizeof=192, dsize=192, align=16
     |  nvsize=192, nvalign=16]

The vtable pointer takes up only 8 bytes (on 64-bit), but we then waste
8 bytes on padding so that lastPrecessJD starts on a 16-byte boundary.
Moreover, lastPrecessJD itself takes up 16 bytes.
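
The offsets in the dump follow from the usual rule that each member
starts at the next multiple of its alignment. A minimal sketch of that
arithmetic, assuming the 8-byte pointer and the 16-byte `long double`
size and alignment shown above; `align_up` and the `k...` constants are
hypothetical names for illustration, not part of KStars:

```cpp
#include <cstddef>

// Hypothetical helper: round `offset` up to the next multiple of `align`.
// `align` must be a power of two, as C++ alignments are.
constexpr std::size_t align_up(std::size_t offset, std::size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

// Sizes assumed from the x86-64 dump above, not queried from the compiler.
constexpr std::size_t kPointerSize = 8;      // vtable pointer
constexpr std::size_t kLongDoubleSize = 16;  // long double (also its alignment)
constexpr std::size_t kDoubleAlign = 8;      // alignment of class dms

// The pointer ends at offset 8; lastPrecessJD needs a 16-byte boundary.
constexpr std::size_t kJdOffset = align_up(kPointerSize, kLongDoubleSize);
constexpr std::size_t kPadding = kJdOffset - kPointerSize;

// RA0 follows directly after lastPrecessJD.
constexpr std::size_t kRa0Offset =
    align_up(kJdOffset + kLongDoubleSize, kDoubleAlign);

static_assert(kJdOffset == 16, "lastPrecessJD sits at offset 16");
static_assert(kPadding == 8, "8 bytes are lost to padding");
static_assert(kRa0Offset == 32, "RA0 sits at offset 32, as in the dump");
```

The three `static_assert`s reproduce the offsets 16 and 32 from the
dump, plus the 8 bytes of padding in between.
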
Using a program like the following:

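#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Julian Date of the J2000 epoch. */
    double jd2000 = 2451545.0;
    /* Gap (in days) to the next representable double above jd2000. */
    double delta = nextafter(jd2000, jd2000 + 1) - jd2000;
    printf("delta: %.30f\n", delta);
    return 0;
}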

we can compute that at J2000, the minimum time step at double precision
is 2^-31 days, or approximately 40 microseconds, so it's not clear that
we gain anything by using 80-bit long doubles instead of 64-bit doubles.
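
The program prints the step in days, so one more conversion is needed to
get the figure in microseconds. A small sketch of that conversion; the
function names `jd_step_days` and `jd_step_microseconds` are made up for
this example:

```cpp
#include <cmath>

// Smallest representable step (in days) above the Julian Date `jd`
// when it is stored as a double.
double jd_step_days(double jd)
{
    return std::nextafter(jd, jd + 1.0) - jd;
}

// The same step in microseconds: 86400 seconds per day,
// 1e6 microseconds per second.
double jd_step_microseconds(double jd)
{
    return jd_step_days(jd) * 86400.0 * 1e6;
}
```

For J2000 (`jd = 2451545.0`) the step is 2^-31 days, a little over 40
microseconds.
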
Changing the `long double` to `double` (and placing it last) results in
a memory layout like so:

*** Dumping AST Record Layout
   0 | class SkyPoint
   0 |   (SkyPoint vtable pointer)
   0 |   (SkyPoint vftable pointer)
   8 |   class dms RA0
   8 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]

  16 |   class dms Dec0
  16 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]

  24 |   class dms RA
  24 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]

  32 |   class dms Dec
  32 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]

  40 |   class dms Alt
  40 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]

  48 |   class dms Az
  48 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]

  56 |   double lastPrecessJD
     | [sizeof=64, dsize=64, align=8
     |  nvsize=64, nvalign=8]

This also has the benefit that the SkyPoint data fits in a single cache
line, though I don't think this really makes a difference given the
inefficiencies in the rest of the code. A before/after test showed a
drop in memory usage of about 6%.
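
The size difference can also be checked without the clang dump. The
following is only a sketch: `Angle`, `PointBefore` and `PointAfter` are
hypothetical stand-ins that mirror the members listed in the dumps, not
the real KStars classes, and the 64-byte cache line is an assumption
that holds on common x86-64 hardware.

```cpp
#include <cstddef>

// Stand-in for KStars' dms class, which the dump shows as one double.
struct Angle { double D; };

// Model of the old layout: long double right after the vtable pointer.
struct PointBefore {
    virtual ~PointBefore() = default;
    long double lastPrecessJD;
    Angle RA0, Dec0, RA, Dec, Alt, Az;
};

// Model of the new layout: plain double, placed last.
struct PointAfter {
    virtual ~PointAfter() = default;
    Angle RA0, Dec0, RA, Dec, Alt, Az;
    double lastPrecessJD;
};

// Bytes saved per object; 16 where long double is 16 bytes with 16-byte
// alignment and pointers are 8 bytes, possibly less on other ABIs.
constexpr std::size_t kBytesSaved = sizeof(PointBefore) - sizeof(PointAfter);

// Assumed cache line size; not queried from the hardware.
constexpr std::size_t kAssumedCacheLine = 64;
constexpr bool kFitsCacheLine = sizeof(PointAfter) <= kAssumedCacheLine;
```

Note that `kFitsCacheLine` only compares sizes. An object actually
occupies a single cache line only if the allocator also places it on a
64-byte boundary.
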

[1]: http://eli.thegreenplace.net/2012/12/17/dumping-a-c-objects-memory-layout-with-clang/

File Changes

Modified 1 file
  • kstars/skyobjects/skypoint.h
1 file changed in total