Optimization is simply the act of taking existing code and making it run more efficiently. This does not necessarily mean faster, as it is certainly possible to optimize code for storage size, for end-user efficiency (workflow), or for any number of other factors that management or clients deem important. For homebrew development, the key things that need to be optimized for are memory use and speed. This makes sense, as the storage space on a cartridge is very limited. Likewise, frame rates are always important to maintain. One of the interesting things about doing homebrew development is that it brings home how much optimization techniques have changed over the years.
The scariest change in optimizing is the “why don’t you just use the -O3 flag?” response that less experienced programmers give. I find this scary because it is an indication that compilers have become some type of magic tool to some people. Compiler optimization switches simply tell the compiler to take extra time to produce more efficient machine code. When you consider how lousy the code compilers produced used to be, this is very impressive. In the “old” days, your average assembly language programmer could write better code than the best compiler could produce. Now you need to be a really good assembly language programmer to produce better code, which makes writing assembly language a mostly obsolete skill. Knowing how the machine works, however, will always be important, so I do believe that all real programmers should know some assembly language even if they never use it.
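As a minimal sketch of what those switches actually do, consider the hypothetical routine below; the function and the build commands are illustrations, not taken from any particular project. Built with something like gcc -O0 the compiler emits a straightforward load, add, and branch loop, while gcc -O3 lets it spend extra time and typically unroll or even vectorize the very same source.

/* sum.c - hypothetical hot loop used to illustrate optimization switches.
 * "gcc -O0 -c sum.c" produces naive machine code, while "gcc -O3 -c sum.c"
 * lets the compiler unroll and vectorize it with no source changes. */
int sum(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];  /* same result either way; only the machine code differs */
    return total;
}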
It used to be possible to write faster code by employing techniques such as code re-ordering and loop unrolling. This is now done by the compiler when optimization is enabled. As code re-ordering and loop unrolling resulted in ugly, hard-to-understand code, this is an improvement that I actually like.
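For anyone who never had to do this by hand, here is a rough sketch of what manual loop unrolling looked like; the buffer name and size are made up purely for illustration.

#include <stdint.h>

#define SCREEN_SIZE 1024   /* hypothetical buffer size, divisible by 4 */

/* Plain loop: one compare and branch for every byte written. */
void clear_simple(uint8_t *screen)
{
    for (int i = 0; i < SCREEN_SIZE; i++)
        screen[i] = 0;
}

/* Manually unrolled: four writes per iteration, so only a quarter of the
 * loop overhead. An optimizing compiler now performs this transformation
 * itself, so the ugly version no longer needs to be written by hand. */
void clear_unrolled(uint8_t *screen)
{
    for (int i = 0; i < SCREEN_SIZE; i += 4) {
        screen[i]     = 0;
        screen[i + 1] = 0;
        screen[i + 2] = 0;
        screen[i + 3] = 0;
    }
}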
The biggest change that has happened, and will probably continue to happen, is shifting bottlenecks. When I started programming, floating point was something you used as a last resort, as it was so much slower than using integers. Integer techniques such as fixed-point math were used because, despite the extra instructions and complexity, they were still faster than floating-point math. Now floating point is just as fast as integer math, and in certain cases may be faster.
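As a sketch of that old technique, here is roughly what 16.16 fixed-point arithmetic looks like in C; the choice of format and the helper names are my own assumptions for illustration.

#include <stdint.h>

/* 16.16 fixed point: the upper 16 bits hold the integer part and the
 * lower 16 bits hold the fraction, so 1.0 is stored as 65536. */
typedef int32_t fixed;

#define INT_TO_FIXED(x)  ((fixed)(x) << 16)
#define FIXED_TO_INT(x)  ((x) >> 16)

/* Multiplication needs a 64-bit intermediate and a shift to drop the
 * extra fractional bits. That is more work than a single multiply, but
 * on hardware without a floating point unit it was still far faster
 * than emulated floating point. */
static fixed fixed_mul(fixed a, fixed b)
{
    return (fixed)(((int64_t)a * b) >> 16);
}

/* Example: 2.5 * 3.0 = 7.5, i.e. fixed_mul(163840, 196608) == 491520. */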
Today’s biggest bottleneck is probably memory. We are now at a point where RAM access is significantly slower than the speed the processor runs at. I have heard it claimed that a cache miss can result in a delay of over a hundred cycles. This means that techniques such as lookup tables may be a lot slower than you would expect. In a few years this will probably change as technological solutions to the RAM bottleneck are discovered, at which point a new bottleneck will appear and optimizers will have to change their techniques yet again.
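As a hedged sketch of that trade-off, here is a classic sine lookup table; the table size and angle units are arbitrary choices for illustration, not measured numbers.

#include <math.h>

#define TABLE_SIZE 1024    /* arbitrary power of two so the index can be masked */

static float sine_table[TABLE_SIZE];

/* Fill the table once at startup. */
void init_sine_table(void)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        sine_table[i] = sinf(i * 2.0f * 3.14159265f / TABLE_SIZE);
}

/* On older hardware this single indexed load easily beat calling sinf().
 * On a modern machine, if the table entry is not already in cache the
 * load can stall for on the order of a hundred cycles, which may be
 * slower than simply recomputing the value. */
float fast_sin(int angle)   /* angle measured in table units, 0..TABLE_SIZE-1 */
{
    return sine_table[angle & (TABLE_SIZE - 1)];
}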
Ultimately, the best form of optimization is the same as it always has been: understanding the problem well enough that you can develop a better algorithm for solving it. This often requires thinking outside of the box, which is something a compiler simply can’t do. When compilers get to the point that they are able to do this for the programmer, we should be very worried, as the robot overlords will be soon to follow.