Documents I've found helpful are:
Some highlights:
One of the simplest optimization tips to limit memory usage is to
use the appropriate type of display object. For simple shapes that are
not interactive, use Shape objects. For interactive objects that don’t
need a timeline, use Sprite objects. For animation that uses a
timeline, use MovieClip objects.
getSize() returns the size in memory of a specified object.
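To see the difference between the display object types for yourself, something like the following should work. It is only a sketch: as far as I know, the flash.sampler package is functional only in debugger versions of the runtime, so treat it as a development-time check, and expect the numbers to vary by runtime version.

```actionscript
import flash.display.MovieClip;
import flash.display.Shape;
import flash.display.Sprite;
import flash.sampler.getSize;

// Compare the memory footprint (in bytes) of the three display object types.
trace("Shape:     " + getSize(new Shape()));
trace("Sprite:    " + getSize(new Sprite()));
trace("MovieClip: " + getSize(new MovieClip()));
```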
All primitive types except String use 4 to 8 bytes in memory. A
Number, which represents a 64-bit value, is allocated 8 bytes by the
ActionScript Virtual Machine (AVM) if it is not assigned a value.
The behavior differs for the String type. Benchmark code and
determine the most efficient object for the task.
Optimize memory by reusing objects and avoiding recreating them
whenever possible.
Reusing objects reduces the need to instantiate objects, which can be
expensive. It also reduces the chances of the garbage collector
running, which can slow down your application.
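One common way to reuse objects is a small pool. The class below is a minimal sketch; SpritePool, acquire() and release() are names I made up for illustration, not part of any Flash API.

```actionscript
package
{
    import flash.display.Sprite;

    // Hypothetical pool class; the name and methods are illustrative only.
    public final class SpritePool
    {
        private static var pool:Vector.<Sprite> = new Vector.<Sprite>();

        // Hand out a pooled instance if one is free, otherwise create one.
        public static function acquire():Sprite
        {
            return pool.length > 0 ? pool.pop() : new Sprite();
        }

        // Detach and reset the instance, then keep it for the next caller.
        public static function release(sprite:Sprite):void
        {
            if (sprite.parent)
            {
                sprite.parent.removeChild(sprite);
            }
            sprite.graphics.clear();
            pool.push(sprite);
        }
    }
}
```

Callers would use SpritePool.acquire() where they previously wrote new Sprite(), and SpritePool.release(sprite) where they previously dropped the reference.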
To make sure that an object is garbage collected, delete all
references to the object. Memory allocation, rather than object
deletion, triggers garbage collection. Try to limit garbage
collection passes by reusing objects as much as possible. Also, set
references to null, when possible, so that the garbage collector
spends less processing time finding the objects. Think of garbage
collection as insurance, and always manage object lifetimes
explicitly, when possible.
Setting a reference to a display object to null does not ensure that
the object is frozen. The object continues to consume CPU cycles until
it is garbage collected.
The BitmapData class includes a dispose() method. Although the
dispose() method removes the pixels from memory, the reference must
still be set to null to release it completely.
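In code, the two steps look roughly like this (the bitmap dimensions are arbitrary):

```actionscript
import flash.display.BitmapData;

var image:BitmapData = new BitmapData(800, 600);

// ... use the bitmap ...

image.dispose(); // releases the memory that holds the pixels
image = null;    // drops the reference so the object itself can be collected
```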
Using vectors, especially in large numbers, dramatically increases the
need for CPU or GPU resources. Using bitmaps is a good way to optimize
rendering, because the runtime needs fewer processing resources to
draw pixels on the screen than to render vector content.
When a filter is applied to a display object, the runtime creates two
bitmaps in memory. Using externally authored bitmaps helps the
runtime to reduce the CPU or GPU load.
Use mipmapping sparingly. Although it improves the quality of
downscaled bitmaps, it has an impact on bandwidth, memory, and speed.
For read-only text, it’s best to use the Flash Text Engine, which
offers low memory usage and better rendering. For input text,
TextField objects are a better choice, because less ActionScript code
is required to create typical behaviors, such as input handling and
word-wrap.
Using the native event model can be slower and consume more memory
than using a traditional callback function. Event objects must be
created and allocated in memory, which creates a performance slowdown.
For example, when listening to the Event.ENTER_FRAME event, a new
event object is created on each frame for the event handler.
Performance can be especially slow for display objects, due to the
capture and bubbling phases, which can be expensive if the display
list is complex.
Even if display objects are no longer in the display list and are
waiting to be garbage collected, they could still be using
CPU-intensive code.
The concept of freezing is also important when loading remote content
with the Loader class.
The unloadAndStop() method allows you to unload a SWF file,
automatically freeze every object in the loaded SWF file, and force
the garbage collector to run.
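A rough sketch of unloading loaded content on a click, assuming it runs as a frame script or inside a display object that is already on the stage. The URL is a placeholder, not something from the guide.

```actionscript
import flash.display.Loader;
import flash.events.MouseEvent;
import flash.net.URLRequest;

var loader:Loader = new Loader();
addChild(loader);
loader.load(new URLRequest("content.swf")); // placeholder URL

stage.addEventListener(MouseEvent.CLICK, onStageClick);

function onStageClick(event:MouseEvent):void
{
    // Unloads the SWF and stops its activity; the argument (true by
    // default) asks the garbage collector to run on the unloaded content.
    loader.unloadAndStop(true);
}
```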
The Event.ACTIVATE and Event.DEACTIVATE events allow you to detect
when the runtime gains or loses focus. As a result, code can be
optimized to react to context changes.
The activate and deactivate events allow you to implement a mechanism
similar to the "Pause and Resume" feature sometimes found on mobile
devices and netbooks.
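One way to react to focus changes is to lower the frame rate while the application is in the background. This is a sketch only; the standby value of 4 is an arbitrary choice of mine.

```actionscript
import flash.events.Event;

var activeFrameRate:Number = stage.frameRate;
var standbyFrameRate:Number = 4; // arbitrary value chosen for illustration

stage.addEventListener(Event.ACTIVATE, onActivate);
stage.addEventListener(Event.DEACTIVATE, onDeactivate);

function onActivate(event:Event):void
{
    // The runtime regained focus: restore the normal frame rate.
    stage.frameRate = activeFrameRate;
}

function onDeactivate(event:Event):void
{
    // The runtime lost focus: slow down to save CPU and battery.
    stage.frameRate = standbyFrameRate;
}
```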
Detecting mouse interaction can be CPU-intensive when many interactive
objects are shown onscreen, especially if they overlap. When
possible, consider disabling mouse interaction, which helps your
application use less CPU processing and, as a result, reduces
battery usage on mobile devices.
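Disabling mouse interaction for a container and everything inside it takes two properties:

```actionscript
import flash.display.Sprite;

var container:Sprite = new Sprite();
container.mouseEnabled = false;  // the container itself ignores the mouse
container.mouseChildren = false; // and so do all of its descendants
addChild(container);
```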
Timers are preferred over Event.ENTER_FRAME events for non-animated
content that executes for a long time.
A timer can behave in a similar way to an Event.ENTER_FRAME event, but
an event can be dispatched without being tied to the frame rate. This
behavior can offer some significant optimization. Consider a video
player application as an example. In this case, you do not need to use
a high frame rate, because only the application controls are moving.
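For the video player case, a sketch might keep the frame rate low and drive the controls from a timer instead. The frame rate of 10 and the 30 ms interval are illustrative values, not recommendations.

```actionscript
import flash.events.TimerEvent;
import flash.utils.Timer;

stage.frameRate = 10; // low frame rate for the application as a whole

// Fire roughly every 30 ms, independently of the frame rate.
var updateTimer:Timer = new Timer(30);
updateTimer.addEventListener(TimerEvent.TIMER, onTimer);
updateTimer.start();

function onTimer(event:TimerEvent):void
{
    // ... update the controls here ...

    // Ask the runtime to redraw now instead of waiting for the next frame.
    event.updateAfterEvent();
}
```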
Limiting the use of tweening saves CPU processing, memory, and
battery life, helping content run faster on low-tier devices.
The Vector class allows faster read and write access than the Array
class.
Array element access and iteration are much faster when using a Vector
instance than they are when using an Array.
In strict mode the compiler can identify data type errors.
Runtime range checking (or fixed-length checking) increases
reliability significantly over Arrays.
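A short sketch of a typed, fixed-length Vector (the element count is arbitrary):

```actionscript
var count:int = 1000;

// Typed and fixed-length: the second constructor argument locks the length.
var values:Vector.<Number> = new Vector.<Number>(count, true);

for (var i:int = 0; i < count; i++)
{
    values[i] = Math.random() * 1024;
}

// values[count] = 1; // out of range on a fixed-length Vector: RangeError
```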
Reduce the amount of code executed by using drawPath(),
drawGraphicsData(), and drawTriangles(). Fewer lines of
code can provide better ActionScript execution performance.
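As an illustration, a square drawn with a single drawPath() call instead of one moveTo() and four lineTo() calls. The coordinates and color are arbitrary.

```actionscript
import flash.display.GraphicsPathCommand;
import flash.display.Shape;

// One command per point: move to the first corner, then draw four edges.
var commands:Vector.<int> = Vector.<int>([
    GraphicsPathCommand.MOVE_TO,
    GraphicsPathCommand.LINE_TO,
    GraphicsPathCommand.LINE_TO,
    GraphicsPathCommand.LINE_TO,
    GraphicsPathCommand.LINE_TO
]);

// x, y pairs matching the commands above.
var coords:Vector.<Number> = Vector.<Number>([
    20, 20,
    120, 20,
    120, 120,
    20, 120,
    20, 20
]);

var square:Shape = new Shape();
square.graphics.beginFill(0x336699);
square.graphics.drawPath(commands, coords);
square.graphics.endFill();
addChild(square);
```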
Taking advantage of the bubbling of an event can help you to optimize
ActionScript code execution time. You can register an event handler on
one object, instead of multiple objects, to improve performance.
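A sketch of the idea: fifty clickable children, but only one listener, registered on their container. The layout numbers are arbitrary.

```actionscript
import flash.display.Sprite;
import flash.events.MouseEvent;

var container:Sprite = new Sprite();
addChild(container);

for (var i:int = 0; i < 50; i++)
{
    var button:Sprite = new Sprite();
    button.graphics.beginFill(0x999999);
    button.graphics.drawRect(0, 0, 20, 20);
    button.graphics.endFill();
    button.x = (i % 10) * 25;
    button.y = int(i / 10) * 25;
    container.addChild(button);
}

// One handler on the parent; clicks on the children bubble up to it.
container.addEventListener(MouseEvent.CLICK, onClick);

function onClick(event:MouseEvent):void
{
    // event.target is the child that was actually clicked.
    var clicked:Sprite = event.target as Sprite;
    if (clicked && clicked != container)
    {
        clicked.alpha = 0.5;
    }
}
```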
When painting pixels, some simple optimizations can be made just by
using the appropriate methods of the BitmapData class. A fast way to
paint pixels is to use the setVector() method.
Calling lock() and unlock() prevents the screen from being updated
unnecessarily. Methods that iterate over pixels, such as getPixel(),
getPixel32(), setPixel(), and setPixel32(), are likely to be slow,
especially on mobile devices. If possible, use methods that retrieve
all the pixels in one call. For reading pixels, use the getVector()
method, which is faster than the getPixels() method. Also, remember to
use APIs that rely on Vector objects, when possible, as they are
likely to run faster.
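Put together, painting a whole bitmap in one call might look like this. The 200 x 200 size and the random colors are just for illustration.

```actionscript
import flash.display.Bitmap;
import flash.display.BitmapData;

var w:int = 200;
var h:int = 200;

var canvas:BitmapData = new BitmapData(w, h, false, 0x000000);
addChild(new Bitmap(canvas));

// Prepare every pixel value up front in a fixed-length Vector.
var total:int = w * h;
var pixels:Vector.<uint> = new Vector.<uint>(total, true);
for (var i:int = 0; i < total; i++)
{
    pixels[i] = uint(Math.random() * 0xFFFFFF);
}

canvas.lock();                         // hold back display updates
canvas.setVector(canvas.rect, pixels); // write all pixels in one call
canvas.unlock();                       // let the display update once
```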
When a String class method is available, it runs faster than the
equivalent regular expression and does not require the creation of
another object.
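For instance, finding a substring does not need a RegExp at all (the sample string is mine):

```actionscript
var source:String = "Optimizing Performance for the Flash Platform";

// String method: no extra object is created.
var viaString:int = source.indexOf("Flash");

// Regular expression equivalent: a RegExp object has to be created.
var viaRegExp:int = source.search(/Flash/);

trace(viaString, viaRegExp); // both calls locate the same position
```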
Using the appendText() method instead of the += operator on a
TextField's text property provides performance improvements.
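A minimal sketch of appending text in a loop (the iteration count and string are arbitrary):

```actionscript
import flash.text.TextField;
import flash.text.TextFieldAutoSize;

var field:TextField = new TextField();
field.autoSize = TextFieldAutoSize.LEFT;
addChild(field);

for (var i:int = 0; i < 1500; i++)
{
    // Preferred over: field.text += "ActionScript 3";
    field.appendText("ActionScript 3");
}
```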
Using the square bracket operator can slow down performance. You can
avoid using it by storing your reference in a local variable.
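In a loop, that means looking the element up once and working on a local variable afterwards. A sketch, with an arbitrary number of sprites:

```actionscript
import flash.display.Sprite;

var count:int = 100;
var sprites:Vector.<Sprite> = new Vector.<Sprite>(count, true);
var i:int;

for (i = 0; i < count; i++)
{
    sprites[i] = new Sprite();
}

for (i = 0; i < count; i++)
{
    // One bracket lookup per iteration instead of one per property.
    var current:Sprite = sprites[i];
    current.x = Math.random() * 400;
    current.y = Math.random() * 300;
    current.alpha = 0.5;
}
```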
Calling functions can be expensive. Try to reduce the number of
function calls by moving code inline.
In the guide's example, which replaces a Math.abs() call inside a
loop, moving the function call inline results in code that is more
than four times faster.
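A sketch of what inlining a call such as Math.abs() can look like. The data is random and the element count is arbitrary; benchmark your own code before assuming any particular speedup.

```actionscript
var count:int = 500000;
var values:Vector.<Number> = new Vector.<Number>(count, true);
var absolutes:Vector.<Number> = new Vector.<Number>(count, true);
var i:int;

for (i = 0; i < count; i++)
{
    values[i] = Math.random() - 0.5;
}

for (i = 0; i < count; i++)
{
    var current:Number = values[i];
    // Inline replacement for: absolutes[i] = Math.abs(current);
    absolutes[i] = current > 0 ? current : -current;
}
```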
Even if the off-stage elements are not shown onscreen and are not
rendered, they still exist on the display list. The runtime continues
to run internal tests on these elements to make sure that they are
still off-stage and the user is not interacting with them.
When a display object uses alpha blending, the runtime must combine
the color values of every stacked display object and the background
color to determine the final color. Thus, alpha blending can be more
processor-intensive than drawing an opaque color. This extra
computation can hurt performance on slow devices.
A higher frame rate expends more CPU cycles and energy from the
battery than a lower rate.
Runtime code execution fundamentals

The bitmap caching feature (cacheAsBitmap) caches a vector object,
renders it as a bitmap internally, and uses that bitmap for rendering.
Bitmap caching improves rendering if the cached content is not
rotated, scaled, or changed on each frame. For any transformation
other than translation on the x- and y-axes, rendering is not improved.
With cacheAsBitmapMatrix in the AIR mobile profile, you can apply any
two-dimensional transformation to the object without regenerating the
cached bitmap. You can also change the alpha property without
regenerating the cached bitmap.
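A sketch of setting this up. My understanding is that cacheAsBitmapMatrix is an AIR-only property and mainly pays off when the app uses GPU rendering, so treat that as an assumption and test on your target device.

```actionscript
import flash.display.Sprite;
import flash.geom.Matrix;

var icon:Sprite = new Sprite();
icon.graphics.beginFill(0xCC3300);
icon.graphics.drawCircle(0, 0, 40);
icon.graphics.endFill();
addChild(icon);

icon.cacheAsBitmap = true;
icon.cacheAsBitmapMatrix = new Matrix(); // identity matrix for the cache

// These changes can now reuse the cached bitmap.
icon.rotation = 45;
icon.scaleX = icon.scaleY = 2;
icon.alpha = 0.5;
```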
Only a single cached bitmap is used in memory and shared by all
instances.
This technique saves CPU resources.
The bitmap caching feature allows you to cache vector content as
bitmaps to improve rendering performance. This feature is helpful for
complex vector content and also when used with text content that
requires processing to be rendered.
Alpha transparency places an additional burden on the runtime when
drawing transparent bitmap images. You can use the opaqueBackground
property to bypass that, by specifying a color as a background.
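For example, assuming the object sits on a white background (the color value is illustrative):

```actionscript
import flash.display.Sprite;

var panel:Sprite = new Sprite();
panel.graphics.beginFill(0x336699);
panel.graphics.drawRoundRect(0, 0, 200, 100, 20, 20);
panel.graphics.endFill();
addChild(panel);

panel.cacheAsBitmap = true;
panel.opaqueBackground = 0xFFFFFF; // opaque background color for the object
```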
In order to leverage GPU acceleration of Flash content with AIR for
mobile platforms, Adobe recommends that you use renderMode="direct"
(that is, Stage3D) rather than renderMode="gpu". Adobe officially
supports and recommends the following Stage3D based frameworks:
Starling (2D) and Away3D (3D).
Avoid using wmode=transparent or wmode=opaque in HTML embed
parameters. These modes can result in decreased performance. They can
also result in a small loss in audio-video synchronization in both
software and hardware rendering. Furthermore, many platforms do not
support GPU rendering when these modes are in effect, significantly
impairing performance.
Application code in the current execution thread continues executing.
Asynchronous operations are scheduled and divided to avoid rendering
issues. Consequently, it is much easier to have a responsive
application using asynchronous versions of operations. See Perceived
performance versus actual performance for more information.
Unlike bitmaps, rendering vector content requires many calculations,
especially for gradients and complex paths that contain many control
points. As a designer or developer, make sure that shapes are
optimized enough.
If your application loads assets such as media or data, cache the
assets by saving them to the local device. For assets that change
infrequently, consider updating the cache at intervals.
Use the StageVideo class to take advantage of hardware acceleration to
present video.
This approach takes full advantage of the underlying video hardware.
The result is a much lower load on the CPU, which translates into
higher frame rates on less powerful devices and also less memory
usage.
Similar to video decoding, audio decoding requires high CPU cycles and
can be optimized by leveraging available hardware on the device.
The AAC format offers better quality and smaller file size than the
mp3 format at an equivalent bitrate.
Initialization functions, such as constructors, are interpreted;
everything else is JIT-compiled.