TIPS: Stock Market Timing - Modeling techniques

The Financial Ad
Trader
Get FAT!


Refinements in off-the-shelf pattern recognition software have made a discussion of neural network training techniques superfluous at this stage. Techniques for variable selection and variable transformation, however, are still not discussed much.

USING REFERENCE MODELS TO SEARCH FOR NEW INPUTS

Some software packages provide preprocessing modules specifically designed for financial time series forecasting. Some companies, notably Bio-Comp, offer to divulge the specific variables and preprocessing techniques that they have employed in the development of their example trading systems, when the user purchases the software. This seems to me to be a great opportunity for the novice model builder to secure a prefabricated 'reference' model.

Once a basic model has been built, it can be used as a reference point for testing new variables, new transforms, and new variable relations. By adding a single new variable, or changing the form of a variable or its preprocessed relation to another variable, one can observe incremental increases or decreases in fitness relative to the unchanged reference model. As each new input or small group of inputs is tested, its effect on the fitness of the model relative to the reference model can be cataloged, and an inventory of potential inputs can be developed using the reference model as a guide.
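
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an ordinary least-squares fit stands in for the neural network or other modeling software, R-squared stands in for 'fitness', and the names solve, r_squared and catalog_candidates are hypothetical. Fitness here is measured on the same data used for fitting; a real comparison would more likely score each model on data held out from the fit.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(inputs, target):
    """Assumed fitness measure: in-sample R-squared of a least-squares fit.

    inputs is a list of columns (one list of values per input variable).
    """
    rows = [[1.0] + [col[i] for col in inputs] for i in range(len(target))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for t, p in zip(target, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def catalog_candidates(reference_inputs, candidates, target):
    """Fitness gain of each candidate input relative to the reference model."""
    base = r_squared(reference_inputs, target)
    return {name: r_squared(reference_inputs + [col], target) - base
            for name, col in candidates.items()}
```

Calling catalog_candidates with the reference model's input columns and a dictionary of candidate columns returns one fitness change per candidate, which is the start of the inventory described above.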

MANUALLY SEARCHING FOR NEW INPUTS

The above technique uses the relative fitness of the reference model and the test model to indicate the potential importance of new inputs. Another method is to develop indicators manually by visually inspecting the relationship between the target and the input. This can be done conveniently in a spreadsheet environment that has graphing capabilities, such as EXCEL. After the target time series and the independent variables have been loaded into the spreadsheet, mathematical transforms are executed. The final product transform and the target transform are then quickly viewed by selecting and graphing them in two dimensions, with time on the x-axis and the magnitude of the target and indicator candidate on the y-axes.

A more advanced method of visually inspecting the relationship between two time series is to eliminate time from the two dimensional plot. Plotting each time series against the other will give a 'scatter plot'. Any regularity or form in the plot may indicate a correlation between the independent variable and the target.

Either of the above visualization techniques can be used to examine the relation between two independent variables, or between the target variable and a transform of itself, such as the 'lagged' target (current value plotted versus past values) or the moving average of the target.
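
For readers who prefer to prepare the points outside a spreadsheet, the pairing behind a scatter plot or a lagged plot can be sketched as follows. The function name lagged_pairs and the sample values are hypothetical and serve only as an illustration; the resulting pairs can be handed to any plotting tool.

```python
def lagged_pairs(x, y, lag=0):
    """Pair x[t - lag] with y[t]; each pair is one point of a scatter plot.

    With lag=0 this is the plain scatter of one series against the other.
    Both series are assumed to be the same length and aligned in time.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return list(zip(x[:len(x) - lag], y[lag:]))


# Hypothetical target values, for illustration only.
target = [10.0, 11.0, 12.5, 12.0, 13.0]

# The 'lagged' target: each past value paired with the value one period later.
points = lagged_pairs(target, target, lag=1)
```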

Finally, one can employ software designed for viewing large numbers of time series to search for correlations between variables. The software SPSS DIAMOND allows one to view multiple two dimensional and three dimensional plots at one time, and to plot multiple variables or multiple lags automatically.

STATISTICAL SEARCHES FOR NEW INPUTS

Potential inputs can also be evaluated statistically. Software such as Modelware-PRO can be used not only to model (the software is a complete modeling and forecasting system) but also to view various measures of statistical correlation between two or more time series. Many other software packages are available for evaluating the statistical relation between two or more time series, such as STATISTICA, which also includes neural network and genetic algorithm modules.
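
As a rough idea of what such a statistical screen computes, here is a minimal sketch of the Pearson correlation between a candidate input and the target at several lags. It is not a description of how any of the packages named above work internally; the function names are hypothetical, and both series are assumed to be the same length, aligned in time, and not constant.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(candidate, target, max_lag):
    """Correlation of candidate[t - lag] with target[t], for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        past = candidate[:len(candidate) - lag]
        now = target[lag:]
        result[lag] = pearson(past, now)
    return result
```

A candidate whose correlation with the target is strongest at a lag of several periods is one that may lead the target, which is the property one is usually looking for in a timing input.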

THE IMPORTANCE OF DATA TRANSFORMS

Data transformation, or 'preprocessing', is necessary prior to implementation of any modeling software. This is obligatory for at least three reasons.

First, the absolute value of most financial time series in the real world will trend over time. This trending will present internal logical conflicts to most modeling approaches, as well as practical problems when the variable trends beyond the fixed range of the model that has been developed. Data needs to be de-trended or 'normalized' prior to being modeled.
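
To make the range problem concrete, here is one simple de-trending transform, the period-to-period percent change. This particular transform is my own choice for illustration and is an assumption, not a recommendation taken from any software package; the function name and the sample prices are hypothetical.

```python
def pct_change(series):
    """Fractional change from each value to the next.

    Assumes every value is non-zero, as is the case for prices.
    """
    return [(b - a) / a for a, b in zip(series, series[1:])]


# Hypothetical prices that keep trending upward, for illustration only.
prices = [100.0, 110.0, 121.0, 133.1]

# The raw prices keep leaving any fixed range; the changes stay near 0.10.
changes = pct_change(prices)
```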

Second, time series represent the state of a variable over time. Each variable that one wants to consider has a history antecedent to any given point in time. This history is a two dimensional matrix that could in theory be presented to the modeling software in its entirety. However, it would take up too much space. Transforms like moving averages, linear regressions, standard deviations, fractal measurements, and so on, provide a way of reducing the total information burden by forming a simplified representation of the time series matrix.

Finally, data transformations encode the time series selectively, presenting some information or characteristics of the original time series while leaving out other (hopefully inconsequential) information. Once transformations are carried out, both the look back period and the form of the data in this window are fixed. The look forward window for the target, and the form of the data in that window, are fixed as well. As mentioned above, the total information burden is reduced; in this case, however, by storing only selected relations between time series, one is potentially highlighting important relations between the variables, making them as explicit as possible. While some might argue that relations between variables should be left for the net to decide, in practice financial data systems are too 'noisy', and the correlations between inputs so 'weak', that it pays to try to make them explicit.

BASIC DATA TRANSFORMS THAT ACCOMPLISH MULTIPLE GOALS

There are some basic data manipulations that address all of the concerns mentioned above. While they limit the information presented to the modeling software in ways that some people may find objectionable, they seem to offer a good compromise for beginning model builders.

Remember, the three things that need to be accomplished are:

1) de-trending or 'normalization'
2) reduction in information burden
3) highlighting relations between variables

An easy way to carry out all three is the following. If one assumes for the moment that one has a single target (T) and several independent variables (V1, V2, V3), then one starts by dividing the target series by each of the independent variable series.

(T/V1)

(T/V2)

(T/V3)

Following this, each new series is converted to two moving averages (m1 and m2) with different windows. For example, one might use a 10 day and a 50 day moving average. The difference of these two moving averages is then taken.

m1(T/V1)-m2(T/V1)

m1(T/V2)-m2(T/V2)

m1(T/V3)-m2(T/V3)

The end result is that one has three new time series that have been normalized and made explicit with respect to the target and each variable, and the information burden has been reduced by encoding only a portion of each time series. By changing the windows of m1 and m2, one modulates the look back periods to highlight different information, and several encodings can be done for each (T/V) ratio with different choices of m1 and m2, if necessary.
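
The transform above can be sketched in Python as follows. The 10 and 50 period windows come from the example in the text. Everything else is an assumption made for illustration: the function names are hypothetical, the moving averages are simple trailing averages, and no output is produced until the longer window is full.

```python
def moving_average(series, window):
    """Simple trailing moving average; one value per full window."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def ratio_ma_difference(target, variable, short=10, long=50):
    """m1(T/V) - m2(T/V), aligned on the most recent observations.

    Assumes both series are the same length, the variable is never zero,
    and short <= long.
    """
    if short > long:
        raise ValueError("short window must not exceed long window")
    ratio = [t / v for t, v in zip(target, variable)]
    m1 = moving_average(ratio, short)
    m2 = moving_average(ratio, long)
    # The long average starts later, so drop the early short-average values.
    m1 = m1[len(m1) - len(m2):]
    return [a - b for a, b in zip(m1, m2)]
```

Running ratio_ma_difference once per independent variable gives the three series listed above; running it again with other window lengths gives the additional encodings.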

CULLING OUTLIERS

After producing the time series transforms outlined above, it's a good idea to determine the mean and standard deviation of each new series. After having done so, one should eliminate all values outside of the range defined by the mean +/- two standard deviations. There is nothing magical about this range, and one can use any culling process that one wants, including hand inspection. The idea is to eliminate potentially anomalous, low probability states of each time series transform, as they probably represent 'noise' that will impair good modeling.
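
A minimal sketch of this culling step follows. The names are hypothetical, and two details are my own assumptions rather than anything stated above: the sample standard deviation is used, and a whole record (one point in time across all series) is dropped when any one series is out of range, so that the series stay aligned.

```python
import statistics


def cull_mask(series, k=2.0):
    """True for each value within mean +/- k standard deviations."""
    mean = statistics.mean(series)
    sd = statistics.stdev(series)  # sample standard deviation (an assumption)
    return [abs(v - mean) <= k * sd for v in series]


def cull_records(columns, k=2.0):
    """Drop every record in which any column falls outside its own band.

    columns maps a series name to a list of values; all lists are assumed
    to be the same length and aligned in time.
    """
    masks = [cull_mask(col, k) for col in columns.values()]
    keep = [all(flags) for flags in zip(*masks)]
    return {name: [v for v, ok in zip(col, keep) if ok]
            for name, col in columns.items()}
```

The multiplier k is exposed as a parameter because, as noted above, there is nothing magical about two standard deviations.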

CONSIDERATIONS OF THE FORM OF THE TARGET

The target that one is trying to predict needs to be transformed for some of the same reasons cited above. While one can predict multiple target forms at one time (and there are very good reasons to do this to improve modeling) we'll restrict ourselves to a single form.

First, choose a range (R) for the target. Rather than taking a ratio or moving average ratio, take the linear regression (LR) over all the points in the range. The linear regression formula can be looked up in any basic statistics text; however, this won't be necessary if you are working in a spreadsheet like EXCEL. Simply apply a linear regression function that yields the slope (S) of the regression (SLOPE, in EXCEL). After having transformed the target into a series of slope values, use a conditional function (again, this can be found in EXCEL) to transform the range of slope values into discrete values in accordance with the following rules.

1) if S is greater than N1 and less than N2, then S = 1

2) if S is less than (-N1) and greater than (-N2), then S = (-1)

3) if S is less than N1 and greater than (-N1), then S = 0

The result of the above is that it windows the magnitude of the slopes. Very small slopes are reassigned to zero, slopes of intermediate value are reassigned to either 1 or (-1) depending on the direction of the slope, and slopes outside of a certain range are ignored by not getting a value (one then needs to go through and delete any records that have empty values for this new transform). In effect, this culls the target and simplifies it at the same time. The values of N1 and N2 are arbitrary, but I would start with values of N1 that correspond to about a 1% slope rise or fall, and values of N2 that correspond to slope values of about 2%. One can do additional windowing with added values of N to increase the resolution of the transform, or the extent of the magnitude encompassed before one reaches a null value.
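
For those not working in a spreadsheet, the target transform can be sketched as follows. The function names are hypothetical, and several details are assumptions made for illustration: each window of R points starts at the current point and runs forward, the slope is taken against the point index (0, 1, 2, ...), and any scaling needed to express N1 and N2 as percentages is left to the user. A slope exactly equal to a threshold gets no value, since the three rules use strict inequalities.

```python
def slope(values):
    """Least-squares slope of the values against their index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def discretize(s, n1, n2):
    """Apply the three rules; None marks a record that is to be deleted."""
    if n1 < s < n2:
        return 1
    if -n2 < s < -n1:
        return -1
    if -n1 < s < n1:
        return 0
    return None  # outside the windows, or exactly on a threshold


def target_labels(series, r, n1, n2):
    """One label per window of r consecutive points (window placement assumed)."""
    return [discretize(slope(series[i:i + r]), n1, n2)
            for i in range(len(series) - r + 1)]
```

Records whose label comes back as None are the ones to delete before modeling, together with the matching input records.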

