Some Background on Forecasting


Introduction.  If you liked the Back to the Future movies, you may remember Biff getting his hands on a sports almanac that was accidentally brought back from the future (assuming I remember the movies correctly).  He used it to predict the outcomes of sporting events and bet his way to extreme wealth.  Well, at least for part of the movie.

Being able to predict the future is always an important endeavor, whether you are trying to predict how someone will respond to your moves, where you should invest your money, what future demand for a product will be, or any of an almost infinite variety of other things.  Businesses tend to focus on forecasting things that make them money.

Data and Information.  There are always issues of what data and/or information should be collected in order to forecast.  In addition to considering what data should be collected, one must consider how much money should be spent collecting it.

For example, you could go around drilling oil wells as your only approach for determining whether oil is located at each site.  This would likely be an overly expensive approach in many ways.  But this situation illustrates how you might spend money more wisely to decide where you should take deep samples.  Maybe some sort of sonar technology is much less expensive, less intrusive and gives you reasonable information to make predictions.  Maybe you know something about the geologic formations in the area from other sources.  Some of this information is more or less quantitative.

Consider another example.  Let's assume you want to be able to predict the relative movements of two currencies, let's say dollars and euros.  There is a lot of information that could be worthwhile to obtain.  Some of it could be quite cheap, while other information could be quite expensive.  Maybe you automatically keep past data on the relative valuations of each currency.  Maybe you also keep up-to-date information on each nation's debt ratios.  Even with these, there are a lot of other things that will influence such movements.  You can't collect data on many of them, or predict them.  Some of this data would be easy to obtain, while other data could be difficult.  Maybe you want to know something about national labor and productivity statistics, but only the governments keep such information and you are forced to wait for government announcements like everyone else.  This also illustrates that it can be very useful to try to discover relationships between data that other people are unaware of.

For example, maybe you know that 8 times out of 10, when the corn belt has a snowy winter it has a dry summer.  How would this influence your belief in what corn futures are going to do in the next six months?  Is this information sufficiently strong for you to be willing to risk some money on what will occur?  What if this relationship only occurs 6 out of 10 times?
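
One rough way to think about that question is in terms of expected payoff.  The sketch below assumes a purely hypothetical even-money bet (win 100 when the dry summer arrives, lose 100 when it does not).  The function name expected_payoff and the dollar amounts are made up for illustration and do not describe any real futures contract.

```python
def expected_payoff(hits, trials, gain, loss):
    """Average payoff per bet if the pattern holds `hits` times out of `trials`."""
    misses = trials - hits
    return (hits * gain - misses * loss) / trials

# Hypothetical even-money bet: win 100 if the dry summer arrives, lose 100 if not.
strong = expected_payoff(8, 10, 100, 100)  # pattern holds 8 times out of 10
weak = expected_payoff(6, 10, 100, 100)    # pattern holds 6 times out of 10
coin = expected_payoff(5, 10, 100, 100)    # no better than a coin flip

print(strong, weak, coin)  # 60.0 20.0 0.0
```

Under these assumptions the 6 out of 10 relationship still has a positive expected payoff, but a much thinner one, and the calculation says nothing about how many losses you could afford along the way.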

Maybe you've noticed that when most everyone on the ski slopes during your vacation is wearing a red jacket the stock market will be bullish.  Would you put money on this?

Obviously, there can be a lot of different types of inputs that can be relevant, some numeric, some not.  Usually, when making predictions, about the only data you have is data about the past performance of what you're looking at.

For example, let's say you are the computer analyst at a parts warehouse for a major automobile manufacturer.  There are so many parts you can barely keep track of them.  But due to all of the automation and computer systems, you do know the on-hand quantities and monthly demands for each part for as long as the parts have been around.  While lots of other things could influence this demand, this is likely to be the only information you can use to make a prediction for the future.  What would you do with it?

Prediction Validity.  While we will get back to this topic in more detail later, it is something that you always need to keep in the back of your mind.  How much do you believe your predictions?  For example, you may feel that it rains within 24 hours every time you wash your car, but would you be willing to sell your services to drought-ridden areas?  Think of all the free travel you could get and the money you could make if this were really true!

Well, obviously, I am being facetious, but this is still an issue of very genuine concern.  We will spend quite a bit of time on model and prediction validation.

When dealing with numbers, MSE (Mean Square Error) and MAD (Mean Absolute Deviation) are the two most typical measures of validity.  Both of these work in scales that are somewhat non-intuitive.  I will instead develop our measures as a percentage error, or APE (Average Percentage Error).  More will be said about these when we use them.
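
To make the three measures concrete, here is a minimal sketch of how they might be computed.  It assumes APE means the average of the absolute errors expressed as a percentage of the actual values, which is one common reading; the definition developed later may differ in detail.  The function names and the demand numbers are invented for illustration.

```python
def mean_square_error(actual, forecast):
    """Average of the squared errors (in squared units of the data)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mean_absolute_deviation(actual, forecast):
    """Average of the absolute errors (in the units of the data)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def average_percentage_error(actual, forecast):
    """Average absolute error as a percentage of the actual value.

    Assumed definition; it breaks down if any actual value is zero.
    """
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)

# Made-up monthly demands and forecasts for a single part.
actual = [100, 120, 80, 100]
forecast = [110, 115, 90, 95]

mse = mean_square_error(actual, forecast)
mad = mean_absolute_deviation(actual, forecast)
ape = average_percentage_error(actual, forecast)

print(mse, mad, round(ape, 2))  # 62.5 7.5 7.92
```

An APE near 8 reads directly as "off by about 8 percent on average", whereas an MSE of 62.5 is in squared units and is harder to interpret without knowing the scale of the demand.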

Now we will move on to some specific models.