Tuesday, November 6, 2007

Mining for Data - Part 6

Spreadsheets were designed with number junkies in mind. They can do all kinds of wizardry with their built-in mathematical functions. Once you learn one, the others are easy to adapt to, since they all seem to have been designed after the pattern of their grand-daddy, VisiCalc. We begin by opening the file "2001 sales data.xls" in the spreadsheet; I am using OpenOffice's Calc.

We have already set up columns A, B, and C to hold the information about Tax ID, Recording Date, and Sales Price, and the rows were ordered by ascending street number and street name. There are some assumptions we have to make at this point if we are not going to be going back and checking the Auditor's web site for every property on the list.

The first assumption is that the recording date shown is the only transfer for that property in that year. In the event the home transferred more than once (e.g., a foreclosure followed by an REO sale followed by an investor flip), only the final transfer will be shown. The second assumption is that the transfer is an actual arm's-length sale; this cannot be guaranteed, since Realist sometimes reports as sales the sale of partial interests, transfers into revocable trusts, land contract agreements, and multi-parcel transfers by investors. It is worth the effort to scan down the list for any "sales" that do not have a sales price and check the Auditor's web site; if those transfers need to be removed, simply click on the row header to the left of column A and delete the row. A third assumption is that Realist will not always have the entire data set; they sometimes miss a transaction. While the data is good, it will not be perfect.

One other consideration: beginning with the 2003 data, Realist reports that a transfer occurred in that year, but if the property transferred again later, the listing shows the more recent recording date and sales price. When such transfers show up, the Auditor's site must be visited to obtain the correct data for the year under consideration. Example: 931 Bloomfield shows up in the list of 2003 transfers with a 10/11/2007 recording date and a sale price of $50,000. That transfer is actually the second transfer in 2007 (a foreclosure sale), following an investor sale on 6/21/2007 for $54,350. The 2003 sale occurred on 3/3/2003 for $69,900 (which, incidentally, was a "flip" after a sale on 8/19/2002 for $22,500; a sale that did not show up in the 2002 Realist sales search).

An even better example is 438 Beechwood, which sold in foreclosure 1/15/03 for $25,000, as an REO on 1/31/2003 for $18,000, and then as a "flip" on 9/29/2003 for $69,900. To be as consistent with my data as possible, the latter is the information I will use for 2003. What will not show up is the fact that the home at 438 Beechwood was subsequently foreclosed again on 9/22/2005 for $36,000, sold as an REO 3/8/2006 for $24,000, and again as a "flip" on 5/15/2006 for $80,000. Any appraisal done on those sales was required by Federal law (USPAP) to have all prior sales within 36 months disclosed. Was the law broken? Did an underwriter ignore the facts presented? Did a loan officer somewhere alter the appraisal without the appraiser's knowledge? Only a jury can determine if fraud was committed.

Scrolling down to the bottom of my spreadsheet, I find that the final transfer is in row 40. Since row 1 contains the column headings, this means that Realist.com picked up 39 recorded sales in the neighborhood in 2001. In column A, I move to row 42 and type "Min", in row 43 "Max", in row 44 "Mean", in row 45 "Median", in row 46 "Mode", and in row 47 "SD". We will use the spreadsheet's math functions to find the range and measures of central tendency, as well as one standard deviation from the mean.

For those who just got lost, I recommend reading Conceptual Statistics for Beginners by Newman and Newman (1994). Dr. Isadore Newman is the man who opened my eyes to how statistics can be used and misused, and introduced me to the concept of the Null Hypothesis (the Type 1 Error), forever changing my view of how scientific research ought to be conducted and reported. Just a plug for a great teacher.

Moving to column C, row 42, type "=min(c2:c40)". This will find the lowest sales price for the data set in column C from rows 2 through 40. When you press "ENTER", the number $30,000.00 appears. You are now in row 43 of column C. Type "=max(c2:c40)"; the high sales price of $129,000.00 appears when you press "ENTER". In row 44 of column C, type "=average(c2:c40)" and press "ENTER"; the mean sales price for 2001 was $60,619.23. In row 45 type "=median(c2:c40)" to get the median sales price of $59,000.00, and in row 46 type "=mode(c2:c40)" to get what we can take to be the predominant sales price, $62,000.00. In column C, row 47, type "=stdev(c2:c40)" and the spreadsheet will calculate the amount of one standard deviation from the mean, which comes out as $17,967.33.
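For readers who would rather work outside a spreadsheet, the same six measures can be reproduced with Python's standard statistics module. This is a minimal sketch; the price list below is made up for illustration and stands in for the 39 actual values in column C, so the printed figures will not match the neighborhood numbers above.

```python
import statistics

# Hypothetical sales prices standing in for column C (rows 2-40);
# the real 2001 data set has 39 recorded sales.
prices = [30000, 45000, 52000, 59000, 62000, 62000, 75000, 129000]

low = min(prices)                   # spreadsheet =MIN(C2:C40)
high = max(prices)                  # =MAX(C2:C40)
mean = statistics.mean(prices)      # =AVERAGE(C2:C40)
median = statistics.median(prices)  # =MEDIAN(C2:C40)
mode = statistics.mode(prices)      # =MODE(C2:C40), most frequent value
sd = statistics.stdev(prices)       # =STDEV(C2:C40), sample standard deviation

print(low, high, mean, median, mode, round(sd, 2))
```

Note that `statistics.stdev` is the sample standard deviation, matching Calc's STDEV; the population version (Calc's STDEVP) would be `statistics.pstdev`.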

This information tells us that the measures of central tendency -- the mean, median, and mode -- are all very close to each other; the average sales price was about $60,600, there were the same number of sales above $59,000 as there were below $59,000, and the most commonly occurring sales price was $62,000. If we were to graph the data we might expect a normal bell curve, and the standard deviation of about $18,000 tells us that 68% of all the sales in this neighborhood in 2001 occurred within the range of roughly $43,000 to $79,000.
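The roughly $43,000-to-$79,000 band quoted above is simply the mean plus and minus one standard deviation. A quick arithmetic check, plugging in the AVERAGE and STDEV results reported from the spreadsheet:

```python
mean = 60619.23  # =AVERAGE(C2:C40) result from the spreadsheet
sd = 17967.33    # =STDEV(C2:C40) result from the spreadsheet

low_band = mean - sd   # lower end of the ~68% range
high_band = mean + sd  # upper end

print(round(low_band), round(high_band))  # 42652 and 78587
```

Rounded to the nearest thousand, that is the $43,000-to-$79,000 range given in the text.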

An underwriter presented with this kind of information might ask a few more questions before granting a loan for more than the mean neighborhood sales price on a home in this neighborhood. As I mentioned in Part 2 of this series, however, he is not likely to get this kind of information because (1) appraisers have learned that in order to keep appraisal orders coming in, they should not report information that causes the value opinion to vary from the stated neighborhood predominant price by more than 10%, and, (2) nobody is paying them enough to take the trouble to do this kind of analysis; in fact, there is constant pressure to reduce the price for an appraisal and consequently to reduce the quality of the work. Had this kind of information been sought out and presented, we might not now be in the midst of a housing market melt-down.

Next: Summarizing some findings.
