Quantative Methods 370: Regression Analysis

Part 1:

Using data from a study on crime rates and poverty in a town, a local news station claimed that as the number of kids who receive free lunches increases, so does crime in the same area.

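After conducting a regression analysis in SPSS, one would say that the news station's claim is weakly supported at best, because the R squared value is only .173 (Figure 1). R squared ranges from 0 to 1, and a value this close to 0 means the percentage of free lunches explains very little of the variation in crime rate.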
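To put the R squared value in perspective, it can be converted to a correlation coefficient and to a percentage of variation explained. This is a minimal sketch; the variable names are illustrative, and the positive root is taken on the assumption that the slope is positive, as it is in the equation used later in this part.

```python
import math

# R squared as reported in the Model Summary (Figure 1).
r_squared = 0.173

# Correlation coefficient; the positive root is assumed because the
# slope in the regression equation is positive.
r = math.sqrt(r_squared)

print(f"r = {r:.3f}")
print(f"share of variation explained = {r_squared:.1%}")
```

A correlation of this size is usually read as a weak to moderate association, which fits the low R squared value.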

If a new area of town were identified in which 23.5% of the children receive free lunch, the corresponding predicted crime rate would be about 22.29%. The prediction comes from the regression equation y = a + bx, where a is the constant and b is the slope.
The coefficients table (Figure 2) shows the values needed for this equation, and x is the percentage converted to a decimal. The equation can then be solved:

Y = 21.89 + 1.685(0.235)
Y ≈ 22.29
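The arithmetic above can be checked with a short script. This is a minimal sketch: the function name is hypothetical, and the constant and slope are the rounded values quoted from the coefficients table, so the last decimals may differ slightly from what SPSS would give with unrounded coefficients.

```python
def predict_crime_rate(free_lunch_percent, constant=21.89, slope=1.685):
    """Apply y = a + b*x, with x given as a percentage (e.g. 23.5)."""
    # The report converts the percentage to a decimal before using it.
    x = free_lunch_percent / 100.0
    return constant + slope * x


prediction = predict_crime_rate(23.5)
print(round(prediction, 2))
```

Keeping the constant and slope as parameters makes it easy to rerun the prediction for another area of town or with updated coefficients.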

The null hypothesis states that there is no linear association between crime rate and the percentage of free lunches given out. The alternative hypothesis states that there is a linear relationship between crime rate and the percentage of free lunches given out. Linear regression analysis is used to determine whether a linear relationship exists between the two variables. The null hypothesis can be rejected because the significance value of .005 (Figure 2) is well below .05. The linear association between crime rate and the percentage of free lunches given out is real but very weak.
One would not be confident in predicting crime rate from the percentage of free lunches, because the R squared value of .173 is much closer to 0 than to 1. The result is significant at the 97.5% level, meaning that there is a relationship between the number of crimes and the free lunches received, but that relationship explains little of the variation in crime.

Figure 1: Model Summary. The R squared value of .173 is weak because it is much closer to 0 than to 1.

Figure 2: Coefficients table showing the significance level of .005.

Part 2:

Introduction:

The UW System is interested in determining whether certain factors influence enrollment at two schools, the University of Wisconsin Whitewater and the University of Wisconsin Eau Claire. Enrollment at each university may be influenced by factors such as income and education levels in each county, as well as the distance of each county from each university. These variables can affect a student’s decision when choosing between universities, and therefore affect enrollment at each school. The data used include enrollment at each university as well as income, percent with a bachelor’s degree, and distance for each county in Wisconsin. To determine whether these variables influence enrollment, the data are analyzed using regression analysis. After performing the regression analysis, the UW System can determine which factors are most significant in influencing enrollment at the University of Wisconsin Whitewater and the University of Wisconsin Eau Claire. Once the significant factors are determined, maps are used alongside the regression statistics to identify spatial patterns of enrollment based on the most influential variables.

Methods:

Regression analysis will determine whether or not to reject the null hypothesis, which states that there is no linear relationship between each variable and enrollment at each university. If a result is statistically significant, the null hypothesis is rejected in favor of the alternative hypothesis, which states that there is a linear relationship between the variable and enrollment. The analysis is performed in SPSS to determine whether any of the three suspected variables has a significant relationship to enrollment at each university. Six regression analyses are performed, one for each pairing of the two universities' enrollment data with the three variables. The results of each analysis indicate which variables have significant relationships to enrollment at each university, and therefore which factors are more influential in a student’s decision to attend a particular university.
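SPSS reports the constant, slope and R squared directly. As a rough illustration of what each of the six regressions computes, the sketch below fits a line by ordinary least squares in plain Python. The function name and the sample values are hypothetical and are not taken from the course data file; the significance test that SPSS also reports is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x.

    Returns (constant, slope, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    # For a one-variable regression, R squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return constant, slope, r_squared


# Hypothetical values for illustration only.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b, r2 = fit_line(x_demo, y_demo)
print(f"constant (B) = {a:.3f}, slope = {b:.3f}, R squared = {r2:.3f}")
```

Running the same function once per university and variable would give the six sets of coefficients that the results section reads from the SPSS tables.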

The file provided includes the following information for each county:

Education (percent with a Bachelor's degree)
An income variable
Distance from the center of each county
Number of students attending the different UW schools
Number of students under the age of 24
County Name

The focus was on two UW schools, Eau Claire and Whitewater. The information for these two schools was copied into a new spreadsheet for this analysis. Two additional fields, PopdistanceEC and PopdistanceWW, were created to normalize the data.

Figure 3: Excel fields used and created.
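The two normalized fields can be sketched in code. The formula here is an assumption: the later reference to "Population/Distance" suggests that each county's population was divided by its distance from the campus, but the exact spreadsheet formula is not shown. The county names, populations and distances below are hypothetical.

```python
# Hypothetical rows: (county, population, distance to campus). Not real data.
counties = [
    ("County A", 100000, 20.0),
    ("County B", 50000, 100.0),
]


def pop_distance(population, distance):
    """Population divided by distance (assumed normalization)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return population / distance


# One value per county, playing the role of PopdistanceEC or PopdistanceWW.
popdistance = {name: pop_distance(pop, dist) for name, pop, dist in counties}
print(popdistance)
```

Under this assumption a large, nearby county gets a high value and a small, distant county gets a low one, which is the effect the normalization is meant to capture.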

The variables chosen were:

Population Distance
Age 24 and under
Median Household Income

These variables were used for both UWEC and UWW. Whether a result was mapped depended on its significance level, and the maps were made using the residuals. The residual is the amount by which each point deviates from the best fit (regression) line. It represents the difference between the actual and the predicted value of Y.
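The definition of a residual translates directly into code. This is a minimal sketch with a hypothetical function name, line and data points; a positive residual means the point lies above the regression line and a negative one means it lies below.

```python
def residuals(xs, ys, constant, slope):
    """Actual Y minus predicted Y for each point, given y = a + b*x."""
    return [y - (constant + slope * x) for x, y in zip(xs, ys)]


# Hypothetical line and points for illustration only.
res = residuals([0.0, 10.0, 20.0], [5.0, 5.0, 11.0], constant=1.0, slope=0.5)
print(res)
```

In the maps, a county with a large positive residual sends more students than the regression predicts, and a county with a large negative residual sends fewer.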

Results:

Figures 4 and 5 show the regression analysis of UWEC enrollment against the normalized population distance, and Figure 6 maps the residuals. Figure 4 is the coefficients table, which is used to write the regression equation because it contains the constant (B) and the slope (b, for PopdistanceEC). The table also gives the significance level, which helps decide whether the residuals should be mapped. The result is significant at the 99% level, so the residuals are mapped. Figure 5 is the model summary, which contains the R squared value (.945). R squared is a useful measure of how much X explains Y; it ranges from 0 to 1, with 1 being very strong. A value of .945 is very strong, meaning that UWEC enrollment is closely tied to population distance.

Figure 6 is the map of the residuals. Milwaukee County, in red, is 5.5 or more deviations below the best fit (regression) line. Northwestern Wisconsin sends a lot of students to UWEC; the counties above the best fit line, shown in blue, are mainly Wood, Barron, Marathon and Chippewa. The counties in green, in northern Wisconsin, also send a high number of students, and Eau Claire County is among them. One would expect Eau Claire County to be in blue as well, but a great many students from around the state attend UWEC. We reject the null hypothesis because there is a significant linear relationship.

Figure 4: Coefficient table which shows the constant (B) and the significance level.

Figure 5: Model Summary shows the R-square value.

Figure 6: Map of residuals for UW Eau Claire and the distance from which students come for school.
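Residual maps of this kind typically group counties by how many standard deviations each residual lies from the regression line. The sketch below shows one way such classes might be assigned. The break values, class labels and sample residuals are illustrative assumptions, not the settings used for the maps in this report, and the residuals are scaled by their sample standard deviation as a simplification.

```python
import statistics


def classify_residuals(values, far=1.5, near=0.5):
    """Label each residual by its distance from the line in standard deviations."""
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("residuals must vary")
    labels = []
    for value in values:
        z = value / sd  # least squares residuals average to zero
        if z <= -far:
            labels.append("far below")
        elif z < -near:
            labels.append("below")
        elif z <= near:
            labels.append("near line")
        elif z < far:
            labels.append("above")
        else:
            labels.append("far above")
    return labels


# Hypothetical residuals for five counties.
labels = classify_residuals([-12.0, -1.0, 0.5, 2.5, 10.0])
print(labels)
```

Each label would then be tied to a map color, for example red for the counties far below the line and blue for those far above it.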

Figures 7 and 8 show the regression analysis information for UW Whitewater and the normalized population distance. Figure 7 shows a constant of 24.153 and a slope of 0.068, with a result significant at the 99% level, indicating that the residuals should be mapped. Figure 8 shows an R squared value of 0.779, which indicates a strong relationship between population distance and enrollment. Figure 9 is the map of the residuals for the UW Whitewater normalized population/distance. Milwaukee County, in red, is again far below the best fit (regression) line. Dane, Jefferson, Rock and Waukesha Counties, in blue, are above the line. Marinette County sends a lot of students compared to the rest of northern and central Wisconsin. We reject the null hypothesis because there is a significant linear relationship.

Figure 7: Coefficients table with the constant (B) of 24.153 and the slope of 0.068. The result is significant at the 99% level.

Figure 8: Model Summary has the R squared value of 0.779.

Figure 9: UW Whitewater Population Distance from Campus.

Figures 10 and 11 show the regression analysis information for UW Eau Claire and the population under age 24. Figure 10 is the coefficients table, which gives the constant (B) of 66.590 and the slope of 0.006. The result is significant at the 97.5% level, so the residuals should be mapped. Figure 11, the model summary, shows an R squared value of .109, which is very low and indicates very little relationship between the two variables. Figure 12 is the map of residuals for UW Eau Claire students and the population under the age of 24. Eau Claire County, in blue, is the only county noticeably above the best fit line. A few green counties near Eau Claire County are also above the best fit (regression) line. Milwaukee County, in red, is again far below. We reject the null hypothesis because the linear relationship, although weak, is statistically significant.

Figure 10: Coefficients table with the constant of 66.590 and the slope of 0.006. The significance value is .005, significant at the 97.5% level.

Figure 11: Model Summary showing the R squared value of .109.

Figure 12: Map of residuals for UW Eau Claire students and the population under age 24.

Figures 13 and 14 show the regression information for UW Whitewater and the population age 24 and under. Figure 13 is the coefficients table, with the constant (B) of 16.284 and the slope of 0.016. The significance value is .000, significant at the 99% level, so the residuals should be mapped. Figure 14 is the model summary, with an R squared value of .515. Figure 15 is the map of the residuals for students at UW Whitewater and the population 24 and under. Rock, Jefferson, Walworth and Waukesha Counties are far above the best fit line. This could be because the city of Whitewater falls on the county lines of Walworth, Jefferson and Rock Counties. The green counties directly surround the blue ones, almost creating a buffer zone. It makes a great deal of sense for the green area to surround the college area, as many students could live off campus in neighboring counties. Milwaukee, Eau Claire, La Crosse, Brown, Portage and Winnebago Counties are far below the best fit line. We reject the null hypothesis because there is a significant linear relationship.

Figure 13: Coefficients table with the constant (B) of 16.284 and the slope of 0.016. The significance value is .000, significant at the 99% level.

Figure 14: Model summary table shows the R squared value of .515.

Figure 15: Map of Students 24 and under in UW Whitewater.

Figures 16 and 17 show the regression information for the remaining pairing, UW Eau Claire and median household income. Figure 16 is the coefficients table, with the constant (B) of -80.982, the slope of 0.006 and a significance value of .104, or 89.6%. This result is not significant enough to map. Figure 17 shows an R squared value of 0.037, which is very small and does not show a relationship between the two variables. We fail to reject the null hypothesis because there is no significant linear relationship.

Figure 16: Coefficients table showing the constant (B) of -80.982, the slope of 0.006 and a significance value of 0.104, or 89.6%.

Figure 17: Model Summary with R Squared value of 0.037.
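The mapping decision used throughout the results can be written as a simple rule: map the residuals only when the significance value falls below a chosen cutoff. In this sketch the .05 cutoff is an assumed, conventional choice rather than one stated in the report, and the function name is hypothetical. The significance values are the ones quoted in the captions for Figures 10, 13 and 16.

```python
ALPHA = 0.05  # assumed cutoff; the report does not state one explicitly

# Significance values as quoted in the figure captions.
# SPSS prints ".000" when the value rounds to zero at three decimals.
sig_values = {
    "Figure 10": 0.005,
    "Figure 13": 0.000,
    "Figure 16": 0.104,
}


def should_map(sig_value, alpha=ALPHA):
    """True when the result is significant and the residuals are worth mapping."""
    return sig_value < alpha


decisions = {figure: should_map(value) for figure, value in sig_values.items()}
for figure, decision in decisions.items():
    print(figure, "map residuals" if decision else "do not map")
```

With this cutoff the rule reproduces the choices made in the text: the residuals for Figures 10 and 13 are mapped and those for Figure 16 are not.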

Figures 18 and 19 show the regression information for UW Whitewater and median household income. Figure 18 is the coefficients table, showing the constant (B) of -579.631 and a slope of 0.022. The significance value is .000, significant at the 99% level, so the residuals should be mapped. Figure 19, the model summary, shows an R squared value of 0.329, which indicates a relatively weak relationship between the two variables. Figure 20 is the map of UW Whitewater median household income in relation to the population. The blue counties, Dane, Jefferson, Waukesha, Milwaukee, Rock and Walworth, are above the best fit line for these two variables. The red counties, St. Croix, Pierce, Outagamie, Calumet, Washington and Ozaukee, are far below the best fit line. They occur in pairs of neighboring counties spread across the state, which suggests that income may be geographically related in similar areas. Given that Milwaukee County is a generally poor region, one would assume that the neighboring county and the others are relatively similar. We reject the null hypothesis because there is a significant linear relationship.

Figure 18: Coefficients table showing the constant (B) of -579.631, the slope of 0.022 and a significance value of 0.005, significant at the 99% level.

Figure 19: Model Summary shows the R square value of 0.0329.

Figure 20: Map of UW Whitewater Median Household Income and Population.

Conclusion:
When the statistics and the residual maps are considered together, conclusions can be drawn about the factors that influence enrollment at different schools. The statistics not only determine which factors are statistically significant and have the most influence, but also provide information about the pattern and strength of that influence. This information is particularly helpful when used with the residual maps, because the influence of certain significant factors varies by location. Overall, the statistics provide the means to determine which factors are most influential, while the maps allow clearer interpretation of where each variable has the most influence. Some of the factors deemed most influential in determining enrollment are more significant in some counties than in others. Because of this, different areas seem to be more influenced by one variable and less influenced by another.
Based on analysis of all the data, the most significant factor influencing enrollment at both universities is distance. The influence of the other significant factors on enrollment at each university is not the same throughout the state. While some counties may be more influenced by the percentage of bachelor’s degrees, other counties, specifically the ones closer to the university, are more influenced by distance.

The question being asked is which of the available variables help explain why students choose the schools they do. I would say that, of the variables examined, the one that best describes why students choose the schools they do is the number of people in each county under the age of 24. I say this because the maps made with this data for both UW Whitewater and UW Eau Claire show the most information about which counties fall below and above the best fit line. Some of the data points offset each other, but I think an overall analysis of this variable shows why students go where they do. Another thing to keep in mind is that Milwaukee is below the best fit line in almost all of the maps. This may be due to the fact that Milwaukee is one of the most segregated places in the United States, and to the fact that there is a lot of poverty in that area as well.

Tuesday, December 1, 2015