Edited May 20, 2020 at 12:26 p.m.
Quoting: icehawk2006
Lololol. You’re taking basic data and manipulating it to fit your own narrative, talk about throwing up. Stats 101: weighting certain statistics to drive a result you want to see, bad. Here’s an idea, also from stats 101, let the data speak for itself. Also stats 101, if your output shows a vastly different picture than what the raw data is telling you, your model has a significant error somewhere. Your model not only disagrees with the raw data, it disagrees with just about every assessment made by every expert in the field. Also basic research 101, if your model won’t pass peer review, it’s wrong. Your model will not pass peer review.
You really need to give the advanced stats gig up. The fact you don’t understand that Corsi is an individual player’s impact on team shooting rates, same for Fenwick, etc., is largely hilarious. You even said it in your own words that it’s an individual stat.
Isolated stats are the only way to properly address player performance in comparison to other players! That’s why it’s exceptionally important to NOT include stats like team 5v5 sv%, because the stat is almost exclusively influenced by the goaltender’s ability to make a save!! This is not rocket science. Ignoring isolated stats, while ranking Vlasic in comparison to others (that’s what we call an isolated comparison by the way)
All of my stats were even strength referenced. You’re telling me just because I don’t like it... while I don’t like or dislike it, it’s just wrong, and comically so, and you don’t like that, based on the basics, I’m telling you it’s entirely wrong. That’s some straight up irony right there.
The reference to hockeyviz was to get you to give up this fool’s errand and go to a proven source. No need to reinvent the wheel. Also saying “it doesn’t have what I want”, bad statistician, basically an admission you’re looking for a specific result. Yikes. Their “fancy” charts, all exclusively numbers-based with numbers-based axes, from an experienced statistician, yeah totally not what you need. Yikes.
So I’m not going to put my own stuff together because 1) not rocket science 2) I do this kind of analytics every damn day at work all day 3) there’s no reason to with dozens of sources everywhere 4) the raw data is almost exclusively simple enough and has enough integrity that you don’t need to meld them into a model to get a picture of darn good fidelity.
If this is a university project you will fail. I am not joking, find something else.
1. So you know, since I've continued to work on this: the fundamental problem was actually that oZS was being drastically over-scaled. The whole point was to remove the correlation between CF% and oZS, and fixing the scaling changes many scores, which you will be glad to know.
2. I have since stopped using oiS and oiSV and started using SCF%. I didn't do this because of their relation to the goalie, as both stats were used rel to the team. I changed because of the low sample size.
3. It was a small project; I got an 87%, but I don't plan to stop working on it.
4. Maybe I was too harsh on hockeyviz. Some of their graphs are useful, but I really wanted to see the actual numbers, because I am interested in creating various metrics, which obviously I can't do by judging the thickness of the bars. I'm not trying to find a specific result; after all, it's the same data. Natural Stat Trick is more the kind of site I'm looking for: they have both the graphics and the actual data behind those graphics, with by-game data for everything one could ask for. There are some useful visuals on hockeyviz, notably the shooting maps, but I don't believe they paint the full picture.
5. I think we are not really connecting on what "individual stat" means. My understanding was that you were not referring to on-ice stats (CF, FF, etc.), which are usually categorized differently from individual stats (iCF, P, etc.).
6. I started this project after getting frustrated with looking at
https://frozenpool.dobbersports.com/frozenpool_playerusage.php, which seems to be a very helpful tool, but I couldn't stand that there was no zone-start scaling on Corsi, when it's common sense that if you start in the O zone more you are going to have a higher CF% than if you started more in the D zone. You could also see that for many players GF% was drastically different from CF%, even if the goalies are very different. But obviously GF% doesn't have a very good sample size, especially for players who only played a few games, which is one of the reasons I switched away from oiS and oiSV. I went with SCF% because typically there is around 9x more data.
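To show what I mean by removing the correlation between CF% and oZS, here is a minimal sketch. It assumes a plain least-squares fit of CF% on oZS% across players and keeps the residual, re-centered on the average CF%. That is one common way to do it and not necessarily the exact scaling I ended up with; the function name and the numbers are made up for illustration.

```python
from statistics import mean

def zone_adjusted_cf(cf_pct, ozs_pct):
    """Remove the linear effect of oZS% from CF% (assumed method: OLS residuals).

    cf_pct and ozs_pct are parallel lists, one entry per player.
    The result is re-centered on the mean CF%, so an average-deployment
    player keeps roughly the same number.
    """
    mx, my = mean(ozs_pct), mean(cf_pct)
    sxx = sum((x - mx) ** 2 for x in ozs_pct)
    if sxx == 0:
        # Everyone has the same zone starts, so there is nothing to adjust.
        return list(cf_pct)
    slope = sum((x - mx) * (y - my) for x, y in zip(ozs_pct, cf_pct)) / sxx
    return [y - slope * (x - mx) for x, y in zip(ozs_pct, cf_pct)]

# Hypothetical players, not real data.
ozs = [40.0, 50.0, 60.0, 55.0, 45.0]
cf = [47.0, 50.0, 54.0, 51.0, 48.0]
adjusted = zone_adjusted_cf(cf, ozs)
print([round(a, 2) for a in adjusted])
```

If the slope is scaled too aggressively, sheltered players get punished more than their deployment justifies, which is the kind of over-scaling I ran into.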
newScore2 here is very simply CF%*SCF%/50. Obviously there is a small correlation between them, enough to change many scores.
This was my solution; if you can tell me whether there is a problem with it, I would actually appreciate it.
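For reference, the newScore2 formula written out as a small sketch, assuming both inputs are percentages on a 0-100 scale. The function name and the sample inputs are just for illustration.

```python
def new_score2(cf_pct: float, scf_pct: float) -> float:
    """Combine CF% and SCF% (both 0-100) as CF% * SCF% / 50."""
    return cf_pct * scf_pct / 50.0

# A player who is exactly break-even in both stays at 50.
print(new_score2(50.0, 50.0))

# Hypothetical player: good shot share, weaker scoring-chance share.
print(round(new_score2(53.0, 48.0), 2))
```

One thing to keep in mind: because it is a product, the result is not itself a percentage. A player at 60 in both comes out at 72, so the scale stretches as you move away from 50.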
The next step would be to scale it by the quality of competition: newScore + sigma[playerScore*TOI]/(Total TOI) - 50.
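A rough sketch of that quality of competition step. I'm assuming here that the sum runs over the opponents a player faced, that TOI is the head-to-head ice time against each opponent, and that Total TOI is the sum of those head-to-head times. The names and numbers are hypothetical.

```python
def qoc_adjusted(new_score, matchups):
    """newScore + sum(opponentScore * TOI) / totalTOI - 50.

    matchups is a list of (opponent_score, toi_against) pairs, where
    toi_against is assumed to be head-to-head ice time in minutes.
    """
    total_toi = sum(toi for _, toi in matchups)
    if total_toi == 0:
        # No recorded matchups, so leave the score unadjusted.
        return new_score
    avg_opponent = sum(score * toi for score, toi in matchups) / total_toi
    return new_score + avg_opponent - 50.0

# Hypothetical: 30 minutes against a 52-rated opponent, 10 against a 48.
print(qoc_adjusted(50.0, [(52.0, 30.0), (48.0, 10.0)]))
```

Facing exactly average (50) competition leaves the score unchanged, harder competition pushes it up, and easier competition pulls it down.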
Since this is what this whole deep dive down this stupid ass rabbit hole started from: the point is basically to solve what is wrong with this graph. For example, Thornton has a pretty good 53% CF, but he starts more in the O zone and faces easier competition than the rest of the team, so obviously his score should be worse than that. Vlasic, for example, starts with a 49% CF but plays in the defensive zone more and faces harder competition. Then there is the fundamental problem that Corsi (and Fenwick) don't consider the quality of the shots, hence the inclusion of SCF% and, previously, the rel difference of oiS and oiSV.
Still no clue where you got the idea that I'm purposefully weighting things differently to fit a narrative... but you do you.
Unfortunately, I have not seen many things like this, especially with zone-start scaling, and surprisingly, quality of competition (QoC) stats seem to be hard to come by as well. I'm not reinventing the wheel, I'm using it... it's not rocket science.