Pitcher ID: 506433
Pitcher Name: Darvish Yu
Step One: Feature Selection
First, we need to choose the features (pitchfx attributes) that will be fed into the clustering procedure. Based on prior research and my quick inspection of the data set, the features that should affect pitch type are listed as follows:
- Release point (x0, y0, z0)
- Acceleration (ax, ay, az)
- Initial Velocity (vx0, vy0, vz0)
- Deviation on the x and y axes (pfx_x, pfx_y)
- Spin direction and spin rate
- Break angle, break length (on the x-axis), and break_y (on the y-axis)
Together, these 16 features should cover the main aspects of a pitch's flight: where it is released, how fast it travels, and how it moves.
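The selection above can be sketched in R as a simple column subset. This is only an illustration: the data frame `r` here is a toy stand-in filled with random numbers, and the column names `spin_dir`, `spin_rate`, `break_angle`, and `break_length` are assumed spellings (the other names are taken from the list above).

```r
# The 16 selected features. Names for spin and break angle/length are
# assumptions; adjust them to match the actual column headers.
features <- c("x0", "y0", "z0",
              "ax", "ay", "az",
              "vx0", "vy0", "vz0",
              "pfx_x", "pfx_y",
              "spin_dir", "spin_rate",
              "break_angle", "break_length", "break_y")

# Toy stand-in for the pitch data frame (random values, 5 pitches).
set.seed(42)
n <- 5
r <- as.data.frame(matrix(rnorm(n * length(features)), nrow = n,
                          dimnames = list(NULL, features)))
r$X.pitch_type <- c("FF", "SL", "FF", "CU", "FT")

# Fail early if an expected column is absent, then keep only the features.
missing_cols <- setdiff(features, names(r))
stopifnot(length(missing_cols) == 0)
x <- r[, features]
dim(x)
```

Keeping the pitch type out of `x` means the clustering never sees the pitchfx label, so the label can still serve as a reference afterwards.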
Step Two: Dimension Reduction
By examining the data set with the pitchfx pitch type as a reference (shown below; the bottom row gives the number of pitches), some rarely thrown pitch types can be identified.
> table(r$X.pitch_type)

 CH  CU  FA  FC  FF  FS  FT  IN  PO  SL
  1 303   1 572 977 193 642   4   5 657
Clearly, CH, FA, IN, and PO are either rarely thrown or misclassified by the pitchfx system (1, 1, 4, and 5 pitches respectively), so pitches of these types are treated as noise and removed. After noise elimination, the remaining pitch types and their counts are:
> table(r$X.pitch_type)

 CU  FC  FF  FS  FT  SL
303 572 977 193 642 657
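The noise elimination can be sketched as a count-based filter. The data frame below is a stand-in rebuilt from the counts in the first table, and the cutoff `min_count` is a hypothetical value chosen for illustration; the text does not state what threshold was actually used.

```r
# Pitch counts copied from the first table above.
counts <- c(CH = 1, CU = 303, FA = 1, FC = 572, FF = 977,
            FS = 193, FT = 642, IN = 4, PO = 5, SL = 657)

# Stand-in data frame with one row per pitch.
r <- data.frame(X.pitch_type = rep(names(counts), times = counts),
                stringsAsFactors = FALSE)

# Hypothetical cutoff: types thrown fewer than 10 times count as noise.
min_count <- 10
tab <- table(r$X.pitch_type)
keep <- names(tab)[tab >= min_count]

# Keep only pitches whose type passes the cutoff.
r_clean <- r[r$X.pitch_type %in% keep, , drop = FALSE]
table(r_clean$X.pitch_type)
nrow(r_clean)
```

With these counts, any cutoff between 6 and 193 keeps the same six types, so the exact value matters little here.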
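R provides two functions for principal component analysis (PCA): princomp computes the covariance matrix and takes its eigendecomposition, while prcomp uses a different technique, singular value decomposition (SVD), applied to the centered data.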
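A short sketch of how the two functions relate, run on random toy data rather than the pitch data. It assumes no scaling is applied; whether the features were scaled in this analysis is not stated.

```r
# Toy data: 200 observations, 4 features, two of them correlated.
set.seed(1)
x <- matrix(rnorm(200 * 4), ncol = 4)
x[, 2] <- x[, 1] + 0.5 * x[, 2]
colnames(x) <- paste0("f", 1:4)

pc_svd <- prcomp(x, center = TRUE, scale. = FALSE)  # SVD of centered data
pc_eig <- princomp(x, cor = FALSE)                  # eigen on covariance matrix

# princomp divides by n when estimating variance, prcomp by n - 1,
# so rescale before comparing the component variances.
n <- nrow(x)
var_svd <- pc_svd$sdev^2
var_eig <- unname(pc_eig$sdev^2) * n / (n - 1)
max_var_diff <- max(abs(var_svd - var_eig))

# The component directions agree up to sign.
max_loading_diff <- max(abs(abs(pc_svd$rotation) -
                            abs(unclass(pc_eig$loadings))))

max_var_diff
max_loading_diff
```

Both differences should be at the level of floating-point error, so the choice between the two functions is mainly about numerical stability, not about the components themselves.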
Step Three: DBSCAN
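The results of DBSCAN are shown below. The columns are cluster labels (0 is noise), and the rows give the number of border points, seed (core) points, and their total per cluster: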
dbscan Pts=3344 MinPts=20 eps=0.5
         0    1
border 228  256
seed     0 2860
total  228 3116

dbscan Pts=3344 MinPts=15 eps=0.33
         0    1   2   3  4  5
border 543  225 116  77 12 14
seed     0 1917 307 127  3  3
total  543 2142 423 204 15 17
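To make the border/seed split in these tables concrete, here is a minimal DBSCAN sketch in base R, run on a toy grid rather than the pitch data. It is not the implementation that produced the output above (the header format suggests `dbscan()` from the fpc package, which is an assumption). The function name `dbscan_sketch` is hypothetical, and counting a point as part of its own neighborhood is an assumed convention.

```r
# Minimal DBSCAN; builds a full distance matrix, so small data only.
dbscan_sketch <- function(x, eps, min_pts) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  # eps-neighborhood of each point (includes the point itself).
  neighbors <- lapply(seq_len(n), function(i) unname(which(d[i, ] <= eps)))
  is_seed <- lengths(neighbors) >= min_pts
  cluster <- integer(n)          # 0 means noise / not yet assigned
  current <- 0L
  for (i in which(is_seed)) {
    if (cluster[i] != 0L) next   # already absorbed by an earlier cluster
    current <- current + 1L
    cluster[i] <- current
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (!is_seed[p]) next      # border points join but do not expand
      nb <- neighbors[[p]]
      new_pts <- nb[cluster[nb] == 0L]
      cluster[new_pts] <- current
      queue <- c(queue, new_pts)
    }
  }
  list(cluster = cluster, is_seed = is_seed)
}

# Toy data: two 5 x 5 grids with spacing 0.1, plus one far-away outlier.
grid <- as.matrix(expand.grid(0:4, 0:4)) * 0.1
x <- rbind(grid, grid + 5, c(20, 20))

res <- dbscan_sketch(x, eps = 0.15, min_pts = 5)
status <- ifelse(res$is_seed, "seed", "border")
table(status, cluster = res$cluster)
```

In each grid the four corner points have only four neighbors within eps, so they end up as border points, while the remaining 21 points are seeds; the outlier lands in cluster 0. Lowering eps or raising MinPts turns more seeds into border or noise points, which is the trade-off visible between the two runs above.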