<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>dimension-reduction | Dhafer Malouche</title><link>https://dhafermalouche.net/tag/dimension-reduction/</link><atom:link href="https://dhafermalouche.net/tag/dimension-reduction/index.xml" rel="self" type="application/rss+xml"/><description>dimension-reduction</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Dhafer Malouche © 2026</copyright><lastBuildDate>Thu, 30 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://dhafermalouche.net/media/icon_hu294da7f24af66942b94b8e240e33fe59_2153342_512x512_fill_lanczos_center_3.png</url><title>dimension-reduction</title><link>https://dhafermalouche.net/tag/dimension-reduction/</link></image><item><title>StatPCA — Principal Component Analysis Workbench</title><link>https://dhafermalouche.net/apps/statpca/</link><pubDate>Thu, 30 Apr 2026 00:00:00 +0000</pubDate><guid>https://dhafermalouche.net/apps/statpca/</guid><description>&lt;p>A browser-only teaching workbench for the most-used dimension-reduction technique in applied multivariate statistics: &lt;strong>Principal Component Analysis&lt;/strong>. &lt;strong>StatPCA&lt;/strong> completes the family of teaching tools developed for undergraduate and graduate statistics at Qatar University, alongside &lt;strong>StatTables&lt;/strong>, &lt;strong>StatTests&lt;/strong>, &lt;strong>StatRegress&lt;/strong>, &lt;strong>StatCI&lt;/strong>, &lt;strong>StatPower&lt;/strong>, and &lt;strong>StatCorr&lt;/strong>.&lt;/p>
&lt;h2 id="why-a-pca-workbench">Why a PCA workbench?&lt;/h2>
&lt;p>Principal Component Analysis is often presented as a one-line recipe — &amp;ldquo;decompose the correlation matrix and keep the first few eigenvectors&amp;rdquo; — and the geometric, algebraic, and inferential layers of the method are collapsed into a single black-box call. &lt;strong>StatPCA&lt;/strong> keeps the layers separate and visible: the &lt;em>data&lt;/em> panel makes the centering and scaling step explicit, the &lt;em>eigen-decomposition&lt;/em> panel reports eigenvalues and eigenvectors of the chosen matrix, the &lt;em>variance&lt;/em> panel reports the scree plot and the cumulative proportion of variance explained, and the &lt;em>projection&lt;/em> panel renders the score plot, the loading plot, and the biplot on coordinated axes.&lt;/p>
&lt;h2 id="what-the-app-does">What the app does&lt;/h2>
&lt;p>&lt;strong>Input.&lt;/strong> Paste a CSV with $p \geq 2$ numeric variables, load one of the bundled teaching datasets (e.g., the classical decathlon, USArrests, or iris), or generate synthetic correlated data with a user-specified covariance structure. Categorical or grouping variables are kept aside and used only to colour the score plot.&lt;/p>
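&lt;p>The synthetic-data option can be sketched in a few lines. A standard construction (illustrative Python, not necessarily the app&amp;rsquo;s own client-side code; the function name &lt;code>synthetic_correlated&lt;/code> is hypothetical) draws independent standard-normal variates and multiplies by the Cholesky factor of the requested covariance matrix:&lt;/p>

```python
import numpy as np

def synthetic_correlated(n, cov, seed=0):
    """Draw n rows from a zero-mean normal with the requested covariance,
    via the Cholesky factor of cov (a standard construction)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))
    z = rng.standard_normal((n, L.shape[0]))  # independent N(0, 1) draws
    return z @ L.T                            # rows now have covariance cov

# two variables with correlation 0.8
X = synthetic_correlated(500, [[1.0, 0.8], [0.8, 1.0]])
```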
&lt;p>&lt;strong>Pre-processing options.&lt;/strong> Mean-centering is applied by default; the user toggles between PCA on the &lt;strong>correlation matrix&lt;/strong> (each variable scaled to unit variance) and PCA on the &lt;strong>covariance matrix&lt;/strong> (variables left on their original scale). Missing values are handled by listwise deletion or by mean-imputation, and the option in effect is reported alongside the results.&lt;/p>
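&lt;p>In concrete terms, the correlation-vs-covariance toggle amounts to an optional unit-variance scaling step after centering; a minimal sketch (illustrative Python, not the app&amp;rsquo;s React code):&lt;/p>

```python
import numpy as np

def pca_input_matrix(X, use_correlation=True):
    """Mean-center X, optionally scale to unit variance, and return the
    matrix whose eigen-decomposition PCA uses."""
    Xc = X - X.mean(axis=0)               # explicit centering step
    if use_correlation:
        Xc = Xc / Xc.std(axis=0, ddof=1)  # unit-variance scaling
    # on standardized data the covariance matrix equals the correlation matrix
    return np.cov(Xc, rowvar=False)
```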
&lt;p>&lt;strong>Quantities reported.&lt;/strong> For every fit the app returns:&lt;/p>
&lt;ul>
&lt;li>the &lt;strong>eigenvalues&lt;/strong> $\lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_{p} \geq 0$ of the chosen matrix, with their proportion $\lambda_{k}/\sum_{j}\lambda_{j}$ and cumulative proportion;&lt;/li>
&lt;li>the &lt;strong>loadings matrix&lt;/strong> $\mathbf{V} = (v_{jk})$ with columns equal to the eigenvectors of the chosen matrix; loadings are reported on the unit-norm scale and on the &lt;strong>correlation-with-component&lt;/strong> scale $v_{jk}\sqrt{\lambda_{k}}$, so that the user can read off the linear association between each original variable and each component;&lt;/li>
&lt;li>the &lt;strong>scores&lt;/strong> $z_{ik} = \sum_{j} v_{jk}\,(x_{ij}-\bar x_{j})/s_{j}$ of each observation on each component (the division by $s_{j}$ applies when PCA is run on the correlation matrix);&lt;/li>
&lt;li>the &lt;strong>communalities&lt;/strong> and the &lt;strong>squared cosines&lt;/strong> $\cos^{2}_{ik}$, which quantify how well each observation is represented in the chosen low-dimensional subspace.&lt;/li>
&lt;/ul>
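&lt;p>The quantities above can be reproduced in a few lines of illustrative Python (the app itself computes them client-side in JavaScript); &lt;code>pca_fit&lt;/code> is a hypothetical helper sketching the correlation-matrix case:&lt;/p>

```python
import numpy as np

def pca_fit(X):
    """Correlation-matrix PCA returning the quantities listed above."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized data
    lam, V = np.linalg.eigh(np.cov(Xs, rowvar=False))  # ascending eigenvalues
    order = np.argsort(lam)[::-1]                      # re-sort descending
    lam, V = lam[order], V[:, order]
    prop = lam / lam.sum()                             # proportion of variance
    scores = Xs @ V                                    # z_ik as defined above
    # squared cosines: share of each observation's squared distance from
    # the centroid that falls along each component
    cos2 = scores**2 / (scores**2).sum(axis=1, keepdims=True)
    return lam, V, prop, np.cumsum(prop), scores, cos2
```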
&lt;p>&lt;strong>Visual output.&lt;/strong> The app renders three coordinated plots:&lt;/p>
&lt;ol>
&lt;li>the &lt;strong>scree plot&lt;/strong> with the broken-stick and Kaiser ($\lambda &amp;gt; 1$) reference lines superimposed, so that the choice of the number of retained components is grounded in an explicit rule rather than visual judgement alone;&lt;/li>
&lt;li>the &lt;strong>score plot&lt;/strong> of observations on $(\text{PC}_{k}, \text{PC}_{\ell})$, with optional colouring by a grouping variable and confidence ellipses per group;&lt;/li>
&lt;li>the &lt;strong>biplot&lt;/strong>, which overlays the loading vectors on the score plot using the standard scaling so that the cosine of the angle between two arrows approximates the correlation between the corresponding variables.&lt;/li>
&lt;/ol>
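&lt;p>The broken-stick reference values are easy to compute: under the broken-stick null model the expected proportion of variance for component $k$ out of $p$ is $b_{k} = \frac{1}{p}\sum_{j=k}^{p} 1/j$, and a component is retained while its observed proportion exceeds $b_{k}$. A short sketch in illustrative Python:&lt;/p>

```python
import numpy as np

def broken_stick(p):
    """Expected proportions b_k under the broken-stick null model."""
    return np.array([np.sum(1.0 / np.arange(k, p + 1)) / p
                     for k in range(1, p + 1)])
```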
&lt;h2 id="pedagogical-use">Pedagogical use&lt;/h2>
&lt;p>StatPCA is designed for the lecture in which PCA is introduced and for the practical that follows it. Three exercises map naturally to the app:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Standardisation matters.&lt;/strong> Run PCA on the covariance matrix of a dataset whose variables are on incompatible scales (e.g., heights in cm and weights in kg), then re-run on the correlation matrix and watch the dominant component swap. Discuss when each choice is appropriate.&lt;/li>
&lt;li>&lt;strong>How many components?&lt;/strong> Compare the Kaiser rule, the broken-stick rule, and the elbow-on-the-scree-plot rule on the same data. Show that they need not agree, and connect the disagreement to the eigenvalue spectrum.&lt;/li>
&lt;li>&lt;strong>Interpreting the axes.&lt;/strong> Use the loadings (on the correlation-with-component scale) to label the principal axes in substantive terms; use the squared cosines to flag observations that the two-dimensional summary represents poorly.&lt;/li>
&lt;/ol>
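&lt;p>Exercise 1 can be checked numerically. The sketch below (illustrative Python with simulated heights and weights, not one of the bundled datasets) shows the first covariance-matrix eigenvector dominated by the large-variance variable, while the correlation-matrix eigenvector weights both variables equally:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(1)
# heights in cm (sd about 10) and weights in kg, weakly related
height = 170 + 10 * rng.standard_normal(300)
weight = 70 + 0.3 * (height - 170) + 4 * rng.standard_normal(300)
X = np.column_stack([height, weight])

def leading_eigenvector(M):
    """Eigenvector of the largest eigenvalue of a symmetric matrix M."""
    lam, V = np.linalg.eigh(M)
    return V[:, np.argmax(lam)]

v_cov = leading_eigenvector(np.cov(X, rowvar=False))       # height dominates
v_cor = leading_eigenvector(np.corrcoef(X, rowvar=False))  # balanced loadings
```

&lt;p>For two variables the correlation matrix always has eigenvectors proportional to $(1, 1)$ and $(1, -1)$, so the correlation-based PC1 loads both variables equally, whichever units they arrived in.&lt;/p>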
&lt;h2 id="technical-notes">Technical notes&lt;/h2>
&lt;p>The app is a single-page client-side application built with &lt;strong>React + Vite&lt;/strong>: all computation runs in the student&amp;rsquo;s browser, with no server round-trip and no data leaving the device. The eigen-decomposition is performed by a numerically stable QR-based routine on the symmetric correlation/covariance matrix; for the score plot the app uses the singular value decomposition of the centered (and optionally scaled) data matrix, which avoids forming and squaring the cross-product matrix when the number of variables is large. The static bundle is deployed on Netlify; like its siblings it works offline after first load and has no external run-time dependencies.&lt;/p></description></item></channel></rss>