I am working on LDA (linear discriminant analysis); for reference, see http://www.ccs.neu.edu/home/vip/teach/MLcourse/5_features_dimensions/lecture_notes/LDA/LDA.pdf .
My idea for semi-supervised LDA: I can use the labeled data $X\in R^{d\times N}$ to compute all terms in $S_w$ and $S_b$. I also have unlabeled data $Y\in R^{d\times M}$, which can additionally be used to estimate the covariance term $XX^T$ in $S_w$ by $\frac{N}{N+M}(XX^T+YY^T)$, which should intuitively give a better estimate of the covariance.
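To make the estimator concrete, here is a minimal NumPy sketch of the scatter matrices and of the modified $S_w$. The unlabeled columns are centered by the global labeled mean, which is an assumption of this sketch, and the function names are only illustrative:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class S_w and between-class S_b from labeled columns X (d x N)."""
    d = X.shape[0]
    mu = X.mean(axis=1, keepdims=True)                  # global mean of labeled data
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        Dc = Xc - mu_c                                   # class-centered columns
        Sw += Dc @ Dc.T
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
    return Sw, Sb

def semi_supervised_Sw(X, labels, Y):
    """Replace X X^T in S_w by N/(N+M) * (X X^T + Y Y^T) as described above.
    Y is centered by the global labeled mean (an assumption of this sketch)."""
    Sw, _ = scatter_matrices(X, labels)
    N, M = X.shape[1], Y.shape[1]
    Yc = Y - X.mean(axis=1, keepdims=True)
    return (N / (N + M)) * (Sw + Yc @ Yc.T)
```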
Implementation of the different LDA variants: I also add a scaled identity matrix to $S_w$ for all compared methods; the scaling parameter is tuned separately for each method. I divide the training data into two parts, labeled $X\in R^{d\times N}$ and unlabeled $Y\in R^{d\times M}$, with $N/M$ ranging from $0.5$ to $0.05$. I run my semi-supervised LDA on three kinds of real datasets.
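A minimal sketch of the regularization and the labeled/unlabeled split described above (the random split and the helper names are just illustrative):

```python
import numpy as np

def regularize_Sw(Sw, alpha):
    """Add a scaled identity to S_w; alpha is the scaling parameter tuned per method."""
    return Sw + alpha * np.eye(Sw.shape[0])

def split_labeled_unlabeled(X_train, labels_train, ratio, seed=0):
    """Randomly split the training columns into labeled X and unlabeled Y with N/M = ratio."""
    rng = np.random.default_rng(seed)
    n_total = X_train.shape[1]
    n_labeled = int(round(n_total * ratio / (1.0 + ratio)))  # N = ratio/(1+ratio) * (N+M)
    idx = rng.permutation(n_total)
    lab, unlab = idx[:n_labeled], idx[n_labeled:]
    return X_train[:, lab], labels_train[lab], X_train[:, unlab]
```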
How I do classification: the eigenvectors of $S_w^{-1}S_b$ are used as the transformation matrix $\Phi$; the training and testing data are then projected by $\Phi$ and classified in the projected space.
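As a sketch of this step, assuming the eigenvectors are obtained from the generalized eigenproblem $S_b v = \lambda S_w v$ and that a nearest-class-mean rule is used in the projected space (the actual classifier could differ):

```python
import numpy as np
from scipy.linalg import eig

def lda_transform(Sw, Sb, n_components):
    """Columns of Phi are the leading eigenvectors of S_w^{-1} S_b."""
    eigvals, eigvecs = eig(Sb, Sw)                        # solves Sb v = lambda Sw v
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[:n_components]].real          # d x n_components

def classify_nearest_mean(Phi, X_train, labels_train, X_test):
    """Project data with Phi and assign each test point to the closest projected class mean."""
    classes = np.unique(labels_train)
    Z_train, Z_test = Phi.T @ X_train, Phi.T @ X_test
    means = np.stack([Z_train[:, labels_train == c].mean(axis=1) for c in classes], axis=1)
    dists = np.linalg.norm(Z_test.T[:, :, None] - means[None, :, :], axis=1)
    return classes[np.argmin(dists, axis=1)]
```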
Experiment results: 1) On the testing data, the classification accuracy of my semi-supervised LDA trained on $X$ and $Y$ is always a bit worse than that of standard LDA trained only on $X$. 2) On one real dataset, the optimal scaling parameter can also be very different between the two methods when each is tuned to its best classification accuracy.
Could you tell me the reason and give me suggestions to make my semi-supervised LDA work? My code has been checked. Many thanks.