
I have previously used SPSS to perform a factor analysis on sets of data. I collect data using PHP hooked up to an SQL database. Is there a way to perform a factor analysis of data directly in PHP or SQL?

I found that this question (SPSS and PHP/MySQL Integration) addresses how to connect SPSS and SQL so that they share data, but I wondered whether the analysis could be implemented directly. There seem to be a number of functions in the PHP stats library (http://php.net/manual/en/ref.stats.php), but the documentation is mostly incomplete.

Community
  • I've not heard of the stats extension before, but you _might_ get some clues as to how to use it from browsing the source code. Not ideal, but it may be better than nothing. – halfer Jan 11 '15 at 23:33
  • Databases are not really designed for advanced statistical analysis. You could write your own routine. More commonly, a statistics package such as SAS, SPSS, R, Matlab, or even Python is used to read the data and do the analysis. – Gordon Linoff Jan 11 '15 at 23:41
  • @halfer Most of the documentation in the stats package doesn't exist. I've read a few posts about it on SO but they seem to refer to delving into the C libraries. –  Jan 11 '15 at 23:46
  • Fair enough. You could alternatively use something at the command line (see Gordon's suggestions) and then parse the output in PHP. If it is particularly heavy-duty (i.e. long-running) then add it to a queue. – halfer Jan 11 '15 at 23:48
  • @GordonLinoff I've used SPSS before by exporting the data then performing the analysis locally. R seems to be the best choice for a stats language, I just wanted something to automate the current process. Thanks for the help. –  Jan 11 '15 at 23:53
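The command-line route suggested in the comments could look roughly like the sketch below: an external script takes the exported rows as CSV text and returns JSON, which PHP could then decode with `json_decode`. Python is used here only as an example of an external tool; the function names (`read_columns`, `pearson`, `correlation_report`) and the JSON layout are assumptions made for illustration, and the correlation matrix is just a stand-in for whatever analysis is actually needed.

```python
import csv
import io
import json
import math


def read_columns(text):
    """Parse CSV text with a header row into {column name: list of floats}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(float(row[name]))
    return columns


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither column is constant (otherwise this divides by zero).
    return cov / math.sqrt(var_x * var_y)


def correlation_report(text):
    """Return a JSON string with the variable names and correlation matrix."""
    columns = read_columns(text)
    names = list(columns)
    matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    # Hypothetical output layout; any structure PHP can decode would do.
    return json.dumps({"variables": names, "correlations": matrix})
```

Run as a script, this would boil down to `print(correlation_report(sys.stdin.read()))`, with PHP piping the CSV in and reading the JSON back. For long-running analyses the call would go through a queue, as suggested above.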

1 Answer


I don't know the stats package either (I looked through its SVN source code, and a simple search for "fact" turned up nothing that looks useful for your problem), but it should be fairly simple to implement this in PHP if you understand the algorithm and/or have the source code of another implementation. Programming it yourself would also help you to fully understand the algorithm. The most important thing is to use suitable math functions so that you do not lose precision (e.g. the BC Math or GMP extensions).

Source code for another implementation can be found in the scikit-learn Python project: see its documentation and source code.
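To give a feel for how small the core of such a routine can be, here is a minimal sketch of one simple extraction method: the principal-component method, which takes the loadings of the first factor as the dominant eigenvector of the correlation matrix scaled by the square root of its eigenvalue. This is an assumption chosen for brevity, not the maximum-likelihood model that scikit-learn's `FactorAnalysis` fits, and it handles only a single factor with no rotation. The function name `first_factor_loadings` and its parameters are made up for this illustration.

```python
import math


def first_factor_loadings(corr, iterations=500, tol=1e-12):
    """Loadings of the first factor of a correlation matrix.

    Finds the dominant eigenpair by power iteration, then scales the
    eigenvector by sqrt(eigenvalue). Illustrative only: one factor,
    no rotation, no communality estimation.
    """
    n = len(corr)
    # Non-uniform start vector (1, 2, ..., n), normalized to unit length.
    # Assumption: it is not orthogonal to the dominant eigenvector.
    start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(n)))
    vec = [(i + 1) / start_norm for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in product))
        new_vec = [v / norm for v in product]
        converged = max(abs(a - b) for a, b in zip(new_vec, vec)) < tol
        # For a unit vector, the length of corr * vec approaches the
        # dominant eigenvalue as the iteration converges.
        vec, eigenvalue = new_vec, norm
        if converged:
            break
    return [math.sqrt(eigenvalue) * v for v in vec]
```

The overall sign of the loadings is arbitrary, as is usual for eigenvector-based methods. A PHP port would follow the same loop structure; extracting several factors would need a full symmetric eigendecomposition rather than plain power iteration.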

This may not be a good answer in the sense of providing a direct solution, but it is the best I can give you right now. As a final note, I think this is a challenging and interesting task, but Stack Overflow is not meant to supply developers who write code for someone else.

PS: Forget about SQL, it wasn't designed for such tasks in the first place and shouldn't be used for them.

Fleshgrinder
  • Just found [this question](https://stackoverflow.com/questions/3978580/) in the related questions, seems like a lot of people are searching for statistical libraries. Sounds like a very interesting open source project to start in PHP for me. ;) – Fleshgrinder Jan 12 '15 at 00:19
  • Thanks for your response. I haven't accepted as I don't think it answers the question, but it's helpful and points me in the right direction. I'm not expecting someone to write my code for me, but I am trying to avoid reprogramming a function that might already exist. I certainly take your point that SQL is not the tool for the job. –  Jan 12 '15 at 23:22
  • That is okay. I guess most people still use other programming languages because they think PHP is not suitable for such jobs, and that this is why no library exists. I found [this library](https://github.com/mcordingley/PHPStats), but it does not include factor analysis either. Have you seen [this answer regarding RCurl for PHP](http://stackoverflow.com/a/5122799/1251219)? – Fleshgrinder Jan 13 '15 at 12:57