
Here is the scenario:

We have a website where students can create an e-portfolio, which is like a profile page combined with projects they can add to it.

For each student portfolio, an educator will review it and give it a set of scores based on its content. So each student's portfolio will have a set of scores associated with it, which are summed to a total score.

So we have score data associated with portfolio data, and we want to use it as supervised training data for a machine learning algorithm. The computer can then examine thousands of these cases, look for patterns, provide insight, and predict scores for other portfolios.

Here is the data we are collecting for each person:

**Portfolio data:**

About: 'Text paragraph data written by the student about themselves'
Skills: 'Text Bullet list of skills'
Career Interests: 'Text Bullet list of career interests'
Work Experience: 'Text paragraph'
Education History: 'Student fills out universities, majors, GPA, and dates attended'
Courses: 'Text bullet list of courses'
Interests: 'Text paragraph data written by student about interests'
Works: 'Each student adds works to their portfolio and enters the following data'
   Work Title: 'Text title'
   Attachments: 'Files and documents attached to the portfolio (jpg, doc, pdf, youtube, dropbox, etc.)'
   Work description: 'Text description of work'
   Category of works: 'Selected from list of categories'
   Tags: 'List of text tags student adds to work'
   My contribution: 'Text description of student's contribution to project'


**Score data we are collecting for each portfolio, each key area rated from 1-100:**

Content completeness:
Selection of Works:
Reflection:
Academic Concepts:
Presentation and Appearance:
Layout and Readability:
Use of Multimedia:
Audience:
Organization of content:
Written Communication:
TOTAL SCORE:
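
As a rough sketch of what one labeled training example might look like in code (the class and field names below are hypothetical, not our actual schema, and only a few of the portfolio fields are shown), each portfolio would carry its ten key-area scores, with the total derived from them:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The ten key areas listed above; the identifier spellings are an assumption.
CRITERIA = (
    "content_completeness", "selection_of_works", "reflection",
    "academic_concepts", "presentation_and_appearance",
    "layout_and_readability", "use_of_multimedia", "audience",
    "organization_of_content", "written_communication",
)


@dataclass
class Work:
    title: str
    description: str
    category: str
    contribution: str = ""
    tags: List[str] = field(default_factory=list)
    attachments: List[str] = field(default_factory=list)


@dataclass
class LabeledPortfolio:
    about: str
    skills: List[str]
    works: List[Work]
    scores: Dict[str, int]  # one educator rating per key area

    def total_score(self) -> int:
        """Sum the ten key-area scores, refusing incomplete ratings."""
        missing = [c for c in CRITERIA if c not in self.scores]
        if missing:
            raise ValueError(f"missing scores for: {missing}")
        return sum(self.scores[c] for c in CRITERIA)


# Made-up example record, for illustration only.
example = LabeledPortfolio(
    about="Second-year design student interested in typography.",
    skills=["Illustrator", "HTML"],
    works=[Work(title="Poster series", description="Three event posters",
                category="Graphic design", tags=["print"])],
    scores={c: 70 for c in CRITERIA},
)
total = example.total_score()
```

Keeping the per-area scores and computing the total from them means either could later be used as the training target.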

We plan to collect thousands of student portfolios and scores over time. What kind of algorithm could we use to analyze this data and find correlations between portfolios that received similar scores? We would then like to use the result to predict how successful a portfolio will be once a student has filled it out. Please let me know if any of this is confusing or if you need more information. Thanks so much!

Nearpoint
  • All will depend on the features you use to describe the portfolios. The algorithm is of minor importance. – ziggystar Feb 01 '14 at 09:18
  • Sounds like a regression problem, but as ziggystar says, features will be most important – Ben Allison Feb 01 '14 at 09:20
  • I am new to machine learning feature selection. Are you saying that I need to pick out certain properties of data from the portfolio to use? Based on this problem can you provide me more insight on how I would tackle this problem? It seems tricky to me because the machine learning algorithm needs to compare text data... – Nearpoint Feb 01 '14 at 09:41

1 Answer


There are a lot of issues you are trying to tackle here.

The first thing that comes to mind is to do feature extraction and then apply regression to predict the scores. Since you're using more than just the text of the portfolios, you would need more than text features. I don't know which features will help you correlate the "presentation and appearance" of a portfolio with its score. One approach would be to extract color, font, and font-size information and represent them as features. For getting insights from the text, you could use the vector space model to represent it.
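
To make the idea concrete, here is a minimal sketch using only the Python standard library. It represents each portfolio's text as a TF-IDF vector (one common vector space model) and predicts a total score as the similarity-weighted average of the scores of the most similar training portfolios. The nearest-neighbour regressor is only a stand-in for whatever regression model you end up choosing, and the function names, texts, and scores are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Compute smoothed inverse document frequencies from training texts."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    """Map a text to a unit-length sparse TF-IDF vector (dict term -> weight)."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def predict_score(text, train_vecs, train_scores, idf, k=2):
    """Similarity-weighted average score of the k most similar portfolios."""
    query = tfidf_vector(text, idf)
    sims = [(cosine(query, v), s) for v, s in zip(train_vecs, train_scores)]
    nearest = sorted(sims, reverse=True)[:k]
    weight = sum(sim for sim, _ in nearest)
    if weight == 0:  # no shared vocabulary: fall back to the mean score
        return sum(train_scores) / len(train_scores)
    return sum(sim * s for sim, s in nearest) / weight


# Toy training data: concatenated portfolio text and educator total scores.
texts = [
    "python machine learning research project with detailed reflection",
    "java web application project with team reflection and documentation",
    "photos",
    "misc files uploaded",
]
scores = [850, 800, 200, 250]

idf = fit_idf(texts)
vectors = [tfidf_vector(t, idf) for t in texts]

high = predict_score("python research project with reflection", vectors, scores, idf)
low = predict_score("uploaded photos", vectors, scores, idf)
```

Non-text features such as font or color information could be appended to the same vectors as extra dimensions before fitting a model.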

I shall get back and write a detailed answer soon. I am sorry if all of this sounds too vague right now.

Ganesh Iyer
  • Thanks so much lastlegion! This is extremely helpful. I shall start looking into feature extraction and regression. Yeah, I might leave out the presentation and appearance score, or include the font and spacing information. But since the website mostly manages the appearance, leaving it out would make sense. I will post updates here, and I look forward to reading any more information you can provide! You're really awesome, thanks so much man! – Nearpoint Feb 02 '14 at 14:27
  • Also, I was wondering whether it would be easier to sum the set of scores into one total score and only associate the total score with each portfolio. Instead of giving the machine learning problem a set of scores, only give it one score and the portfolio data for each one... Or would more detailed score information help the analysis? I would think it would, but maybe it would make the problem much more complicated... – Nearpoint Feb 02 '14 at 14:35