
I'm playing around with historical stock market analysis in Perl. One aspect deals with analyzing the accuracy of research firms' past stock ratings. The most rudimentary rating scale would be Buy, Hold, Sell. However, many of these firms use different terminology, some with more than three points on their scale.

What I have is a list of thousands of upgrades/downgrades issued by hundreds of different firms (from Yahoo Finance) that looks something like this:

Action      From      To
==================================================
Upgrade     Add           Buy
Downgrade   Add           Hold
Upgrade     Hold          Add
Downgrade   Buy           Outperform
Upgrade     Hold          Outperform
Downgrade   Hold          Reduce
Upgrade     Add           Outperform

So basically it's a list of comparisons like A > B, D < C, B > C, D < A.
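Each row reduces to one such comparison once downgrades are flipped. A minimal sketch in core Perl, assuming rows are whitespace-separated as in the table above; the sub name `parse_rows` is hypothetical:

```perl
use strict;
use warnings;

# Turn each "Action From To" row into a [lower, higher] pair.
# An Upgrade from X to Y means X < Y; a Downgrade from X to Y means Y < X.
sub parse_rows {
    my @pairs;
    for my $line (@_) {
        my ($action, $from, $to) = split ' ', $line;
        next unless defined $to;    # skip blank or incomplete lines
        if ($action eq 'Upgrade') {
            push @pairs, [$from, $to];
        } elsif ($action eq 'Downgrade') {
            push @pairs, [$to, $from];
        } else {
            die qq(Unknown action "$action"\n);
        }
    }
    return @pairs;
}

my @pairs = parse_rows(
    'Upgrade     Add           Buy',
    'Downgrade   Add           Hold',
    'Upgrade     Hold          Add',
);
print "$_->[0] < $_->[1]\n" for @pairs;
```

This prints `Add < Buy`, `Hold < Add`, `Hold < Add`. The repeated comparison is harmless for ordering purposes.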

What I need for each research firm, given a long list of these comparisons, is an ordered list that looks like this:

A > B > C > D > E

I've given this problem a lot of thought and can't come up with a solution. If each upgrade/downgrade only jumped one increment, I think I could do it, but I can't wrap my head around how to insert a comparison like C < A, where it jumps two increments.

Anybody have any ideas?
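A two-step jump is not actually a problem. If every comparison is stored as a directed edge from the lower rating to the higher one, a topological sort places each rating after everything known to be below it, however many steps apart they are. Below is a rough core-Perl sketch of Kahn's algorithm (a standard topological sort, not necessarily what the answer below uses), assuming the comparisons are already `[lower, higher]` pairs; `topo_sort` is a hypothetical name.

```perl
use strict;
use warnings;

# Kahn's algorithm over [lower, higher] pairs: repeatedly place a rating
# that has no unplaced ratings below it. Returns ratings lowest first and
# dies if the comparisons contain a cycle. Ties are broken alphabetically
# only so the output is deterministic.
sub topo_sort {
    my (%above, %indeg);    # $above{$lo}{$hi}: edge lo -> hi; %indeg: number of lower ratings
    for my $p (@_) {
        my ($lo, $hi) = @$p;
        $indeg{$lo} //= 0;
        $indeg{$hi} //= 0;
        $indeg{$hi}++ unless $above{$lo}{$hi}++;    # ignore duplicate comparisons
    }
    my @ready = sort { $a cmp $b } grep { $indeg{$_} == 0 } keys %indeg;
    my @order;
    while (@ready) {
        my $r = shift @ready;
        push @order, $r;
        for my $hi (keys %{ $above{$r} || {} }) {
            push @ready, $hi if --$indeg{$hi} == 0;
        }
        @ready = sort @ready;
    }
    die "Cycle in comparisons\n" if @order < scalar(keys %indeg);
    return @order;
}

# A > B > C > D > E, plus C < A, which skips a step: it is just one more edge.
my @order = topo_sort(
    ['B', 'A'], ['C', 'B'], ['D', 'C'], ['E', 'D'], ['C', 'A'],
);
print join(' > ', reverse @order), "\n";
```

This prints `A > B > C > D > E`. Breaking ties alphabetically is arbitrary: when two ratings are ready at the same step, the data simply doesn't say which is higher.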




Update:

Thanks @ikegami. I tested with the original data and you are correct.

I also ran some data through Graph::Easy, which renders graphs.

Code:

use strict;
use warnings;
use Graph::Easy;

my $graph = Graph::Easy->new();

# Note that these are all in the 'Upgrade' direction (lower rating -> higher rating)
$graph->add_edge('Hold', 'Add');
$graph->add_edge('Hold', 'Buy');
$graph->add_edge('Hold', 'Outperform');
$graph->add_edge('Buy', 'Outperform');
$graph->add_edge('Reduce', 'Hold');
$graph->add_edge('Add', 'Buy');

print $graph->as_ascii();

Output:

               +------------------------+
               |                        v
+--------+     +------+     +-----+     +-----+     +------------+
| Reduce | --> | Hold | --> | Add | --> | Buy | --> | Outperform |
+--------+     +------+     +-----+     +-----+     +------------+
               |                                    ^
               +------------------------------------+

Here's a graph with some clear ambiguities. Perhaps I can somehow use both modules (or some feature of Graph) to test for ambiguities.

+------------------------------+
|                              v
+--------+     +---------+     +-----+
| Reduce | --> | Neutral | --> | Buy |
+--------+     +---------+     +-----+
                 ^               ^
                 |               |
                 |               |
               +---------+       |
               |  Sell   | ------+
               +---------+
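One possible ambiguity test, sketched in core Perl and independent of both modules: during Kahn's algorithm, the order is forced only if exactly one rating is ready at every step. Whenever two or more are ready together, the data doesn't determine their relative order. The sub name `check_order` is hypothetical, and the edges below are read off the second diagram:

```perl
use strict;
use warnings;

# Kahn's algorithm that also records ties: the ordering is unique only if
# exactly one rating is ready (no unplaced lower rating) at every step.
# Returns (\@order, \@ties); each tie is the set of ratings ready together.
sub check_order {
    my (%above, %indeg);    # each argument is a [lower, higher] pair
    for my $p (@_) {
        my ($lo, $hi) = @$p;
        $indeg{$lo} //= 0;
        $indeg{$hi} //= 0;
        $indeg{$hi}++ unless $above{$lo}{$hi}++;
    }
    my @ready = sort { $a cmp $b } grep { $indeg{$_} == 0 } keys %indeg;
    my (@order, @ties);
    while (@ready) {
        push @ties, [@ready] if @ready > 1;    # several candidates: ambiguous
        my $r = shift @ready;
        push @order, $r;
        for my $hi (keys %{ $above{$r} || {} }) {
            push @ready, $hi if --$indeg{$hi} == 0;
        }
        @ready = sort @ready;
    }
    die "Cycle in comparisons\n" if @order < scalar(keys %indeg);
    return (\@order, \@ties);
}

# Second diagram: Reduce -> Neutral -> Buy, Reduce -> Buy, Sell -> Neutral, Sell -> Buy.
my ($order, $ties) = check_order(
    ['Reduce', 'Neutral'], ['Neutral', 'Buy'], ['Reduce', 'Buy'],
    ['Sell',   'Neutral'], ['Sell',    'Buy'],
);
print join(' < ', @$order), "\n";
print 'Ambiguous: ', join(' vs ', @$_), "\n" for @$ties;
```

For that graph this prints `Reduce < Sell < Neutral < Buy` followed by `Ambiguous: Reduce vs Sell`, while the first diagram's edges produce no ties.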
calyeung
  • Can I assume that `A > B > C > D > E` will exist in each dataset? – Dave Jan 19 '12 at 21:24
  • In your example, it's ambiguous whether Outperform should be higher or lower than Add. – Sean Jan 19 '12 at 21:30
  • @Sean: Sorry, I've added some data to the chart. I was aiming for brevity but now see that it didn't provide all the information needed. – calyeung Jan 19 '12 at 22:22
  • @Dave: Each set could comprise any number of ratings, from a minimum of 3 up to perhaps 5 or 7. – calyeung Jan 19 '12 at 22:25
  • I'm more worried about a set being complete. In other words, are you sure the full hierarchy of upgrading/downgrading for a given data set exists, or could there be holes/inconsistencies in the chain (like @ikegami pointed out)? – Dave Jan 19 '12 at 22:57
  • @Dave: I've got about 14 years of data and thousands of ups/downs for each firm. I can't say with 100% certainty but my guess is that a full hierarchy can be determined. – calyeung Jan 19 '12 at 23:10
  • And are you comparing data from different firms in one go or will each firm's data be handled discretely? – mwp Jan 20 '12 at 00:01
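Since the goal is one ordering per firm, the comparisons can be bucketed by firm before sorting, so one firm's scale never constrains another's. A sketch assuming a hypothetical extra firm column at the start of each row (the Yahoo Finance sample above has none), with made-up firm names:

```perl
use strict;
use warnings;

# Hypothetical row format "Firm Action From To"; the sample data in the
# question has no firm column, so FirmA/FirmB are made-up names.
my @rows = (
    'FirmA  Upgrade    Hold  Buy',
    'FirmA  Downgrade  Hold  Sell',
    'FirmB  Upgrade    Add   Buy',
);

my %pairs_by_firm;    # firm => [ [lower, higher], ... ]
for my $row (@rows) {
    my ($firm, $action, $from, $to) = split ' ', $row;
    my $pair = $action eq 'Upgrade'   ? [$from, $to]
             : $action eq 'Downgrade' ? [$to, $from]
             :                          die qq(Unknown action "$action"\n);
    push @{ $pairs_by_firm{$firm} }, $pair;
}

# Each firm's list can now be ordered on its own (e.g. by a topological sort).
for my $firm (sort keys %pairs_by_firm) {
    my @pairs = @{ $pairs_by_firm{$firm} };
    print "$firm: ", join(', ', map { "$_->[0] < $_->[1]" } @pairs), "\n";
}
```

This prints `FirmA: Hold < Buy, Sell < Hold` and `FirmB: Add < Buy`; each firm's pair list could then be fed to a graph-based sort separately.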

1 Answer


I don't use graphs all that often, but this code (using the Graph module) seems to do the job:

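use strict;
use warnings;
use Graph;

my $graph = Graph->new;    # directed graph; an edge X -> Y means X < Y

while (<DATA>) {
    next unless /\S/;                # skip blank lines
    my ($dir, $x, $y) = split;
    if ($dir eq 'Downgrade') {
        ($x, $y) = ($y, $x);         # a downgrade from X to Y means Y < X
    } elsif ($dir ne 'Upgrade') {
        die qq(Unknown direction "$dir"\n);
    }
    $graph->add_edge($x, $y);
}

# A cycle means the ratings contradict each other; a disconnected graph
# means some ratings cannot be compared with the rest at all.
$graph->is_dag
    or die "Graph has a cycle--unable to analyze\n";
$graph->is_weakly_connected
    or die "Graph is not weakly connected--unable to analyze\n";

print join(' < ', $graph->topological_sort), "\n";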

__DATA__
Upgrade     Add           Buy
Downgrade   Add           Hold
Upgrade     Hold          Add
Downgrade   Buy           Outperform
Upgrade     Hold          Outperform
Downgrade   Hold          Reduce
Upgrade     Add           Outperform

This prints Reduce < Hold < Add < Outperform < Buy.
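As a sanity check on any ordering, whether it comes from `topological_sort` or elsewhere, every input comparison should have its lower rating earlier in the list. A small core-Perl sketch; `violations` is a hypothetical helper, and the pairs are the seven `__DATA__` rows with downgrades flipped:

```perl
use strict;
use warnings;

# Check that an ordering (lowest first) respects every [lower, higher] pair.
# Returns the violated pairs; an empty list means the order is consistent.
sub violations {
    my ($order, @pairs) = @_;
    my %pos;
    @pos{@$order} = 0 .. $#$order;    # rating => position in the order
    return grep {
        my ($lo, $hi) = @{$_};
        !defined($pos{$lo}) || !defined($pos{$hi}) || $pos{$lo} >= $pos{$hi};
    } @pairs;
}

# The seven comparisons from __DATA__, written as [lower, higher].
my @pairs = (
    ['Add', 'Buy'], ['Hold', 'Add'], ['Hold', 'Add'], ['Outperform', 'Buy'],
    ['Hold', 'Outperform'], ['Reduce', 'Hold'], ['Add', 'Outperform'],
);
my @bad = violations([qw(Reduce Hold Add Outperform Buy)], @pairs);
print @bad ? "Inconsistent\n" : "Consistent\n";
```

It prints `Consistent`; swapping Hold and Reduce in the order yields exactly one violation.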

ikegami
Sean
  • Thank you for the introduction to graph theory. This is a very elegant solution. – calyeung Jan 20 '12 at 02:12
  • Note that this doesn't detect ambiguities; it just returns one possible ordering. For example, it gives `Hold < Outperform < Add < Buy` for the original input even though the answer might be `Hold < Add < Outperform < Buy`. – ikegami Jan 20 '12 at 02:15
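One way to list exactly which ratings are left undetermined is to compute which ratings lie below which (the transitive closure) and report pairs with no path in either direction. A sketch using Warshall's algorithm in core Perl; `incomparable_pairs` is a hypothetical name, and since the original input isn't shown here, the sample pairs are an invented case in which Add and Outperform are never related:

```perl
use strict;
use warnings;

# Report pairs of ratings whose relative order the comparisons do not fix.
# Warshall's algorithm builds the transitive closure of "is below"; two
# ratings with no path between them in either direction are incomparable.
sub incomparable_pairs {
    my (%below, %seen);    # $below{$x}{$y} is true when x is known to rank below y
    for my $p (@_) {
        my ($lo, $hi) = @$p;
        $below{$lo}{$hi} = 1;
        $seen{$lo} = $seen{$hi} = 1;
    }
    my @r = sort keys %seen;
    for my $k (@r) {
        for my $i (@r) {
            next unless $below{$i}{$k};
            for my $j (@r) {
                $below{$i}{$j} = 1 if $below{$k}{$j};
            }
        }
    }
    my @unordered;
    for my $i (0 .. $#r) {
        for my $j ($i + 1 .. $#r) {
            my ($x, $y) = @r[$i, $j];
            push @unordered, [$x, $y] unless $below{$x}{$y} || $below{$y}{$x};
        }
    }
    return @unordered;
}

# Hypothetical input: nothing relates Add to Outperform, directly or indirectly.
my @ambiguous = incomparable_pairs(
    ['Hold', 'Add'], ['Hold', 'Outperform'], ['Add', 'Buy'], ['Outperform', 'Buy'],
);
print "Undetermined: $_->[0] vs $_->[1]\n" for @ambiguous;
```

It prints `Undetermined: Add vs Outperform`. The triple loop is cubic in the number of ratings, which is negligible for scales of 3 to 7 points.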