Other Benchmarks
I found an interesting benchmark here on VPS Benchmarks. Note that their graphs don't start the scale at 0, which exaggerates small differences and makes the graphs pretty much useless. The numbers behind the tests seem fine.
Their test compares an AWS t2.small (1 core, 2GB RAM) with a GCE n1-standard-1. The t2 instances aren't a great comparison for the n1-standard-1: they have burstable CPU performance, whereas GCE provides constant CPU. It's the only suitable test I can find, though.
The t2 instances are reputed to run on older AWS hardware (m1 generation), whereas the M3/M4 AWS instances are newer. The GCE test was also done a lot more recently.
Individual Tests
These all refer to the test linked above.
The CPU test is close, within 3%.
File I/O random read isn't close at all: AWS gets 24Mbps and GCE gets 1787Mbps. I know that in AWS your I/O is closely tied to your instance type, with small instances getting a lot less I/O than large instances. Given this huge discrepancy, and the other tests being roughly similar, I would want to see this retested before I trust the numbers. I have read that GCE does do very well for Network I/O. It could also be that the GCE test was done with local SSD and the AWS test with network-attached storage.
The other I/O tests are roughly similar. Sometimes AWS is higher, sometimes GCE is higher, but there's no clear winner.
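To put the random read gap in perspective, here is a minimal Python sketch that turns the two figures quoted above into a ratio. The function name `relative_gap` is my own, and the units are whatever the benchmark reported.

```python
def relative_gap(a: float, b: float) -> float:
    """Return how many times larger the bigger value is than the smaller one."""
    low, high = sorted((a, b))
    return high / low


# Random read figures quoted above, in the units reported by the benchmark.
aws_random_read = 24.0
gce_random_read = 1787.0

ratio = relative_gap(aws_random_read, gce_random_read)
print(f"GCE random read is about {ratio:.0f}x the AWS figure")
```

A gap of that size, next to a CPU result within 3%, looks more like a difference in test setup than a difference between the platforms.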
Memory tests are roughly similar, with AWS edging out Google.
Notes
Any single test on any instance on any provider could come in low for a wide variety of reasons: over-provisioned hardware, a noisy neighbour taking more than their share of resources, and CPU steal are just a few examples.
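CPU steal is one of the few causes you can check from inside a Linux guest. The sketch below parses the aggregate `cpu` line from `/proc/stat`, where the eighth counter is steal time. The function name and the sample line are my own, the sample numbers are made up, and the counters are cumulative since boot, so for a benchmark you would take a snapshot before and after the run and compare the difference.

```python
def steal_fraction(cpu_line: str) -> float:
    """Return the share of CPU time reported as steal in a /proc/stat 'cpu' line."""
    # Fields after the label: user nice system idle iowait irq softirq steal ...
    fields = [int(value) for value in cpu_line.split()[1:]]
    if len(fields) < 8:
        raise ValueError("line has no steal field")
    # Guest time is already counted inside user time, so sum only the first eight.
    total = sum(fields[:8])
    return fields[7] / total if total else 0.0


# Made-up sample line for illustration; on a real Linux VM, read the first
# line of /proc/stat instead.
sample = "cpu  100 0 50 800 20 0 5 25 0 0"
print(f"steal: {steal_fraction(sample):.1%}")
```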
A good benchmark would use a variety of tests (CPU, I/O, memory, etc.) and would be run on at least three separate virtual machines.
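As a rough illustration of why several machines matter, this sketch summarises one metric measured on separate VMs of the same type. The `summarise` helper and the sample numbers are hypothetical, not taken from the linked benchmark.

```python
import statistics


def summarise(samples: list[float]) -> dict[str, float]:
    """Summarise repeated measurements of one metric from separate VMs."""
    if len(samples) < 3:
        raise ValueError("need results from at least three separate VMs")
    median = statistics.median(samples)
    return {
        "median": median,
        "min": min(samples),
        "max": max(samples),
        # Range relative to the median: a quick signal of how noisy the runs were.
        "relative_spread": (max(samples) - min(samples)) / median,
    }


# Made-up results for one test on three VMs, for illustration only.
example = [100.0, 110.0, 90.0]
summary = summarise(example)
print(summary)
```

If the relative spread on one provider is larger than the gap between providers, the comparison doesn't tell you much.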
Conclusion
AWS and GCE seem to perform roughly similarly on this reasonably well-documented test, even though the instance types and hardware are quite different.
I would like to see @StanHou do significantly more robust, well-documented tests to compare performance, rather than rely on what could be a single test on a single instance.