0

This may not be the most technical question, but I was just interested, nonetheless...

How does a giant company like Google keep its code from being stolen by employees? Maybe I'm wrong, but I would assume that the source code for their search algorithms (amongst other things) would be valuable to their competitors (e.g. Microsoft).

I guess I can best phrase it like this:

What's keeping an unscrupulous employee who has sufficient clearance from accessing Google's code repository for a specific project and copying significant amounts of code to a flash drive and taking it to their competitors?

skaffman
ServAce85

6 Answers

4

Fear of being sued?

Things within a company like Google are also compartmentalized, so not everybody has access to all of the code. If someone does have access, you can bet that Google knows when they use it. I'm sure they have some kind of monitoring that notices when somebody downloads a lot of files very quickly. The search algorithm obviously isn't one small file; it is a gigantic application, so copying it would mean pulling a large number of files.

All this would allow them to track who has stolen the code from within. There is also the fact that any self-respecting company, or any company with something to lose (e.g. Microsoft), would not accept anything like this from somebody. They would probably even tell Google about it.
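To make the "downloads a lot of files very fast" idea concrete, here is a minimal sketch of a sliding-window check. It is only an illustration, not how Google does it: the class name `BulkAccessDetector`, the thresholds, and the idea of feeding it one event per file read are all assumptions.

```python
from collections import defaultdict, deque


class BulkAccessDetector:
    """Flags a user who reads more than max_files files within window_seconds.

    Hypothetical helper for illustration; the thresholds are made up.
    """

    def __init__(self, max_files=100, window_seconds=60.0):
        self.max_files = max_files
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user -> timestamps of recent reads

    def record(self, user, timestamp):
        """Record one file read; return True if the user is over the threshold."""
        events = self._events[user]
        events.append(timestamp)
        # Drop reads that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_files
```

Each file read is reported with `detector.record(user, time_in_seconds)`. A `True` result would only be a signal for someone to look closer, since a legitimate full checkout trips the same check.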

zsalzbank
  • 1
    Yes, I agree that this is the main motivator for most people. However, I kind of feel like there's always someone out there willing to risk it all for a big payday. – ServAce85 Dec 16 '10 at 02:19
  • 2
    And who would want to hire someone who has a reputation of leaking source? – a sandwhich Dec 16 '10 at 02:23
  • A big payday means a lot of attention, and a lot of attention often means a lawsuit. This is true *whether or not* you've actually violated the law. If you've committed massive copyright infringement, you're going to get sued and you're almost certainly going to lose. I also agree that many competitors would notify Google right away. – Matthew Flaschen Dec 16 '10 at 02:29
  • I agree with the last section, at least. I've actually heard that a company I worked with received a call from a competitor about stolen code. Fortune 500-class companies. – darron Dec 16 '10 at 02:36
  • "Things within a company like Google are also compartmentalized." Yes, but I heard it's only into two compartments, top secret code which is considered highly valuable IP, and most of their code, which is visible to all Google developers. (Three compartments, if you include their open source code published to the world.) – Robin Green Apr 28 '13 at 08:33
1

It is called protocol: the idea that only a few people get to know the code. Each of those few then has to tell the others a major, very embarrassing secret, so nobody can leak anything without being outed in public. The secret could be as mild as something they like but are bashful about, or as serious as having killed somebody.

0

Many employers, including one that I've worked for, completely block flash drives.

In many cases, though, this is to protect non-technical confidential information.

SLaks
0

Companies that are serious about protecting their assets will have access logging on their core systems and active scanning to detect suspicious patterns. Similar security is implemented for employees of government agencies (e.g. tax, social security) holding sensitive personal information. Users who access data outside of their assigned cases can be flagged and investigated.

I suspect (but don't know) that similar scanning could be implemented in high value source code repositories.
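As a rough sketch of the "access outside assigned cases" check applied to an access log, something like the following would do. The function name, the `(user, resource)` log format, and the assignment table are all assumptions for illustration; a real system would read these from its own audit logs.

```python
def find_unassigned_accesses(access_log, assignments):
    """Return log entries where a user touched a resource not assigned to them.

    access_log: iterable of (user, resource) tuples (hypothetical format).
    assignments: dict mapping user -> set of resources assigned to that user.
    """
    flagged = []
    for user, resource in access_log:
        # Users with no assignments at all are treated as having an empty set.
        if resource not in assignments.get(user, set()):
            flagged.append((user, resource))
    return flagged
```

The flagged entries are candidates for investigation, not proof of wrongdoing; the same comparison could be run against repository paths instead of case numbers.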

Some organizations block the use of removable media (it has been reported that some agencies have reacted to Wikileaks with such policies), in some cases by physically gluing up the USB/media ports. This restricts potential thieves to network transfers, which can be scanned.

Jonathan Day
0

I think codethis hit the nail on the head. Some fly-by-night operation may be interested, but Microsoft, Yahoo, etc. wouldn't touch stolen code with a ten-foot pole, and the fly-by-night wouldn't have the infrastructure to run it. Even if you didn't tell anybody it was stolen, it's not like you could get away with walking into a company with an entire spider/searching algorithm on your thumb drive and declaring that you wrote it last week.

The bigger threat is details of the search algorithm getting out. SEOers, as a whole, are rather shady - and many would kill for solid facts about how the algorithm ranked or downranked pages. Even then, Google has demonstrated the ability to change their ranking algorithms so quickly that it wouldn't much matter.

On the other hand, Google doesn't have that much super-secret code. Most of their cool stuff (MapReduce et al.) is publicly available (see Hadoop). This question is probably more applicable to a company like Adobe. Some of their Photoshop algorithms are really cool, and it would probably hurt them if those got out, but again, no legit company would touch them.

Robert
  • There are *papers* on MapReduce, but the code is *not* publicly released. The same is true for most of Google's infrastructure, including Google File System, Big Table, etc. There isn't even much documentation on the search algorithms. – Matthew Flaschen Dec 16 '10 at 02:33
  • I heard security was super tight at Adobe before CS5 because the algorithm used for the delete and refill function in Photoshop was partially developed by some college students. – a sandwhich Dec 16 '10 at 02:34
  • @Matthew - I couldn't find one way or the other if Google's *exact* code was released, but per Wiki: "MapReduce libraries have been written in C++, C#, Erlang, Java, Ocaml, Python, Ruby, F#, R and other programming languages." – Robert Dec 16 '10 at 02:38
  • That article should be more clear (I've added a link to Hadoop). Google's code is not released, period. There are available libraries, most notably [Hadoop](http://hadoop.apache.org/), based on the same general algorithm. – Matthew Flaschen Dec 16 '10 at 02:44
  • Apologies. When I said 'released', I took it as given that it could mean 'sufficiently described so as to be recreated'. All sorts of software is like this. I've edited my description to use the word 'available' - not from Google, but an essentially-conforming implementation in, as you noted, Hadoop. – Robert Dec 16 '10 at 02:47
  • Google does release a lot of free and open source code (Chrome, Android, Google Web Toolkit, Closure, Native Client, Tesseract, and more). It's just worth being clear that they don't release most of their server and infrastructure code. – Matthew Flaschen Dec 16 '10 at 02:54
0

I think companies such as Google will implement access control on their source code repository / version control system, so an employee would only be able to access the source code of projects they are involved in. Their access to a previous repository could also be revoked when they are assigned to a different project. It's the same thing with normal internal documents: would a security-conscious company let any employee download documents freely?
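A toy sketch of that policy, assuming one project per employee and revocation on reassignment (the class and method names are made up for illustration; real version control systems have their own permission models):

```python
class RepoAccessControl:
    """Grants read access only to the project a user is currently assigned to."""

    def __init__(self):
        self._project_of = {}  # user -> current project (hypothetical model)

    def assign(self, user, project):
        """Assign user to project; this replaces, and so revokes, any earlier assignment."""
        self._project_of[user] = project

    def can_read(self, user, repo_project):
        """Return True only if the repository belongs to the user's current project."""
        return self._project_of.get(user) == repo_project
```

For example, after `acl.assign("dev", "search")` followed by `acl.assign("dev", "maps")`, the call `acl.can_read("dev", "search")` returns `False`, because the second assignment revoked the first.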

YudhiWidyatama
  • I agree. I guess I was just more interested in the actual methodology for stopping someone from leaking the sensitive data (assuming the person is credentialed). – ServAce85 Dec 16 '10 at 02:37