
Hi all, I'm curious about mining API usage patterns, and I've run into a series of questions. I hope to get some help from you.

Based on the papers I've read, such a program should analyze the code of many projects to extract program information and build an index, much like a search engine does.

The question is: how do I write a program to analyze project code hosted on GitHub? Should I write a script to download all the projects of interest and then analyze them locally, or use certain APIs to fetch them?
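For the "use certain APIs" option, here is a minimal sketch, assuming GitHub's public REST v3 "git trees" endpoint (`/repos/{owner}/{repo}/git/trees/{sha}?recursive=1`), which lists every file in a repository in one response. The URL construction and JSON filtering are shown; the actual HTTP call is left out so the sketch stays self-contained, and the sample response below is fabricated to match the endpoint's documented shape.

```python
import json

def tree_url(owner, repo, sha="HEAD"):
    """Build the API URL that lists every file in a repository tree."""
    return (f"https://api.github.com/repos/{owner}/{repo}"
            f"/git/trees/{sha}?recursive=1")

def java_files(tree_response):
    """Pick the paths of Java source files out of a tree API response."""
    return [entry["path"]
            for entry in tree_response.get("tree", [])
            if entry["type"] == "blob" and entry["path"].endswith(".java")]

# Fabricated response with the shape the endpoint returns:
sample = json.loads("""{
  "tree": [
    {"path": "src/Main.java", "type": "blob"},
    {"path": "README.md",     "type": "blob"},
    {"path": "src",           "type": "tree"}
  ]
}""")

print(tree_url("octocat", "Hello-World"))
print(java_files(sample))
```

From there you could fetch individual blobs on demand instead of cloning whole repositories, which matters when the corpus is large. Note that unauthenticated requests to the GitHub API are heavily rate-limited, so any real crawler would need an access token.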

Thanks a lot.

Chuck
  • That's kind of a big project. Simply detecting an API or an API call in *one* language means you have to parse the language; knowing something about the API probably requires name and type resolution. These are enormous projects in their own right, before you even attempt to read somebody's giant database of examples in dozens of languages and analyze them. Why do you think this is practical for you to do? – Ira Baxter Mar 10 '16 at 12:26
  • Curiosity killed the cat. I'm not sure it's practical for me, but I have a plan. First, I should collect source code from many projects in a certain language, say Java. I think analysis tools for this already exist, so I don't need to build one myself. Second, I extract program information such as class names, function names, and their positions. Finally, I can use an inverted index or some other approach to support search. It would probably be basic and slow at first, but it could be improved continuously later. I'm stuck on the first step right now. – Chuck Mar 11 '16 at 08:36
  • "Mining API usage patterns" is much more than "searching for function names", in my mind. Interactive searching isn't really a tool for building what I consider to be (semi-)automated analysis tools; all the work ends up being done by you. If *all* you want to do is search for identifiers, you can use a lexer to tear the source files apart into identifiers and then index all the positions. Doing that in an organized way for multiple languages is actually more difficult than you'd expect. (See "Source Code Search Engine" via my bio, which does exactly this.) – Ira Baxter Mar 11 '16 at 08:43
  • ... to do this with GitHub, you need to treat GitHub as a kind of file system from which you can fetch files of code text. GitHub obviously has APIs for you to fetch members. Whether you fetch and index them one at a time, or fetch them all in a giant batch to a local file system before processing is just a design choice. (You probably want to think about performance of the options; there's an awful lot of code in GitHub and processing is likely to take a long time no matter what you do; I personally think the extra copy to the local file system would be pointlessly expensive). – Ira Baxter Mar 11 '16 at 08:46
  • All right. Since many of the papers I've read directly used the results of a search engine, I figured a source code search engine might be the first step toward understanding API-mining work. I think you're right: I should treat GitHub as a kind of file system, if that's possible. Thanks a lot. – Chuck Mar 11 '16 at 09:04
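The lexer-plus-inverted-index idea discussed above can be sketched roughly as follows. This is an assumption-laden toy: a regex stands in for a real lexer (so it treats keywords as identifiers too, with no name or type resolution), and the file names are hypothetical.

```python
import re
from collections import defaultdict

# Crude stand-in for a lexer: any word-like token counts as an identifier.
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")

def index_source(files):
    """Build an inverted index: identifier -> list of (file, line, column).

    `files` maps a file name to its source text. A real tool would use a
    proper lexer per language; this regex also matches keywords.
    """
    index = defaultdict(list)
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for m in IDENT.finditer(line):
                index[m.group()].append((name, lineno, m.start()))
    return index

# Usage: index two tiny (made-up) Java snippets, then look up an identifier.
files = {
    "A.java": "class A { void run() { helper(); } }",
    "B.java": "class B { void helper() {} }",
}
idx = index_source(files)
print(idx["helper"])
```

A lookup in `idx` then answers "where does this identifier appear?" across the whole corpus in one dictionary access, which is the basic shape of the search-engine step; the hard parts Ira mentions (per-language lexing, name/type resolution) sit on top of this.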

0 Answers