1

I am working with the Unified Medical Language System (UMLS) database within a local copy of MySQL. I am currently attempting to understand the data structures via this webpage: http://www.ncbi.nlm.nih.gov/books/NBK9685/

I am attempting to identify concepts in CUI2. However, I am confused by this result:

SELECT y.* FROM mrrel r INNER JOIN mrconso y ON r.cui2 = y.CUI;

Exactly 1 result. Rather than wildly speculate—is it my sql? is it an error on the tables? did I misunderstand the schema?—I would really appreciate some feedback from the UMLS community. How do I correctly link MRREL.cui2 to other tables within the database? Is there a COMPLETE data model published somewhere?

Thanks!

Brian Dolan
  • 3,086
  • 2
  • 24
  • 35
  • For posterity, I have found a clue. The MRREL, MRSTY and MRSCONSO tables all had leading "\n" characters in the CUI fields. In MRREL only in CUI1, not CUI2. I will comment again if that solves the problem. I fixed that by using http://stackoverflow.com/questions/1504962/how-to-remove-new-line-characters-from-data-rows-in-mysql – Brian Dolan Jun 27 '15 at 21:06

1 Answers1

2

Brian,

Not knowing anything about how you created your subset, I modified your query a bit and ran it against a full Metathesaurus subset from the 2015AA release. SELECT count(DISTINCT y.cui) FROM mrrel r INNER JOIN mrconso y ON r.cui2 = y.CUI ; I got back 3074046 unique CUIs. Perhaps you can send me your mmsys.log file to make sure that your subset completed properly? Also, you can several ER diagrams and sample queries on our website that could help in understanding the RRF data model. Also, how many total rows are in your MRREL table?

user3653270
  • 166
  • 1
  • 8
  • I am running on 2015AA. I have 27,612,106 in mrrel. This query still brings nothing, but I'm making sure I have no more leading/trailing characters. – Brian Dolan Jun 27 '15 at 21:38
  • Another clue, the MRCONSO table has all but one CUIs as bit_length 56, where all but one CUI2 of MRSTY is > 56. select count(*) from mrrel where bit_length(cui2) > 56; Playing with that, only one of MRCONSO.CUI should match MRSTY.CUI2. I'm afraid I don't understand the data model. – Brian Dolan Jun 27 '15 at 21:49
  • With the help of user3653270 I did a complete reinstall. The newlines and bit length problems were corrected. – Brian Dolan Jun 28 '15 at 15:21