
I have this dataframe in R. It has the structure of a pedigree dataframe, with the id, fid, mid and sex columns.

pedigree <- structure(list(id = c(212, 214, 263, 266, 273, 274, 275, 279, 
280, 281, 286, 287, 312, 313, 314, 315, 316, 317, 318, 319, 320, 
321, 322, 323, 324, 325, 326, 327, 332, 333, 334, 335, 336, 337, 
338, 339, 340, 341, 346, 347, 348, 349, 389, 390, 391, 392, 413, 
414, 415, 416, 466, 475, 476, 477, 478, 479, 480, 483, 486, 487, 
491, 492, 493, 494, 498, 501, 502, 506, 507, 508, 509, 510, 511, 
512, 513, 514, 518, 519, 542, 543, 544, 545, 546, 547, 551, 552, 
553, 554, 555, 556, 564, 565, 568, 569, 570, 575, 576, 579, 580, 
584, 585, 586, 589, 590, 593, 595, 596, 597, 598, 599, 614, 615, 
616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 653, 654, 662, 
663, 671, 672, 673, 674, 675, 676, 681, 682, 683, 684, 688, 689, 
693, 694, 695, 696, 697, 698, 701, 702, 703, 704, 709, 710, 715, 
716, 718, 720, 721, 722, 723, 724, 725, 726, 727, 730, 731, 736, 
737, 738, 739, 740, 744, 745, 842, 843, 874, 875, 884, 885, 886, 
887, 889, 890, 894, 895, 896, 897, 898, 903, 905, 906, 907, 908, 
909, 910, 911, 912, 913, 914, 915, 917, 925, 926, 927, 928, 929, 
931, 932, 936, 965, 999, 1000, 1006, 1007, 1041, 1043, 1044, 
1046, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1099, 1100, 
1101, 1321, 1322, 1368, 1551, 1552, 1553, 1554, 1555), fid = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 326, 326, 326, 326, 279, 320, 320, 320, 320, 320, 320, 
320, 320, 320, 324, 324, 324, 324, 322, 322, 322, 324, 324, 324, 
324, 324, 324, 324, 324, 324, 318, 318, 326, 326, 326, 326, 326, 
326, 326, 326, 326, 326, 326, 326, 332, 332, 287, 287, 287, 287, 
287, 286, 286, 346, 346, 346, 348, 348, 348, 326, 326, 326, 326, 
326, 332, 332, 320, 320, 320, 320, 320, 287, 346, 346, 346, 346, 
273, 273, 273, 273, 266, 334, 334, 334, 334, 334, 336, 336, 336, 
336, 336, 336, 334, 334, 334, 334, 334, 334, 338, 338, 338, 338, 
340, 340, 340, 338, 338, 334, 334, 334, 334, 334, 334, 334, 334, 
314, 314, 314, 314, 314, 314, 314, 312, 312, 0, 0, 286, 286, 
314, 314, 314, 314, 314, 314, 334, 334, 334, 334, 334, 389, 389, 
389, 389, 389, 389, 389, 389, 389, 389, 389, 389, 338, 332, 332, 
332, 332, 332, 332, 332, 346, 274, 391, 391, 391, 391, 0, 0, 
0, 0, 316, 316, 316, 316, 316, 316, 316, 316, 842, 842, 842, 
1041, 1041, 1041, 1043, 1043, 1043, 1043, 1043), mid = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 327, 327, 327, 327, 275, 321, 321, 321, 321, 321, 321, 
321, 321, 321, 325, 325, 325, 325, 323, 323, 323, 325, 325, 325, 
325, 325, 325, 325, 325, 325, 319, 319, 327, 327, 327, 327, 327, 
327, 327, 327, 327, 327, 327, 327, 333, 333, 212, 212, 212, 212, 
212, 214, 214, 347, 347, 347, 349, 349, 349, 327, 327, 327, 327, 
327, 333, 333, 321, 321, 321, 321, 321, 212, 347, 347, 347, 347, 
281, 281, 281, 281, 263, 335, 335, 335, 335, 335, 337, 337, 337, 
337, 337, 337, 335, 335, 335, 335, 335, 335, 339, 339, 339, 339, 
341, 341, 341, 339, 339, 335, 335, 335, 335, 335, 335, 335, 335, 
315, 315, 315, 315, 315, 315, 315, 313, 313, 0, 0, 214, 214, 
315, 315, 315, 315, 315, 315, 335, 335, 335, 335, 335, 390, 390, 
390, 390, 390, 390, 390, 390, 390, 390, 390, 390, 339, 333, 333, 
333, 333, 333, 333, 333, 347, 280, 392, 392, 392, 392, 0, 0, 
0, 0, 317, 317, 317, 317, 317, 317, 317, 317, 843, 843, 843, 
1044, 1044, 1044, 1046, 1046, 1046, 1046, 1046), sex = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), levels = c("1", "2"), class = "factor")), row.names = c(NA, 
-234L), class = c("tbl_df", "tbl", "data.frame"))

This is the structure, where there are 234 individuals:

str(pedigree)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   234 obs. of  4 variables:
 $ id : num  212 214 263 266 273 274 275 279 280 281 ...
 $ fid: num  0 0 0 0 0 0 0 0 0 0 ...
 $ mid: num  0 0 0 0 0 0 0 0 0 0 ...
 $ sex: Factor w/ 2 levels "1","2": 1 1 1 2 2 2 1 2 1 1 ...

I am trying to do a pedigree analysis by using pedtools.

To convert this dataframe into a ped object, I call as.ped(pedigree).

However, I get this malformed pedigree error:

as.ped(pedigree)
Error: Malformed pedigree.
 Individual 287 is female, but appear as the father of 568
 Individual 212 is male, but appear as the mother of 568

I checked the ids 568, 287 and 212, but everything is properly assigned. This means that 287 is the mother of 568 (it is included in fid) and similarly with 212, who is the father of 568 (and is included in mid).

As a convention, 1 refers to males and 2 to females.

What might be happening?

antecessor

2 Answers


I checked the ids 568, 287 and 212, but everything is properly assigned. This means that 287 is the mother of 568 (it is included in fid) and similarly with 287.

Looking at your dataset, the record for 568 states

# A tibble: 1 x 4
     id   fid   mid sex  
  <dbl> <dbl> <dbl> <fct>
1   568   287   212 1 

287 is in the fid column, not the mid column as you state. There is an error somewhere in the data: either fid and mid have been switched here, or the sex values of 287 and 212 have been swapped.

Edit: On further inspection, several records indicate 287 as the father and 212 as the mother, specifically:

# A tibble: 6 x 4
     id   fid   mid sex  
  <dbl> <dbl> <dbl> <fct>
1   568   287   212 1    
2   569   287   212 1    
3   570   287   212 2    
4   575   287   212 1    
5   576   287   212 2    
6   621   287   212 2   

This may indicate that the sex values for 287 and 212 are incorrect (rather than fid and mid being swapped across several records), but you will need to examine your data source (or processing pipeline) to confirm.
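One way to tell the two explanations apart is to tabulate the sex code of every individual that appears in fid and in mid. The sketch below is only an illustration: `ped_small` is a hypothetical five-row subset whose rows are copied from the question's data, and the variable names are made up for this example. It uses base R only.

```r
# Hypothetical toy subset of the question's `pedigree` (rows copied from the dput output).
ped_small <- data.frame(
  id  = c(212, 287, 568, 569, 570),
  fid = c(0, 0, 287, 287, 287),
  mid = c(0, 0, 212, 212, 212),
  sex = factor(c(1, 2, 1, 1, 2), levels = c("1", "2"))
)

# Distinct individuals used as a father (fid) or as a mother (mid); 0 means "unknown parent".
father_ids <- setdiff(unique(ped_small$fid), 0)
mother_ids <- setdiff(unique(ped_small$mid), 0)

# Look up the recorded sex of each of those individuals.
father_sex <- ped_small$sex[match(father_ids, ped_small$id)]
mother_sex <- ped_small$sex[match(mother_ids, ped_small$id)]

# Counts of sex codes per parental role.
print(table(father_sex))
print(table(mother_sex))
```

Running the same lines against the full `pedigree` shows how widespread the mismatch is. If every individual in fid turns out to be coded 2 and every individual in mid is coded 1, a systematic swap of the two columns (or of the sex coding) is a more likely explanation than two miscoded individuals.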

KSkoczek
  • I know it's been a while, but I realized I had a mistake in the question above ("I checked the ids 568, 287 and 212, but everything is properly assigned. This means that 287 is the mother of 568 (it is included in fid) and similarly with 212, who is the father of 568 (and is included in mid)."). Do you have any idea? – antecessor Apr 03 '23 at 12:21
  • What exactly was the mistake? The issue is stemming from the as.ped() function expecting a father id to correspond to a male record, and a mother id to correspond to a female record. It appears either the sex values for 287 and 212 are incorrect, or they have been listed incorrectly in 'mid' and 'fid' columns for other records. The problem is in the data, not the function, so without knowing the individuals to which the data corresponds, we can't help any further here. – KSkoczek Apr 04 '23 at 13:15
  • The data of the individuals are in the very first object known as `pedigree`. If you see the structure, `str(pedigree)`, you see what I updated in the question. It is composed of 234 individuals where their fathers, mothers, and sex are known – antecessor Apr 04 '23 at 15:26
  • 'mid' should contain the id of the mother, and 'fid' should contain the id of the father, according to the documentation. Type ?as.ped into the command line to see this. Swapping the names of 'mid' and 'fid' columns does appear to solve the error (see Laura's answer). BUT you will need to make sure this is correct for the individuals your data refers to. If 287 is truly the mother of 568, then as.ped() expects the 'mid' value for id 568 to be 287. The record for id 568 shows a 'mid' value of 212. You can view this with subset(pedigree, id==568) – KSkoczek Apr 04 '23 at 15:52

The problem is that males (sex 1) appear in the mid (mother) column and females (sex 2) appear in the fid (father) column. The error message only lists the first case it evaluates.

You can rename using colnames and then run the code:

library(pedtools)
# Relabel columns 2 and 3 so that "fid" names the fathers and "mid" the mothers
colnames(pedigree) <- c("id", "mid", "fid", "sex")
as.ped(pedigree)

You can also change the column names in the data frame directly.
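A slightly more defensive variant swaps the two labels by name rather than by position, so it does not depend on the column order. This is a minimal sketch under assumptions: `ped_small` and `swapped` are hypothetical names, the five rows are copied from the question's data, and the final `as.ped()` call only runs if pedtools is installed.

```r
# Hypothetical toy subset of the question's `pedigree` (rows copied from the dput output).
ped_small <- data.frame(
  id  = c(212, 287, 568, 569, 570),
  fid = c(0, 0, 287, 287, 287),
  mid = c(0, 0, 212, 212, 212),
  sex = factor(c(1, 2, 1, 1, 2), levels = c("1", "2"))
)

# Work on a copy so the original data frame stays untouched.
swapped <- ped_small

# Find the positions of "fid" and "mid" and exchange their labels.
pos <- match(c("fid", "mid"), names(swapped))
names(swapped)[pos] <- c("mid", "fid")

# After the swap, 212 (male) is the father and 287 (female) is the mother of 568.
print(swapped[swapped$id == 568, ])

# Optional: build the ped object if pedtools is available.
if (requireNamespace("pedtools", quietly = TRUE)) {
  print(pedtools::as.ped(swapped))
}
```

As noted in the comments on the other answer, this only makes sense if the columns really were mislabeled in the source data. If the sex codes are wrong instead, the sex column is what needs correcting.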

Laura