2

been wracking my head trying to replicate a solution to a sql exercise on pandas. the exercise contains creating 1 percentile bins using the NTILE function in order to calculate some metrics.

source Dset looks like this and the percentile i want to divide by is the measure_value column :

[source df]

1

this is what i want to get to (first create the "Ntile" equivalent and then group by it, agging with min max and count).

[destination df]

2

tried using qcut but encountered an error because of similar values.

code attempted :

cuts = 100
weightdf = healthdf[healthdf["measure"] == "weight"].copy()
weightdf["Ntile"] = pd.qcut(weightdf["measure_value"].sort_values(), cuts , labels = range(1, 
cuts + 1))

this created a " Bin edges must be unique" error, changing the duplicates argument didnt work. hopefully someone can help, i'm certain there is a way to solve it. thanks!

edit target column output in response to david kaftan's request

{0: 46.03959,
 7: 129.060012817,
 84: 114.758776,
 97: 39.916096,
 132: 114.8494944,
 133: 69.39963261,
 139: 73.93555631,
 143: 73.93555631,
 148: 75.069537235,
 149: 68.0388,
 177: 74.389088,
 181: 34.926582,
 188: 115.1670088,
 189: 75.29633342,
 202: 115.9381152,
 212: 34.0194,
 214: 89.819007874,
 216: 91.295005798,
 218: 65.317248,
 220: 80.28584949,
 222: 123.1,
 223: 123.6,
 224: 123.0,
 225: 122.7,
 229: 122.7,
 230: 121.0,
 231: 121.7,
 243: 64.9543744,
 244: 65.0450928,
 257: 38.55532,
 261: 64.6822192,
 265: 79.151868565,
 267: 79.151868565,
 271: 93.0,
 272: 91.4,
 279: 63.50288,
 282: 128.820128,
 301: 36.28736,
 304: 63.049288,
 346: 65.77084,
 369: 119.8390064,
 388: 64.863656,
 396: 78.698276195,
 400: 79.151868565,
 401: 79.151868565,
 415: 78.698276195,
 416: 78.698276195,
 419: 78.92507238,
 420: 78.47148001,
 424: 78.698276195,
 463: 63.50288,
 478: 88.7225952,
 491: 116.6638624,
 499: 119.9,
 501: 115.0309312,
 505: 102.96547699,
 512: 65.1358112,
 518: 78.698276195,
 519: 78.698276195,
 520: 78.698280334,
 532: 70.6696336,
 534: 69.39963261,
 540: 75.8,
 558: 63.50288,
 574: 94.0,
 590: 79.151868565,
 610: 94.5,
 614: 63.50288,
 631: 57.6,
 659: 63.50288,
 674: 38.101727,
 676: 108.86208,
 717: 70.0,
 756: 125.917145,
 780: 118.6596672,
 865: 79.83225712,
 872: 80.73944186,
 925: 42.63765,
 946: 68.12952,
 1007: 68.0388,
 1066: 74.389088,
 1067: 74.389088,
 1068: 74.389088,
 1069: 74.389088,
 1070: 74.389088,
 1131: 69.5,
 1164: 92.532768,
 1168: 78.698212,
 1172: 64.410064,
 1255: 79.832192,
 1267: 65.77084,
 1271: 65.317248,
 1272: 65.77084,
 1273: 65.77084,
 1282: 65.77084,
 1291: 65.317248,
 1300: 66.224432,
 1314: 121.750007629,
 1315: 119.1,
 1318: 122.900009155,
 1385: 69.5,
 1387: 110.5,
 1496: 40.36969,
 1536: 119.1,
 1541: 122.750007629,
 1543: 122.850006104,
 1642: 91.1,
 1643: 90.4,
 1671: 57.6,
 1699: 74.615884,
 1700: 75.069476,
 1779: 66.678024,
 1784: 52.7527496,
 1789: 105.46014,
 1790: 68.9913432,
 1791: 129.071,
 1792: 128.935012817,
 1793: 68.49244787,
 1794: 68.49244787,
 1799: 65.77084,
 1852: 68.8099064,
 1905: 59.42055,
 1981: 64.863656,
 1995: 92.079176,
 1998: 74.0262144,
 1999: 73.3458264,
 2000: 73.481904,
 2018: 31.3885664,
 2019: 31.0256928,
 2022: 65.317248,
 2027: 136.036010742,
 2030: 65.77084,
 2040: 84.368112,
 2108: 42.184055,
 2139: 38.1,
 2287: 124.100006104,
 2293: 123.350006104,
 2297: 119.1,
 2298: 119.1,
 2341: 119.1,
 2356: 97.965065785,
 2358: 98.127905446,
 2438: 85.27536556,
 2494: 69.12742,
 2496: 68.40167,
 2497: 67.67593,
 2540: 96.61517481,
 2542: 96.433737862,
 2546: 96.705893284,
 2599: 173.725736,
 2607: 75.0,
 2654: 80.0,
 2696: 53.0,
 2700: 69.2181392,
 2703: 68.5831104,
 2873: 70.30676,
 2879: 73.481904,
 2881: 31.3432072,
 2882: 65.77084,
 2886: 66.224432,
 2892: 65.77084,
 2914: 38.101728,
 2930: 134.263232,
 2947: 72.0,
 2951: 99.79024,
 2981: 70.0,
 2986: 109.769264,
 2987: 29.029888,
 2994: 15.422128,
 2995: 27.21552,
 3005: 29.48348,
 3006: 58.96696,
 3010: 32.658623,
 3032: 101.198958525,
 3037: 136.531192,
 3039: 32.205032,
 3050: 25.854744,
 3064: 85.0,
 3065: 84.0,
 3069: 96.070863966,
 3071: 96.433737862,
 3072: 95.889427018,
 3076: 95.70799007,
 3079: 97.159485654,
 3087: 100.243832,
 3088: 82.100152,
 3093: 81.64656,
 3097: 102.798948211,
 3107: 20.865232,
 3109: 93.439952,
 3110: 78.4,
 3111: 102.421074,
 3149: 128.504013062,
 3150: 129.27382,
 3151: 128.50272,
 3152: 130.40192,
 3153: 130.402008057,
 3154: 129.288009644,
 3164: 102.398950789,
 3172: 97.0,
 3179: 109.4063904,
 3185: 83.91452,
 3214: 68.492392,
 3216: 108.0909736,
 3221: 66.224432,
 3226: 100.243832,
 3257: 42.184055,
 3286: 75.6137864,
 3310: 45.812794,
 3313: 44.452015,
 3319: 40.82328,
 3321: 97.0,
 3326: 97.25,
 3330: 41.27687,
 3332: 42.63765,
 3376: 44.90561,
 3397: 68.49244787,
 3398: 68.49244787,
 3406: 68.49244787,
 3408: 68.49244787,
 3409: 68.49244787,
 3411: 68.49244787,
 3413: 68.49244787,
 3415: 122.4,
 3479: 112.490816,
 3515: 34.926582,
 3527: 76.430314345,
 3539: 119.0,
 3541: 118.0,
 3547: 76.20351816,
 3556: 64.2286272,
 3573: 68.49244787,
 3579: 130.38014,
 3580: 131.43384,
 3581: 129.844,
 3586: 128.302001953,
 3588: 131.434005737,
 3589: 130.542007446,
 3591: 130.380004883,
 3594: 129.762008667,
 3595: 130.54207,
 3597: 129.86485,
 3598: 130.08621,
 3599: 130.086013794,
 3600: 130.539001465,
 3601: 129.865005493,
 3604: 130.5389,
 3605: 131.562011719,
 3606: 131.42204,
 3607: 133.11394,
 3608: 133.114013672,
 3613: 130.89587,
 3614: 131.154006958,
 3616: 134.708007812,
 3617: 134.350006104,
 3618: 134.35,
 3622: 76.430314345,
 3624: 64.863656,
 3625: 76.883906715,
 3636: 65.0450928,
 3637: 76.883906715,
 3638: 76.883906715,
 3648: 78.017824,
 3658: 94.982239509,
 3670: 133.53804,
 3672: 67.58526313,
 3673: 67.58526313,
 3676: 67.58526313,
 3678: 133.76485,
 3681: 87.3,
 3685: 132.736,
 3686: 133.222000122,
 3687: 132.73600769,
 3690: 132.866,
 3691: 131.011,
 3692: 132.1251,
 3693: 132.78418,
 3694: 131.93187,
 3695: 132.125,
 3696: 131.998001099,
 3697: 131.523,
 3698: 131.992,
 3706: 64.1379088,
 3723: 122.197685,
 3768: 64.9543744,
 3770: 67.58526313,
 3786: 62.142104,
 3788: 62.595696,
 3792: 62.142104,
 3794: 62.595696,
 3798: 62.142104,
 3801: 62.142104,
 3802: 38.55532,
 3809: 96.524456336,
 3810: 96.343016619,
 3811: 96.343016619,
 3812: 96.343019388,
 3813: 96.343016619,
 3838: 64.5915008,
 3846: 98.610978469,
 3848: 118.387512,
 3858: 65.1358112,
 3859: 64.863656,
 3860: 65.0450928,
 3861: 65.0450928,
 3862: 64.6822192,
 3868: 67.58526313,
 3875: 98.610978469,
 3876: 98.610978469,
 3877: 124.5,
 3878: 123.4,
 3881: 64.863656,
 3889: 96.52445772,
 3891: 79.151804,
 3915: 122.6,
 3916: 122.6,
 3922: 31.297848,
 3923: 31.3885664,
 3924: 31.3432072,
 3926: 65.0450928,
 3935: 79.151868565,
 3936: 79.151868565,
 3939: 56.3,
 3952: 62.595696,
 3957: 62.142104,
 3962: 61.688512,
 3965: 63.049288,
 3966: 62.595696,
 3968: 62.595696,
 3972: 61.688512,
 3973: 62.142104,
 3977: 64.5007824,
 3978: 64.5915008,
 3980: 64.863656,
 3990: 38.55532,
 4004: 121.9,
 4056: 64.863656,
 4106: 35.380177,
 4113: 38.55532,
 4115: 92.98636,
 4119: 62.142104,
 4120: 63.049288,
 4121: 123.2863056,
 4138: 64.6822192,
 4141: 64.9543744,
 4179: 56.2,
 4182: 95.85,
 4888: 34.926582,
 4891: 131.995272,
 4897: 74.84268,
 4899: 36.74095,
 4901: 37.194546,
 4915: 37.194546,
 4919: 99.427447504,
 4959: 64.5915008,
 4960: 65.0450928,
 4966: 65.0450928,
 4967: 80.28584949,
 4971: 65.317248,
 4973: 64.6822192,
 4975: 121.4719376,
 4990: 54.099960447,
 4992: 35.833767,
 4995: 36.74095,
 4997: 34.0194,
 4998: 62.595696,
 5006: 65.317248,
 5008: 87.6340445,
 5009: 87.724761589,
 5060: 75.1601944,
 5083: 74.38914868,
 5102: 114.985572,
 5157: 69.39963261,
 5158: 74.162292,
 5160: 75.29633342,
 5189: 115.439164,
 5199: 73.935496,
 5209: 67.222389234,
 5229: 47.62716,
 5234: 69.39963261,
 5427: 66.5,
 5479: 62.595696,
 5497: 98.157388868,
 5506: 80.28584949,
 5508: 65.1358112,
 5515: 80.28584949,
 5533: 64.863656,
 5548: 78.698212,
 5551: 78.698276195,
 5572: 65.317248,
 5606: 57.6,
 5607: 65.317248,
 5636: 63.956472,
 5641: 65.317248,
 5642: 65.77084,
 5685: 85.0,
 5712: 128.739013672,
 5714: 130.042999268,
 5730: 73.481904,
 5757: 64.6822192,
 5760: 87.089664,
 5792: 64.6822192,
 5796: 89.6297792,
 5797: 32.4771872,
 5798: 64.410064,
 5804: 125.463554,
 5816: 87.0,
 5824: 65.1358112,
 5865: 84.6,
 5874: 61.6,
 5907: 75.5,
 5930: 39.00891,
 5936: 33.112216,
 5939: 63.50288,
 5943: 64.410064,
 5946: 64.410064,
 5951: 63.049288,
 5971: 88.722670341,
 6019: 130.430999756,
 6029: 89.176262711,
 6030: 89.176262711,
 6031: 90.3555264,
 6032: 90.35559872,
 6033: 89.176262711,
 6035: 89.176262711,
 6037: 38.101727,
 6054: 69.8,
 6055: 68.7,
 6062: 69.39963261,
 6074: 67.222389234,
 6137: 65.317248,
 6139: 69.4,
 6141: 89.1761872,
 6164: 69.5,
 6205: 97.29549,
 6225: 88.994822994,
 6269: 103.237632751,
 6286: 103.963378906,
 6301: 117.5710464,
 6376: 79.832260132,
 6377: 79.83225712,
 6378: 79.83225712,
 6380: 79.83225712,
 6381: 79.832192,
 6383: 79.83225712,
 6386: 86.4,
 6392: 79.83225712,
 6395: 79.832192,
 6411: 79.83225712,
 6413: 79.83225712,
 6416: 80.059053305,
 6418: 80.059053305,
 6421: 80.059053305,
 6422: 80.059053305,
 6433: 79.605460935,
 6437: 79.83225712,
 6443: 79.605460935,
 6444: 79.151868565,
 6445: 79.151868565,
 6446: 79.151868565,
 6448: 80.73944186,
 6472: 41.730465,
 6519: 74.389088,
 6520: 73.481904,
 6561: 86.6,
 6648: 65.317248,
 6652: 65.317248,
 6653: 65.77084,
 6682: 103.056190491,
 6699: 84.0,
 6740: 112.0,
 6743: 110.500087415,
 6798: 79.151868565,
 6801: 120.4740352,
 6805: 40.36969,
 6808: 134.889007568,
 6817: 79.605396,
 6838: 75.3,
 6845: 53.98,
 6911: 65.77084,
 6958: 38.55532,
 6980: 73.481904,
 6983: 99.79024,
 7021: 80.013691299,
 7024: 52.163082,
 7027: 48.534344,
 7383: 8.0,
 7411: 39.916096,
 7477: 170.5506,
 7489: 37.648136,
 7510: 116.119552,
 7517: 65.317248,
 7522: 81.0,
 7525: 65.77084,
 7528: 65.997636,
 7535: 65.77084,
 7537: 65.317248,
 7573: 123.450004578,
 7578: 124.850006104,
 7579: 122.250007629,
 7581: 121.350006104,
 7582: 119.1,
 7583: 119.1,
 7758: 77.11064,
 7759: 77.11064,
 7765: 39.00891,
 7787: 80.013691299,
 7788: 80.013691299,
 7793: 31.75144,
 7796: 63.50288,
 7801: 105.8230136,
 7806: 108.0,
 7808: 52.163082,
 7818: 52.16308,
 7822: 70.0,
 7824: 93.893544,
 7836: 98.338825816,
 7837: 97.703796498,
 7838: 76.657048,
 7847: 119.9297248,
 7850: 94.0,
 7852: 96.705893284,
 7860: 102.6,
 7863: 70.0,
 7866: 66.0,
 7871: 37.194546,
 7873: 62.142104,
 7874: 77.9,
 7875: 86.2,
 7876: 86.2,
 7878: 86.3,
 7885: 29.48348,
 7886: 99.79024,
 7898: 117.026736,
 7912: 84.0,
 7916: 99.79024,
 7918: 58.96696,
 7919: 76.20345,
 7920: 72.57472,
 7923: 78.0,
 7932: 103.0,
 7941: 99.79024,
 7942: 71.213944,
 7944: 70.7,
 7950: 121.0183456,
 7955: 54.0,
 7958: 43.998425,
 7975: 75.296272,
 7992: 29.48348,
 7998: 101.604608,
 7999: 59.0,
 8003: 99.79024,
 8009: 42.63765,
 8022: 83.007336,
 8027: 72.57472,
 8035: 106.4126832,
 8044: 113.85168487,
 8049: 82.0,
 8052: 107.501305,
 8055: 107.047712,
 8057: 52.16308,
 8064: 34.0194,
 8065: 83.0,
 8066: 69.5810128,
 8072: 157.5778608,
 8083: 91.625584,
 8094: 84.821704,
 8095: 107.3198672,
 8123: 57.424794042,
 8128: 120.20188,
 8131: 99.0,
 8137: 93.893544,
 8139: 79.3786,
 8140: 49.89512,
 8143: 108.4992064,
 8144: 108.9074392,
 8154: 108.9074392,
 8160: 86.18248,
 8163: 92.079176,
 8168: 65.77084,
 8197: 101.698953395,
 8199: 79.83219,
 8234: 130.52802,
 8235: 120.4,
 8244: 75.29633342,
 8248: 75.29633342,
 8249: 75.523068,
 8252: 75.976721975,
 8253: 99.335006714,
 8276: 64.0471904,
 8277: 64.2286272,
 8281: 64.5915008,
 8284: 64.410064,
 8285: 65.1358112,
 8377: 75.29633342,
 8378: 79.832192,
 8379: 68.49244787,
 8384: 68.49244787,
 8385: 68.49244787,
 8391: 68.49244787,
 8396: 75.29633342,
 8398: 75.29633342,
 8399: 74.84274105,
 8401: 74.84274105,
 8402: 73.93555631,
 8412: 75.296272,
 8429: 48.080753,
 8445: 42.63765,
 8449: 129.926010132,
 8455: 130.01701355,
 8468: 73.93555631,
 8478: 68.49244787,
 8480: 68.49244787,
 8484: 68.49244787,
 8485: 68.49244787,
 8507: 121.3,
 8509: 124.2,
 8510: 124.6,
 8515: 75.296272,
 8516: 75.29633342,
 8519: 75.069476,
 8523: 74.615944865,
 8524: 74.162352495,
 8532: 74.162352495,
 8533: 73.93555631,
 8534: 74.162352495,
 8535: 74.162292,
 8536: 75.29633342,
 8538: 74.162292,
 8539: 74.84274105,
 8556: 68.49244787,
 8557: 68.49244787,
 8560: 68.49244787,
 8574: 75.74992579,
 8591: 98.361051399,
 8609: 76.20351816,
 8610: 64.863656,
 8611: 100.697426,
 8613: 68.49244787,
 8630: 68.49244787,
 8631: 68.49244787,
 8634: 68.49244787,
 8637: 68.49244787,
 8638: 90.352005005,
 8639: 91.86100769,
 8645: 64.1379088,
 8651: 68.49244787,
 8654: 68.49244787,
 8656: 68.49244787,
 8657: 68.49244787,
 8659: 68.49244787,
 8660: 68.49244787,
 8666: 68.49244787,
 8671: 68.49244787,
 8673: 68.49244787,
 8675: 68.49244787,
 8677: 68.49244787,
 8679: 68.49244787,
 8687: 118.8,
 8695: 119.9,
 8711: 44.452015,
 8718: 63.956472,
 8720: 64.183268,
 8722: 64.2286272,
 8723: 76.20351816,
 8725: 76.20351816,
 8734: 88.178353959,
 8735: 87.18045213,
 8737: 87.724761589,
 8738: 87.724761589,
 8740: 87.724761589,
 8746: 76.430314345,
 8771: 68.49244787,
 8774: 76.883906715,
 8796: 74.5705248,
 8797: 74.5705248,
 8799: 68.49244787,
 8800: 68.49244787,
 8809: 68.49244787,
 8818: 94.0,
 8824: 76.20351816,
 8835: 97.522475529,
 8836: 65.77084,
 8843: 63.50288,
 8845: 75.523129605,
 8848: 75.749864,
 8852: 77.791091455,
 8873: 77.337499085,
 8874: 77.337499085,
 8886: 64.5915008,
 8887: 64.5915008,
 8888: 64.1379088,
 8891: 133.135009766,
 8892: 96.25229953,
 8894: 95.980145492,
 8895: 112.490816,
 8897: 62.595696,
 8915: 133.765,
 8922: 62.595696,
 8946: 63.50288,
 8947: 63.50288,
 8957: 34.472992,
 8959: 39.00891,
 8973: 87.54332741,
 8978: 67.58526313,
 8980: 67.58526313,
 8983: 67.58526313,
 8984: 67.58526313,
 8987: 121.109064,
 8994: 77.337499085,
 9005: 131.523010254,
 9016: 40.36969,
 9017: 78.01788764,
 9025: 77.791091455,
 9037: 126.0,
 9038: 126.0,
 9039: 124.6,
 9042: 123.4,
 9054: 122.6,
 9057: 78.47148001,
 9063: 98.610978469,
 9064: 98.610978469,
 9066: 98.610978469,
 9067: 98.610978469,
 9068: 98.610978469,
 9069: 98.610978469,
 9078: 64.5915008,
 9088: 78.92507238,
 9089: 79.83225712,
 9109: 112.490816,
 9110: 113.398,
 9113: 38.101727,
 9119: 62.142104,
 9120: 78.698276195,
 9132: 118.387512,
 9138: 67.58526313,
 9142: 67.58526313,
 9151: 36.74095,
 9153: 62.595696,
 9158: 121.3,
 9159: 121.9,
 9167: 79.605460935,
 9172: 79.37866475,
 9181: 65.0450928,
 9221: 64.5915008,
 9223: 38.101727,
 9242: 56.0,
 9255: 65.317248,
 9259: 122.9,
 9260: 37.648136,
 9261: 79.3786,
 9264: 82.55374,
 9283: 39.00891,
 9289: 101.199996948,
 9313: 90.89100647,
 9314: 90.73500061,
 9323: 64.5007824,
 9362: 85.0,
 9368: 54.099960447,
 9371: 79.151868565,
 9372: 80.285784,
 9383: 90.303001404,
 9386: 39.916096,
 9388: 89.759002686,
 9390: 90.211006165,
 9391: 35.380177,
 9408: 39.916096,
 9421: 36.74095,
 9450: 39.462505,
 9452: 123.0,
 9453: 123.6,
 9454: 124.0,
 9458: 123.1,
 9459: 121.0,
 9462: 122.2,
 9473: 67.7,
 9475: 75.296272,
 9501: 68.719244055,
 9571: 66.224432,
 9572: 127.136009216,
 9587: 67.222389234,
 9594: 74.84274105,
 9615: 68.719244055,
 9616: 118.6,
 9633: 114.985572,
 9665: 87.6340445,
 9667: 87.724761589,
 9681: 73.93555631,
 9702: 41.27687,
 9707: 39.00891,
 9718: 54.0,
 9720: 54.099960447,
 9731: 80.28584949,
 9739: 54.099960447,
 9741: 64.5007824,
 9744: 34.926582,
 9748: 64.5007824,
 9749: 121.5,
 9758: 68.719244055,
 9801: 63.50288,
 9803: 63.50288,
 9809: 78.92507238,
 9813: 114.7134168,
 9816: 69.39963261,
 9820: 75.069476,
 9821: 74.615944865,
 9826: 74.615944865,
 9829: 75.29633342,
 9837: 70.6696336,
 9838: 36.28736,
 9867: 65.317248,
 9915: 64.410064,
 9918: 63.50288,
 9919: 63.50288,
 9923: 65.317248,
 9933: 79.151868565,
 9949: 63.50288,
 9950: 131.898010254,
 9974: 68.9,
 9985: 38.101727,
 9990: 121.1997824,
 9992: 130.081008911,
 10054: 91.625584,
 10062: 131.570999146,
 10084: 87.99685,
 10100: 63.956472,
 10130: 95.0,
 10132: 78.698276195,
 10133: 89.176262711,
 10134: 89.1761872,
 10136: 88.722670341,
 10140: 39.00891,
 10154: 69.0,
 10156: 75.069476,
 10157: 74.162352495,
 10165: 40.82328,
 10174: 65.77084,
 10197: 63.50288,
 10203: 64.410064,
 10204: 65.317248,
 10211: 64.410064,
 10225: 131.854003906,
 10226: 63.50288,
 10231: 78.92507238,
 10235: 62.595696,
 10239: 130.62600708,
 10251: 131.478012085,
 10273: 63.50288,
 10275: 130.477005005,
 10299: 62.595696,
 10303: 41.27687,
 10364: 57.7,
 10374: 78.698276195,
 10389: 40.36969,
 10392: 67.94808,
 10613: 97.431641076,
 10704: 70.1,
 10712: 64.410064,
 10715: 65.77084,
 10716: 65.77084,
 10791: 65.317248,
 10842: 59.0,
 10860: 42.184055,
 10911: 38.101727,
 10973: 59.0,
 11127: 107.682825869,
 11148: 107.6,
 11149: 107.6,
 11150: 109.95,
 11151: 38.101727,
 11201: 79.605460935,
 11202: 79.605460935,
 11205: 79.605460935,
 11208: 66.224432,
 11249: 64.410064,
 11252: 65.77084,
 11296: 64.410064,
 11298: 65.77084,
 11304: 65.317248,
 11308: 88.99482161,
 11318: 65.77084,
 11320: 65.317248,
 11398: 186.8799,
 11420: 37.648136,
 11488: 77.0,
 14712: 121.7,
 14721: 43.09124,
 14740: 131.54168,
 14741: 68.2202368,
 14743: 68.492392,
 14780: 132.23500061,
 14783: 62.595695,
 14784: 97.06876718,
 14797: 77.564232,
 14798: 75.000076424,
 14825: 99.79024,
 14843: 43.09124,
 14867: 134.645004272,
 14994: 69.8985272,
 15064: 37.648136,
 15135: 69.9,
 15137: 69.9,
 15176: 78.0,
 15211: 105.5508584,
 15230: 96.61517481,
 15232: 97.159485654,
 15237: 71.213944,
 15292: 72.57472,
 15305: 72.0,
 15306: 25.854744,
 15311: 40.36969,
 15323: 68.0,
 15337: 70.079964,
 15344: 80.0,
 15349: 92.98636,
 15361: 97.159485654,
 15362: 96.705893284,
 15390: 96.887330232,
 15391: 96.887330232,
 15392: 96.887330232,
 15394: 47.17357,
 15395: 135.624008,
 15397: 72.57472,
 15412: 100.0,
 15413: 100.0,
 15414: 70.4,
 15415: 105.0,
 15430: 113.398,
 15435: 102.511792,
 15458: 94.0,
 15489: 40.82328,
 15490: 127.00576,
 15491: 131.54168,
 15494: 93.439952,
 15502: 68.49244787,
 15521: 75.0,
 15524: 144.242256,
 15527: 108.1363328,
 15529: 113.398,
 15546: 80.0,
 15552: 39.00891,
 15553: 86.18248,
 15554: 107.0930712,
 15557: 93.439952,
 15565: 56.699,
 15577: 105.5508584,
 15584: 107.1,
 15599: 98.338825816,
 15610: 200.487664,
 15618: 36.28736,
 15619: 101.60461,
 15621: 75.0,
 15623: 53.8867296,
 15624: 63.50288,
 15641: 118.6596672,
 15642: 102.6,
 15644: 71.667536,
 15647: 84.821704,
 15652: 96.161504,
 15654: 94.347136,
 15667: 107.4105856,
 15669: 89.357624,
 15681: 90.7184,
 15683: 34.0194,
 15685: 141.1578304,
 15701: 136.531192,
 15705: 135.62401,
 15710: 76.203456,
 15711: 23.586784,
 15725: 92.532768,
 15726: 102.79755496,
 15729: 136.0776,
 15761: 101.60461,
 15772: 109.4063904,
 15775: 63.3,
 15787: 35.833767,
 15820: 91.3534288,
 15836: 130.180904,
 15857: 122.5,
 15891: 35.380177,
 15895: 68.49244787,
 15897: 75.29633342,
 15903: 102.511795,
 15912: 68.49244787,
 15917: 75.29633342,
 15952: 43.09124,
 15954: 64.410064,
 15955: 65.0450928,
 15964: 129.17401123,
 15966: 129.45526,
 15983: 74.615944865,
 15985: 75.976721975,
 15988: 73.48196394,
 15990: 73.48196394,
 15991: 74.162352495,
 15992: 75.29633342,
 15994: 75.29633342,
 16043: 68.49244787,
 16066: 74.615884,
 16070: 73.93555631,
 ...}

edit #2 since the exercise needed to place the values into equal parts bins, i used the following code i found in the link david posted :

cuts = 100
weightdf = healthdf[healthdf["measure"] == "weight"].copy()
weightdf["Ntile"] = 
pd.qcut(weightdf["measure_value"].rank(method='first'), cuts, labels = 
range(1, cuts + 1))
weightdf.groupby("Ntile")["measure_value"].agg(["min","max","count"])

it now works exactly how it did in sql

end result

eshirvana
  • 23,227
  • 3
  • 22
  • 38
Ori Hait
  • 37
  • 6

2 Answers2

0

If the imgur you posted is truely your dataframe, then the problem is that you are performing qcut on a single-rowed dataframe. How can you split up a single row into 100 quantiles? Pandas is trying to create a bunch of bins of different quantile values, but since you only have one observation, all of those bins start and end in the same place. I can prove this is the case by recreating your error using a single row df:

weightdf = pd.DataFrame(dict(weights=[10])) 
pd.qcut(weightdf["weights"].sort_values(), cuts )

returns

ValueError: Bin edges must be unique:

TLDR: You need more than one row in your df if you want to split it into quantiles.

David Kaftan
  • 1,974
  • 1
  • 12
  • 18
  • that's not it, unless I'm really missing something. weightdf["measure_value"] is about 2k rows. i don't know how you came to that conclusion. 10 cut partition worked, once it gets to a point where a bin ends and begins at the same value the error arises. – Ori Hait Oct 06 '21 at 17:46
  • Well, then the dataframe in your picture does not match the real dataframe. It will be hard to help without that. Please run `weightdf.to_dict()` and paste the output in the original question so people can work with the data you have – David Kaftan Oct 06 '21 at 17:49
  • the output would contain thousands of rows, is it even allowed to paste that amount of data in a post?. – Ori Hait Oct 06 '21 at 17:55
  • put it in a code block and see what happens. – David Kaftan Oct 06 '21 at 18:08
  • added the target column's data, its all I need since I already filtered the df to show only weight measurements. – Ori Hait Oct 06 '21 at 18:40
0

(New answer after OP posted data. First answer is still valid for the same error message with different data)

The problem appears to be that there are several repeated weights in the dataset. To understand why this is a problem, see this simple example: [1,2,3,4,5,5,5,5,9,10]. Lets try to bin that by decile. Of course, we notice that the 5th, 6th, 7th, and 8th decile are 5. So what bin do we put 5 into? We could arbitrarily bin the different fives into different deciles as described in this answer, however, that generally is misleading (hence why I have a separate answer). I think the most logical thing to do is to "round up", and say that no element fits into the 5th, 6th, or 7th decile, and all of the fives fit into the 8th decile. To do that, use the following code (the code is for quantile on the original data, not decile on the example data):

edges = np.quantile(weightdf.measure_value,np.arange(0.01,1.01,0.01))                                                                                                                                                                    
unique_edges = np.unique(edges)                                                                                                                                                                                                           
labels = [ 1 + np.where(unique_edge == edges)[0][-1] for unique_edge in unique_edges ]                                                                                                                                                   
weightdf["Ntile"] = pd.cut(weightdf.measure_value, np.insert(unique_edges,0,0), labels=labels) 
David Kaftan
  • 1,974
  • 1
  • 12
  • 18