Convert small numbers to a scale from 0 to 10

Question

I have a list of urls->small numbers. numbers represents the importance of each URL

url, value
https://mywebsite.com/p/1, 0.00212
https://mywebsite.com/p/2, 0.00208
https://mywebsite.com/p/3, 0.00201
https://mywebsite.com/p/4, 0.00138
https://mywebsite.com/p/5, 0.00067
https://mywebsite.com/p/1, 0.00001
...

let's say the sum of values = 1 I would like to represent those number within a scale of 0-10 and maintaining a kind of ratio difference between numbers

url, value, scaled_value
https://mywebsite.com/p/1, 0.00212, 10
https://mywebsite.com/p/2, 0.00208, 9
https://mywebsite.com/p/3, 0.00201, 9
https://mywebsite.com/p/4, 0.00138, 6
https://mywebsite.com/p/5, 0.00067, 3
https://mywebsite.com/p/1, 0.00001, 1
...

something like this (I don't know if the ratio difference here is maintained tho) any one can help with the maths ? thanks

#Update

thanks to @annZen help, I tried two approach, but the results are different I don't know why. if someone can help ?

here are the two formulas I used :

res1 = round(x*9/maxpri)+1
res2 = round(((x-minpri)/(maxpri-minpri))*10, 2)

You have said scale of 0 to 10, but your two `res1` formula gives values from 1 to 10. — alani, Jul 04 '20 at 03:27
thanks @alaniwi do you think this alone can explain the huge difference in the numbers ? I mean the order still the same but the values are different — Dany M, Jul 04 '20 at 03:33

Red · Accepted Answer · 2020-07-04T03:51:58.320

Here is a way:

with open('file.txt', 'r') as p:
    lst = p.read().splitlines() # List all the lines of the file

lst2 = [float(i.split(', ')[1]) for i in lst[1:]] # List all the floats

num = [round(a*9/max(lst2))+1 for a in lst2] # List all the scaled numbers

for i,(l,n) in enumerate(zip(lst,['scaled_value']+num)):
    lst[i] = f"{l}, {n}" # Add the 'scaled_value' column

with open('file.txt', 'w') as p:
    p.write('\n'.join(lst)) # Write the updated data into the file

Before:

url, value
https://mywebsite.com/p/1, 0.00212
https://mywebsite.com/p/2, 0.00208
https://mywebsite.com/p/3, 0.00201
https://mywebsite.com/p/4, 0.00138
https://mywebsite.com/p/5, 0.00067
https://mywebsite.com/p/1, 0.00001

After:

url, value, scaled_value
https://mywebsite.com/p/1, 0.00212, 10
https://mywebsite.com/p/2, 0.00208, 10
https://mywebsite.com/p/3, 0.00201, 10
https://mywebsite.com/p/4, 0.00138, 7
https://mywebsite.com/p/5, 0.00067, 4
https://mywebsite.com/p/1, 0.00001, 1

UPDATE:

The part of my code that does the converting is:

num = [round(a*9/max(lst2))+1 for a in lst2]

where lst2 is simply the list of floats extracted from the file. You updated the question for me to explain the difference between

res1 = round(x*9/maxpri)+1
res2 = round(((x-minpri)/(maxpri-minpri))*10, 2)

Lets first see then in my list comprehension:

num1 = [round(x*9/max(lst2))+1 for x in lst2]
num2 = [round(((x-min(lst2))/(max(lst2)-min(lst2)))*10, 2) for x in lst2]
print(num1)
print(num2)

Output:

[10, 10, 10, 7, 4, 1]
[10.0, 9.81, 9.48, 6.49, 3.13, 0.0]

The first clearest difference is that i rounded my answer to the nearest integer. Without it, it would be:

num1 = [round(x*9/max(lst2), 2)+1 for x in lst2]
num2 = [round(((x-min(lst2))/(max(lst2)-min(lst2)))*10, 2) for x in lst2]
print(num1)
print(num2)

Output:

[10.0, 9.83, 9.53, 6.86, 3.84, 1.04]
[10.0, 9.81, 9.48, 6.49, 3.13, 0.0]

The values are now very close, but there's one more thing. My code assumes that the minimum value for the scaled values is 1, because I saw in your post https://mywebsite.com/p/1, 0.00001, 1. I now realize that you stated 0-10, not 1-10. So that another is to change the 9 (10-1=9) to a 10(10-0=10), and removing the +1:

round(x*9/max(lst2), 2)+1
round(x*10/max(lst2), 2)

num1 = [round(x*10/max(lst2), 2) for x in lst2]
num2 = [round(((x-min(lst2))/(max(lst2)-min(lst2)))*10, 2) for x in lst2]
print(num1)
print(num2)

Output:

[10.0, 9.81, 9.48, 6.51, 3.16, 0.05]
[10.0, 9.81, 9.48, 6.49, 3.13, 0.0]

Still a little different, that's because I assumed the minimum value in your column is 0, because you didn't show your whole array. But in this case, it's 0.00001. So, go with:

num = [round(((x-min(lst2))/(max(lst2)-min(lst2)))*10, 2) for x in lst2]

Summary: My code assumed that you wanted the numbers scaled from 1 to 10, instead of 0 to 10, and my code assumed that the minimum value for your data is 0, which might not be the case.

hey @annZen thanks for your answer, I tried this approach, what do you think about it ? https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range — Dany M, Jul 04 '20 at 02:39
yes @annZen but I don't get the same results, I've updated the thread with my results,please tell me your thoughts — Dany M, Jul 04 '20 at 03:10
very clear @annZen, thanks so much. where can we follow you ? — Dany M, Jul 04 '20 at 04:03
oh sorry, the question was about where we can find you to discuss further, don't mind, I saw that you have a facebook group and a youtube channel :) — Dany M, Jul 04 '20 at 04:08
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/217188/discussion-between-dany-m-and-ann-zen). — Dany M, Jul 04 '20 at 04:12

Telescope · Answer 2 · 2020-07-04T00:58:09.380

0

If you want to maintain a definite ratio difference between the numbers, you can set the smallest number to 1 then have every other number set to num/smallest.

The problem with this approach is that it does not guarantee that every URL is set to a number from 0 - 10. In the above example, it would set the numbers to 212, 208, 201, 138, 67, and 1 respectively.

If you really need to set every number to be between a certain specified range, you first set the smallest URL to have importance 0, and the largest URL to have importance 10. Then, all the other points would lie on a line with slope (max value - min value)/10. The below picture showcases this concept:

In this picture, the y-values of the points represent their URL value, and the x-coordinates represent the points' importance.

edited Jul 04 '20 at 00:58

answered Jul 04 '20 at 00:50

Telescope

2,068
1
5
22

hey @Telescope thanks for your answer, I tried this approach, what do you think about it ? https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range – Dany M Jul 04 '20 at 02:39
That approach follows similar logic to my approach. I think it'll work great for your purposes. – Telescope Jul 04 '20 at 17:06

score 0 · Answer 3 · answered Jul 04 '20 at 04:09

If this is for production code, then I suggest csv.DictReader and csv.DictWriter for sake of code that is easy to read when you come back to it later. For example:

from csv import DictReader, DictWriter

scaled_field_name = 'scaled_value'

with open('input.csv') as fin:
    csvin = DictReader(fin, skipinitialspace=True)
    rows = list(csvin)

values = [float(row['value']) for row in rows]
min_value = min(values)
max_value = max(values)
for row, value in zip(rows, values):
    scaled = 10 * (value - min_value) / (max_value - min_value)
    row[scaled_field_name] = str(round(scaled))

with open('output.csv', 'w') as fout:
    csvout = DictWriter(fout, csvin.fieldnames + [scaled_field_name])
    csvout.writerows(rows)

(Note: it will not write spaces after the commas, but that should be normal for CSV.)

Convert small numbers to a scale from 0 to 10

3 Answers3

Summary: My code assumed that you wanted the numbers scaled from 1 to 10, instead of 0 to 10, and my code assumed that the minimum value for your data is 0, which might not be the case.