NOTE: Before I begin, this F-measure is not related to precision and recall; its name and definition are taken from this paper.
I have a feature known as the F-measure, which is used to measure the formality of a given text. It is mostly used in gender classification of text, which is what I'm working on as a project.
The F-measure is defined as:
F = 0.5 * (noun freq. + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)
where the frequencies are taken from a given text (for example, a blog post).
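For concreteness, here is a minimal Python sketch of the formula above. The function name f_measure, the dictionary keys, and the example numbers are all hypothetical, chosen only for illustration; it assumes the per-category frequencies have already been obtained from a part-of-speech tagger.

```python
# Hypothetical helper: `freqs` maps a part-of-speech category to its
# frequency in the text. Categories missing from the dict count as 0.
POSITIVE = ("noun", "adjective", "preposition", "article")
NEGATIVE = ("pronoun", "verb", "adverb", "interjection")


def f_measure(freqs):
    """Compute F = 0.5 * (sum of positive terms - sum of negative terms + 100)."""
    plus = sum(freqs.get(tag, 0) for tag in POSITIVE)
    minus = sum(freqs.get(tag, 0) for tag in NEGATIVE)
    return 0.5 * (plus - minus + 100)


# Made-up frequencies for a single blog post (illustrative only).
example = {
    "noun": 30, "adjective": 10, "preposition": 12, "article": 8,
    "pronoun": 5, "verb": 15, "adverb": 5, "interjection": 0,
}
print(f_measure(example))  # 0.5 * (60 - 25 + 100) = 67.5
```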
I would like to normalise this feature for use in a classification task. Since the value of F is bounded by the number of words in the given text (text_length), my first thought was to divide F by text_length. Then, since the measure can take both positive and negative values (as can be inferred from the equation), I thought of squaring (F/text_length) so that the result is always non-negative.
When I tried this, the normalised values did not look right: I got very small values (below 0.10) in every case I tested. I suspect the cause is the squaring, since the square of a fraction is smaller than the fraction itself. However, squaring is required if I want to guarantee positive values only. I am not sure what else to consider so that the normalisation produces a nice distribution within [0,1], and I would like to know whether there is a general strategy for correctly normalising NLP features.
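A rough Python sketch of the normalisation described above, using made-up numbers. The function name squared_normalise and the values of F and text_length are assumptions for illustration, not taken from real data. It shows the two effects of squaring: the magnitude shrinks, and the sign is lost.

```python
def squared_normalise(f, text_length):
    """Divide F by the text length, then square the result."""
    ratio = f / text_length
    return ratio ** 2


# Illustrative values only: F = 67.5 for a text of 400 words.
f_value, text_length = 67.5, 400
ratio = f_value / text_length

# Squaring a ratio whose magnitude is below 1 makes it smaller still.
print(ratio, squared_normalise(f_value, text_length))

# A negative F of the same magnitude maps to the same normalised value,
# so the squared feature no longer distinguishes the two cases.
print(squared_normalise(-f_value, text_length))
```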
How should I approach the normalisation of my feature, and what might I be doing wrong?