Questions tagged [normalization]

Use [tag:database-normalization] for normalizing database structure, and [tag:unicode-normalization] for normalizing Unicode text. Normalization refers to transformations that reduce variation in data and thereby allow more consistent processing, searching, sorting, comparison, etc.

3199 questions
310
votes
15 answers

How to normalize a numpy array to a unit vector

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function: def normalize(v): norm = np.linalg.norm(v) if norm == 0: return v return v /…
Donbeo
  • 17,067
  • 37
  • 114
  • 188
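
A minimal sketch of the kind of function the question describes, assuming "unit vector" means scaling by the Euclidean (L2) norm. The excerpt's code is truncated, so this is one common way to write it rather than the asker's exact function; the name `normalize` follows the excerpt, and the sample input is made up.

```python
import numpy as np

def normalize(v):
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        # Dividing by zero would produce NaNs, so leave the zero vector alone.
        return v
    return v / norm

# Hypothetical input: a 3-4-5 triangle makes the result easy to check by hand.
unit = normalize([3.0, 4.0])
```

Here `unit` has length 1 and points in the same direction as the input.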
261
votes
16 answers

Standardize data columns in R

I have a dataset called spam which contains 58 columns and approximately 3500 rows of data related to spam messages. I plan on running some linear regression on this dataset in the future, but I'd like to do some pre-processing beforehand and…
Hoser
  • 4,974
  • 9
  • 45
  • 66
188
votes
10 answers

Why do we have to normalize the input for an artificial neural network?

Why do we have to normalize the input for a neural network? I understand that sometimes, for example when the input values are non-numerical, a certain transformation must be performed, but what about when we have numerical input? Why must the numbers be in a…
karla
  • 4,506
  • 5
  • 34
  • 39
115
votes
12 answers

How to normalize a 2-dimensional numpy array in python less verbose?

Given a 3 times 3 numpy array a = numpy.arange(0,27,3).reshape(3,3) # array([[ 0, 3, 6], # [ 9, 12, 15], # [18, 21, 24]]) To normalize the rows of the 2-dimensional array I thought of row_sums = a.sum(axis=1) # array([ 9, 36,…
Aufwind
  • 25,310
  • 38
  • 109
  • 154
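
One shorter way to normalize rows, sketched under the assumption that the goal is for each row to sum to 1. It relies on `keepdims=True` so the row sums keep shape `(3, 1)` and broadcast across columns; the variable name `normalized` is introduced here for illustration.

```python
import numpy as np

a = np.arange(0, 27, 3).reshape(3, 3)

# keepdims=True keeps the sums as a column of shape (3, 1),
# so the division broadcasts row by row without a manual reshape.
row_sums = a.sum(axis=1, keepdims=True)
normalized = a / row_sums
```

A row whose sum is zero would produce NaNs with this approach, so such rows need separate handling.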
84
votes
8 answers

When can I save JSON or XML data in an SQL Table

When using SQL or MySQL (or any relational DB for that matter) - I understand that saving the data in regular columns is better for indexing sake and other purposes... The thing is loading and saving JSON data is sometimes a lot more simple - and…
user4602228
77
votes
4 answers

Normalize data before or after split of training and testing data?

I want to separate my data into train and test set, should I apply normalization over data before or after the split? Does it make any difference while building predictive model?
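
A common practice, sketched here with the standard library only, is to split first, compute the normalization statistics on the training set alone, and reuse them for the test set so no information from the test data leaks into preprocessing. The helper names `fit_standardizer` and `apply_standardizer` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def fit_standardizer(values):
    """Compute mean and standard deviation from the training data only."""
    mu = mean(values)
    sigma = pstdev(values)
    # Guard against a constant feature, which would give a zero divisor.
    return mu, (sigma if sigma > 0 else 1.0)

def apply_standardizer(values, mu, sigma):
    """Standardize values using previously fitted statistics."""
    return [(x - mu) / sigma for x in values]

# Hypothetical data, already split into train and test.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mu, sigma = fit_standardizer(train)                    # fitted on train only
train_scaled = apply_standardizer(train, mu, sigma)
test_scaled = apply_standardizer(test, mu, sigma)      # reuses train statistics
```

The test set is transformed with the training mean and deviation, so its scaled values need not have zero mean or unit variance.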
75
votes
9 answers

How can I normalize a URL in python

I'd like to know how to normalize a URL in Python. For example, if I have a URL string like "http://www.example.com/foo goo/bar.html", I need a library in Python that will transform the extra space (or any other non-normalized character) to a proper…
Tom Feiner
  • 20,656
  • 20
  • 48
  • 51
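
A minimal sketch using only the standard library's `urllib.parse`, assuming the goal is just to percent-encode unsafe characters in the path. The function name `normalize_url` is hypothetical, and full URL normalization (case folding, default ports, query encoding) is out of scope here.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def normalize_url(url):
    """Percent-encode unsafe characters in the path component of a URL."""
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so that existing
    # percent-escapes are not encoded a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )

cleaned = normalize_url("http://www.example.com/foo goo/bar.html")
# cleaned == "http://www.example.com/foo%20goo/bar.html"
```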
72
votes
5 answers

Save MinMaxScaler model in sklearn

I'm using the MinMaxScaler model in sklearn to normalize the features of a model. training_set = np.random.rand(4,4)*10 training_set [[ 6.01144787, 0.59753007, 2.0014852 , 3.45433657], [ 6.03041646, 5.15589559, 6.64992437, …
72
votes
8 answers

In what way does denormalization improve database performance?

I've heard a lot about denormalization being used to improve the performance of certain applications, but I've never tried anything like it. So I'm just curious: which parts of a normalized DB make performance worse or, in other words, what are…
Roman
  • 64,384
  • 92
  • 238
  • 332
69
votes
3 answers

MongoDB normalization, foreign key and joining

Before I dive really deep into MongoDB for days, I thought I'd ask a pretty basic question as to whether I should dive into it at all or not. I have basically no experience with nosql. I did read a little about some of the benefits of document…
egervari
  • 22,372
  • 32
  • 121
  • 175
66
votes
11 answers

Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers)

I tried to run the graph cut algorithm for a slice of an MRI after converting it into PNG format. I keep encountering the following problem: Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for…
Ankita Shinde
  • 691
  • 2
  • 7
  • 11
57
votes
7 answers

Laying out a database schema for a calendar application

I want to write a calendar application. It is really recurring items that throw a wrench in the works for the DB schema. I would love some input on how to organize this. What if a user creates an event, and inputs that it repeats every Monday,…
Anthony D
  • 10,877
  • 11
  • 46
  • 67
54
votes
7 answers

Normalize a feature in this table

This has become quite a frustrating question, but I've asked in the Coursera discussions and they won't help. Below is the question: I've gotten it wrong 6 times now. How do I normalize the feature? Hints are all I'm asking for. I'm assuming…
bjd2385
  • 2,013
  • 4
  • 26
  • 47
54
votes
9 answers

How to normalize a confusion matrix?

I calculated a confusion matrix for my classifier using confusion_matrix() from scikit-learn. The diagonal elements of the confusion matrix represent the number of points for which the predicted label is equal to the true label, while off-diagonal…
Kaly
  • 3,289
  • 4
  • 24
  • 25
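
One way to normalize such a matrix, sketched under the assumption that each row should be divided by the number of true samples in that class (rows are true labels in scikit-learn's `confusion_matrix` output). The counts below are made up for illustration.

```python
import numpy as np

# Hypothetical 2-class confusion matrix: rows are true labels,
# columns are predicted labels.
cm = np.array([[8, 2],
               [1, 9]])

# Divide each row by its total so every row sums to 1; the diagonal
# then holds the per-class recall.
row_totals = cm.sum(axis=1, keepdims=True)
cm_normalized = cm / row_totals
```

Dividing by column totals instead (`axis=0`) would normalize over predictions rather than over true labels.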
53
votes
15 answers

Explaining why "Just add another column to the DB" is a bad idea, to non programmers

I have sales people and bean counters who are trying to sell customizations to clients, which is fine. But when a complex change request comes in that I send back a large estimate for, they get confused. Often they come back at me with "Why can't…
Neil N
  • 24,862
  • 16
  • 85
  • 145