I have been working with data sets that mostly show linear relationships between the attributes/features. What activation function should I use with linear data? I have been using the sigmoid function until now.

Is there any other activation function I should try?

Lambar
  • Well, I recommend you search for it; you'll find something for sure. But if you can't find anything after searching, you can try asking again. – M. Ali Öztürk Aug 01 '18 at 13:15
  • @M.AliÖztürk I tried searching, but could not find it. – Lambar Aug 01 '18 at 13:17
  • For a linear relationship use linear regression, no need for deep learning. Nothing will capture a linear relationship with more style. – missuse Aug 01 '18 at 13:20
  • @missuse Would linear regression still require an activation function? – Lambar Aug 01 '18 at 13:22
  • no, it doesn't. – RLave Aug 01 '18 at 13:25
  • @Lambar no. Check some easy-to-google topics: [1](http://r-statistics.co/Linear-Regression.html), [2](https://www.statmethods.net/stats/regression.html) and [3](https://datascienceplus.com/linear-regression-from-scratch-in-r/). R's `lm` function solves for the parameters using QR decomposition. – missuse Aug 01 '18 at 13:26
  • @RLave If I need the output as a probability, wouldn't I use the `sigmoid` function? – Lambar Aug 01 '18 at 13:34
  • There's a little confusion going on here: what's the output of your model? Is it binary classification, or regression? – RLave Aug 01 '18 at 13:44
  • If it's binary (0 or 1), you can use a variation of the linear model called the logistic (logit) model, which outputs probabilities directly, so there is no need for an activation function – RLave Aug 01 '18 at 13:45
  • https://en.wikipedia.org/wiki/Logistic_regression – RLave Aug 01 '18 at 13:46
  • There is no such thing as a "linear dataset"... – desertnaut Aug 02 '18 at 13:31
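The distinction drawn in the comments (plain linear regression for a continuous target, logistic regression when you want class probabilities) can be sketched in base R with `lm` and `glm`. This is an illustrative sketch on the built-in `iris` data, not part of the original thread; the variable choices are arbitrary:

```r
# Continuous target, linear relationship: plain linear regression.
# No activation function is involved; lm() fits by least squares.
fit_lm <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit_lm)  # intercept and slope

# Binary target with probability output: logistic regression.
# Drop one species to get a two-class problem.
iris2 <- subset(iris, Species != "setosa")
iris2$is_virginica <- as.integer(iris2$Species == "virginica")
fit_glm <- glm(is_virginica ~ Petal.Length + Petal.Width,
               data = iris2, family = binomial)

# Predicted probabilities are already in [0, 1]; the logit link
# plays the role the sigmoid plays in a neural network.
p <- predict(fit_glm, type = "response")
range(p)
```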

1 Answer

1) Too many linearly dependent attributes are not good, as they may introduce too much noise compared with the informative attributes. If your sample looks like IR spectra of some gas at different temperatures, then it is better to use PCA (or some other dimensionality reduction algorithm) to reduce the data to only the most informative dimensions.
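As a minimal sketch of that reduction step (not part of the original answer; it uses base R's `prcomp` on the `iris` features, and the 95% variance threshold is an arbitrary choice for illustration):

```r
# Numeric feature matrix (stand-in for your own data)
x <- as.matrix(iris[, 1:4])

# PCA with centering and scaling
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the smallest number of components covering ~95% of the variance
k <- which(cumsum(var_explained) >= 0.95)[1]
x_reduced <- pca$x[, 1:k, drop = FALSE]
dim(x_reduced)
```

The reduced matrix `x_reduced` can then be fed to the network in place of the raw, partly redundant attributes.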

2) The activation function depends on the structure of the NN as well as on its purpose. E.g. the ReLU activation function is very "trendy" now. For example, see the code below for classifying the iris data set with the keras library. The layers have different activation functions.

library(keras)
# shuffle the rows
train <- iris[sample(nrow(iris)), ]

y <- train[, "Species"]
x <- train[, 1:4]

# min-max scale each feature to [0, 1]
x <- as.matrix(apply(x, 2, function(x) (x - min(x)) / (max(x) - min(x))))

# Species is already a factor; convert to 0-based integer codes,
# then one-hot encode for the softmax output layer
y <- to_categorical(as.integer(y) - 1, num_classes = 3)

model <- keras_model_sequential()

# add layers and activation functions
model %>%
  layer_dense(input_shape = ncol(x), units = 10, activation = "relu") %>%
  layer_dense(units = 10, activation = "relu") %>%
  layer_dense(units = 3, activation = "softmax")

model %>%
  compile(
    loss = "categorical_crossentropy",
    optimizer = "adagrad",
    metrics = "accuracy"
  )

fit <- model %>%
  fit(
    x = x,
    y = y,
    shuffle = TRUE,
    batch_size = 5,
    validation_split = 0.3,
    epochs = 150
  )
Artem