1) Too many linearly dependent attributes is not good, as they may introduce noise that drowns out the informative attributes. If your samples are something like IR spectra of a gas at different temperatures, it is better to use PCA (or some other dimensionality reduction algorithm) to reduce the data to only the most informative components; see the sketch below.
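A minimal sketch with base R's prcomp(), using the iris measurements as stand-in data (with real spectra you would pass your spectral matrix instead); the 95% variance cutoff is just an illustrative choice:

# numeric attributes as a matrix (stand-in for a spectral matrix)
x <- as.matrix(iris[, 1:4])

# prcomp() centers the data; scale. = TRUE also standardizes each attribute
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# keep only the components needed to explain, say, 95% of the variance
k <- which(cumsum(var_explained) >= 0.95)[1]
x_reduced <- pca$x[, 1:k, drop = FALSE]
dim(x_reduced)  # n rows, k informative components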
2) The choice of activation function depends on the structure of the NN as well as on its task. ReLU, for example, is very "trendy" now. See the code below, which classifies the iris data set with the keras library; the layers use different activation functions.
library(keras)

# shuffle the rows of the iris data set
train <- iris[sample(nrow(iris)), ]
y <- train[, "Species"]
x <- train[, 1:4]

# min-max scale every attribute to [0, 1]
x <- as.matrix(apply(x, 2, function(x) (x - min(x)) / (max(x) - min(x))))

# one-hot encode the three species (as.integer() on a factor
# returns the level codes 1..3, so no relabeling is needed)
y <- to_categorical(as.integer(y) - 1, num_classes = 3)

model <- keras_model_sequential()

# add layers and activation functions:
# ReLU in the hidden layers, softmax in the output layer
model %>%
  layer_dense(input_shape = ncol(x), units = 10, activation = "relu") %>%
  layer_dense(units = 10, activation = "relu") %>%
  layer_dense(units = 3, activation = "softmax")

model %>%
  compile(
    loss = "categorical_crossentropy",
    optimizer = "adagrad",
    metrics = "accuracy"
  )

fit <- model %>%
  fit(
    x = x,
    y = y,
    shuffle = TRUE,
    batch_size = 5,
    validation_split = 0.3,
    epochs = 150
  )
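As a quick follow-up (assuming the code above has run), you can inspect the training history and score the model:

# training/validation curves for loss and accuracy
plot(fit)

# loss and accuracy on the full data set
model %>% evaluate(x, y)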