The answer depends on the ALU you have and the method you choose. You say you have tried to find out how to connect two 4-bit ALUs; here is some help:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.defs_pkg.all;

entity cpu_alu is
  port(
    opA      : in  std_logic_vector(15 downto 0); -- operand A
    opB      : in  std_logic_vector(15 downto 0); -- operand B
    Op       : in  std_logic_vector(2 downto 0);  -- operator
    cIn      : in  std_logic;                     -- carry in
    invA     : in  std_logic;                     -- invert A
    result   : out std_logic_vector(15 downto 0); -- result
    cOut     : out std_logic;                     -- carry out
    overflow : out std_logic;                     -- overflow
    zero     : out std_logic
  );
end entity cpu_alu;

architecture rtl1 of cpu_alu is
  signal INTERNAL_CARRY : std_logic; -- the carry chain
  signal zeroLSB        : std_logic; -- because this ALU has a 'zero' output
  signal zeroMSB        : std_logic;
begin
  -- low byte: takes the external carry in, feeds the carry chain
  LSB : entity work.cpu_alu8
    port map ( opA      => opA(7 downto 0),
               opB      => opB(7 downto 0),
               Op       => Op,
               cIn      => cIn,
               invA     => invA,
               result   => result(7 downto 0),
               cOut     => INTERNAL_CARRY,
               overflow => open,
               zero     => zeroLSB);
  -- high byte: takes its carry in from the low byte
  MSB : entity work.cpu_alu8
    port map ( opA      => opA(15 downto 8),
               opB      => opB(15 downto 8),
               Op       => Op,
               cIn      => INTERNAL_CARRY,
               invA     => invA,
               result   => result(15 downto 8),
               cOut     => cOut,
               overflow => overflow,
               zero     => zeroMSB);
  -- the 16-bit result is zero only if both bytes are zero
  zero <= zeroLSB and zeroMSB;
end architecture rtl1; -- of cpu_alu
This shows two 8-bit ALUs connected together to make one 16-bit ALU. I already had a 16-bit ALU from earlier, so I converted it to an 8-bit ALU and instantiated that twice to rebuild the original 16-bit ALU (so I could run the same test on it to make sure I had done it correctly*). I'm sure you can adapt it to two 4-bit ALUs.
The 8 LSBs go to the first ALU; the 8 MSBs go to the second. The key thing to see is how I have connected the carry output of the first ALU to the carry input of the second. Notice also that I am not interested in the overflow output from the LSB ALU, so it is left open. Finally, I needed to combine the zero outputs from the two halves: the 16-bit result is zero only when both 8-bit results are zero, hence the and.
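For the 4-bit case the same wiring might look something like the sketch below. Everything named here is my assumption, not your design: alu4 is a made-up, add-only stand-in for whatever your 4-bit ALU really is, and alu8, the port names and the internal signal names are hypothetical too. Your ALU will have an operator input and probably other ports, which you would pass to both halves unchanged, as Op and invA are above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-bit slice: add-only, standing in for a real 4-bit ALU.
entity alu4 is
  port (
    a, b     : in  std_logic_vector(3 downto 0);
    cin      : in  std_logic;
    result   : out std_logic_vector(3 downto 0);
    cout     : out std_logic;
    overflow : out std_logic;
    zero     : out std_logic
  );
end entity alu4;

architecture rtl of alu4 is
  signal carry_in : unsigned(4 downto 0);
  signal sum      : unsigned(4 downto 0);  -- one extra bit holds the carry out
begin
  carry_in <= (0 => cin, others => '0');
  sum      <= resize(unsigned(a), 5) + resize(unsigned(b), 5) + carry_in;
  result   <= std_logic_vector(sum(3 downto 0));
  cout     <= sum(4);
  -- signed overflow: operands share a sign and the result's sign differs
  overflow <= (a(3) xnor b(3)) and (a(3) xor sum(3));
  zero     <= '1' when sum(3 downto 0) = "0000" else '0';
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Two 4-bit slices chained through the carry to make an 8-bit unit.
entity alu8 is
  port (
    a, b     : in  std_logic_vector(7 downto 0);
    cin      : in  std_logic;
    result   : out std_logic_vector(7 downto 0);
    cout     : out std_logic;
    overflow : out std_logic;
    zero     : out std_logic
  );
end entity alu8;

architecture structural of alu8 is
  signal carry            : std_logic;  -- carry from low nibble to high nibble
  signal zero_lo, zero_hi : std_logic;
begin
  lo : entity work.alu4
    port map (a => a(3 downto 0), b => b(3 downto 0), cin => cin,
              result => result(3 downto 0), cout => carry,
              overflow => open, zero => zero_lo);
  hi : entity work.alu4
    port map (a => a(7 downto 4), b => b(7 downto 4), cin => carry,
              result => result(7 downto 4), cout => cout,
              overflow => overflow, zero => zero_hi);
  zero <= zero_lo and zero_hi;
end architecture structural;
```

Only the high slice's overflow is meaningful, because signed overflow is decided at the most significant bit of the whole word.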
Now, of course, I don't know what your ALU actually does. This one doesn't do much; the only mathematical operation it does is ADD. (It's an answer to an exercise, and for that reason I am not going to post all the code.)
*That is what you should always do. You mention Quartus. Quartus doesn't simulate - it starts with synthesis. You should always simulate before synthesising: it is much quicker to find bugs, find their source and fix them.
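To give an idea of what such a simulation can look like, here is a minimal self-checking testbench sketch. It is not the test I ran. The name split_add_tb is made up, and the "design under test" is just a few concurrent statements standing in for two 4-bit adders joined by a carry. In practice you would instantiate your own ALU there instead and check its other operations too.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity split_add_tb is
end entity split_add_tb;

architecture sim of split_add_tb is
  signal a, b   : unsigned(7 downto 0) := (others => '0');
  signal lo, hi : unsigned(4 downto 0);  -- 4-bit sum plus its carry bit
  signal sum    : unsigned(8 downto 0);  -- carry out and 8-bit result
begin
  -- Stand-in for the design under test (an assumption, replace with your ALU):
  -- two 4-bit additions, with the low carry fed into the high addition.
  lo  <= ('0' & a(3 downto 0)) + ('0' & b(3 downto 0));
  hi  <= ('0' & a(7 downto 4)) + ('0' & b(7 downto 4))
         + resize(lo(4 downto 4), 5);
  sum <= hi & lo(3 downto 0);

  stimulus : process
  begin
    -- Exhaustive check: every pair of 8-bit operands against integer addition.
    for i in 0 to 255 loop
      for j in 0 to 255 loop
        a <= to_unsigned(i, 8);
        b <= to_unsigned(j, 8);
        wait for 10 ns;
        assert to_integer(sum) = i + j
          report "mismatch for " & integer'image(i) & " + " & integer'image(j)
          severity error;
      end loop;
    end loop;
    report "all 65536 additions checked";
    wait;  -- stop the process so the simulation can end
  end process stimulus;
end architecture sim;
```

An exhaustive loop is practical at 8 bits; for the 16-bit version you would more likely use a set of corner cases plus random operands.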