If you don't want to change the architecture of your neural network and are only trying to reduce the memory footprint, one tweak is to reduce the number of terms extracted by the CountVectorizer.
From the scikit-learn documentation, there are (at least) three parameters for reducing the vocabulary size:
max_df : float in range [0.0, 1.0] or int, default=1.0
When building the vocabulary, ignore terms that have a document frequency strictly higher than the given threshold (corpus-specific stop words). If float, the parameter represents a proportion of documents; if integer, absolute counts. This parameter is ignored if vocabulary is not None.
min_df : float in range [0.0, 1.0] or int, default=1
When building the vocabulary, ignore terms that have a document frequency strictly lower than the given threshold. This value is also called cut-off in the literature. If float, the parameter represents a proportion of documents; if integer, absolute counts. This parameter is ignored if vocabulary is not None.
max_features : int or None, default=None
If not None, build a vocabulary that only considers the top max_features terms ordered by term frequency across the corpus. This parameter is ignored if vocabulary is not None.
As a first step, try playing with max_df and min_df. If the size still doesn't meet your requirements, you can cap the vocabulary at whatever size you like with max_features (see the sketches below).
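Here is a minimal sketch of how max_df and min_df prune the vocabulary. The corpus and the threshold values are made-up assumptions, chosen only to make the effect visible on a tiny example:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, chosen only to make the pruning visible.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "a bird flew over the log",
]

# Default settings: every term passing the default tokenizer ends up
# in the vocabulary ("a" is dropped by the default token pattern).
vec_full = CountVectorizer()
vec_full.fit(corpus)
print(len(vec_full.vocabulary_))  # 11 terms

# max_df=0.75 drops terms appearing in more than 75% of documents
# ("the" appears in all four); min_df=2 drops terms appearing in
# fewer than two documents ("mat", "chased", "bird", "flew", "over").
vec_pruned = CountVectorizer(max_df=0.75, min_df=2)
vec_pruned.fit(corpus)
print(len(vec_pruned.vocabulary_))  # 5 terms: cat, dog, log, on, sat
```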
NOTE:
Tuning max_features can reduce your classification accuracy more than tuning the other parameters, since it discards terms purely by corpus frequency.
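Because of that trade-off, it may help to sweep a few candidate caps and check the resulting dimensionality before retraining. This is a sketch reusing the hypothetical corpus above; the cap values are arbitrary, and in practice you would compare validation accuracy at each setting:

```python
from sklearn.feature_extraction.text import CountVectorizer

# `docs` stands in for your real training corpus (list of strings).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "a bird flew over the log",
]

# Sweep a few candidate caps and inspect the resulting feature count.
# In practice, re-train the network and compare validation accuracy
# at each setting before committing to a value.
for cap in (None, 8, 4):
    X = CountVectorizer(max_features=cap).fit_transform(docs)
    print(f"max_features={cap}: {X.shape[1]} features")
```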