And the program we normally write thus becomes a set of texts.
The word "text" isn't really a common term in compiler construction (or at least not one I heard before). Often a program first is translated into a sequence of tokens (which are basically the "words"¹ of the language) and then that sequence is translated into a syntax tree. That tree may then be further transformed and will finally be translated into a sequence of machine instructions, which make up the compiled program.
A language uses regular expressions to define its syntax, i.e., whether all texts in the program are valid or not.
The syntax of a language describes which programs are structurally valid (not taking into account type errors and runtime errors, which are handled separately). You cannot do that using regular expressions, as the vast majority of languages are not regular, that is, they're more complicated than what a regular expression can describe. For example, you can't say "for every opening parenthesis there must be a closing parenthesis" using a regular expression.
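To see why, consider what checking balanced parentheses actually requires: a counter that can grow without bound, which is exactly the capability a finite automaton (and hence a regular expression) lacks. A Python sketch, not production code:

    # Balanced parentheses need unbounded counting. With an explicit
    # counter it's trivial; no regular expression can express this for
    # arbitrary nesting depth.

    def parens_balanced(text):
        depth = 0
        for ch in text:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:      # a ")" with no matching "("
                    return False
        return depth == 0          # every "(" was eventually closed

    assert parens_balanced("(a(b)c)")
    assert not parens_balanced("(()")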
Regular expressions are often used to describe the tokens of a language. That is, you can say "identifiers in the language match the regex [a-zA-Z_][a-zA-Z0-9_]* and numbers match the regex [0-9]+".
How those tokens fit together to form a complete program is then described in a grammar.
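To make the token side concrete, here is a minimal tokenizer built on exactly those two regexes (a sketch using Python's re module; the token names and the whitespace-skipping rule are my own additions):

    import re

    # Sketch of a regex-based tokenizer, using the two token regexes
    # from above plus a rule to skip whitespace.
    TOKEN_RE = re.compile(r"""
        (?P<IDENT>[a-zA-Z_][a-zA-Z0-9_]*)
      | (?P<NUMBER>[0-9]+)
      | (?P<SKIP>\s+)
    """, re.VERBOSE)

    def tokenize(source):
        pos = 0
        while pos < len(source):
            match = TOKEN_RE.match(source, pos)
            if match is None:
                raise SyntaxError("unexpected character %r" % source[pos])
            if match.lastgroup != "SKIP":
                yield (match.lastgroup, match.group())
            pos = match.end()

    print(list(tokenize("foo 42 bar_1")))
    # [('IDENT', 'foo'), ('NUMBER', '42'), ('IDENT', 'bar_1')]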
The first two steps of a compiler are lexical analysis and parsing.
Usually, yes.
The lexical analysis converts the regular expressions to an NFA/DFA, works through the program text, validates it, and converts it to tokens.
If you use a lexer generator, then the generator will take the regular expressions you give it, convert them to automata, and produce code based on those automata. That generated code is the lexer, which takes the program source and produces a sequence of tokens.
Note that the conversion from regular expressions to automata happens when the generator runs, not as part of your compiler. And if you write the lexer by hand, no conversion between regular expressions and automata happens at all (except possibly in your head).
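For contrast, a hand-written lexer for the same two token kinds might look like this sketch, which works character by character with no regular expressions or automata anywhere:

    # A hand-written lexer for identifiers and numbers, working directly
    # on the character stream. A sketch only; real lexers track source
    # positions and handle many more token kinds.

    def lex(source):
        pos = 0
        while pos < len(source):
            ch = source[pos]
            if ch.isspace():
                pos += 1
            elif ch.isalpha() or ch == "_":
                start = pos
                while pos < len(source) and (source[pos].isalnum() or source[pos] == "_"):
                    pos += 1
                yield ("IDENT", source[start:pos])
            elif ch.isdigit():
                start = pos
                while pos < len(source) and source[pos].isdigit():
                    pos += 1
                yield ("NUMBER", source[start:pos])
            else:
                raise SyntaxError("unexpected character %r" % ch)

    print(list(lex("foo 42 bar_1")))
    # [('IDENT', 'foo'), ('NUMBER', '42'), ('IDENT', 'bar_1')]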
The parsing deals with those tokens and checks their semantics.
No. The parsing phase takes the tokens and makes sure that they conform to the syntax of the language. If they do, it performs actions based on the syntactic structure of the program. Often that means building a syntax tree. For simple languages it is also possible to do semantic analysis (like type checking) and code generation directly in the parser.
If you do build a syntax tree, subsequent phases will then go over that tree, and that's where the language's semantics come into play.
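As a minimal illustration of such tree-building, here is a recursive-descent parser sketch for the toy grammar expr ::= NUMBER ("+" NUMBER)*; the (kind, text) token pairs match the lexer sketches above, and the PLUS token kind is an assumption of this example:

    # Minimal recursive-descent parser for the toy grammar
    #
    #     expr ::= NUMBER ("+" NUMBER)*
    #
    # Tree nodes are plain tuples; everything here is illustrative.

    def parse(tokens):
        tokens = list(tokens)
        pos = 0

        def peek():
            return tokens[pos][0] if pos < len(tokens) else None

        def expect(kind):
            nonlocal pos
            if peek() != kind:
                raise SyntaxError("expected %s, got %s" % (kind, peek()))
            tok = tokens[pos]
            pos += 1
            return tok

        node = ("num", int(expect("NUMBER")[1]))
        while peek() == "PLUS":
            expect("PLUS")
            node = ("+", node, ("num", int(expect("NUMBER")[1])))  # build the tree bottom-up
        if pos != len(tokens):
            raise SyntaxError("trailing tokens")
        return node

    print(parse([("NUMBER", "1"), ("PLUS", "+"), ("NUMBER", "2")]))
    # ('+', ('num', 1), ('num', 2))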
Another question: so the definition of a language is a regular expression, and we use the parsing part to validate the program against the grammar?
The definition of a language's syntax is generally given as a grammar, not a regular expression. As I said, regular expressions aren't expressive enough for that. We do use parsing to validate that a given program conforms to the language's grammar (as well as to determine the syntactic structure of the program).
The definition of a language consists of the definition of the language's syntax and the definition of its semantics. The latter is often given in text form.
¹ Here I'm using the colloquial meaning of the word "word", not its language-theoretic meaning.