0

I need to extract a CVE from a text column. The format is CVE-yyyy-xxxxx.

The year(yyyy) is variable, and the x’s will change per CVE - can be 1 to 6 digits. Sometimes the CVE is enclosed in (), sometimes followed by :

The column is like:

the vulnerability name (CVE-2019-0215) 
the vulnerability name (CVE-2019-0290) extra words 
the vulnerability name CVE-2018-23314: blah blah

Expected Output in a new column:

CVE-2019-0215
CVE-2019-0290
CVE 2018-23314
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Have you got some actual sample data with expected results just to see what you are trying? Do you know about the `SUBSTRING()` function? >> `SELECT SUBSTRING(, ) as NewCol` – JvdV Jun 22 '22 at 19:24

3 Answers3

1

If this is about the standard CVE format, extracting the year can easily be done using substring() with a regular expression:

substring('CVE-2022-1552' from 'CVE-([0-9]{4})-[0-9]+')

returns 2022

substring() will return the first matching group, so the regex "describes" the whole pattern and by using a capturing group for the year, only that will be returned.

If you need to match other formats, you need to adjust the regex accordingly.

1

It is useful to layout the problem through sample data, and the expected result (ps: for this please learn some stackoverflow format capability).

sample data

CREATE TABLE mytable(
   sourcecol VARCHAR(100) NOT NULL
);
INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name (CVE-2019-0215)');
INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name (CVE-2019-0290) extra words');
INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name CVE-2018-23314: blah blah');

query: using regex pattern in substring function

select substring(sourcecol from '(CVE-[0-9]{1,6}-[0-9]{1,6})+')
from mytable

this regex "pattern" looks for the string starting with "CVE-" followed by 1 to 6 digits followed by "-" followed by 1 to 6 digits

result

+----------------+
|   substring    |
+----------------+
| CVE-2019-0215  |
| CVE-2019-0290  |
| CVE-2018-23314 |
+----------------+

see this dbfiddle

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
0

Returns the first valid CVE (according to your definition) in the string:

SELECT substring(col1, 'CVE-[12]\d{3}-\d{1,6}') AS cve
FROM  tbl;

db<>fiddle here

The first CVE was issued 1999, so the year can start with 1 or 2.

The official definition of a CVE allows at least 4 digits and as many digits as needed for the serial number, so:

SELECT substring(col1, 'CVE-[12]\d{3}-\d{4,}') AS cve
FROM  tbl;

Details for regular expressions in the manual.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228