I am trying to extract certain IDs from HTML. Part of it works, but I need help with the rest. Here is some sample HTML with videos:
<video id="movie1" class="show_movie-camera animation_target movieBorder hasAudio movieId_750" src="/path/to/movie" style="position: absolute; z-index: 505; top: 44.5px; left: 484px; display: none;" preload="true" autoplay="true"></video>
<video id="movie2" class="clickInfo movieId_587" src="/path/to/movie" preload="true" autoplay="true"></video>
<video id="movie300" src="/path/to/movie" preload="true" autoplay="true"></video>
To get the movie IDs, I look for movieId_[ID] or movie[ID] using this regex:
.*?<object|<video.*?movie(\\d+)|movieId_(\\d+)[^>]*>?.*?
This works well, but it returns both movieId_[ID] and movie[ID] as separate matches rather than just one per tag. What I want is to use movieId_[ID] when it is present and fall back to movie[ID] otherwise. This is the code I use:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(content);
int fileId = -1;
while (m.find()) {
    fileId = -1;
    // Group 2 is the movieId_[ID] class; group 1 is the movie[ID] id attribute.
    if (m.group(2) != null) {
        fileId = Integer.parseInt(m.group(2));
    } else if (m.group(1) != null) {
        fileId = Integer.parseInt(m.group(1));
    }
}
This gives me 1, 750, 2, 587, 300 instead of the 750, 587, 300 that I am looking for.
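One possible way to get one ID per tag is to match each tag first and then search inside its attributes, instead of relying on a single alternation. The following is only a minimal sketch, not a full HTML parser: it assumes no attribute value contains a `>` character and that the id attribute uses double quotes, as in the sample. The names `MovieIdExtractor` and `extractIds` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MovieIdExtractor {
    // One opening <video ...> or <object ...> tag; group 1 is its attribute text.
    // Assumption: no attribute value contains '>'.
    private static final Pattern TAG = Pattern.compile("<(?:video|object)\\b([^>]*)>");
    // Preferred source: the movieId_[ID] class token.
    private static final Pattern CLASS_ID = Pattern.compile("\\bmovieId_(\\d+)");
    // Fallback source: id="movie[ID]".
    private static final Pattern ATTR_ID = Pattern.compile("\\bid=\"movie(\\d+)\"");

    static List<Integer> extractIds(String content) {
        List<Integer> ids = new ArrayList<>();
        Matcher tag = TAG.matcher(content);
        while (tag.find()) {
            String attrs = tag.group(1);
            Matcher byClass = CLASS_ID.matcher(attrs);
            Matcher byAttr = ATTR_ID.matcher(attrs);
            if (byClass.find()) {
                // movieId_[ID] is present, so it wins.
                ids.add(Integer.parseInt(byClass.group(1)));
            } else if (byAttr.find()) {
                // Only fall back to movie[ID] when there is no movieId_[ID].
                ids.add(Integer.parseInt(byAttr.group(1)));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String html =
            "<video id=\"movie1\" class=\"hasAudio movieId_750\" src=\"/path/to/movie\"></video>"
          + "<video id=\"movie2\" class=\"clickInfo movieId_587\" src=\"/path/to/movie\"></video>"
          + "<video id=\"movie300\" src=\"/path/to/movie\"></video>";
        System.out.println(extractIds(html)); // prints [750, 587, 300]
    }
}
```

Because each tag contributes at most one ID, the movie[ID] value can no longer show up next to the movieId_[ID] value from the same tag.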
Additionally, I want to get only the matches that have the hasAudio class. Here is what I tried, without success:
.*?<object|<video.*?hasAudio.*movieId_(\\d+)|movieId_(\\d+).*hasAudio[^>]*>?.*?
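The hasAudio requirement could perhaps be handled by splitting the class attribute into tokens, so the order of hasAudio and movieId_[ID] does not matter. This is a rough sketch under the same assumptions as the sample HTML (double-quoted class attribute, no `>` inside attribute values); it only considers movieId_[ID], as in the attempt above, and the names `AudioMovieIdExtractor` and `extractAudioIds` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AudioMovieIdExtractor {
    // One opening <video ...> or <object ...> tag; group 1 is its attribute text.
    private static final Pattern TAG = Pattern.compile("<(?:video|object)\\b([^>]*)>");
    // The double-quoted class attribute; group 1 is the list of class names.
    private static final Pattern CLASS_ATTR = Pattern.compile("\\bclass=\"([^\"]*)\"");

    static List<Integer> extractAudioIds(String content) {
        List<Integer> ids = new ArrayList<>();
        Matcher tag = TAG.matcher(content);
        while (tag.find()) {
            Matcher cls = CLASS_ATTR.matcher(tag.group(1));
            if (!cls.find()) {
                continue; // no class attribute, so no hasAudio
            }
            boolean hasAudio = false;
            Integer id = null;
            // Check each class name on its own, so token order is irrelevant.
            for (String token : cls.group(1).trim().split("\\s+")) {
                if (token.equals("hasAudio")) {
                    hasAudio = true;
                } else if (token.matches("movieId_\\d+")) {
                    id = Integer.parseInt(token.substring("movieId_".length()));
                }
            }
            if (hasAudio && id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String html =
            "<video id=\"movie1\" class=\"movieBorder hasAudio movieId_750\" src=\"/path/to/movie\"></video>"
          + "<video id=\"movie2\" class=\"clickInfo movieId_587\" src=\"/path/to/movie\"></video>";
        System.out.println(extractAudioIds(html)); // prints [750]
    }
}
```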
Any help would be appreciated. Thanks!