The short version: we want a fast search in Umbraco that only returns media that has been picked somewhere.
We have an Umbraco 7.2.x site. We wanted to index all of the PDFs in our media library so we could search them, and we have that part figured out. The trick is that we only want the search to return PDFs that have been picked by content. The PDFs that aren't picked are old versions, and we don't want those showing up in the search results.
There's always the possibility of asking the client to go through the media and either flag old PDFs as unsearchable or delete them. We'd really rather not go that route. Someone will inevitably forget to flag a PDF, and it will end up being a big deal.
The other thing we'd like to avoid is hitting the database during the search. We want the search to be super fast. It would also be nice if the solution didn't add much time to reindexing all the media.
Some possible solutions we've thought of are:
- When a PDF media item is picked, update some index field for that media item, so that when we search we can filter on that field. I don't know how we would do this yet.
- Do something tricky with relations. I don't have a lot of experience with relations, and I don't know if there is a way to deal with them without hitting the database.
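
To make the first idea a bit more concrete, here is a rough sketch of the logic in plain Python. It is not Umbraco or Examine code, and every name in it (`collect_picked_media_ids`, `build_index_documents`, `search`, the `isPicked` field, the picker aliases) is made up for illustration. It also assumes picker values are stored as comma-separated media IDs, which may not hold for every picker type.

```python
def collect_picked_media_ids(content_nodes, picker_aliases):
    """Gather every media ID referenced by the given picker properties.

    Assumption: each picker value is a string of comma-separated numeric IDs.
    """
    picked = set()
    for node in content_nodes:
        for alias in picker_aliases:
            raw = node.get(alias) or ""
            for part in str(raw).split(","):
                part = part.strip()
                if part.isdigit():
                    picked.add(int(part))
    return picked


def build_index_documents(media_items, picked_ids):
    """Copy each media item and add a hypothetical 'isPicked' index field."""
    docs = []
    for item in media_items:
        doc = dict(item)
        doc["isPicked"] = "1" if item["id"] in picked_ids else "0"
        docs.append(doc)
    return docs


def search(docs, term):
    """Return only picked documents whose text contains the search term."""
    term = term.lower()
    return [
        doc for doc in docs
        if doc["isPicked"] == "1" and term in doc["text"].lower()
    ]
```

The point is that the "is this picked?" check happens once at index time, so the search itself only filters on an indexed field and never touches the database. What's still open is how to keep that field up to date when content is saved or a picker value changes.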
Does anyone have any cool ideas about how to do this?