0

We are trying to replace multiple character in property(string) that we query from database.

var list = _context.Users.Where(t => t.Enable).AsQueryable();
list = list.Where(t => t.Name.ToLower().Contains(searchValue));

Property Name should be without characters (.,-').

We have tried:

list = list.Where(t => t.Name.ToLower().Replace(".","").Replace(",","").Replace("-","").Contains(searchValue));

and it works like this, but we don't want to use replace multiple times.

Is there any other ways that works with IQueryable? Thanks.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Emily
  • 11
  • 4
  • 2
    https://stackoverflow.com/questions/1321331/replace-multiple-string-elements-in-c-sharp Does this helps? – aca Jul 11 '22 at 13:41
  • Nope. You should write query like this. Or use appropriate format for storing data. – Svyatoslav Danyliv Jul 11 '22 at 13:53
  • Clean up the data when you store it, not afterwards. What you try to do have to scan the entire table to calculate the text value for every single row. What you wrote is essentially `WHERE REPLACE(REPLACE(REPLACE(.......,field)))))) LIKE '%potato%'`. Each of those calls *and* the `LIKE '%%'` operator would force a full scan – Panagiotis Kanavos Jul 11 '22 at 13:59
  • 1
    Why are you using such code in the first place? What does the data look like? What you try to do is wrong and reducing the code needed won't make it better. *Maybe* you could use a [Full-Text](https://learn.microsoft.com/en-us/sql/relational-databases/search/full-text-search?view=sql-server-ver16) Search index and functions like [CONTAINS or FREETEXT](https://learn.microsoft.com/en-us/sql/relational-databases/search/query-with-full-text-search?view=sql-server-ver16) – Panagiotis Kanavos Jul 11 '22 at 14:03
  • How big is the list of users? If it's relatively small, you could call `AsEnumerable()` and then use regex. @PanagiotisKanavos Thanks for the information earlier, didn't realize I was dealing with EF at first. – Ryan Wilson Jul 11 '22 at 14:08
  • @RyanWilson doing that over and over would be a huge waste of IO and CPU that can be avoided by simply cleaning up the data, or storing a cleaned copy. That won't fix `LIKE`'s performance, but the server has far more power than a client. – Panagiotis Kanavos Jul 11 '22 at 14:10
  • @PanagiotisKanavos What I proposed would be loading the data from the DB once in the call to `AsEnumerable()` and using C# methods to reduce the list. I don't see how this would have anything to do with the DB's `LIKE` performance – Ryan Wilson Jul 11 '22 at 14:12
  • The server is far faster than any client (faster CPU, RAM, disks) and can perform a naive search faster than a client. It won't have to wait for a transfer over the network either. A client would be faster *only* if a special structure like a trie was used to accelerate substring searches. That structure would have to be updated every time the data changed though – Panagiotis Kanavos Jul 11 '22 at 14:14
  • A hacky quick fix would be to use a persisted computed column with the `REPLACE(...)` expressions. At least this would avoid performing the replacement every time and clean up the queries. It wouldn't fix `LIKE '%%'`'s bad performance though – Panagiotis Kanavos Jul 11 '22 at 14:15
  • @PanagiotisKanavos I'm not disagreeing with your proposal, only offering a solution that fits the requirements. – Ryan Wilson Jul 11 '22 at 14:16
  • In SQL Server 2017 and later the [TRANSLATE](https://learn.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql?view=sql-server-ver16) function can be used to replace multiple characters at the same time. There's no equivalent in LINQ though. `FromSqlRaw` or `FromSqlInterpolated` would have to be used as a starting point. – Panagiotis Kanavos Jul 11 '22 at 14:24
  • @PanagiotisKanavos I'm not sure that is true in today's world. Clients may often be much more powerful than a server, especially since their power is dedicated to a single user. Also, I don't think `TRANSLATE` can remove characters? – NetMage Jul 12 '22 at 20:56
  • Does your client have 1TB of RAM? Or 256GB if it's a small machine? Or 16 cores? Does it use parallelization to process large loads? Does it have table data already cached in memory?Precalculated B+-Tree indexes to speed lookups? Never mind the network delays and latency. No matter how fast your connection is, it's slower than the RAM channels. `TRANSLATE` can map multiple characters to the same single character that can be replaced. That's a trick used in a lot of SO answers – Panagiotis Kanavos Jul 13 '22 at 07:22
  • This might help https://stackoverflow.com/questions/7265315/replace-multiple-characters-in-a-c-sharp-string – simmyk Jul 14 '22 at 10:33

2 Answers2

1

We have decided to do it in database,SQL, creating View like this:

CREATE OR ALTER VIEW Users_View 
AS
SELECT Id,CreationDate, UserName =  REPLACE(TRANSLATE(Users.UserName, '_'',.-', '#####'), '#', '')
FROM Users;

and than we just do query on view, like this UserName is already without special characters.

Emily
  • 11
  • 4
  • The only issue is this will have to run the expression on every record each time the view is queried. You may want to consider using triggers and adding another username column if you have enough data. – NetMage Jul 12 '22 at 21:02
  • Triggers are needed if you want to clean up the data on insertion/update. If you want to preserve the original data you can create a computer, persisted column with that expression: `ALTER TABLE Users ADD UserNameForIndex AS REPLACE(...) PERSISTED` . The database will calculate and persist the result when the data changes. A persisted column can even be indexed too, `CREATE INDEX IX_Users_UserName on Users(UsernameForIndex)` – Panagiotis Kanavos Jul 13 '22 at 07:32
-2

Well, the first thing you want to do is add a column to your table called "CanonicalName"

Then your query becomes just:

var list = (from t in _context.Users
           where t.Enable && t.CanonicalName.Contains(searchValue)
           select t).ToList();

Now, you need to populate CanonicalName. Since you only have to do this once ever for each record, it doesn't have to be that efficient, but here goes:

public string Canonicalize(string str)
{
 var sb = new StringBuilder(str.Length);
 foreach(var c in str)
 {
     if (c == '.' || 
         c == ',' ||
         c == '-')    // you may wish to add others
             continue;
     c = Char.ToLower(c);
     sb.Append(c);
 }
return sb.ToString();
}

UPDATE: Since people need everything spelled out...

 foreach(var u in _context.Users)
        u.CanonicalName = Canonicalize(u.Name);
_context.SaveChanges();
James Curran
  • 101,701
  • 37
  • 181
  • 258
  • You never actually populate the new column. Also, I think you would want to use triggers in SQL to handle inserts and updates instead... – NetMage Jul 12 '22 at 21:00
  • It's an EF query `_context.Users`. This code can't be translated to SQL – Panagiotis Kanavos Jul 13 '22 at 07:33
  • @PanagiotisKanavos Because it needs to be done only once, it doesn't need to be translatable into SQL. You're probably better off doing it in C# even if it could be. – James Curran Jul 15 '22 at 13:16
  • @NetMage Why use a trigger to perform 10% of the work of an insert, when the other 90% is being done in code? Particularly when that 10% is a string operation which is best done in code anyway. – James Curran Jul 15 '22 at 13:17
  • Because you have to run the code every time you query, or modify every occurrence of insert/update in your program to update the new column. One of the points of using a database is to make the program and data more independent, you can add a new column, add insert and update triggers, and not change any code at all except the query that needs to use the new column. – NetMage Jul 15 '22 at 22:11
  • @NetMage No, You'd have to do it once when the column is added, and then a bit more work when a new record is added (which is largely done in code, and should happen only in one place), or if this column is modified (which should rarely happen for the Name column and is still done mostly in code) – James Curran Jul 16 '22 at 16:50