My application performs some text parsing and uses a proper-noun cache to reduce database calls:
Dictionary<String, ProperNoun> ProperNounsDict;
if (!ProperNounsDict.ContainsKey(word))
{
    var newProper = new ProperNoun() { Word = word };
    ProperNounsDict.Add(word, newProper);
    UnitOfWork.ProperNounRepository.Insert(newProper);
    try
    {
        UnitOfWork.SaveChangesEx();
    }
    catch (Exception ex)
    {
        // error handling omitted
    }
}
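A minimal, self-contained sketch of the cache check above, assuming a stripped-down ProperNoun class and leaving out the UnitOfWork calls (ProperNounCacheDemo and AddIfMissing are hypothetical names used only for illustration). It shows why both spellings get through the cache: a Dictionary<string, T> created without a comparer uses EqualityComparer<string>.Default, which compares strings ordinally, so "Saevarsson" and "Sævarsson" are two different keys.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's entity; only Word is modeled.
public class ProperNoun
{
    public string Word { get; set; }
}

public static class ProperNounCacheDemo
{
    // Mirrors the question's cache: no comparer is passed, so keys are
    // compared ordinally (UTF-16 code unit by code unit).
    public static readonly Dictionary<string, ProperNoun> ProperNounsDict =
        new Dictionary<string, ProperNoun>();

    // Returns true when the word was not cached yet and was added.
    public static bool AddIfMissing(string word)
    {
        if (ProperNounsDict.ContainsKey(word))
            return false;

        var newProper = new ProperNoun { Word = word };
        ProperNounsDict.Add(word, newProper);
        // The real code would call UnitOfWork.ProperNounRepository.Insert(newProper)
        // and UnitOfWork.SaveChangesEx() here; omitted in this sketch.
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AddIfMissing("Saevarsson")); // True
        Console.WriteLine(AddIfMissing("Sævarsson"));  // True: a different key ordinally
        Console.WriteLine(ProperNounsDict.Count);      // 2
    }
}
```

With both entries in the cache, both rows reach SaveChangesEx, and the second one is the one the database rejects as a duplicate.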
The problem is that the database and C# treat string equality differently, so I can run into a duplicate key error (SQL) for similar words:
1) Database (SQL Server 2014)
Column_name   Type       Collation
Word          nvarchar   Latin1_General_100_CS_AS
Saevarsson and Sævarsson are considered the same from the database's perspective, and that is fine for me, since words containing the character æ are very rare in the parsed texts:
select * from dict.ProperNoun where Word = N'Saevarsson' -- returns both Saevarsson and Sævarsson
2) C#
string s1 = "Sævarsson";
string s2 = "Saevarsson";
bool equals = s1.Equals(s2, StringComparison.InvariantCulture);
s1 and s2 are considered equal when the comparison is done using InvariantCulture.
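To make the mismatch concrete, the hypothetical ComparisonDemo below compares the same two strings three ways: culture-aware (as in the snippet above), ordinal, and with the comparer a Dictionary<string, T> uses by default. The ordinal results are deterministic. The culture-aware result depends on the globalization data the runtime uses (for example NLS on .NET Framework versus ICU on newer .NET versions), so its value is printed but not asserted here.

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonDemo
{
    public static (bool Invariant, bool Ordinal, bool DictionaryDefault) CompareAll(string s1, string s2)
    {
        // Culture-aware comparison, as in the question.
        bool invariant = s1.Equals(s2, StringComparison.InvariantCulture);

        // Ordinal comparison: compares UTF-16 code units, so 'æ' never matches "ae".
        bool ordinal = s1.Equals(s2, StringComparison.Ordinal);

        // The comparer Dictionary<string, T> falls back to when none is supplied.
        bool dictionaryDefault = EqualityComparer<string>.Default.Equals(s1, s2);

        return (invariant, ordinal, dictionaryDefault);
    }

    public static void Main()
    {
        var r = CompareAll("Sævarsson", "Saevarsson");
        Console.WriteLine($"InvariantCulture: {r.Invariant}");
        Console.WriteLine($"Ordinal: {r.Ordinal}");                     // False
        Console.WriteLine($"Dictionary default: {r.DictionaryDefault}"); // False
    }
}
```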
Question: is there a way to check for a string key's existence in an InvariantCulture way? If possible, I do not want to lose the O(1) complexity of the key-existence check.
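One thing worth trying, sketched here under assumptions: Dictionary<TKey, TValue> has a constructor that accepts an IEqualityComparer<TKey>, and StringComparer.InvariantCulture implements IEqualityComparer<string> with a GetHashCode that is consistent with its Equals. A dictionary built that way should treat two keys as the same exactly when s1.Equals(s2, StringComparison.InvariantCulture) does, while keeping average O(1) lookups. InvariantCacheDemo and CreateCache are hypothetical names, and int stands in for ProperNoun to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;

public static class InvariantCacheDemo
{
    // The dictionary uses the supplied comparer for both hashing and equality,
    // so keys that InvariantCulture considers equal land on the same entry.
    public static Dictionary<string, int> CreateCache() =>
        new Dictionary<string, int>(StringComparer.InvariantCulture);

    public static void Main()
    {
        var cache = CreateCache();
        cache.Add("Saevarsson", 1);

        // Should agree with the culture-aware Equals on the same runtime.
        bool found = cache.ContainsKey("Sævarsson");
        bool equal = "Sævarsson".Equals("Saevarsson", StringComparison.InvariantCulture);
        Console.WriteLine($"ContainsKey: {found}, Equals: {equal}");
    }
}
```

Two caveats: culture-aware hashing costs more per lookup than ordinal hashing, and SQL Server's Latin1_General_100_CS_AS collation and .NET's invariant culture are separate implementations, so they are not guaranteed to agree on every pair of strings.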
Things I have tried:
a) Database check: on a cache miss, also query the database before inserting into the cache. This generates a lot of queries, so performance is awful.
b) String normalization: replace undesired characters with "normal" ones using a map similar to this one. This requires a lot of work, and I feel it could be automated, since StringComparison.InvariantCulture already knows how to deal with this.
Thanks.