3

I'm building a simple chat server in java , where users can have private conversations with each other. I want to save those conversation at a server level(not on client side) so I can list them to the users as a conversation log service. I'm also using MySQL as a database in my software.

What I'm looking is an optimized way to save those conversations and also a fast way to list them later. So far I've thought on 2 implementations.

  1. using the MySQL database and

    a) save the conversation in a row as a text, but the problem is that some conversations are very huge(lots of chars) and I might have problems saving the entire conversation

    b) save every line of the conversation in a row, but this way speed problems can appear when I want to list the entire conversation

  2. saving every conversation in a separated text file, but I'm afraid read/write problems can appear, especially when the users(clients) are writing(sending text) very fast.

Thanks

Doua Beri
  • 10,612
  • 18
  • 89
  • 138
  • why "not on client side"? Is there a specific reason for not doing that? Are there any other requirements that a pure "user-friedly" service? BTW, what is huge? How is your server configured? – Caspar Kleijne Dec 28 '11 at 12:21
  • The most natural solution would be 1 (b) from your question. If 'conversations' occur between two users only, their IDs and an auto increment field can form a decent primary key. Or if you have a concept of a conversation (i.e. conversation = one chat session), then you use that as the ID. Why would this be slow? – ArjunShankar Dec 28 '11 at 12:22
  • I wouldn't worry too much about speed in 1b. Even if the conversation lasts for years with several hundred messages per day, chances that you would ever want to see all the lines at once are minimal. Even if the user opens a chat window with complete history you can easily prefetch just a few lines around the current viewport. MySQL will be able to get the lines much faster than any human can read them anyway. – Fredrik Dec 28 '11 at 12:22
  • @CasparKleijne - One reason could be: wanting to have access to the history irrespective of which client the user connects from. – ArjunShankar Dec 28 '11 at 12:24
  • @Caspar Kleijne there are multiple clients(desktop, browser, mobile), also the user can use different computers. – Doua Beri Dec 28 '11 at 14:05
  • @Fredrik actually this method is currently implemented and after 1 month of usage I have around 400k rows and the select is starting to get slow... and my service is currently in beta with just 30-40 different conversations/day – Doua Beri Dec 28 '11 at 14:07
  • 1
    @DouaBeri sounds like you either need to look at the slow query log to see if you're missing an index or think about if you're making reasonable selects. Selecting all the rows and showing 300 on the screen might not be the smartest thing to do... 400k rows isn't much. – Fredrik Dec 28 '11 at 15:03
  • @Fredrik 400k isn't much .. but we achieved this number in 1 month with only a small amount of beta testers and we intend to keep conversations log for more than 1 year. – Doua Beri Dec 28 '11 at 16:11
  • @Doua On first glance: You only select a handful lines with every select and the query should be indexed. Hence the size of the whole table shouldn't have much of an influence on the performance. (Well if your data gets too large to be hold in RAM all at once you HAVE problems, but then using a different db won't help with that) – Voo Dec 28 '11 at 16:15
  • @DouaBeri You must be doing it wrong, there is NO way you will ever show even a thousand lines at a time on a display so it really doesn't matter if you have 2 billions of lines or 1500. The amount you get from a single query should be the same (what is on display and a buffer on each side that is big enough to do smooth scrolling). Of course, if you do get the entire conversation in one single select you will have a lot of data to transport but doing it like that would just be wrong. – Fredrik Dec 28 '11 at 23:30
  • After 10 years, how was your experience? Was 1.b enough to keep chat perfomance acceptable? – Éder Rocha Apr 20 '21 at 19:55

2 Answers2

11

This is big architectural problem, you know. Companies like facebook and twitter spent lots of time and money to solve your problem in robust way. If your chat server is simple (As you've written), use 1.b way, but make an abstraction layer (something like saveConversation, getConversation). If in future speed wouldn't satisfy you, think about more efficient representation, like NoSQL database (LevelDB or something like this). Don't think about performance now, make a prototype with good abstraction and pluggable architecture.

korifey
  • 3,379
  • 17
  • 17
  • thanks for your answer. We're currently using 1.b already and even if we're on beta stage with a closed group of beta testers we have around 30-40 conversations/day and after a month the mysql server is starting to run a little slow. I'm thinking in the future to move the entire database to Apache Cassandra, but for the moment is to much to change and also I need to spend more money on hardware since apache cassandra is resource hungry – Doua Beri Dec 28 '11 at 14:16
  • 2
    @Doua The first thought there shouldn't be "Oh we should really use a NoSQL solution" but more "What in gods name are we doing wrong that a select of a few hundred lines of conversation (you really can't show your users much more at once) of a 400k row table takes so long?". So better go looking whether all the indizes are used correctly (and exist), you only select the data you really need and so on. Much one can do there. Asking a question here with the table schema, the query and an explain wouldn't harm too after you've made sure the obvious things are correct. – Voo Dec 28 '11 at 15:46
  • @Voo ok. maybe I've wrote a bad example. we achieved 400k rows in 1 months with a minimal numbers of clients(beta testers). once we open the service to public we expect to have 500-1000 or even more clients which will result in a few millions lines/month and probably a few hundred millions after 1 year. The table is very simple. it has 3 columns : conversation_id(int), message(varchar), add_date(datetime), conversation_id and add_date are indexed and the select is something like this: select message from conversation_log where conversation_id=1 order by add_date – Doua Beri Dec 28 '11 at 16:23
  • @Doua Assuming the indizes are used correctly, it shouldn't matter much how large the table is, since we would never do any full table scans. Personally the first thing I'd do is adding a limit there - why read thousands of lines in if you can only show your user a few hundred? (If he later wants to see more/older data just ask anew). Then check with explain what exactly MySQL is doing. Oh and then ask here again, because my knowledge about MySQL starts and ends with about `It's a DB - so is oracle... so I assume they work more or less the same?` ;) – Voo Dec 28 '11 at 16:33
  • This answer is so perfect it hurts – Nicolas Durán Jan 11 '17 at 20:52
-7

None of the proposed ways are the best optimized way to store it. The best way is to use strip segments of dynamic tables on the basis of conversations.

This is a prototype with the paradigms :-

  1. To do a conversation with multiple participants.
  2. Option to Acknowledge Chats.
  3. Option for Actions updates.
  4. Option to post Attachments.
  5. Option to share Map Coordinates like you current location.
  6. Option to Leave a chat.

Types

Coordinates(
  lat FLOAT( 10, 6 ) NOT NULL ,
  lng FLOAT( 10, 6 ) NOT NULL 
)

Schema

dbo.User(UserId,<..Details..>)

dbo.Conversation(ConversationId,Title,CreationDate,LastActivityDate)

Index : LastActivityDate, ConversationId

dbo.Participants(ConversationId,UserId)

collection of tables - Dynamically created

msg.msg_*ConversationId*(MessageId,FromUserId,Message,Attachment,LocationCoordinates,Action,LogDate,AcknowledgeDate)

Abstraction

usp_TouchMessage(@Which):
   if (not exists (select 1 from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA+'.'+TABLE_NAME = 'msg.msg_'+CAST(@Which as nvarchar(max)))
   begin
        @dynamic_sql = 'create table msg.msg_'+CAST(@Which as nvarchar(max))+' (MessageId uniqueidentifier,FromUserId uniqueidentifier,Message nvarchar(max),Attachment BLOB,LocationCoordinates Coordinates,Action nvarchar(max),LogDate DateTime, AcknowledgeDate DateTime);';
        exec(@dynamic_sql);
        @dynamic_sql = 'CREATE TRIGGER msg.trig_msg_'+CAST(@Which as nvarchar(max))+'  ON msg.msg_'+CAST(@Which as nvarchar(max))+' AFTER INSERT AS Delete From msg.msg_'+CAST(@Which as nvarchar(max))+' where MessageId in (Select MessageId,ROW_NUMBER() OVER (order by LogDate Desc,AcknowledgeDate Desc,MessageId Desc) RN from msg.msg_'+CAST(@Which as nvarchar(max))+') where RN>=5000';
        exec(@dynamic_sql);

     end

usp_GetParticipants(@Where) :
    Select User.UserId,User.FullName 
          from Participants 
                inner join Users on (
                        Participants.UserId = Users.UserId 
                    And Participants.ConversationId = @Where
                );


usp_AddParticipants(@Who, @Where) :
    insert into Participants(ConversationId,UserId) values (@Where, @Who);
    @action = ( select User.FullName + ' has joined the chat' from User where User.UserId = @Who );
    update Conversation set LastActivityDate = GETUTCDATE() where ConversationId = @conversation_id;
    exec usp_TouchMessage(@Where);
    @dynamic_sql = 'insert into msg.msg_'+CAST(@Where as nvarchar(max))+' (MessageId,FromUserId,Action,LogDate) values (NewId(),@Who,@Action,GETUTCDATE())';
    sp_executesql @dynamic_sql, N'@Who uniqueidentifier,@Action nvarchar(max)', @Who = @Who, @Action = @Action;


usp_GetConversationList() :
    With Participation As (
        Select User.UserId,User.FullName,Participants.ConversationId 
          from Participants 
                inner join Users on (Participants.UserId = Users.UserId)
    ) select Conversation.*,
            (STUFF((
                    SELECT ',' + CAST([FullName] AS nvarchar(max)) 
                      FROM Participation 
                     WHERE (Participation.ConversationId = Conversation.ConversationId) 
              FOR XML PATH ('')
             ),1,1,'')) ParticipantList
        from Conversation 
    order by Conversation.LastActivityDate;


usp_GetConversationById(@Which,@Lite = 0,@Page = 1) :
    Select User.UserId,User.FullName 
          from Participants
            inner join Users on (Participants.UserId = Users.UserId and Participants.ConversationId = @Which);
    @dynamic_sql = 'select * from
                        (select u.UserId,u.FullName,m.MessageId,'
                                +(case when @Lite=1 then 'm.Message,m.LocationCoordinates,m.Attachment,m.Action,' else '' end)
                                +'m.LogDate,m.AcknowledgeDate, ROW_NUMBER() Over (order by m.LogDate Desc,m.AcknowledgeDate Desc,m.MessageId Desc) RN
                          From msg.msg_'+CAST(@Which AS nvarchar(max))+' m 
                            inner join User u on (m.FromUserId = u.UserId) 
                        ) tmp 
                    where RN Between ((@Page-1)*20+1) AND (@Page*20+1)
                ';
    sp_executesql @dynamic_sql, N'@Page bigint', @Page = @Page;
    If @Page = 1 And @Lite=0
    begin
        @dynamic_sql = 'update msg.msg_'+CAST(@Which as nvarchar(max))+' Set AcknowledgeDate = GETUTCDATE() where AcknowledgeDate is null and FromUserId <> @Who';
        sp_executesql @dynamic_sql, N'@Who uniqueidentifier', @Who = @Who;
    end

usp_GetConversation(@Who,@WithWhome,@Lite = 0,@Page = 1) :
    @conversation_id = (
        Select top 1 ConversationId 
          from Participants self 
            inner join Participants partner 
                on ( self.ConversationId = partner.ConversationId and self.UserId = @Who and partner.UserId = @WithWhome)
            where (Select count(1) from Participants where ConversationId = self.ConversationId) = 2
    );  
    if(@conversation_id is not null)
    then
        exec usp_GetConversationById(@conversation_id, @Lite, @Page);
    end


usp_PostConversationById(@Who,@Which,@WhatMessage,@WhichLocation,@WhatAttachment) :
    update Conversation set LastActivityDate = GETUTCDATE() where ConversationId = @Which;
    exec usp_TouchMessage(@Which);
    @dynamic_sql = 'insert into msg.msg_'+CAST(@Which as nvarchar(max))+' (MessageId,FromUserId,Message,Attachment,LocationCoordinates,LogDate) values (NewId(),@Who,@WhatMessage,@WhichLocation,@WhatAttachment,GETUTCDATE())';
    sp_executesql @dynamic_sql, N'@Who uniqueidentifier,@WithWhome uniqueidentifier,@WhatMessage nvarchar(max),@WhichLocation Coordinates,@WhatAttachment BLOB', @Who = @Who, @WhatMessage = @WhatMessage, @WhichLocation = @WhichLocation, @WhatAttachment = @WhatAttachment;


usp_PostConversation(@Who,@WithWhome,@WhatMessage,@WhichLocation,@WhatAttachment) :
    @conversation_id = (
        Select top 1 ConversationId 
          from Participants self 
            inner join Participants partner 
                on ( self.ConversationId = partner.ConversationId and self.UserId = @Who and partner.UserId = @WithWhome)
            where (Select count(1) from Participants where ConversationId = self.ConversationId) = 2
    );  
    if(@conversation_id is not null)
    then
        @conversation_id = newid()
        insert into Conversation(ConversationId,CreationDate) values (@conversation_id,GETUTCDATE());
        exec usp_AddParticipants(@Who,@conversation_id);
        exec usp_AddParticipants(@WithWhome,@conversation_id);
    end
    exec usp_PostConversationById(@Who,@conversation_id,@WhatMessage,@WhichLocation,@WhatAttachment);


usp_UpdateConversationAlias(@Who,@Which,@WithWhat) :
    @action = ( select User.FullName + ' has changed title'+isnull(' from <b>'+Conversation.Title+'</b>','')+isnull(' to <b>'+@WithWhat+'</b>','') from User inner join Conversation on (Conversation.ConversationId = @Which and User.UserId = @Who));
    update Conversation set LastActivityDate = GETUTCDATE(), Title = @WithWhat where ConversationId = @conversation_id;
    if (not exists (select 1 from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA+'.'+TABLE_NAME = 'msg.msg_'+CAST(@Which as nvarchar(max)))
    exec usp_TouchMessage(@Which);
    @dynamic_sql = 'insert into msg.msg_'+CAST(@Which as nvarchar(max))+' (MessageId,FromUserId,Action,LogDate) values (NewId(),@Who,@Action,GETUTCDATE())';
    sp_executesql @dynamic_sql, N'@Who uniqueidentifier,@Action nvarchar(max)', @Who = @Who, @Action = @Action;


usp_LeaveConversation(@Who,@Which) :
    delete from Participants where ConversationId = @Where and UserId = @Who;
    if(not exists (Select 1 From Participants Where ConversationId = @Which))
    begin
        @dynamic_sql = 'drop table msg.msg_'+CAST(@Which as nvarchar(max))+';
        exec @dynamic_sql;
        delete from Conversation where ConversationId = @Which;
    end
    else
    begin
        @action = ( select User.FullName + ' has left the chat' from User where User.UserId = @Who );
        update Conversation set LastActivityDate = GETUTCDATE() where ConversationId = @conversation_id;
        exec usp_TouchMessage(@Which);
        @dynamic_sql = 'insert into msg.msg_'+CAST(@Which as nvarchar(max))+' (MessageId,FromUserId,Action,LogDate) values (NewId(),@Who,@Action,GETUTCDATE())';
        sp_executesql @dynamic_sql, N'@Who uniqueidentifier,@Action nvarchar(max)', @Who = @Who, @Action = @Action;
    end