[Answered ]-Django which is the correct way to store discussion inside db?

1đź‘Ť

âś…

So the answer to your question depends on how you plan on using the data in the future and what you need to do with it. It is entirely possible to store an entire conversation between N users in a columnar database such as Postgres as individual records per message. However, as with all programming questions, there are multiple paradigms to answer your question. I will explore the pros/cons of a couple of them here (with the knowledge that there are certainly more).

Paradigm 1 New record (row) per message

Pros:

  1. Simpler querying for individual messages.
  2. Analytical functions can easily be applied at a message level (i.e. summing number of messages by certain users)
  3. Record size is (relatively) small

Cons:

  1. Very long table sizes
  2. Searching becomes time consuming as table grows.
  3. Post-processing needed on a collection (i.e. All records from a conversation)
  4. More work is shifted to the server

Paradigm 2 New record (row) per conversation

Pros:

  1. Simpler querying for individual conversations
  2. Shorter table sizes
  3. Post-processing needed on an object (i.e. The entire conversation stored as a JSON object)

Cons:

  1. Larger row size that can grow substantially depending on the number and size of messages.
  2. Harder to query individual messages or text within messages (need to use more expensive functions such as LIKE % on blobs of text = slow)
  3. Less conducive to preforming any type of analytical function on messages.
  4. Messages become an append exercise
  5. More work is shifted to the client/application

Which is best? YMMV

Again, there are probably a half-dozen or so more ways you could store your application’s messages, and all depend on your downstream needs. Additionally, I would implode you to look into projects such as Apache Kafka which specialize in message publishing as potentially a scaleable, drop in solution.

👤Tony D

1đź‘Ť

Three recommendations:

  1. If you give PostgreSQL a decent amount of resources (say, an Amazon m3.large instance), then “a lot of rows” for a PostgreSQL database is around 100 million rows (depending). That’s not a limit, it’s just enough rows that you’ll have to spend some time working on performance. So assuming that chats average 100 messages, then that would be one million conversations. So having one row per message is not a performance problem at the scale you’re talking about.
  2. Don’t use a numerical PK as your main way of ordering conversations (you might still have one, Django likes having one). Do have a timestamptz column, which is how you reconstruct the order of conversations.
  3. Have a unique index on user, timestamptz (since a user can’t post two messages simultaneously), and another unique index on conversation, timestamptz (this will allow you to reconstruct conversations quickly).
  4. You should also have a table called “conversations” which summarizes conversation_id, list-of-users, because this will make it easy to answer the request “show me all my conversations”.

Does that answer your questions?

👤FuzzyChef

Leave a comment