I’d love to hear other people’s opinions on my posts. So I’m adding the ability to post and view comments to this blog.
There are solutions that streamline the integration of comments, with Disqus being the oldest service I know. However, I haven’t had good experiences with Disqus as a reader on other blogs. Disqus always took a very long time to load the comments, sometimes failing altogether, and their UI is also not quite to my taste.
Alternatives like giscus or utterances use GitHub to authenticate users and store the underlying comments data. But I don’t feel comfortable letting my website depend on an ad hoc solution like that. I want my website to last. Without a clear migration plan for existing comments, the longevity of these solutions becomes uncertain. Since GitHub didn’t create GitHub Discussions to be used in that way, they aren’t responsible for keeping things the way we want forever. Additionally, I aim to keep the comment creation process on my blog as easy as possible, requiring no login—just enter your name.
Given my existing API written in Rust (yes, I’m on the hype train too) and a PostgreSQL database used to track post views, I might as well make more use of that. So I decided to create my own comment platform. This means I have to address various aspects (this is not supposed to be an exhaustive list):
GET
/POST
commentsThis post will delve into the data modeling process, covering the data schema, data fetching, and the implementation to rank and fetch comments for display.
Modeling involves designing database schemas after researching a suitable model
for storage and easy querying. To sum up, I went for the adjacency list model.
Each comment is a record in the database containing comment data and a
parent_id
pointing to the comment it’s replying to. If parent_id
is NULL
,
it is a root comment. To simplify things, there’s no depth limit, and if
necessary, it’s easy to enforce this later at the application level. Below is
the schema in Prisma, along with a corresponding SQL migration script.
model Post {
id Int @id @default(autoincrement())
category String
slug String
title String
comments Comment[]
@@unique([category, slug], map: "posts_category_slug_unique")
@@map("posts")
}
model Comment {
id Int @id @default(autoincrement())
author_ip String
author_name String
author_email String?
content String
post_id Int
parent_id Int?
created_at DateTime @default(now())
upvote Int @default(0)
parent Comment? @relation("ChildComment", fields: [parent_id], references: [id])
comments Comment[] @relation("ChildComment")
post Post @relation(fields: [post_id], references: [id])
@@map("comments")
}
-- CreateTable
CREATE TABLE "posts" (
"id" SERIAL NOT NULL,
"category" TEXT NOT NULL,
"slug" TEXT NOT NULL,
"title" TEXT NOT NULL,
CONSTRAINT "posts_pkey" PRIMARY KEY ("id")
);
-- CreateTable
CREATE TABLE "comments" (
"id" SERIAL NOT NULL,
"author_ip" TEXT NOT NULL,
"author_name" TEXT NOT NULL,
"author_email" TEXT,
"content" TEXT NOT NULL,
"post_id" INTEGER NOT NULL,
"parent_id" INTEGER,
"created_at" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
"upvote" INTEGER NOT NULL DEFAULT 0,
CONSTRAINT "comments_pkey" PRIMARY KEY ("id")
);
-- CreateIndex
CREATE UNIQUE INDEX "posts_category_slug_unique" ON "posts"("category", "slug");
-- AddForeignKey
ALTER TABLE "comments"
ADD CONSTRAINT "comments_parent_id_fkey" FOREIGN KEY ("parent_id")
REFERENCES "comments"("id") ON DELETE SET NULL ON UPDATE CASCADE;
-- AddForeignKey
ALTER TABLE "comments"
ADD CONSTRAINT "comments_post_id_fkey" FOREIGN KEY ("post_id")
REFERENCES "posts"("id") ON DELETE RESTRICT ON UPDATE CASCADE;
To simplify things further, we’re just going to sort the comments by upvotes and timestamp, but the nested replies have to be sorted too. That means we have to sort every level of comment replies. It’s too complex (for me) to get a final result using only the database queries. So what we’re doing in SQL is to only sort the top-level comments for pagination, and then get all the child comments. This is Hacker News–like thread fetching. Sorting nested comments is deferred to the Rust API.
WITH RECURSIVE root_comments AS (
SELECT
comments.parent_id,
comments.id,
comments.author_name,
comments.content,
ARRAY [comments.id],
0,
comments.created_at,
comments.upvote
FROM comments
JOIN posts ON (posts.id = comments.post_id)
WHERE posts.category = 'blog'
AND posts.slug = $1
AND comments.parent_id IS NULL
ORDER BY comments.upvote DESC, comments.created_at
LIMIT $2 OFFSET $3
), t(parent_id, id, author_name, content, root, level, created_at, upvote) AS (
(
SELECT * FROM root_comments
)
UNION ALL
SELECT
comments.parent_id,
comments.id,
comments.author_name,
comments.content,
array_append(root, comments.id),
t.level + 1,
comments.created_at,
comments.upvote
FROM t
JOIN comments ON (comments.parent_id = t.id)
)
SELECT * FROM t
ORDER BY root;
This SQL snippet draws significant inspiration from a Stack Overflow response that I regrettably couldn’t locate. Executing the SQL query, we’ll have a list of sorted comments. Any comment is guaranteed to be placed after its parent comment in the list. The SQL query employs recursive common table expressions (CTEs) for this purpose, ensuring comments are ordered based on their hierarchical structure.
parent_id | id | author_name | content | root | level | created_at | upvote |
---|---|---|---|---|---|---|---|
NULL | 1 | wonrax | 1. | {1} | 0 | 2023-10-25 | 0 |
1 | 2 | wonrax | 1.1. | {1,2} | 1 | 2023-10-25 | 0 |
2 | 3 | wonrax | 1.1.1. | {1,2,3} | 2 | 2023-10-26 | 0 |
3 | 6 | wonrax | 1.1.1.1. | {1,2,3,6} | 3 | 2023-10-26 | 0 |
1 | 4 | wonrax | 1.2. | {1,4} | 1 | 2023-10-26 | 1 |
1 | 5 | wonrax | 1.3. | {1,5} | 1 | 2023-10-26 | 2 |
NULL | 7 | wonrax | 2. | {7} | 0 | 2023-10-26 | 0 |
7 | 8 | wonrax | 2.1. | {7,8} | 1 | 2023-10-26 | 0 |
Now we need to rank the children of the top-level comments at the application level. The end result we want is similar to the one above, but all the comments on the same level are also sorted by upvotes and timestamp.
Our recursive ranking function receives a list of comments as input and outputs a sorted list. Firstly, it will loop through:
Finding all children of the current comment—this is , where k is the
number of children of the current comment. We do this by incrementing the index
until we get a node whose level is less than or equal to the current node’s
level. Call this node next
and set it aside for now.
We then recursively call our function with all of its children, which returns the sorted children.
We skip to the next
node and repeat the process or stop if we reach the end
of the list.
After we sort all children of our top-level comments, we need to sort the
top-level comments themselves. While finding the next top-level nodes, we also
store the indices of these nodes in a separate list. Now we sort the top-level
nodes and attach their children to this new list—, where
is the number of top-level comments, according to Rust’s built-in
sort_unstable_by_key
function.
We finish the function by returning this new list.
The complexity of this algorithm is:
Which deduces to:
with being the depth of the reply tree. If is unlimited and we assume the worst case which the branching factor being a constant , the complexity is simplified to:
For this method, first we iterate through the list and construct a comment tree. This is with the help of a hash map.
Having the tree, we can just recursively sort the children at each node. This is also if the reply depth is unlimited and assuming the branching factor is a constant .
Finally, we turn the tree back into a list by performing a depth-first search, which has a complexity of .
The complexity of this algorithm is:
Which simplifies to:
It seems the second method could be a bit faster despite both being ,
and it’s easier to reason about and implement too. For the sake of learning
Rust, and to verify my calculation and reasoning, I implemented both solutions
and did a benchmark. Below is a table showing the time each implementation took
in average for an input of n
comments.
Number of comments | Iterative recursive | Intermediate tree |
---|---|---|
10 | 1.3687 µs | 1.6782 µs |
1000 | 252.51 µs | 201.47 µs |
10000 | 3.3283 ms | 1.9920 ms |
100000 | 61.469 ms | 27.919 ms |
For a relatively small number of comments, the iterative recursive implementation is faster probably due to the fact that it doesn’t need to construct a new tree and converting it back to list, which has more overhead than the algorithm itself. But as increases, we can see the intermediate tree implementation is the clear winner here.
Does this mean that my reasoning and calculations are correct? Not entirely, I’m afraid. But I feel more confident about my ability to approximate (or in academic terms, asymptotically analyze) the worst-case complexity of an algorithm. One thing that I wish I could do to make this post more complete is analyze the space complexity, or memory usage, of these two algorithms. But I’m too lazy.
I use the criterion crate for
benchmarking. I think both implementations can be further optimized if I really
try—like the iterative_recursive
method could use slices instead of Vec
so
it doesn’t need to allocate new ones. But unless this becomes an actual
problem, I won’t optimize it.
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use rand::Rng;
#[derive(Clone)]
struct Comment {
id: i32,
author_name: String,
content: String,
parent_id: Option<i32>,
created_at: chrono::NaiveDateTime,
upvote: i32,
depth: i32,
}
#[derive(Clone)]
struct CommentView {
id: i32,
author_name: String,
content: String,
parent_id: Option<i32>,
created_at: chrono::NaiveDateTime,
children: Option<Vec<Rc<RefCell<CommentView>>>>,
upvote: i32,
depth: usize,
}
pub fn criterion_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("blog_comments");
for i in [10, 1000, 10000, 100000].iter() {
let comments = generate_comments(*i, i / 100);
group.bench_function(BenchmarkId::new("iterative_recursive", i), |b| {
b.iter(|| iterative_recursive_sort(comments.clone()))
});
group.bench_function(BenchmarkId::new("intermediate_tree", i), |b| {
b.iter(|| intermediate_tree_sort(comments.clone()))
});
}
group.finish();
}
fn generate_comments(n: usize, max_depth: usize) -> Vec<Comment> {
let mut comments = vec![];
for i in 0..n {
let depth = rand::thread_rng().gen_range(0..max_depth + 1);
let comment = Comment {
id: i as i32,
author_name: "author".to_string(),
content: "content".to_string(),
parent_id: None,
created_at: chrono::offset::Local::now().naive_local(),
upvote: 0,
depth: depth as i32,
};
comments.push(comment);
}
comments
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
use std::{collections::HashMap, rc::Rc};
fn iterative_recursive_sort(comments: Vec<Comment>) -> Vec<CommentView> {
use std::borrow::BorrowMut;
sort_vec(comments.into_iter().map(Rc::new).collect())
.into_iter()
.map(|mut comment| CommentView {
id: comment.id,
author_name: comment.borrow_mut().author_name.to_owned(),
content: comment.borrow_mut().content.to_owned(),
parent_id: comment.parent_id,
created_at: comment.created_at,
children: None,
upvote: comment.upvote,
depth: comment.depth as usize,
})
.collect()
}
fn sort_vec(comments: Vec<Rc<Comment>>) -> Vec<Rc<Comment>> {
if comments.len() == 0 {
return vec![];
}
let mut result = Vec::with_capacity(comments.len());
// Map root comment id to its children
let mut top_level_comments_children: HashMap<i32, Vec<Rc<Comment>>> = HashMap::new();
// Because the comments are already sorted by id and depth, the first
// comment's depth is the most shallow depth
let root_depth = comments[0].depth;
let mut current_root_comment_id = comments[0].id;
for comment in comments {
if comment.depth > root_depth {
top_level_comments_children
.get_mut(¤t_root_comment_id)
.unwrap()
.push(comment.clone());
} else {
// Indicating we have reached a new root comment
current_root_comment_id = comment.id;
result.push(comment.clone());
top_level_comments_children.insert(comment.id, vec![]);
}
}
// By now, result only contains the root comments
result.sort_unstable_by_key(|k| (-k.upvote, k.created_at));
// Sort the children recursively
let mut curr = 0;
for top_level_comment in result.clone() {
let children = top_level_comments_children
.remove(&top_level_comment.id)
.unwrap();
let children_length = children.len();
let sorted_children = sort_vec(children);
// emplace the children back into the result array
result.splice(curr + 1..curr + 1, sorted_children.into_iter());
curr += children_length + 1;
}
result
}
use std::{cell::RefCell, collections::HashMap, rc::Rc};
fn intermediate_tree_sort(comments: Vec<Comment>) -> Vec<CommentView> {
let num_comments = comments.len();
let mut nested = flat_comments_to_tree(comments);
sort_tree(&mut nested);
let mut result: Vec<Rc<RefCell<CommentView>>> = Vec::with_capacity(num_comments);
for comment in nested {
depth_first_search(&comment, &mut result);
}
// children are not needed anymore
for comment in &mut result {
comment.borrow_mut().children = None;
}
result
.into_iter()
.map(|c| c.borrow_mut().to_owned())
.collect()
}
fn flat_comments_to_tree(comments: Vec<Comment>) -> Vec<Rc<RefCell<CommentView>>> {
let mut tree = HashMap::<i32, Rc<RefCell<CommentView>>>::with_capacity(comments.len());
let mut final_comments: Vec<Rc<RefCell<CommentView>>> = vec![];
for comment in comments {
let c = Rc::new(RefCell::new(CommentView {
id: comment.id,
author_name: comment.author_name,
content: comment.content,
parent_id: comment.parent_id,
created_at: comment.created_at,
children: None,
upvote: comment.upvote,
depth: comment.depth as usize,
}));
tree.insert(c.borrow().id, c.clone());
if let Some(parent_id) = c.borrow().parent_id {
// If this is a child comment, add it to its parent's children
let parent = tree.get(&parent_id);
if let Some(parent) = parent {
let mut mut_parent = parent.borrow_mut();
if let Some(children) = mut_parent.children.as_mut() {
children.push(c.clone());
} else {
let children = vec![c.clone()];
mut_parent.children = Some(children);
}
}
};
if comment.parent_id.is_none() {
final_comments.push(c.clone());
}
}
final_comments
}
fn sort_tree(comments: &mut Vec<Rc<RefCell<CommentView>>>) {
// sort the top level comments
comments.sort_unstable_by_key(|k| (-k.borrow().upvote, k.borrow().created_at));
// sort the children recursively
for comment in comments {
if let Some(children) = comment.borrow_mut().children.as_mut() {
sort_tree(children);
}
}
}
fn depth_first_search(
comment: &Rc<RefCell<CommentView>>,
mut result: &mut Vec<Rc<RefCell<CommentView>>>,
) {
result.push(comment.clone());
if let Some(children) = comment.borrow().children.as_ref() {
for child in children {
depth_first_search(child, &mut result);
}
}
}