I'm thinking of categorizing threads through #tagging. Right now it's not possible to query threads by topic, and I don't want to go all in on full-text or semantic search (yet, at least). Without reading up on any best practices for a system like this, I'm thinking of the following process:
- user submits post to thread (also what happens on thread creation)
- store thread and return to user, but also kick off a background job to process the text content and categorize the thread based on tags found in its posts
- add a (repeatable) c query parameter to /t that, if present, filters threads by category
- add an op query parameter to /t that defaults to any, but can also be set to all to only include threads that have all the selected categories
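A rough sketch of how the c and op parameters could combine, in Python. The function names, the dict shape of a thread, and the choice to fall back to any on an unknown op value are all assumptions made for illustration, not something the design above fixes.

```python
from urllib.parse import parse_qs


def parse_filter(query_string):
    """Read the repeatable `c` parameter and the `op` parameter from a query string."""
    params = parse_qs(query_string)
    categories = set(params.get("c", []))
    # Assumption: if `op` is repeated, the last value wins.
    op = params.get("op", ["any"])[-1]
    if op not in ("any", "all"):
        op = "any"  # Assumption: unknown values fall back to the default.
    return categories, op


def filter_threads(threads, categories, op="any"):
    """Filter threads (hypothetical dicts with a `categories` set) by category."""
    if not categories:
        return list(threads)  # No `c` parameter present: no filtering.
    if op == "all":
        # Keep threads that carry every selected category.
        return [t for t in threads if categories <= t["categories"]]
    # Default `any`: keep threads that share at least one selected category.
    return [t for t in threads if categories & t["categories"]]
```

With this shape, /t?c=devlog&c=rust returns threads tagged with either category, and adding op=all narrows the result to threads tagged with both.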
To avoid a very complicated parser to extract tags, I'll restrict it to:
- only try to extract tags from the last line
- the line must start with a '#' character
- tags are of the format #[a-z]{3,}
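The three restrictions above could be sketched like this in Python. The name extract_tags is hypothetical, and two details are assumptions the rules don't settle: trailing whitespace and blank lines are ignored when finding the last line, and tags are separated by whitespace, with tokens that don't match the pattern simply skipped.

```python
import re

# A tag is '#' followed by three or more lowercase ASCII letters.
TAG_RE = re.compile(r"#[a-z]{3,}")


def extract_tags(text):
    """Return the set of tag names (without '#') found on the last line of a post."""
    # Assumption: trailing whitespace and blank lines don't count as the last line.
    lines = text.rstrip().splitlines()
    if not lines:
        return set()
    last = lines[-1]
    # The line must start with '#', otherwise it is treated as ordinary text.
    if not last.startswith("#"):
        return set()
    # Assumption: tags are whitespace-separated; non-matching tokens are ignored.
    return {tok[1:] for tok in last.split() if TAG_RE.fullmatch(tok)}
```

Matching each whole token rather than searching within the line means something like #devLog is rejected outright instead of being read as #dev.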
I have to make some UI changes if I want to display tags on posts as links to a thread query that includes the given tag. I think it would be best to gather the tags at the top of the thread and display them under the thread ID and author heading. If I’m using # to denote tags, I’ll start using $ to denote thread and post IDs instead of # to avoid any confusion.
If I've done everything correctly, it should be possible to categorize threads now using the syntax described above. I dropped changing the symbol used to denote thread and post IDs, but might change that later.
#devlog
One UI thing I missed: don't include the thread-categories div if there are no categories to display.
Tweaked the UI a little. Removed the border between threads, and added a border-top on the categories div. That also removed the need to hide the categories div if it's empty. Also, my initial rough implementation of categorization took the entire thread and processed all posts every time. Now it just takes the post that has been written and processes that.
I think I can call this feature complete. The only thing I'm considering is whether I should remove the text line containing the tags from the post content after processing. Makes for a cleaner look. Or, I could keep it in the database and remove it when it's rendered. Either will work.
If I decide to remove it, I should take care to handle posts consisting of only tags correctly.
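A minimal sketch of the render-time variant in Python, where the stored content is left untouched and the tag line is dropped only for display. The names split_tag_line and render_body and the "(tags only)" placeholder are hypothetical. It also assumes a last line only counts as a tag line if it contains at least one valid tag, so a closing line like "#1 reason" is left alone.

```python
import re

# Same tag format as above: '#' followed by three or more lowercase letters.
TAG_RE = re.compile(r"#[a-z]{3,}")


def split_tag_line(text):
    """Split a post into (body, tag_line); tag_line is '' if there is none."""
    stripped = text.rstrip()
    # rpartition yields an empty body when the post is a single line.
    body, _, last = stripped.rpartition("\n")
    has_tag = last.startswith("#") and any(
        TAG_RE.fullmatch(tok) for tok in last.split()
    )
    if not has_tag:
        return text, ""
    return body.rstrip(), last


def render_body(text, placeholder="(tags only)"):
    """Return the text to display, with the tag line removed."""
    body, tag_line = split_tag_line(text)
    # A post consisting only of tags would otherwise render as an empty post.
    if tag_line and not body:
        return placeholder
    return body
```

Whether a tags-only post should show a placeholder, be hidden, or keep its tag line visible is a UI decision; the placeholder here just makes the edge case explicit.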