Structuring the code so that it stays easy to maintain requires some design effort. Heike Stephan and I started the reimplementation of TopicExplorer as part of the project for the course on foundations of web technology. The new implementation should be able to:
- load the topic data in JSON format from the existing TopicExplorer backend,
- show the topics as lists of words with a colored background, in the order precomputed from topic similarity,
- allow navigation of the precomputed topic hierarchy.
Instead of a fully navigable hierarchy, the first implementation (a demo) only uses colors to hint at which topics would be merged when moving upward in the hierarchy. Because the demo is compiled in debug mode, the full interaction history is recorded locally and can be saved to a file. Such a file is helpful for reporting bugs or requesting new features.
The code consists of three modules, one for each of the tasks listed above. The Json module has several record type aliases that model the nested data structure received from the backend. These records are converted into types that hold the data needed to implement the topic view module. The hierarchy module also converts the records from the Json module to extract the hierarchy data. However, it also holds the topic color, which actually belongs to the view. Implementing the rest of the fully navigable hierarchy would require the hierarchy types to access even more view information.
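To make the coupling concrete, here is a minimal sketch of the module split described above. It is written in Python purely for illustration and does not reproduce the demo's actual code. The JSON shape, the field names (`id`, `words`, `color`, `parent`) and the functions `decode_topics`, `to_view` and `to_node` are all hypothetical, because the real backend format is not shown here. The point to notice is the last type: `HierarchyNode` copies `color`, which is a view concern.

```python
import json
from dataclasses import dataclass
from typing import Optional


# "Json" module: a record mirroring one entry of a hypothetical backend payload.
@dataclass
class JsonTopic:
    id: int
    words: list[str]
    color: str
    parent: Optional[int]  # id of the node this topic is merged into; None at the top


def decode_topics(text: str) -> list[JsonTopic]:
    """Decode the hypothetical payload {"topics": [...]} into records."""
    return [
        JsonTopic(t["id"], t["words"], t["color"], t.get("parent"))
        for t in json.loads(text)["topics"]
    ]


# Topic view module: keeps only what rendering a topic needs.
@dataclass
class TopicView:
    id: int
    words: list[str]
    color: str


def to_view(t: JsonTopic) -> TopicView:
    return TopicView(t.id, t.words, t.color)


# Hierarchy module: extracts the hierarchy data, but also copies the color,
# a view concern that thereby leaks into the hierarchy types.
@dataclass
class HierarchyNode:
    id: int
    parent: Optional[int]
    color: str


def to_node(t: JsonTopic) -> HierarchyNode:
    return HierarchyNode(t.id, t.parent, t.color)


payload = '{"topics": [{"id": 1, "words": ["ball", "team"], "color": "#e41a1c", "parent": 3}]}'
topics = decode_topics(payload)
print(to_view(topics[0]))
print(to_node(topics[0]))
```

Every further view feature the hierarchy needs would add more fields like `color` to `HierarchyNode`. That is the growth the refactoring options below try to avoid.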
Two refactoring options seem possible:
- Option 1: The hierarchy type gets an additional type parameter that can hold topic records with any additional information.
- Option 2: The hierarchy type only knows a topic id. All linking to additional topic information happens outside the hierarchy module.
Option 1 has the benefit of direct access from a hierarchy node to additional topic information. Constructing the hierarchy would require two parameters: one for the hierarchy information and one for the values of the generic type parameter. Topic ids would be part of both parameters. All in all, this solution seems to lead to code in which the hierarchy is the central data structure.
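A minimal sketch of option 1, again in Python with hypothetical names (`Node`, `build_generic`). The node type has a type parameter `T` for arbitrary topic information. Construction takes two arguments: the hierarchy edges and the per-topic payloads. Both are keyed by topic id, which is the duplication mentioned above.

```python
from dataclasses import dataclass, field
from typing import Generic, Optional, TypeVar

T = TypeVar("T")  # any additional topic information, e.g. words and color


@dataclass
class Node(Generic[T]):
    topic_id: int
    info: T  # direct access from the node to the extra topic information
    children: list["Node[T]"] = field(default_factory=list)


def build_generic(
    edges: dict[int, Optional[int]],  # hierarchy information: child id -> parent id
    infos: dict[int, T],  # values for the type parameter, keyed by the same ids
) -> list[Node[T]]:
    """Build a forest of nodes; raises KeyError if an id is missing in `infos`."""
    nodes = {tid: Node(tid, infos[tid]) for tid in edges}
    roots: list[Node[T]] = []
    for tid, parent in edges.items():
        if parent is None:
            roots.append(nodes[tid])
        else:
            nodes[parent].children.append(nodes[tid])
    return roots


edges = {3: None, 1: 3, 2: 3}
infos = {1: "sports", 2: "politics", 3: "sports + politics"}
roots = build_generic(edges, infos)
print([child.info for child in roots[0].children])
```

In this sketch, any code that needs topic information alongside the hierarchy has to go through `Node[T]`. This illustrates why the hierarchy tends to become the central data structure.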
Option 2 would make the hierarchy module much smaller. All the code that connects a hierarchy to the rest of the application would live outside the hierarchy module. A lean hierarchy implementation would also make it easy to support several different hierarchies. This seems sensible because the current topic similarity, which is based on shared topic vocabulary, has a dual: a similarity based on topics appearing together in the same documents. Such co-occurrences connect topics whose vocabularies may differ. Rankings based on topic co-occurrence have already been used in SQL analytics. Thus, different topic hierarchies could be useful in real application scenarios.
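For comparison, here is a minimal sketch of option 2 with hypothetical names (`IdNode`, `build_id_tree`, `leaf_ids`, `top_words`, `cooccurrence`). The node stores only a topic id. Topic information lives in an ordinary dictionary outside the hierarchy, so two differently shaped hierarchies over the same topics can share it. The data and the merge-node ids are made up. The simple pair count in `cooccurrence` only illustrates the idea of a co-occurrence-based similarity; it is not the existing SQL analytics.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional


# Hierarchy module: a node knows nothing but its topic id.
@dataclass
class IdNode:
    topic_id: int
    children: list["IdNode"] = field(default_factory=list)


def build_id_tree(edges: dict[int, Optional[int]]) -> list[IdNode]:
    """Build a forest from child id -> parent id edges (None marks a root)."""
    nodes = {tid: IdNode(tid) for tid in edges}
    roots: list[IdNode] = []
    for tid, parent in edges.items():
        if parent is None:
            roots.append(nodes[tid])
        else:
            nodes[parent].children.append(nodes[tid])
    return roots


def leaf_ids(node: IdNode) -> list[int]:
    """Topic ids of all leaves below (or at) this node, left to right."""
    if not node.children:
        return [node.topic_id]
    return [tid for child in node.children for tid in leaf_ids(child)]


# Outside the hierarchy module: topic information keyed by id.
topic_words = {1: ["ball", "team"], 2: ["vote", "party"], 3: ["team", "coach"]}


def top_words(node: IdNode, words: dict[int, list[str]]) -> list[str]:
    """First word of every leaf topic below the node, looked up by id."""
    return [words[tid][0] for tid in leaf_ids(node)]


# Input for a dual similarity: how often two topics appear in the same document.
def cooccurrence(doc_topics: list[set[int]]) -> Counter:
    counts: Counter = Counter()
    for topics in doc_topics:
        counts.update(combinations(sorted(topics), 2))
    return counts


# Two hypothetical hierarchies over the same topics (10, 11, 20, 21 are merge nodes).
by_vocabulary = build_id_tree({10: None, 1: 10, 3: 10, 11: None, 2: 11})
by_cooccurrence = build_id_tree({20: None, 2: 20, 3: 20, 21: None, 1: 21})
print(top_words(by_vocabulary[0], topic_words))
print(top_words(by_cooccurrence[0], topic_words))
print(cooccurrence([{1, 2}, {2, 3}, {1, 2, 3}]))
```

Neither hierarchy knows about `topic_words`. Swapping one hierarchy for the other changes only which ids are grouped, not the code that looks up topic information.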
In the next part, we explore the impact of option 2 on the overall implementation.