Because of the benefits of Elm, a programming language that compiles to JavaScript, we plan to reimplement the TopicExplorer web interface in Elm. We discuss the reimplementation in a series of blog posts. In this first part, we show a demo and discuss the code of the topic navigator sub-interface.
The current implementation of the TopicExplorer web client uses JavaScript in combination with the KnockoutJS library, which is similar to AngularJS. This approach uses the model-view-controller concept with HTML templates and RequireJS modules. It succeeds at decoupling different parts of the code and taming the complexity of the overall implementation. However, the use of JavaScript makes it hard for developers who are new to the project (e.g. bachelor students) to contribute without breaking existing code.
The Elm programming language dramatically reduces the mental workload of a programmer who extends or refactors a large project. It achieves this through the Elm-to-JavaScript compiler, which extensively checks the code base for a wide variety of errors. The compiler catches simple bugs such as typos in variable names as well as complicated corner cases in type logic. To make this possible, Elm is a statically typed functional programming language with roots in ML and Haskell, and all of its values are immutable.
Some design effort is required to structure the code so that it is easy to maintain. Heike Stephan and I started the reimplementation of TopicExplorer as a project for the course on foundations of web technology. The reimplementation should be able to
- load the topic data in JSON format from the existing TopicExplorer backend,
- show the topics as lists of words with a colored background, in the order that is precomputed based on topic similarity,
- allow navigation of the precomputed topic hierarchy.
The first implementation (demo) does not yet feature a fully navigable hierarchy. It only uses colors to hint at which topics would be merged when moving upwards. As the demo code is compiled in debug mode, the full interaction history is recorded locally and can be saved to a file. Such a file would be helpful for reporting bugs or requesting new features.
The code consists of three modules, one for each of the tasks listed above. The Json module has several record type aliases that model the nested data structure received from the backend. These records are converted to types that hold the data needed to implement the topic view module. The hierarchy module also converts the records from the Json module, in its case to extract the hierarchy data. However, it also holds the topic color, which is actually part of the view. Implementing the rest of the fully navigable hierarchy would require the hierarchy types to access even more view information.
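To make the role of the Json module more concrete, here is a minimal sketch of how record type aliases and decoders for nested topic data could look. It is not the actual TopicExplorer code: the module name, the record names `TopicRecord` and `WordRecord`, and the JSON field names `id`, `words`, `label` and `relevance` are assumptions made for illustration.

```elm
module TopicJsonSketch exposing (..)

import Json.Decode as Decode exposing (Decoder)


-- Hypothetical shape of one word inside a topic.
-- The field names are assumptions, not the real backend format.
type alias WordRecord =
    { label : String
    , relevance : Float
    }


-- Hypothetical shape of one topic: an id plus its nested word list.
type alias TopicRecord =
    { id : Int
    , words : List WordRecord
    }


-- Decodes an object such as {"label": "elm", "relevance": 0.5}.
wordDecoder : Decoder WordRecord
wordDecoder =
    Decode.map2 WordRecord
        (Decode.field "label" Decode.string)
        (Decode.field "relevance" Decode.float)


-- Decodes one topic object, reusing wordDecoder for the nested list.
topicDecoder : Decoder TopicRecord
topicDecoder =
    Decode.map2 TopicRecord
        (Decode.field "id" Decode.int)
        (Decode.field "words" (Decode.list wordDecoder))


-- Decodes a whole response, assumed here to be a JSON array of topics.
decodeTopics : String -> Result Decode.Error (List TopicRecord)
decodeTopics json =
    Decode.decodeString (Decode.list topicDecoder) json
```

If the backend response does not match these decoders, `decodeTopics` returns an `Err` value that the caller has to handle explicitly, so malformed data cannot silently reach the view.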
Two refactoring options seem possible:
- Option 1: The hierarchy type gets an additional type parameter that can hold topic records with any additional information.
- Option 2: The hierarchy type just knows about a topic ID. All the linking to additional topic information must be done elsewhere, outside the hierarchy module.
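The two options can be sketched as Elm types. This is only an illustration under assumptions: the names `HierarchyOf`, `Hierarchy`, `TopicId` and `topicIds` are hypothetical, and we assume that every node, including a merged inner node, has its own topic ID.

```elm
module HierarchySketch exposing (..)


-- Assumption: topics are identified by integers.
type alias TopicId =
    Int


-- Option 1: the hierarchy is generic in the data stored at each node,
-- so a node can carry a complete topic record chosen by the caller.
type HierarchyOf topic
    = NodeOf topic (List (HierarchyOf topic))


-- Option 2: the hierarchy stores nothing but topic ids.
type Hierarchy
    = Node TopicId (List Hierarchy)


-- Lists all topic ids of an option-2 hierarchy in depth-first order.
topicIds : Hierarchy -> List TopicId
topicIds (Node id children) =
    id :: List.concatMap topicIds children
```

With the first type, a value such as `NodeOf { id = 1, color = "red" } []` keeps view data inside the tree. With the second type, the tree stays small and the color has to come from somewhere else.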
Option 1 has the benefit of direct access from a hierarchy node to additional topic information. The hierarchy construction would need two parameters, one for the hierarchy information and one for the generic type parameter. Topic IDs would be part of both parameters. All in all, this solution seems to lead to code in which the hierarchy is a central data structure.
Option 2 would make the hierarchy module much smaller. All the code that connects a hierarchy to the rest would live outside the hierarchy module. A lean hierarchy implementation would also allow several different hierarchies to coexist. This seems plausible because the current topic similarity, which is based on similarities in the topic vocabulary, has a dual: a topic similarity based on common appearances of topics in documents. Common appearances of topics in documents lead to connections between topics with potentially different vocabulary. Rankings based on topic co-occurrences have already been used in the form of SQL analytics. Thus, different topic hierarchies could be useful in real application scenarios.
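A minimal sketch of how option 2 could link a lean hierarchy to view data kept elsewhere, here in a `Dict` keyed by topic ID. All names (`TopicInfo`, `topicInfos`, `infos`, `byVocabulary`, `byCoOccurrence`) and the example values are hypothetical and only serve to illustrate the idea that several hierarchies can share one set of topic data.

```elm
module TopicLinkSketch exposing (..)

import Dict exposing (Dict)


type alias TopicId =
    Int


-- The lean hierarchy of option 2: nodes hold nothing but a topic id.
type Hierarchy
    = Node TopicId (List Hierarchy)


-- View-related data lives outside the hierarchy (assumed fields).
type alias TopicInfo =
    { color : String
    , words : List String
    }


-- Collects the info of every node in depth-first order.
-- Ids without an entry in the dictionary are skipped.
topicInfos : Dict TopicId TopicInfo -> Hierarchy -> List TopicInfo
topicInfos lookup (Node id children) =
    let
        rest =
            List.concatMap (topicInfos lookup) children
    in
    case Dict.get id lookup of
        Just info ->
            info :: rest

        Nothing ->
            rest


-- Made-up example data shared by both hierarchies below.
infos : Dict TopicId TopicInfo
infos =
    Dict.fromList
        [ ( 1, { color = "#e41a1c", words = [ "elm", "compiler" ] } )
        , ( 2, { color = "#377eb8", words = [ "json", "decoder" ] } )
        ]


-- Two different hierarchies over the same topics.
byVocabulary : Hierarchy
byVocabulary =
    Node 1 [ Node 2 [] ]


byCoOccurrence : Hierarchy
byCoOccurrence =
    Node 2 [ Node 1 [] ]
```

Both `topicInfos infos byVocabulary` and `topicInfos infos byCoOccurrence` use the same dictionary, so adding another hierarchy does not duplicate any topic data. The price is that every access to view data goes through a lookup that may fail, which the caller has to handle.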
In the next part, we explore the impact of option 2 on the overall implementation.