Full Text vs. Derived Text Format: A Systematic Evaluation of Topic Modeling Performance on Different Text Formats with Python
Derived text formats offer the potential to create and publish generally available corpora that are largely unproblematic under copyright law. This master's thesis formulates and tests hypotheses about the suitability of these text formats for topic modeling. To this end, a pipeline written in Python is implemented that converts the full text step by step into several text formats and generates topic models from them. The topics are then evaluated by computing and comparing their coherence scores. For the sake of reproducibility, the corpus consists of public-domain English novels from the 19th and 20th centuries.
PDF available on Zenodo
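The idea of a derived text format can be illustrated with a minimal sketch. It assumes one particular derived format, per-segment term frequencies with word order discarded; the thesis does not name its formats here, so the function names, the segment size and the sample sentence are illustrative assumptions, not the author's pipeline.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def segment(tokens, size):
    """Split a token list into consecutive segments of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def to_bag_of_words(segments):
    """Assumed derived format: term frequencies per segment, word order discarded."""
    return [Counter(seg) for seg in segments]

# Sample sentence chosen for illustration only.
full_text = "It was the best of times, it was the worst of times."
tokens = tokenize(full_text)
bags = to_bag_of_words(segment(tokens, size=6))

for bag in bags:
    print(sorted(bag.items()))
```

A format like this no longer allows the original text to be read, yet it still carries the word co-occurrence information that topic models rely on.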
XML & Machine Learning
Application Potential of Machine Learning Processes for XML-Standardized Measurement and Test Data
The aim of this thesis is to examine the existing data acquisition of a real production facility, to evaluate its potential for applying machine learning (ML), and to document the methodology used so that it can be reused for future evaluations of similar problems. The focus is on processing XML datasets of measurement and test data from production. The thesis covers the following aspects:
· Development of a standardized methodology for processing and evaluating measurement and test data from XML datasets with regard to their use in ML methods
· Presentation of the existing data acquisition and processing within the use case
· Examination and testing of the available datasets for information gain through ML
XML & RDF
Combining graph and tree: writing SHAX, obtaining SHACL, XSD and more
The Shapes Constraint Language (SHACL) is a data modeling language for describing and validating RDF data. This paper introduces SHAX, which is an XML syntax for SHACL. SHAX documents are easy to write and understand. They can be translated not only into executable SHACL, but also into XSD describing XML data equivalent to the RDF data constrained by the SHACL model. Similarly, SHAX can be translated into JSON Schema describing a JSON representation of the data. SHAX may thus be viewed as an abstract data modeling language, which does not prescribe a concrete representation language (RDF, XML, JSON, …), but can be translated into concrete models validating concrete model instances.
RDFe – expression-based mapping of XML documents to RDF triples
RDFe is an XML language for mapping XML documents to RDF triples. The name suffix “e” stands for expression and hints at the key concept, which is the use of XPath expressions mapping semantic relationships between RDF subjects and objects to structural relationships between XML nodes. More precisely, RDF properties are represented by XPath expressions which are evaluated in the context of an XML node representing the triple subject and which yield XDM value items representing the triple object. The expressiveness of XPath version 3.1 enables the semantic interpretation of XML resources of any structure and content. Required XPath expressions can be simplified by the definition of a dynamic context whose variables and functions are referenced by the expressions. Semantic relationships can span document boundaries, and new XML document URIs can be discovered in the content of input documents, so that RDFe is capable of gleaning linked data. As XPath extension functions may support the parsing of non-XML resources (JSON, CSV, HTML), RDFe can also be used for mapping mixtures of XML and non-XML resources to RDF graphs.
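The core idea, a property expressed as a path evaluated in the context of the subject node, can be sketched in Python. This is not RDFe syntax: the mapping dictionary, the example.org URIs and the sample document are hypothetical, and the standard library's ElementTree supports only a small XPath subset rather than XPath 3.1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
XML = """
<library>
  <book id="b1"><title>Emma</title><author>Jane Austen</author></book>
  <book id="b2"><title>Dracula</title><author>Bram Stoker</author></book>
</library>
"""

# Hypothetical mapping: which nodes are subjects, how subject URIs are built,
# and which path (relative to the subject node) yields each property's objects.
MAPPING = {
    "subjects": ".//book",
    "uri": "http://example.org/book/{id}",
    "properties": {
        "http://example.org/title": "title",
        "http://example.org/author": "author",
    },
}

def map_to_triples(root, mapping):
    """Evaluate each property path in the context of each subject node."""
    triples = []
    for node in root.findall(mapping["subjects"]):
        subject = mapping["uri"].format(id=node.get("id"))
        for prop, path in mapping["properties"].items():
            for obj in node.findall(path):
                triples.append((subject, prop, obj.text))
    return triples

triples = map_to_triples(ET.fromstring(XML.strip()), MAPPING)
for triple in triples:
    print(triple)
```

Because the objects are found by navigating from the subject node, the mapping stays declarative: adding a property means adding one path, not new traversal code.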
Java Integration of XQuery - an Information Unit-Oriented Approach
An infrastructure for integrating XQuery into Java systems is described. The infrastructure comprises a new API (XQJPLUS, built on the standard API XQJ) and a tool for Java code generation. The basic idea of the approach is to deliver query results not in terms of query result items, but in terms of “information units”, ready-to-use entities assembled from the result items. The assembly process is guided by control information embedded into the query result, so that the query controls exactly what will be delivered, and in which form. Information units can represent information in a great variety of forms, including many map types and custom objects. The information units produced by a query are collected into a special container ("info tray") which offers name-based, intuitive access to the units. The query-specific structure of an info tray may be formally defined by a tray schema from which an "info shape" can be generated, a Java class representing a specific kind of info tray and offering compiler checked data access. Info trays also support data integration, as their possibly very heterogeneous contents can be addressed in a uniform way, using path-like expressions.
XQuery as a data integration language
The appropriateness of the XQuery language for data integration is explored. The starting point is an assessment of integration capabilities in an XML-only environment. The next step is an evaluation of the degree to which one may extend these capabilities to heterogeneous environments with multiple media types and various data access protocols. This leads to the identification of a key challenge, which is the structured representation of non-XML data formats by items of the XQuery data model. The current support for such representation is reviewed, and a conceptual base is proposed for modeling the relationship between data model items and instances of non-XML formats. As special facets of data integration, the roles of REST and RDF in XQuery-based integration are discussed, and general limitations of XQuery as an integration language are acknowledged.
XQuery topic tools - concept, user interface, development framework
This paper defines the concept of topic tools, which are command line tools providing a single point of access to a range of functionality. Topic tools conform to a generic model of invocation syntax and basic tool behaviour, concerning user assistance, error diagnostics and invocation reuse. The paper proposes a comprehensive model of the user perspective (syntax and behaviour) and introduces a simple development framework that makes the creation of XQuery topic tools simple and fast. The support offered by the framework includes code generation and the use of a message interface which cleanly isolates the application code from user input and gives it access to validated and augmented information, rather than the raw data of user input. Key properties of framework-based topic tools are early availability, extensibility, user convenience, behavioural consistency and reliability based on very thorough and fully automated input validation.
FOXpath - an expression language for selecting files and folders
A new expression language (FOXpath, short for folder XPath) enables XPath-like addressing of files and folders in a file system. The first version of the language is a modified copy of XPath 3.0, with node navigation removed and file system navigation added. The language is based on the data model XDM 3.0, without assuming any modifications of the model. In a second step, the language was merged back into XPath 3.0, resulting in FOXpath 3.0, which is a superset of XPath 3.0. The new expression language supports node navigation, file system navigation and a free combination of both functionalities within a single path expression. A reference implementation is described, and the possibility of extending the new functionality beyond file systems is discussed.
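The combination of file system navigation and node navigation can be approximated in Python, as a rough sketch only. It does not use FOXpath syntax or its reference implementation: the helper `select_nodes`, the file names and the sample content are hypothetical, with a glob pattern standing in for the file system step and an ElementTree path for the node step.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def select_nodes(root_dir, file_pattern, node_path):
    """Return (file name, element text) pairs for nodes matched inside matched files."""
    results = []
    # File system step: select files anywhere below root_dir by name pattern.
    for path in sorted(Path(root_dir).rglob(file_pattern)):
        # Node step: continue the navigation inside each selected document.
        root = ET.parse(str(path)).getroot()
        for node in root.findall(node_path):
            results.append((path.name, node.text))
    return results

# Build a small throwaway folder tree to navigate.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a").mkdir()
    (base / "a" / "one.xml").write_text("<doc><title>First</title></doc>", encoding="utf-8")
    (base / "two.xml").write_text("<doc><title>Second</title></doc>", encoding="utf-8")
    (base / "notes.txt").write_text("not XML", encoding="utf-8")
    found = select_nodes(base, "*.xml", "title")

print(found)
```

In this sketch the two steps are separate function arguments; the point of the language described above is that both kinds of step can appear in one path expression.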
FOXpath navigation of physical, virtual and literal file systems
The FOXpath language extends the XPath language by adding support for file system navigation. This paper explores how to extend file system navigation beyond physical file systems to include logical file systems such as jar files, SVN repositories or GitHub projects. The extension is based on a set of simple concepts related to URIs and their processing, and it is implemented as a FOXpath processor which supports the navigation of physical and various types of logical file systems.
XML Technologies - miscellaneous
The XML info space
XML-related standards imply an architecture of distributed information which integrates all accessible XML resources into a coherent whole. Attempting to capture the key properties of this architecture, the concept of an info space is defined. The concept is used as a tool for deriving desirable extensions of the standards. The proposed extensions aim at a fuller realization of the potential offered by the architecture. Main aspects are better support for resource discovery and the integration of non-XML resources. If not adopted by standards, the extensions may also be emulated by application-level design patterns and product-specific features. Knowledge of them might therefore be of immediate interest to application developers and product designers.
XDML - an extensible markup language and processor for XDM
XDML is a set of rules for building XDM values that are more useful entities than ordinary XDM values. The key idea is to insert into the XDM values control information which guides the interpretation and processing of the data. In particular, it structures the XDM value into named parts and associates these parts with metadata. The control information is evaluated by an XDML processor, which reports and processes the data accordingly. The processing of a part is organized as the execution of operations which the control data bind to the part, but whose actual invocation depends on API calls of the XDML user. The bindings are represented by request messages which encode the actual input to operations selected from an extensible library of available "XDML operations". The operation bindings of a part can be regarded as a specific interface dynamically attached to the data of the part. The net result of this approach is to enable the creation of self-describing XDM values: they encode how they are presented to applications, as well as how they should or might be processed. This means that the XDM producer, e.g. an XQuery program, can emit "rich" data whose downstream processing is significantly simplified.
From XML to UDL: a unified document language, supporting multiple markup languages
A proposal is made for extending the XML node model so that it is compatible with JSON markup as well as XML markup. As XML processing technology (XPath, XQuery, XSLT, XProc) sees instances of the node model, but does not see syntax, it is thus enabled to handle JSON as well as XML. The extended node model is dubbed a Unified Document Language, as it defines the construction of documents from building blocks (nodes) which can be encoded in various markup languages (XML, JSON, HTML).
Node search preceding node construction - XQuery inviting non-XML technologies
We propose an approach for complementing XPath navigation with a node search which does not require node construction. Node search is based on a set of external properties (a “p-face”) which a node may assume in the context of a node collection. Being external, these properties can be retrieved without node construction, and being stored outside the nodes they can be maintained and queried by non-XML technologies, e.g. relational and NoSQL databases. A small set of concepts, carefully aligned with the XQuery data model, allows the seamless integration of various non-XML technologies driving node selection, without introducing any dependencies of XQuery code on any particular technology. A first implementation of the concepts is presented.
Download: node search.pdf
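A minimal sketch of searching by external properties, assuming a relational store: the table name `pface`, its columns and the sample property values are hypothetical and not taken from the paper. The search returns document URIs without parsing any document, so node construction can be deferred to the hits.

```python
import sqlite3

# Hypothetical external property store: one row per (node, property name, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pface (node_uri TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO pface VALUES (?, ?, ?)",
    [
        ("doc1.xml", "year", "2012"),
        ("doc1.xml", "topic", "xquery"),
        ("doc2.xml", "year", "2014"),
        ("doc2.xml", "topic", "xquery"),
        ("doc3.xml", "year", "2014"),
        ("doc3.xml", "topic", "rdf"),
    ],
)

def search_nodes(conn, **criteria):
    """Return URIs of nodes whose external properties match all criteria."""
    uris = None
    for name, value in criteria.items():
        rows = conn.execute(
            "SELECT node_uri FROM pface WHERE name = ? AND value = ?",
            (name, value),
        )
        matched = {row[0] for row in rows}
        # Intersect the matches so that every criterion must hold.
        uris = matched if uris is None else uris & matched
    return sorted(uris or [])

hits = search_nodes(conn, year="2014", topic="xquery")
print(hits)
```

Only the documents named in `hits` would then need to be parsed, which is where the saving over navigating all constructed nodes comes from.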
Location trees enable XSD based tool development
Conventional use of XSD documents is mostly limited to validation, documentation and the generation of data bindings. The possibility of additional uses is little considered. This is probably due to the difficulty of processing XSD, caused by its arcane graph structure. An effective solution might be a generic transformation of XSD documents into a tree-structured representation, capturing the model contents in a transformation-friendly way. Such a tree-structured schema derivative is offered by location trees, a format defined in this paper and generated by an open-source tool. Location trees are intended as an intermediate format to be transformed into interesting artifacts. To borrow an image from chemistry, location trees can play the role of a catalyst, dramatically lowering the activation energy required to transform XSD into valuable substances. Apart from this capability, location trees are composed of a novel kind of model components inviting the attachment of metadata. The resulting metadata trees enable innovative tools, including source code generators. A few examples illustrate the new possibilities, tentatively summarized as XSD based tool development.
Rethinking transformation – the potential of code generation
A code generator for document to document transformation is introduced. It reduces the development effort to editing a set of metadata items attached to a tree model of the target documents. Metadata values are XQuery expressions which are typically so simple that they do not require genuine programming skills. Nevertheless, expressions are more difficult to provide than static values, and therefore possibilities of further simplifying the development task are explored, striving to enable subject matter experts to define the transformation without writing XQuery expressions. This can be achieved by generating the expressions from assertions about alignments between source and target nodes, although specific requirements will often necessitate additional information. As alignments can be represented graphically by connecting lines, the approach amounts to a solid conceptual foundation for graphical mapping tools. Finally, the underlying model of code generation driven by target document structure is generalized into a conceptual framework which is not restricted to XML data sources. Its usefulness is demonstrated by a simple code generator for transforming RDF data into XML documents.
Download: rethinking transformation.pdf
News on Project Management
Differences between the classical and the agile world
Many texts deal with the difference between agile and classical project management. Here is one more on the topic. In it I try to illustrate the differences on two levels: the level of attitude (mindset) and the level of theory.
Video series "Hybrides Projektdesign" (Hybrid Project Design)
Hybrid project design helps you find the right tools for your project, embed them in your company and apply them to the benefit of the project. In other words: creating a fit between project and management. Karen Dittmann (Steinbeis-Transferzentrum IT-Projektmanagement) and Mehrschad Zaeri (parsQube) discuss, philosophize and reflect on company cultures, PM kingdoms and the profession of project management. In their HybridBlog they look at different facets of the interaction between project mandate and implementation and invite viewers to reflect on their own approach. It is aimed at project managers, the managers of project leads and HR developers who are looking for ideas for solving problems in their own day-to-day project work, as well as anyone interested in finally learning the “truth” about agility and the like.
Click here for the video series.
Steinbeis Magazine 03|20
On the supreme discipline of project management: Steinbeis entrepreneur Dr. Karen Dittmann and parsQube managing director Mehrschad Zaeri in an interview about hybrid project design.
Agility as an ability
In our organizational development and project management work with several hardware and software organizations over time, we at parsQube have observed a large number of requests for support on the “Agile journey”.
Our very first question is: why Agile? Most of the time the answer is “we have a complex product”. The good news is that they care about the product, but what about the project? How can their project support their complex product development? Does the product life cycle go beyond the project life cycle? Does the product life cycle contain several project life cycles? And ultimately, what does agile mean for them? From our perspective, rapid changes in the product need to be reflected in the project as well. The question therefore is: are you, as a C-suite executive or project manager, and is your organization ready to welcome external and internal changes? Why does agility help organizations in today's uncertain environment?
The mindset of agility
In our previous post we talked about the benefits of being agile for organizations and leaders. As you may have noticed, we keep talking about mindset and ability. But what is this ability, and what are the idea and mindset behind it?
To answer this question, we must differentiate between the two mindsets humans can hold:
- The fixed mindset (a change can be a source of risks and should be avoided)
- The growth mindset / agile mindset (a change can be a source of opportunities and is welcome)
Let’s start with a question: Are you a good cook? If your answer to this question is a clear “Yes” or a clear “No”, it can be characterized as a fixed mindset. But how would a growth mindset answer this question? And what are the goal and character of the agile mindset?