XML & Maschine Learning
Zielsetzung der Arbeit ist es, die vorhandene Datenerfassung einer realen Produktionsstätte zu untersuchen, deren Potential für eine Anwendung von Maschinellem Lernen (ML) zu evaluieren und die angewandte Methodik für zukünftige Evaluationen gleicher Problemstellungen zu erfassen. Der Fokus liegt dabei in der Verarbeitung von XML-Datensätzen aus Mess- und Prüfdaten einer Produktion. Die Arbeit umfasst dabei die folgenden Aspekte:
· Erarbeitung einer standardisierten Methodik zur Verarbeitung und Evaluierung von Mess- und Prüfdaten aus XML-Datensätzen im Hinblick auf deren Verwendung in ML-Verfahren
· Aufzeigen der bestehenden Datenerfassung und Verarbeitung innerhalb des Anwendungsfalles
· Untersuchung und Testen der vorhandenen Datensätze auf Informationsgewinn durch ML
XML & RDF
The Shapes Constraint Language (SHACL) is a data modeling language for describing and validating RDF data.This paper introduces SHAX, which is an XML syntax for SHACL. SHAX documents are easy to write andunderstand. They cannot only be translated into executable SHACL, but also into XSD describing XML dataequivalent to the RDF data constrained by the SHACL model. Similarly, SHAX can be translated into JSONSchema describing a JSON representation of the data. SHAX may thus be viewed as an abstract data modelinglanguage, which does not prescribe a concrete representation language (RDF, XML, JSON, …), but can betranslated into concrete models validating concrete model instances.
RDFe is an XML language for mapping XML documents to RDF triples. The name suffix “e” stands for expression and hints at the key concept, which is the use of XPath expressions mapping semantic relationships between RDF subjects and objects to structural relationships between XML nodes. More precisely, RDF properties are represented by XPath expressions evaluated in the context of an XML node which represents the triple subject and yielding XDM value items which represent the triple object. The expressiveness of XPath version 3.1 enables the semantic interpretation of XML resources of any structure and content. Required XPath expressions can be simplified by the definition of a dynamic context whose variables and functions are referenced by the expressions. Semantic relationships can be across document boundaries, and new XML document URIs can be discovered in the content of input documents, so that RDFe is capable of gleaning linked data. As XPath extension functions may support the parsing of non-XML resources (JSON, CSV, HTML), RDFe can also be used for mapping mixtures of XML and non-XML resources to RDF graphs.
An infrastructure for integrating XQuery into Java systems is described. The infrastructure comprises a new API (XQJPLUS, built on the standard API XQJ) and a tool for Java code generation. The basic idea of the approach is to deliver query results not in terms of query result items, but in terms of “information units”, ready-to-use entities assembled from the result items. The assembly process is guided by control information embedded into the query result, so that the query controls exactly what will be delivered, and in which form. Information units can represent information in a great variety of forms, including many map types and custom objects. The information units produced by a query are collected into a special container ("info tray") which offers name-based, intuitive access to the units. The query-specific structure of an info tray may be formally defined by a tray schema from which an "info shape" can be generated, a Java class representing a specific kind of info tray and offering compiler checked data access. Info trays also support data integration, as their possibly very heterogeneous contents can be addressed in a uniform way, using path-like expressions.
The appropriateness of the XQuery language for data integration is explored. The starting point is an assessment of integration capabilities in an XML-only environment. The next step is an evaluation of the degree to which one may extend these capabilities to heterogeneous environments with multiple media types and various data access protocols. This leads to the identification of a key challenge, which is the structured representation of non-XML data formats by items of the XQuery data model. The current support for such representation is reviewed, and a conceptual base is proposed for modeling the relationship between data model items and instances of non-XML formats. As special facets of data integration, the roles of REST and RDF in XQuery-based integration are discussed, and general limitations of XQuery as an integration language are acknowledged.
This paper defines the concept of topic tools, which are command line tools providing a single point of access to a range of functionality. Topic tools conform to a generic model of invocation syntax and basic tool behaviour, concerning user assistence, error diagnostics and invocation reuse. The paper proposes a comprehensive model of the user-perspective - syntax and behaviour - and it introduces a simple development framework making the creation of XQuery topic tools simple and fast. The support offered by the framework includes code generation and the use of a message interface which cleanly isolates the application code from user input and gives it access to validated and augmented information, rather than the raw data of user input. Key properties of framework-based topic tools are early availability, extensibility, user convenience, behavioural consistency and reliability based on very thorough and fully automated input validation.
A new expression language (FOXpath, short for folder XPath) enables XPath-like addressing of files and folders in a file system. The first version of the language is a modified copy of XPath 3.0, with node navigation removed and file system navigation added. The language is based on the data model XDM 3.0, without assuming any modifications of the model. In a second step, the language was merged back into XPath 3.0, resulting in FOXpath 3.0, which is a superset of XPath 3.0. The new expression language supports node navigation, file system navigation and a free combination of both functionalities within a single path expression. A reference implementation is described, and the possibility of extending the new functionality beyond file systems is discussed.
The FOXpath language extends the XPath language by adding support for file system navigation. This paper explores possibilities how to extend file system navigation beyond physical file systems and include logical file systems like jar files, SVN repositories or github projects. The extension is based on a set of simple concepts related to URIs and their processing, and it is implemented as a FOXpath processor which supports the navigation of physical and various types of logical file systems.
XML Technologies - miscellaneous
XML-related standards imply an architecture of distributed information which integrates all accessible XML resources into a coherent whole. Attempting to capture the key properties of this architecture, the concept of an info space is defined. The concept is used as a tool for deriving desirable extensions of the standards. The proposed extensions aim at a fuller realization of the potential offered by the architecture. Main aspects are better support for resource discovery and the integration of non-XML resources. If not adopted by standards, the extensions may also be emulated by application-level design patterns and product-specific features. Knowledge of them might therefore be of immediate interest to application developers and product designers.
XDML is a set of rules how XDM values can be built which are more useful entities as compared to ordinary XDM values. The key idea is to insert into the XDM values control information which guides the interpretation and processing of the data. In particular, it structures the XDM value into named parts and associates these parts with metadata. The control information is evaluated by an XDML processor, which reports and processes the data accordingly. The processing of a part is organized as the execution of operations which the control data bind to the part, but whose actual invocation depends on API calls of the XDML user. The bindings are represented by request messages which encode the actual input to operations selected from an extensible library of available "XDML operations". The operation bindings of a part can be regarded as a specific interface dynamically attached to the data of the part. The net result of this approach is to enable the creation of self-describing XDM values: they encode the way how they are presented to applications, as well as how they should or might be processed. This means that the XDM producer - e.g. XQuery programs - can emit "rich" data whose downstream processing is significantly simplified.
A proposal is made how to extend the XML node model in order to be compatible with JSON markup as well as XML markup. As XML processing technology (XPath, XQuery, XSLT, XProc) sees instances of the node model, but does not see syntax, it is thus enabled to handle JSON as well as XML. The extended node model is dubbed a Unified Document Language, as it defines the construction of documents from building blocks (nodes) which can be encoded in various markup languages (XML, JSON, HTML).
We propose an approach how to complement XPath navigation with a node search which does not require node construction. Node search is based on a set of external properties (a “p-face”) which a node may assume in the context of a node collection. Being external, these properties can be retrieved without node construction, and being stored outside the nodes they can be maintained and queried by non-XML technologies, e.g. relational and NOSQL databases. A small set of concepts, carefully aligned with the XQuery data model, allows the seamless integration of various non- XML technologies driving node selection, without introducing any dependencies of XQuery code on any particular technology. A first implementation of the concepts is presented.
Download: node search.pdf
Conventional use of XSD documents is mostly limited to validation, documentation and the generation of data bindings. The possibility of additional uses is little considered. This is probably due to the difficulty of processing XSD, caused by its arcane graph structure. An effective solution might be a generic transformation of XSD documents into a tree-structured representation, capturing the model contents in a transformation-friendly way. Such a tree-structured schema derivative is offered by location trees, a format defined in this paper and generated by an opensource tool. The intended use of location trees is an intermediate to be transformed into interesting artifacts. Using a chemical image, location trees can play the role of a catalyst, dramatically lowering the activation energy required to transform XSD into valuable substances. Apart from this capability, location trees are composed of a novel kind of model components inviting the attachment of metadata. The resulting metadata trees enable innovative tools, including source code generators. A few examples illustrate the new possibilities, tentatively summarized as XSD based tool development.
A code generator for document to document transformation is introduced. It reduces the development effort to editing a set of metadata items attached to a tree model of the target documents. Metadata values are XQuery expressions which are typically so simple that they do not require genuine programming skills. Nevertheless, expressions are more difficult to provide than static values, and therefore possibilities of further simplifying the development task are explored, striving to enable subject matter experts to define the transformation without writing XQuery expressions. This can be achieved by generating the expressions from assertions about alignments between source and target nodes, although specific requirements will often necessitate additional information. As alignments can be represented graphically by connecting lines, the approach amounts to a solid conceptual foundation for graphical mapping tools. Finally, the underlying model of code generation driven by target document structure is generalized into a conceptual framework which is not restricted to XML data sources. Its usefulness is demonstrated by a simple code generator for transforming RDF data into XML documents.
Download: rethinking transformation.pdf
Es gibt viele Texte, die sich mit dem Unterschied zwischen dem agilen und dem klassischen Projektmanagement befassen. Nun einen weiteren Text über dieses Thema. Dabei versuche ich die Unterschiede auf zwei Ebenen zu veranschaulichen. Auf der Haltungsebene (Mindset) und auf der Theorieebene.