Shift-M/51: Michael Kay about XSLT, XML, and software business
Hi everybody, this is the Shift-M podcast, a new episode with a special guest, Michael Kay. He is, in my opinion, the godfather of XSL and XSLT, the language, the format, which I love very much. You probably know about that if you follow my projects, me, and my blog. So I'm really glad to have Michael with me. Michael, could you please introduce yourself quickly?

Okay. I'm Michael Kay. I'm the editor of the XSLT 2.0 and 3.0 specifications, and I'm the developer of the Saxon implementation of those specs.

You know, when I tell my readers and listeners about XML, many of them say that I'm quite old and that I love that format because I'm not a modern programmer, I don't understand JSON, YAML, and all these formats. And they keep telling you that XML is dead and that JSON is the future. What's your take on that?

They're not engaged in the fields in which XML is important. XML is very, very widely used in all kinds of document publishing. It's used for things like patents, for academic publishing, scientific publishing, legal publishing. All the UK legislation is published in XML. That's there because it's got to last for 50 years. Well, some legislation lasts for 700 years, and they chose a format that's capable of surviving for that long. They're not going to change to something else quickly. So XML is definitely here to stay in those sorts of fields. What's going to change is that people who need to do something quick and cheap and dirty are going to use the quickest, most effective solution available to them, and very often that's JSON, particularly if you're working in a web browser. So it's a distinction between whether you're doing something for the long term or whether you're doing something cheap and cheerful that's only of local significance.

You know, they claim that XML, like you just mentioned, is a good format for documents, for something which is large and they have
to be standardized somehow. But when it's a choice of what language or what format to use inside an application, let's say we're developing a mobile app or a web app, then the choice of XML looks quite strange to them. They say: why not JSON? We don't need to be standardized, we don't need to publish our documents anywhere, it's all inside. So inside the application they tend to use JSON more and more.

Well, conventional programming languages have data models that aren't very well suited to documents. Using the DOM is hard. It's a complex structure, because the language wasn't designed for that. There's not a good mapping between the data structures used in the document world and the data structures used in conventional programming languages, which is where a language like XSLT comes in. It's why there's a niche for a special-purpose language that's designed for that kind of data rather than the kind of data that you get in typical data processing applications.

What is your story? How did you end up doing XML stuff, XSL stuff? Was it business, money making, was it necessary for you, or did you really like it?

It's a long story. My specialism, my PhD many years ago, was in database technology. I spent 25 years working at the British mainframe manufacturer, which then was ICL, on database technology. We got to the point in the 90s where we were developing some big publishing systems for clients, particularly scientific publishing and news publishing. The 90s was when the web started to become industrial scale in that kind of way. At the time I was working with people in Fujitsu developing object database technology, and we were trying to deliver those sites using object databases and finding that it didn't actually work that well. It worked, we built some successful systems, but they were very expensive to build, and we realized we hadn't got enough reuse and
enough power in the tools that we were using to build lots of sites quickly. XML was coming along, so we knew XML might be the answer to that problem. And then what really triggered it was that we got an invitation to tender from Oxford University Press to do with the dictionaries, and that was very much SGML and XML oriented. I said, hey, this is our opportunity to get into XML. We've known for ages we need to look at it, this is the chance. And the marketing people said no, that's too high risk, we're not bidding for that contract with a technology that we have no experience in. So I said, well, how the hell are we going to get experience in it if we ignore an opportunity like this? And so they said, okay, Mike, you've got three months. You write a response to that bid, see what you can do. That's really when I started playing with the technology. I started developing what became Saxon as a prototype to show to that customer, to show that we could handle XML and solve their problems. And it was good fun, because I hadn't done any programming for years. I'd been too senior to do any programming. In that kind of company, you're expected to spend your time attending meetings, not actually writing code. So I wrote that prototype and presented it to the customer, and the customer liked it. And our marketing people said, hey, this is high risk, we'll double your estimates. So they doubled the price, and the customer didn't like the price and didn't buy it, which I was very aggrieved about. But I'd had three months' fun anyway, and that prototype eventually became the Saxon product. What I realized was that I'd been developing a Java library to do basically rule-based transformation of XML hierarchies. Then I saw that XSLT was coming along, and that was doing rule-based transformation of XML hierarchies, and a lot of the concepts seemed to align very closely. So I
thought, let's turn my library into an XSLT processor, and that's how it happened. Then I got invited to write a book on XSLT, and then as a result of the book I got headhunted by Software AG, and then it all moved on from there. But the key thing is, I didn't start it. James Clark started XSLT, he invented the language, and I picked it up and did versions two and three and whatever. And that's been the story of my career really: picking up a good idea developed by a good ideas person and industrializing it, turning it into a good product.

But when you were making this Saxon product, which I use in all of my projects right now, there were other products on the market. As far as I remember, 15 or maybe more years ago there was the Apache product, Xerces, or what's the right name?

Yeah, I mean, there were a dozen XSLT 1.0 processors that came out very quickly. It was a nice small language, it wasn't too expensive to implement. And of course there was a hype curve. XML had an enormous hype curve at the beginning, because all the big players, Oracle and IBM and Sun and Microsoft, for some reason decided that rather than fighting each other they'd collaborate on it. So everyone was very excited by having a standard in that area, which we'd never had before. And then XSLT came along about a year behind as the way of processing it, and lots of people had a go at implementing it. There was a Microsoft implementation, the LotusXSL implementation from Lotus, which became part of IBM, and which then moved into what is now Xalan. And there were lots of what I call hobbyist implementations. Sablotron, no one knows about Sablotron anymore. There was one produced by Uche Ogbuji in Python, called 4Suite, I think it was called at some stage. So there were lots of them. And how did Saxon emerge? I think basically because I stuck at it.
Most people produced a good version one product and then never got beyond it. And why did they never get beyond it? Well, I think in the case of the commercial operations, IBM, Microsoft, what have you, the problem was producing a business case. The software was free, they weren't making any money on it. You can produce a version one by promising your managers that it's going to take over the world, but when it hasn't taken over the world, then it's quite hard to get the funding to produce version two. And then at the other end of the scale there were the hobbyist people who were writing in their weekends, and I guess their wives told them they wanted them to do something else at the weekend, or perhaps they got interested in some other new technology, because they tend to be the sort of people who like to do something new, and doing a version two of something old is not what you want to spend your weekends doing. So really the reason that Saxon carried on while those other things failed was stickability. I stuck to it, and I found a business model that enabled me to fund the ongoing development and make some money to continue that development, which the other people hadn't. So the product was good, but it wasn't in any way so much better than all the others that it was going to beat them on technical grounds. It was much more that I had a successful business model that enabled me to keep developing.

I remember that 10 years ago, when I was choosing the XSLT engine for my Java projects, Saxon was not an option for me because it was all commercial, if I'm not mistaken.

You know, there's always been an open source and a commercial version.

But there are some limitations, I remember.

In some ways the limitations weren't severe enough. I mean, the BBC's coverage of the 2012 London Olympics was all using the free version of Saxon. I resented
that slightly, since they spent billions on the Olympics and I didn't get any of it. But there's a downside to the business model as well. From my point of view, the reason for doing open source is that you get millions of users, and then if one percent or five percent of them decide that they need more than the open source version, then you get enough revenue to fund the whole thing. So open source is the marketing mechanism, it's the loss leader that brings in the revenue. So I don't mind the fact that there are billions of users of the open source version, because the model works well for me.

But why would people now pay for Saxon? I know only one feature which I miss in the open source version, the external functions. Aside from that I have everything. So I don't understand how your model works, what people pay you for.

There are a lot of people who pay for it not because they need the extra features, but because they like the sense of security of having a commercial relationship with a supplier. It's not expensive for them, after all, it's a pretty small part of their total IT budget, but they feel more comfortable with a commercial product than with an open source one for something that they're that critically dependent on. So that's the answer for some people. For other people, yes, they need one or more of the features. They might need streaming, they might get benefit from schema awareness, and they might benefit from the optimization capabilities, which become significant when you're doing queries on large documents. So yes, the open source version is good enough for 90% of users, but once people get stuck in, they find they need something else from the mix of things that we offer.

And how many programmers do you have right now in-house, if you can disclose this information?
In the team, we're a team of six people, of whom four are developers.

And you are a developer, or are you on the management side now?

No, I'm cutting code every day.

It's amazing. Well, I know you from Stack Overflow. You're not only writing code, you're also answering questions there, and quite helpfully. You actually answered a few of my questions, no, I think many of them. So that's my next question to you: how do you find time for that, and how do you feel about the Stack Overflow platform? Because most people don't do that. They claim that this platform has all the answers that could possibly be given, so there's no reason to be there, because all the questions have been answered already. So people don't spend time there, but you do.

Yes, and I do it partly because I think it's a good idea in principle and partly because I enjoy it. Actually, everything I do is a combination of doing things because I think it's a good thing to do and doing it because I enjoy it. If things don't meet one of those two criteria then they don't get done, except for really necessary things like doing my tax returns. But on the whole I do things because it seems a good idea and I enjoy it. And why do I enjoy it? Because I think it's very important if you're developing software to be in touch with your users. I actually quite enjoyed the first few years of doing Saxonica, when I was spending half my time doing consultancy, and so I actually got out to visit customers. In those days consultants actually traveled. You actually flew to California and visited your users and had dinner with them and drinks with them and things like that, and that was fun. I miss that now. Just knowing them by email isn't the same. But even if it's just electronic, knowing what your users are doing, knowing what they have difficulty with, knowing what they find easy and what they find hard, and picking up
ideas from other people answering the questions, that's important. It's a contact with the user base, and apart from bug reports it's more or less the only contact I get. And that sort of tells you what makes a good product. Users understand the error messages. People will tell you, one thing I like about Saxon is the error messages. Well, that's a really boring, mundane thing, but to me a bad error message is something that really needs to be fixed. That's what users are dealing with every day, they're reading my error messages. If those glare out as being unhelpful, as being badly spelled, then that's their experience of the product. So it's important to get it right, and I put a lot of effort into those sorts of little details. And to do that you've got to have the contact with the user base. Getting good error messages is really quite an art, because do you phrase the error message in terms of the proper terminology from the spec, or do you use the terminology that the users are using out there, which might be quite wrong? What users call a tag isn't what the specs call a tag. They'll use "tag" to mean element. So which word am I going to use in an error message? It's quite hard to get that sort of thing right, and getting a balance between a message that is technically correct and a message that users understand sometimes requires a fair bit of thought. And then you've got to phrase the error message in terms of what the user was trying to do, not in terms of what was going on internally, and that again gives you a significant challenge. So yeah, you have to think about those things. To think about those sorts of things, first you have to use the product yourself, and that's very important. My project over the last year has been translating the Java code of Saxon into a C# version of Saxon, and to do that I had to write a translator for
Java to C#. And so how do you write such a translator? Well, obviously using XSLT. I mean, Java has a syntactic structure. You parse it and you get a syntax tree, so you've got a tree-structured information structure, and you're converting that into another tree-structured information structure from which you generate C#. And how do you transform one tree to another? Obviously you use XSLT. It's the natural choice that anyone would come up with, isn't it? I'm joking.

No, it's not. You'll be surprised, but it's not. You know, I'm working right now with three projects, three different teams, and they write translators from one programming language to another. None of these teams ever considered XSLT for the translator. I'm not a member of these teams, I only supervise them, so I cannot force them to make these decisions. But why don't they make this decision? Because they probably don't know about XSLT. So what do they do? They build this abstract syntax tree in memory out of objects, not as an XML syntax tree, which is natural as you said, but as Java objects or C++ objects, and then from these objects they build the other source code, basically with println, print line.

And of course I'm joking, because if I wasn't involved in XSLT then it wouldn't occur to me to do it that way. But the fact is, when you're familiar with a technology, then you can see that it's ideally suited to that job.

Yeah, it's perfect.

Yeah, it's perfect for it, and so that's the way you do it. And of course the benefit is not that it's XML or that it's XSLT. The benefit is primarily the paradigm, that you're doing a recursive descent rule-based transformation, and that's what XSLT is. It's a rule-based language.

You know, for most people it's hard to understand this language, that's what they complain about. I wrote a compiler in XSLT just
last year. I have a programming language, and I had a task to implement the compiler from this language to Java, so I made it entirely in XSLT. I have many, many stylesheets and they go one after another, so it's like a chain of stylesheets, and that is how I transform the input abstract syntax tree into the final result.

If you read a textbook on writing compilers, it talks about it as a pipeline of tree-to-tree transformations.

Right.

And that's exactly the typical architecture of an XSLT application.

But when I show this code to other programmers, most of them just say, I don't understand how it works, because it's something I haven't seen before.

It is, and that's the resistance to XSLT. It's because of where most people are coming from, particularly programmers. The direction non-programmers are coming from is quite different. But for programmers, where they're coming from, it's just so different from anything they've ever seen before that it requires some rewiring of the brain. And therefore the enthusiasts are those who get over the initial learning curve and discover why this weirdness is actually such a good thing. But for people first coming to it, it can be quite an obstacle.

Let me quote you. I was listening to one of your recent speeches at the conferences, and you said exactly this: most XSLT programmers don't know computer science, they see examples and they understand how they work. That sounds really accurate, because most people don't really understand XSLT, they don't understand that it's a functional language, a functional programming language.

Yeah. I mean, I find it fascinating, because I sometimes find new technologies quite hard to get familiar with
and to adopt, and that's because when I look at a new technology I want to have a deep conceptual understanding of it before I use it. I know other people who are much better at picking up something new, who have a different learning style. They learn by example. They see something that works and they bend it and adapt it and make it fit, without ever having a deep understanding. It's like, you know, some people can jump in a car and press the pedals and it goes in the right direction, and other people really want to understand why you have to turn the steering wheel back after turning it. You can over-intellectualize things, and I'm on that end of the spectrum, probably. It's a spectrum as well.

Don't you think that we are getting more and more people of the first kind in the programming industry?

Oh, absolutely, yes. It's the difference between engineers and mechanics. We've got a lot more mechanics now, who are very capable of doing a good job, but they're not computer scientists. And as computer scientists we've been building technology to enable those people to build systems, and so we shouldn't complain.

Do you think it will ever be possible to create something like a simplified version of XSLT? I've been dreaming about this project for a few years already, something which would simplify the syntax of XSLT. Because right now it's basically XML. XSLT is not a dialect; the language which we use is XML, basically, and XSLT is just the elements which we use there, right? But maybe we can turn it into something more traditional for programmers, like Java, a language where you have statements after statements.

A lot of people have tried to produce a different syntax for XSLT, and it's not that difficult to do. And I
think what's happened is that when people have done it, they realize that actually they thought syntax was the problem and it wasn't. The problem is not the syntax, it's the concept. It's what the hell is a template rule, what the hell does apply-templates actually mean. You think syntax is the difficulty, but it's not. The difficulty is actually the semantics of the language, and improving the syntax doesn't help. The other thing is that once you've got past that stage of the syntax looking weird, you actually realize that there are some benefits to having an XML-based syntax. Any big XSLT-based application that I've seen ends up exploiting the fact that XSLT is XML, and that you can generate XSLT, you can modify XSLT, you can build libraries of XSLT components and assemble them in different ways, and the fact that you're using the same conceptual tool set to manipulate your data and your source code. I mean, that's something that comes from Lisp, isn't it, that data and programs are essentially the same thing. XML and XSLT is, if you like, a continuation of that Lisp concept of not separating data from programs.

So it's not the syntax which is the problem. Let me ask you something else, about the committee which you sit on, the W3C committee which defines...

Which I did sit on. It's now wound up.

Uh-huh. So how do these committees work? Can you explain, you sit there, how do they define these standards? How, for example, can I get in? Is it possible, is it an open door?

Well, as I say, most of those committees have now disbanded through not having enough people to take things forward. I am hoping to reconvene a group to develop an XSLT 4. I've been promising that for a while, but it's definitely going to happen this year, and that's my project this year.

I would love to join, by the
way.

Good. So how does it work? The actual dynamics vary quite a lot from one group to another. On the XSL working group I sort of inherited the role of editor from James Clark, and the way the group worked didn't really change after that. The process was very much that I was in a kind of chief designer role, not just editing the spec. That doesn't mean that all the ideas were mine and that I had the final say on everything. It means that people brought their ideas to me and I had to turn them into something that worked, or come back and say no, I don't think that will work, perhaps something else would work. It also means that if I brought an idea to the group, it would get subjected to a lot of scrutiny. People would ask a lot of challenging questions, and it would be improved in the course of review. But I was still sort of acting as chief designer with a group around me that was helping me to get the details right, if you like. So it was a very constructive and friendly group. The XQuery group was very different. When I first came to the XQuery group I was sort of shocked and horrified, because there were six people on the group at least who were capable of being chief designer, who were capable of designing a programming language, and they all had radically different ideas about what the language should be. So it's a group that had immense tensions in it, simply because it had so much talent, and there were too many creative individuals with different ideas as to where to take the language, which is very much harder to work with in many ways. You didn't have the same feeling of a cohesive sense of unified purpose. What you did have was a couple of people on that group, Paul Cotton, who was the chairman for most of the time I was involved, Don Chamberlin, who was the XQuery editor for a lot of the time, Mary Fernández, who coordinated with the
XSL group on defining XPath, who were extremely good moderators, who took the creative people by the scruff of the neck and banged their heads together and said, you know, this is what you agree about, this is what you disagree about, let's concentrate on sorting out the areas you agree on, we'll work out the areas you disagree on next week. And they forced the process through by managing the people who would otherwise have killed each other. So yes, different groups have very different dynamics. I'm told that in the XML Schema group, before I joined it, there were meetings with 40 people there, and they all brought ideas to the table, and just managing the agenda was one of the biggest challenges, because they'd come to a week-long meeting in a hotel somewhere in Florida and they'd have more work on the table than they could get through in a week. So yeah, it's different for each group.

And how do these groups get created? Is it somebody who's deciding that, or is it a chaotic process?

I don't really know, because I've never been involved. It's always been going by the time I got on board. As with software, I've heard someone say there are two kinds of people, starters and finishers, and I've always been the kind of person who finishes things that other people have started.

So how did you get on board? Did somebody invite you?

I got invited by Sharon Adler, who was the chair of the XSL working group, basically on the strength of the Wiley book. So the group had developed XSLT 1.0 with Sharon Adler as the chair and James Clark as editor. I developed an implementation, and I was asked by Wiley to write a book on XSLT, which I did, and when the book landed on Sharon's desk she picked up the phone and asked me to join the working group. And then I had to persuade ICL to give me the time to let me do that.

So basically it's not like there's a form
which we can fill out?

Oh, you can, sure, yes. People do. I think most of the people who give it enough time to be worth having on the group are people who are not just doing it as a hobby. They're definitely engaged, they need the group to be successful for professional and business reasons. Otherwise they find they haven't got time to read all the papers and email, and then they get lost in meetings and they drift away.

Was it profitable for you to be in this group?

In what sense?

What did you get out of it?

It's a creative process, it's a rigorous process, it's a very frustrating process. It's all of those at the same time. If as a software developer I come up with an idea for a language feature, I can implement it that morning and think it's done. If I take that same feature to a standards group, it will be challenged. Why do we need it? Can't it be done this other way? Why did you choose that keyword rather than a different keyword? Can't you solve this other problem at the same time? They will come up with a vast number of challenges to your little idea that make it bigger or smaller or change it in all sorts of ways. That can be very frustrating because it takes a long time, but it also produces a much better result in the end than one person's good idea from sitting in the bath. So yeah, that's what you get out of it. You get a very thorough review of ideas in which people bring things to the table, and you end up with synthesis. It doesn't always produce a better result. Sometimes you do get the problem of committee compromises, and there are definitely bad decisions that committees have made in XSLT and XQuery and XML and everything else, where you just wish we hadn't made that compromise. But that's the way of the world. One of the most difficult things is keeping the design coherent, setting yourself design principles and sticking to them with
every new feature. An example of that with XSLT is error handling. XSLT 1.0 had a sort of principle of no runtime errors, and that's because a lot of the driver for XSLT 1.0 was the idea of running it in the browser, and in the browser the last thing you want to do is put up something on the user's screen that says "error on line 17 of stylesheet". Everything should produce an answer, even if it's the wrong answer. So I think that was part of the 1.0 design thinking. But then people realized that that makes debugging very difficult. If an incorrect program just produces blank output, then it's very hard for the programmer to work out what they did wrong. And so in 2.0 you started to get more of the concept of static errors and dynamic errors and a more systematic approach to error handling. But then you find that although you've changed what you're trying to achieve, you've then got the fact that you can't change the existing language. You're stuck with the way it was designed first time round. And so you end up with new features having one philosophy and old features having a different philosophy, and you start to lose the coherence. And that's hard to get right.

So there's no one single architect in the group? It's always a democratic decision-making process?

Yeah. Certainly the way W3C works, it's not democratic in the sense of taking a vote. The Tim Berners-Lee philosophy is very much the benevolent dictator. The chair has to declare that consensus has been achieved, whatever that means. So it's not a numerical counting of votes. It's people going away from the meeting being prepared to accept the decision of the group.

But the decision has to be made by the group, not by one leader?

The decision has to be made by the group, yeah. And that can be tough. And yes, you will get compromises: I'll let you
have this feature if you let me have that one or more often the compromise is one person will be very very enthusiastic about some new feature um everyone else thinks it's of marginal value um but it's much easier to keep that person quiet by accepting their idea and putting it in the language than to um have more and more arguments as to why it shouldn't be added so the easiest route out for a committee is sometimes to accept something it doesn't really want rather than keep fighting against it that's weird and uh you are at the same time in the committee and the chief of uh saxonica your private company so i feel that there sometimes could be a conflict of interest when you want this feature for your customers you probably already implemented this feature and you gave it to your customers and then you bring this feature to the committee and say i would love to have this in the standard the group may say you know we understand why you're doing that because you're the i found it usually um it doesn't usually work like that i mean one of the first things i did that wasn't in the standard was grouping and multiple output files and they sort of go together and i did grouping because it was clearly needed um in all the applications i'd had to write i needed grouping and so i invented a way to do it and then when i joined the group i took my grouping design there um to the working group and everyone there accepted that we needed to do grouping and that the feature was needed but there was then lots of constructive criticism about the way i'd implemented it in saxon um and um questions about how edge cases worked that i hadn't even thought about and you know more use cases how will it handle this problem how will it handle that problem and the group improved the design and i implemented the improved design um so that sort of i regard that as synergy
um between doing an implementation having users and developing a standard and most of the time it was synergistic um the fact that we had users the fact that i had an implementation the fact that we were developing the language actually um works in harmony and the source code of saxon is open or not um there's an open source product and there are other features which are proprietary so the schema processing is all proprietary um but you know the xslt code is largely open source the streaming is proprietary it's on sourceforge um yeah um we're sort of moving away from sourceforge it has historically always been on sourceforge but i think we now just use sourceforge for publishing new versions to make sure people don't download old versions um the main place you get it is from a repository on our own saxonica site and where do you move to github um we've got some things on github but a lot of it is on repositories on our own site oh okay we can download it from there we're increasingly moving to you know on .net you download it from nuget on node.js you download it from npm um and java people download from maven right that's what i do yeah and do you know anything about the possibility of compiling xslt to some binary code because in my case the performance is quite an issue so xslt is a great standard it's a great idea i write all the style sheets like i told you in my compiler i have many many of these style sheets but i have like maybe 30 of them and to run all of them one by one it takes seconds it doesn't take microseconds so i'm thinking maybe it would be possible to turn those xslt into some binary code i mean the answer is quite a few years ago now we did bytecode generation in the commercial product the enterprise edition and when we first did it it gave us a performance improvement sort of between 25 and 50 percent um and these days it tends to
be less than that it's very often only 10 or 15 percent which means it's hardly worth doing um the reason the performance advantage has declined over time is that the java hotspot compiler has got better um and we haven't got better um because those guys who write the hotspot compiler really understand what goes fast in machine code terms to do code generation and make it go fast it's a mindset and a knowledge about the behavior of the hardware um that most mortals don't have you don't get the um the amount of benefit that you'd expect and when we look at it in detail the benefit that we are getting is not because we're generating code it's because we're making decisions at the right time and you can actually reproduce that effect of making decisions at the right time um by taking things out of a loop for example out of a runtime loop into a static decision at code generation time you can do that without generating bytecode so it turns out i think that code generation is not the answer to improved performance the other thing we found is that in a lot of xslt workloads people are doing the static analysis on the style sheet or the compilation once for every time they execute and if you do that then it becomes compile time that's important and not run time um there are an awful lot of workloads where people are spending three seconds compiling the style sheet and then three milliseconds executing it and if that's the ratio in your workload then the last thing you want is to move more work out of runtime to do more work at compile time in order to reduce the runtime because the runtime is negligible also for a lot of simple transformations the transformation is a lot faster than the parsing so the xml parser is taking longer than the actual transformation um so if you get a workflow like that that's um you know taking a long time um there are all sorts of reasons for it and the most common
reason is that the stylesheet's being compiled every time it executes and the compilation is taking too long we've only sort of grasped that fairly recently that we really need to put more effort into compile time performance and another reason is simply you're dealing with a high level declarative language um and with high level declarative languages it's like with sql um one line of code can take six hours to execute um you're not writing at a low level where the statements you write in your program have a one-to-one correspondence with hardware instructions there's a very very indirect relationship and therefore you need to think about the performance of your code in a different kind of way and i mean this is one of the dilemmas to make your code perform the idea of a declarative language is that you don't know what's going on inside it's up to the optimizer to um to work out what's going on inside but the reality is that to write efficient code in a language like sql or xslt um you have to have some kind of appreciation of what you're asking the machine to do and whether it's going to be a quadratic algorithm or an n log n algorithm or whatever how it's going to scale with your data size um you're several layers removed from the machine and yet to achieve performance you've got to understand what's going on in those layers which is challenging how tightly are you integrated with the people who develop web browsers like chrome for example or firefox um i had a bit of a rant in my blog about 2005 or 2006 um at the stage where the um the browser developers were deciding that they didn't like xml didn't like xslt um and the rant was mainly about who are they to decide um you know it's the power struggle you know i like the idea of an open layered architecture in which people own one layer and leave the layer above to other people whereas the web browsers
remind me of the sort of very vertically integrated um days when um if you produced a computer you controlled what applications were allowed to run on it we're seeing that again with mobile phones aren't we um i think the web browser should be a sort of neutral platform on which um other people can develop um technologies above it and we haven't really seen that um it's a very proprietary sort of space um and yeah they decided they weren't interested in xml um we decided that we were going to do an xslt processor in the browser anyway um which um works pretty well but of course it's a minority interest the one thing that's pleased me recently actually is um most adoption of saxon in the browser has been from people who are xml and xslt enthusiasts who are very much committed members of the community but we've been seeing a few users recently picking it up who are new to that and that's nice to see that's really sad because i think that would be a great it is a great technology having web servers deliver xml only and then xslt runs on the client on chrome and then doesn't yeah um it is absolutely the way things should be you should do all the rendering all the user interaction on the browser and the message sent um from the server to the browser should be as abstract and as pure data as possible and that means xml um i mean no doubt that's the way things should be but it hasn't found favor except that of course html has tried to develop in that direction of being a somewhat more abstract formulation they've tried to get rid of the very presentation oriented aspects of html so they've moved html in the direction of xml if you like while rejecting xml itself yeah i want to ask you one question which i like from the slack channel of your group of the xml group uh the question is um are
there any features in xslt or xml or xquery which if you would have the power you would replace remove change do they exist let's start from that yeah um and i guess there are two kinds as well there are some that are very little things like the choice of keyword for xsl:value-of should have been xsl:text and that leads to a lot of users making the same mistake you know hitting the same problem just because the choice of keyword is wrong and similarly the handling of default namespaces every user falls into that same trap that their path expressions don't select anything because they didn't think namespaces mattered so you know we got the default wrong and that's very hard to change and then at the other end of the spectrum i have doubts about some of the really big things that we did um so for example the biggest thing we did in xslt 2 was schema awareness and i think in retrospect it would be hard to say that schema awareness was a success the idea was right you can get considerable benefits from schema awareness if you're writing a big style sheet schema awareness can definitely make it more robust easier to debug and can give you a lot of software engineering benefits but at the same time it hasn't been successful in terms of adoption most people aren't using it and the reason they aren't using it is because the short-term cost of adoption is high compared with the immediate benefits you get a life cycle benefit from using it but if you're sitting down on monday morning and want to have some code running by lunchtime then you leave out schema awareness because it seems too difficult um so there's a sense in which it was strategically the right thing to do but somehow in the way we did it um we just made it and i've constantly been trying to tweak it to try and make it more of a um you know a magic switch on schema awareness and get all the benefits um but it's very
hard to achieve that it's something that people have to put a lot of investment into before they can get the benefits out so i have doubts about that sort of thing streaming similarly streaming is really valuable for the three percent of users who need it but it has no value at all for the other 97 percent um and that makes you wonder whether the complexity of doing it in the spec um was actually justified um so those are tough calls um you do want to increase the power of the language but at the same time i've always got much more satisfaction from doing little things that everyone benefited from um you know like the double bar concatenation operator everyone says what a great idea why couldn't we do that before um rather than the big strategic things which cost far more what are you working on right now xslt 4 um at the moment we're um in the process of hoping to ship saxon 11 and we've shipped saxon 11 on c sharp we've got to do a maintenance release of that an 11.1 for c-sharp and the
first release on java and um moving forward the saxon c product to that same 11 code base so we're working on that and hopefully that will be out within a couple of weeks and that means that at the moment we're in that process of running millions of tests and working out why three of them are failing which is very dispiriting and frustrating there are so many tests now that it's a nightmare um norm tovey-walsh who joined the team last year has been doing a lot of work on automating our build and test process and that's been very very valuable and hopefully will lead to more reliable releases and more frequent releases but after that when we start getting a clean sheet of paper we've got to make some decisions um i want to put some more effort into the saxon js product that's been you know a bit too quiet for a year um one of the problems is we're not making any money on it i want to see if i can find some way of reproducing the business model that generates some revenue for saxon js which will create a better justification for doing development work on it a key technical challenge with saxon js is doing asynchrony because the javascript platform needs to be asynchronous when you fetch resources um you've got to fetch them asynchronously and that's very hard to um to map into the xslt way of doing things so that's a technical challenge and other things yeah carrying forward the 4.0 initiative trying to get a group of people together to define a 4.0 and that i hope will be lots of handy little things rather than you know one big strategic thing that takes five years or 10 years i think 3.0 but how is it going to happen if you said that the group has been disbanded um have to put together a new community group basically it won't be under w3c except as a sort of hands-off relationship and a number of the groups have continued as um community groups with a sort of informal way of working but it's different now because of
course um w3c had the model that you only work on a standard if there are going to be at least three implementations um and xslt you know there's no chance now of having three implementations um the only people who've implemented xslt 3 well there were three implementations there was mine um there was altova's and then there was abel braaksma's um but abel seems to have found other things to do with his time and altova are an interesting company because they always implement the standards but they never participate in developing them and i don't understand the logic of that but that's up to them that's the way they work they're entitled to do that um so it won't be a standard in the same way in that you expect to see lots of implementations it'll be more a specification for the next saxon release the world's changed in that way so there's a possibility that we may not see this as a standard we just may see your next version of saxon oh it might yes some people might perceive it as proprietary extensions and we'll almost certainly provide a way of switching off all the extensions so that you still do have official standards conformance but um yeah the world changes when um you haven't got lots of implementations that have to be compatible with each other and when there's little chance of getting them um you know a lot of programming languages you know you take php and python you don't have lots of alternative implementations of php or python right it's not something you expect that's right and which programming language do you personally like most c what's your favorite well there are lots i haven't used and i'm sure out there somewhere is a perfect language there are lots that i would like to have used more i've done a very little bit of work with scala for example and i would like to have done more with scala um i would also like i mean over the years i've become more and more a fan of
functional programming and so writing in a pure functional language would have an appeal to me now um more and more of my java and javascript is using a functional paradigm um that's the way i now choose to write code um so a language that enforces that would be a good thing um what i miss in the programming languages i use is parallelism and asynchrony um i haven't found a language where doing multi-threading and parallel processing um becomes really reliable and robust and bug-free so you know i would quite fancy playing with erlang um to see if it solves some of those problems people say it does i don't know how that would work for me um java has been a pretty good development for the industry um and i mean c-sharp is essentially exactly the same um except in minor details um but that mixture of um object-oriented programming um and rich class library is a good tool set to work with and of course the tools on top we use intellij make it immensely more productive and they solve many of the you know verbosity problems the boilerplate problems that you get in any programming language um including xslt and um yeah i wish intellij had better support for the technology but you can do some debugging there you can yes but oxygen does it much better partly because oxygen works with us closely and redistributes our products and we have a good relationship with oxygen we've never established a relationship with intellij with jetbrains although i've tried okay okay my last question uh michael um do you need any help from volunteers for your projects or you have enough people in the team and that's it um i'm not good at managing volunteers let's put it that way um volunteers do need a lot of managing um we've had some useful contributions over the years um in code and ideas and tests um but very often um the volunteers only do the fun bit of the work and they leave us to do
the boring bits so the number of times people have suggested a code change and i've said yes are you going to send us the tests and all the documentation and then you get a sort of blank look what tests um it doesn't help that um in the past we didn't publish the test frameworks so we made it difficult um but yes i regard programming as a professional engineering discipline i um i don't want to work in the sort of field where it's being done by amateurs in their own time who aren't paid and don't necessarily share your objectives and um aren't working to your timelines and things like that the sort of vast volunteer initiatives like firefox i don't know how those work i've never participated in them um i can't imagine um how you can produce a decent bit of engineering in that sort of environment so yeah but i mean the best contributions we get um are people um testing the product when it comes out and sending us good bug reports that's immensely valuable and people don't believe it when i say i actually appreciate it when people send in bug reports i do i love it um it's an enormous contribution to the reliability of the product okay i will send you bug reports instead of posting questions on stack overflow because this is what i do now just go to stack overflow i consider stack overflow as a sort of a bug reporting place instead of yeah it's more than that i mean it's also for many people a help site it's a substitute for the fact that your colleagues don't work at the next desk to you the questions you'd have asked um you know to the person at the next desk you now ask on stack overflow um but um but yes it's also a bug reporting site it's not actually a good way of managing bugs because there's no way of saying you know give me a list of the open bugs and what their status is and how long they've been open so it doesn't give that kind of management but yes if
someone reports something on stack overflow we transfer it to our bug reporting system and manage it that way which works quite well okay thank you very much for coming i only can wish you really good luck in the xslt development it seems that we are in a difficult time right now so we definitely all love xslt 1 xslt 2 but the future is not as clear as it seems right as it has to be well thanks very much it's been fun talking about the um the wider issues and i hope people listening get some insights from that definitely thanks a lot okay cheers bye
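The default-namespace trap mentioned in the conversation (path expressions that select nothing because the author did not think namespaces mattered) can be reproduced in a few lines. What follows is only a minimal sketch, and it uses Python's standard xml.etree.ElementTree module rather than an XSLT processor such as Saxon. The sample document, the namespace URI http://example.com/ns and the prefix c are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: every element sits in a default namespace.
DOC = """<catalog xmlns="http://example.com/ns">
  <book><title>XSLT Basics</title></book>
  <book><title>XPath in Practice</title></book>
</catalog>"""

root = ET.fromstring(DOC)

# The trap: an unprefixed name in a path means "in no namespace",
# so this matches nothing even though <book> elements are present.
unqualified = root.findall("book")

# The fix: bind a prefix (an arbitrary choice here) to the namespace URI
# and use that prefix in every step of the path.
ns = {"c": "http://example.com/ns"}
qualified = root.findall("c:book", ns)
titles = [book.findtext("c:title", namespaces=ns) for book in qualified]

print(len(unqualified))  # 0
print(titles)            # ['XSLT Basics', 'XPath in Practice']
```

The same mismatch between unprefixed path steps and namespaced elements is what catches XSLT users. In XSLT 2.0 and later it is commonly addressed by declaring xpath-default-namespace on the stylesheet, so that unprefixed element names in path expressions are taken to be in that namespace.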
2022-01-22 13:26