A few Linux utilities that are useful for manipulating XMLTV schedule files

In my previous article, Some hints for getting free-to-air satellite channels into the Electronic Program Guide in Kodi or XBMC (or another frontend), I mentioned that schedule “grabber” programs save their files in XMLTV format. So let’s say you have one or more XMLTV files, but you want to do some additional manipulation on them before feeding them to your backend software. Here are a few Linux tools that I have found useful in certain circumstances. These are in addition to zap2xml, which I mentioned in my previous article.

NOTE: To find out whether a particular program is installed on your system, type which followed by a space and the program name at a Linux command prompt (for example, which tv_cat). If the program is installed, it should show you the full path to the file. Note that you will probably need to use that full path and filename if you are running the program from a shell script or a cron job!
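If you would rather have a script look the tools up for you, the POSIX command -v builtin does much the same job as which. A minimal sketch (find_tool is a hypothetical helper name, and the tool list is just the programs discussed in this article):

```shell
#!/bin/sh
# find_tool NAME: print the full path of NAME, or complain and fail.
# (find_tool is a made-up helper name for this example.)
find_tool() {
    command -v "$1" || { echo "$1: not found" >&2; return 1; }
}

# Report where each tool lives. Run this from your normal login shell and
# copy the paths it prints into any script that cron will run, since cron
# usually starts jobs with a much shorter PATH.
for tool in tv_cat sed xmlstarlet; do
    find_tool "$tool" || true   # keep going even if one tool is missing
done
```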

  • tv_cat – Concatenate XMLTV listings files. The man page description says, “Read one or more XMLTV files and write a file to standard output whose programmes are the concatenation of the programmes in the input files, and whose channels are the union of the channels in the input files.” In simple terms, it merges XMLTV format files together. This program may already be on your backend system, but if it’s not, you can typically install it on an Ubuntu/Debian-based system (and possibly on some other Linux distros) by installing the xmltv package. tv_cat is a bit picky about the format of the files it will combine, so check the output carefully to make sure it includes all the channels. I had some issues using tv_cat with Tvheadend and wound up using a small quick-and-dirty shell script to combine XMLTV listings files instead (see below).
  • sed – Stream EDitor. This is a utility built into just about EVERY Unix/Linux system out there, and it’s probably available for Windows in some form as well. sed is more or less a one-trick pony: it searches for text and replaces it with something else. You can use it to resolve duplicate channel IDs in two different XMLTV files by changing them in one of the files, using a command of the form sed -i 's/original text/replacement text/' filename, but note that there are some potential “gotchas”, so read the documentation first. For example, if either the search or replace string contains a / character, it must be “escaped” with a backslash so that it is not confused with the / delimiter character. So if, for example, your search or replace string included a closing tag such as </tv>, you’d write it as <\/tv> instead. (EDIT: You can also change the delimiter character to avoid this issue; see the first comment below.)

    I will note that there are some “purists” out there who will say that you should never use sed to manipulate an XML file, even though it’s easy and fairly foolproof (as long as you are careful to use unique strings that don’t appear anywhere you don’t want to change), so I suppose I had better mention a tool that is specifically intended for manipulating XML files…

  • xmlstarlet – command line XML toolkit. According to the description, “XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands.” This is another one that you will likely find in your Linux distribution’s repository, at least if you are running a version of Ubuntu or Debian.  This one offers you a lot more flexibility in manipulating XML files, but at the expense of being somewhat more complicated to use.
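Going back to tv_cat and sed for a moment: duplicate channel IDs are the usual thing to look for before merging, and missing channels the usual thing to look for afterwards. Here is a minimal sketch of a few shell helpers for both. The function names are hypothetical, and it assumes that every channel id is written in double quotes (id="..."), that IDs contain only letters, digits, dots and hyphens, and that you have GNU grep and GNU sed (for sed -i). It is plain text matching, so the purists’ objection above still applies.

```shell
#!/bin/sh
# Hypothetical helpers for checking and fixing XMLTV channel IDs.
# Assumes id="..." uses double quotes, IDs contain only letters, digits,
# dots and hyphens, and GNU grep/sed are installed.

# list_channel_ids FILE...: print every channel id found, one per line, sorted.
list_channel_ids() {
    grep -ho '<channel id="[^"]*"' "$@" | sed 's/^<channel id="//; s/"$//' | sort
}

# duplicate_channel_ids FILE...: print each id that appears more than once.
duplicate_channel_ids() {
    list_channel_ids "$@" | uniq -d
}

# rename_channel_id FILE OLD NEW: replace OLD with NEW in FILE, in place,
# both in <channel id="..."> and in every <programme channel="...">.
rename_channel_id() {
    file=$1 old=$2 new=$3
    # An unescaped "." in a regex matches any character, so escape the dots.
    old_re=$(printf '%s\n' "$old" | sed 's/\./\\./g')
    # Using | as the delimiter means a / in the text would not need escaping.
    # Including the quotes means only whole attribute values are matched.
    sed -i "s|\"$old_re\"|\"$new\"|g" "$file"
}
```

After a tv_cat run, comparing the output of list_channel_ids first_file.xml second_file.xml | uniq | wc -l with list_channel_ids combined.xml | wc -l (combined.xml being whatever you named the merged file) is a quick way to check that no channels were dropped.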

Since xmlstarlet is a bit difficult for some users to wrap their heads around, here are some actual examples of how it could be used on an XMLTV format file. Please note that I am no expert with it, so if you want to do something different, please try to figure it out for yourself using the handy documentation, which is available as a web page or in PDF format. No offense, but better that you should spend a couple of hours trying to figure out the correct syntax to achieve whatever results you want than me! 🙂

1. Remove all “Local Programming” entries from an XMLTV file named xmltv.xml and save to newfile.xml:
xmlstarlet ed --delete "//programme[title='Local Programming']" xmltv.xml >newfile.xml
2. Same as above but only for one specific channel:
xmlstarlet ed --delete "//programme[title='Local Programming'][@channel='someid.someaddress.com']" xmltv.xml >newfile.xml
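3. To change a channel ID wherever it appears in the file (the id attribute of its <channel> element and the channel attribute of each of its <programme> entries) and save to newfile.xml:
xmlstarlet ed -u "//channel[@id='someid.someaddress.com']/@id" -v 'newid.newaddress.com' -u "//programme[@channel='someid.someaddress.com']/@channel" -v 'newid.newaddress.com' xmltv.xml >newfile.xml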
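4. To extract all entries for a specific channel to a separate file (non-destructive, so it does not change the original file):
xmlstarlet sel -t -m "//programme[@channel='someid.someaddress.com']" -c . -n xmltv.xml >newfile.xml
Note that the result is just a list of <programme> elements with no <tv> root element and no <channel> entry, so it is not a complete XMLTV file on its own.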

Note that I am not claiming that any of the above is the best way to do something. As you can see, especially from the last example given, this program has some rather non-intuitive syntax for its command line arguments (to put it mildly). If you have any additional (or better) examples of using xmlstarlet to manipulate XMLTV files, please leave them in a comment and I will consider adding them here.

That said, if you need to do an operation on an XMLTV file and don’t want to write a program or script to do it yourself, xmlstarlet could be your salvation – IF you can figure out how to use it!
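Building on example 4: if you want the extracted channel to remain a complete XMLTV file (with its <tv> root and its matching <channel> entry), one approach is to delete everything else rather than select what you want. A minimal sketch wrapped in two hypothetical helper functions; it uses the same ed and sel subcommands as the examples above, and assumes the usual /tv/channel and /tv/programme layout and a channel ID with no single quote in it.

```shell
#!/bin/sh
# extract_channel IN OUT ID (hypothetical helper): write a copy of IN to OUT
# that keeps only the <channel> and <programme> elements for channel ID.
# The <tv> root element survives, so OUT is still a complete XMLTV file.
extract_channel() {
    in=$1 out=$2 id=$3
    xmlstarlet ed \
        -d "/tv/programme[@channel!='$id']" \
        -d "/tv/channel[@id!='$id']" \
        "$in" > "$out"
}

# count_programmes FILE (hypothetical helper): print how many <programme>
# elements FILE contains, handy for sanity-checking any of these edits.
count_programmes() {
    xmlstarlet sel -t -v "count(/tv/programme)" -n "$1"
}
```

For example, extract_channel xmltv.xml newfile.xml someid.someaddress.com, followed by count_programmes newfile.xml to see how many programmes were kept.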

Now, assuming you are not one of the “purists” who recoil in horror at the idea of editing an .xml file as if it were a text file, here is how you can use a Linux shell script and sed to combine two or more XMLTV format files. This assumes that each file has a header of exactly four lines, the last of which (the fourth line) starts with “<tv”, and that each file ends with “</tv>” as the final line. It also assumes that you want the output file to be named tv_grab_file.xmltv (if you use Tvheadend, this may be the name you want).

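#!/bin/sh
cd /path/to/xmlfiles || exit 1
# Copy the first file, minus its closing </tv> line
sed '$d' first_file.xml > tv_grab_file.xmltv
# Append the other files, minus their four header lines and closing </tv> lines
sed --separate '1,4d; $d' second_file.xml third_file.xml last_file.xml >> tv_grab_file.xmltv
# Close the <tv> element
echo '</tv>' >> tv_grab_file.xmltv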

In the second sed line above, you can list as many additional files as you like (one or more) to combine with the first file, but if more than one file is listed on that line, you must use the --separate option to remove the headers and footers from all of them. Without it, sed treats all the files as one continuous stream, so only the first four lines of the first file and the last line of the last file would be deleted. And yes, I do realize that instead of using an echo command to write the final line, I could give the last file its own sed command that keeps its closing line, like this …

sed '1,4d' last_file.xml >> tv_grab_file.xmltv

… and it would preserve the “</tv>” line from that file. The reason I don’t do it that way is that if for some reason that final file is missing, the “</tv>” line would not get written and the <tv> element would never be closed, which might cause some processors to ignore the file completely. I do realize that the same problem exists if the first file happens to be missing, but in my installation that is far less likely to happen, for reasons I won’t go into here. If you are concerned about it, you could write all four of the header lines using echo statements, then put all your .xml files in a single sed command (like the second one in the above script).
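Following that suggestion, here is a minimal sketch that writes the header and the closing </tv> line itself and skips any input file that is missing or does not match the layout assumed above. merge_xmltv is a hypothetical name, and the three header lines it writes are an assumption; ideally, copy them from one of your own files. Rather than one sed --separate call, it runs sed once per file so that each file can be checked first.

```shell
#!/bin/sh
# merge_xmltv OUT FILE... (hypothetical helper): combine XMLTV files that use
# a 4-line header whose 4th line starts with "<tv" and end with "</tv>".
# The header and footer are written here, so a missing input file cannot
# leave the output without its closing </tv> line.
merge_xmltv() {
    out=$1
    shift
    {
        # Assumed header lines; replace with the ones from your own files.
        echo '<?xml version="1.0" encoding="UTF-8"?>'
        echo '<!DOCTYPE tv SYSTEM "xmltv.dtd">'
        echo '<tv>'
        for f in "$@"; do
            # Only use files that exist and match the assumed layout.
            if [ -r "$f" ] && sed -n '4p' "$f" | grep -q '^<tv' &&
               [ "$(tail -n 1 "$f")" = '</tv>' ]; then
                sed '1,4d; $d' "$f"   # drop this file's header and </tv>
            else
                echo "skipping $f" >&2
            fi
        done
        echo '</tv>'
    } > "$out"
}
```

Usage would look like merge_xmltv tv_grab_file.xmltv first_file.xml second_file.xml third_file.xml last_file.xml. Like the original script, this simply concatenates, so channels from later files end up after programmes from earlier ones. The XMLTV DTD expects all <channel> elements before the <programme> elements, so a validating tool may flag the result even though it is well-formed; xmlstarlet val -w tv_grab_file.xmltv will at least confirm the latter.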
