A few Linux utilities that are useful for manipulating XMLTV schedule files

In my previous article, “Some hints for getting free-to-air satellite channels into the Electronic Program Guide in Kodi or XBMC (or another frontend)”, I mentioned that schedule “grabber” programs save their files in XMLTV file format. So let’s say you have one or more XMLTV files, but you want to do some additional manipulation on them before feeding them to your backend software. Here are a few Linux tools I have found that may be useful in certain circumstances. These are in addition to zap2xml, which I mentioned in my previous article.

NOTE: To find out whether a particular program is installed on your system, type the word which, followed by a space and the program name (for example, which tv_cat), at a Linux command prompt. If the program is installed, it should show you the full path to the file. Note that you will probably need to use that full path and filename if you are running the program from a shell script or a cron job!
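If you want to check several tools at once, here is a minimal sketch for a POSIX shell. It uses command -v, a shell built-in that does much the same job as which; the function name tool_path is just something made up for this example.

```shell
#!/bin/sh
# Print the full path of a command, or "not installed" if the shell
# cannot find it. command -v is the POSIX built-in counterpart of which.
tool_path() {
    command -v "$1" || echo "not installed"
}

# The tools discussed in this article.
for tool in tv_cat sed xmlstarlet; do
    echo "$tool: $(tool_path "$tool")"
done
```

Whatever path it prints for a tool is the one to use in your script or crontab entry, since cron usually runs with a much shorter PATH than your login shell.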

  • tv_cat – Concatenate XMLTV listings files. The man page description says, “Read one or more XMLTV files and write a file to standard output whose programmes are the concatenation of the programmes in the input files, and whose channels are the union of the channels in the input files.” In simple terms, it merges XMLTV format files together. This program may already be on your backend system, but if it’s not, you can typically install it on an Ubuntu/Debian-based system (and possibly on some other Linux distros) by installing the xmltv package. tv_cat is a bit picky about the format of the files it will combine, so check the output carefully to make sure it includes all the channels. I had some issues using tv_cat with TVHeadEnd, and wound up using a small quick-and-dirty Perl script to combine XMLTV listings files instead.
  • sed – Stream EDitor. This utility is built into just about EVERY Unix/Linux system out there, and it’s probably available for Windows in some form as well. sed is more or less a one-trick pony: it searches for text and replaces it with something else. You can use it to resolve duplicate channel IDs in two different XMLTV files by changing them in one of the files, using a command of the form sed -i 's/original text/replacement text/' filename, but note that there are some potential “gotchas”, so read the documentation first. For example, if either the search or replace string contains a / character, it must be “escaped” with a backslash so that it is not confused with the / delimiter character. So if, for example, your search or replace string included the closing tag </display-name>, you’d use <\/display-name> instead. (EDIT: You can also change the delimiter character to avoid this issue; see the first comment below.)

    I will note that there are some “purists” out there who will say that you should never use sed to manipulate an XML file, even though it’s easy and (if you are careful to use unique strings that don’t appear anywhere you don’t want to change) fairly foolproof, so I suppose I had better mention a tool that is specifically intended for manipulating XML files…

  • xmlstarlet – command line XML toolkit. According to the description, “XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands.” This is another one that you will likely find in your Linux distribution’s repository, at least if you are running a version of Ubuntu or Debian.  This one offers you a lot more flexibility in manipulating XML files, but at the expense of being somewhat more complicated to use.
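Going back to sed for a moment, here is a minimal sketch of the duplicate-channel-ID fix, assuming GNU sed (the -i option behaves differently on BSD/macOS sed). The file name second.xml, the channel IDs and the display name are all made up, and the script creates a tiny sample file so you can try it safely. It uses # as the delimiter, as suggested in the comment below, so the / in </display-name> needs no escaping. Also keep in mind that the search string is a regular expression, so a bare . matches any character; escaping the dots in the channel ID keeps the match exact.

```shell
#!/bin/sh
# Create a small sample XMLTV file (hypothetical channel ID and names).
cat > second.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<tv>
  <channel id="12.1.example.com">
    <display-name>Example 12.1</display-name>
  </channel>
  <programme start="20240101060000 +0000" stop="20240101070000 +0000" channel="12.1.example.com">
    <title>Morning News</title>
  </programme>
</tv>
EOF

# 1st expression: rename the channel ID everywhere it appears as a quoted
#   attribute value (<channel id="..."> and <programme channel="...">).
#   The dots are escaped so they only match literal dots.
# 2nd expression: change the display name; with # as the delimiter, the
#   / in </display-name> does not need a backslash.
sed -i \
    -e 's#"12\.1\.example\.com"#"12.1.othersource.com"#g' \
    -e 's#<display-name>Example 12\.1</display-name>#<display-name>Example 12.1 (OTA)</display-name>#' \
    second.xml

cat second.xml
```

Once the IDs no longer collide, the files can be merged with something like tv_cat first.xml second.xml > combined.xml (first.xml standing in for your other listings file), after which it is worth checking that every channel actually made it into combined.xml.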

Since xmlstarlet is a bit difficult for some users to wrap their heads around, I will give some actual examples here of how it could be used on an XMLTV format file. Please note, however, that I am no expert with it, so if you need a different usage, please try to work it out for yourself using the handy documentation, which is available as a web page or in PDF format. No offense, but better that you should spend a couple of hours figuring out the correct syntax to achieve whatever results you want than I should! 🙂

1. Remove all “Local Programming” entries from an XMLTV file named xmltv.xml and save to newfile.xml:
xmlstarlet ed --delete "//programme[title='Local Programming']" xmltv.xml >newfile.xml
2. Same as above but only for one specific channel:
xmlstarlet ed --delete "//programme[title='Local Programming'][@channel='someid.someaddress.com']" xmltv.xml >newfile.xml
3. To change the channel attribute of every programme for a given channel (note that this does not change the id of the matching <channel> element):
xmlstarlet ed -u "//programme[@channel='someid.someaddress.com']/@channel" -v 'newid.newaddress.com' xmltv.xml >newfile.xml
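4. To extract all programme entries for a specific channel to a separate file (this does not change the original file; note that the output contains only the matching <programme> elements, without the enclosing <tv> root element or any <channel> element, so it is not a complete XMLTV file by itself):
xmlstarlet sel -t -m "//programme[@channel='someid.someaddress.com']" -c . -n xmltv.xml >newfile.xml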

Note that I am not claiming that any of the above is the best way to do something. As you can see, especially from the last example, this program has some rather non-intuitive syntax for its command line arguments (to put it mildly). If you have any additional (or better) examples of using xmlstarlet to manipulate XMLTV files, please leave them in a comment and I will consider adding them here.

That said, if you need to do an operation on an XMLTV file and don’t want to write a program or script to do it yourself, xmlstarlet could be your salvation – IF you can figure out how to use it!
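Here are two more xmlstarlet sketches along the same lines, using the same made-up channel IDs as the examples above and a small sample file built by the script. The first renames a channel ID in both the <channel id="..."> element and the programme entries, which example 3 on its own does not do. The second keeps just one channel by deleting everything else, so the result is still a complete XMLTV file with its <tv> root and <channel> element, unlike the output of example 4. Several -u/-v or -d operations can be chained in one ed call, and they are applied in order. Treat these as starting points rather than tested recipes for your particular files.

```shell
#!/bin/sh
# Sample input; channel IDs and titles are made up for illustration.
cat > xmltv.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<tv>
  <channel id="someid.someaddress.com">
    <display-name>Some Channel</display-name>
  </channel>
  <channel id="other.someaddress.com">
    <display-name>Other Channel</display-name>
  </channel>
  <programme start="20240101060000 +0000" stop="20240101070000 +0000" channel="someid.someaddress.com">
    <title>Morning News</title>
  </programme>
  <programme start="20240101060000 +0000" stop="20240101070000 +0000" channel="other.someaddress.com">
    <title>Local Programming</title>
  </programme>
</tv>
EOF

# Rename a channel ID in both places it appears: the <channel id="...">
# element and the channel="..." attribute of each matching <programme>.
xmlstarlet ed \
    -u "//channel[@id='someid.someaddress.com']/@id" -v 'newid.newaddress.com' \
    -u "//programme[@channel='someid.someaddress.com']/@channel" -v 'newid.newaddress.com' \
    xmltv.xml > renamed.xml

# Keep only one channel as a complete XMLTV file by deleting every other
# programme and channel element, instead of copying the matches out.
xmlstarlet ed \
    -d "//programme[@channel!='other.someaddress.com']" \
    -d "//channel[@id!='other.someaddress.com']" \
    xmltv.xml > onechannel.xml
```

To check that a file is still well-formed XML after any kind of editing (including with sed), xmlstarlet val -w filename should report whether it is valid.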


One thought on “A few Linux utilities that are useful for manipulating XMLTV schedule files”

  1. I received this comment about sed from a reader via e-mail:

    I can’t speak for every Linux implementation, but if they follow the original Unix standards the problem is slightly different than you described.

    “You can use it to resolve duplicate channel ID’s in two different XMLTV files by changing them in one of the files, using a command of the form sed -i 's/original text/replacement text/' filename but note that there are some potential “gotchas”, so read the documentation first. For example, if either the search or replace string contains a / character, it must be “escaped” with a backward slash, so as not to be confused with the / delimiter character. So if, for example, your search or replace string included the closing tag </display-name> you’d use <\/display-name> instead.”

    The problem is not sed, it’s the delimiter character that was chosen. Try this:

    sed "s#/#for anything you want to replace it with, including / slash#"

    The shell special characters, and when they are resolved, are the real problem (single vs. double quotes, variable expansion, etc.). One does have to be careful with any shell special character, and with the delimiter character, which may be any non-special character but unfortunately ends up being the slash most of the time; I like using “#” instead.

    Regular expression characters (recognized by the shell) and shell control characters such as quotes are all candidates for causing shell problems. Here are most of them, though not all of them cause trouble, depending on when and how they are used:
    - * ? [ ] ' " \ $ ; & ( ) | ^ < > new-line space tab

    If sed is used in a script, the shell should always be specified to avoid future problems, and the Bourne shell, sh, is the most universal. Unless something really special is needed, a Bourne shell or derivative is recommended. I’m not a purist; I’ve just learned from experience that it’s better to avoid problems. sed is much more powerful than most people realize. I wrote a very nice PostScript processor to generate diff-mark source code listings for multiple files, with separate file and document page numbers, a table of contents, file headings, and line numbers for source code reviews (long before Microsoft or Adobe could do anything similar). It makes it very easy for everyone to find things fast (page 55, or page 2 of file …, see line 25). To many people these days, this type of code review may seem like a waste of time compared to working individually and passing on comments electronically. I see it as a great way to impart useful knowledge to those with less experience and to build relationships.

    CJ

    mc2xml quit working and I ran out of EPG data, so it’s back to zap2xml.pl for all of my EPG data until I have to create a scraper of my own (hope that never happens, way too volatile to keep up with).
