1

This is a question that has to do more about performance and implementation. I want to create an app for getting info about tv series and would like to use thetvdb.com api. thetvdb offers an xml api and some guides to get u started. For initialization they say you must download a zip file (according to the id of the series you want to add).

The xml consists of three xml files. One is about the actors, one is about the banners and one that has all the info about the tv show inclucing seasons and episodes and overviews etc. The guide says that you parse the xml and add them to the tables of your db as you like.

My question is the following: Is it better to download the xml parse it and store it in my db and delete the xmls? Or should I use the xmls and parse them every time i want to get the show's info and present them to the user? What is the cost of one implementation and what is the cost of the second one?

No code yet since I am still designing it.

Nigel B
  • 3,577
  • 3
  • 34
  • 52
Apostolos
  • 7,763
  • 17
  • 80
  • 150
  • possible duplicate of [Strategies to Implement search on XML file](http://stackoverflow.com/questions/11210600/strategies-to-implement-search-on-xml-file) – Mark O'Connor Sep 02 '13 at 18:54

3 Answers3

1

I would definitely parse the xml files and put them in to a database. Parsing xml files has considerable overhead - and relational databases are designed with the intent of querying the data inside them, and are very performant at this (especially if you design your database well , provide useful indexes and logical joins between the tables.

Moo-Juice
  • 38,257
  • 10
  • 78
  • 128
1

I think the right solution could depend on your application load(how many users will simultaneously use it), the average size and structure of xml files and the need to refresh the data.

The possible solutions are:

  1. If the xml files are not very big and complicated and don't contain a lot of unwanted information you possibly do not need to store this info in your db.
  2. In this case you can add the caching layer to your app that will store the files and repeatedly get them from the thetvdb if the previous request was too long time ago.
  3. If you think that you can send the required information to the clients in more compact or convenient way or it requires additional processing or you are planning to execute some group opperations on your data in the future then the better option is to store the data in db
  4. Also think about refreshing the data in your db (similarly to the 2-nd option).
  5. Also you can store only critical information in your db.

Update: Generally I would not recommend to use local databases as in future it could cause problems with schema migration (when you want to change something in your db schema).

Nailgun
  • 3,999
  • 4
  • 31
  • 46
  • The app will work locally(this is my first thought). So no more than one user simultaneously using the db. By caching you mean saving the xml files as they are extracted from zip file?Or in some other way? – Apostolos Sep 02 '13 at 14:25
  • Yes, just saving the xml files locally. – Nailgun Sep 02 '13 at 14:26
  • Could you be more specific in not using locally databases?Should the app connect remotely to a db?Is this the preferred way nowdays for storing data?How would it cause problems? – Apostolos Sep 02 '13 at 15:42
0

Storing data into a db is definitely the best way if you have to execute particular queries. I'm also using TheTVDB APIs (integrated in Wordpress) and I can give you some PHP code. First you have to create the DB with the following code, then you have the PHP script that insert data from the XML url of a TV show.

SQL query:

CREATE TABLE IF NOT EXISTS `tvshowsinfo` (
  `id` int(9) NOT NULL AUTO_INCREMENT,
  `showname` varchar(255) CHARACTER SET utf8 NOT NULL,
  `lang` varchar(255) CHARACTER SET utf8 NOT NULL,
  `episodeid` int(9) NOT NULL,
  `episodenumber` int(5) NOT NULL,
  `season` int(5) NOT NULL,
  `director` varchar(255) CHARACTER SET utf8 NOT NULL,
  `episodename` varchar(255) CHARACTER SET utf8 NOT NULL,
  `firstaired` varchar(255) CHARACTER SET utf8 NOT NULL,
  `gueststars` text CHARACTER SET utf8 NOT NULL,
  `overview` text CHARACTER SET utf8 NOT NULL,
  `rating` varchar(255) CHARACTER SET utf8 NOT NULL,
  `ratingcnt` int(9) NOT NULL,
  `writer` varchar(255) CHARACTER SET utf8 NOT NULL,
  `episodeimg` varchar(255) CHARACTER SET utf8 NOT NULL,
  `seasonid` int(9) NOT NULL,
  `seriesid` int(9) NOT NULL,
  `thumbht` int(4) NOT NULL,
  `thumbwd` int(4) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 ;

PHP/Wordpress script:

<?php
$url = "Your URL";
$xmlstr = file_get_contents($url);
$elements = json_decode(json_encode((array)simplexml_load_string($xmlstr)),1);

global $wpdb;

$mquery = "INSERT INTO tvshowsinfo (showname,lang,episodeid,episodenumber,season,director,episodename,firstaired,gueststars,overview,rating,ratingcnt,writer,episodeimg,seasonid,seriesid,thumbht,thumbwd) VALUES ";
$i = 0;
foreach ($elements['Episode'] as $element) {
    if ($i !== 0) { $mquery .= ", "; }
    if ($element["Combined_season"] !== "0") {
        $mquery .= "('".mysql_real_escape_string($_POST["seriesname"])."', '".$element["Language"]."', '".$element["id"]."', '".$element["Combined_episodenumber"]."', '".$element["Combined_season"]."', '".mysql_real_escape_string($element["Director"])."', '".mysql_real_escape_string($element["EpisodeName"])."', '".$element["FirstAired"]."', '".mysql_real_escape_string($element["GuestStars"])."', '".mysql_real_escape_string($element["Overview"])."', '".$element["Rating"]."', '".$element["RatingCount"]."', '".mysql_real_escape_string($element["Writer"])."', 'http://thetvdb.com/banners/".$element["filename"]."', '".$element["seasonid"]."', '".$element["seriesid"]."', '".$element["thumb_height"]."', '".$element["thumb_width"]."')";
        $i++;
    }
} 

$wpdb->query($mquery);
$wpdb->print_error();

/* DEBUG */

?>

<pre><?php print_r($elements); ?></pre>
Flavio Li Volsi
  • 588
  • 4
  • 15
  • Hello Thunder...as i checked the episode info comes from an xml that is inside the zip and not from an xml on web, like online. So some file manipulation is required i think. Shouldn't you be better using a different table for the episodes with a foreign key to the seriesid? – Apostolos Sep 03 '13 at 21:03
  • You are right, fetching automatically the xml file requires some manipulation like a system() function (ex.: system('wget file.zip && unzip file.zip'), etc.) I don't use different tables because I have to query different TV series at the same time. – Flavio Li Volsi Sep 03 '13 at 23:45
  • So what you have is different rows with the same id(seriesid) and different episode id? – Apostolos Sep 05 '13 at 18:50
  • Exactly, different rows with same seriesid and same seriesname (this one inherited by me with an input text). – Flavio Li Volsi Sep 05 '13 at 22:25