Possible New API site?

Posted: Sat Sep 10, 2016 3:27 pm
by leaderdog
Hi Evilhero,

I've been hunting around for a site that could be a better API source than CV. CV, while good(ish), is so very slow. Apparently, a lot of the original developers there are gone.

From what I understand, comics.org (Grand Comics Database) has an API. I'm not smart enough to figure out how it works or even how to find it haha. I checked some of the issues that came out this week and all of them are on there, but not all of them have covers, which kind of sucks. It's fast though. When doing a search on the site, the results pop up immediately and the best choice is always at or near the top.

Someone in March in a different forum posted this about the databases:
Recorded Issues

Grand Comics Database: 1,257,285
Comicvine: 471,384
Comicbookdb: 346,624

But as I said, not all the covers are available. Not sure if it's worth a look, or use this place for data and just get the covers from CV if comics.org doesn't have them? Or if it's all more work than it needs to be. ;)

When I'm adding new series (number 1s) since Mylar doesn't have them indexed... or I don't think they are as the option to add them isn't there, I have to search for them and CV is soooooooo slow, what should take minutes ends up taking way longer than should be necessary. No fault of Mylar of course I know it's CV slowing it down.

comics.org is user added info too, so maybe new release images would be quicker to show up if supported by Mylar. Who knows.

I set up an account but I still can't find the api info. Maybe they're still looking for someone to help implement their API?

Just a suggestion. ;)

Re: Possible New API site?

Posted: Sun Sep 11, 2016 10:29 pm
by evilhero
Heh, originally when Mylar was first started - it used 2 sources to get data from:
- Comicvine to get the covers, and series data
- GCD to get the entire issue data set (issue number, pub date, store date, and title)

The problem was that in trying to merge 2 very different datasets, it became a horrible mass of well..dung. I mean it worked to a certain extent, but if Mylar couldn't match the series title to a series title on CV, then it had to either be manually picked from a list that Mylar generated, or enter the exact ComicID (kinda like adding a series now directly by the ComicID). Unfortunately, at that time, the API for CV really sucked. I mean, if you think the API now is bad, at that time it was so bad you couldn't do basically anything of relevance (you couldn't pull any issue information thru the API, for one).

The thing with GCD is that while it has an enormous amount of information, it's almost completely inaccessible without the use of a parser. The whole API thing for them is, I'm pretty sure at this point, just a catchphrase in order to garner more hits to their site. It's been discussed since before I started Mylar (so like pre-2011, probably 2008), and it's now 2016 - with nothing available. And yes the website is fast, but you also have to remember that there are probably thousands of people (if not more) that access CV and possibly even more that use its API with other apps - whereas I'm pretty sure the number of people using GCD is nowhere near that amount, and since they don't have an API running they're not getting hammered by tagging / organizing comic apps.

I actually left all the GCD code in Mylar, just in case they ever decide to open it up again, or if CV dies off or something. Then it's a fallback that would work again with some modifications.

For CBDB I have a parser already built that I was using for a bit, years ago (before CV updated their API, and I was between CV+GCD), but I never integrated it because, well, I just didn't like how things worked there. I mean, the guy who runs it has said very openly that he has no plans to ever have an API of any kind as he doesn't want to go that route with the data. He wants people to hit the site, and get the ad-banners (maybe?), or some other reason - but the main one being he doesn't wanna share it with anyone in an open way.

As far as nowadays, the #1's do get indexed by Mylar - but only when CV actually puts the data in place. Lately, for whatever reason, they've been doing it on Wednesday - whereas previously it was done late Monday/early Tuesday. So what's happening is that because the data isn't in place, Mylar can't mark the issues as Wanted properly and initiate any downloads it sees. Once the info is updated on CV, then Mylar gets it and then it can download - but I noticed this the last few weeks, issues weren't being grabbed until mid-afternoon on Wed, and sporadically at best. So in these cases, that's CV that's not updating the information in a timely manner. I can have it mock up the dates temporarily when you use the alt_pull 2 method, as the normal pull method does this by default - but it also causes some problems for issues and dates.

You can try to use the watch feature for #1's. Mylar will continually poll CV (~every 6 hrs) for any new information about a new series, and if it's available will auto-add the series to your watchlist and mark said issue as Wanted. I'm going to move this to the new site tho in some form, as it also tends to be fairly lengthy in comparing search results as it does an open search for matches.
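
For anyone curious what that kind of polling looks like, here's a rough sketch. It is not Mylar's actual code: `poll_watched`, `due_for_poll`, the `watched` dict and the `search` callable are all made-up names for illustration, and the only detail taken from the post is the roughly 6 hour interval.

```python
POLL_INTERVAL = 6 * 60 * 60  # roughly every 6 hours, as described in the post


def due_for_poll(last_polled, now, interval=POLL_INTERVAL):
    """True if a watched title has never been polled or its interval has elapsed."""
    return last_polled is None or now - last_polled >= interval


def poll_watched(watched, search, now):
    """Poll the source for each watched title that is due.

    watched: dict mapping series name -> last poll timestamp (or None)
    search:  hypothetical callable, series name -> series id, or None if the
             source has no data for it yet
    Returns a dict of newly found {series name: series id}; found titles are
    removed from `watched`, the rest keep waiting for the next interval.
    """
    added = {}
    for name, last_polled in list(watched.items()):
        if not due_for_poll(last_polled, now):
            continue
        watched[name] = now
        series_id = search(name)
        if series_id is not None:
            added[name] = series_id
            del watched[name]
    return added
```

Passing `now` in explicitly (instead of calling the clock inside) keeps the sketch easy to test; a real scheduler would call it with `time.time()`.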

I'm all for including GCD again in some form, but with it also brings a huge amount of back-end code to change as well as a probably even bigger amount of issues that will get logged by users. I'm not saying it's not worth it, but I'm not sure I can dedicate as much time to the change, as I have in the past. So it's kinda like a 50/50 kinda deal ;)

Re: Possible New API site?

Posted: Mon Sep 12, 2016 12:53 am
by leaderdog
Hi Evilhero,

Wow, nice explanation! :)

I didn't know what the watch button did, so I'll use that from now on. I just manually added the new issues.

The CV API works great, it's just slow, but you're right, it's probably the most popular as it's basically the only one, other than Marvel's and DC's own APIs, and I'm not sure it would be wise to tap into those. They're supposedly extensive, but I've never looked at them. I would assume that CV is using them to fill their site, but I'm not sure of course.

Mylar is humming along nicely right now. A few hiccups here and there but I'm not complaining haha

Thanks.

Re: Possible New API site?

Posted: Fri Sep 16, 2016 7:44 pm
by Telecart
Something that occurred to me - for the case where CV is being slow - why even wait for them? Before discovering Mylar, I'd had some (meager) success getting automation going using a program called Otomatic, which is designed for TV shows but basically lets you set up rules for whatever regular expression you want in a feed. Obviously, not as good as knowing what you're doing, but as a fallback could work, no?

Re: Possible New API site?

Posted: Sun Sep 18, 2016 6:59 pm
by evilhero
Telecart wrote: Something that occurred to me - for the case where CV is being slow - why even wait for them? Before discovering Mylar, I'd had some (meager) success getting automation going using a program called Otomatic, which is designed for TV shows but basically lets you set up rules for whatever regular expression you want in a feed. Obviously, not as good as knowing what you're doing, but as a fallback could work, no?
You HAVE to wait for CV - simply because we're using their API. If you request more frequently than their threshold for API usage - they'll ban your IP entirely. If you parse their website via scraper - they'll ban your IP entirely. Basically if you do something they're not aware of, or try to hammer their site - they'll ban the IP. A banned IP means you cannot use Mylar with CV - which means it's pretty much looking up existing information that you already have in your local db and not updating squat.

Any API site that's being used, whether it's CV or some other site, or even nzb providers - they all limit the amount of times you can request with their API in a given time frame. Go over that, and you'll receive a ban. It's not unreasonable either, but CV has had to impose some really strict restrictions on doing things simply because they are the only comic source out there with an API that's publicly accessible.
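
To make the "stay under the threshold" point concrete, here's a minimal client-side throttle sketch. The actual CV limits aren't stated in this thread, so the interval is just a parameter you'd set from the provider's documented limits; `MinIntervalThrottle` is a hypothetical helper, not something from Mylar.

```python
import time


class MinIntervalThrottle:
    """Allow at most one request per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call = None   # time of the previous request, None before the first

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            if elapsed < self.min_interval:
                delay = self.min_interval - elapsed
                self._sleep(delay)
        self._last_call = self._clock()
        return delay
```

You'd call `throttle.wait()` right before every API request, so a burst of lookups gets spaced out instead of tripping the provider's limit.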

Re: Possible New API site?

Posted: Wed Sep 28, 2016 4:11 pm
by Telecart
Totally get that. What I meant was CV staff being slow to update the data. Why not fallback to regex in such a scenario?

Re: Possible New API site?

Posted: Wed Sep 28, 2016 9:15 pm
by evilhero
Telecart wrote: Totally get that. What I meant was CV staff being slow to update the data. Why not fallback to regex in such a scenario?
Well, using regexes isn't going to make the data for issues appear when it's not there in the first place ;)

CV actively monitors website parsing and will aggressively ban IPs that utilize such a method. In doing so it would also ban access to the API from said IP address (search the ComicRack forums for the problems that happened to CV scraper due to actually scraping web pages instead of solely using the API).

Mylar needs to have the issue data present in order to fully mark an issue as Wanted and start searching for said issue.

For the pull-list, it matches first against series name and issue number for the pull-list status to flip to Wanted, as well as then showing up in the Upcoming section of the Wanted tab. Only when Mylar is able to retrieve the data from CV and is able to retrieve the pub/store dates does it actually mark it as Wanted to be searched.
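
The two-stage behaviour described above can be sketched roughly like this. It's an illustration only: the status strings, `PullEntry`, `CVIssue` and `pull_status` are made-up names, and requiring both the pub date and the store date is an assumption about what "pub/store dates" means here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullEntry:
    series: str
    issue_number: str


@dataclass
class CVIssue:
    # Hypothetical shape of the issue data returned by CV; dates may be missing.
    pub_date: Optional[str] = None
    store_date: Optional[str] = None


def normalize(name):
    """Lower-case and collapse whitespace so names compare loosely."""
    return " ".join(name.lower().split())


def pull_status(entry, wanted, cv_issue):
    """Classify one pull-list entry.

    wanted:   set of (normalized series name, issue number) pairs being watched
    cv_issue: CVIssue if CV returned data for the issue, else None
    Returns 'unmatched', 'upcoming' (matched, still waiting on CV dates),
    or 'searchable' (matched and both dates present).
    """
    key = (normalize(entry.series), entry.issue_number)
    if key not in wanted:
        return "unmatched"
    if cv_issue is None or not (cv_issue.pub_date and cv_issue.store_date):
        return "upcoming"
    return "searchable"
```

The point is that a name/issue match alone only gets an entry as far as 'upcoming'; nothing gets searched until the issue data is actually there.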

Parsing the web page isn't going to make the data present when it's not there, and on top of that you would then have several hundred (dunno an actual number) users parsing the site for info at relatively the same time. It wouldn't be a good scenario.

The alt_pull 2 method gets around it by doing all the CV API calls, but the lack of updates prior to Wed by CV lately has really affected things. I fixed a few issues on the new site to account for some things, and I know how to get around the lack of info from CV yet still search for the issues (the non alt_pull 2 methods use it) - it's just getting the SQL and PHP backend to be able to perform it all dynamically.

Re: Possible New API site?

Posted: Sun Oct 02, 2016 9:50 pm
by Telecart
Gotcha. Yeah, my thought is how do we actually avoid reliance on CV -- use it when they're there, but fail gracefully when they're not. My thought was not to scan the CV page, but to simply scan the RSS feeds for all items all the time. I might not have a full grasp of Mylar's architecture to understand if there's a problem with that, but it seemed to work sort-of okay with OTOMATIC, though of course it was "dumb" and didn't know as much about what it was actually automating as Mylar does.
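
A "dumb" regex rule of the kind described might look like the sketch below. The title layout ("Series Name 012 (2016) ...") is an assumption about how feed items are named, and `TITLE_RE`, `parse_title` and `matches_watchlist` are hypothetical names. It only pulls a series name and issue number out of a title string; it can't supply pub/store dates or IDs that the feed doesn't carry.

```python
import re

# Assumed layout: "<series> <issue> (<year>) ...". The lookahead insists the
# issue number is followed by "(" or end of string, so a number inside the
# series name (e.g. "Spider-Man 2099") isn't mistaken for the issue.
TITLE_RE = re.compile(
    r"^(?P<series>.+?)\s+#?(?P<issue>\d+(?:\.\d+)?)"
    r"(?=\s*(?:\(|$))\s*(?:\((?P<year>\d{4})\))?"
)


def parse_title(title):
    """Return {'series', 'issue', 'year'} for a feed title, or None if no match."""
    m = TITLE_RE.match(title.strip())
    if not m:
        return None
    return {
        "series": m.group("series"),
        "issue": m.group("issue"),
        "year": m.group("year"),
    }


def matches_watchlist(title, watched_series):
    """Return the parsed title if its series is in `watched_series` (lower-cased names)."""
    parsed = parse_title(title)
    if parsed and parsed["series"].lower() in watched_series:
        return parsed
    return None
```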

Re: Possible New API site?

Posted: Thu Oct 13, 2016 1:38 pm
by evilhero
This is what the alt_pull 2 method does currently on the intermediary site - it polls CV for the most recently updated items that are on the pull-list and then adjusts things accordingly, so that when a user retrieves the pull-list from the site it will have the relevant ComicID/IssueID information already pre-populated, which saves on unnecessary CV API hits. I tried before to just get the most recently updated items, but there were too many cross-posts and duplicate items on the feed that caused problems with parsing the feeds - especially when there were multiple volumes of the same series being updated at the same time.
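
As a rough illustration of the duplicate problem (not the code running on the site), one way to collapse a feed of updates is to key each item on series, volume and issue and keep only the newest record. The field names and `dedupe_updates` are assumptions for the sketch, and `updated` is assumed to be an ISO-8601 timestamp string so plain string comparison orders it correctly.

```python
def dedupe_updates(items):
    """Keep the most recently updated record per (series, volume, issue).

    items: iterable of dicts with hypothetical keys 'series', 'volume',
           'issue' and 'updated' (ISO-8601 timestamp string).
    Including the volume in the key keeps different volumes of the same
    series apart while still dropping cross-posted duplicates.
    """
    latest = {}
    for item in items:
        key = (item["series"].lower(), item["volume"], item["issue"])
        kept = latest.get(key)
        if kept is None or item["updated"] > kept["updated"]:
            latest[key] = item
    return list(latest.values())
```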