File Name Parsing Ignoring Words After '-' Character

lodasi
Posts: 2
Joined: Thu Aug 20, 2020 6:28 pm

Post by lodasi »

Recently I noticed that Mylar is parsing files and ignoring the text after the hyphen character.

Unfortunately, this causes Mylar to misparse titles that begin with Star Wars or Empyre, to name a couple of examples, and in some instances to replace files incorrectly.

Version: 3af6bb19fec0d2426d076ae574a3523a7a12ad6e (master)
OS: Windows 10 Version 1909 (OS Build 18363.959)

Files: Manually downloaded using SABnzbd from alt.binaries.comics.dcp

20-Aug-2020 00:37:22 - INFO    :: mylar.run.2890 : ThreadPoolExecutor-0_19 : [FOLDER-CHECK] Checking folder C:\Downloads\Complete for newly snatched downloads
20-Aug-2020 00:37:22 - INFO    :: mylar.traverse_directories.1499 : ThreadPoolExecutor-0_19 : there are 19 files.
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.427 : ThreadPoolExecutor-0_19 : I have located 19 files that I should be able to post-process. Continuing...
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: Batman [42721]
20-Aug-2020 00:37:22 - WARNING :: mylar.Process.569 : ThreadPoolExecutor-0_19 : Batman [2011] is either Paused or in an Ended status with 100% completion. Ignoring for match.
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: Batman [796]
20-Aug-2020 00:37:22 - WARNING :: mylar.Process.569 : ThreadPoolExecutor-0_19 : Batman [1940] is either Paused or in an Ended status with 100% completion. Ignoring for match.
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: Star Wars [79398]
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: Star Wars [55553]
20-Aug-2020 00:37:22 - WARNING :: mylar.Process.569 : ThreadPoolExecutor-0_19 : Star Wars [2013] is either Paused or in an Ended status with 100% completion. Ignoring for match.
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: The Star Wars [67024]
20-Aug-2020 00:37:22 - WARNING :: mylar.Process.569 : ThreadPoolExecutor-0_19 : The Star Wars [2013] is either Paused or in an Ended status with 100% completion. Ignoring for match.
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: Star Wars [11246]
20-Aug-2020 00:37:22 - WARNING :: mylar.Process.569 : ThreadPoolExecutor-0_19 : Star Wars [1998] is either Paused or in an Ended status with 100% completion. Ignoring for match.
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: Star Wars [2914]
20-Aug-2020 00:37:22 - WARNING :: mylar.Process.569 : ThreadPoolExecutor-0_19 : Star Wars [1977] is either Paused or in an Ended status with 100% completion. Ignoring for match.
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: Star Wars [123860]
20-Aug-2020 00:37:22 - INFO    :: mylar.Process.564 : ThreadPoolExecutor-0_19 : Now checking: Star Wars: Bounty Hunters [125679]
20-Aug-2020 00:37:22 - INFO    :: mylar.dbUpdate.79 : ThreadPoolExecutor-0_19 : Starting update for 1 active comics
20-Aug-2020 00:37:22 - INFO    :: mylar.dbUpdate.121 : ThreadPoolExecutor-0_19 : Refreshing/Updating: Star Wars: Bounty Hunters (2020) [125679]
20-Aug-2020 00:37:22 - INFO    :: mylar.addComictoDB.86 : ThreadPoolExecutor-0_19 : aliases currently: None
20-Aug-2020 00:37:22 - INFO    :: mylar.validateAndCreateDirectory.1717 : ThreadPoolExecutor-0_19 : [DIRECTORY-CHECK] Found comic directory: G:\Comics\Marvel\Star Wars Bounty Hunters (2020)
20-Aug-2020 00:37:24 - INFO    :: mylar.addComictoDB.128 : ThreadPoolExecutor-0_19 : Now adding/updating: Star Wars: Bounty Hunters
20-Aug-2020 00:37:24 - INFO    :: mylar.addComictoDB.181 : ThreadPoolExecutor-0_19 : Sucessfully retrieved details for Star Wars: Bounty Hunters
20-Aug-2020 00:37:24 - INFO    :: mylar.addComictoDB.187 : ThreadPoolExecutor-0_19 : Previous version detected as None - seeing if update required
20-Aug-2020 00:37:24 - INFO    :: mylar.addComictoDB.226 : ThreadPoolExecutor-0_19 : Directory (G:\Comics\Marvel\Star Wars Bounty Hunters (2020)) already exists! Continuing...
20-Aug-2020 00:37:27 - INFO    :: mylar.addComictoDB.347 : ThreadPoolExecutor-0_19 : Sucessfully retrieved issue details for Star Wars: Bounty Hunters
20-Aug-2020 00:37:31 - INFO    :: mylar.updateissuedata.1104 : ThreadPoolExecutor-0_19 : Now adding/updating issues for Star Wars: Bounty Hunters
20-Aug-2020 00:37:32 - INFO    :: mylar.addComictoDB.397 : ThreadPoolExecutor-0_19 : returning to dbupdate module
20-Aug-2020 00:37:33 - INFO    :: mylar.dbUpdate.292 : ThreadPoolExecutor-0_19 : In the process of converting the data to CV, I changed the status of 3 issues.
20-Aug-2020 00:37:33 - INFO    :: mylar.dbUpdate.326 : ThreadPoolExecutor-0_19 : I have added 0 new issues for this series that were not present before.
20-Aug-2020 00:37:33 - INFO    :: mylar.forceRescan.945 : ThreadPoolExecutor-0_19 : [FILE-RESCAN] Now checking files for Star Wars: Bounty Hunters (2020) in G:\Comics\Marvel\Star Wars Bounty Hunters (2020)
20-Aug-2020 00:37:33 - INFO    :: mylar.traverse_directories.1499 : ThreadPoolExecutor-0_19 : there are 3 files.
20-Aug-2020 00:37:33 - INFO    :: mylar.forceRescan.1499 : ThreadPoolExecutor-0_19 : [FILE-RESCAN] Total files located: 3
20-Aug-2020 00:37:34 - INFO    :: mylar.forceRescan.1608 : ThreadPoolExecutor-0_19 : [FILE-RESCAN] I have physically found 3 issues, ignored 0 issues, snatched 0 issues, and accounted for 0 in an Archived state [ Total Issue Count: 3 / 3 ]
20-Aug-2020 00:37:49 - INFO    :: mylar.dbUpdate.352 : ThreadPoolExecutor-0_19 : Update complete
20-Aug-2020 00:41:49 - INFO    :: mylar.duplicate_filecheck.2114 : ThreadPoolExecutor-0_19 : [DUPECHECK] Duplicate check for C:\Downloads\Complete\bytes (1).1\Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz
20-Aug-2020 00:41:49 - INFO    :: mylar.duplicate_filecheck.2145 : ThreadPoolExecutor-0_19 : [DUPECHECK] Existing Status already set to Downloaded
20-Aug-2020 00:41:49 - INFO    :: mylar.duplicate_filecheck.2174 : ThreadPoolExecutor-0_19 : [DUPECHECK] Existing file within db :Star Wars 004 (2020-03).cbz has a filesize of : 29398044 bytes.
20-Aug-2020 00:41:49 - INFO    :: mylar.duplicate_filecheck.2241 : ThreadPoolExecutor-0_19 : [DUPECHECK-FILESIZE PRIORITY] [#4] Retaining newly scanned in filename : C:\Downloads\Complete\bytes (1).1\Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz
20-Aug-2020 00:41:49 - INFO    :: mylar.duplicate_process.206 : ThreadPoolExecutor-0_19 : [DUPLICATE-CLEANUP] New File will be post-processed. Moving duplicate [G:\Comics\Marvel\Star Wars (2020)\Star Wars 004 (2020-03).cbz] to Duplicate Dump Folder for manual intervention.
20-Aug-2020 00:41:49 - INFO    :: mylar.validateAndCreateDirectory.1717 : ThreadPoolExecutor-0_19 : [DUPLICATE-CLEANUP][DIRECTORY-CHECK] Found comic directory: C:\Downloads\Duplicates
20-Aug-2020 00:41:50 - WARNING :: mylar.duplicate_process.232 : ThreadPoolExecutor-0_19 : [DUPLICATE-CLEANUP] Successfully moved G:\Comics\Marvel\Star Wars (2020)\Star Wars 004 (2020-03).cbz ... to ... C:\Downloads\Duplicates\Star Wars 004 (2020-03).cbz
20-Aug-2020 00:41:50 - INFO    :: mylar.Process_next.2132 : ThreadPoolExecutor-0_19 : [POST-PROCESSING]  [1/3] Starting Post-Processing for Star Wars issue: 4
20-Aug-2020 00:41:54 - INFO    :: mylar.run.112 : ThreadPoolExecutor-0_19 : ct_check: b'ComicTagger 1.3.3 [ninjas.walk.alone / SHURIKEN]\r\nCopyright (c) 2012-2020 ComicTagger Team\r\nDistributed under Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0)\r\n'
20-Aug-2020 00:41:54 - INFO    :: mylar.run.206 : ThreadPoolExecutor-0_19 : b'Archive is not a RAR.\r\n'
20-Aug-2020 00:41:54 - INFO    :: mylar.run.207 : ThreadPoolExecutor-0_19 : None
20-Aug-2020 00:41:54 - WARNING :: mylar.run.247 : ThreadPoolExecutor-0_19 : [META-TAGGER][COMIC-TAGGER] file is not in a RAR format: Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz
20-Aug-2020 00:41:54 - INFO    :: mylar.run.186 : ThreadPoolExecutor-0_19 : [META-TAGGER] ComicRack tagging meta-tagging processing started.
20-Aug-2020 00:41:56 - INFO    :: mylar.run.206 : ThreadPoolExecutor-0_19 : b'C:\\PythonPrograms\\mylar3\\cache\\mylar_ojw6c8eb\\Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz: Already has ComicRack tags. Not overwriting.\r\n'
20-Aug-2020 00:41:56 - INFO    :: mylar.run.207 : ThreadPoolExecutor-0_19 : None
20-Aug-2020 00:41:56 - INFO    :: mylar.run.270 : ThreadPoolExecutor-0_19 : [META-TAGGER][COMIC-TAGGER] Successfully wrote ComicRack tagging [C:\PythonPrograms\mylar3\cache\mylar_ojw6c8eb\Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz]
20-Aug-2020 00:41:56 - INFO    :: mylar.run.186 : ThreadPoolExecutor-0_19 : [META-TAGGER] Comicbooklover tagging meta-tagging processing started.
20-Aug-2020 00:41:57 - INFO    :: mylar.run.206 : ThreadPoolExecutor-0_19 : b'width= 128\r\nSave complete.\r\n\r\nSuccessful matches:\r\n------------------\r\nC:\\PythonPrograms\\mylar3\\cache\\mylar_ojw6c8eb\\Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz\r\n'
20-Aug-2020 00:41:57 - INFO    :: mylar.run.207 : ThreadPoolExecutor-0_19 : None
20-Aug-2020 00:41:57 - INFO    :: mylar.run.270 : ThreadPoolExecutor-0_19 : [META-TAGGER][COMIC-TAGGER] Successfully wrote Comicbooklover tagging [C:\PythonPrograms\mylar3\cache\mylar_ojw6c8eb\Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz]
20-Aug-2020 00:41:57 - INFO    :: mylar.Process_next.2463 : ThreadPoolExecutor-0_19 : [POST-PROCESSING] Sucessfully wrote metadata to .cbz (Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz) - Continuing..
20-Aug-2020 00:41:57 - INFO    :: mylar.validateAndCreateDirectory.1717 : ThreadPoolExecutor-0_19 : [POST-PROCESSING][DIRECTORY-CHECK] Found comic directory: G:\Comics\Marvel\Star Wars (2020)
20-Aug-2020 00:41:58 - INFO    :: mylar.Process_next.2645 : ThreadPoolExecutor-0_19 : [POST-PROCESSING] move successful to : G:\Comics\Marvel\Star Wars (2020)\Star Wars 004 (2020-05).cbz
20-Aug-2020 00:41:59 - INFO    :: mylar.foundsearch.856 : ThreadPoolExecutor-0_19 : [POST-PROCESSING][UPDATER] Setting status to Post-Processed in history.
20-Aug-2020 00:41:59 - INFO    :: mylar.foundsearch.913 : ThreadPoolExecutor-0_19 : [POST-PROCESSING][UPDATER] Updating Status (Post-Processed) now completed for Star Wars issue: 4
20-Aug-2020 00:41:59 - INFO    :: mylar.Process_next.2817 : ThreadPoolExecutor-0_19 : [POST-PROCESSING] Post-Processing completed for: Star Wars #4
evilhero
Site Admin
Posts: 2883
Joined: Sat Apr 20, 2013 3:43 pm

Re: File Name Parsing Ignoring Words After '-' Character

Post by evilhero »

The problem, at least in the case you showed logs for, is that the series you're trying to manually post-process doesn't have an issue #4 according to ComicVine (CV).

Long-winded, detailed answer:
So when Mylar goes to post-process and checks your watchlist for matching series, it checks the Star Wars - Bounty Hunters series and sees that it stops at issue #3. Because you're trying to post-process #4, it then drops down to check the possible alternate naming (one of which is triggered by the '-', since that's a common convention for breaking up series titles). Because the alternate name is 'Star Wars', it finds your 2020 Star Wars series, which has an issue #4 that passes the date check. That means the store date was checked against the date in the filename. But because you're manually post-processing it rather than searching/downloading via Mylar, the information Mylar has available to make informed decisions is limited to what's in the filename, in this case the year 2020.

So Star Wars #4 was published in 2020, and thus it matches. Since you couldn't have searched for it (because it doesn't exist on CV), it would never have been matched this way if it had gone through the typical search/download/post-process flow within Mylar itself, as that flow passes all the relevant CV information along at every step.
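To make the fallback order above a little more concrete, here is a minimal Python sketch. It is not Mylar's actual code: the `Series` class, the `match()` and `candidate_names()` helpers, the simplified filename regex, and the year-only date check are hypothetical stand-ins (Mylar compares store dates and handles far more naming schemes). It only shows why a filename whose full title has no issue #4 can fall through to the 'Star Wars' alternate name.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Series:
    """Hypothetical, simplified watchlist entry (not Mylar's real schema)."""
    name: str
    # issue number -> publication year, as the series metadata lists it
    issues: dict = field(default_factory=dict)


# Simplified pattern: "<series> <issue> (<year>) ..."
FILENAME_RE = re.compile(r"^(?P<series>.+?)\s+(?P<issue>\d+)\s+\((?P<year>\d{4})\)")


def normalize(name):
    """Lower-case and treat ':' / '-' as spaces, so that
    'Star Wars: Bounty Hunters' and 'Star Wars - Bounty Hunters' compare equal."""
    return " ".join(re.sub(r"[:\-]", " ", name).lower().split())


def candidate_names(series):
    """Full title first, then the alternate name taken from before ' - '."""
    names = [series]
    if " - " in series:
        names.append(series.split(" - ", 1)[0])
    return names


def match(filename, watchlist):
    """Return (series, issue) for the first candidate name that has the issue,
    or None. With manual post-processing, only the filename year is known."""
    m = FILENAME_RE.match(filename)
    if m is None:
        return None
    issue, year = int(m["issue"]), int(m["year"])
    for name in candidate_names(m["series"]):
        for s in watchlist:
            if normalize(s.name) != normalize(name):
                continue
            # Year-only "date check": passes if the series lists this issue in that year
            if s.issues.get(issue) == year:
                return s, issue
    return None


watchlist = [
    Series("Star Wars: Bounty Hunters", {1: 2020, 2: 2020, 3: 2020}),  # CV stops at #3
    Series("Star Wars", {1: 2020, 2: 2020, 3: 2020, 4: 2020}),
]
result = match("Star Wars - Bounty Hunters 004 (2020) (Digital) (Kileko-Empire).cbz", watchlist)
print(result[0].name, result[1])  # prints: Star Wars 4
```

Once the metadata lists #4 for Bounty Hunters, the full title matches first and the alternate name is never tried, which is consistent with the explanation that the mismatch comes from the CV lag rather than from the '-' split itself.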

Short-winded brief answer:
Basically, the main problem is that CV hasn't been updated in a timely manner (#4 was released yesterday according to every source outside of CV), so Mylar has no way to validate the issue. It therefore rejects it as an issue of that series, and the file is then mistakenly matched via the alternate-name check.

I'm not sure there's a clean solution, as alternate-name matching is worth keeping given the numerous naming schemes that users, scanners, and publishers use for issues.
lodasi
Posts: 2
Joined: Thu Aug 20, 2020 6:28 pm

Re: File Name Parsing Ignoring Words After '-' Character

Post by lodasi »

Thank you for the explanation. I suspected it had to do with the CV lag, but I didn't know if this was unintended behavior or not, hence this post.

Thank you again.
evilhero
Site Admin
Posts: 2883
Joined: Sat Apr 20, 2013 3:43 pm

Re: File Name Parsing Ignoring Words After '-' Character

Post by evilhero »

Just an update on this: the latest master release as of yesterday (v0.4.4) should address this problem.