[Cook] (ingredients-)fingerprint question.

Aryeh Friedman aryeh.friedman at gmail.com
Tue Nov 17 09:59:10 EST 2009


On Mon, Nov 16, 2009 at 4:17 PM, Pieter den Hertog <pieter at dwark.nl> wrote:
> I apologise if it's a bit messy.  Bottom line is: the combination of
> fingerprint and ingredients-fingerprint behaves in an unexpected and
> (for me at least) counterintuitive manner:
>
> Trying to build a target for the second time, with a new ingredients-list,
> fails even though the ingredients-fingerprint flag is set and the new
> ingredient is younger than the target...
>
>
> details below.
>
>
> Aryeh M. Friedman wrote:
>>
>> Pieter den Hertog wrote:
>>>
>>> I was trying to understand the effects of (ingredients-)fingerprint and
>>> constructed a simple example, hence the use of [glob *.txt]. The example
>>> entails an empty dir with files appearing one by one... so something
>>> like: (1.txt, 2.txt, ... n.txt) -> summary. On each file appearing I
>>> wanted to (re)build the summary.
>>
>> Do they appear frequently (i.e. as soon as one cook is done there is another
>> file ready)?  If not, I suspect cook will be run from cron or some other
>> script (that is likely run by cron)... that being the case, the case of
>> files appearing after you make your manifest is quite unlikely, and thus
>> there is very little reason to use non-native pattern matching (i.e. glob or
>> some other OS-based method)... the main reason for wanting to do it this
>> way instead of the glob way is that cook patterns are reversible (see the
>> tutorial and the chapter on pattern matching in the manual)... additionally
>> it lets you match on specific components of the pattern instead of all or
>> nothing... for example, even though it is not specifically stated in the
>> tutorial or the manual, it is possible to have some very powerful recipes
>> that take advantage of this... for example, here is the default java recipe
>> for the main project I use cook for:
>> obj/%0%.class: src/%0%.java
>>       set mkdir
>> {
>>       [setColor red];
>>       echo JAVA':'Compiling [target];
>>       [setColor white];
>>
>>       javac -d obj -cp [classPath] -sourcepath src src/%0%.java;
>> }
>>
> I must confess I'm not following here... from your tutorial (excellent, by
> the way) I understand %0 to match the path leading up to a target and %
> to match the target name.  So you have a one-to-one relationship:
>
>  obj/deeper/path/funky.class <- src/deeper/path/funky.java
>
> and the recipe can be used for funky, spunky, ... *.java sources.
>
> In my case it's more like a set of files that get crunched into one file and
> where the set changes over time:
>
> 20090101.txt -> cook -> 200901.summary
> 20090101.txt 20090102.txt -> cook -> 200901.summary
> 20090101.txt ... 20090131.txt -> cook -> 200901.summary
> 20090201.txt -> cook -> 200902.summary
> 20090201.txt 20090202.txt -> etc..

Then shouldn't this work (forgetting for the moment about the original
issue of when stuff is fingerprinted)? [Tell me if the tutorial is
unclear on this, because I am about to start writing a second
edition.]


set fingerprint;

/* every XXX.txt in [manifest], whatever its leading path, becomes XXX.summary */
out=[fromto %0%.txt %0%.summary [match_mask %0%.txt [manifest]]];

all: [out]
{
     echo all logs summarized;
}

%0%.summary: %0%.txt
{
      cat %0%.txt > [target];
}

The out variable is the only part that might be unclear from the
tutorial.  It basically says: find all XXX.txt in the manifest
[regardless of leading path] and cook each one by cat'ing it into
XXX.summary in the same directory.

Basically, % can appear anywhere in the file name and everything else
is hard-coded [think of % as the * wildcard in a shell glob], and %0
just means "any leading path" in front of a name that matches the rest
of the pattern.  Therefore %0%.txt is all .txt files regardless of
leading path.  fromto just says: rewrite any word that matches the first
argument into the form of the second argument [in our case XXX.txt to
XXX.summary].
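To make the %0/% and fromto description concrete, here is a rough Python emulation, for illustration only and not cook's implementation. The helper names match_mask and fromto merely mirror the cook builtins; treating %0 as "any leading path ending in /" and % as "one file-name component" is an assumption based on the explanation above, and the sketch handles at most one %0 and one % per pattern.

```python
import re

# Hypothetical emulation of cook pattern words, for illustration only.
# Assumed semantics (from the explanation above):
#   %0  matches any leading directory path (possibly empty) ending in "/"
#   %   matches one file-name component (no "/")
# Limitation: at most one %0 and one % per pattern.

def _compile(pattern):
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("%0", i):
            parts.append(r"(?P<p0>(?:[^/]+/)*)")
            i += 2
        elif pattern[i] == "%":
            parts.append(r"(?P<p>[^/]+)")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def _substitute(pattern, groups):
    # Fill %0 and % in the target pattern with the captured pieces.
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("%0", i):
            out.append(groups.get("p0") or "")
            i += 2
        elif pattern[i] == "%":
            out.append(groups.get("p") or "")
            i += 1
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)

def match_mask(pattern, words):
    """Keep only the words that match the whole pattern."""
    rx = _compile(pattern)
    return [w for w in words if rx.fullmatch(w)]

def fromto(old, new, words):
    """Rewrite words matching `old` into `new`; others pass through unchanged."""
    rx = _compile(old)
    result = []
    for w in words:
        m = rx.fullmatch(w)
        result.append(_substitute(new, m.groupdict()) if m else w)
    return result

manifest = ["logs/2009/20090101.txt", "20090102.txt", "README", "notes.md"]
txt = match_mask("%0%.txt", manifest)
out = fromto("%0%.txt", "%0%.summary", txt)
print(txt)  # ['logs/2009/20090101.txt', '20090102.txt']
print(out)  # ['logs/2009/20090101.summary', '20090102.summary']
```

Because the pattern is reversible, the same captured pieces (leading path and base name) that select a .txt file are reused to build the matching .summary name.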


I suspect this will eliminate the need for messing with telling all to
expect fingerprints.  This also completely avoids the issue of *.txt
files that merely look new, since cook will fingerprint each one
automatically the first time it sees it.

>
>
> So each day brings a new file. A daily cook run (from cron) should cook all
> the monthlies, ignoring past months that completed already.
>
> Due to some evil software, past files (e.g. 20090115.txt while its november
> already) might get reconstructed and shoved into the directory again with
> exactly the same contents as it had on the 15th of Jan this year. In general
> I can't rely on the timestamps to have any meaning whatsoever... So I was
> looking to use:
> - the ingredients-fingerprint flag to pick up on new files
>  through sheer presence, and
> - the fingerprint flag to prevent unnecessary rebuilds in
>  case files still had the same contents...
>
>
>
>> or here is one we use for building web pages from their components (the
>> company's main product is a widget based framework for making web frameworks
>> [with a few vertical market examples thrown in]):
>>
>> site/ROOT/%0%.html: src/frontend/pages/%0%/definition
>>       set mkdir
>> {
>>       [setColor green];
>>       echo HTML':'Creating [target];
>>       [setColor white];
>>
>>       [infiniteGlue] src/frontend/pages/%0%/definition>[target];
>> }
>> (setColor is a little function I wrote that sets the text color on an
>> ANSI/VT100 terminal)... basically it says for all sources in our source tree
>> (some 1000+ classes) we do exactly the same thing....
>>
>> "[infiniteGlue]" is a language we wrote for "gluing" together files from
>> components (think cpp on steroids).
>>
>
> Is 'definition' a directory (whose entries are the components) or is it a
> file that is read by [infiniteGlue] to get at the filenames of the components
> for a particular page under site/ROOT?

That is proprietary until after we make the next release or so (patent pending).

>
> I appreciate the importance of generality of recipes and as soon as I have
> my head wrapped around the interaction between fingerprint and
> ingredients-fingerprint, that's where I'll head next ...
>
>> The reason for making the recipes more general than needed is that we have a
>> single master cookbook that is shared among many projects (all maintained
>> by aegis as different projects)... therefore a client's site cookbook is
>> usually no more than:
>>
>> (includes not shown):
>>
>> classPath=[getClassPath backend infinteGlue isuParse thistest];
>>
>> all: [html] [css] [js] db
>>       set default
>> {
>>       echo [project] built;
>> }
>>
>> install: all
>>       set nodefault
>> {
>>       echo [project] installed to personal instance;
>>
>>       rm -rf site/ROOT/fragments;
>>       touch [home]/isu2/servlet/WEB-INF/web.xml;
>>       touch [home]/isu2/ROOT/WEB-INF/web.xml;
>>       cp -Rv site/'*' [home]/isu2;
>> }
>>
>> [aegis supplies the [project] and [home] vars, and all the other vars are
>> generated by the includes not shown] (some of this needs to be refactored
>> and made more general [for example, [home]/isu2 is rarely the right
>> destination dir, but some other tools assume all sites live there, so we
>> are stuck with it for now])
>>
>>
>>>
>>> That's when I noticed the rebuild doesn't happen if 2.txt is added, but
>>> after adding 3.txt everything works exactly as expected.
>>
>> ingredients-fingerprint, if I understand it correctly, does not tell cook to
>> generate fingerprints for the ingredients; it only tells it to expect the
>> ingredients to have fingerprints rather than time stamps (if you have "set
>> fingerprint" globally this is redundant anyway)
>>>
>>> Using your suggestion, I can work around the special case (i.e. the
>>> second run for the summary target) with the following recipe:
>>>
>>> inp = [glob "*.txt"];
>>> summ : [inp]
>>> set fingerprint ingredients-fingerprint
>>> ["if" [in [count [inp]] 2] "then" forced]
>>> {
>>> cat [need] > [target];
>>> }
>>
>> That gate really shouldn't be needed (in 7 years of using cook I have only
>> written one recipe that needed a gate, and even on that one my gut says I am
>> doing something wrong if I need the gate at all)... Peter (or Walt), if
>> you're reading this, would a :: work instead (i.e. tell it to examine all the
>> ingredients [recursively] even if the immediate ones are up to date)?
>>
>> Namely, if this is all you're doing, shouldn't the following cookbook do it:
>>
>> set fingerprint;
>> manifest=[match_mask %0%.txt [collect find . -name "'*.txt'"]];
>>
>> summ: [manifest]
>> {
>>   cat [manifest] > [target];
>> }

See above.  I should note that if you need to rm the txt file or
anything, do it in the implicit recipe, not the explicit one.

>
> Your recipe works in a normal world, where new files appear and are younger
> than the target. Using the cookbook you gave:
>
> $ echo 1 > 1.txt
> $ cook
> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:33 2009 */
> cook: cat ./1.txt > summ
> $ echo 2 > 2.txt
> $ cook
> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:33 2009 */
> cook: cat ./1.txt ./2.txt > summ
>
> But if 3.txt appears and is older than the target, summ won't get built.
>  That's why I thought adding the ingredients-fingerprint flag
> would help.  But to my surprise (new run, empty directory except for the
> cookbook which now has the ingredients-fingerprint flag set for the summ
> recipe):
>
> $ echo 1 > 1.txt
> $ cook
> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:36 2009 */
> cook: cat ./1.txt > summ
>
> $ echo 2 > 2.txt
> $ cook
> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:36 2009 */
> cook: the "summ" file is already up to date, no action required
> $
> $ cat summ
> 1
> $ ls -ltc --time-style=+%s
> total 12K
> -rw-r--r-- 1 pdh pdh 2 1258403810 2.txt
> -rw-r--r-- 1 pdh pdh 2 1258403806 summ
> -rw-r--r-- 1 pdh pdh 2 1258403805 1.txt
>
> $
> $ cat .cook.fp
> "1.txt" = { 1258403805
> ".4CLFTK5xPYw5FeI8bG8u8J0LCIjs0NLTho0SpjIrGdJ5Keipp8iX88ICq0EVZ3R0ntqF40NdtwQ"
> }
> "2.txt" = { 1258403810
> ".:EEaxKnhRJWpD49ubks2Sy0Qu0Kh03LT800DElUiSBI72h6gdyKVBLJH80voXgX0LEhdl0PJiD4"
> }
> "summ" = { 1258403812 1258403812 1258403806
> ".4CLFTK5xPYw5FeI8bG8u8J0LCIjs0NLTho0SpjIrGdJ5Keipp8iX88ICq0EVZ3R0ntqF40NdtwQ"
> "DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
> }
>
> So using the combination of fingerprint and ingredients-fingerprint
> rejuvenates the target's fingerprint to 1258403812 while it's actually
> 1258403806 seconds since the epoch, preventing the target from being built
> even though its ingredients-list has changed from "1.txt" to "1.txt 2.txt".
>
> Which is rather counter-intuitive given the advertised nature of the
> ingredients-fingerprint flag...
>
> Only after adding the third file, do we get cooking again:
>
> $ echo 3 > 3.txt
> $ touch summ
> $
> $ ls -ltc --time-style=+%s
> total 16K
> -rw-r--r-- 1 pdh pdh 2 1258404192 summ
> -rw-r--r-- 1 pdh pdh 2 1258404189 3.txt
> -rw-r--r-- 1 pdh pdh 2 1258404177 2.txt
> -rw-r--r-- 1 pdh pdh 2 1258404173 1.txt
> $
> $ cook
> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:43 2009 */
> cook: cat ./1.txt ./2.txt ./3.txt > summ
>
> o-> Now that's what I wanted on the second run as well: summ might be newer
> than all of the files listed on its dependency list, but since the list
> itself changed (2.txt was added) it should've rebuilt the target.
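The rule Pieter expected can be modelled in a few lines of Python. This is a hypothetical sketch of the desired behaviour, not of what cook actually does: snapshot and needs_rebuild are invented names, SHA-256 stands in for cook's fingerprint function, and a plain dict stands in for .cook.fp. The point it illustrates is that the list of ingredient names has to be recorded on the first build too; otherwise the second run has nothing to compare against.

```python
import hashlib

def _fp(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def snapshot(files):
    """Record what a build depended on; `files` maps name -> contents (bytes).

    "names" plays the role of the ingredients-fingerprint (which files),
    "contents" the role of the per-file fingerprint (what is in them).
    Timestamps are deliberately not consulted at all.
    """
    return {
        "names": _fp(" ".join(sorted(files)).encode("utf-8")),
        "contents": {name: _fp(data) for name, data in files.items()},
    }

def needs_rebuild(files, last):
    # `last` is the snapshot saved by the previous build, or None if there
    # was none.  Saving a snapshot on the *first* build is what lets the
    # second run notice that the list grew.
    return last is None or snapshot(files) != last

files = {"1.txt": b"1\n"}
state = None
run1 = needs_rebuild(files, state)   # True: never built
state = snapshot(files)
files["2.txt"] = b"2\n"
run2 = needs_rebuild(files, state)   # True: list grew, whatever the mtimes say
state = snapshot(files)
files["2.txt"] = b"2\n"              # same file re-delivered, same bytes
run3 = needs_rebuild(files, state)   # False: nothing actually changed
print(run1, run2, run3)  # True True False
```

Keeping the two parts separate mirrors the two flags: the names part catches files appearing or vanishing, while the contents part stops re-delivered but identical files from triggering a long rebuild.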
>
>
> Anyway, this is probably a borderline case only present in my specific case
> because some braindead piece of software is shoving files into a directory:
> out-of-order and with insane timestamps.
>
>
>
>>>
>>> So that works and my problem is solved, but it still has me puzzled about
>>> the ingredients-fingerprint: its purpose is to track changes to the list of
>>> ingredients, for which it would be necessary to record the very first
>>> ingredients-list used to build the original first incarnation of the
>>> target... right?
>>
>> I don't think so... as explained above
>
> Actually, from the documentation and some experimenting I believe the
> ingredients-fingerprint takes a fingerprint of the string containing
> the names of the ingredients (after some lexical sort) and it's independent
> of the actual contents of target or ingredients, e.g.:
>
> $ cat .cookrc
> all: sum1 sum2;
>
> sum1 : 1.txt 2.txt
>    set ingredients-fingerprint
> {
>    function print sum1([need])
>    [write [target] Hello,];
> }
>
> sum2: 2.txt 1.txt
>    set ingredients-fingerprint
> {
>    function print sum2([need])
>    [write [target] world!];
> }
>
> $ touch 1.txt 2.txt && cook; cook;
> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:55 2009 */
> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:55 2009 */
> pdh@cajun$ cat .cook.fp
> "sum2" = { 1258404920
> ""
> "DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
> }
> "sum1" = { 1258404920
> ""
> "DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
> }
>
>
> $ cat sum1 sum2
> Hello,
> world!
>
> o-> The "" entry is where the target's file-fingerprint would've gone had
> that been enabled.  The other entry is a fingerprint of the string "1.txt
> 2.txt" by the looks of it.  The actual contents of the targets don't
> influence the ingredients-fingerprint, and neither do the contents of the
> ingredients used:
>
> $ echo blahblah > 1.txt
> $ cook sum1
> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 19:31 2009 */
> cook: sum1(1.txt 2.txt)
> $ cat .cook.fp
> "sum2" = { 1258396272
> ""
> "DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
> }
> "sum1" = { 1258396272
> ""
> "DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
> }
>
> o-> After changing the contents of 1.txt and rebuilding only sum1, both sum1
> and sum2 still have the same ingredients-fingerprint because their
> dependencies are still the same.
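Pieter's reading of the .cook.fp entries can be illustrated with a small Python sketch. Sorting the names before hashing is inferred from the experiment above, not taken from cook's source, and SHA-256 is only a stand-in for cook's own fingerprint algorithm. Because only the names are hashed, ingredient contents cannot affect the result.

```python
import hashlib

def ingredients_fingerprint(ingredients):
    """Fingerprint the *names* of a recipe's ingredients, not their contents.

    Assumption: names are sorted before hashing, so ingredient order is
    irrelevant (as sum1 and sum2 above suggest).
    """
    joined = " ".join(sorted(ingredients))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

sum1 = ingredients_fingerprint(["1.txt", "2.txt"])
sum2 = ingredients_fingerprint(["2.txt", "1.txt"])
grown = ingredients_fingerprint(["1.txt", "2.txt", "3.txt"])
print(sum1 == sum2)   # True: same set of names, different order
print(sum1 == grown)  # False: a file was added
```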
>
> That's why I thought I could track file additions/removals through the
>  ingredients-fingerprint flag (and indeed that works except between the
> first and second run).
>
> Which is also why §6.7.2 of the manual confuses me: why would I want to
> 'remember' dependencies in a file to track vanishing dependencies if I have
> an ingredients-fingerprint flag?
>
>
>
>>>
>>> The background to all this is that I'm trying to automate part of a
>>> reporting process where Cisco syslog files (after duplicate removal and
>>> some name mangling, which totally destroys their timestamps) are shoved
>>> into a directory where I can read them.
>>>
>>> The logs are daily files and the summary should reflect running monthly
>>> totals of some specific events in the network. I wanted to be selective
>>> about rebuilding the totals since they take a looong time to complete.
>>>
>>> I'll have a look at aegis as well to see if that's a better fit for my
>>> particular challenge (though cook seemed a pretty straightforward tool
>>> for such a job).
>>>
>> Aegis is overkill unless you need to "rewind" the logs to any given past
>> state... a better use of aegis in your case would be to use it to keep track
>> of router configs (and any other config) as they change over time... read
>> Peter's article "Aegis Is Only For Software, Isn't It?"
>> (http://aegis.sourceforge.net/auug96.pdf)
>
> That's what I thought.
>
> Again thanks for the feedback & tips.
>
>
>>>
>>>
>>> Thanks for the feedback!
>>>
>>>
>>> Aryeh M. Friedman wrote:
>>>>
>>>> I forgot to ask last time (and I guess this time) why you're using [glob
>>>> *.txt] instead of %.txt or %0%.txt as your ingredient list (it shouldn't
>>>> matter in this case, but it is non-standard)
>>>>
>>>> Aryeh M. Friedman wrote:
>>>>>
>>>>> Does set force do the job?
>>>>>
>>>>> Also, unless I misunderstand what you're doing, this seems more like a
>>>>> job for aegis or another CMS than for cook (I am assuming that XXX.txt
>>>>> are infrequently updated in the real world)
>>>>>
>>>>> pdh wrote:
>>>>>>
>>>>>> In this case I'm just working with text files where the actual content
>>>>>> and presence of the input files are more important than timestamps...
>>>>>> (it is not about building a program)... timestamps are messed up by
>>>>>> other tools beyond my control...
>>>>>>
>>>>>> The recipe actually works, except for the second run... and as far as
>>>>>> I can tell, that's because the ingredients-list is not fingerprinted
>>>>>> on the first run.
>>>>>>
>>>>>> Using only the ingredients-fingerprint feature, I still have strange
>>>>>> results:
>>>>>>
>>>>>> recipe:
>>>>>>
>>>>>> summ : [glob "*.txt"]
>>>>>> set ingredients-fingerprint
>>>>>> {
>>>>>> cat [need] > [target];
>>>>>> }
>>>>>> then
>>>>>> {
>>>>>> function print "Ingredients-list:" [need];
>>>>>> }
>>>>>>
>>>>>> Starting with an empty directory with only this recipe present:
>>>>>>
>>>>>> $ echo 1 > 1.txt
>>>>>> $ cook
>>>>>> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Sun Nov 15 15:06 2009 */
>>>>>> cook: cat 1.txt > summ
>>>>>> cook: Ingredients-list: 1.txt
>>>>>> $
>>>>>> $ echo 2 > 2.txt
>>>>>> $ touch summ
>>>>>> $ cook
>>>>>> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Sun Nov 15 15:06 2009 */
>>>>>> cook: Ingredients-list: 1.txt 2.txt
>>>>>> cook: the "summ" file is already up to date, no action required
>>>>>> $
>>>>>>
>>>>>> From the documentation I expected cook to rebuild the target
>>>>>> (regardless of its timestamp) because its ingredients-list changed,
>>>>>> but that's not the case.
>>>>>>
>>>>>> So, is that correct behavior then? I am not a programmer, so this could
>>>>>> very well be desired behaviour for cook as a build tool for programs.
>>>>>> (If so, I need to solve my problem through other means.)
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> Aryeh M. Friedman wrote:
>>>>>>>
>>>>>>> Fingerprints are based on file content not on time stamp (i.e. I can
>>>>>>> "touch" the file and cook will still see it as the same)... the only way to
>>>>>>> fix this is to use time stamps not finger print on the particular file (I
>>>>>>> don't have the manual handy so you will have to look the syntax up)... but
>>>>>>> you really should be sure if this is what you *REALLY* want because
>>>>>>> fingerprint using actual file contents is 99.9999...% of the time correct
>>>>>>> (namely every time I think I have a problem like this it turns out to be
>>>>>>> something else)
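A quick Python illustration of what "based on file content" means: the hash is taken over the file's bytes, so a touch-like change of the modification time leaves it unchanged. SHA-256 and the helper content_fingerprint are assumptions for illustration only; cook uses its own fingerprint algorithm.

```python
import hashlib
import os
import tempfile

def content_fingerprint(path):
    # Hash the file's bytes; the modification time plays no part.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "1.txt")
    with open(path, "w") as f:
        f.write("1\n")
    before = content_fingerprint(path)
    st = os.stat(path)
    # Simulate `touch`: push atime/mtime forward an hour, contents untouched.
    os.utime(path, (st.st_atime + 3600, st.st_mtime + 3600))
    after = content_fingerprint(path)
    mtime_changed = os.stat(path).st_mtime != st.st_mtime

print(before == after)  # True
print(mtime_changed)    # True
```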
>>>>>>>
>>>>>>> dwark wrote:
>>>>>>>>
>>>>>>>> Hi List,
>>>>>>>>
>>>>>>>> I am new to cook and I'm trying to create a recipe that will
>>>>>>>> create a summary out of a changing set of files. The summary
>>>>>>>> should be rebuilt whenever a file is added/removed or updated
>>>>>>>> (i.e. contents changed).
>>>>>>>>
>>>>>>>> The combination of fingerprints and ingredients-fingerprint
>>>>>>>> looks to be a perfect match and indeed works largely as advertised.
>>>>>>>> Except for when the set of input files grows from 1 to 2 files,
>>>>>>>> for the first time.
>>>>>>>>
>>>>>>>> It looks like the ingredients-fingerprint is not taken when
>>>>>>>> the target is being built for the first time. The second
>>>>>>>> time around, cook can't see if the list has changed.
>>>>>>>>
>>>>>>>> Another thing is that fingerprinting the target itself seems
>>>>>>>> to rejuvenate the target file to an age younger than it should
>>>>>>>> be (in the example below, summ should not be younger than 2.txt
>>>>>>>> during the second run).
>>>>>>>>
>>>>>>>> Am I missing something, like an additional flag or something? The
>>>>>>>> current workaround is to force the recipe when there are only 2 input files.
>>>>>>>>
>>>>>>>> Any hints/pointers would be appreciated.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Cook-users mailing list
>>>> Cook-users at lists.auug.org.au
>>>> http://lists.auug.org.au/listinfo/cook-users
>>>
>>>
>

