[Cook] (ingredients-)fingerprint question.
Pieter den Hertog
pieter at dwark.nl
Tue Nov 17 08:17:21 EST 2009
I apologise if it's a bit messy. The bottom line is that the combination
of fingerprint and ingredients-fingerprint behaves in an unexpected and
(for me at least) counterintuitive manner:
building a target for the second time, with a new ingredients-list,
fails (cook reports it as up to date) even though the
ingredients-fingerprint flag is set and the new ingredient is younger
than the target. Details below.
Aryeh M. Friedman wrote:
> Pieter den Hertog wrote:
>> I was trying to understand the effects of (ingredients-)fingerprint
>> and constructed a simple example, hence the use of [glob *.txt]. The
>> example entails an empty dir with files appearing one by one... so
>> something like: (1.txt, 2.txt, ... n.txt) -> summary. On each file
>> appearing I wanted to (re)build the summary.
>
> Do they appear frequently (i.e. as soon as one cook is done there is
> another file ready)? If not, I suspect cook will be run from cron or
> some other script (that is likely run by cron)... that being the case,
> the chance of files appearing after you make your manifest is quite
> low, and thus there is very little reason to use non-native pattern
> matching (i.e. glob or some other OS-based method)... the main reason
> for doing it this way instead of the glob way is that cook patterns
> are reversible (see the tutorial and the chapter on pattern matching in
> the manual)... additionally it lets you match on specific components of
> the pattern instead of all or nothing... for example, even though it is
> not specifically stated in the tutorial or the manual, it is possible
> to have some very powerful recipes that take advantage of this... for
> example, here is the default java recipe for the main project I use
> cook for:
> obj/%0%.class: src/%0%.java
> set mkdir
> {
> [setColor red];
> echo JAVA':'Compiling [target];
> [setColor white];
>
> javac -d obj -cp [classPath] -sourcepath src src/%0%.java;
> }
>
I must confess I'm not following here... from your tutorial (excellent,
by the way) I understand %0 to match the path leading up to a target
and % to match the target name. So you have a one-to-one relationship:
obj/deeper/path/funky.class <- src/deeper/path/funky.java
and the recipe can be used for funky, spunky, ... *.java sources.
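To check my reading of %0 and %, here is a rough Python approximation.
It is a sketch, not cook's actual matcher: it assumes %0 matches zero or
more leading directories (each including its trailing slash) and %
matches a single file-name component without slashes. The function
names are made up for illustration.

```python
import re

# Hypothetical approximation of cook's %0 / % patterns, for illustration only.
def _tokens(pattern):
    """Split a pattern into literal text, '%0' and '%' tokens."""
    return re.findall(r"%0|%|[^%]+", pattern)

def to_regex(pattern):
    """Build a regex: %0 = zero or more leading directories (each ending
    in '/'), % = one file-name component (no '/')."""
    parts = []
    for tok in _tokens(pattern):
        if tok == "%0":
            parts.append(r"(?P<dir>(?:[^/]+/)*)")
        elif tok == "%":
            parts.append(r"(?P<stem>[^/]+)")
        else:
            parts.append(re.escape(tok))
    return re.compile("".join(parts))

def derive(target_pattern, ingredient_pattern, target):
    """Match target against target_pattern and substitute the captured
    pieces into ingredient_pattern; return None if the target doesn't match."""
    m = to_regex(target_pattern).fullmatch(target)
    if m is None:
        return None
    pieces = {"%0": m.group("dir"), "%": m.group("stem")}
    return "".join(pieces.get(tok, tok) for tok in _tokens(ingredient_pattern))

print(derive("obj/%0%.class", "src/%0%.java", "obj/deeper/path/funky.class"))
# src/deeper/path/funky.java
```

The "reversible" part is that the same captured %0 and % values that
matched the target are substituted back to name the ingredient.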
In my case it's more like a set of files that get crunched into one
file, where the set changes over time:
20090101.txt -> cook -> 200901.summary
20090101.txt 20090102.txt -> cook -> 200901.summary
20090101.txt ... 20090131.txt -> cook -> 200901.summary
20090201.txt -> cook -> 200902.summary
20090201.txt 20090202.txt -> etc..
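The grouping above (daily files rolled up into one summary per month)
can be sketched in Python to pin down which ingredients each monthly
target gets. The regular expression and the monthly_targets name are
hypothetical; in a cookbook this would be a pattern recipe instead.

```python
import re
from collections import defaultdict

# YYYYMM captured in group 1, DD in group 2, then ".txt" (hypothetical naming rule).
DAILY = re.compile(r"(\d{6})(\d{2})\.txt")

def monthly_targets(filenames):
    """Map each YYYYMM.summary target to the sorted list of its daily files;
    names that don't look like YYYYMMDD.txt are ignored."""
    groups = defaultdict(list)
    for name in filenames:
        m = DAILY.fullmatch(name)
        if m:
            groups[m.group(1) + ".summary"].append(name)
    return {target: sorted(files) for target, files in groups.items()}

print(monthly_targets(["20090102.txt", "20090101.txt", "20090201.txt", "notes.md"]))
```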
So each day brings a new file. A daily cook run (from cron) should
cook all the monthlies, ignoring past months that are already complete.
Due to some evil software, past files (e.g. 20090115.txt while it's
already November) might get reconstructed and shoved into the
directory again with exactly the same contents they had on the 15th
of January this year. In general I can't rely on the timestamps to have
any meaning whatsoever... So I was looking to use:
- the ingredients-fingerprint flag to pick up on new files
through sheer presence, and
- the fingerprint flag to prevent unnecessary rebuilds in
case files still had the same contents...
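The rule I am after, ignoring timestamps entirely, is roughly the
following Python sketch. It models what I expected the two flags to do
together; it is not how cook is implemented, and the snapshot layout and
function names are made up. Note that it records a snapshot on the very
first build, so the second run has something to compare against.

```python
import hashlib

def snapshot(ingredients):
    """Map each ingredient name to a digest of its contents; timestamps are ignored."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in ingredients.items()}

def needs_rebuild(previous, ingredients):
    """previous: snapshot from the last build, or None if never built.
    ingredients: dict of name -> current file contents (bytes).
    Returns (rebuild?, snapshot to store for next time)."""
    current = snapshot(ingredients)
    if previous is None:
        return True, current  # first build: always cook, and record the snapshot
    # "ingredients-fingerprint" role: a file was added or removed
    added_or_removed = previous.keys() != current.keys()
    # "fingerprint" role: a surviving file has different contents
    contents_changed = any(previous[n] != current[n] for n in previous.keys() & current.keys())
    return added_or_removed or contents_changed, current
```

With this rule, a file re-created with identical contents causes no
rebuild, while a new daily file always does.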
> or here is one we use for building web pages from their components (the
> company's main product is a widget based framework for making web
> frameworks [with a few vertical market examples thrown in]):
>
> site/ROOT/%0%.html: src/frontend/pages/%0%/definition
> set mkdir
> {
> [setColor green];
> echo HTML':'Creating [target];
> [setColor white];
>
> [infiniteGlue] src/frontend/pages/%0%/definition>[target];
> }
> (setColor is a little function I wrote that sets the text color on an
> ANSI/VT100 terminal)... basically it says that for all sources in our
> source tree (some 1000+ classes) we do exactly the same thing...
>
> "[infiniteGlue]" is a language we wrote for "gluing" together files from
> components (think cpp on steroids).
>
Is 'definition' a directory (whose entries are the components) or is
it a file that is read by [infiniteGlue] to get at the filenames of the
components for a particular page under site/ROOT?
I appreciate the importance of keeping recipes general, and as soon as
I have my head wrapped around the interaction between fingerprint and
ingredients-fingerprint, that's where I'll head next...
> The reason for making the recipes more general than needed is that we
> have a single master cookbook that is shared among many projects (all
> maintained by aegis as different projects)... therefore a client's site
> cookbook is usually no more than:
>
> (includes not shown):
>
> classPath=[getClassPath backend infinteGlue isuParse thistest];
>
> all: [html] [css] [js] db
> set default
> {
> echo [project] built;
> }
>
> install: all
> set nodefault
> {
> echo [project] installed to personal instance;
>
> rm -rf site/ROOT/fragments;
> touch [home]/isu2/servlet/WEB-INF/web.xml;
> touch [home]/isu2/ROOT/WEB-INF/web.xml;
> cp -Rv site/'*' [home]/isu2;
> }
>
> [aegis supplies the [project] and [home] vars, and all the other vars
> are generated by the includes not shown] (some of this needs to be
> refactored and made more general [for example, [home]/isu2 is rarely
> the right destination dir, but some other tools assume all sites live
> there, so we are stuck with it for now])
>
>
>>
>> That's when I noticed the rebuild doesn't happen if 2.txt is added,
>> but after adding 3.txt everything works exactly as expected.
>
> If I understand it correctly, ingredients-fingerprint does not tell
> cook to generate fingerprints for the ingredients; it only tells it to
> expect the ingredients to have fingerprints vs. timestamps (if you
> have "set fingerprint" globally, this is redundant anyway)
>>
>> Using your suggestion, I can work around the special case (i.e. the
>> second run for the summary target) with the following recipe:
>>
>> inp = [glob "*.txt"];
>> summ : [inp]
>> set fingerprint ingredients-fingerprint
>> ["if" [in [count [inp]] 2] "then" forced]
>> {
>> cat [need] > [target];
>> }
>
> That gate really shouldn't be needed (in 7 years of using cook I have
> only written one recipe that needed a gate, and even on that one my gut
> says I am doing something wrong if I need the gate at all)... Peter (or
> Walt), if you're reading this, would a :: work instead (i.e. tell it to
> examine all the ingredients [recursively] even if the immediate ones
> are up to date)?
>
> Namely, if this is all you're doing, shouldn't the following cookbook do it:
>
> set fingerprint;
> manifest=[match_mask %0%.txt [collect find . -name "'*.txt'"]];
>
> summ: [manifest]
> {
> cat [manifest] > [target];
> }
Your recipe works in a normal world, where new files appear and are
younger than the target. Using the cookbook you gave:
$ echo 1 > 1.txt
$ cook
/* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:33 2009 */
cook: cat ./1.txt > summ
$ echo 2 > 2.txt
$ cook
/* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:33 2009 */
cook: cat ./1.txt ./2.txt > summ
But if 3.txt appears and is older than the target, summ won't get
rebuilt. That's why I thought adding the ingredients-fingerprint flag
would help. But to my surprise (new run, empty directory except for
the cookbook, which now has the ingredients-fingerprint flag set for
the summ recipe):
$ echo 1 > 1.txt
$ cook
/* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:36 2009 */
cook: cat ./1.txt > summ
$ echo 2 > 2.txt
$ cook
/* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:36 2009 */
cook: the "summ" file is already up to date, no action required
$
$ cat summ
1
$ ls -ltc --time-style=+%s
total 12K
-rw-r--r-- 1 pdh pdh 2 1258403810 2.txt
-rw-r--r-- 1 pdh pdh 2 1258403806 summ
-rw-r--r-- 1 pdh pdh 2 1258403805 1.txt
$
$ cat .cook.fp
"1.txt" = { 1258403805
".4CLFTK5xPYw5FeI8bG8u8J0LCIjs0NLTho0SpjIrGdJ5Keipp8iX88ICq0EVZ3R0ntqF40NdtwQ"
}
"2.txt" = { 1258403810
".:EEaxKnhRJWpD49ubks2Sy0Qu0Kh03LT800DElUiSBI72h6gdyKVBLJH80voXgX0LEhdl0PJiD4"
}
"summ" = { 1258403812 1258403812 1258403806
".4CLFTK5xPYw5FeI8bG8u8J0LCIjs0NLTho0SpjIrGdJ5Keipp8iX88ICq0EVZ3R0ntqF40NdtwQ"
"DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
}
So using the combination of fingerprint and ingredients-fingerprint
rejuvenates the target's recorded time to 1258403812, while the file's
actual time, as shown by ls, is 1258403806 (seconds since the epoch).
This prevents the target from being rebuilt, even though its
ingredients-list has changed from "1.txt" to "1.txt 2.txt".
This is rather counterintuitive, given the advertised purpose of the
ingredients-fingerprint flag...
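To make the arithmetic explicit, here are the times from the ls output
and the "summ" entry in .cook.fp above, compared with a plain
"stale if any ingredient is newer" rule. That rule is an assumption
about how the comparison is made, not something taken from cook's
source.

```python
# Times copied from the ls -ltc output and the "summ" entry in .cook.fp above.
ingredients = {"1.txt": 1258403805, "2.txt": 1258403810}
actual_time = 1258403806     # summ's time as shown by ls -ltc
recorded_time = 1258403812   # first time recorded for summ in .cook.fp

def out_of_date(target_time, ingredient_times):
    """Assumed timestamp rule: the target is stale if any ingredient is newer."""
    return any(t > target_time for t in ingredient_times.values())

print(out_of_date(actual_time, ingredients))    # True: 2.txt is 4 s newer than summ
print(out_of_date(recorded_time, ingredients))  # False: the recorded time hides 2.txt
```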
Only after adding a third file do we get cooking again:
$ echo 3 > 3.txt
$ touch summ
$
$ ls -ltc --time-style=+%s
total 16K
-rw-r--r-- 1 pdh pdh 2 1258404192 summ
-rw-r--r-- 1 pdh pdh 2 1258404189 3.txt
-rw-r--r-- 1 pdh pdh 2 1258404177 2.txt
-rw-r--r-- 1 pdh pdh 2 1258404173 1.txt
$
$ cook
/* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:43 2009 */
cook: cat ./1.txt ./2.txt ./3.txt > summ
o-> Now that's what I wanted on the second run as well: summ might be
newer than all of the files on its dependency list, but since the list
itself changed (2.txt was added) it should have rebuilt the target.
Anyway, this is probably an edge case that only shows up in my
situation, because some braindead piece of software is shoving files
into a directory out of order and with insane timestamps.
>>
>> So that works and my problem is solved, but it still has me puzzled
>> about ingredients-fingerprint: its purpose is to track changes to the
>> list of ingredients, for which it would be necessary to record the
>> very first ingredients-list used to build the original first
>> incarnation of the target... right?
>
> I don't think so... as explained above
Actually, from the documentation and some experimenting I believe the
ingredients-fingerprint takes a fingerprint of the string containing
the names of the ingredients (after some lexical sort) and it's
independent of the actual contents of target or ingredients, e.g.:
$ cat .cookrc
all: sum1 sum2;
sum1 : 1.txt 2.txt
set ingredients-fingerprint
{
function print sum1([need])
[write [target] Hello,];
}
sum2: 2.txt 1.txt
set ingredients-fingerprint
{
function print sum2([need])
[write [target] world!];
}
$ touch 1.txt 2.txt && cook; cook;
/* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:55 2009 */
/* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 21:55 2009 */
$ cat .cook.fp
"sum2" = { 1258404920
""
"DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
}
"sum1" = { 1258404920
""
"DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
}
$ cat sum1 sum2
Hello,
world!
o-> The "" entry is where the target's file fingerprint would have gone
had that been enabled. The other entry is, by the looks of it, a
fingerprint of the string "1.txt 2.txt". The actual contents of the
targets do not influence the ingredients-fingerprint, and neither do
the contents of the ingredients used:
$ echo blahblah > 1.txt
$ cook sum1
/* .cookrc.list, /home/pdh/dev/buildtools/cook, Mon Nov 16 19:31 2009 */
cook: sum1(1.txt 2.txt)
$ cat .cook.fp
"sum2" = { 1258396272
""
"DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
}
"sum1" = { 1258396272
""
"DqFpFO17Df1sv18e9L2n3J80GWFJN0fmq2P02pgm1bUsyh.IYqb4k8u9Qz0X5zw40usnhH030g3M"
}
o-> After changing the contents of 1.txt and rebuilding only sum1,
both sum1 and sum2 still have the same ingredients-fingerprint because
their dependencies are still the same.
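My working model of ingredients-fingerprint, written as a Python
sketch: the digest depends only on the lexically sorted list of names,
never on file contents. This is my reading of the observations above,
not cook's implementation; SHA-256 stands in for cook's own fingerprint
function, so the values won't match those in .cook.fp.

```python
import hashlib

def ingredients_fingerprint(names):
    """Digest of the ingredient *names*, sorted and joined with spaces.
    File contents are never read. SHA-256 is a stand-in digest."""
    joined = " ".join(sorted(names))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

sum1 = ingredients_fingerprint(["1.txt", "2.txt"])   # like "sum1 : 1.txt 2.txt"
sum2 = ingredients_fingerprint(["2.txt", "1.txt"])   # like "sum2: 2.txt 1.txt"
print(sum1 == sum2)                                  # True: order doesn't matter
print(sum1 == ingredients_fingerprint(["1.txt"]))    # False: the list changed
```

Under this model, editing 1.txt cannot change either value, which
matches the .cook.fp output above.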
That's why I thought I could track file additions/removals through the
ingredients-fingerprint flag (and indeed that works, except between
the first and second runs).
This is also why §6.7.2 of the manual confuses me: why would I want
to 'remember' dependencies in a file to track vanishing dependencies
if I have an ingredients-fingerprint flag?
>>
>> The background to all this is that I'm trying to automate part of a
>> reporting process where cisco syslog files (after duplicate removal
>> and some name mangling, which totally destroys their timestamps) are
>> shoved into a directory where I can read them.
>>
>> The logs are daily files and the summary should reflect (running)
>> monthly totals on some specific events in the network. I wanted to be
>> selective about rebuilding the totals, since they take a looong time
>> to complete.
>>
>> I'll have a look at aegis as well to see if that's a better fit for my
>> particular challenge (though cook seemed a pretty straightforward tool
>> for such a job).
>>
> Aegis is overkill unless you need to "rewind" the logs to any given past
> state... a better use of aegis in your case would be to use it to keep
> track of router configs (and any other config) as they change over
> time... read Peter's article "Aegis Is Only For Software, Isn't It?"
> (http://aegis.sourceforge.net/auug96.pdf)
That's what I thought.
Again thanks for the feedback & tips.
>>
>>
>> Thanks for the feedback!
>>
>>
>> Aryeh M. Friedman wrote:
>>> I forgot to ask last time (and I guess this time) why you're using
>>> [glob *.txt] instead of %.txt or %0%.txt as your ingredient list (it
>>> shouldn't matter in this case, but it is non-standard)
>>>
>>> Aryeh M. Friedman wrote:
>>>> Does set force do the job?
>>>>
>>>> Also, unless I misunderstand what you're doing, this seems more like
>>>> a job for aegis or another CMS than for cook (I am assuming that
>>>> XXX.txt are infrequently updated in the real world)
>>>>
>>>> pdh wrote:
>>>>> In this case I'm just working with text files where the actual
>>>>> content and presence of the input files are more important than
>>>>> timestamps... (it is not about building a program)... timestamps
>>>>> are messed up by other tools beyond my control...
>>>>>
>>>>> The recipe actually works, except for the second run... and as far
>>>>> as I can tell, that's because the ingredients-list is not
>>>>> fingerprinted on the first run.
>>>>>
>>>>> Using only the ingredients-fingerprint feature, I still get strange
>>>>> results:
>>>>>
>>>>> recipe:
>>>>>
>>>>> summ : [glob "*.txt"]
>>>>> set ingredients-fingerprint
>>>>> {
>>>>> cat [need] > [target];
>>>>> }
>>>>> then
>>>>> {
>>>>> function print "Ingredients-list:" [need];
>>>>> }
>>>>>
>>>>> Starting with an empty directory with only this recipe present:
>>>>>
>>>>> $ echo 1 > 1.txt
>>>>> $ cook
>>>>> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Sun Nov 15 15:06 2009 */
>>>>> cook: cat 1.txt > summ
>>>>> cook: Ingredients-list: 1.txt
>>>>> $
>>>>> $ echo 2 > 2.txt
>>>>> $ touch summ
>>>>> $ cook
>>>>> /* .cookrc.list, /home/pdh/dev/buildtools/cook, Sun Nov 15 15:06 2009 */
>>>>> cook: Ingredients-list: 1.txt 2.txt
>>>>> cook: the "summ" file is already up to date, no action required
>>>>> $
>>>>>
>>>>> From the documentation I expected cook to rebuild the target
>>>>> (regardless of its timestamp) because its ingredients-list changed,
>>>>> but that's not the case.
>>>>>
>>>>> So, is that correct behaviour, then? I am not a programmer, so this
>>>>> could very well be desired behaviour for cook as a build tool for
>>>>> programs. (If so, I need to solve my problem through other means.)
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Aryeh M. Friedman wrote:
>>>>>> Fingerprints are based on file content, not on timestamps (i.e. I
>>>>>> can "touch" the file and cook will still see it as the same)...
>>>>>> the only way to fix this is to use timestamps, not fingerprints, on
>>>>>> the particular file (I don't have the manual handy so you will
>>>>>> have to look the syntax up)... but you really should be sure that
>>>>>> this is what you *REALLY* want, because fingerprinting using actual
>>>>>> file contents is correct 99.9999...% of the time (namely, every
>>>>>> time I think I have a problem like this it turns out to be
>>>>>> something else)
>>>>>>
>>>>>> dwark wrote:
>>>>>>> Hi List,
>>>>>>>
>>>>>>> I am new to cook and I'm trying to create a recipe that will
>>>>>>> create a summary out of a changing set of files. The summary
>>>>>>> should be rebuilt whenever a file is added/removed or updated
>>>>>>> (i.e. contents changed).
>>>>>>>
>>>>>>> The combination of fingerprints and ingredients-fingerprint
>>>>>>> looks to be a perfect match and indeed works largely as advertised.
>>>>>>> Except for when the set of input files grows from 1 to 2 files,
>>>>>>> for the first time.
>>>>>>>
>>>>>>> It looks like the ingredients-fingerprint is not taken when
>>>>>>> the target is being built for the first time. The second
>>>>>>> time around, cook can't see if the list has changed.
>>>>>>>
>>>>>>> Another thing is that fingerprinting the target itself seems
>>>>>>> to rejuvenate the target file to an age younger than it should
>>>>>>> be (in the example below, summ should not be younger than 2.txt
>>>>>>> during the second run).
>>>>>>>
>>>>>>> Am I missing something, like an additional flag? The
>>>>>>> current workaround is to force the recipe when there are only 2
>>>>>>> input files.
>>>>>>>
>>>>>>> Any hints/pointers would be appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Cook-users mailing list
>>> Cook-users at lists.auug.org.au
>>> http://lists.auug.org.au/listinfo/cook-users
>>
>>