WEBVTT

NOTE This file was generated by Descript

00:00:00.000 --> 00:00:00.930
Manuel: All right, we're back!

00:00:01.638 --> 00:00:06.240
We're going to be fuzz testing now, and I had to struggle a little bit because

00:00:06.240 --> 00:00:11.979
I haven't done Go fuzzing before. Also, in the previous

00:00:12.000 --> 00:00:19.169
episode I told it to map 1 to A in our Excel column index mapping, but then

00:00:19.169 --> 00:00:24.929
midway through recording I told it to start at zero, so it did some weird stuff there.

00:00:24.929 --> 00:00:25.919
I went over it.

00:00:26.400 --> 00:00:31.439
This is the kind of thing that LLMs have a notoriously

00:00:31.540 --> 00:00:35.110
hard time with: basically, reasoning and computing stuff.

00:00:35.230 --> 00:00:39.190
They can't do it; they operate on probabilistic vibes.

00:00:39.610 --> 00:00:44.199
So I had to compute some of those by hand, and made all the

00:00:44.199 --> 00:00:47.179
unit tests pass. I actually just had it tab-complete this, because it

00:00:47.199 --> 00:00:48.850
has probably seen this code often enough.

00:00:49.600 --> 00:00:52.804
And then in preparation for the fuzzing, I just created the inverse.

00:00:53.936 --> 00:00:58.270
Just so that in the fuzzing test, I can convert an index to alpha and an alpha

00:00:58.300 --> 00:01:00.160
back to index, and the two should match.

00:01:00.790 --> 00:01:07.440
So, in Go, you write a fuzzer this way: you write

00:01:07.500 --> 00:01:11.490
a function that's called FuzzSomething, it takes a testing.F, and

00:01:11.490 --> 00:01:13.350
inside that you create a seed corpus.

00:01:14.550 --> 00:01:18.980
And then using the seeded list of starting points, you pass that to

00:01:18.980 --> 00:01:22.740
the fuzzer, which will start mutating them according to heuristics.

00:01:22.970 --> 00:01:25.570
So I had to read up on a tutorial on how to do that.

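
NOTE
A minimal sketch of the round-trip fuzz target described here, assuming 1-based
column indexes (1 = A, 16384 = XFD). The names columnToAlpha, alphaToColumn,
maxColumn and FuzzRoundTrip are hypothetical; the code from the episode is not
part of this transcript, so this is an illustration, not the original.
```go
package main
import (
	"fmt"
	"testing"
)
// maxColumn is the last Excel column, XFD (assumed limit, hypothetical name).
const maxColumn = 16384
// columnToAlpha converts a 1-based column index to a name: 1 -> "A", 27 -> "AA".
func columnToAlpha(index int) (string, error) {
	if index < 1 || index > maxColumn {
		return "", fmt.Errorf("column index %d out of range", index)
	}
	name := ""
	for index > 0 {
		index-- // shift to 0-based so that 26 maps to "Z"
		name = string(rune('A'+index%26)) + name
		index /= 26
	}
	return name, nil
}
// alphaToColumn is the inverse: "A" -> 1, "AA" -> 27.
func alphaToColumn(name string) (int, error) {
	if name == "" || len(name) > 3 {
		return 0, fmt.Errorf("invalid column name %q", name)
	}
	index := 0
	for _, r := range name {
		if r < 'A' || r > 'Z' {
			return 0, fmt.Errorf("invalid column name %q", name)
		}
		index = index*26 + int(r-'A') + 1
	}
	if index > maxColumn {
		return 0, fmt.Errorf("column name %q out of range", name)
	}
	return index, nil
}
// FuzzRoundTrip seeds a few starting points, then lets the fuzzer mutate them.
func FuzzRoundTrip(f *testing.F) {
	for _, seed := range []int{1, 26, 27, maxColumn} {
		f.Add(seed)
	}
	f.Fuzz(func(t *testing.T, index int) {
		name, err := columnToAlpha(index)
		if err != nil {
			return // out-of-range input, nothing to round-trip
		}
		back, err := alphaToColumn(name)
		if err != nil || back != index {
			t.Errorf("round trip failed: %d -> %q -> %d (err: %v)", index, name, back, err)
		}
	})
}
```
Saved in a file ending in _test.go, the seeds run as ordinary test cases under
a plain "go test"; mutation only starts when the -fuzz flag is given.
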
00:01:25.620 --> 00:01:28.110
You know, this is not something I know well, so I didn't want

00:01:28.110 --> 00:01:29.850
to have ChatGPT make stuff up.

00:01:32.220 --> 00:01:34.620
I knew otherwise I would probably waste time.

00:01:34.620 --> 00:01:37.350
This is relatively new, so it probably hasn't seen it much

00:01:37.350 --> 00:01:38.850
in its training corpus either.

00:01:39.540 --> 00:01:41.400
So I read up on this.

00:01:41.490 --> 00:01:44.430
This is the way you do the corpus stuff, this is the way you

00:01:44.430 --> 00:01:49.050
do the test stuff. Not training corpus: fuzzing corpus.

00:01:50.100 --> 00:01:55.380
So I had to experiment a little bit to have a good grippy prompt

00:01:55.410 --> 00:02:00.450
for a Go fuzzer, probably because it hasn't seen this enough yet.

00:02:01.020 --> 00:02:04.050
And this is what I came up with: I pasted the whole function.

00:02:06.540 --> 00:02:10.200
And then, after the whole function, I said: "what is a good way to test

00:02:10.200 --> 00:02:12.030
this function, to fuzz this function?"

00:02:12.060 --> 00:02:16.500
If I just left it at that and didn't add this at the bottom here, it would just

00:02:16.500 --> 00:02:18.630
give me generalities on fuzzing.

00:02:18.630 --> 00:02:22.050
So let's see what it says here. Isn't it going to be like, oh yeah,

00:02:22.050 --> 00:02:23.580
fuzzing is this and this and this?

00:02:25.420 --> 00:02:29.870
I call this corporate drone hiring interview bullshit.

00:02:29.870 --> 00:02:32.780
It's stuff that's not wrong, but it's also not good.

00:02:32.780 --> 00:02:36.560
And this one, because it's probabilistic, sometimes it gives good

00:02:36.560 --> 00:02:38.180
results, sometimes it gives bad results.

00:02:38.180 --> 00:02:40.790
So this actually looks pretty good, I think because I gave

00:02:40.820 --> 00:02:42.050
it the body of the function.

00:02:42.410 --> 00:02:46.550
If we remove the body of the function and just give it an interface, it

00:02:46.580 --> 00:02:51.470
probably is going to be very generic.

00:02:51.620 --> 00:02:54.110
So I get "test it on valid and invalid outputs."

00:02:54.590 --> 00:02:55.160
So.

00:02:56.630 --> 00:02:57.650
Kind of useless, right?

00:02:59.150 --> 00:03:07.810
So it really comes down to the prompting here. I hope I saved this; let me

00:03:07.810 --> 00:03:13.210
see, can I just re-insert my function?

00:03:13.210 --> 00:03:13.750
Here we go.

00:03:14.320 --> 00:03:19.000
And then we saw, as soon as I put in the body, it's actually able to infer:

00:03:19.060 --> 00:03:21.220
oh, 26 is an important number.

00:03:21.730 --> 00:03:23.050
It has probably seen this.

00:03:24.580 --> 00:03:28.120
Maybe, you know, I have no way of knowing, but maybe it infers from the

00:03:28.120 --> 00:03:34.060
structure of fuzz tests and from the structure of unit tests that there's a

00:03:34.060 --> 00:03:39.370
correspondence there, and that's why it's able to provide some of these things.

00:03:39.400 --> 00:03:42.880
You notice that this is significantly less interesting than the one we

00:03:42.880 --> 00:03:44.590
got just a couple of minutes ago.

00:03:44.620 --> 00:03:49.270
So sometimes it's worth, you know, just regenerating to see if maybe it grips

00:03:49.300 --> 00:03:51.610
onto something a little bit better.

00:03:54.760 --> 00:03:57.730
But still it's very generic: we don't want to do mutational testing ourselves,

00:03:57.730 --> 00:03:59.380
because that's up to the fuzzer.

00:04:00.080 --> 00:04:04.760
You can see it's not a good prompt, because it will sometimes work, sometimes not.

00:04:04.760 --> 00:04:10.520
So we do need to steer it a little bit more, and a way to do this is to ask it

00:04:10.520 --> 00:04:13.460
very specifically about concrete things.

00:04:13.490 --> 00:04:17.730
We know that in Go, we do have a seed corpus.

00:04:20.730 --> 00:04:23.300
"To populate the seed corpus."

00:04:24.180 --> 00:04:29.030
And we tell it to give us a list. It's probably going to create a bulleted list,

00:04:29.030 --> 00:04:32.660
and because it has self-attention, it will probably create more bullet points.

00:04:32.660 --> 00:04:36.260
It's a way to trick it into giving more structured information.

00:04:37.416 --> 00:04:41.840
"Give a list of tests to run in the fuzz test body itself," and

00:04:41.840 --> 00:04:43.190
then let's see what happens here.

00:04:54.770 --> 00:04:57.020
So, much more interesting, right?

00:04:57.230 --> 00:05:00.820
Not only is this structured correctly, we have these two steps

00:05:00.820 --> 00:05:02.440
that are important for a fuzzer.

00:05:03.610 --> 00:05:06.640
We also have much more concrete suggestions.

00:05:07.090 --> 00:05:10.690
So here we would do the same structure as before.

00:05:10.720 --> 00:05:14.890
So let me just see how this works.

00:05:15.460 --> 00:05:17.140
So we have to do

00:05:17.740 --> 00:05:18.250
f.Add.

00:05:18.710 --> 00:05:23.090
I'm going to take this to seed the test corpus and you will see there is something

00:05:23.090 --> 00:05:26.261
for the fuzzer to grip onto.

00:05:27.035 --> 00:05:30.320
Not in the function that we unit tested, but in the other

00:05:30.320 --> 00:05:31.820
one, which I didn't unit test.

00:05:32.510 --> 00:05:36.020
So "start with empty or invalid values."

00:05:36.100 --> 00:05:44.470
We're going to do "empty or invalid values is equal to," and

00:05:45.130 --> 00:05:47.230
what does it tell us? Negative numbers.

00:05:47.530 --> 00:05:55.240
So it filled that in, and then we can just have this. It doesn't know to add it,

00:05:55.270 --> 00:05:55.600
right?

00:05:55.660 --> 00:06:01.120
So it still thinks it's doing a unit test, but now it will recognize

00:06:01.120 --> 00:06:02.770
that we're seeding a test corpus.

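
NOTE
One way the seeding step could look, as a sketch only. It assumes an int fuzz
argument and a valid range of 1 to 16384; seedValues, FuzzColumnIndex and the
specific seed values are hypothetical. The fixed random seed (42, an arbitrary
choice) keeps the ten random indexes reproducible between runs.
```go
package main
import (
	"fmt"
	"math/rand"
	"testing"
)
// maxColumn is the last Excel column, XFD (assumed limit, hypothetical name).
const maxColumn = 16384
// seedValues returns invalid values, boundary values, and ten reproducible
// random indexes that stay inside the valid range.
func seedValues() []int {
	seeds := []int{0, -1, 1, 26, 27, maxColumn, maxColumn + 1}
	rng := rand.New(rand.NewSource(42)) // fixed seed, so every run adds the same values
	for i := 0; i < 10; i++ {
		seeds = append(seeds, rng.Intn(maxColumn)+1)
	}
	return seeds
}
// columnToAlpha converts a 1-based column index to a name: 1 -> "A", 27 -> "AA".
func columnToAlpha(index int) (string, error) {
	if index < 1 || index > maxColumn {
		return "", fmt.Errorf("column index %d out of range", index)
	}
	name := ""
	for index > 0 {
		index--
		name = string(rune('A'+index%26)) + name
		index /= 26
	}
	return name, nil
}
// FuzzColumnIndex adds every seed with f.Add before handing over to f.Fuzz.
func FuzzColumnIndex(f *testing.F) {
	for _, seed := range seedValues() {
		f.Add(seed) // each argument type must match the fuzz function's parameters
	}
	f.Fuzz(func(t *testing.T, index int) {
		_, err := columnToAlpha(index)
		valid := index >= 1 && index <= maxColumn
		if valid != (err == nil) {
			t.Errorf("index %d: valid=%v but err=%v", index, valid, err)
		}
	})
}
```
Keeping the seed list in its own function makes it easy to inspect or unit
test the corpus separately from the fuzz target.
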
00:06:03.220 --> 00:06:12.820
So here, it will mirror the structure we had above and I can just tab-complete it.

00:06:14.350 --> 00:06:17.590
Oh, so here it actually started fuzzing, which is interesting.

00:06:18.070 --> 00:06:19.030
That's not what I want.

00:06:19.120 --> 00:06:21.850
I want it to add 10 random numbers.

00:06:22.916 --> 00:06:25.780
Maybe not a thousand.

00:06:28.930 --> 00:06:30.220
And f.Add.

00:06:31.480 --> 00:06:32.410
There we go.

00:06:32.860 --> 00:06:34.240
Let's make it a little bit more.

00:06:34.340 --> 00:06:39.290
Actually, I'm going to do this on purpose: I'm not going to have it go above 16,000.

00:06:39.680 --> 00:06:44.360
And the other thing I'm gonna do is fix the seed.

00:06:45.380 --> 00:06:51.910
And, see, I kind of knew what I wanted here; I was trying to see how far it goes.

00:06:52.480 --> 00:06:56.200
And then this I'm going to leave out. I'm going to have the fuzzer

00:06:56.200 --> 00:06:59.020
do that, actually, to make things a little bit more interesting.

00:06:59.020 --> 00:07:00.190
I'm just going to remove all of this.

00:07:00.850 --> 00:07:02.950
So, "tests to run in the fuzz body."

00:07:02.980 --> 00:07:07.930
So the way fuzzing in Go works is that you do this, and then you give it a

00:07:07.930 --> 00:07:13.870
function that takes a mutated value and just have it run standard unit tests.

00:07:14.380 --> 00:07:20.200
So the first thing we would do is of course this thing here, which is

00:07:20.350 --> 00:07:26.410
we take a column index, we convert it to an alpha string, and then we turn

00:07:26.410 --> 00:07:30.310
it back into an integer, and the two should match.

00:07:30.460 --> 00:07:36.220
However, it's very possible that we exceed the valid range, right? So

00:07:36.730 --> 00:07:45.460
if the column index is smaller than zero or equal to zero, or the column

00:07:45.460 --> 00:07:57.320
index is bigger than 16384, it's outside of the valid Excel column range.

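
NOTE
A sketch of the fuzz body logic described above: out-of-range indexes (zero or
below, or above 16384) must be rejected, and valid ones must survive the round
trip. checkIndex, FuzzColumnRange and both converters are hypothetical names,
written under the assumption of 1-based indexes.
```go
package main
import (
	"fmt"
	"testing"
)
// maxColumn is the last Excel column, XFD (assumed limit, hypothetical name).
const maxColumn = 16384
// columnToAlpha converts a 1-based column index to a name: 1 -> "A", 27 -> "AA".
func columnToAlpha(index int) (string, error) {
	if index < 1 || index > maxColumn {
		return "", fmt.Errorf("column index %d out of range", index)
	}
	name := ""
	for index > 0 {
		index--
		name = string(rune('A'+index%26)) + name
		index /= 26
	}
	return name, nil
}
// alphaToColumn is the inverse: "A" -> 1, "AA" -> 27.
func alphaToColumn(name string) (int, error) {
	if name == "" || len(name) > 3 {
		return 0, fmt.Errorf("invalid column name %q", name)
	}
	index := 0
	for _, r := range name {
		if r < 'A' || r > 'Z' {
			return 0, fmt.Errorf("invalid column name %q", name)
		}
		index = index*26 + int(r-'A') + 1
	}
	if index > maxColumn {
		return 0, fmt.Errorf("column name %q out of range", name)
	}
	return index, nil
}
// checkIndex returns nil when the converters behave as expected for index.
func checkIndex(index int) error {
	name, err := columnToAlpha(index)
	if index <= 0 || index > maxColumn {
		if err == nil {
			return fmt.Errorf("index %d is out of range but produced %q", index, name)
		}
		return nil // rejected as expected
	}
	if err != nil {
		return fmt.Errorf("valid index %d was rejected: %v", index, err)
	}
	back, err := alphaToColumn(name)
	if err != nil {
		return fmt.Errorf("name %q for index %d did not parse: %v", name, index, err)
	}
	if back != index {
		return fmt.Errorf("round trip mismatch: %d -> %q -> %d", index, name, back)
	}
	return nil
}
// FuzzColumnRange runs checkIndex on every value the fuzzer generates.
func FuzzColumnRange(f *testing.F) {
	f.Add(1)
	f.Fuzz(func(t *testing.T, index int) {
		if err := checkIndex(index); err != nil {
			t.Error(err)
		}
	})
}
```
Putting the logic in a plain function keeps the fuzz callback short and lets
the same check be reused from a table-driven unit test.
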
00:07:58.700 --> 00:08:01.230
So this is something we already unit tested.

00:08:02.990 --> 00:08:03.920
This is all fine.

00:08:03.920 --> 00:08:07.199
And then if it is actually a valid value, we're going to do this.

00:08:07.518 --> 00:08:09.950
We could have looked at this.

00:08:11.990 --> 00:08:13.760
So we kind of did this here, right?

00:08:15.500 --> 00:08:18.469
So, let me just paste this in and then see if we can keep

00:08:18.469 --> 00:08:19.940
some of those as good ideas.

00:08:22.010 --> 00:08:25.340
And as I said, you can try regenerating things here.

00:08:25.730 --> 00:08:29.210
See what it gives you, maybe it will return more interesting things.

00:08:29.960 --> 00:08:32.870
This is obviously enough for fuzzing?

00:08:32.930 --> 00:08:34.460
Well, it's what we already have here, but.

00:08:38.599 --> 00:08:42.860
So, we want to test that.

00:08:44.240 --> 00:08:47.900
Then we want to verify that the returned string has the correct length.

00:08:48.470 --> 00:08:52.730
So, let's see if it grasps that.

00:08:52.730 --> 00:08:54.950
So if it's longer than three, that's wrong.

00:08:54.980 --> 00:08:55.619
That's true.

00:08:57.248 --> 00:08:58.400
That's a good one too.

00:09:00.110 --> 00:09:07.010
This is an interesting pattern, EqualFold, I've never seen that before.

00:09:08.030 --> 00:09:11.120
This is an interesting part about large language models: they will sometimes

00:09:11.120 --> 00:09:16.130
suggest idiomatic ways of doing things, like this one, which I didn't know.

00:09:16.250 --> 00:09:18.110
I don't know if it's correct, but I think so.

00:09:18.829 --> 00:09:22.130
It probably is. It will also suggest APIs that don't exist.

00:09:22.130 --> 00:09:25.750
So the previous time I did this it suggested having a function

00:09:25.780 --> 00:09:29.390
that's called isUppercase, which didn't exist, but then I just had

00:09:29.390 --> 00:09:31.430
to create it, because why not?

00:09:33.928 --> 00:09:34.580
So.

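
NOTE
A sketch of the extra property checks mentioned above (length of at most three,
upper case only). checkColumnName and sameIgnoringCase are hypothetical names.
strings.EqualFold does exist in the Go standard library, but it compares two
strings while ignoring case, so comparing a name against its own upper-cased
form with EqualFold would always succeed. How the generated code used it is
not visible in the transcript; this sketch compares with strings.ToUpper instead.
```go
package main
import (
	"fmt"
	"strings"
)
// checkColumnName verifies the shape of a column name: 1 to 3 letters, A to Z only.
func checkColumnName(name string) error {
	if len(name) < 1 || len(name) > 3 {
		return fmt.Errorf("name %q has length %d, want 1 to 3", name, len(name))
	}
	if name != strings.ToUpper(name) {
		return fmt.Errorf("name %q is not upper case", name)
	}
	for _, r := range name {
		if r < 'A' || r > 'Z' {
			return fmt.Errorf("name %q contains %q, want only A to Z", name, r)
		}
	}
	return nil
}
// sameIgnoringCase shows what strings.EqualFold reports: equality under case folding.
func sameIgnoringCase(a, b string) bool {
	return strings.EqualFold(a, b)
}
```
Inside a fuzz body, checkColumnName would be called on the converter's output
for every valid index, next to the round-trip comparison.
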
00:09:42.410 --> 00:09:43.028
Yeah, that sounds good.

00:09:44.165 --> 00:09:47.540
"Test with negative input and ensure that the function returns": that we did.

00:09:49.040 --> 00:09:52.310
Alright there, we did that, invalid values, we did that.

00:09:53.030 --> 00:09:57.530
Same input value multiple times, this is all up to the fuzzer.

00:09:57.950 --> 00:10:00.500
So it gave us a couple more ideas.

00:10:01.980 --> 00:10:02.850
These are pretty cool.

00:10:05.310 --> 00:10:07.890
I mean, they kind of match what we had before, but for the sake of

00:10:07.890 --> 00:10:10.620
example, these are additional tests we could add in here.

00:10:11.190 --> 00:10:16.080
Of course, our back-and-forth test is pretty good, but what if both

00:10:16.080 --> 00:10:19.710
functions are broken? That's something I've done in the past.

00:10:19.740 --> 00:10:24.720
So more tests can't really hurt.

00:10:24.780 --> 00:10:26.730
I am not sure about the 702.

00:10:27.180 --> 00:10:29.460
But I suspect it's 26 squared.

00:10:29.790 --> 00:10:30.400
It's not.

00:10:33.420 --> 00:10:35.580
'Cause what does ZZ do?

00:10:37.380 --> 00:10:40.080
Shouldn't this be 26 squared?

00:10:47.460 --> 00:10:49.230
It should be 27 squared, right?

00:10:49.770 --> 00:10:50.070
No.

00:10:52.950 --> 00:10:54.510
It's 27 times 26.

00:10:54.510 --> 00:10:56.490
I'm always prone to off-by-one errors.

00:10:56.490 --> 00:10:58.710
So I'm going to trust ChatGPT on this one.

00:10:59.640 --> 00:11:00.420
And.

00:11:06.180 --> 00:11:07.860
All right, I'm gonna leave it here.

00:11:07.950 --> 00:11:12.480
And one thing we can do with Go fuzz testing is that first

00:11:12.480 --> 00:11:13.800
we want to test our seed corpus.

00:11:15.180 --> 00:11:19.030
So maybe we can add a few more functions here.

00:11:20.040 --> 00:11:26.070
So one thing that would be good is to generate multiples of 26 and 27.

00:11:28.710 --> 00:11:31.680
So multiples of 26, 27.

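
NOTE
The 702 can be checked by counting, assuming 1-based indexes: there are 26
one-letter names and 26 * 26 two-letter names, so ZZ, the last two-letter
name, is column 26 + 676 = 702, which is also 26 * 27. A small sketch with
hypothetical helper names, including the multiples of 26 and 27 used as seeds:
```go
package main
// lastColumnWithLength returns the index of the last column whose name has
// n letters: 26 for "Z", 702 for "ZZ", 18278 for "ZZZ".
func lastColumnWithLength(n int) int {
	total, power := 0, 1
	for i := 0; i < n; i++ {
		power *= 26    // number of names with exactly i+1 letters
		total += power // running count of all names up to that length
	}
	return total
}
// multiplesOf returns the first count positive multiples of base.
func multiplesOf(base, count int) []int {
	out := make([]int, 0, count)
	for i := 1; i <= count; i++ {
		out = append(out, base*i)
	}
	return out
}
```
Multiples of 26 land on names ending in Z (26 = Z, 52 = AZ), which is where
off-by-one mistakes in this kind of conversion tend to show up.
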
00:11:32.310 --> 00:11:39.450
I am skeptical about those, but it can't hurt. It's a little bit too many.

00:11:42.690 --> 00:11:44.790
"Generate 10."

00:11:44.980 --> 00:11:50.020
The counting stuff is not something that LLMs are very good

00:11:50.020 --> 00:11:54.400
at, but it looks like this is still in the realm of possibilities.

00:11:55.120 --> 00:12:02.110
So let's test our seed corpus, just to make sure that

00:12:02.500 --> 00:12:04.030
we don't already fail here.

00:12:04.540 --> 00:12:09.740
All right, so our seed corpus passes, and now we're going to fuzz.

00:12:10.090 --> 00:12:14.240
So the way this works in Go, I think, is you just run it with "go test -fuzz".

00:12:15.340 --> 00:12:25.480
And then here, I think actually, because of this test, it won't fail.

00:12:26.439 --> 00:12:33.970
So, just to check if it finds something, let's add something weird in here.

00:12:34.630 --> 00:12:38.800
If index is equal to 55, just return this.

00:12:40.840 --> 00:12:41.740
Let's see what happens.

00:12:42.160 --> 00:12:43.570
Is the fuzzer able to find it?

00:12:48.400 --> 00:12:52.090
And it was able to find it. It was pretty quick, 'cause

00:12:52.090 --> 00:12:53.380
we're just dealing with integers.

00:12:53.740 --> 00:13:01.570
So this is fuzzing with LLMs, which is kind of useful. But then also, if I had

00:13:02.440 --> 00:13:07.240
more fuzzing experience in Go, for example, and more stuff to copy-paste in here,

00:13:07.810 --> 00:13:12.850
if I had a good fuzz thing here,

00:13:12.850 --> 00:13:13.030
right,

00:13:13.030 --> 00:13:16.600
a good fuzz harness for an existing, say, state machine or some

00:13:16.600 --> 00:13:20.230
kind of complicated construct, I would actually just paste it in and

00:13:20.230 --> 00:13:23.140
then ask it to fuzz in a similar way.

00:13:23.830 --> 00:13:27.010
And that would work pretty well; it would work the same way as

00:13:27.010 --> 00:13:28.420
the unit tests we did before.

00:13:29.620 --> 00:13:30.130
That's it.

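
NOTE
A sketch of the planted-bug experiment from around 12:34. The transcript does
not show what was returned for index 55, so the "A" below is a stand-in, and
buggyColumnToAlpha, alphaToColumn and roundTrips are hypothetical names. The
point is only that a wrong answer for one input breaks the round-trip property.
```go
package main
import "fmt"
// maxColumn is the last Excel column, XFD (assumed limit, hypothetical name).
const maxColumn = 16384
// buggyColumnToAlpha is an index-to-name converter with a deliberate bug at 55.
func buggyColumnToAlpha(index int) (string, error) {
	if index < 1 || index > maxColumn {
		return "", fmt.Errorf("column index %d out of range", index)
	}
	if index == 55 {
		return "A", nil // planted bug: the correct name for 55 is "BC"
	}
	name := ""
	for index > 0 {
		index--
		name = string(rune('A'+index%26)) + name
		index /= 26
	}
	return name, nil
}
// alphaToColumn converts a name back to a 1-based index: "A" -> 1, "AA" -> 27.
func alphaToColumn(name string) (int, error) {
	if name == "" || len(name) > 3 {
		return 0, fmt.Errorf("invalid column name %q", name)
	}
	index := 0
	for _, r := range name {
		if r < 'A' || r > 'Z' {
			return 0, fmt.Errorf("invalid column name %q", name)
		}
		index = index*26 + int(r-'A') + 1
	}
	return index, nil
}
// roundTrips reports whether index survives index -> name -> index.
func roundTrips(index int) bool {
	name, err := buggyColumnToAlpha(index)
	if err != nil {
		return false
	}
	back, err := alphaToColumn(name)
	return err == nil && back == index
}
```
With a fuzz target built on this property, a plain "go test" runs only the
seed corpus, while "go test -fuzz=" followed by the target's name starts
mutation. When the fuzzer finds a failing input such as 55, Go writes it to a
file under testdata/fuzz so that later plain runs replay it.
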