Episode Transcript
[00:00:10] Speaker A: All right, welcome to Arbory Digital Experiences. This is episode eleven.
I'm Tad Reeves, principal architect at Arbory Digital, and I'm joined here by Dwayne Hale, who is our newly appointed CTO of Arbory Digital, as well as Bryce Acker, who is the CEO at Arbory Digital. And today we're going to talk about AEM war stories, for lack of a better word. The good, the bad, the ugly, the things that have gone well, the things that have gone enormously poorly, and what we've learned from all that, because we've been through a couple of things.
So we're going to get right into this. First off, though, I want to give a chance for Bryce and Duane to introduce themselves. Bryce, why don't you go first?
[00:00:54] Speaker B: Sure. And with that introduction, did you just call us old? Is that what you're saying?
[00:01:00] Speaker A: Experienced? Experienced, yes.
Awesome.
[00:01:04] Speaker B: So hey, my name is Bryce Acker.
My partners and I founded Arbory Digital five years ago now, and it's awesome. I've been in really the consulting world most of my career, and specifically Adobe Experience Manager for about 15 years.
Really cut my teeth on it, actually, when it was Day CQ back in the day, before it got rebranded. I built a whole monitoring team at Rackspace, which is where I met Tad and Dwayne, and sort of got the band back together at Arbory. Really happy to have them. And also, like Tad said, congrats to Dwayne, who was just appointed our CTO.
[00:01:53] Speaker C: I'm Dwayne Hale. I'm the newly minted CTO of Arbory Digital. I've had a little over, coming up on, eleven years of experience in the AEM world, both on the development side of things and on the operations side of things. Prior to that, however, I was developing in something that wasn't AEM at all. So counting that experience, probably more around 12 to 13 years of software development and operations experience.
[00:02:22] Speaker A: Outstanding.
And for me, I've been in AEM land for the last 14 years. I guess it was 2010 when I got involved. I remember very clearly when one of my coworkers found out that Day was getting purchased by Adobe. There were audible expletives, both in joy and in "what does this mean for the future?" Obviously it's meant a pretty bright and varied future for all of us, because we've gotten a chance to get involved in a broad variety of infrastructures and applications and so forth, with AEM at the center of a lot of that. And because these are usually really massive implementations, usually massive sites for big companies, they have to plug into a lot of things.
And as a result, we've gotten a chance to see things go every which way.
[00:03:20] Speaker B: And if I may, real quick, just talking about that acquisition, I remember that distinctly as well.
You said there were a couple of different expletives of "what does this mean?"
It really ended up being a great fit for, I think, both companies. A company of mine was a partner with Day Software, and really at that point it was gaining momentum.
But I believe they had two North American sales reps and no development footprint. I believe everything was over in Switzerland at the time of the acquisition.
It was obviously a great tool even before it got rebranded as WEM and then AEM. Most people don't remember the WEM era, but they were starting to sell it, and all these companies were buying it. I was actually on a project migrating Kellogg.com over at the time, and there was really no tech knowledge in North America. So they were partnering with a lot of consulting firms to basically, once they sold the product, help those customers tailor it to their needs. So it really ended up being a good fit with Adobe: being able to have a tech team in house, a lot more product development. But really, for the product itself, Adobe obviously has a world-class sales staff, so they ratcheted that up and have brought the tool to where it is now, with a significant market share across corporate America.
[00:05:00] Speaker A: That's right. And think of the amount of muscle and the amount of credibility that Adobe brought to the table at that point, because that whole Omniture acquisition had just kind of taken place. So people really knew that Adobe was not only serious about design, but really serious about web marketing as well. But as a result, as companies got involved, there are so many different requirements that all these different websites have, so many different requirements that a big site has. There's no template, there's no one thing.
[00:05:36] Speaker B: Well, there's a lot of same, same but different.
[00:05:39] Speaker C: Right.
[00:05:39] Speaker B: There's a lot of things that people do similarly and then there's a lot of customization. So everyone has their own bespoke use cases.
[00:05:45] Speaker A: Well, that's right. That's right.
Well, firing into this, the format that I wanted to go with, just to kind of get things started, was to plumb the depths a little bit of some of the things that Dwayne, you and Bryce have seen over the course of doing a lot of different AEM implementations. Because the three of us put together a list of all of the companies that we've dealt with and all the different sites that we've dealt with. And that list is...
[00:06:14] Speaker B: It's long.
[00:06:15] Speaker A: It's shockingly long. Yeah. And you go, "Oh, I remember that one. Oh, that one was terrible. Yeah, that one went well. That one was over."
[00:06:21] Speaker B: It was a neat exercise because there were, I mean, there's, you know, a handful of, like three month projects that I've done that.
[00:06:26] Speaker A: That's right.
[00:06:27] Speaker B: Without going back through a lot of old logs and a lot of old, you know, notes I've taken from various projects, and lessons learned, you totally forget about some of those smaller ones, too, even though some of those were very foundational as well.
[00:06:47] Speaker A: That's right. Some of those three month ones were pretty intense.
But here's my first, I guess, leading question in terms of AEM implementation projects. We don't even necessarily have to limit this just to AEM, but I know where our experience lies. So in terms of a team, what would you say was the best-composed team that you've been on, where you had developers, architects, QA folks, project management, and it was right-sized for the task at hand?
They were ultimately successful because they had the right number of people. Not way overboard, not just surviving through complete throngs of people.
It was a good team and got the job done. Do any projects come to mind for that?
[00:07:44] Speaker B: Dwayne, you want to take that first? I got one specifically.
[00:07:48] Speaker C: Yeah, I've got one from years ago. You may or may not remember it, Bryce, but it was when I worked in consulting, prior to managed services, and that's kind of how I've come back around. The team was mainly all in house. Really, the only parts of the team that weren't actually part of the consultancy working with this customer were the QA people, who were provided by the customer. They knew the website and its functionality in and out, way better than anybody we would have brought in third party or even from our own organization. And the business stakeholders, of course, were from the customer themselves. But everything else was right-sized in terms of project managers, right-sized in terms of development, and right-sized in terms of having operations people who could help deploy code, spin up dev boxes, spin up staging boxes. And it really was, and it's kind of a double-edged sword, because I'm sure we'll get to that in later questions. But when you're talking about the composition of the team, it really was the ideal situation, and the skill sets of the team meshed together really well.
[00:09:17] Speaker A: I think one of the challenges in that is acquiring such a team. It's one thing when it's fortuitous that you have all these people there, but it's another thing to try to keep that team together, to try to say, hey, we want to implement AEM.
Let's have this team. Let's have a team of that size. Because especially now that AEM is so much more mature, it's pretty difficult to hire a team of that size as full-time employees.
[00:09:50] Speaker B: Yeah, totally agree. Honestly, with more experienced people, depending on the project, you can get sort of long in the tooth. As people gain experience and confidence with a tool, with a platform, they want to go solve bigger and bigger problems, and it's sometimes hard to keep a whole team focused on one problem, depending on its size and complexity. So mine would have been... I think one of the best, largest teams that I worked on was the MyFord Mobile project.
And it was, honestly, because there were so many technologies involved in it, and really such a front-and-center focus for Ford at the time. This was their first mobile app that really connected with their vehicles. It was around 2012, I think, that timeframe.
They had a new car called the C-Max, which was sort of a glorified golf cart, honestly, at the time. But electric vehicles have come a long way in a decade.
The mobile app would interact with the car, tell you where the car's at. You could program charging times, you could see your battery state. There was a lot of gamification to it, too: there were regional and global leader lists, I think, on miles driven and energy saved and various other categories.
And it was obviously a website as well, myfordmobile.com. So it was a mobile app and the web version of that, and then the plumbing behind the scenes. Obviously you're dealing with vehicles, and so there are onboard computers in the cars that have a cell signal out.
Each had its own IP address going into this whole management tier, where there's a large database and a large sort of controlling application that would interact with the cars. That surfaced an API for our side of the house, which was the mobile app and the website, and all of that was sort of derived out of AEM itself. And I say "sort of": the website was all managed within AEM.
There was a lot of trials and tribulations around that project that were sort of firsts.
Adobe had recently acquired PhoneGap, which became Apache Cordova, a library allowing you to write a mobile application once and build it to multiple platforms. I think they did Windows Phone 7 as well; we only targeted iOS and Android at the time. But basically it's one code base, built to both platforms.
We had a large team, we had a large onshore team and also offshore that I was helping manage and architect the AEM solution of it.
We were working hand in hand with Adobe engineering, because again, Adobe had just bought that product, Apache Cordova, branded as PhoneGap. And this was also the very first mobile app published using PhoneGap where the content came from AEM. For the mobile app itself, it wasn't actually pulling from the AEM server; it got packaged. So it was authored in AEM. The whole website was driven by AEM. But when you went to build the mobile application for the store, it was pulling the content out of AEM and then creating a package for the app store. But, you know, I felt it was staffed really well. We had all these different skill sets and tools and platforms that we had to interact with across teams. We had weekly meetings across those leads. We had internal project plans per expertise.
I think there was a lot of lessons learned that I loved from that project.
A lot of us were actually traveling up to Chicago at the time, and we had a whole floor. For our stand-ups, there were a lot of soccer fans on that team, so we'd get in a big circle (we had the space to even do this), about 30 people, and you weren't able to give your update unless you had the ball. Then you'd kick it to someone else and they'd give their stand-up update. It kept it quick, it kept it to the point, and it was also very collaborative. There are still a lot of colleagues from that project that I keep in touch with, and it's really interesting to see how many have gone on to be sort of captains of industry, leaders, tech leads, or started their own companies themselves, from that whole project.
And so, you know, I think Ford had the budget, they had the corporate focus on it, and they staffed it really, really well.
[00:15:09] Speaker A: I think one of the things, too, that a lot of successful projects I've been on have, similar to what you're saying there, is that you've got a whole bunch of disciplines, but a lot of times there are multiple different locations those have to be in. Then you end up having to be really flexible and deliberate with how you deal with remote workers, and not have a bunch of stops on whether you can get people together. If you need to get people all at the same time to talk something over, you get them all in the house at the same time to talk it over. It's tough when you have kickoffs and so forth that are all remote. People haven't met each other, they don't know each other's communication styles and all that sort of stuff, and it becomes really, really tough to get the team gelling.
[00:15:59] Speaker B: Yeah.
What was also interesting is we didn't interface with them too often, because we were mostly on the mobile app side versus the database and that whole focus. But we did actually find a critical bug in the logic of the programs on the vehicle itself. We initially thought it was with our code; then we found we were giving the proper commands and everything was there, and it was actually in the logic in the chip in the car. So we worked with the engineering teams in Detroit that were part of that programming; we got access to them. Right. Sometimes that's a hard thing to do, to say there's a bug in someone else's stuff and we need the right, appropriate engineers to help look at this. But they got us all together.
[00:16:50] Speaker A: Well, switching gears from teams and how well composed or not well composed they are. Let's talk about deployment and pipeline and process around that.
What's a team that you've seen that really had things together, in terms of a site that was easy and reliable to deploy to? There weren't a lot of deployment-related outages, and the testing was the right amount of testing to be able to get stuff done.
[00:17:21] Speaker B: Sure.
Yeah, I'll start on that one, Dwayne. So there's a couple of different projects. I've liked this paradigm from the get-go, if you can implement it. It's not trivial to get to the point where your CI/CD pipeline is tuned and well-oiled, is pointed at the right environments, everyone knows how to use it, there's knowledge around how it works and its shortcomings, the development teams are trained, and everyone's comfortable with what branches we're using and where we check things in. But there have been a couple of different projects I've been on, the one with the World Bank, that Ford project, and a couple of others we implemented, where the development process was streamlined in that when you check your code in to certain branches, it would automatically trigger deployment processes. A lot of people do that now, but again, go back ten years, and that was not the norm.
One of the other things that always impressed me, and that I haven't actually seen implemented really well across the board at a lot of different projects and companies, was commit message enforcement rules: to commit code, you had to tie it to a story, and it had to be formatted properly.
I think it was a tool like FishEye, I believe, in that case, but there are a couple of different tools out there. It would auto-tag your commits and link them to the story. You could look through your GitHub repository and drill directly into that Jira story, and see who the people were that worked on it. And then, soup to nuts, the whole CI/CD platform was running all the deployments. Another thing that a lot of companies don't get to, that we've always stressed is clearly best practice but is sometimes hard to do (we're still dealing with some customers right now that aren't there), is promoting your builds up instead of rebuilding per environment. Obviously you can rebuild dev lots of times, and maybe even QA, but the release package that you're testing on, you should never rebuild. When you rebuild going to the production environment, there can be something added to it that you're not sure about; someone might have cherry-picked something out of it. It's not necessarily the exact same package that you tested. So you don't have 100% certainty that the same code you validated in your non-prod UAT environments is the same thing you're releasing in production, if you don't promote packages instead of rebuilding them.
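As a rough illustration of that kind of commit-message enforcement, here's a minimal sketch of a Git commit-msg hook in Python. The issue-key pattern and the example key "PROJ-123" are assumptions for illustration, not any specific project's convention; tools like FishEye and Jira do the auto-linking on top of consistently formatted messages like these.

```python
import re
import sys

# Require the first line to start with a Jira-style issue key,
# e.g. "PROJ-123: tighten dispatcher cache rules".
# "PROJ" is a placeholder project key -- substitute your own.
ISSUE_PATTERN = re.compile(r"^[A-Z][A-Z0-9]+-\d+: .+")

def check_commit_message(message: str) -> bool:
    """Return True if the commit message ties the change to a story."""
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    return bool(ISSUE_PATTERN.match(first_line))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Git invokes a commit-msg hook with the path to the message file.
    with open(sys.argv[1], encoding="utf-8") as f:
        if not check_commit_message(f.read()):
            sys.exit("Commit rejected: start the message with an issue key, "
                     "e.g. 'PROJ-123: short summary'")
```

Dropped into `.git/hooks/commit-msg` (and made executable), something like this rejects commits that can't be traced back to a story; a server-side hook would enforce the same rule for a whole team.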
[00:20:24] Speaker A: Yeah, I think the first blog post I ever wrote was on that exact subject. I think that was in 2015 or something like that. And it's been the case since then: yeah, you don't want to just keep rebuilding, because you don't know what you're putting into prod.
[00:20:44] Speaker B: You don't know what somebody get to that a lot of companies don't necessarily get to. But I would stress that it's worth the investment to focus on those CI CD pipelines and to really work out those kinks and understand that it will save cost and time down the road.
[00:21:03] Speaker A: Now, in terms of ugly, do you guys have stories you want to share on ways that that went south? Because sometimes people are like, yeah, yeah, but it's costly to make a pipeline, so maybe we'll do it later.
[00:21:17] Speaker B: I don't know if you have anything on that dwelling.
[00:21:19] Speaker C: Yeah, I've run into plenty of customers that, without berating them on their development cycles or anything like that, would literally build the package locally on one developer's machine, upload it to Package Manager, and click install.
Very rarely was that actually successful. There was a lot of reverting back, figuring out what went wrong with that particular build, and then trying again. And kind of to your point, Bryce, I've also been on the other side of that fence, where you have automatic builds to your dev, your QA, your stage, and then you manually build a tagged release out. I mean, this was 2013, so we weren't quite at the point of using an artifact repository, but we were using tagged builds for production, so you knew exactly what ended up in production. And to your point, having release notes that drilled into the individual user stories or bug fixes, which were also linked to the individual feature branches or bug-fix branches, so you could go back and see exactly what was changed and when. It gave you somewhat of a chain of custody, making sure that if you're deploying 1.58, that is exactly what you're deploying. Like I said before, I've seen the exact opposite: the lead dev building the package with Maven on their local machine and then physically uploading it into Package Manager. And it's usually a bad idea, basically.
[00:23:00] Speaker B: Honestly, before CI CD we were managing code in packages of DeSeq packages. Basically I had a whole file system that was just packages that were just for the version of the code.
I think that was even really git was just coming around and being adopted at the time. A lot of the initial projects were on either perforce or mavende or see if we even had a code repository on some of the very earlier ones I was on.
[00:23:32] Speaker C: Or subversion.
[00:23:34] Speaker B: Subversion, yeah, that's SBN for sure. I think honestly, Adobe is still on subversion for adobe.com and their whole code base.
[00:23:43] Speaker A: But I think you can go a little overboard on that, too, because there are all these north stars that you want to go for. Sure, we don't want to go down again because somebody did something janky, so we're going to put all this process in. But there's a right amount of process, because otherwise you have an overly prescriptive, overly restrictive deployment pipeline where, to get anything done... Let's say you notice one little thing is a QA fail in prod, or you have an issue, and the only way to get a fix back out to prod is two hours' worth of required pipeline steps: it has to go through this check, then through this third-party thing, then it's finally at dev, then it goes to stage, then it has to get an automated load test, and so on. And that was just to change a rewrite rule or something like that. So there's the right size, and there's too far, and there's judgment that has to be applied as to which is which. There's not one size that fits all for everybody.
[00:24:51] Speaker B: And if I may, I wouldn't call it CI/CD, but another awesome one that we did was with our team at Rackspace, honestly. Dwayne and Peter, one of my partners, put together a fantastic automation stack where we weren't doing code deployments, necessarily, in this case, but AEM environment deployments. And so we really did a good job of standardizing, especially for all of the sites that we were supporting. We had a team supporting 30-plus of the largest AEM sites worldwide, and initially, when I first took over that team, it was all bespoke. So we really quickly learned that we needed to build these in a standardized way, so that when you come into an environment and you're trying to support all of these various customers, you know where to look, you know sort of what might go wrong. All the configurations are the same, all the locations are the same, all the versions: we know exactly what AEM version is deployed out there. And I think it was Ansible Tower, and then whatever the open source version of that is, that ran and orchestrated all of it: setting up the operating system, configuring the operating system (there are some types of disk locks we used to run into issues with), installing AEM, upgrading AEM, securing AEM.
So where you're hardening it, turning off HTTP requests, installing SSL certs. That didn't use to be trivial back in the day. We take it for granted how mature the product is at this point, but there were a lot of undocumented, dark areas of AEM that, once we figured out how to do something, we were then able to script it, automate it, and repeat it over and over and over for all of our customers. As you build out dev, now we're going to build out UAT in the same way, maybe instance-size it up a little bit, and build out production in the same way. And you have that certainty that you have like for like for like. So that automation stack, I was always really impressed with where we got to on that whole toolset.
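That like-for-like idea, every tier identical except for deliberate sizing differences, can be sketched as merging a shared baseline with per-environment overrides. The version, paths, and sizes below are invented placeholders for illustration, not the actual Rackspace stack:

```python
# Shared baseline: identical on every box in every environment.
BASELINE = {
    "aem_version": "6.5.0",                     # placeholder version pin
    "install_dir": "/opt/aem",                  # same location everywhere
    "log_dir": "/opt/aem/crx-quickstart/logs",  # so you always know where to look
}

# The only thing allowed to vary per environment is sizing.
SIZING = {
    "dev":  {"heap_gb": 4,  "publishers": 1},
    "uat":  {"heap_gb": 8,  "publishers": 2},
    "prod": {"heap_gb": 16, "publishers": 4},
}

def render_env(name: str) -> dict:
    """Merge the baseline with per-environment sizing so each tier is
    like-for-like, scaled up only where intended."""
    if name not in SIZING:
        raise KeyError(f"unknown environment: {name}")
    return {**BASELINE, "env": name, **SIZING[name]}
```

In an Ansible setup this split typically lives in shared group_vars plus small per-environment override files, with the playbooks consuming the merged result.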
[00:27:13] Speaker C: Yeah. And kind of on the other side of the fence of that. Even if you're not managing, you know, like we were, what was 120 plus a. M. Installations, you know, Dev Qa Uat for, you know, 30 something customers. Even if you're just a small shop and you're trying to deploy to your production website, if you follow infrastructure as code, you're not going to likely run into the scenario of, hey, one of our core functionalities relies on, let's say ACS Commons, and then you deploy your code to prod. And that package is missing because you didn't embed it in the Araven build. And Bob, who was supposed to install it, ended up taking a vacation day. But you know, two days before the deployment and didn't communicate or communication broke down that that package hadn't been installed yet. So it really. Yeah, yeah, it could really save a lot of headache down the road.
[00:28:09] Speaker B: Yeah, poor Bob. He didn't install that package.
[00:28:14] Speaker A: Now there's another one I want to talk about, above and beyond deployments and updates and things like that. There's the topic of, once it's out there, how do you keep an eye on it? So that's the subject of monitoring, alerting, and log analysis.
So where have you seen that go really well? And where have you seen the wrong amount of that taking place?
[00:28:46] Speaker B: Dwayne, do you want this one first?
[00:28:48] Speaker C: Yeah, I'll take this one. So one of the times that it probably fits a couple different categories, it's the, probably the worst outage I've ever been a part of. We had a, let's say, consumer services customer, and they would send out newsletters for new offerings, not necessarily new products, but new things that you could, you know, go and do with this particular customer. And their interdepartmental communication was not the best. So their marketing team had sent out a newsletter that contained UTM tracking codes that hadn't been communicated to the operations team, which was us at the time, and hadn't been communicated to the developers. So nobody on the operations end of things new to ignore those parameters at the dispatcher level because your front end code is what's really going to reach out to your analytics engine and say, hey, x amount of people came in with this UTM tracking code. So because there was a lack of communication there, we ended up getting an alert. Well, we got an alert that their publishers were starting to run out of resources and basically get ready to fall over. But at the same time, the customer contacted us and they're, hey, you know, we're getting reports that the site is really slow to use for the new newsletter content we sent out. You know, can you guys take a look at it? And at the time, we didn't have any kind of log aggregation per se, but we knew what to look for. Typically, when your publisher is getting hit by a lot of requests, it's usually query strings that you didn't account for either within the code itself. If you actually have to use those query strings or you didn't account for ignoring them at the dispatcher. In this case, it was UTM codes. And so every request for this newsletter content was coming in and then hitting the backend publisher. So it kind of fits both categories. One of the worst outages basically was customer calls and then kind of fits the other category of. 
Even though we didn't have log management, we knew what to look for in the logs because we had experienced this across a couple of different customers. Now, that may not be the case if, let's say you're the ops guy at Megacorp, you know, and you don't really know to look for that, right? Because you may not have had the experience of customer a doing the same thing, you know, a year and a half ago or six months ago. And so that really was one of the times that being aware of what typically is in your logs and what is an anomaly really helped out and got them back up on their feet within maybe half hour, however long it took to modify the dispatcher config and then reload the configuration. And then we immediately started seeing the publisher performance coming up and the resource utilization going back down.
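For context, the dispatcher-level fix being described is usually done with the dispatcher's ignoreUrlParams rules, which tell it to serve pages from cache even when certain query parameters are present. A sketch of what that might look like in a dispatcher.any farm (the globs are illustrative, not the actual customer config):

```
/ignoreUrlParams
  {
  # Deny-all default: any unlisted query string bypasses the cache
  # and hits the publish tier.
  /0001 { /glob "*" /type "deny" }
  # Ignore analytics parameters, so utm_* requests still serve from cache.
  /0002 { /glob "utm_*" /type "allow" }
  }
```

With a rule like the second one in place, the newsletter's utm_-tagged links would have been served from the dispatcher cache instead of flooding the publishers.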
[00:31:47] Speaker A: Good.
[00:31:50] Speaker B: Makes sense.
[00:31:51] Speaker A: Sorry, I had an Internet hiccup there, but here we are.
But to your point there, in terms of being able to pin down what's an anomaly and what's regular, that's where aggregation is so important. And that's where I've seen this happen so many times, where you say, hey, it's mostly good, or if we need something, we can download some logs.
That basically eliminates your ability to tell whether something was an anomaly or whether something was isolated, and sometimes that is everything. You're like, oh yeah, I always see those in the logs, don't worry about that, it's always in the logs. Well, is it? Did it really only start during this time? Because there are times when I've said, hey, I finally got my dashboard together, and guess what? Did you know that the whole website was hard down for eight hours last night? You didn't, because you thought Akamai was picking it up. But in reality, any non-cached response, which is this whole subsection of the site, was actually hard down the whole night, and your monitoring wasn't catching it. So a lot of times you're not going to get that data unless you've got enough gear that's looking for it and testing for all those different eventualities.
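One hedged sketch of catching that "CDN looks fine, origin is hard down" failure mode: have synthetic checks record whether each response was served from cache or from the origin, then flag the case where cached pages succeed while every origin response fails. The sample shape (`{"cache": ..., "status": ...}`) is invented for illustration; a real check would derive it from something like the CDN's cache-status response header.

```python
def origin_down(samples: list[dict]) -> bool:
    """Given synthetic-check samples like {"cache": "HIT"|"MISS", "status": 200},
    flag the failure mode where cached pages look healthy while every
    request that actually reached the origin is failing."""
    misses = [s for s in samples if s["cache"] == "MISS"]
    hits = [s for s in samples if s["cache"] == "HIT"]
    if not misses:
        return False  # no origin traffic observed; can't conclude anything
    return all(s["status"] >= 500 for s in misses) and any(
        s["status"] == 200 for s in hits
    )
```

The design point is that the check deliberately includes cache-missing URLs, which is exactly the traffic a "the homepage loads fine" check never exercises.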
[00:33:07] Speaker B: Yep.
So, the best dashboard and monitoring that I've seen, and the worst, were at the same company; this goes back to my Rackspace days. We had this tool, CA Introscope, and because of some projects like the one Dwayne's talking about, there was another similar website. In this case it was Fox Sports.
They had a cache hit ratio of 3%.
So, needless to say. They scaled up all their publisher tiers, they had a bunch of dispatcher tiers, and that whole architecture was very bespoke, very custom. We actually inherited it, but made a lot of changes and a lot of improvements there over time. Because of that, and because of all the issues they were having, we put together what I felt was the best monitoring, alerting, and dashboarding I've ever seen in my career. Because we were a 24/7, 365 support team, this was not just me and my AEM architects or engineers looking at this. It's a wider team of site reliability engineers, 50-plus other people who are really Java and Linux experts but without as deep an expertise in AEM, who need to look at it. And so we had this dashboard where, for every customer, I could go in, and it was literally, I love getting to a single pane of glass, it had all of the publisher threads and all the author threads that I felt were important, tiered into: these are the most critical pieces for this customer. And it's all right there in one view, it's all aggregated. The charts were fantastic. The alerting, the monitoring, everything worked really, really well. We did a lot of work, to your point, Dwayne, in log analysis and log aggregation, where certain strings that we knew meant stuff was going to go south pretty quickly would eventually trigger alerts, and if it was a net new one, we knew exactly how to add it to our alert monitoring. Same company, worst monitoring experience.
The business made a decision that they wanted to go with a cheaper toolset, and so they canceled that license and basically told us we had to redo everything that we had just spent the past two years doing.
And so, no slight on New Relic, but we switched over to New Relic, and we never got to a point where I felt comfortable that I could come in and know exactly what state everything was in. We never really got alerting going, never got the monitoring or the log aggregation going, and definitely no good dashboards or anything like that. And sadly, the year after that, they switched tools again. To me, that was such a lesson learned. The decision was to reduce costs: they had determined that the current license was more expensive than the other toolset, but they clearly did not factor in the cost to re-implement everything we had done, plus the time and investment already sunk into getting to a point where the trains were running on time. Now all of a sudden we were on a whole new platform, having to redo everything, and we never got there. And clearly the team got a little jaded too, because when they changed tools again the subsequent year, the attitude became: why would we put the investment in when they might just yank the rug out from underneath us again? So it was a terrible experience for quite a while there, because there was essentially no monitoring, even though we had a monitoring tool.
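The log-string alerting from the good setup is simple enough to sketch. This is a minimal illustration in Python; the patterns and sample log lines below are hypothetical stand-ins, not the team's actual watch list:

```python
import re

# Hypothetical "known bad" strings -- a real team would curate these from
# its own AEM error.log history over time.
ALERT_PATTERNS = [
    re.compile(r"OutOfMemoryError"),
    re.compile(r"Observation queue .* is full"),
    re.compile(r"blocked for more than \d+ seconds"),
]

def scan_log_lines(lines):
    """Return every log line matching a known-bad pattern."""
    return [line for line in lines
            if any(p.search(line) for p in ALERT_PATTERNS)]

sample = [
    "12:00:01 INFO  request served in 42ms",
    "12:00:02 ERROR java.lang.OutOfMemoryError: Java heap space",
    "12:00:03 WARN  Observation queue /default is full",
]
hits = scan_log_lines(sample)
```

In practice this kind of matching ran inside the log-aggregation tooling rather than a standalone script, but the core idea is just pattern matching on strings that historically preceded trouble, plus a process for adding net-new ones.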
[00:37:25] Speaker A: Well, that's the thing too: for a single company, it's usually a lot easier to do some cost estimation of what an outage is worth. If I'm down for ten minutes, or down for an hour, what is that worth to the company? A lot of times there is a number you can put on that, either lost revenue because it's an ecommerce site, or lost brand value if the brand is impacted, and you can usually put a number on that too. But if it's across a larger number of customers, all at the same time, the question becomes: is it worth losing a customer? Because now all of a sudden you didn't see it coming. It went down and you did not see it coming.
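The back-of-the-envelope outage math being described here is simple to write down. The dollar figures in this sketch are invented for illustration:

```python
def outage_cost(revenue_per_hour, downtime_minutes, brand_penalty=0.0):
    """Lost revenue for the downtime window, plus an optional flat
    brand-impact figure when the brand damage can be priced."""
    return revenue_per_hour * (downtime_minutes / 60.0) + brand_penalty

# Hypothetical ecommerce site doing $120,000/hour:
ten_minutes = outage_cost(120_000, 10)         # a short blip
one_hour = outage_cost(120_000, 60, 250_000)   # outage plus brand damage
```

Even a rough number like this gives the business something concrete to weigh a monitoring license against.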
And that's always the goal. This is the other question I run into with customers: is it okay to just rely on front-end monitoring? Front-end monitoring only tells you it's already slow or already down. You should have caught it first, when it was starting to look like it was going to go down, because the ideal is that you're never down. The ideal is that it's always humming, so you've got to catch it first. And a lot of times, especially with.
[00:38:29] Speaker B: AEM, proactive versus reactive.
[00:38:31] Speaker A: Yes. Because a lot of times you get an AEM instance that spins out of control, and the moment it started to spin out of control was often like eight hours before it actually died. This even applies to AEM as a Cloud Service. I had a very recent experience with this, where a Cloud Service instance should have auto-scaled and did not. It started dying, and it was dying repeatedly, hours and hours before it actually went down. No monitoring caught it because it hadn't technically failed yet. But the CPU was rising on a container, and it rose and rose and rose until the container just died. It didn't auto-scale, and only then did anything actually register that the service was taken out.
When should you have acted? Eight hours before it died, not when the front-end monitor finally said it was dead.
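The "catch it while it's climbing" idea can be expressed as a trend check rather than an up/down check. This is a minimal sketch; the sample values and the slope cutoff are made up, and production systems would do this inside the monitoring platform:

```python
def climb_rate(samples):
    """Least-squares slope of equally spaced samples (units per interval)."""
    n = len(samples)
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def should_alert(cpu_samples, max_slope=2.0):
    """Fire while the box is still up, based on the trend, not the level."""
    return climb_rate(cpu_samples) > max_slope

# A container climbing ~6 CPU points per sample interval: the trend alert
# fires long before the CPU pegs, while a simple "is it down?" check
# stays green the whole time.
rising = [40, 45, 52, 58, 63, 70]
```

A flat-but-noisy series like `[55, 55, 54, 56, 55, 55]` would not trip this check, which is the point: it reacts to direction, not to momentary level.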
So not having that is kind of huge.
Well, let's switch gears a little bit. How about planning a migration? It feels like, especially before everything started going to Cloud Service, almost everybody's project was somewhere in the middle of a migration. And as consultants, that's still a lot of what we end up getting into: somebody going from one version to another, from on-premise to cloud, cloud back to on-premise, even off onto Edge Delivery. There are all these things to plan with respect to a migration. What's one where you saw estimation, team sizing, and so forth done really well from a project planning perspective?
[00:40:20] Speaker C: Yeah, I can take that. I saw it done well back on the double-edged sword project I was talking about earlier. It doesn't fully apply in this context, because they were coming from a separate content management system altogether, but just migrating from that system to the AEM system they were moving onto took the better part of, I want to say, a year and a half, because of all the different moving parts. During that time, more service packs came out. To avoid putting them on a platform that was already outdated by the time they launched, we incorporated the service packs into the development cycles: here are the release notes for the service pack; is it going to impact any functionality on the site? No? All right, let's get it into dev. We'd let it simmer in dev for a while as we continuously built against it, and via automated testing and physical QA testers we'd see whether it affected any functionality of the site, then promote it up through dev, QA, and UAT, so that they ended up on the latest service pack in production. It wasn't this hair-on-fire situation where the version is already six months old, they're releasing another service pack, and we've got to play catch-up. We had incorporated that into the migration plan, so there was time and there were resources to address it before ever going live. And I've seen the exact opposite of that, fairly recently. I had a customer still running 6.2, on the latest service packs for it, but it's unsupported. They came to us and said, hey, spin up a 6.5 development environment. They deployed their code, deployed some test content, it totally broke, and they came back and said, hey, spin the environment down.
We don't have the development resources to even address this migration, because we didn't account for these things and the major differences between the platforms themselves. As you guys know, we didn't recommend at the time making such a large jump from 6.2 to 6.5 without actually doing the homework, and an in-place migration was already out of the question because it was too many jumps to get to the final destination.
[00:42:51] Speaker B: 6.2 was pretty broken as well, I believe.
[00:42:54] Speaker C: Yeah, even when it was new, it had its quirks. But yeah, that was probably one of the worst-planned migrations I'd seen, versus the other one, which, even though it was years ago and I think they were moving to 6.1 or 6.2, had already incorporated it into the development lifecycle. Right? This project is going to take so long that we're going to have to address the service pack elephant in the room.
[00:43:22] Speaker A: For me too, on planning and executing a migration: a lot of times you want to be able to say, hey, this thing just came out, I want to do this thing, and then just do it. But with a big website there are so many dependencies, people to line up, things you weren't thinking of, all that kind of stuff. Of all the migrations I've done, the one I'm thinking of right now was a medical devices company. The upgrade was preceded by a whole discovery project where we looked at all the dependencies: good, here's the Best Practices Analyzer report, here's what we've got to do, here's the pile of work. Okay, so that's how big the software development project is going to be.
Capacity planning, who the people are, all that sort of stuff. Then rolling into an actual implementation with a fully defined set of requirements that came from enough time spent on it. And then you've got a high-quality project manager and a high-quality person doing customer relations and so forth, so that the tech folks are just focusing on tech.
I don't think it can really be overstated how important that is. All of us on this call have worn the client relations and project manager hats at the same time as trying to be a technical architect and a developer and a sysadmin, and that's a hard set of context switches to be doing on a regular basis.
That's just rough. So having somebody dedicated to that is gold. I don't know that a good project manager knows exactly how much they really bring to the table, but it's big.
[00:45:02] Speaker B: It is big. I've worked with a lot of different project managers, the good, the bad, the ugly, and the good ones are worth their weight in gold. They keep things rolling. They ask the right questions.
They have the foresight to see bottlenecks down the road: if this doesn't get done, it's going to delay all these other steps down the line. You really have to understand how the puzzle is put together to fit those pieces together, and it's not trivial.
And I always find the best teams are a really strong architect and a really good project manager with a strong partnership.
I'm not going to name names on some of the poor projects on my end, but there were some poorly planned ones early in my career.
Just in regards to planning itself, one of the best rules of thumb I've used, and I don't know why it works, but it works decently well, is: put it all down, do your best-effort estimation of how long you really think things are going to take from the tech perspective, and then add 40%.
Yeah, there's just so much churn, or unknowns, or delays waiting on ancillary teams, or delays on decisions from leadership, whatever it is, that you can never fully plan for it. Even if you have the most perfect project plan, it's shiny, it has every step down to the second, it doesn't matter. The reality is going to be slightly different, if not wholly different, from that plan.
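As a sketch, the rule of thumb is just a flat multiplier on the summed estimates. The 40% figure is the one from the conversation, not a universal constant, and the hours here are invented:

```python
def padded_estimate(task_hours, buffer=0.40):
    """Sum of best-effort estimates, padded by a flat contingency for
    churn, unknowns, and waiting on other teams. Tune the buffer to
    your own track record."""
    return sum(task_hours) * (1 + buffer)

# 100 hours of honest engineering estimates -> plan for 140.
planned = padded_estimate([40, 35, 25])
```

The value isn't precision; it's forcing the plan to admit up front that some of the schedule will be consumed by things nobody listed.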
And I think a lot of teams forget this, too: the plan can be a living, breathing document, right?
Don't be afraid to change it. Once you get into it, the facts change.
I've seen that done really well and I've seen it done poorly, where they point to the plan and say, no, the deadline was tomorrow, so launch it. And it's not ready. But we fool ourselves because we put this plan down six months ago, when maybe it's going to take us seven months instead of six.
Make sure it's ready before launching, depending on the use case. But some of the worst projects I've been on didn't even have a plan, and, yeah, that typically showed very clearly.
And sometimes you get into those quagmires and you don't even know how to get out of them because there just is not a plan.
You prefaced this with "the ugly," so one of the worst ones I saw was really more of a holistic project failure, where they just did not think ahead at all.
We came in, and it wasn't an AEM project, actually. It was EMC's Documentum, at a very, very large tax accounting firm; you pretty much have four of those to choose from, right?
One of those. We came in to do their tax file system for them. When we came into the project, it had already been going on for two or three years at $15 million, and they were scrapping it all. The reason was the product and solution they had chosen. By the time they made that decision it should have been obvious, but it wasn't; it was really a transition period. The previous paradigm was an application that got deployed on every single employee's workstation, and instead the industry was going to more of a web-based, SaaS-model type of system. The whole platform and everything around it was clearly in a big industry shift. But they spent years developing this tool on a platform that was no longer going to be supported, and EMC had made that clear: no longer supported. It was a paradigm that had worked before, but by then there were a lot of very robust web offerings where you could just hook in to a SaaS service. So they went, I think, two or three years and $15 million in, and decided to start from scratch.
Only a company like that, one that basically mints money, can get away with those types of decisions. And not everyone got canned over it, but yeah, I think that's definitely one of the ugliest I've seen. On the flip side, I've also been on some great projects with great project managers.
There's actually a lady out of Bounteous who is one of the best I've ever worked with. She just did a really good job of making sure the plan was updated, that dependencies were built in, that you had decent estimations on what could be estimated, and she could lay that whole project plan out. She really ran a tight ship with the stand-ups and sprint planning and retrospectives, all those things that can be boring, mundane ceremonies. But if you do them right, and you're not spending too much time on them, you can really get a lot out of those meetings. Even the daily status stand-ups: what are your blockers, what are you doing today, what did you do yesterday?
She ran a tight ship, so it was bang, bang, bang, we got it done, and no one got bored.
Most teams would take a half hour, but that team was done in 15 minutes and everyone could get on with their work. Because of that, it wasn't a burden to people. Whereas I've seen a lot of projects where the stand-up turns into a lot of solutioning; there are ten people in the room, but only two are really doing something, and everyone else has checked out. You never really get to the true status of things. So, yeah.
[00:51:17] Speaker A: Yeah, I think part of it there is recognizing that not everybody's the same. The team isn't just an amorphous blob of people; people have jobs and people are good at certain things, and you've got to let the folks who are really good at those things do them. If you're trying to solution on something, you need a couple of architects there and the developers concerned. You don't need every single last person, including the designers and everyone; it's just a waste of time.
[00:51:44] Speaker B: And some people have done it before, too. Yep. I think that's where we come in a lot. We've done tons of migrations; collectively as a team, it could be in the hundreds at this point.
And we're able to use all of those former project plans. One thing I think we do really well with customers is sit down with them for a week and go through all their requirements, because that's another thing: without the experts, a lot of companies don't think through those requirements. What are our objectives here? What are our load-time objectives? What are the key performance indicators that tell us we were successful at the end of the day? Also, get all the subject matter experts in a room and ask them what they need the tool to do. A lot of companies don't do the due diligence up front to ask the wider teams that are going to be using this tool or platform: what is it you're getting out of it? What does the current state not give you?
What are you hoping to add in the future? Have those conversations ahead of time, going into the planning portion of it.
[00:52:58] Speaker A: Absolutely. Absolutely. If I were to try to put a bow on "are we old?": I don't think we're old, but what have we gotten from experience, and what value can that experience bring?
It's being able to establish requirements before a project. To me, that is the essence of an excellent project and what makes a successful project. Because otherwise you get something like: oh, we didn't really plan for any users in China, but we've got users in China. And hold on, we bought this whole other thing that doesn't even work there, and we already signed a three-year contract with a vendor that doesn't even work in China.
[00:53:39] Speaker B: Like your topic. But. Yes, exactly.
[00:53:42] Speaker A: But that's an example. You could have found that out if you'd adequately done your requirements work beforehand and gotten to the bottom of what needs to be done. And I think that also comes down to why I like being experienced yet small and nimble: a lot of times these projects get driven by some SVP who got a really slick PowerPoint from somebody, or went to a conference and is super pumped about something. And you've got to say, I get that the sales guy told you a great story, but we need to figure out if that's the right tool for you, or the right way for you. I might be making the sales guy mad with what I tell you, but sorry about that; I'm trying to look out for you.
[00:54:30] Speaker C: One of the things I've taken away from some of these really bad migration plans, and some of them weren't super terrible; some almost got us completely through the migration. I had a banking customer, you guys might be aware of the one, moving from 6.4 to 6.5. We had already migrated all the content and their code, everything was migrating, and we were on target for the actual go-live date. About a week before go-live, they brought in one of their subject matter experts, not on the requirements, but on the regulations that applied to them. Come to find out, all the content was showing up in AEM as authored by admin, with a last-modified date of when we ran the migration script. So we had to scrap all of that, because they had to preserve the metadata as-is from 6.4 to 6.5. We had to go back and basically rebuild our process around crx2oak, because that preserves essentially every property of the node when you bring it from one AEM instance to the next. Knowing that at the beginning would have been a simple yes-or-no question: do you have to preserve all metadata? Knowing that up front would have saved a lot of headache and a lot of wasted time getting almost to the finish line. You're a block away, you're running a marathon, you see the finish line, and then next thing you know you've got to go back to the beginning and basically start from scratch. It would have been worse had it not just been the content; once we developed a process around crx2oak, we were able to execute it within a couple of work days. But it still delayed the project by about a week and a half to two weeks while we went back to the drawing board and figured out how to get the content from one instance to the other without modifying it at all.
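The yes-or-no check that would have saved the rework can be sketched as a metadata diff between the two instances. The property names are the standard JCR audit fields; the node data and paths here are invented for the example, and a real verification would run against property dumps from both repositories:

```python
# The fields the regulations required to survive the migration untouched.
PRESERVED = ("jcr:lastModifiedBy", "jcr:lastModified", "jcr:createdBy")

def metadata_diff(source_nodes, target_nodes):
    """Given {path: properties} dumps from both instances, return the
    paths whose regulated metadata changed during the migration."""
    broken = {}
    for path, src in source_nodes.items():
        tgt = target_nodes.get(path, {})
        changed = [p for p in PRESERVED if src.get(p) != tgt.get(p)]
        if changed:
            broken[path] = changed
    return broken

src = {"/content/site/page": {
    "jcr:lastModifiedBy": "jdoe",
    "jcr:lastModified": "2021-03-01T10:00:00"}}
tgt = {"/content/site/page": {
    "jcr:lastModifiedBy": "admin",           # rewritten by the migration
    "jcr:lastModified": "2023-06-15T02:11:09"}}
problems = metadata_diff(src, tgt)
```

Running a spot check like this on a sample of content early in the project would have surfaced the admin/last-modified problem months before go-live instead of a week before.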
[00:56:33] Speaker B: Yeah, that makes a lot of sense, for sure.
[00:56:40] Speaker A: I think that's all the time we have for battle stories at this point. But, man, it's been great; you've brought back all these old memories. Thanks, you guys, for coming on, and we'll talk again soon.
[00:56:57] Speaker C: Yeah, thanks for having us on. Try not to have too many nightmares tonight.
[00:57:04] Speaker B: There's a couple.
[00:57:08] Speaker A: Yeah, there are a couple like that, too. Specifically, Bryce, I remember you saying, yeah, I actually haven't slept all day. Because, for example, we haven't even touched on the subject of backups.
Yeah, that could be a whole other, you know, PTSD inducing podcast.
[00:57:26] Speaker B: I think we should. There's some good stories on that for sure.
Awesome.
[00:57:32] Speaker A: All right, good. All right, well, see you guys.
[00:57:35] Speaker B: See you next time.
[00:57:35] Speaker C: Have a good one.