Detecting anomalies is a popular use case for Splunk. Standard deviation is commonly used for this, but it isn't always the best solution. In this tutorial we will consider different methods for anomaly detection, including standard deviation and MLTK. I will also walk you through using streamstats to detect anomalies by calculating how far a numerical value is from its neighbors.

Standard deviation measures the amount of spread in a dataset, based on each value's distance from the mean. Using it to find outliers is generally recommended for data that is normally distributed. In security contexts, user behavior most often follows an exponential distribution: low values are common and high values are rare. Standard deviation can still be used to find outliers, but a certain percentage of the data will always be flagged as outliers. This means more data equals more outliers equals more alerts.

One example would be looking for users logging in from an anomalous number of sources in an hour. The distribution of source counts is exponential. Requiring more than 15 data points, there are 14,298 results. It gets even worse, because more events are no longer buried by the high counts seen during certain hours of the day.

Splunk's Machine Learning Toolkit (MLTK) adds machine learning capabilities to Splunk. One of the included algorithms for anomaly detection is called DensityFunction. This algorithm is meant to detect outliers in this kind of data. Unfortunately, unless you edit config files and make sure you have enough processing power, DensityFunction is limited to 1024 groupings and 100,000 events before it starts sampling data. If the identity data in Splunk for different types of users is high quality, reflects different usage patterns, and covers fewer than 1024 user types, then MLTK may be the direction to go.

Using streamstats to get neighboring values

As an alternative to MLTK, I use streamstats to mimic how I, as an analyst, investigate an alert. For our example of a user being seen logging in from an anomalous number of sources, I would start by looking at historical source counts over the past 30 days. If the source count was significantly higher than any previous source count, I would consider it anomalous. Using streamstats, we can put a number on how much higher a source count is than the previous counts.
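To make the neighbor comparison concrete, here is a minimal Python sketch of the idea. It is not SPL and not the search used in this post; it only illustrates the calculation a streamstats-style running window performs. The function name `score_against_history`, the `window` and `min_points` parameters, and the sample rows are all hypothetical.

```python
from collections import defaultdict, deque


def score_against_history(rows, window=30, min_points=1):
    """Compare each source count with the same user's previous counts.

    rows: iterable of (user, src_count) tuples, already sorted by time.
    window: how many previous values to keep per user (an assumption,
            standing in for "the past 30 days").
    min_points: how much history is required before a ratio is produced.

    Returns a list of (user, src_count, prev_max, ratio) tuples, where
    ratio = src_count / prev_max, or None when there is not enough history.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    results = []
    for user, src_count in rows:
        prev = history[user]
        if len(prev) >= min_points:
            prev_max = max(prev)
            ratio = src_count / prev_max if prev_max else None
        else:
            prev_max, ratio = None, None
        results.append((user, src_count, prev_max, ratio))
        # Record the current value only after scoring it, so a value is
        # never compared with itself.
        prev.append(src_count)
    return results


# Hypothetical hourly source counts per user, in time order.
rows = [
    ("alice", 2), ("alice", 3), ("alice", 2), ("alice", 12),
    ("bob", 1), ("bob", 1),
]
scored = score_against_history(rows)
for user, src_count, prev_max, ratio in scored:
    print(user, src_count, prev_max, ratio)
```

In this made-up data, the last row for alice has a source count four times her previous maximum, which is the kind of jump an analyst would look at. The ratio at which to alert is a tuning decision and is left out of the sketch.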