Online bidding is becoming popular on new-generation e-commerce websites. However, human bidders on these sites are becoming increasingly frustrated with their inability to win auctions against their software-controlled counterparts. As a result, usage from the sites’ core customer base is plummeting. Bidding robots such as “BidRobot” and “AuctionSniper” are pieces of software that humans program to bid on auctions in their stead, and they are capable of bidding on multiple auctions simultaneously. Humans cannot monitor auctions continuously, whereas bots not only monitor them nonstop but can also place last-minute bids, which gives them a competitive advantage over their human counterparts. This poses a problem for websites that want to filter out these bots to make bidding fairer.
In this project, different methods are explored to efficiently identify online auction bids that are placed by “robots”, helping the site owners easily flag these users for removal from their site to prevent unfair auction activity.
Because of user privacy concerns, most of the data was obfuscated, so it is inappropriate to rely on the existing features directly. Given this dearth of actionable features, new features had to be created to help differentiate between humans and bots. To handle the large volume of data, a scalable algorithm called XGBoost was used. It supports parallel and distributed computing, which reduces processing time and provides faster results. One of the striking features of XGBoost is that a feature importance graph can be plotted showing the gain contributed by each feature, which can then be used to refine the set of features.
The predictions made by the algorithm are compared against the labels in the test dataset to compute the accuracy metric, which measures how well the algorithm performs. The effectiveness of the created features is plotted in a histogram, which is deployed to a server as an interactive Shiny app.
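As a minimal illustration of the accuracy metric mentioned above, the sketch below compares predicted labels with true labels. It is written in Python rather than the R used in the project, and the label encoding (1 for bot, 0 for human), the function name `accuracy`, and the toy data are assumptions made for illustration, not values from the project.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: 1 = bot, 0 = human (assumed encoding).
y_true = [0, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0]

score = accuracy(y_pred, y_true)
print(f"accuracy = {score:.2f}")
```

When bots are a small minority of bidders, accuracy alone can look high even for a model that flags nobody, so it is worth reading alongside the per-class counts.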
Motivation behind doing this project
Online bidding is a service in which users sell or bid for products via the Internet. It facilitates transactions between buyers and sellers in different locations or geographical areas. A bid site is essentially an online auction for services or products: clients post a product or project on the site, and independent contractors with expertise in the relevant field then bid on it. Bidding robots such as “BidRobot” and “AuctionSniper” are examples of software that bids automatically. Auction sniping is the practice of placing a bid that is likely to exceed the current highest bid (which may be hidden) as late as possible. The bid is usually placed seconds before the end of the auction, giving genuine bidders no time to outbid the sniper. As a result, human bidders are becoming increasingly frustrated with their inability to win auctions against robots, and usage from the site’s core customer base is decreasing. Humans cannot monitor auctions continuously, whereas these bots not only monitor them nonstop but can also place last-minute bids, which gives them a competitive advantage over their human counterparts. This poses a problem for websites that want to ban these bots to make bidding fairer.
Case Study: Online Auction Fraud
Online (Internet) auction fraud refers to any type of fraud that uses email, websites, chat rooms or any other Internet-related means to present fraudulent solicitations to prospective victims or to conduct fraudulent transactions. Auction fraud is one of the fastest-growing crimes on the Internet. It can take many forms; the most common involve a seller either failing to send an item or sending an item that is significantly different from what was promised in the auction listing. Multiple bidding, a prominent form of auction fraud in recent times, is used to buy an item at a lower price. It occurs when a buyer places multiple bids (some high and some low) on the same item using different aliases. The multiple high bids by the same buyer cause the price to escalate, which scares off other potential buyers. Then, in the last few minutes of the auction, the same buyer withdraws their high bids and purchases the item with their much lower bid. These multiple bids are usually placed with the help of bots. A related bot-assisted practice is auction sniping, which is defined as:
“a technique where a user in a timed online auction waits until the time limit is nearly expired before entering a bid. The other participants in the auction do not have enough time to enter a counterbid, allowing the sniper to win the auction. A sniper may enter the bids manually or using a software package (bots).”
On eBay, 65% of sessions are bot sessions. Bots are mainly used for:
- Manipulating feedback forms to get abundant positive reviews
- Placing bids faster than humans, raising the bidding price or winning the auction
- Hacking a trustworthy seller’s account and starting a bogus auction.
With technology advancing every day and artificial intelligence becoming more integrated into everyday life, the line between human intelligence and computer intelligence is becoming ever thinner. Many argue that this is good for society, since menial tasks can be automated, saving people time and money. There are, however, also downsides to these advancements. People have begun using robots to gain advantages over others, and it is becoming harder to distinguish between human and robot behavior. One auction site in particular has had issues with robots: its human customers feel that the robots have an unfair advantage because of their faster reaction and processing speeds. To combat this, the site set up systems to try to differentiate its robot customers from its human ones. Differentiation has proven difficult, though, because robot behavior closely mimics human behavior. None of the site’s predictors proved accurate enough, so it took the problem to Kaggle. It uploaded a substantial amount of data for other programmers to work with, in the hope that someone would create a more accurate predictor and help solve the Human or Robot problem.
- The goal is to predict whether a bidder is a human or a robot based on their history of bids on an online auction platform.
- If such bot bidders are identified, the administrators of online auction sites such as eBay, WebStore, OnlineAuction, Overstock etc. can block the bot bids.
- In this project, different methods are explored to efficiently identify online auction bids that are placed by “robots”, helping the site owners easily flag these users for removal from their site to prevent unfair auction activity.
Features we created to deal with the dataset
- Total_bids – The total number of bids made by a bidder (human or bot). This feature considers the entire bid dataset and calculates the frequency of bids made by each bidder. Each bidder is identified by a bidder_id, and the outcome label tells us whether that bidder is a human or a bot. Counting the occurrences of each bidder_id therefore gives the total count.
- Country_count – The total number of countries from which a particular human or bot has placed bids. Every country in the bids dataset has both human and bot bidders. As stated above, each bidder_id is associated with an outcome indicating human or bot, so the bidders from each country can be segregated into bots and humans using the outcome.
- Device_count – Bidders have used one or many devices to bid in an auction. This feature gives the number of devices used by a human or bot to make bids. (Hypothesis – the number of devices used by a human should be higher, as software bots do not need constant supervision.)
- Auction_count – The number of auctions in which each bidder_id appears gives the number of auctions the bidder participated in. The outcome identifies the bidder as a human or a bot, so auction_count specifies the number of auctions in which humans or bots participated.
- Bids_per_auc – Each auction is associated with an ID. The bidders associated with a particular auction are identified, and the number of times a particular bidder has bid in that auction gives this feature. (Hypothesis – bots should make fewer bids than humans, as they are programmed to bid only after a certain number of increments.)
- Perc_auc – Building on the features above, this is the ratio of the number of auctions won by a bidder to the total number of auctions the bidder participated in, expressed as a percentage.
- Bids_after – An auction is usually won by a bidder who bids at the last moment. With this in mind, the median of the time is calculated for each auction, i.e., the point at which 50% of the time has elapsed since the auction started. The number of bids made after this point defines the feature.
- Bids_before – Similar to the feature above, this is the number of bids made between the start of the auction and its midpoint. (Hypothesis – bots would make the majority of their bids at the end of the auction, as they have the upper hand in last-minute bidding.)
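The per-bidder features listed above can be sketched as a single pass over the bid log. This is an illustrative Python sketch using only the standard library, not the R code used in the project. The field names `auction`, `device`, `country` and `time`, the toy records, and the function name `build_features` are assumptions. Two further assumptions are worth flagging: the cutoff for Bids_before and Bids_after is taken as the midpoint between an auction's first and last observed bid (the median of bid timestamps would be another reading of the description), and Bids_per_auc is summarized per bidder as the average number of bids per auction. Perc_auc is left out because it depends on how an auction winner is defined.

```python
from collections import defaultdict

# Toy bid log; every field name except bidder_id is an assumed column name.
bids = [
    {"bidder_id": "a", "auction": "x", "device": "d1", "country": "in", "time": 1},
    {"bidder_id": "a", "auction": "x", "device": "d2", "country": "in", "time": 9},
    {"bidder_id": "a", "auction": "y", "device": "d1", "country": "us", "time": 4},
    {"bidder_id": "b", "auction": "x", "device": "d3", "country": "in", "time": 10},
    {"bidder_id": "b", "auction": "y", "device": "d3", "country": "in", "time": 2},
    {"bidder_id": "b", "auction": "y", "device": "d3", "country": "in", "time": 8},
]


def build_features(bids):
    # Assumed cutoff: midpoint between first and last observed bid per auction.
    span = {}
    for b in bids:
        lo, hi = span.get(b["auction"], (b["time"], b["time"]))
        span[b["auction"]] = (min(lo, b["time"]), max(hi, b["time"]))
    cutoff = {a: (lo + hi) / 2 for a, (lo, hi) in span.items()}

    # Accumulate raw counts and distinct-value sets per bidder.
    acc = defaultdict(lambda: {"total": 0, "countries": set(), "devices": set(),
                               "auctions": set(), "before": 0, "after": 0})
    for b in bids:
        f = acc[b["bidder_id"]]
        f["total"] += 1
        f["countries"].add(b["country"])
        f["devices"].add(b["device"])
        f["auctions"].add(b["auction"])
        # Bids exactly at the cutoff are counted as "before".
        if b["time"] <= cutoff[b["auction"]]:
            f["before"] += 1
        else:
            f["after"] += 1

    return {
        bidder: {
            "total_bids": f["total"],
            "country_count": len(f["countries"]),
            "device_count": len(f["devices"]),
            "auction_count": len(f["auctions"]),
            "bids_per_auc": f["total"] / len(f["auctions"]),
            "bids_before": f["before"],
            "bids_after": f["after"],
        }
        for bidder, f in acc.items()
    }


features = build_features(bids)
for bidder, row in sorted(features.items()):
    print(bidder, row)
```

The resulting dictionary has one row per bidder_id, which can then be joined with the outcome label before training.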
The features we created were passed to the XGBoost algorithm using R. XGBoost can also produce a feature importance graph. The depth at which a feature is used as a decision node in a tree can be used to assess the relative importance of that feature with respect to the predictability of the target variable. Features used near the top of the tree contribute to the final prediction decision for a larger fraction of the input samples. The expected fraction of the samples they contribute to can thus be used as an estimate of the relative importance of the features. The graphs obtained are as follows:
1. The graph below shows that the given features perform poorly in terms of gain, which indicates that new features are needed.
- The screenshots were obtained using shinyapps, which is interactive: plots can be generated for selected features. The drop-down menu shown above lists the created as well as the existing features. Dimension 1 and Dimension 2 are the X and Y axes, respectively.
3. Histogram showing the results in shiny app
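To make the gain-based feature importance discussed above more concrete, the following Python sketch adds up the gain of every split that uses a feature and normalizes the totals so they sum to 1, which is the general idea behind a gain importance plot. It does not call XGBoost. The split records, their gain values, and the function name `gain_importance` are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def gain_importance(splits):
    """Share of total split gain attributed to each feature, highest first."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {feature: 0.0 for feature in totals}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {feature: gain / grand_total for feature, gain in ranked}


# Hypothetical (feature, gain) records, one per split across all trees.
splits = [
    ("total_bids", 6.0),
    ("bids_after", 3.0),
    ("total_bids", 2.0),
    ("device_count", 1.0),
]

importance = gain_importance(splits)
for feature, share in importance.items():
    print(f"{feature}: {share:.3f}")
```

Features with a very small share are candidates for removal or replacement, which is how the importance graph was used to judge the original and created features.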
As a community outreach program, we visited the Indian Institute of Specialized Education (IISE), located in Rajajinagar, Bangalore. IISE is a private institute that teaches core subjects to 10th, 11th and 12th grade students. As part of its volunteering work, it teaches English and computer basics to underprivileged children aged 13 to 15, so we also took an active part in teaching English and computer basics.
The organization conducts these activities free of charge for government school and PU students. Interested students gather on the second Saturday of every month. We have been participating actively since August 2015.