User:LI AR/Books/Cracking the DataScience Interview
Cracking the DataScience Interview
Basic Stuff To Know
- Generic pages
- Glossaire_de_l'exploration_de_données
- Big_data
- Inspired from books like:
- "A collection of Data Science Interview Questions Solved in Python and Spark vol I & II"
- "120 real data science interview questions"
- Tips / Known Limits of DS
- DataScience is (very) experimental (Andrew Ng): https://pbs.twimg.com/media/CBXshmjWgAAgLKa.jpg
- Overfitting
- Bias–variance_tradeoff / http://www.ritchieng.com/machinelearning-learning-curve/
- Sampling_bias
- Survivorship_bias
- Selection_bias
- Concept_drift
- Correlation_does_not_imply_causation
- Curse_of_dimensionality
- Machine Learning definition and types
- Artificial_intelligence
- List_of_machine_learning_concepts
- Machine_learning
- Data_mining
- Knowledge_extraction
- Knowledge_extraction#Knowledge_discovery
- Pattern_recognition
- Signal_processing
- Supervised_learning
- Semi-supervised_learning
- Unsupervised_learning
- Reinforcement_learning
- Online_machine_learning
- Incremental_learning
- Q-learning
- One-shot_learning / https://www.quora.com/What-is-zero-shot-learning
- Feature_learning
- Learning_to_rank
- Similarity_learning
- Biclustering
- Natural_language_processing
- Biomimetics
- Collective_intelligence
- Data_stream_mining
- Sequential_pattern_mining
- Clickstream
- Semantics
- Semantic_Web
- Speech_recognition
- Speech_synthesis
- Collaborative_filtering
- Competitions
- https://www.kaggle.com/
- https://www.datascience.net/fr/home/
- http://dreamchallenges.org/
- https://www.drivendata.org/competitions/
- https://www.testdome.com/tests/data-analysis-test/65
- http://www.crowdanalytix.com/
- https://www.topcoder.com/community/data-science/
- https://www.datasciencechallenge.org/
- http://tunedit.org/challenges
- https://datasciencebowl.com/competitions/
- https://www.innocentive.com/ar/challenge/browse
- http://tamids.tamu.edu/2018-tamids-data-science-competition/
- https://hackerearth.com
- https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/
- http://www.kdnuggets.com/datasets/index.html
- https://aws.amazon.com/public-datasets/
- https://www.kaggle.com/datasets
- https://data.fivethirtyeight.com
- https://www.quandl.com/
- https://opendata.socrata.com/
- https://cloud.google.com/bigquery/public-data/
- https://github.com/BuzzFeedNews
- https://en.wikipedia.org/wiki/Wikipedia:Database_download
- http://mlr.cs.umass.edu/ml/datasets.html
- https://data.world/
- https://www.data.gov/
- https://www.data.gouv.fr/fr/
- https://data.worldbank.org/
- https://www.reddit.com/r/datasets/top/?sort=top&t=all
- http://academictorrents.com/browse.php?cat=6
- http://www.kdnuggets.com/2015/04/awesome-public-datasets-github.html
- http://www.kdnuggets.com/?s=datasets
- https://www.springboard.com/blog/free-public-data-sets-data-science-project/
- https://www.dataquest.io/blog/free-datasets-for-projects/
- https://github.com/awesomedata/awesome-public-datasets
- https://elitedatascience.com/datasets
- https://blog.journeyofanalytics.com/50-free-datasets-for-data-science-projects/
- https://www.datascienceweekly.org/data-science-resources/data-science-datasets
- Software
- http://www.databaseetl.com/data-mining-tools/
- IDEs / DS-GUI
- R
- (DS-GUI) :Rattle_GUI http://rattle.togaware.com/
- (IDE) :RStudio https://www.rstudio.com
- Python
- Java
- Online
- Paid Software
- (DS-GUI) :Minitab https://minitab.com/
- (DS-GUI) :Tableau_Software https://www.tableau.com/
- R
- R/Packages
- https://cran.r-project.org/
- https://cran.r-project.org/web/views/
- https://cran.r-project.org/web/views/MachineLearning.html
- https://cran.r-project.org/web/views/Bayesian.html
- https://cran.r-project.org/web/views/Cluster.html
- https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
- https://cran.r-project.org/web/views/Survival.html
- https://cran.r-project.org/web/views/TimeSeries.html
- Python
- C++
- Alteryx
- https://www.alteryx.com/ [Commercial]
- Comparison
- DeepLearning
- GANs (Generative Adversarial Networks)
- DataViz
- https://matplotlib.org/
- https://plot.ly/
- :GGobi http://www.ggobi.org/
- http://ggplot2.org/
- http://ggvis.rstudio.com/
- https://d3js.org/
- https://datascienceplus.com/creating-graphs-with-python-and-goopycharts/
- https://www.tableau.com/ [Commercial]
- http://bokeh.pydata.org/en/latest/ [Python]
- http://pyqtgraph.org/ [Python]
- https://uber.github.io/deck.gl [Uber's internal DataViz tool]
- http://rawgraphs.io/
- http://scidavis.sourceforge.net/
- http://home.gna.org/veusz/
- http://jwork.org/dmelt/
- Graphs
- GUI
- Data Manipulation
- Annotate examples: https://prodi.gy/
- Data_pre-processing
- Data_cleansing
- Data_reduction
- Data_wrangling
- Data_scrubbing
- Data_editing
- Data_scraping
- Data_curation
- Data_fusion
- Data_integration
- Data_binning
- Sanitization_(classified_information)
- Extract,_transform,_load
- Imputation_(statistics)
- Interpolation
- Outlier
- https://github.com/Quartz/bad-data-guide
- https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
- Local_case-control_sampling#Imbalanced_datasets
- Sampling_(statistics)
- Sampling_(statistics)#Stratified_sampling
- Stratified_sampling
- Jackknife_resampling
- Oversampling_and_undersampling_in_data_analysis
- Oversampling_and_undersampling_in_data_analysis#SMOTE
- AdaBoost
- "Essay Why Most Published Research Findings Are False"
- "A Few Useful Things to Know about Machine Learning"
- Working with text
- Unicode_equivalence#Normalization
- URL_normalization
- Text_segmentation
- N-gram
- Tokenization_(lexical_analysis)
- Stemming
- Word2vec https://www.tensorflow.org/tutorials/word2vec
- https://google.github.io/seq2seq/
- NLP in Python
- https://github.com/explosion/thinc
- Working with spatial data
- Spatial_data
- Trend_surface_analysis
- Variogram
- Geary's_C
- Moran's_I
- Spatial_descriptive_statistics#Ripley.27s_K_and_L_functions
- Signal processing
- Signal processing - Images
- Techniques for Feature/Attribute Selection/Dimensionality Reduction
- High-dimensional_statistics
- Dimensionality_reduction
- Factor_analysis
- Principal_component_analysis
- Independent_component_analysis
- Singular_value_decomposition
- Multidimensional_scaling
- T-distributed_stochastic_neighbor_embedding
- Autoencoder
- Deep_learning#Stacked_.28de-noising.29_auto-encoders
- Elastic_map
- Linear_discriminant_analysis
- Signal processing
- Working with spatial data
- Maths (Stats / Algebra)
- Inspiration for this section: https://github.com/soulmachine/machine-learning-cheat-sheet
- Pseudo-random_number_sampling
- Glossary_of_probability_and_statistics
- Bijection,_injection_and_surjection
- Mean
- Harmonic_mean
- Median
- Mode_(statistics)
- Range_(mathematics)
- Quartile
- Interquartile_range
- Variance
- Covariance
- Standard_deviation
- Collinearity#Usage_in_statistics_and_econometrics
- ANOVA
- ANCOVA
- MANOVA
- ANORVA
- Moving_average
- EWMA_chart
- Exponential_smoothing
- Autoregressive_model
- Autoregressive–moving-average_model
- Autoregressive_integrated_moving_average
- Autocorrelation
- Cross-correlation
- Entropy_in_thermodynamics_and_information_theory
- Moment_(mathematics)
- Residual
- Expected_value
- Likelihood_function
- Cumulative_distribution_function
- Probability
- Probability_mass_function
- Probability_density_function
- Prior_probability
- Prior_knowledge_for_pattern_recognition
- Permutation https://fr.wikipedia.org/wiki/Arrangement
- Combination https://fr.wikipedia.org/wiki/Combinaison_(math%C3%A9matiques)
- Dependent_and_independent_variables
- Independence_(probability_theory)
- Hoeffding's_inequality
- Pareto_efficiency
- Nash_equilibrium
- Pareto_principle
- Tensor
- Tensor_product
- Cross_product
- Taxicab_geometry
- Norm_(mathematics)#Euclidean_norm
- Lp_space
- Norm_(mathematics)
- Determinant
- Trace_(linear_algebra)
- Eigenvalues_and_eigenvectors
- Projection_(mathematics)
- Curvature
- Convolution
- Hadamard_product_(matrices)
- Kernel_(statistics)
- Radial_basis_function
- Logit
- Latent_variable
- Inference
- Statistical_inference
- Inductive_reasoning
- Deduction_and_induction
- Transduction_(machine_learning)
- Stochastic
- Stochastic_process
- Probability_theory
- Probability
- Posterior_probability
- Statistic
- Statistics
- Gaussian_noise
- Bayesian_inference
- Bayes_rule
- Bayes'_theorem
- Bayesian_network
- Naive_Bayes_spam_filtering
- Naive_Bayes_classifier
- Belief_propagation#Approximate_algorithm_for_general_graphs
- Loss_function
- Regularization_(mathematics)
- Normalization_(statistics)
- Quantile_normalization
- Nyström_method (+PCA)
- Preference_(economics)
- Delaunay_triangulation
- Neighbourhood_(mathematics)
- Genetic Algorithms
- Mutation_(genetic_algorithm)
- Crossover_(genetic_algorithm)
- Selection_(genetic_algorithm)
- Fitness_function
- Utility#Utility_functions
- SVM
- Neural Networks
- Rectifier_(neural_networks)
- Backpropagation
- Gradient
- Gradient_descent
- Stochastic_gradient_descent
- Gradient_boosting
- http://www.wildml.com/deep-learning-glossary/#gradient-clipping
- http://www.wildml.com/deep-learning-glossary/#batch-normalization
- http://www.wildml.com/deep-learning-glossary/#backpropagation
- http://www.wildml.com/deep-learning-glossary/#momentym
- http://www.wildml.com/deep-learning-glossary/#sgd
- https://visualstudiomagazine.com/articles/2015/07/01/variation-on-back-propagation.aspx
- Softmax acts as a "discriminant learning metric": examples from every class other than {i} also help learn class {i}, because the outputs are forced to sum to 1, which links the scores of all classes
- Sigmoid_function
- Hyperbolic_function#Tanh
- Dropout_(neural_networks)
- Radial_basis_function
- Hebbian_theory
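The softmax remark in this section can be illustrated with a minimal sketch. The scores below are made-up values, not taken from any model; the point is only that raising one class score lowers the probabilities of the other classes, since the outputs must sum to 1.

```python
import math

def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes.
before = softmax([2.0, 1.0, 0.1])
# Raise only the score of class 1; classes 0 and 2 keep their raw scores.
after = softmax([2.0, 3.0, 0.1])

print(before)
print(after)
```

Classes 0 and 2 lose probability mass even though their raw scores did not change, which is the coupling the remark describes.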
- Signal processing
- Signal_processing
- Low-pass_filter
- High-pass_filter
- Energy_(signal_processing)
- Fast_Fourier_transform
- Wavelet
- Discrete_wavelet_transform
- Coherence_(signal_processing)
- Kalman_filter
- Time Series
- Time_series
- Decomposition_of_time_series
- Seasonal_adjustment
- Seasonality
- Frequency_domain
- Time_domain
- Spectral_density
- Games
- Distances
- Distance
- Euclidean_distance [dim1]
- Edit_distance
- Hamming_distance
- Manhattan_distance [dim1]
- Levenshtein_distance
- Needleman–Wunsch_algorithm
- Minkowski_distance [order p; generalizes Manhattan_distance (p = 1) and Euclidean_distance (p = 2)]
- Mahalanobis_distance
- Canberra_distance
- Distance_correlation
- Angular_distance
- String_metric
- Jaro–Winkler_distance
- Jaccard_index
- Kendall_tau_distance
- Chebyshev_distance
- Tf–idf
- Neural_coding
- For graphs: http://blog.smola.org/post/33412570425
- https://fr.wikipedia.org/wiki/Algorithme_de_Needleman-Wunsch
- Clouds
- Hausdorff_distance [between clouds of points, a point and a cloud]
- Distance#Distances_between_sets_and_between_a_point_and_a_set
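As a rough illustration of how several of the distances listed in this section relate, the sketch below computes the Minkowski distance of order p and recovers the Manhattan (p = 1), Euclidean (p = 2) and Chebyshev (p tending to infinity) distances as special cases. The function name and the sample points are assumptions made for this example.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if p == float("inf"):
        # Limit case: the largest coordinate difference (Chebyshev distance).
        return max(abs(a - b) for a, b in zip(u, v))
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, 1)
euclidean = minkowski(a, b, 2)
chebyshev = minkowski(a, b, float("inf"))
print(manhattan, euclidean, chebyshev)
```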
- Distributions
- Discrete_uniform_distribution
- Normal_distribution
- Bernoulli_distribution
- Binomial_distribution
- Poisson_distribution
- Chi-squared_distribution
- Log-normal_distribution
- Pareto_distribution
- Gibbs_distribution
- Weibull_distribution
- Gamma_distribution
- Beta_distribution
- Hypergeometric_distribution
- Dirac_delta_function
- https://ercim-news.ercim.eu/en107/special/robust-and-adaptive-methods-for-sequential-decision-making [Characterization of the simplicity of a distribution: BernsteinExponent+TsybakovMarginCondition]
- Evaluation
- Performance_indicator
- Mean_absolute_percentage_error
- Mean_absolute_scaled_error
- Symmetric_mean_absolute_percentage_error
- Regression-kriging
- https://www.kaggle.com/wiki/RootMeanSquaredLogarithmicError
- http://weka.sourceforge.net/packageMetaData/percentageErrorMetrics/index.html
- http://weka.sourceforge.net/packageMetaData/logarithmicErrorMetrics/index.html
- Information_gain_ratio
- Kullback–Leibler_divergence
- Gini_coefficient
- Pearson_correlation_coefficient
- Entropy
- http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/node15.html
- Akaike_information_criterion https://twitter.com/DataSciFact/status/963129411250933760
- Bayesian_information_criterion
- Brier_score (mean squared error of probability forecasts, i.e. an MSE rather than an RMSE)
- Structural_similarity
- Type_I_and_type_II_errors
- False_positive_rate
- False_coverage_rate
- False_discovery_rate
- Confusion_matrix
- Accuracy_and_precision
- Precision_and_recall
- F1_score
- Sensitivity_and_specificity
- Receiver_operating_characteristic
- Receiver_operating_characteristic#Area_under_the_curve
- Discounted_cumulative_gain
- Cross-validation_(statistics)
- Errors_and_residuals
- If the residuals are consistently > 0 or < 0 over a range of the training set, the model has failed to capture some structure in the data, or the wrong type of model is being used (e.g. linear regression on parabolic data; DataSkeptic/Heteroskedasticity)
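The residual check above can be sketched on toy data. The example fits an ordinary least-squares line to points generated from y = x² (a deliberately wrong model choice, assumed here for illustration) and prints the sign of each residual; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Parabolic data, fitted with a straight line.
xs = [float(x) for x in range(-5, 6)]
ys = [x * x for x in xs]
slope, intercept = fit_line(xs, ys)

# Residual = observed value minus fitted value.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)
```

A long run of same-sign residuals in the middle of the range, flanked by the opposite sign at both ends, is the pattern that suggests the linear model is missing curvature.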
- Clustering
- See also the Calinski-Harabasz Index: http://stats.stackexchange.com/questions/97429/intuition-behind-the-calinski-harabasz-index
- Others
- Working with Text
- Part_of_speech
- Semantic_similarity
- Tf–idf
- Cosine_similarity
- Okapi_BM25
- See also Mr Gomez page on Weka: http://www.esp.uem.es/jmgomez/tmweka/
- Named-entity_recognition
- Conditional_random_field
- Latent_Dirichlet_allocation
- Sentiment_analysis
- Web_mining
- Web_crawler
- Text_mining
- Document_classification
- Automatic_summarization
- Working with Images
- http://mirror.imagej.net/plugins/mexican-hat/index.html
- If your model seeks to penalize near misses, the Mexican hat function is a good choice.
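To make the near-miss remark concrete, here is a minimal sketch of the one-dimensional Mexican hat (Ricker) wavelet using its standard normalized formula. The interpretation in the comments (distance from the target as the argument, `sigma` as the width) is an assumption of this example, not something stated by the linked plugin page.

```python
import math

def mexican_hat(t, sigma=1.0):
    """Ricker ("Mexican hat") wavelet evaluated at offset t with width sigma."""
    norm = 2.0 / (math.sqrt(3.0 * sigma) * math.pi ** 0.25)
    r2 = (t / sigma) ** 2
    return norm * (1.0 - r2) * math.exp(-r2 / 2.0)

# Positive at the centre (a hit), negative in the surrounding ring
# (a near miss), and close to zero far away (a clear miss).
for offset in (0.0, 1.0, 2.0, 6.0):
    print(offset, mexican_hat(offset))
```

With this shape, a response centred on the target is rewarded, a response slightly off target is actively penalized, and a distant response is mostly ignored.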
- Working with concepts (Ontologies)
- https://en.wikipedia.org/wiki/YAGO_%28database%29
- http://wiki.dbpedia.org/
- http://conceptnet.io/
- http://cogcomp.org/Data/QA/QC/definition.html
- Visualization
- Data_visualization
- Exploratory_data_analysis
- List_of_graphical_methods
- Category:Statistical_charts_and_diagrams
- Statistical_graphics
- Visual_perception
- Heat_map
- Misleading_graph
- Pareto_chart
- Need to develop "critical thinking":
- (Statistical) tests
- A/B_testing
- Evaluating a hypothesis
- Statistical_power
- Statistical_hypothesis_testing
- P-value
- Student's_t-test
- Chi-squared_test
- Type_I_and_type_II_errors
- Detecting abrupt changes in time series
- Stationary_process
- Structural_break
- Chow_test
- Kruskal–Wallis_one-way_analysis_of_variance
- F-test
- F-statistics
- Pairwise_summation
- CUSUM
- MOSUM: https://cran.r-project.org/web/packages/strucchange/vignettes/strucchange-intro.pdf
- Time series / Chaos
- Machine Learning Techniques
- Statistical_classification
- One-class_classification
- Binary_classification
- Multiclass_classification
- Multi-label_classification
- Structured_prediction
- Cluster_analysis
- Elbow_method_(clustering)
- Nearest_neighbor_search#Approximate_nearest_neighbor
- Regression_analysis
- Linear_regression
- Logistic_regression
- Ridge_regression
- Kriging
- Multivariate_adaptive_regression_splines
- Association_rule_learning
- Apriori_algorithm
- Survival_analysis
- Monte_Carlo_method
- Monte_Carlo_algorithm
- Multinomial_logistic_regression
- Lasso_(statistics)
- Expectation–maximization_algorithm
- Markov_chain_Monte_Carlo
- Hidden_Markov_Models
- Viterbi_algorithm
- Convolutional_code
- Forward–backward_algorithm
- Markov_random_field
- Mean_field_theory
- Mean_field_particle_methods
- CART
- Decision_tree_learning
- Decision_tree
- Pruning_(decision_trees)
- ID3_algorithm
- C4.5_algorithm
- Random_forest
- Support_vector_machine
- Support_vector_machine#Support_vector_clustering_.28SVC.29
- Support_vector_machine#Regression
- Conditional_random_field
- Latent_semantic_analysis
- Genetic_algorithm
- Evolutionary_algorithm
- Evolutionary_computation
- Voronoi_diagram
- Local_outlier_factor
- Ordered_weighted_averaging_aggregation_operator
- Support_vector_machine
- Neural Networks
- History: http://www.chronicle.com/article/The-Believers/190147/
- The various types of NN as a picture: http://www.asimovinstitute.org/wp-content/uploads/2016/09/neuralnetworks.png
- Types_of_artificial_neural_networks
- Comparison_of_deep_learning_software/Resources
- Artificial_neural_network
- Perceptron
- Feedforward_neural_network
- Multilayer_perceptron
- Radial_basis_function_network
- Long_short-term_memory
- SNNS
- Time_delay_neural_network
- Recursive_neural_network
- Recurrent_neural_network
- Hopfield_network
- Content-addressable_memory
- Boltzmann_machine
- Self-organizing_map
- Learning_vector_quantization
- Liquid_state_machine
- Autoassociative_memory
- Convolutional_neural_network
- Autoencoder
- Neuroevolution
- Neuroevolution_of_augmenting_topologies
- Deep_learning
- Deep_learning#Deep_neural_network_architectures
- Deep_belief_network
- Generative_adversarial_networks
- Signal Processing
- Fuzzy Logic
- Fuzzy_logic
- Inference_engine
- Type-2_fuzzy_sets_and_systems
- T-norm_fuzzy_logics
- Adaptive_neuro_fuzzy_inference_system
- Fuzzy_control_system
- Working with spatial data
- Ensemble Techniques
- Ensemble Learning = Boosting, Bagging or Stacking: http://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning#19053
- Applying Bagging should help reduce variance and overfitting.
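The variance claim about bagging can be checked empirically with a small sketch. Everything here is an assumption chosen for illustration: the toy data (y = x plus Gaussian noise), the 1-nearest-neighbour base learner (picked because it is a high-variance predictor), the query point 0.5, and the helper names. The experiment redraws the training set many times and compares the spread of a single model's prediction with that of a bagged ensemble.

```python
import random
import statistics

def one_nn_predict(train, x):
    """Predict with the label of the single nearest training point."""
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def bagged_predict(train, x, n_models, rng):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # sample with replacement
        preds.append(one_nn_predict(boot, x))
    return sum(preds) / n_models

def make_data(n, rng):
    """Toy data: x uniform in [0, 1), y = x + Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, x + rng.gauss(0.0, 1.0)))
    return data

rng = random.Random(0)
single, bagged = [], []
for _ in range(200):  # redraw the training set to measure prediction variance
    train = make_data(30, rng)
    single.append(one_nn_predict(train, 0.5))
    bagged.append(bagged_predict(train, 0.5, 25, rng))

var_single = statistics.pvariance(single)
var_bagged = statistics.pvariance(bagged)
print(var_single, var_bagged)
```

On this setup the bagged prediction should vary noticeably less across training sets than the single 1-NN prediction; how much bagging helps in practice depends on how unstable the base learner is.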
- Applications
- Bayesian_spam_filtering
- Root_cause_analysis
- Inpainting
- Experimentation framework
- Goal: test various parameters on various algorithms to determine the best model(s)
- Weka's "Experimenter" mode: http://weka.sourceforge.net/manuals/ExplorerGuide.pdf
- AutoWeka: http://www.cs.ubc.ca/labs/beta/Projects/autoweka/
- R::mlrMBO: https://github.com/mlr-org/mlrMBO
- Coding / Exposing API to the rest of the application
- Microservices
- Map-Reduce framework
- Scraping
- Storage
- Apache_Hadoop#HDFS https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
- Apache_HBase http://hbase.apache.org/
- Apache_Hive https://hive.apache.org/
- Transfers - to/from RelationalDB
- Transfers - serialization/streaming
- Storage - In memory
- Admin
- Apache_ZooKeeper http://zookeeper.apache.org/
- Apache_Cassandra https://cassandra.apache.org
- Ambari http://ambari.apache.org/
- Apache_Oozie http://oozie.apache.org/
- Programming
- ML
- Working with text
- Working with text - Data Viz
- Small/Micro Data
- Multi-Agent Systems
- Agent-based_model
- Multi-agent_system
- Agent-oriented_software_engineering
- https://www.researchgate.net/publication/266182243_Agent_Groupe_Role_et_Service_Un_modele_organisationnel_pour_les_systemes_multi-agents_ouverts [JFerber: AGR Methodology]
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.7968&rep=rep1&type=pdf [YDemazeau: Vowels Methodology]
- Quantum Machine Learning
- Quantum_machine_learning
- Quantum_tunnelling
- Quantum_annealing
- Adiabatic_quantum_computation
- Resources
- http://www.wildml.com/deep-learning-glossary/
- http://deeplearning.net
- https://www.datacamp.com
- http://www.learnpython.org
- https://www.codecademy.com/learn/python
- http://www.dataschool.io/how-to-get-better-at-data-science/
- http://simplystatistics.org/2015/03/17/data-science-done-well-looks-easy-and-that-is-a-big-problem-for-data-scientists/
- Social network for DataScientists
- Books
- https://github.com/janishar/mit-deep-learning-book-pdf
- http://neuralnetworksanddeeplearning.com/
- http://deeplearning.net/tutorial/deeplearning.pdf
- https://cours.etsmtl.ca/sys843/REFS/Books/ebook_Haykin09.pdf
- http://hagan.ecen.ceat.okstate.edu/nnd.html
- http://www.dkriesel.com/en/science/neural_networks
- https://torres.ai/research-teaching/tensorflow/first-contact-with-tensorflow-book/
- https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf
- http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
- http://www.greenteapress.com/thinkstats/thinkstats.pdf
- http://www.greenteapress.com/thinkbayes/thinkbayes.pdf
- http://www.greenteapress.com/thinkpython/thinkpython.pdf
- http://r4ds.had.co.nz/
- https://web.stanford.edu/~hastie/Papers/ESLII.pdf
- https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print10.pdf
- http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
- http://infolab.stanford.edu/~ullman/mmds/booka.pdf
- http://www.guidetodatamining.com/assets/guideChapters/Guide2DataMining.pdf
- https://github.com/ajaymache/machine-learning-yearning
- Paid Books
- "Artificial Intelligence for Humans, Volume 1: Fundamental Algorithms", Jeff Heaton, 2013, ISBN:9781493682225
- "Artificial Intelligence for Humans, Volume 2: Nature-Inspired Algorithms", Jeff Heaton, 2014, ISBN: 978-1499720570
- "Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks", Jeff Heaton, 2015, ISBN: 978-1505714340
- "Introduction to Machine Learning (Adaptive Computation and Machine Learning)", E. Alpaydin, MIT Press, 2004, ISBN: 978-0262012430
- "Machine Learning: An Artificial Intelligence Approach", R.S. Michalski, J.G. Carbonell, T.M. Mitchell, Symbolic Computation, 1983, ISBN:978-3540132981
- "A collection of Data Science Interview Questions Solved in Python and Spark vol I & II", Antonio Gulli, CreateSpace, 2015, ISBN:978-1517216719
- "Artificial Intelligence a Modern Approach", Stuart Russell and Peter Norvig, Prentice Hall, 1995, ISBN:978-0131038059
- "An Introduction to MultiAgent Systems", Michael Wooldridge, John Wiley & Sons, 2009 (2nd ed), ISBN:978-0470519462
- "Data Mining: Practical Machine Learning Tools and Techniques", Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, Morgan Kaufmann, ISBN:978-0128042915
- "Agent Intelligence Through Data Mining", Andreas L. Symeonidis, Pericles A. Mitkas, Springer/Apress, ISBN:978-0387257570
- "Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence", Gerhard Weiss, 2000, ISBN:978-0262232036
- "Data science at the command line", Janssens, O'Reilly.
- Also look for MachineLearning, DeepLearning, Spark, Mahout, R, Python, SciKit-Learn, Data/Text Mining, ElasticSearch, Natural Language, Statistics @ O'Reilly, Packt, Manning/In Action, HeadFirst
- Lists of good books
- News/Blogs/RSS
- https://blog.acolyer.org/
- https://www.reddit.com/r/machinelearning
- https://www.reddit.com/r/statistics
- https://www.reddit.com/r/datascience
- https://www.reddit.com/r/bigdata
- http://www.kdnuggets.com/
- http://www.becomingadatascientist.com/
- https://rdatamining.wordpress.com/
- http://www.r-bloggers.com/
- https://dataaspirant.com/
- http://www.joyofdata.de/blog/
- https://www.dataiku.com/blog/
- https://www.datacamp.com/community/
- http://beautifuldata.net/
- http://www.datatau.com/news
- http://dataelixir.com/
- http://www.oreilly.com/data/newsletter.html
- http://blog.kaggle.com/
- http://blog.yhathq.com/
- http://simplystatistics.org/
- http://fastml.com/
- http://www.win-vector.com/blog/
- http://fivethirtyeight.com/
- http://www.dataschool.io/
- https://research.facebook.com/blog/datascience/
- http://deeplearning.net/feed/
- http://learningwithdata.com/
- http://blog.plot.ly/
- https://datasciencelab.wordpress.com/
- https://shapeofdata.wordpress.com/
- http://datalab.lu/
- http://www.pythonweekly.com/
- http://pbpython.com/
- https://plus.google.com/communities/105141578068503684401 ( https://plus.google.com/+JaanaNystr%C3%B6m/posts/MKCV3vNsn1g )
- http://blog.revolutionanalytics.com/2012/12/the-most-influential-data-scientists-on-twitter.html
- http://www.kdnuggets.com/2012/12/most-influential-data-scientists-on-twitter.html
- https://journal.r-project.org/
- Podcasts
- http://www.learningmachines101.com/
- http://www.thetalkingmachines.com/
- http://dataskeptic.com/
- http://www.partiallyderivative.com/
- http://www.ocdqblog.com/podcast/
- http://blog.pivotal.io/podcasts-pivotal
- https://www.udacity.com/podcasts/linear-digressions
- http://datastori.es/
- http://radar.oreilly.com/tag/oreilly-data-show-podcast
- http://freakonomics.com/radio/freakonomics-radio-podcast-archive/
- http://simplystatistics.org/category/podcast/
- http://data-informed.com/multimedia/podcasts/
- http://www.bbc.co.uk/programmes/p02nrss1
- YT Channels
- https://www.youtube.com/user/keeroyz
- https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A
- https://www.youtube.com/channel/UCioEIe1o73G-oGR4b34E7Dg
- https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg
- https://www.youtube.com/channel/UC9LfrPNcIyHspci0t2W4T_w
- https://www.youtube.com/channel/UCHBWJGoZMkhJyElgvuN1U1w
- https://www.youtube.com/user/dataschool
- https://www.youtube.com/channel/UCtY8JjMQpzYb5FFvUr2JnUw
- https://www.youtube.com/channel/UCRhUp6SYaJ7zme4Bjwt28DQ
- https://www.youtube.com/user/sentdex
- https://www.youtube.com/user/DataScienceDojo
- MOOCs
- Generic
- Weka
- Andrew Ng
- Yann Lecun
- Hans Rosling (visualization)
- From renowned universities
- https://www.coursera.org/specializations/jhu-data-science
- https://www.coursera.org/specializations/machine-learning
- https://www.coursera.org/specializations/data-science-python
- https://www.coursera.org/specializations/big-data
- https://www.coursera.org/learn/machine-learning
- https://www.coursera.org/learn/r-programming
- https://www.coursera.org/learn/data-scientists-tools
- https://www.coursera.org/learn/python-data-analysis
- http://www.holehouse.org/mlclass/
- http://online.stanford.edu/course/statistical-learning
- http://work.caltech.edu/telecourse.html
- https://www.udacity.com/course/data-analyst-nanodegree--nd002
- https://www.thinkful.com/courses/learn-data-science-online/
- https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x7
- https://www.coursetalk.com/
- https://github.com/justmarkham/DAT7#bonus-resources
- http://datasciencemasters.org/
- http://www.wolfram.com/broadcast/c?c=99
- http://www.wolfram.com/broadcast/c?c=97
- http://www.wolfram.com/broadcast/c?c=397
- DataSchool
- Jobs
- https://datajobs.com/
- http://www.analytictalent.com/
- http://www.kdnuggets.com/jobs/index.html
- https://fr.hired.com/
- Teaching
- http://edison-project.eu/edison/edison-data-science-framework-edsf
- Curated list of similar pages
- https://github.com/search?utf8=%E2%9C%93&q=curated+list+awesome+frameworks&type=
- https://github.com/josephmisiti/awesome-machine-learning
- https://github.com/onurakpolat/awesome-bigdata
- https://github.com/onurakpolat/awesome-analytics
- https://github.com/analyticalmonk/awesome-neuroscience
- https://github.com/igorbarinov/awesome-data-engineering
- https://github.com/quantmind/awesome-data-science-viz
- https://github.com/fasouto/awesome-dataviz
- https://github.com/qinwf/awesome-R
- https://github.com/datascience-python/awesome-datascience-python
- https://github.com/caesar0301/awesome-public-datasets